Tech Support > Computers & Technology > Programming > Linked List confusion
Linked List confusion
Posted by arnuld on May 6th, 2008


PROBLEM:

I want to read words from standard input and then sort and print
the words in decreasing order of appearance along with the count
of how many times each word occurred. The language I am using C. I
have come up with following design idea:


1) Create a singly-linked list and store each word and the count in that
structure. when the user enters a word which is already seen before, then
we will just increase the count. That way we will have a list of unique
words.

2.) Create an array of pointers to the elements of linked list
(which are structure actually). Sort the pointers comparing the count of
each word and print the elements pointed by these pointers. That way
program will be highly efficient as we will not be sorting any structures,
we will be sorting pointers only . (originally K&R2's idea but I love
this idea)

I want this program to be fast and consume less memory. I have 2
questions:

i.) a double-linked list will be better than this ?
ii.) any other data structure that I can use here ?




--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page

Posted by arnuld on May 6th, 2008



from last 2 hours I am searching comp.lang.c and comp.programming archives
on Google Groups and a few secods before I read your reply I came across
this reply from "Roger Willcocks" to someone who asked the question on
how to reverse the single-linked list:


My take on this is that if you find you need to search a linked
list from its tail, you've got your design wrong. The list
should be chained the other way, or it should be a doubly
linked list. Get the design right and the question just goes away.

One other point - scanning a linked list is inherently
inefficient; if speed is a factor some other data structure - a
(balanced) tree, or hash table for instance - would be
favourite.



original thread:

http://groups.google.com/group/comp.... 1e1d915e179aa



why I gave you this link ?

because banging my head with the function "add_to_snode" I came to know
that it was comparing the word from last node only with the present word
but not with the the words before last node and that is why program was
giving the count of 1 for every word, unless you you enter a word in
succession , say 3 times, but still then because of the array I had, it
will print the word 3 times with count 3 on every line. So I thought I
needed to search the linked list in reverse order in order to see if the
current word was already on the list or not and then I hit that thread.

It only happened when I took a piece of paper and ran the program with my
pen . only then I came to know why the doubly-linked list example from
section 6.5 of K&R2 always returns "p" and never "p->next"

okay, I will use doubly-linked list. Is it fine ?



--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page


Posted by Ben Bacarisse on May 6th, 2008


arnuld <sunrise@see.sigs.invalid> writes:

Fine, but each access involves a linear scan of the list. If you are
learning C (not algorithms and data structures) this would be
perfectly OK. You still need to find a way to print it in order.

There is no point in use both. You may use an array of strings and
then sort that before printing, or use a linked list and sort that.
You don't gain anything from using both! Once you have a linked list
of word counts, sorting that list *could* be done with an array, but
you could also use one of the sort algorithms that work well for lists
(quick sort and merge sort come to mind).

No, but if you think so, say why because it will help to understand
how you think it may help.

Binary search trees and hash tables are the most obvious choices. For
simplicity, you could try to keep a sorted array of string pointers.
This will speed up the accesses when a word is read and will give you
a simple way to print the words at the end.

--
Ben.

Posted by [Jongware] on May 6th, 2008



Ben Bacarisse wrote:
[...]
I wrote exactly that same program, using only a binary tree. Printing
the words was done with a recursive routine -- which is also a pretty
simple way :-)
Despite heavy use of generalized C functions such as strdup, strcmp,
etc., its performance was astounding; a matter of seconds, and most of
this time was input/output. My test file was an ASCII version of the
King James Bible, so now I know how often people used to say "thee" in
those times.

[Jongware]

Posted by thomas.mertes@gmx.at on May 6th, 2008


On 6 Mai, 12:36, arnuld <sunr...@see.sigs.invalid> wrote:
From your description I recognized an example program that I
wrote in Seed7 (sorry I did not use C). The only difference is that
it writes a list of words in increasing order followed by the count.
The wordcnt.sd7 example can be found here:
http://seed7.sourceforge.net/examples/wordcnt.htm

Maybe you can get some inspiration from this example how your
problem can be solved with a hash table.

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.

Posted by CBFalconer on May 7th, 2008


arnuld wrote:
Take a look at hashlib.zip, especially at the demonstration program
"wdfreq.c". This shows how to apply a hashlist to that type of
application. All in standard C, see:

<http://cbfalconer.home.att.net/download/>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **

Posted by Gene on May 7th, 2008


On May 6, 6:36*am, arnuld <sunr...@see.sigs.invalid> wrote:
It's a pretty common mistake for beginning programmers to reach for
linked lists because it's usually hard work to understand them. When
you have a hammer in your hand, everything looks like a nail.

But for this problem, you need to do lookups quickly. Lists are poor
for quick lookups. Rather you want a dictionary data structure that
stores [word, count] pairs.

In pseudocode:

set dictionary D empty
for each word W in the input
look up [W, Count] in D
if lookup succeeds, increment Count
else /*lookup failed*/ insert [W,1] in D
end for
Print the pairs in the dictionary in sorted order

For the dictionary, there are a couple of implementation choices. A
balanced tree like AVL or Red-Black trees is a nice choice because you
can print in sorted order merely by traversing in order. No sorting
needed. You might look at libavl. I've never tried it, but it looks
like an extremely high quality C library for balanced trees. Of
course if you are trying to learn, you may want to roll your own.

Another approach is a hash table, which makes the lookups and
insertions faster, but requires a separate sorting pass.

This is a perfect example of a problem that's best solved with a
programming language designed to solve it. Perl is such a language.
Here's a solution in perl.

sub main {
local $/ = undef;
my %counts;
foreach my $word (split(/\s+/, <>)) { $counts{$word}++ }
print map { "$_->[0] $_->[1]\n" }
sort { $b->[1] <=> $a->[1] }
map { [$_, $counts{$_}] } keys %counts;
}
main;

Posted by Gene on May 7th, 2008


On May 6, 6:36*am, arnuld <sunr...@see.sigs.invalid> wrote:
Lists are not what you want here. Lookups will get slower in
proportion to the number of words. A balanced tree or hash table
implementing a dictionary of counts is a better choice. Look at
libavl for a niced balanced tree implementation. Just raverse in
order to print. If you use a hash table, you'll need to sort it in a
separate pass.

I know you said you're using C, but this is a great example of a
problem that's easier to solve in a language meant to solve it. Perl
is such a language:

sub main {
local $/ = undef;
my %counts;
foreach my $word (split(/\s+/, <>)) { $counts{$word}++; }
print map { "$_->[0] $_->[1]\n" }
sort { $b->[1] <=> $a->[1] }
map { [$_, $counts{$_}] } keys %counts;
}
main;

Posted by Gene on May 7th, 2008


On May 6, 10:19*pm, Gene <gene.ress...@gmail.com> wrote:
I'm sorry for the dual posts with similar content. The first one
seemed to go to the bit bucket.
Gene

Posted by arnuld on May 7th, 2008




I really don't understand the difference between a Hash-Table and a Binary
Tree because I never came across them. I want to understand and implement
them but when I tried to read it using "Introduction to Algorithms -
Corment et al. 2/e", I see he is using some weired words like "satellite
data"

I don't get it. In order to print the words in decreasing order of their
occurrences and doing it efficiently in C requires me to understand
satellite data and a thing called Big-O ?




only 9 lines of code, pretty impressive




--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page


Posted by Robert Maas, http://tinyurl.com/uh3t on May 7th, 2008


Here's how I do it:
- Load all the words into a linked list, never mind duplicates at this point.
- Sort the list per alphabetical order, thereby putting duplicate
words into blocks of adjacent positions within the list.
- Traverse the list once, merging adjacent duplicates into single entries.
- Sort the list per descending count, so that it'll be ready to output.
Of course merge-sort is the sorting algorithm used.

I use Lisp, where merge-sort is built-in.

In C you need to find a library for building linked lists and for
doing merge-sort on such lists. Of course it's trivial to build the
linked lists, but it does no good if the way you build them isn't
compatible with the merge-sort algorithm you use, so you might as
well get the merge-sort package first and the either use a
list-builder within it or hand-code your own list-builder to
produce what the merge-sort package expects as input.
Or you can write merge-sort from scratch. It's not really difficult.
The trick is to a pre-pass where you convert a list of N records
into a list of N single-element lists of records. I.e. convert the
list ("hello" "this" "is" "a" "test") into the list-of-lists
(("hello") ("this") ("is") ("a") ("test")) which the main
merge-sort algorithm will interpret as N runs of length 1 which all
need to be merged into a single run. Since all initial runs are of
the same length, there's no need to compute Huffman-optimum merges,
just put all the runs into a revolving FIFO queue and repeat
pop-pop-merge-push until the queue is of length exactly 1.

How does it know whether it's seen a word before or not? If it's
seen before, how does it know where to find it in the list? It seems to
me it takes order(N) per word to search the list, hence order(N**2) total.
Unless you have a very small amount of data, that is too slow.
Merge-sort is order(n*log(n)) total, much better for large datasets.
I'm not talking about premature micro-optimization. I'm talking
about using a decent algorithm that gives decent performance,
orders of magnitude faster than what you propose.

That's not possible. You need to have a reasonable trade-off.
What's more important? Cramming into 2 kilobytes RAM but it takes
two hours to run, or using 2 megabytes RAM and it runs in a few seconds?
You have more than 2 megabytes on your machine, right?
What's a couple megabytes needed to run merge-sort?

Now if you have a really huge dataset, such that the linked list in
RAM is larger than all the RAM you have, so that it thrashes
virtual memory, then you need to modify the above algorithm
slightly to avoid thrashing. Let me know if that is true of your
situation and I'll give you an idea what modification you'll need.
But if the total time thrashing is less than 5 minutes, don't
bother changing the algorithm, because it'll take longer than 5
minutes to modify the code to avoid thrashing.

Posted by [Jongware] on May 7th, 2008


arnuld wrote:
Ah, no, because that requirement wasn't there.
I guess using the binary tree for storing and counting is still the best
way to go (never mind the 9-line perl solutions ...).
That's because your *primary* requirement is reading and counting words
/fast/. You need the huge speed advantage of an alphabetic sorting
binary tree for this -- you tack on the count number for, well, I think
for /free/! (That's because for every word /w/ you already have to walk
down the tree to see if it's there. If it is, increase w->count; if it
isn't, insert it in the tree and initialize w->count to 1.)

So that's the first half. The second part, sorting on the count, has
nothing in common with this binary tree ... so ditch it! If you keep a
running count of how many words you have in your tree, you can create an
empty array and move your data into this, using a recursive function.
Then re-sort, using the count as primary sort key.

[Jw]

Posted by Ben Bacarisse on May 7th, 2008


"[Jongware]" <sorry@no.spam.net> writes:

Another way is to re-use the binary tree code. After reading and
counting the words, make a new tree (sharing the strings of course)
ordered by word count rather than by word. It is interesting to see
how much code you can share between the two parts of the program. I
have just done this as an exercise and I am happy to post the code if
this is not someones coursework.

--
Ben.

Posted by [Jongware] on May 7th, 2008


Ben Bacarisse wrote:
Ahh yes -- you might even be able to create /pointers/ to the relevant
functions, and use these twice in your main loop. Maybe not optimal
(then again, maybe it is), but still a nice exercise.

<g> It's the mental state -- doing stuff by way of flexing your brain,
rather than because it's homework -- that confuses the homework doers.
You might receive a number of mails "can i c ur codes plz".

[Jw]

Posted by Ben Pfaff on May 7th, 2008


arnuld <sunrise@see.sigs.invalid> writes:

About the one thing in common between hash tables and binary
trees is that they both can be used for data storage and
retrieval. I'd suggest typing those terms into a friendly search
engine and trying to understand them again.
--
Ben Pfaff
http://benpfaff.org

Posted by arnuld on May 7th, 2008





I have written the a program that is *nearly* like it before I even posted
my problem but that program stores and prints the words alphabetically and
it uses a binary tree for that.

But now I have to sort and print the input by the number of occurrences in
the input which is not known in advance ( unlike alphabetical sorting
where you already know p < s )


any ideas that you can give me from your code ?




--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page


Posted by arnuld on May 8th, 2008




My employer said the implementation must be efficient enough.

what exactly efficient means ?


1.) the program is fast but consumes lots of memory.
2.) the program consumes less memory but is lower to run.
3.) the program consumes less memory and runs faster too.


??


BTW, your idea of creating a new B-Tree out of the existing B-Tree is too
hard to implement for me. and will it be efficient than Hash-Table or AVL
Tree or Red-Black tree ? ( all of which I need to learn today)


/* creates a new sorted tree from the binary tree given as input.
* The new tree is based on the sorting according to the count
* of each word
*
*/
struct tnode* sort_tree( struct tnode *p, struct tnode *z )
{
sort_tree2( p->left, z );
sort_tree2( p->right, z);

return z;
}


void sort_tree2( struct tnode *p, struct tnode *z )
{
int count_check;
char *w;

w = copy_str( p->word );
count_check = p->count;


if( !p )
{
;
}
else if( p->count == count_check )
{
z = malloc( sizeof(struct tnode) );
z->word = copy_str( p->word );
z->left = z->right = NULL;
}
else if( p->count > count_check )
{
z->right = sort_tree( p, z->right );
}
else
{
z->left = sort_tree( p, z->left );
}
}



--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page


Posted by arnuld on May 8th, 2008


I liked this idea of yours . I will try it and post her if nay problems
occur ( and they always occur



no, I have just started my 1st job as a programmer. I basically work in
C++ but this job also requires C, so I have been asked by the employer to
learn C by choosing problems to work on, on my own. So that is just one of
the challenging exercises I got from K&R2.

BTW, no university/college in India encourages students to do K&R2. Heck,
that is not even on their recommended list of books . they have only
these 2:


(1) "Let Us C" -- Yashwant Kanetkar

http://www.sapnaonline.com/MoreInfoB...cID=EBK0008879


(2) "Programming in ANSI C" -- Balaguruswamy, Tata McGraw-Hill





--
http://lispmachine.wordpress.com/
my email ID is @ the above blog.
just check the "About Myself" page


Posted by Ben Bacarisse on May 10th, 2008


arnuld <sunrise@see.sigs.invalid> writes:

There is something up with you news reader's quoting.

Exactly. You need to ask. The term is used when the speaker is
trying to avoid being specific or the context makes it clear.
E.g. "You use too much RAM, make you code more efficient" clearly
suggest that space is the main issue.

B-Trees are not the same as binary trees, nor are they the same as
binary search trees (which is what I was using).

--
Ben.

Posted by Willem on May 12th, 2008


arnuld wrote:
)>> arnuld <sunrise@see.sigs.invalid> writes:
)>> PROBLEM:
)>> I want to read words from standard input and then sort and print
)>> the words in decreasing order of appearance along with the
)>> count of how many times each word occurred.
)
)>> ....SNIP....
) I got Aho, Hopcraft and Ullman's Data Structures and Algorithms. I checked
) Sedgwick too, which was good but was much expansive. Form
) comp.programming I got these design ideas for an efficient program:
)
) 1.) Use a Hash-Table and then implement an external sorting routine for
) Hash-Table.
)
) 2.) Use an AVL tree or Red-Black tree, that way I do not need a sorting
) routine.
)
) 3.) Is uour design ( creating a new Binary tree form an existing
) Binary-tree) okay from this context of efficiency ?
)
) Though I heard about these Data-Structures, I never ever used them or
) learned them. I am learning about them now.

Here's a #4 for you: use a priority queue.

Oh, and a #5: use a Perl script:

my %cnt;
while (<>) { for (/(\w+)/g) { $cnt{$_}++ } }
for (sort { $cnt{$b} <=> $cnt{$a} } keys %cnt) { print "$_ : $cnt{$_}\n" }

Which, at the least, gives you something to test your efficiency against.
(You may need to tweak it a little to match your exact requirements)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


Similar Posts