
<head>
<title>CS140 Lecture notes -- Singly Linked Lists</title>
<link rel="stylesheet" type="text/css" href="../cs140_notes.css" />
</head>

<h1>CS140 Lecture notes -- Singly Linked Lists</h1>
<LI><a href=http://web.eecs.utk.edu/~jplank>Jim Plank</a> (with additions by
	<a href=http://web.eecs.utk.edu/~bvanderz>Brad Vander Zanden</a>)
<hr>

<h2>Singly Linked Lists</h2>

Linked lists come in two main flavors: singly linked and doubly
linked lists.  Doubly linked lists are more useful, but if you find
yourself tight on memory, you may find yourself forced to use
singly linked lists, so you should know about them.  Also, you may
need to talk to others about your or their data structures, and 
to do so, you must be fluent in the details of singly linked lists.
<p>
The details of singly linked lists are not very pretty, but they 
are slightly clever, so you should know them.  But first the 
API (API is an acronym for "application programming interface").
I have created an API and implementation that is simpler than the one
in the textbook. I feel that the API in the textbook is overly long and
too hard to remember. I also feel that the implementation in the textbook
is overly complicated and should not have used function pointers.
<p>
Look in 
<a href=sllist.h>
<b>sllist.h</b></a>.  This header file
defines the programming interface to singly linked lists.  There
are two structs: a container struct called <strong>Sllist</strong>
that holds information about the
list and a node struct called <strong>Sllist_Node</strong> that 
contains the information for a single node in the list:
<pre>
typedef struct sllist_node {
  struct sllist_node *next;  // pointer to the next element in the list
  void *val;                 // pointer to the data for this node
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;  // pointer to the first element in the list
  Sllist_Node *tail;  // pointer to the last element in the list
} Sllist;
</pre>
A few notes about the above two structs:
<p>
<ul>
<li> I declared Sllist after Sllist_Node because it referenced Sllist_Node.
     Some C compilers will complain if you try to reference a type before
     it has been declared.
<li> I could not use an anonymous struct type for Sllist_Node because I
     needed to refer to the struct before the type definition was complete.
     The typedef name <tt>Sllist_Node</tt> does not exist until the end of
     the declaration, so I had to give the struct a tag
     (<tt>struct sllist_node</tt>) and use that tag to declare the
     <tt>next</tt> field.
     This definition is an example of a "recursive" definition, since the
     definition includes a reference to itself. 
<li> You have not yet seen a <tt>void</tt> pointer. It is a generic pointer
     that can point to any kind of data. You can cast a void pointer to any
     other data pointer type, such as <tt>(Person *)</tt>.
     You will see examples of how to do this later in the notes. The
     nice thing about using a <tt>void</tt> pointer is that we do not have to
     commit in advance to what kind of data we are pointing to--we can point
     to any data we like as long as we have a pointer to it.
</ul>
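<p>
The round trip through a <tt>void</tt> pointer can be sketched as follows.
The <tt>Student</tt> struct and the <tt>student_score</tt> function are
hypothetical names made up for this illustration; they are not part of
<tt>sllist.h</tt>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
  const char *name;
  double score;
} Student;

/* Receives a generic pointer and casts it back to the type that the
   caller originally stored. The cast is only meaningful if val really
   points to a Student; the compiler cannot check this for us. */
double student_score(void *val)
{
  Student *s;

  assert(val != NULL);
  s = (Student *) val;
  return s->score;
}
```

A caller would write something like <tt>void *v = &amp;some_student;</tt>
and later pass <tt>v</tt> to <tt>student_score</tt>. The price of the
flexibility is that the programmer, not the compiler, must remember what
each <tt>void</tt> pointer really points to.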
<p>
Now, the following procedures create and
manipulate <b>Sllist</b>'s:
<UL>
<LI> <b>Sllist *new_sllist()</b>: Allocates and returns a container
     object for a new singly linked list. 
<LI> <b>void free_sllist(Sllist *l)</b>: Destroys the list, calling <b>free()</b> 
      on all allocated memory in the list.  The list does not have to 
      be empty.
<LI> <b>Sllist_Node *sll_prepend(Sllist *l, void *val)</b>: 
       Adds a new node at the beginning of the list.  
        This node's value is <b>val</b>, and a pointer to the node is returned.
<LI> <b>Sllist_Node *sll_append(Sllist *l, void *val)</b>: 
       Adds a new node to the end of the list.  
        This node's value is <b>val</b>, and a pointer to the node is returned.
<LI> <b>Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *n, void *val)</b>: 
     Adds a new node to list <b>l</b>
     right after the specified node, <strong>n</strong>.  
      This node's value is <b>val</b>, and a pointer to the node is returned.
      A pointer to the list's container object must be passed as an argument
      because the 
      insertion algorithm may have to update the list's tail pointer, if
      the new node becomes the last element in the list.
<LI> <b>Sllist_Node *sll_first(Sllist *l)</b>: Returns a pointer to the first node
     in the list.  If the list is empty, this returns <tt>NULL</tt>.
<LI> <b>Sllist_Node *sll_last(Sllist *l)</b>: Returns a pointer to the last node
     in the list.  If the list is empty, this returns <tt>NULL</tt>.
<LI> <b>Sllist_Node *sll_next(Sllist_Node *n)</b>: Returns a pointer to the next node
     in the list after <b>n</b>.  If <b>n</b> is the last node on the list,
     then <b>sll_next(n)</b> returns <tt>NULL</tt>.
<li> <strong>void *sll_val(Sllist_Node *n)</strong>: Returns a pointer to
     <tt>n</tt>'s value. You will need to cast this pointer to the appropriate
     type.
<LI> <b>int sll_empty(Sllist *l)</b>: Returns whether <b>l</b> is empty.
</UL>
<p>
Now the structure of an <b>Sllist</b> is important and it illustrates
how data structures are typically designed. <strong>Sllist</strong> is
a container object that contains administrative information about the list.
In this case it contains pointers to the first and last elements in the list.
In the textbook, the container object contains additional fields that keep
track of the size of the list and functions that should be used for destroying
and matching elements in the list. 
<p>
The <strong>Sllist_Node</strong> object
represents the actual nodes in the list. The programmer should not have
to know how Sllist_Node is implemented so we provide <em>accessor</em>
functions, such as <strong>sll_first, sll_next</strong>, and 
<strong>sll_val</strong>, that allow programmers to 
access the important information in each list node. You should always
use the accessor functions to access the values in a list, rather than
the field names (e.g., <tt>val</tt> and <tt>next</tt>), because you do not
want your code to break if the implementor of the sllist library decides
to change the field names at a later date.
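<p>
As a minimal sketch of the accessor style, the client function
<tt>sum_int_list</tt> below (a hypothetical name, not part of the library)
walks a list using only <tt>sll_first</tt>, <tt>sll_next</tt> and
<tt>sll_val</tt>. The list routines are repeated in condensed form only so
that the block compiles on its own, and the sketch assumes that every value
in the list points to an <tt>int</tt>.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;
  return tmp;
}

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

/* Hypothetical client function: adds up a list whose values all point
   to ints. It never touches the next or val fields directly. */
int sum_int_list(Sllist *l)
{
  Sllist_Node *n;
  int total = 0;

  for (n = sll_first(l); n != NULL; n = sll_next(n)) {
    total += *(int *) sll_val(n);
  }
  return total;
}
```

Because <tt>sum_int_list</tt> never names a field of <tt>Sllist_Node</tt>,
it would keep working if the fields were renamed.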
<p>
So, for a few examples.  When you call <b>list = new_sllist()</b>, you get a
pointer to a <b>Sllist</b> struct whose head and tail fields are NULL, since
the list is empty.
<pre>
list ------>|-------------|
            | head = NULL |
            | tail = NULL |
            |-------------|
</pre>
<p>
If we next call <b>tmp = sll_prepend(list, p)</b>, where <b>p</b>
is a pointer to some data value, then
a node will be added to the list. If we assume that <b>p</b> holds the
address <tt>0x100</tt>, then the list will look as follows:
<pre>        
tmp----------------------\
                         |
list ------>|---------|  +----->|-------------|
            | head -----/       | next = NULL |
            | tail ----/        | val = 0x100 |
            |---------|         |-------------|
</pre>
Now, suppose you insert a new node after <b>tmp</b> with the call
<b>tmp2 = sll_insert_after(list, tmp, p1)</b> and that <b>p1</b>
holds the address <tt>0x200</tt>.  Now, you get:
<pre>        
tmp2-------------------------------------------\
tmp----------------------\                      \
                         |                       \
list ------>|---------|  +----->|-------------|   +--->|-------------|
            | head -----/       | next = --------/     | next = NULL |
            | tail ------       | val = 0x100 |     -->| val = 0x200 |
            |---------|  \      |-------------|    /   |-------------|
                          \-----------------------/
</pre>
Finally, if you call <b>tmp = sll_prepend(list, p2)</b>, where 
<b>p2</b> holds the address <tt>0x300</tt>,
you'll get a new node at the front of the list, and <b>tmp</b> now points to it:
<pre>        
tmp2------------------------------------------------------------------\
tmp----------------------\                                             \
                         |                                              \
list ------>|---------|  +----->|-------------|   +--->|-------------|   +--->|-------------|
            | head -----/       | next = --------/     | next = --------/     | next = NULL |
            | tail ------       | val = 0x300 |        | val = 0x100 |     -->| val = 0x200 |
            |---------|  \      |-------------|        |-------------|    /   |-------------|
                          \----------------------------------------------/
</pre>
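<p>
The three calls pictured above can be replayed in code. The sketch below
repeats condensed versions of the list routines so that it compiles by
itself, and wraps the calls in a hypothetical helper named
<tt>build_diagram_list</tt>. The caller supplies the three data pointers,
so the actual addresses will not be <tt>0x100</tt>, <tt>0x200</tt> and
<tt>0x300</tt>.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = l->head;
  l->head = tmp;
  if (l->tail == NULL) l->tail = tmp;
  return tmp;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;
  if (l->tail == node) l->tail = tmp;
  return tmp;
}

/* Hypothetical helper: replays the three calls from the diagrams and
   returns the resulting list, whose order is p2, p, p1. */
Sllist *build_diagram_list(void *p, void *p1, void *p2)
{
  Sllist *list = new_sllist();
  Sllist_Node *tmp, *tmp2;

  tmp = sll_prepend(list, p);                 /* list: p         */
  tmp2 = sll_insert_after(list, tmp, p1);     /* list: p, p1     */
  tmp = sll_prepend(list, p2);                /* list: p2, p, p1 */

  /* tmp is now the head of the list; tmp2 is still the tail. */
  assert(list->head == tmp);
  assert(list->tail == tmp2);
  return list;
}
```

Note that after the last call no variable points to the node holding
<b>p</b> any more; it can only be reached by walking the list from the head.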
<p>
There is an important thing to note about this API.
That is that there are some primitives missing that we would 
probably like, such as <b>sll_insert_before()</b> to insert before a node in the
list,
and <b>sll_delete_node()</b> to delete an item from the list.
As it turns out, you really can't implement these cleanly and efficiently,
so they are best left out.   The book does implement some of these, but
their implementation is not efficient (or, in the case of deleting, 
it is not clean).  The bottom line is that if you need these things,
you would do best to use a different data structure (doubly linked
lists).
<p>

<hr>
<h3>A usage example</h3>

Look at 
<a href=gradefile><b>gradefile</b></a>.
This is a fictitious grade file for a class that has both graduates and
undergraduates.  The format of the file is:
<pre>
first-name last-name U/G score
</pre>
The U/G says whether the person is a graduate or undergraduate.
Now, suppose you want to write a program that takes a grade file on 
standard input, and prints out the average for undergraduates, and then
a listing of the undergraduates plus their grades and their 
distance from the average, and then the average for graduates and
a similar listing of students.  
<p>
This is something that can be done fairly well with singly linked lists.
We'll take each person and make a struct for that person that has
the person's name and grade, and then put that struct onto either
a list for graduate students or for undergraduates.
<p>
The code is in <a href=grader.c><b>grader.c</b></a>.
What it does is create two <b>Sllist</b>s called <b>grad</b> and
<b>ugrad</b>.  It appends students to these lists using <b>sll_append</b>. 
<p>
Each student is put into a <b>struct</b>, which is then entered into
the list as a <b>(void *)</b>.  This is a legal thing to do, since C lets
you convert a pointer to any kind of data (in this case a <b>(Person *)</b>)
to a <b>(void *)</b> and back again without losing information.
<p>
Once the students are all read in, we need to calculate the averages
and print out the students.  Since we're doing this twice, it's best
to do these things in procedures.  <b>return_avg()</b> calculates the
average of a list of students, and <b>print_list()</b> prints out the
students.  
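<p>
To give a feel for what an averaging routine looks like, here is a sketch
of one possible <tt>return_avg()</tt>. It is not copied from
<b>grader.c</b>: the layout of <tt>Person</tt> (apart from the
<tt>score</tt> field), the exact signature, and the choice to return 0 for
an empty list are all assumptions made for this example. The list routines
are repeated in condensed form so that the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;
  return tmp;
}

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

/* Assumed layout; the real Person struct in grader.c may differ. */
typedef struct {
  const char *fname;
  const char *lname;
  double score;
} Person;

/* Sketch of an averaging routine: walks the list once, casting each
   value back to a Person. Returns 0.0 for an empty list (an assumption,
   chosen here to avoid dividing by zero). */
double return_avg(Sllist *l)
{
  Sllist_Node *n;
  double total = 0.0;
  int count = 0;

  for (n = sll_first(l); n != NULL; n = sll_next(n)) {
    Person *p = (Person *) sll_val(n);
    total += p->score;
    count++;
  }
  return (count == 0) ? 0.0 : total / count;
}
```

The cast on the result of <tt>sll_val</tt> is where the <b>(void *)</b>
stored in the list turns back into a <b>(Person *)</b>.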
<p>
Try out the code and see that it works.  Note that the only reason
that the output is sorted is that the input file is also sorted by grade.
<b>Grader</b> does nothing to actually sort beyond separating the students
into graduates and undergraduates.
<pre>
UNIX> <b>head gradefile</b>
Betty Flintstone U 99.43
Pat Anderson U 98.56
Pat Fulmer U 96.77
Pat Ward G 96.01
Barney Fulmer U 94.80
Phil Rubble G 93.64
Wilma Rubble G 93.05
Bill Flintstone U 92.85
Dino Fulmer U 92.00
Fred Ward G 90.62
UNIX> <b>grader < gradefile</b>
Undergraduates: Average = 74.19
  Betty      Flintstone   99.43   25.24
  Pat        Anderson     98.56   24.37
  Pat        Fulmer       96.77   22.58
  Barney     Fulmer       94.80   20.61
  Bill       Flintstone   92.85   18.66
  Dino       Fulmer       92.00   17.81
  ...

Graduates: Average = 75.53
  Pat        Ward         96.01   20.48
  Phil       Rubble       93.64   18.11
  Wilma      Rubble       93.05   17.52
  Fred       Ward         90.62   15.09
  ...

UNIX>
</pre>
<p>
Let's now place
an additional requirement on the undergraduate and graduate lists that will
require the flexibility of a list. Specifically let's say that the lists
should be printed in ascending order by score. To simplify matters, we will
say that if two people have the same
score then it does not matter in which order they are printed out. 
<p>
The easiest way to satisfy this requirement is to keep the lists in 
ascending order by score. This means that each time we add a person to the
list we will need to insert that person in its proper place in ascending score
order. This in turn means that we will need to insert into the middle of
the list. In order to insert a person we will traverse the list until we
find a node whose score is greater than the score of the person we
are inserting. We will then insert the new person before this node (call it
the <tt>greater</tt> node). 
<p>
When we scan the list of operations provided by sllist.h we find that there
is an <tt>insert_after</tt> operation but no <tt>insert_before</tt> operation.
Hence, we will need to have a pointer to the node immediately preceding the
greater node. In order to have this pointer available we will need to save
a pointer to both the previous node in the list and the current node. Here is
the code that accomplishes this task:
<p>
<pre>
void insert_person(Person *p, Sllist *student_list) {
  Sllist_Node *prev_node = NULL;
  Sllist_Node *current_node;
  Person *current_person;

  if (sll_empty(student_list)) {
    sll_append(student_list, p);
    return;
  }
  for (current_node = sll_first(student_list);
       current_node != NULL;
       current_node = sll_next(current_node)) {
    current_person = (Person *)sll_val(current_node);
    if (p->score < current_person->score)
      break;
    else
      prev_node = current_node;
  }
  if (prev_node == NULL) 
    sll_prepend(student_list, p);
  else
    sll_insert_after(student_list, prev_node, p);
}
</pre>
<p>
Notice that we start by initializing <tt>prev_node</tt> to <tt>NULL</tt>.
If the new person should be first in the list, then
the score comparison operation will return true when the
new person is compared with the first node in the list. <tt>prev_node</tt>
will be NULL, so in this case we will call <strong>sll_prepend</strong> to
prepend the new person to the front of the list.
<p>
We also start by checking whether the list is empty. If so, we use <strong>sll_append</strong>
to make the person be the first node in the list (we could just as easily have
used <strong>sll_prepend</strong>).
<p>
In the most common case the new person will go somewhere in the middle of
the list. In this case the <tt>current_node</tt> pointer will be advanced
several times to point at the next node in the list. Before the pointer
is advanced, <tt>prev_node</tt> is set to the current node so that we
always have a pointer to the previous node. When the code finally finds a
node whose score is greater than the score of the new person, it will
be able to use the previous node pointer to perform the 
<tt>sll_insert_after</tt> operation. 
<p>
So what could go wrong with this code? Suppose that the new person's score
is greater than the score of every node in the list. For example
suppose we want to insert 100 into a list whose last score is 97.
<p>
The traversal of the list will come to an end after <tt>current_node</tt>
visits the node with a score of 97
 and the loop will exit without the score comparison ever succeeding.
At this point <tt>prev_node</tt> will point to 
the node whose score is 97. 
That is exactly what we want, since the person with a score of 100 should
be inserted after the person with a score of 97.
 In other words our code works even when the
person should be inserted at the end of the list.
<p>
The code for our sort program can be found in
<a href=sorter.c>sorter.c</a>. Here is the result of running it on
<tt>gradefile</tt>:
<pre>
UNIX> <b>sorter < gradefile</b>
Undergraduates: Average = 74.19
  Betty      Anderson     51.65  -22.54
  Dino       Anderson     52.05  -22.14
  Pat        Flintstone   52.85  -21.34
  Fred       Fulmer       53.70  -20.49
  Barney     Flintstone   54.12  -20.07
  Fred       Flintstone   54.57  -19.62
  Dino       Flintstone   58.92  -15.27
  John       Anderson     62.49  -11.70
  ...
  Barney     Fulmer       94.80   20.61
  Pat        Fulmer       96.77   22.58
  Pat        Anderson     98.56   24.37
  Betty      Flintstone   99.43   25.24

Graduates: Average = 75.53
  Pat        Rubble       51.04  -24.49
  John       Ward         56.09  -19.44
  Phil       Summitt      58.68  -16.85
  John       Summitt      60.22  -15.31
  Fred       Rubble       61.32  -14.21
  ...
  Fred       Ward         90.62   15.09
  Wilma      Rubble       93.05   17.52
  Phil       Rubble       93.64   18.11
  Pat        Ward         96.01   20.48
</pre>
<hr>
<h3>The implementation</h3>

The implementation is in <a href=sllist.c><b>sllist.c</b></a>.
I'll go over each subroutine.  The procedures could be a bit
simpler to read and understand if we did not have to worry about
maintaining a head and tail pointer. When we discuss doubly
linked lists, we will use a technique
involving something called a <em>sentinel node</em> to
eliminate the need for this extra code.
<p>
For the current implementation, <b>new_sllist()</b> merely creates and returns an empty list:
<pre>
list ------>|-------------|
            | head = NULL |
            | tail = NULL |
            |-------------|
</pre>
Here's the code:
<pre>
Sllist *new_sllist()
{
  Sllist *l;
  
  l = (Sllist *) malloc(sizeof(Sllist));
  l->head = NULL;
  l->tail = NULL;
  return l;
}
</pre>
The above diagram should suggest a way to check whether a list is empty--it 
is empty only if <b>head</b> is NULL.
Therefore, <b>sll_empty()</b> is one line:
<pre>
int sll_empty(Sllist *l)
{
  return (l->head == NULL);
}
</pre>
<hr>
<h3> List Insertion Routines </h3>
<p>
To insert a node after another node <tt>n</tt>, the code is the same, regardless of
whether the node is the first node, a middle node, or
the last node.  You simply create a new node, have the new node's <b>next</b>
point to <tt>n's</tt> <b>next</b>, and have <tt>n's</tt> <b>next</b>
point to the new node.  Note, it must be done in that order, or the
rest of the list after the node will get lost! We must also check whether or
not <tt>n</tt> used to be the last node in the list. If so, then the new node
becomes the last node in the list and we must update the list's <tt>tail</tt> pointer:
<pre>
Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp;

  tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;

  /* if node was the previous tail of the list, then the new element becomes
     the new tail of the list */
  if (l->tail == node)
    l->tail = tmp;
  return tmp;
}
</pre>
<p>
Note that <b>sll_insert_after</b> does not give us a way to add a node to the beginning of
the list. We could do something kludgy like the textbook and say that if <tt>NULL</tt> is passed
as the node to <b>sll_insert_after</b>, then the value will be prepended to the front of the list. In
practice it is better to define another function that prepends to the front of the list, because
people often think in terms of prepending or appending to a list and so would like an operation
that does exactly that. Additionally, prepending to the list is going to require some special
manipulation of the list's <tt>head</tt> pointer, so it is better to isolate this special case
in a separate function. When we prepend an item to the list, we must make the list's <tt>head</tt>
pointer point to the new node and make the new node point to the node to which the list's <tt>head</tt>
used to point, since this node will now be the second node in the list. We must also be prepared
for the special case in which the list was previously 
empty. In this case, the list's <tt>tail</tt> pointer
must also be updated to point to the new node, since the new node will be both the first and last
node in the list:
<pre>
Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp;

  tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = l->head;

  // make the new element be the head of the list
  l->head = tmp;

  // if the list was empty, make the tail point to the new element
  if (l->tail == NULL)
    l->tail = tmp;
  return tmp;
}
</pre>
Unlike <b>sll_prepend</b>, <b>sll_append</b> is not strictly necessary because we could
use <b>sll_last</b> to retrieve the last node in the list and then call <tt>sll_insert_after</tt>
to insert the new value after this node. However, that is a bit awkward, especially for an
operation like append that will be frequently performed. <b>sll_append</b> has an implementation
that is very similar to <b>sll_prepend</b>, except that we update the list's <tt>tail</tt> pointer
to point to the new node and make the node that used to be the tail point to the new node. We must
also handle the special case where the list was previously empty. In this case the new node will
be both the first and last object in the list, and hence we must make the list's <tt>head</tt>
pointer point to it as well:
<pre>
Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp;

  tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = NULL;
  
  // make the previous tail point to this new element and then update 
  // the list's tail to point to the new element
  if (l->tail != NULL) 
    l->tail->next = tmp;
  l->tail = tmp;
  
  // if the list was empty, make the head point to the new element
  if (l->head == NULL)
    l->head = tmp;
  return tmp;
}
</pre>
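<p>
The alternative mentioned above, appending by way of <b>sll_last</b> and
<b>sll_insert_after</b>, can be sketched as follows. The function name
<tt>append_via_insert_after</tt> is hypothetical and the list routines are
repeated in condensed form so that the block compiles by itself. The sketch
also shows one reason the approach is awkward: on an empty list
<b>sll_last</b> returns <tt>NULL</tt>, which <b>sll_insert_after</b> cannot
accept, so the empty case has to be handled separately.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = l->head;
  l->head = tmp;
  if (l->tail == NULL) l->tail = tmp;
  return tmp;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;
  if (l->tail == node) l->tail = tmp;
  return tmp;
}

Sllist_Node *sll_last(Sllist *l) { return l->tail; }
int sll_empty(Sllist *l) { return (l->head == NULL); }

/* Hypothetical helper: appends without a dedicated append routine. */
Sllist_Node *append_via_insert_after(Sllist *l, void *val)
{
  /* There is no last node to insert after when the list is empty. */
  if (sll_empty(l)) return sll_prepend(l, val);
  return sll_insert_after(l, sll_last(l), val);
}
```

Having <b>sll_append</b> in the library spares every caller from
remembering that empty-list check.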
<hr>
<h3> Accessor Routines </h3>
<p>
The accessor functions can all be implemented with simple one line functions.
<p>
The first node on the list is the one pointed to by the list's <tt>head</tt> pointer:
<pre>
Sllist_Node *sll_first(Sllist *l)
{
  return l->head;
}
</pre>
Similarly, the last node on the list is the one pointed to by the list's <tt>tail</tt> pointer:
<pre>
Sllist_Node *sll_last(Sllist *l)
{
  return l->tail;
}
</pre>
The next node following the current node is the one pointed to by its 
<b>next</b> field:
<pre>
Sllist_Node *sll_next(Sllist_Node *n)
{
  return n->next;
}
</pre>
Notice that <b>sll_next</b> takes a node as a parameter while <b>sll_first</b> and <b>sll_last</b>
take a list as a parameter. That is because we can determine the first and last nodes by consulting
the list's <tt>head</tt> and <tt>tail</tt> pointers, while we can only determine the next node
in the list by consulting the current node.
<p>
If we want the value of the current node, then we return the <tt>val</tt> field:
<pre>
void *sll_val(Sllist_Node *n) 
{
  return n->val;
}
</pre>
<hr>
<h3> Freeing the List </h3>
<p>
When we are done using a list, we should free it. The textbook allows the programmer to
provide a <tt>destroy</tt> function that will be called on each element of the list when
the list is destroyed. This is a bit too complicated for us at this point. We will simply
iterate through the nodes of the list and free the memory associated with each node. Then
we will free the memory associated with the list's container object. The user will need to
separately find a way to free the memory associated with the objects pointed to by the <tt>val</tt>
field of each node:
<pre>
void free_sllist(Sllist *l)
{
  Sllist_Node *current_node, *next_node;
 
  for (current_node = l->head;
       current_node != NULL;
       current_node = next_node) {
    next_node = current_node->next;
    free(current_node);
  }
  free(l);
}
</pre>
The code does the following.  It iterates through the list and as it reaches
each node, it saves a pointer to the next node in the list. Then it frees the
memory associated with the current node. If we did not save a pointer to the
next node before freeing the current node, then the pointer to the next node would
be lost when we freed the current node and we could not continue to iterate through
the list.
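<p>
Since <b>free_sllist</b> leaves the values alone, the caller has to release
them first. One possible way is sketched below. The helper
<tt>free_sllist_vals</tt> is a hypothetical name, not part of
<b>sllist.h</b>, and it assumes that every value was obtained from
<b>malloc()</b> and that no value is stored in more than one node. The list
routines are repeated in condensed form so that the block compiles alone.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;
  return tmp;
}

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

void free_sllist(Sllist *l)
{
  Sllist_Node *current_node, *next_node;

  for (current_node = l->head; current_node != NULL; current_node = next_node) {
    next_node = current_node->next;
    free(current_node);
  }
  free(l);
}

/* Hypothetical client-side helper: frees the data that each node points
   to and returns how many nodes it visited. The nodes themselves are
   left in place, so free_sllist() must still be called afterwards, and
   the values must not be used again in between. */
int free_sllist_vals(Sllist *l)
{
  Sllist_Node *n;
  int count = 0;

  for (n = sll_first(l); n != NULL; n = sll_next(n)) {
    free(sll_val(n));   /* free(NULL) is harmless, so NULL values are fine */
    count++;
  }
  return count;
}
```

A typical call sequence would be <tt>free_sllist_vals(l)</tt> followed
immediately by <tt>free_sllist(l)</tt>.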
<hr>
<h3>Why are deletion and inserting before another node bad?</h3>
<p>
Deletion and inserting before another node are both bad because both
operations require that you have a pointer to the previous node on the list.  
Why? Because deletion requires that you make the previous node in the list
point to the node that our deleted node used to point to. And inserting
before a node requires that we have a pointer to the previous node. Unfortunately,
we only have pointers to the next node in a singly linked list. If we
want to obtain a pointer to the previous node, we have to start from the
front of the list, and traverse each element until we reach the node we
wish to delete or insert before. Along the way we must take care to always
save a pointer to the previous node. This traversal and remembering process
is cumbersome. The textbook actually pushes the burden for this traversal
onto the programmer by having its deletion procedure take a pointer to
the node preceding the node you wish to delete. In other words, you must
implement the code that finds the previous node. Once you have done so, you
can call the book's deletion procedure.
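<p>
To make the cost concrete, here is a sketch of what a deletion routine
would have to do if we added one. The function <tt>sll_delete_node</tt> is
hypothetical: it is not in <b>sllist.h</b> and it is not the textbook's
code. The list routines are repeated in condensed form so that the block
compiles on its own. Notice that the routine must walk from the head of the
list to find the previous node, so deleting one node takes time
proportional to the length of the list.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Condensed stand-ins for the library routines. */
Sllist *new_sllist(void)
{
  Sllist *l = (Sllist *) malloc(sizeof(Sllist));
  assert(l != NULL);
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
  assert(tmp != NULL);
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;
  return tmp;
}

/* Hypothetical routine: unlinks and frees target if it is in the list.
   Returns 1 if the node was removed and 0 if it was not found. The
   node's value is not freed. */
int sll_delete_node(Sllist *l, Sllist_Node *target)
{
  Sllist_Node *prev = NULL;
  Sllist_Node *cur;

  /* Walk from the head, remembering the node before cur. */
  for (cur = l->head; cur != NULL && cur != target; cur = cur->next) {
    prev = cur;
  }
  if (cur == NULL) return 0;

  if (prev == NULL) {
    l->head = cur->next;        /* deleting the first node */
  } else {
    prev->next = cur->next;     /* bypass the deleted node */
  }
  if (l->tail == cur) l->tail = prev;   /* deleting the last node */
  free(cur);
  return 1;
}
```

Three special cases (first node, last node, only node) and a full traversal
for a single deletion is a lot of work compared with the insertion routines.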
<p>
The bottom line is this.  If you want to do deletions or be able to insert
values before nodes in a list, then you should use a different data structure:
a doubly linked list. As a bonus, you will find that it is much easier to
implement the append operation, at least when you use the sentinel node technique
that we discuss in the dllist notes.