CS140 Lecture notes -- Singly Linked Lists

  • Jim Plank (with additions by Brad Vander Zanden)

    Singly Linked Lists

    Linked lists come in two main flavors: singly linked and doubly linked. Doubly linked lists are more useful, but if you are tight on memory you may be forced to use singly linked lists, so you should know about them. Also, you may need to talk to others about your data structures or theirs, and to do so you must be fluent in the details of singly linked lists.

    The details of singly linked lists are not very pretty, but they are slightly clever, so you should know them. But first the API.

    Look in /home/cs140/spring-2004/include/sllist.h. This header file defines the programming interface to singly linked lists. There is a single struct that defines a data type called a Sllist:

    typedef struct sllist { 
      struct sllist *link;
      Jval val;
    } *Sllist;
    
    You've seen this before. The procedures that create and manipulate Sllists are new_sllist(), free_sllist(), sll_prepend(), sll_insert_after(), sll_first(), sll_next(), sll_nil() and sll_empty(); each one is covered in the implementation section below. First, though, the structure of an Sllist is important. If an Sllist contains n items, then it has n+1 nodes, each node being a struct sllist. The first node is called a sentinel or dummy node; you can also think of it as a header node. Its val field is not used, and you should not use it. The link field of the sentinel node points to the first node on the list. The link field of each node on the list points to the next node, and the link field of the last node on the list points back to the sentinel node.

    This structure is very common in linked list code for two reasons. First, the sentinel node gives empty lists a clean implementation: an empty list is simply the sentinel node whose link field points to itself. Second, the sentinel removes the need for special code to deal with the ends of the list. You'll see how when you see the code.
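    To make the n+1-node, circular structure concrete, here is a minimal sketch that counts the items on a list. It assumes a stripped-down stand-in for the course's Jval union (only the int and void * fields), and sll_length() is a hypothetical helper, not part of sllist.h. The loop needs no NULL check and no special case for the empty list: it simply stops when it comes back around to the sentinel.

```c
/* Assumption: a minimal stand-in for the course's Jval union,
   keeping only the int and void * fields these notes use. */
typedef union { int i; void *v; } Jval;

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

/* Hypothetical helper, not part of sllist.h: count the items on a list
   by starting at the first node and following link fields until we
   arrive back at the sentinel. The sentinel itself is not counted. */
int sll_length(Sllist l)
{
  int n = 0;
  Sllist tmp;

  for (tmp = l->link; tmp != l; tmp = tmp->link) n++;
  return n;
}
```

    For an empty list the sentinel's link points to itself, so the loop body never runs and the count is 0.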

    Now for a few examples. When you call list = new_sllist(), you get a pointer to a struct sllist whose link field points to itself:

    list ----+->|---------|
             |  | link ------\
             |  | val = ? |  |
             |  |---------|  |
             |               |
             \---------------/
    
    Note, the val field is uninitialized, and should not be used.

    If we next call tmp = sll_prepend(list, new_jval_i(3)), then a node is added to the list, which now looks as follows:

            
    tmp----------------------\
                             |
    list ----+->|---------|  +----->|---------|
             |  | link ------/      | link --------\
             |  | val = ? |         | val.i=3 |    |
             |  |---------|         |---------|    |
             |                                     |
             \-------------------------------------/
    
    Now, suppose you insert a new node after tmp with the call tmp2 = sll_insert_after(tmp, new_jval_i(9)). You get:
            
    tmp2-----------------------------------------\
    tmp----------------------\                   |
                             |                   |
    list ----+->|---------|  +----->|---------|  +--->|---------|
             |  | link ------/      | link ------/    | link --------\
             |  | val = ? |         | val.i=3 |       | val.i=9 |    |
             |  |---------|         |---------|       |---------|    |
             |                                                       |
             \-------------------------------------------------------/
    
    Finally, if you call tmp = sll_prepend(list, new_jval_i(1)), you'll get a new node at the front of the list:
    tmp2-------------------------------------------------------\
    tmp----------------------\                                 |
                             |                                 |
    list ----+->|---------|  +--->|---------|  /->|---------|  +->|---------|
             |  | link ------/    | link ------/  | link ------/  | link --------\
             |  | val = ? |       | val.i=1 |     | val.i=3 |     | val.i=9 |    |
             |  |---------|       |---------|     |---------|     |---------|    |
             |                                                                   |
             \-------------------------------------------------------------------/
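    The three calls traced in these diagrams can be replayed in code. This is a sketch under assumptions: Jval is replaced by a minimal union with just the i and v fields, new_jval_i() is a stand-in for the course's function of the same name, build_example() is a hypothetical driver, and the list procedures mirror the ones discussed in the implementation section later in these notes. After the calls, the list should read 1, 3, 9, with tmp pointing at the 1 node and tmp2 at the 9 node, whose link leads back to the sentinel.

```c
#include <stdlib.h>

/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

Jval new_jval_i(int i) { Jval j; j.i = i; return j; }

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

/* Empty list: a sentinel whose link points to itself (no NULL check). */
Sllist new_sllist(void)
{
  Sllist l = malloc(sizeof(struct sllist));
  l->link = l;
  return l;
}

/* New node goes right after l; works for any l, including the sentinel. */
Sllist sll_insert_after(Sllist l, Jval val)
{
  Sllist tmp = malloc(sizeof(struct sllist));
  tmp->val = val;
  tmp->link = l->link;
  l->link = tmp;
  return tmp;
}

/* Prepending is just inserting after the sentinel. */
Sllist sll_prepend(Sllist l, Jval val)
{
  return sll_insert_after(l, val);
}

/* Hypothetical driver: replay the three calls from the diagrams.
   Returns the list; *tmp_out and *tmp2_out get the 1 and 9 nodes. */
Sllist build_example(Sllist *tmp_out, Sllist *tmp2_out)
{
  Sllist list = new_sllist();
  Sllist tmp, tmp2;

  tmp = sll_prepend(list, new_jval_i(3));       /* list: 3     */
  tmp2 = sll_insert_after(tmp, new_jval_i(9));  /* list: 3 9   */
  tmp = sll_prepend(list, new_jval_i(1));       /* list: 1 3 9 */
  *tmp_out = tmp;
  *tmp2_out = tmp2;
  return list;
}
```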
    

    There is an important thing to note about this API (API is an acronym for ``application programming interface.'' This is a buzzword that is bandied about quite a bit these days, so I thought I'd join in with the trend): some primitives that we would probably like are missing, such as sll_last() to return the last node in the list, sll_append() to put an item at the end of the list, and sll_delete_node() to delete an item from the list. As it turns out, you really can't implement these both cleanly and efficiently, so they are best left out. The book does implement some of these, but its implementations are not efficient (or, in the case of deletion, not clean). The bottom line is that if you need these operations, you would do best to use a different data structure: doubly linked lists.


    A usage example

    Look at gradefile. This is a fictitious grade file for a class that has both graduates and undergraduates. The format of the file is:
    first-name last-name U/G score
    
    The U/G says whether the person is a graduate or undergraduate. Now, suppose you want to write a program that takes a grade file on standard input and prints out the average for graduates, then a listing of the graduate students with their grades and each grade's difference from the average, and then the average for undergraduates and a similar listing of those students.

    This is something that can be done fairly well with singly linked lists. We'll make a struct for each person that holds the person's name and grade, and then put that struct onto either the list for graduate students or the list for undergraduates.

    The code is in grader.c. What it does is create two Sllists called grad and ugrad. It appends students to these lists. To append to the list, you must maintain a pointer to the last node on the list. This is gtmp for the graduate student list, and ugtmp for the undergraduates.

    Each student is put into a struct, which is then entered into the list as a (void *). This is legal because C guarantees that any pointer to an object (here a (Person *)) can be converted to a (void *) and back again without losing information.
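    Here is a minimal sketch of appending with a tail pointer combined with storing a struct pointer as a (void *). The Person fields, the append_person() helper and the Jval stand-in are all assumptions for illustration; grader.c's actual code may be organized differently. The key line is *tail = sll_insert_after(*tail, ...): because the tail starts at the sentinel, appending to an empty list is no different from appending to a non-empty one.

```c
#include <stdlib.h>

/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

Jval new_jval_v(void *v) { Jval j; j.v = v; return j; }

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

Sllist new_sllist(void)
{
  Sllist l = malloc(sizeof(struct sllist));
  l->link = l;                 /* empty: sentinel points to itself */
  return l;
}

Sllist sll_insert_after(Sllist l, Jval val)
{
  Sllist tmp = malloc(sizeof(struct sllist));
  tmp->val = val;
  tmp->link = l->link;
  l->link = tmp;
  return tmp;
}

/* Hypothetical record; grader.c's actual struct may differ. */
typedef struct {
  const char *fname;
  const char *lname;
  double score;
} Person;

/* Hypothetical helper: append p after the node *tail and advance *tail
   to the new node. The Person * travels through the list as a void *. */
void append_person(Sllist *tail, Person *p)
{
  *tail = sll_insert_after(*tail, new_jval_v((void *) p));
}
```

    A reader loop would then call append_person(&gtmp, p) for a G line and append_person(&ugtmp, p) for a U line, after initializing gtmp = grad and ugtmp = ugrad. On the way out, (Person *) node->val.v recovers the original pointer.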

    Once the students are all read in, we need to calculate the averages and print out the students. Since we're doing this twice, it's best to do these things in procedures. return_avg() calculates the average of a list of students, and print_list() prints out the students.

    Note that these make use of the macro sll_traverse():

    #define sll_traverse(tmpnode, list) for (tmpnode = sll_first(list); \
                                             tmpnode != sll_nil(list); \
                                             tmpnode = sll_next(tmpnode))
    
    What this does is substitute the for loop wherever it sees sll_traverse(). This may be a little confusing, but what it means is that the C preprocessor turns:
      sll_traverse(tmp, d) {
    
    into
      for (tmp = sll_first(d); tmp != sll_nil(d); tmp = sll_next(tmp)) {
    
    It's a nice way of traversing the list and having the code say what you're doing.
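    As an illustration of sll_traverse() in use, here is one plausible way to write return_avg(). It is a sketch, not the code in grader.c: the Person struct and the Jval stand-in are assumptions, sll_first(), sll_next() and sll_nil() are written the way the implementation section below describes them, and returning 0 for an empty list (instead of dividing by zero) is an assumed convention.

```c
/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

Sllist sll_first(Sllist l) { return l->link; }
Sllist sll_next(Sllist l)  { return l->link; }
Sllist sll_nil(Sllist l)   { return l; }

#define sll_traverse(tmpnode, list) for (tmpnode = sll_first(list); \
                                         tmpnode != sll_nil(list); \
                                         tmpnode = sll_next(tmpnode))

/* Hypothetical record; grader.c's actual struct may differ. */
typedef struct {
  const char *lname;
  double score;
} Person;

/* Sketch of return_avg(): average the scores of the Persons on list.
   Returns 0.0 for an empty list (assumed convention). */
double return_avg(Sllist list)
{
  Sllist tmp;
  Person *p;
  double sum = 0.0;
  int n = 0;

  sll_traverse(tmp, list) {
    p = (Person *) tmp->val.v;   /* undo the (void *) stored in val */
    sum += p->score;
    n++;
  }
  return (n == 0) ? 0.0 : sum / n;
}
```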

    Anyway, try out the code and see that it works. Note that the only reason that the output is sorted is that the input file is also sorted by grade. Grader does nothing to actually sort beyond separating the students into graduates and undergraduates.

    UNIX> head gradefile
    Betty Flintstone U 99.43
    Pat Anderson U 98.56
    Pat Fulmer U 96.77
    Pat Ward G 96.01
    Barney Fulmer U 94.80
    Phil Rubble G 93.64
    Wilma Rubble G 93.05
    Bill Flintstone U 92.85
    Dino Fulmer U 92.00
    Fred Ward G 90.62
    UNIX> grader < gradefile
    Undergraduates: Average = 74.19
      Betty      Flintstone   99.43   25.24
      Pat        Anderson     98.56   24.37
      Pat        Fulmer       96.77   22.58
      Barney     Fulmer       94.80   20.61
      Bill       Flintstone   92.85   18.66
      Dino       Fulmer       92.00   17.81
      ...
    
    Graduates: Average = 75.53
      Pat        Ward         96.01   20.48
      Phil       Rubble       93.64   18.11
      Wilma      Rubble       93.05   17.52
      Fred       Ward         90.62   15.09
      ...
    
    UNIX>
    

    Let's now place an additional requirement on the undergraduate and graduate lists that will require the flexibility of a list. Specifically, let's say that the lists should be printed alphabetically by last name. To simplify matters, we will say that if two people have the same last name, then it does not matter in which order they are printed.

    The easiest way to satisfy this requirement is to keep the lists in alphabetical order. This means that each time we add a person to a list, we must insert that person in the proper alphabetical position, which in turn means inserting into the middle of the list. To insert a person, we will traverse the list until we find a node whose last name is alphabetically greater than the new person's last name. We will then insert the new person before this node (call it the greater node).

    When we scan the list of operations provided by sllist.h we find that there is an insert_after operation but no insert_before operation. Hence, we will need to have a pointer to the node immediately preceding the greater node. In order to have this pointer available we will need to save a pointer to both the previous node in the list and the current node. Here is the code that accomplishes this task:

    void insert_person(Person *p, Sllist student_list) {
      Sllist prev_node = student_list;
      Sllist current_node;
      Person *current_person;
     
      sll_traverse(current_node, student_list) {
        current_person = (Person *)current_node->val.v;
        if (strcmp(p->lname, current_person->lname) < 0)
          break;
        else
          prev_node = current_node;
      }
      sll_insert_after(prev_node, new_jval_v((void *)p));
    }
    

    Notice that we start by setting prev_node to student_list. student_list points to the list's sentinel node. Making prev_node point to the sentinel node ensures that we will insert the new person in the proper place, even if the new person should be the first person in the list. If the new person should be first in the list, then the strcmp operation will return a result that is less than 0 when the new person is compared with the first node in the list. prev_node will point to the sentinel node in this case so the sll_insert_after command will place the new person after the sentinel node and hence make the new person the first node in the list.

    In the most common case the new person will go somewhere in the middle of the list. In this case the loop advances current_node several times, each time following the link field to the next node. Before current_node advances, prev_node is set to the current node so that we always have a pointer to the previous node. When the code finally finds a node whose last name is alphabetically greater than the new person's, it uses the previous node pointer to perform the sll_insert_after operation.

    So what could go wrong with this code? Suppose that the new person's last name is alphabetically greater than any other node in the list. For example suppose we want to insert "Vander Zanden" into the following list:

    list ----+->|---------|  +--->|-------------|  /->|-------------|  
             |  | link ------/    | link ---------/   | link --------------|
             |  | val = ? |       | val="Brown" |     | val="Smith" |      |
             |  |---------|       |-------------|     |-------------|      |
             |                                                             |
             \-------------------------------------------------------------/
    
    The traversal of the list will come to an end after current_node visits "Smith" and the loop will exit without the strcmp ever returning a result less than 0. At this point prev_node will point to "Smith"'s node. That is exactly what we want, since "Vander Zanden" should be inserted after "Smith." In other words our code works even when the person should be inserted at the end of the list.
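    To check all the cases (insertion into an empty list, at the front, in the middle, and at the end), the insert_person() code above can be wrapped with minimal stand-ins and exercised. The Person struct (only lname and score), the Jval union and the list procedures here are assumptions written to match the descriptions in these notes; insert_person() itself is unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

Jval new_jval_v(void *v) { Jval j; j.v = v; return j; }

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

Sllist new_sllist(void)
{
  Sllist l = malloc(sizeof(struct sllist));
  l->link = l;
  return l;
}

Sllist sll_insert_after(Sllist l, Jval val)
{
  Sllist tmp = malloc(sizeof(struct sllist));
  tmp->val = val;
  tmp->link = l->link;
  l->link = tmp;
  return tmp;
}

Sllist sll_first(Sllist l) { return l->link; }
Sllist sll_next(Sllist l)  { return l->link; }
Sllist sll_nil(Sllist l)   { return l; }

#define sll_traverse(tmpnode, list) for (tmpnode = sll_first(list); \
                                         tmpnode != sll_nil(list); \
                                         tmpnode = sll_next(tmpnode))

/* Hypothetical record; sorter.c's real struct may have more fields. */
typedef struct {
  const char *lname;
  double score;
} Person;

/* insert_person() as given in the notes: prev_node trails current_node,
   and the new person goes after prev_node once a greater last name is
   found, or after the last node if none is. */
void insert_person(Person *p, Sllist student_list) {
  Sllist prev_node = student_list;
  Sllist current_node;
  Person *current_person;

  sll_traverse(current_node, student_list) {
    current_person = (Person *)current_node->val.v;
    if (strcmp(p->lname, current_person->lname) < 0)
      break;
    else
      prev_node = current_node;
  }
  sll_insert_after(prev_node, new_jval_v((void *)p));
}
```

    Inserting Smith, Brown, Vander Zanden, Anderson and Jones in that order exercises the empty, front, end, front and middle cases, and should leave the list as Anderson, Brown, Jones, Smith, Vander Zanden.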

    The code for our sort program can be found in sorter.c. Here is the result of running it on gradefile:

    UNIX> sorter < gradefile
    Undergraduates: Average = 74.19
      Pat        Anderson     98.56   24.37
      Fred       Anderson     89.14   14.95
      Barney     Anderson     87.59   13.40
      Phil       Anderson     75.63    1.44
      Wilma      Anderson     63.53  -10.66
      Bill       Anderson     63.48  -10.71
      John       Anderson     62.49  -11.70
      Dino       Anderson     52.05  -22.14
      Betty      Anderson     51.65  -22.54
      Betty      Flintstone   99.43   25.24
      Bill       Flintstone   92.85   18.66
      Phil       Flintstone   87.43   13.24
      ...
    
    Graduates: Average = 75.53
      Phil       Rubble       93.64   18.11
      Wilma      Rubble       93.05   17.52
      Betty      Rubble       80.89    5.36
      Bill       Rubble       73.04   -2.49
      John       Rubble       71.77   -3.76
      Dino       Rubble       68.64   -6.89
      Barney     Rubble       67.14   -8.39
      Fred       Rubble       61.32  -14.21
      Pat        Rubble       51.04  -24.49
      Bill       Summitt      89.32   13.79
      Fred       Summitt      88.23   12.70
      Betty      Summitt      83.71    8.18
      Dino       Summitt      82.24    6.71
      ...
    

    The implementation

    The implementation is in sllist.c. I'll go over each subroutine. The whole set of procedures is very nice and clean (i.e. easy to read and understand). This is made possible by the sentinel node.

    First, new_sllist() merely creates and returns an empty list:

    list ----+->|---------|
             |  | link ------\
             |  | val = ? |  |
             |  |---------|  |
             |               |
             \---------------/
    
    Here's the code:
    Sllist new_sllist()
    {
      Sllist l;
    
      l = (Sllist) malloc(sizeof(struct sllist));
      l->link = l;
      return l;
    }
    
    Similarly, a list is empty if and only if l->link points to l. Therefore, sll_empty() is one line:
    int sll_empty(Sllist l)
    {
      return (l->link == l);
    }
    
    To insert a node after a given node, the code is the same regardless of whether the given node is the sentinel, the first node, a middle node, or the last node. You simply create a new node, set the new node's link to the given node's link, and then set the given node's link to the new node. Note that it must be done in that order, or the rest of the list after the given node will get lost!
    Sllist sll_insert_after(Sllist l, Jval val)
    {
      Sllist tmp;
    
      tmp = (Sllist) malloc(sizeof(struct sllist));
      tmp->val = val;
      tmp->link = l->link;
      l->link = tmp;
      return tmp;
    }
    
    Note, this works on the empty list when l is the sentinel. Work out the pointers for yourself if that is not clear to you.

    Now, all sll_prepend(l, val) does is create a new node whose val field is val and insert it after the sentinel. Thus, sll_prepend() is equivalent to sll_insert_after() called on the sentinel. Again, the sentinel has made our life very simple:

    Sllist sll_prepend(Sllist l, Jval val)
    {
      return sll_insert_after(l, val);
    }
    
    The first node on the list is the one after the sentinel:
    Sllist sll_first(Sllist l)
    {
      return l->link;
    }
    
    and the node following a given node is the one pointed to by its link field:
    Sllist sll_next(Sllist l)
    {
      return l->link;
    }
    
    All that remains are sll_nil() and free_sllist(). sll_nil() returns the sentinel node. This is because the link field of the last node on the list points to the sentinel node, so a traversal is over when it reaches sll_nil(). Moreover, when the list is empty, sll_first() also returns the sentinel node. The code is:
    Sllist sll_nil(Sllist l)
    {
      return l;
    }
    
    Finally, free_sllist(l) needs to free every node in l. It is done as follows:
    void free_sllist(Sllist l)
    {
      Sllist tmp;
    
      while (!sll_empty(l)) {
        tmp = sll_first(l);
        l->link = tmp->link;
        free(tmp);
      }
      free(l);
    }
    
    The code works as follows. While there are still nodes on the list besides the sentinel node, it unlinks the first node from the list and frees it. When the list is empty (i.e. only the sentinel node remains), it frees the sentinel.

    Why are deletion and appending bad?

    The problem with implementing something like sll_append() is that you must find the last node in the list. You can find that node when you call sll_append(), but that requires traversing the entire list on every call, which is too expensive, certainly compared to sll_prepend(), which takes just a few instructions. Alternatively, you could keep track of the last node in a header node, much like you do in your Queue implementation. However, if the last node then changes (due to, say, a call to sll_insert_after()), you must be able to recognize this and account for it. This means, for example, that you would either have to pass a pointer to the list on calls to sll_insert_after(), or have each node on the list contain a pointer to the sentinel so that sll_insert_after() can recognize when it is appending. My decision was that this was too cumbersome -- if you are going to have an extra pointer in the Sllist struct, you may as well have a doubly linked list.
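    For comparison, here is what the rejected traversal approach to sll_append() might look like. The function is hypothetical (it is deliberately not in sllist.h), and Jval is again replaced by a minimal stand-in. Each call walks from the sentinel to the last node, so appending n items one at a time takes time quadratic in n, whereas sll_prepend() and the tail-pointer approach used in grader.c take constant time per item.

```c
#include <stdlib.h>

/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

Jval new_jval_i(int i) { Jval j; j.i = i; return j; }

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

Sllist new_sllist(void)
{
  Sllist l = malloc(sizeof(struct sllist));
  l->link = l;
  return l;
}

Sllist sll_insert_after(Sllist l, Jval val)
{
  Sllist tmp = malloc(sizeof(struct sllist));
  tmp->val = val;
  tmp->link = l->link;
  l->link = tmp;
  return tmp;
}

/* Hypothetical sll_append(): with no tail pointer, the only way to find
   the last node is to walk the whole list, so each call is O(length). */
Sllist sll_append(Sllist l, Jval val)
{
  Sllist last = l;   /* on an empty list, the sentinel plays "last node" */

  while (last->link != l)
    last = last->link;
  return sll_insert_after(last, val);
}
```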

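    Deletion fits in even less well. To delete a node, you need a pointer to the previous node on the list. Why? Because that is the only node on the list that has a pointer to the node to be deleted, and to delete it you must change that pointer to point to the next node in the list. Thus, you must either traverse the list from the sentinel to find the previous node, which takes time proportional to the length of the list, or make the caller supply the previous node rather than the node to be deleted, which makes for an awkward interface.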
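    The two ways of getting hold of the previous node can be sketched as follows. Both functions are hypothetical and not part of sllist.h, and Jval is a minimal stand-in. sll_delete_next() runs in constant time but makes the caller supply the previous node; sll_delete_node() takes the node itself but has to search from the sentinel for its predecessor, which costs time proportional to the length of the list.

```c
#include <stdlib.h>

/* Assumption: minimal stand-in for the course's Jval union. */
typedef union { int i; void *v; } Jval;

Jval new_jval_i(int i) { Jval j; j.i = i; return j; }

typedef struct sllist {
  struct sllist *link;
  Jval val;
} *Sllist;

Sllist new_sllist(void)
{
  Sllist l = malloc(sizeof(struct sllist));
  l->link = l;
  return l;
}

Sllist sll_insert_after(Sllist l, Jval val)
{
  Sllist tmp = malloc(sizeof(struct sllist));
  tmp->val = val;
  tmp->link = l->link;
  l->link = tmp;
  return tmp;
}

/* Hypothetical: unlink and free the node after prev. Constant time, but
   the caller must already hold the previous node. The list l is passed
   so that the sentinel is never deleted. */
void sll_delete_next(Sllist l, Sllist prev)
{
  Sllist victim = prev->link;

  if (victim == l) return;     /* only the sentinel follows prev */
  prev->link = victim->link;   /* bypass the victim */
  free(victim);
}

/* Hypothetical: delete node n when only n is known. We must search from
   the sentinel for n's predecessor, which costs time proportional to the
   list's length. Does nothing if n is the sentinel or not on l. */
void sll_delete_node(Sllist l, Sllist n)
{
  Sllist prev = l;

  while (prev->link != n && prev->link != l)
    prev = prev->link;
  if (prev->link == n && n != l)
    sll_delete_next(l, prev);
}
```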

    The bottom line is this. If you want to do list appending, you should do it as in grader.c. Even that looks inelegant to me. I would go as far as to say that if you need list appending or deletion of arbitrary nodes, then you should use a different data structure: a doubly linked list.