CS140 Lecture notes -- Singly Linked Lists

  • Jim Plank (with additions by Brad Vander Zanden)

    Singly Linked Lists

    Linked lists come in two main flavors: singly linked and doubly linked lists. Doubly linked lists are more useful, but if you find yourself tight on memory, you may find yourself forced to use singly linked lists, so you should know about them. Also, you may need to talk to others about your or their data structures, and to do so, you must be fluent in the details of singly linked lists.

    The details of singly linked lists are not very pretty, but they are slightly clever, so you should know them. But first, the API (API is an acronym for "application programming interface"). I have created an API and implementation that are simpler than the ones in the textbook. I feel that the API in the textbook is overly long and too hard to remember. I also feel that the implementation in the textbook is overly complicated and should not have used function pointers.

    Look in sllist.h. This header file defines the programming interface to singly linked lists. There are two structs: a container struct called Sllist that holds information about the list and a node struct called Sllist_Node that contains the information for a single node in the list:

    typedef struct sllist_node {
      struct sllist_node *next;  // pointer to the next element in the list
      void *val;                 // pointer to the data for this node
    } Sllist_Node;
    
    typedef struct sllist {
      Sllist_Node *head;  // pointer to the first element in the list
      Sllist_Node *tail;  // pointer to the last element in the list
    } Sllist;
    
    A few notes about the above two structs:

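    Now, the following procedures create and manipulate Sllist's: new_sllist(), free_sllist(), sll_empty(), sll_prepend(), sll_append(), sll_insert_after(), sll_first(), sll_last(), sll_next(), and sll_val(). Each one is implemented and explained in the sections that follow.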

    Now the structure of an Sllist is important and it illustrates how data structures are typically designed. Sllist is a container object that contains administrative information about the list. In this case it contains pointers to the first and last elements in the list. In the textbook, the container object contains additional fields that keep track of the size of the list and functions that should be used for destroying and matching elements in the list.

    The Sllist_Node object represents the actual nodes in the list. The programmer should not have to know how Sllist_Node is implemented so we provide accessor functions, such as sll_first, sll_next, and sll_val, that allow programmers to access the important information in each list node. You should always use the accessor functions to access the values in a list, rather than the field names (e.g., val and next), because you do not want your code to break if the implementor of the sllist library decides to change the field names at a later date.

    Here are a few examples. When you call list = new_sllist(), you get a pointer to an Sllist struct whose head and tail fields are NULL, since the list is empty.

    list ------>|-------------|
                | head = NULL |
                | tail = NULL |
                |-------------|
    

    If we next call tmp = sll_prepend(list, p), where p is a pointer to some data value, then a node will be added to the list. If we assume that p has the address 0x100, then the list will look as follows:

            
    tmp----------------------\
                             |
    list ------>|---------|  +----->|-------------|
                | head -----/       | next = NULL |
                | tail ----/        | val = 0x100 |
                |---------|         |-------------|
    
    Now, suppose you insert a new node after tmp with the call tmp2 = sll_insert_after(list, tmp, p1), and that p1 has the address 0x200. Now you get:
            
    tmp2-------------------------------------------\
    tmp----------------------\                      \
                             |                       \
    list ------>|---------|  +----->|-------------|   +--->|-------------|
                | head -----/       | next = --------/     | next = NULL |
                | tail ------       | val = 0x100 |     -->| val = 0x200 |
                |---------|  \      |-------------|    /   |-------------|
                              \-----------------------/
    
    Finally, if you call tmp = sll_prepend(list, p2), where p2 has the address 0x300, you'll get a new node at the front of the list.
            
    tmp2------------------------------------------------------------------\
    tmp----------------------\                                             \
                             |                                              \
    list ------>|---------|  +----->|-------------|   +--->|-------------|   +--->|-------------|
                | head -----/       | next = --------/     | next = --------/     | next = NULL |
                | tail ------       | val = 0x300 |        | val = 0x100 |     -->| val = 0x200 |
                |---------|  \      |-------------|        |-------------|    /   |-------------|
                              \----------------------------------------------/
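The three calls above can be replayed in a small self-contained sketch. It carries minimal copies of new_sllist, sll_prepend and sll_insert_after that follow the implementation discussed later in these notes; the helper build_diagram_list is hypothetical and exists only to bundle the three calls together.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the structs in sllist.h */
typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist *new_sllist(void)
{
  Sllist *l = malloc(sizeof(Sllist));
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = l->head;
  l->head = tmp;
  if (l->tail == NULL)      /* list was empty: new node is also the tail */
    l->tail = tmp;
  return tmp;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp;

  assert(node != NULL);     /* there must be a node to insert after */
  tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;
  if (l->tail == node)      /* node was the tail: new node replaces it */
    l->tail = tmp;
  return tmp;
}

/* Hypothetical helper: performs the three calls from the diagrams */
Sllist *build_diagram_list(void *p, void *p1, void *p2)
{
  Sllist *list = new_sllist();
  Sllist_Node *tmp;

  tmp = sll_prepend(list, p);          /* list: p          */
  sll_insert_after(list, tmp, p1);     /* list: p, p1      */
  sll_prepend(list, p2);               /* list: p2, p, p1  */
  return list;
}
```

Following head from the container visits the values in the order p2, p, p1, and tail points at the p1 node, which matches the last diagram.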
    

    There is an important thing to note about this API: some primitives that we would probably like are missing, such as sll_insert_before() to insert before a node in the list and sll_delete_node() to delete an item from the list. As it turns out, you really can't implement these cleanly and efficiently in a singly linked list, so they are best left out. The book does implement some of these, but their implementation is not efficient (or, in the case of deleting, it is not clean). The bottom line is that if you need these operations, you would do best to use a different data structure (doubly linked lists).
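To see why an insert-before operation is not efficient, here is a sketch of what it would have to do. The name sll_insert_before and its signature are hypothetical (they are not part of sllist.h). It walks from the head to find the node's predecessor, so its cost grows with the node's position in the list, whereas sll_insert_after does a constant amount of work.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = l->head;
  l->head = tmp;
  if (l->tail == NULL) l->tail = tmp;   /* list was empty */
  return tmp;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;
  if (l->tail == node) l->tail = tmp;   /* node was the tail */
  return tmp;
}

/* Hypothetical: not part of sllist.h. Walks from the head, remembering the
   previous node, until it reaches node; O(n) in the position of node. */
Sllist_Node *sll_insert_before(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *prev = NULL, *cur;

  assert(node != NULL);
  for (cur = l->head; cur != NULL && cur != node; cur = cur->next)
    prev = cur;
  assert(cur != NULL);                  /* node must be on the list */

  if (prev == NULL)                     /* node is the first node */
    return sll_prepend(l, val);
  return sll_insert_after(l, prev, val);
}
```

All of the real work is finding prev; once it is known, the existing sll_prepend and sll_insert_after do the rest.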


    A usage example

    Look at gradefile. This is a fictitious grade file for a class that has both graduates and undergraduates. The format of the file is:
    first-name last-name U/G score
    
    The U/G says whether the person is a graduate or undergraduate. Now, suppose you want to write a program that takes a grade file on standard input, and prints out the average for graduates, and then a listing of the graduate students plus their grades and their distance to the average, and then the average for undergraduates and a similar listing of students.

    This is something that can be done fairly well with singly linked lists. We'll take each person, make a struct for that person that holds the person's name and grade, and then put that struct onto either the list for graduate students or the list for undergraduates.

    The code is in grader.c. What it does is create two Sllists called grad and ugrad. It appends students to these lists using sll_append.

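    Each student is put into a struct, which is then entered into the list as a (void *). This is legal because C guarantees that any object pointer, such as a (Person *), can be converted to a (void *) and back again without losing information. When we take a student off the list, we cast the (void *) back to a (Person *).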
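A sketch of the reading step, assuming a hypothetical Person struct with fname, lname and score fields (grader.c's actual struct may differ) and a hypothetical add_student() helper that handles one input line. The (Person *) goes onto the list as a (void *) with no cast needed, and comes back out with an explicit cast.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Hypothetical Person layout; not necessarily the one in grader.c */
typedef struct {
  char fname[32];
  char lname[32];
  double score;
} Person;

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;   /* list was empty */
  return tmp;
}

/* Hypothetical helper: parses one "first-name last-name U/G score" line
   and appends a new Person to grad or ugrad. Returns 0 on a bad line. */
int add_student(const char *line, Sllist *grad, Sllist *ugrad)
{
  Person *p = malloc(sizeof(Person));
  char ug;

  assert(grad != NULL && ugrad != NULL);
  if (sscanf(line, "%31s %31s %c %lf", p->fname, p->lname, &ug, &p->score) != 4
      || (ug != 'U' && ug != 'G')) {
    free(p);
    return 0;
  }
  /* Person * converts implicitly to void * when passed to sll_append */
  sll_append(ug == 'G' ? grad : ugrad, p);
  return 1;
}
```

A driver could call add_student on each line obtained with fgets(line, sizeof line, stdin) until fgets returns NULL.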

    Once the students are all read in, we need to calculate the averages and print out the students. Since we're doing this twice, it's best to do these things in procedures. return_avg() calculates the average of a list of students, and print_list() prints out the students.
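A sketch of the two procedures, assuming the same kind of hypothetical Person struct (fname, lname, score) and minimal versions of the accessor functions. Both walk the list with sll_first and sll_next and cast each sll_val back to a (Person *). The printf format is only an approximation of the listing shown below, not necessarily the one grader.c uses.

```c
#include <assert.h>
#include <stdio.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Hypothetical Person layout */
typedef struct {
  char fname[32];
  char lname[32];
  double score;
} Person;

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

/* Average score of a list of (Person *) values; 0.0 for an empty list */
double return_avg(Sllist *l)
{
  Sllist_Node *n;
  double total = 0.0;
  int count = 0;

  assert(l != NULL);
  for (n = sll_first(l); n != NULL; n = sll_next(n)) {
    Person *p = (Person *) sll_val(n);
    total += p->score;
    count++;
  }
  return (count == 0) ? 0.0 : total / count;
}

/* One line per student: name, score, and distance from the average */
void print_list(Sllist *l, double avg)
{
  Sllist_Node *n;

  for (n = sll_first(l); n != NULL; n = sll_next(n)) {
    Person *p = (Person *) sll_val(n);
    printf("  %-10s %-12s %6.2f %7.2f\n",
           p->fname, p->lname, p->score, p->score - avg);
  }
}
```

Because the average is computed once and passed in, print_list makes a single pass over the list.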

    Try out the code and see that it works. Note that the only reason that the output is sorted is that the input file is also sorted by grade. Grader does nothing to actually sort beyond separating the students into graduates and undergraduates.

    UNIX> head gradefile
    Betty Flintstone U 99.43
    Pat Anderson U 98.56
    Pat Fulmer U 96.77
    Pat Ward G 96.01
    Barney Fulmer U 94.80
    Phil Rubble G 93.64
    Wilma Rubble G 93.05
    Bill Flintstone U 92.85
    Dino Fulmer U 92.00
    Fred Ward G 90.62
    UNIX> grader < gradefile
    Undergraduates: Average = 74.19
      Betty      Flintstone   99.43   25.24
      Pat        Anderson     98.56   24.37
      Pat        Fulmer       96.77   22.58
      Barney     Fulmer       94.80   20.61
      Bill       Flintstone   92.85   18.66
      Dino       Fulmer       92.00   17.81
      ...
    
    Graduates: Average = 75.53
      Pat        Ward         96.01   20.48
      Phil       Rubble       93.64   18.11
      Wilma      Rubble       93.05   17.52
      Fred       Ward         90.62   15.09
      ...
    
    UNIX>
    

    Let's now place an additional requirement on the undergraduate and graduate lists that will require the flexibility of a list. Specifically let's say that the lists should be printed in ascending order by score. To simplify matters, we will say that if two people have the same score then it does not matter in which order they are printed out.

    The easiest way to satisfy this requirement is to keep the lists in ascending order by score. This means that each time we add a person to a list, we will need to insert that person in the proper place in ascending score order, which in turn means that we will need to insert into the middle of the list. To insert a person, we will traverse the list until we find a node whose score is greater than the score of the person we are inserting. We will then insert the new person before this node (call it the greater node).

    When we scan the list of operations provided by sllist.h we find that there is an insert_after operation but no insert_before operation. Hence, we will need to have a pointer to the node immediately preceding the greater node. In order to have this pointer available we will need to save a pointer to both the previous node in the list and the current node. Here is the code that accomplishes this task:

    void insert_person(Person *p, Sllist *student_list) {
      Sllist_Node *prev_node = NULL;
      Sllist_Node *current_node;
      Person *current_person;
    
      if (sll_empty(student_list)) {
        sll_append(student_list, p);
        return;
      }
      for (current_node = sll_first(student_list);
           current_node != NULL;
           current_node = sll_next(current_node)) {
        current_person = (Person *)sll_val(current_node);
        if (p->score < current_person->score)
          break;
        else
          prev_node = current_node;
      }
      if (prev_node == NULL) 
        sll_prepend(student_list, p);
      else
        sll_insert_after(student_list, prev_node, p);
    }
    

    Notice that we start by initializing prev_node to NULL. If the new person should be first in the list, then the score comparison operation will return true when the new person is compared with the first node in the list. prev_node will be NULL, so in this case we will call sll_prepend to prepend the new person to the front of the list.

    We also start by checking whether the list is empty. If so, we use sll_append to make the person be the first node in the list (we could just as easily have used sll_prepend).

    In the most common case the new person will go somewhere in the middle of the list. In this case the current_node pointer will be advanced several times, each time to the next node in the list. Before the pointer is advanced, prev_node is set to the current node so that we always have a pointer to the previous node. When the code finally finds a node whose score is greater than the score of the new person, it can use the previous node pointer to perform the sll_insert_after operation.

    So what could go wrong with this code? Suppose that the new person's score is greater than the score of every other node in the list. For example, suppose we want to insert 100 into a list whose last score is 97.

    The traversal of the list will come to an end after current_node visits the node with a score of 97 and the loop will exit without the score comparison ever succeeding. At this point prev_node will point to the node whose score is 97. That is exactly what we want, since the person with a score of 100 should be inserted after the person with a score of 97. In other words our code works even when the person should be inserted at the end of the list.
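The end-of-list case can be checked directly. The sketch below pairs insert_person from above with minimal versions of the library routines it calls; Person is reduced to a hypothetical single score field so the example stays short.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

/* Hypothetical, trimmed-down Person: only the field the comparison needs */
typedef struct {
  double score;
} Person;

static Sllist_Node *new_node(void *val, Sllist_Node *next)
{
  Sllist_Node *n = malloc(sizeof(Sllist_Node));
  n->val = val;
  n->next = next;
  return n;
}

/* Minimal versions of the sllist.h routines used by insert_person */
int sll_empty(Sllist *l) { return l->head == NULL; }
Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *n = new_node(val, l->head);
  l->head = n;
  if (l->tail == NULL) l->tail = n;    /* list was empty */
  return n;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *n = new_node(val, NULL);
  if (l->tail != NULL) l->tail->next = n;
  else l->head = n;                    /* list was empty */
  l->tail = n;
  return n;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *n;

  assert(node != NULL);
  n = new_node(val, node->next);
  node->next = n;
  if (l->tail == node) l->tail = n;    /* node was the tail */
  return n;
}

/* insert_person as given above: keeps the list in ascending score order */
void insert_person(Person *p, Sllist *student_list)
{
  Sllist_Node *prev_node = NULL;
  Sllist_Node *current_node;
  Person *current_person;

  if (sll_empty(student_list)) {
    sll_append(student_list, p);
    return;
  }
  for (current_node = sll_first(student_list);
       current_node != NULL;
       current_node = sll_next(current_node)) {
    current_person = (Person *) sll_val(current_node);
    if (p->score < current_person->score)
      break;
    prev_node = current_node;
  }
  if (prev_node == NULL)               /* smaller than everything: goes first */
    sll_prepend(student_list, p);
  else                                 /* includes falling off the end */
    sll_insert_after(student_list, prev_node, p);
}
```

Inserting 100 into a list that ends with 97 leaves prev_node on the 97 node, and because sll_insert_after also moves the tail pointer, the new node correctly becomes the last one.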

    The code for our sort program can be found in sorter.c. Here is the result of running it on gradefile:

    UNIX> sorter < gradefile
    Undergraduates: Average = 74.19
      Betty      Anderson     51.65  -22.54
      Dino       Anderson     52.05  -22.14
      Pat        Flintstone   52.85  -21.34
      Fred       Fulmer       53.70  -20.49
      Barney     Flintstone   54.12  -20.07
      Fred       Flintstone   54.57  -19.62
      Dino       Flintstone   58.92  -15.27
      John       Anderson     62.49  -11.70
      ...
      Barney     Fulmer       94.80   20.61
      Pat        Fulmer       96.77   22.58
      Pat        Anderson     98.56   24.37
      Betty      Flintstone   99.43   25.24
    
    Graduates: Average = 75.53
      Pat        Rubble       51.04  -24.49
      John       Ward         56.09  -19.44
      Phil       Summitt      58.68  -16.85
      John       Summitt      60.22  -15.31
      Fred       Rubble       61.32  -14.21
      ...
      Fred       Ward         90.62   15.09
      Wilma      Rubble       93.05   17.52
      Phil       Rubble       93.64   18.11
      Pat        Ward         96.01   20.48
    

    The implementation

    The implementation is in sllist.c. I'll go over each subroutine. The procedures could be a bit simpler to read and understand if we did not have to worry about maintaining a head and tail pointer. When we discuss doubly linked lists, we will use a technique involving something called a sentinel node to eliminate the need for this extra code.

    For the current implementation, new_sllist() merely creates and returns an empty list:

    list ------>|-------------|
                | head = NULL |
                | tail = NULL |
                |-------------|
    
    Here's the code:
    Sllist *new_sllist()
    {
      Sllist *l;
      
      l = (Sllist *) malloc(sizeof(Sllist));
      l->head = NULL;
      l->tail = NULL;
      return l;
    }
    
    The above diagram should suggest a way to check whether a list is empty: it is empty only if head is NULL. Therefore, sll_empty() is one line:
    int sll_empty(Sllist *l)
    {
      return (l->head == NULL);
    }
    

    List Insertion Routines

    To insert a node after another node n, the code is the same, regardless of whether the node is the first node, a middle node, or the last node. You simply create a new node, have the new node's next point to n's next, and have n's next point to the new node. Note, it must be done in that order, or the rest of the list after the node will get lost! We must also check whether or not n used to be the last node in the list. If so, then the new node becomes the last node in the list and we must update the list's tail pointer:

    Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
    {
      Sllist_Node *tmp;
    
      tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
      tmp->val = val;
      tmp->next = node->next;
      node->next = tmp;
    
      /* if node was the previous tail of the list, then the new element becomes
         the new tail of the list */
      if (l->tail == node)
        l->tail = tmp;
      return tmp;
    }
    

    Note that sll_insert_after does not give us a way to add a node to the beginning of the list. We could do something kludgy like the textbook and say that if NULL is passed as the node to sll_insert_after, then the value will be prepended to the front of the list. In practice it is better to define another function that prepends to the front of the list, because people often think in terms of prepending or appending to a list and so would like an operation that does exactly that. Additionally, prepending to the list is going to require some special manipulation of the list's head pointer, so it is better to isolate this special case in a separate function. When we prepend an item to the list, we must make the list's head pointer point to the new node and make the new node point to the node to which the list's head used to point, since this node will now be the second node in the list. We must also be prepared for the special case in which the list was previously empty. In this case, the list's tail pointer must also be updated to point to the new node, since the new node will be both the first and last node in the list:

    Sllist_Node *sll_prepend(Sllist *l, void *val)
    {
      Sllist_Node *tmp;
    
      tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
      tmp->val = val;
      tmp->next = l->head;
    
      // make the new element be the head of the list
      l->head = tmp;
    
      // if the list was empty, make the tail point to the new element
      if (l->tail == NULL)
        l->tail = tmp;
      return tmp;
    }
    
    Unlike sll_prepend, sll_append is not strictly necessary because we could use sll_last to retrieve the last node in the list and then call sll_insert_after to insert the new value after this node. However, that is a bit awkward, especially for an operation like append that will be frequently performed. sll_append has an implementation that is very similar to sll_prepend, except that we update the list's tail pointer to point to the new node and make the node that used to be the tail point to the new node. We must also handle the special case where the list was previously empty. In this case the new node will be both the first and last object in the list, and hence we must make the list's head pointer point to it as well:
    Sllist_Node *sll_append(Sllist *l, void *val)
    {
      Sllist_Node *tmp;
    
      tmp = (Sllist_Node *) malloc(sizeof(Sllist_Node));
      tmp->val = val;
      tmp->next = NULL;
      
      // make the previous tail point to this new element and then update 
      // the list's tail to point to the new element
      if (l->tail != NULL) 
        l->tail->next = tmp;
      l->tail = tmp;
      
      // if the list was empty, make the head point to the new element
      if (l->head == NULL)
        l->head = tmp;
      return tmp;
    }
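The alternative mentioned above, appending with sll_last and sll_insert_after, is sketched below under a hypothetical name. It also shows one detail that makes this approach awkward: on an empty list sll_last returns NULL, so there is no node to insert after and the code has to fall back to sll_prepend.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist_Node *sll_last(Sllist *l) { return l->tail; }

Sllist_Node *sll_prepend(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = l->head;
  l->head = tmp;
  if (l->tail == NULL) l->tail = tmp;   /* list was empty */
  return tmp;
}

Sllist_Node *sll_insert_after(Sllist *l, Sllist_Node *node, void *val)
{
  Sllist_Node *tmp;

  assert(node != NULL);
  tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = node->next;
  node->next = tmp;
  if (l->tail == node) l->tail = tmp;   /* node was the tail */
  return tmp;
}

/* Hypothetical alternative to sll_append built from existing routines */
Sllist_Node *append_via_insert_after(Sllist *l, void *val)
{
  Sllist_Node *last = sll_last(l);

  if (last == NULL)                     /* empty list: nothing to insert after */
    return sll_prepend(l, val);
  return sll_insert_after(l, last, val);
}
```

Having a dedicated sll_append hides this empty-list special case from every caller.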
    

    Accessor Routines

    The accessor functions can all be implemented with simple one line functions.

    The first node on the list is the one pointed to by the list's head pointer:

    Sllist_Node *sll_first(Sllist *l)
    {
      return l->head;
    }
    
    Similarly, the last node on the list is the one pointed to by the list's tail pointer:
    Sllist_Node *sll_last(Sllist *l)
    {
      return l->tail;
    }
    
    The next node following the current node is the one pointed to by its next field:
    Sllist_Node *sll_next(Sllist_Node *n)
    {
      return n->next;
    }
    
    Notice that sll_next takes a node as a parameter while sll_first and sll_last take a list as a parameter. That is because we can determine the first and last nodes by consulting the list's head and tail pointers, while we can only determine the next node in the list by consulting the current node.

    If we want the value of the current node, then we return the val field:

    void *sll_val(Sllist_Node *n) 
    {
      return n->val;
    }
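Because this Sllist keeps no size field (unlike the textbook's container), finding the length of a list means walking it with the accessors. sll_length is a hypothetical helper, not part of sllist.h; its loop is the same traversal idiom used by insert_person above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }

/* Hypothetical helper: counts the nodes in O(n) time using only accessors */
int sll_length(Sllist *l)
{
  Sllist_Node *n;
  int count = 0;

  assert(l != NULL);
  for (n = sll_first(l); n != NULL; n = sll_next(n))
    count++;
  return count;
}
```

The loop never touches head, next, or val directly, so it keeps working if the field names in sllist.h change.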
    

    Freeing the List

    When we are done using a list, we should free it. The textbook allows the programmer to provide a destroy function that will be called on each element of the list when the list is destroyed. This is a bit too complicated for us at this point. We will simply iterate through the nodes of the list and free the memory associated with each node. Then we will free the memory associated with the list's container object. The user will need to separately find a way to free the memory associated with the objects pointed to by the val field of each node:

    void free_sllist(Sllist *l)
    {
      Sllist_Node *current_node, *next_node;
     
      for (current_node = l->head;
           current_node != NULL;
           current_node = next_node) {
        next_node = current_node->next;
        free(current_node);
      }
      free(l);
    }
    
    The code does the following. It iterates through the list and as it reaches each node, it saves a pointer to the next node in the list. Then it frees the memory associated with the current node. If we did not save a pointer to the next node before freeing the current node, then the pointer to the next node would be lost when we freed the current node and we could not continue to iterate through the list.
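One way for the user to also free the values is a hypothetical wrapper like the one below. It assumes every val was obtained from malloc and is not referenced by any other data structure; if values are shared, freeing them here would leave dangling pointers elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist *new_sllist(void)
{
  Sllist *l = malloc(sizeof(Sllist));
  l->head = NULL;
  l->tail = NULL;
  return l;
}

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;   /* list was empty */
  return tmp;
}

Sllist_Node *sll_first(Sllist *l) { return l->head; }
Sllist_Node *sll_next(Sllist_Node *n) { return n->next; }
void *sll_val(Sllist_Node *n) { return n->val; }

/* free_sllist as given above: frees the nodes and the container only */
void free_sllist(Sllist *l)
{
  Sllist_Node *current_node, *next_node;

  for (current_node = l->head; current_node != NULL; current_node = next_node) {
    next_node = current_node->next;    /* save before freeing */
    free(current_node);
  }
  free(l);
}

/* Hypothetical wrapper: frees every val first, then the list itself */
void free_sllist_and_vals(Sllist *l)
{
  Sllist_Node *n;

  assert(l != NULL);
  for (n = sll_first(l); n != NULL; n = sll_next(n))
    free(sll_val(n));                  /* the node itself is still valid here */
  free_sllist(l);
}
```

Freeing the values in a first pass keeps free_sllist unchanged; the nodes are only released afterwards, so sll_next is never called on freed memory.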

    Why are deletion and inserting before another node bad?

    Deletion and inserting before another node are both bad because both operations require a pointer to the previous node on the list. Why? Because deletion requires that you make the previous node in the list point to the node that the deleted node used to point to, and inserting before a node requires that the previous node point to the new node. Unfortunately, we only have pointers to the next node in a singly linked list. If we want to obtain a pointer to the previous node, we have to start from the front of the list and traverse each element until we reach the node we wish to delete or insert before. Along the way we must take care to always save a pointer to the previous node. This traversal and remembering process is cumbersome, and it takes time proportional to the node's position in the list. The textbook actually pushes the burden for this traversal onto the programmer by having its deletion procedure take a pointer to the node preceding the node you wish to delete. In other words, you must implement the code that finds the previous node. Once you have done so, you can call the book's deletion procedure.
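For comparison, here is a sketch of the deletion procedure that sllist.h leaves out. The name sll_delete_node is hypothetical. It frees only the node, not its val. It has to walk from the head while remembering the previous node, and it must also repair head and tail when the first or last node is removed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sllist_node {
  struct sllist_node *next;
  void *val;
} Sllist_Node;

typedef struct sllist {
  Sllist_Node *head;
  Sllist_Node *tail;
} Sllist;

Sllist_Node *sll_append(Sllist *l, void *val)
{
  Sllist_Node *tmp = malloc(sizeof(Sllist_Node));
  tmp->val = val;
  tmp->next = NULL;
  if (l->tail != NULL) l->tail->next = tmp;
  l->tail = tmp;
  if (l->head == NULL) l->head = tmp;   /* list was empty */
  return tmp;
}

/* Hypothetical, not part of sllist.h: unlinks node and frees it (but not
   its val). Finding the previous node costs O(n). */
void sll_delete_node(Sllist *l, Sllist_Node *node)
{
  Sllist_Node *prev = NULL, *cur = l->head;

  while (cur != NULL && cur != node) {  /* remember the previous node */
    prev = cur;
    cur = cur->next;
  }
  assert(cur != NULL);                  /* node must be on the list */

  if (prev == NULL)                     /* deleting the first node */
    l->head = node->next;
  else
    prev->next = node->next;
  if (l->tail == node)                  /* deleting the last node */
    l->tail = prev;
  free(node);
}
```

With a doubly linked list, prev would be available directly from the node, and the search loop would disappear.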

    The bottom line is this. If you want to do deletions or be able to insert values before nodes in a list, then you should use a different data structure: a doubly linked list. As a bonus, you will find that it is much easier to implement the append operation, at least when you use the sentinel node technique that we discuss in the dllist notes.