One of the more interesting qualities of an object is that an object can contain a reference to another object of the same type. There is a common data structure, the linked list, that takes advantage of this feature.
Lists are made up of nodes, where each node contains a pointer or reference to the next node in the list. In addition, each node usually contains a unit of data called the cargo. In our first example, the cargo will be a single integer, but later (Section 18.12) we will write a generic list that can contain objects of any type.
As usual when we write a new class, we'll start with the instance variables, one or two constructors and toString so that we can test the basic mechanism of creating and displaying the new type.
class Node {The declarations of the instance variables follow naturally from the specification, and the rest follows mechanically from the instance variables.
To test the implementation so far, we would put something like this in main:
Node* node = new Node (1, NULL);The result is simply
1To make it interesting, we need a list with more than one node!
Node*
node1 = new Node (1, NULL);
Node* node2 = new
Node (2, NULL);
Node* node3 = new
Node (3, NULL);
This code creates three nodes, but we don't have a list yet because the nodes are not linked. The state diagram looks like this:
To link up the nodes, we have to make the first node refer to the second and the second node refer to the third.
node1->next = node2;The reference of the third node is NULL, which indicates that it is the end of the list. Now the state diagram looks like this:
Now we know how to create nodes and link them into lists. What might be less clear at this point is why.
The thing that makes lists useful is that they are a way of assembling multiple objects into a single entity, sometimes called a collection. In the example, the first node of the list serves as a reference to the entire list.
If we want to pass the list as a parameter, all we have to pass is a reference to the first node. For example, the method printList takes a single node as an argument. Starting with the head of the list, it prints each node until it gets to the end (indicated by the NULL reference).
void printList (Node* list) {To invoke this method we just have to pass a reference to the first node:
printList (node1);Inside printList we have a reference to the first node of the list, but there is no variable that refers to the other nodes. We have to use the next value from each node to get to the next node.
This diagram shows the value of list and the values that node takes on:
This way of moving through a list is called a traversal, just like the similar pattern of moving through the elements of an vector. It is common to use a loop variable like node to refer to each of the nodes in the list in succession.
The output of this method is
123By convention, lists are printed in parentheses with commas between the elements, as in (1, 2, 3). As an exercise, modify printList so that it generates output in this format.
As another exercise, rewrite printList using a for loop instead of a while loop.
Recursion and lists go together like fava beans and a nice Chianti. For example, here is a recursive algorithm for printing a list backwards:
Of course, Step 2, the recursive call, assumes that we have a way of printing a list backwards. But if we assume that the recursive call works—the leap of faith—then we can convince ourselves that this algorithm works.
All we need is a base case, and a way of proving that for any list we will eventually get to the base case. A natural choice for the base case is a list with a single element, but an even better choice is the empty list, represented by NULL.
void printBackward (Node* list) {The first line handles the base case by doing nothing. The next two lines split the list into head and tail. The last two lines print the list.
We invoke this method exactly as we invoked printList:
printBackward (node1);The result is a backwards list: 321.
Can we prove that this method will always terminate? In other words, will it always reach the base case? In fact, the answer is no. There are some lists that will make this method crash.
There is nothing to prevent a node from referring back to an earlier node in the list, including itself. For example, this figure shows a list with two nodes, one of which refers to itself:
If we invoke printList on this list, it will loop forever. If we invoke printBackward it will recurse infinitely. This sort of behavior makes infinite lists difficult to work with.
Nevertheless, they are occasionally useful. For example, we might represent a number as a list of digits and use an infinite list to represent a repeating fraction (e.g., 1/6 = 0.1666…).
Regardless, it is problematic that we cannot prove that printList and printBackward terminate. The best we can do is the hypothetical statement, "If the list contains no loops, then these methods will terminate." This sort of claim is called a precondition. It imposes a constraint on one of the parameters and describes the behavior of the method if the constraint is satisfied. We will see more examples soon.
There is a part of printBackward that might have raised an eyebrow:
Node* head = list;After the first assignment, head and list have the same type and the same value. So why did I create a new variable?
The reason is that the two variables play different roles. We think of head as a reference to a single node, and we think of list as a reference to the first node of a list. These "roles" are not part of the program; they are in the mind of the programmer.
The second assignment creates a new reference to the second node in the list, but in this case we think of it as a list. So, even though head and tail have the same type, they play different roles.
This ambiguity is useful, but it can make programs with lists difficult to read. I often use variable names like node and list to document how I intend to use a variable, and sometimes I create additional variables to disambiguate.
I could have written printBackward without head and tail, but I think it makes it harder to understand:
void printBackward (Node* list) {Looking at the two function calls, we have to remember that printBackward treats its argument as a list and cout << list->cargo treats it as a single object.
Always keep in mind the fundamental ambiguity theorem:
A variable that refers to a node might treat the node as a single object or as the first in a list of nodes.
You might have wondered why printList and printBackward are nonmember functions. I have made the claim that anything that can be done with nonmember functions (class methods) can also be done with member functions (object methods); it's just a question of which form is cleaner.
In this case there is a legitimate reason to choose nonmember functions. It is legal to send NULL as an argument to a class method, but it is not legal to invoke an object method on a null object.
Node* node = NULL;This limitation makes it awkward to write list-manipulating code in a clean, object-oriented style. A little later we will see a way to get around this, though.
Obviously one way to modify a list is to change the cargo of one on the nodes, but the more interesting operations are the ones that add, remove, or reorder the nodes.
As an example, we'll write a method that removes the second node in the list and returns a reference to the removed node.
Node* removeSecond (Node* list) {Again, I am using temporary variables to make the code more readable. Here is how to use this method.
printList (node1);The output is
(1, 2, 3) the original listHere is a state diagram showing the effect of this operation:
What happens if we invoke this method and pass a list with only one element (a singleton)? What happens if we pass the empty list as an argument? Is there a precondition for this method?
For some list operations it is useful to divide the labor into two functions. For example, to print a list backwards in the conventional list format, (3, 2, 1) we can use the printBackwards function to print "3, 2, " but we need a separate function to print the parentheses and the first node. We'll call it printBackwardNicely.
void printBackwardNicely (Node* list) {Again, it is a good idea to check functions like this to see if they work with special cases like an empty list or a singleton.
Elsewhere in the program, when we use this function, we will invoke printBackwardNicely directly and it will invoke printBackward on our behalf. In that sense, printBackwardNicely acts as a wrapper, and it uses printBackward as a helper.
There are a number of subtle problems with the way we have been implementing lists. In a reversal of cause and effect, I will propose an alternative implementation first and then explain what problems it solves.
First, we will create a new class called LinkedList. Its instance variables are an integer that contains the length of the list and a reference to the first node in the list. LinkedList objects serve as handles for manipulating lists of Node objects.
class LinkedList {One nice thing about the LinkedList class is that it gives us a natural place to put wrapper functions like printBackwardNicely, which we can make a public member function in the LinkedList class.
void printBackward () {Just to make things confusing, I renamed printBackwardNicely. Now there are two functions named printBackward: one in the Node class (the helper) and one in the LinkedList class (the wrapper). In order for the wrapper to invoke the helper, it has to select the member function of Nodes (tail->printBackward).
So, one of the benefits of the LinkedList class is that it provides a nice place to put wrapper functions. Another is that it makes it easier to add or remove the first element of a list. For example, push_front is a member function for LinkedLists; it takes an int as an argument and puts it at the beginning of the list.
void push_front (int i) {As always, to check code like this it is a good idea to think about the special cases. For example, what happens if the list is initially empty?
Some lists are "well-formed"; others are not. For example, if a list contains a loop, it will cause many of our functions to crash, so we might want to require that lists contain no loops. Another requirement is that the length value in the LinkedList object should be equal to the actual number of nodes in the list.
Requirements like this are called invariants because, ideally, they should be true of every object all the time. Specifying invariants for objects is a useful programming practice because it makes it easier to prove the correctness of code, check the integrity of data structures, and detect errors.
One thing that is sometimes confusing about invariants is that there are some times when they are violated. For example, in the middle of addFirst, after we have added the node, but before we have incremented length, the invariant is violated. This kind of violation is acceptable; in fact, it is often impossible to modify an object without violating an invariant for at least a little while. Normally the requirement is that every function that violates an invariant must restore the invariant.
If there is any significant stretch of code in which the invariant is violated, it is important for the comments to make that clear, so that no operations are performed that depend on the invariant.
The LinkedList class implements linked lists of integers, carried in the cargo field of Nodes. It would be more useful to be able to have linked lists of any type of data, and this is easy to do by making the LinkedList class generic. In this case all you have to do is (1) put template <class ITEMTYPE> in front of the declarations of Node and LinkedList, (2) replace int (when it refers to cargo) with ITEMTYPE, and (3) replace Node with Node<ITEMTYPE> and LinkedList with LinkedList<ITEMTYPE>. Do this, and chack it with linked lists of strings and doubles.
Revised 2008-12-11.