An important element of good object-oriented programming is good object-oriented design. In other words, we want well-designed objects that can be reused in different applications. This in turn means that we need to design a good interface for an object, one that provides the operations that a programmer wants. Since this course is about data structures, we will be using objects to implement data structures. Therefore, throughout the course we will illustrate object-oriented design by showing how it can be applied to data structures.
In these notes we are going to show how object-oriented design can be applied to the implementation of a doubly linked list data structure. We will also cover example uses of friend declarations and forward declarations.
You can use whatever naming conventions you like in this course. These are the conventions I will try to adhere to (if I am using notes that someone else has written, however, I will use their variable names):
const int BOILING_POINT_OF_WATER = 212;
You should recall from CS140 that a doubly linked list is a list in which each node has two pointers: 1) a flink pointer to the next node in the list, and 2) a blink pointer to the previous node in the list. The advantage of a doubly linked list (hereafter called a dlist) is that one can traverse it in either direction. This property simplifies common operations such as insertion and deletion, as well as allowing for situations where a list must be traversed in reverse.
Unless memory is at an absolute premium, a doubly linked list is preferred to a singly linked list. The reason is twofold. First, the doubly linked list has a simpler implementation, which leads to greater efficiency. Second, it offers greater flexibility, which means that the data structure may not have to be changed if a new operation has to be added to the application. Of course each node in a doubly linked list has one more pointer than each node in a singly linked list, and thus a doubly linked list requires more storage. However, this storage overhead tends to be minimal. Since most computers have gigabytes of memory, this small additional overhead typically does not present problems.
The implementation of a dlist is further facilitated by the use of a sentinel node. A sentinel node is a header node that does not store a value. It points both to the first element of the list and the last element of the list. In other words, its flink points to the first element of the list and its blink points to the last element of the list. The sentinel node is fully integrated into the dlist by making the first element point back to the sentinel node (i.e., its blink points to the sentinel node) and by making the last element point to the sentinel node (i.e., its flink points to the sentinel node). A sentinel node simplifies the insertion and deletion procedures for a list because the programmer no longer has to worry about the special cases of inserting/deleting a node at the head or tail of the list. A sentinel node also makes it easy to determine when a traversal has reached the end of the list--it has done so when it reaches the sentinel node. Note that either a forward or reverse traversal can use this check to determine if it's reached the end of the list.
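To make the benefit of the sentinel concrete, here is a minimal sketch of insertion and deletion on a circular list with a sentinel. The names Node, makeSentinel, insertBefore, removeNode, forward, backward, and sentinelDemo are hypothetical and are used only for this illustration; they are not part of the dlist library discussed in these notes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal node, for illustration only.
struct Node {
    std::string value;
    Node *flink;   // next node
    Node *blink;   // previous node
};

// An empty list is a sentinel that points to itself in both directions.
Node *makeSentinel() {
    Node *s = new Node;
    s->flink = s;
    s->blink = s;
    return s;
}

// Insert a new node immediately before pos. The same four pointer updates
// work at the head, at the tail, and in an empty list.
Node *insertBefore(Node *pos, const std::string &value) {
    Node *n = new Node;
    n->value = value;
    n->flink = pos;
    n->blink = pos->blink;
    pos->blink->flink = n;
    pos->blink = n;
    return n;
}

// Unlink and free n. The caller must not pass the sentinel.
void removeNode(Node *n) {
    n->blink->flink = n->flink;
    n->flink->blink = n->blink;
    delete n;
}

// Collect the values front to back; the traversal ends at the sentinel.
std::vector<std::string> forward(Node *sentinel) {
    std::vector<std::string> out;
    for (Node *n = sentinel->flink; n != sentinel; n = n->flink)
        out.push_back(n->value);
    return out;
}

// Collect the values back to front; the same end-of-list check applies.
std::vector<std::string> backward(Node *sentinel) {
    std::vector<std::string> out;
    for (Node *n = sentinel->blink; n != sentinel; n = n->blink)
        out.push_back(n->value);
    return out;
}

// Small self-check that exercises the functions above.
void sentinelDemo() {
    Node *s = makeSentinel();
    insertBefore(s, "b");          // before the sentinel: append to the empty list
    insertBefore(s->flink, "a");   // before the first element: prepend
    insertBefore(s, "c");          // before the sentinel again: append
    assert((forward(s) == std::vector<std::string>{"a", "b", "c"}));
    assert((backward(s) == std::vector<std::string>{"c", "b", "a"}));
    while (s->flink != s) removeNode(s->flink);   // delete from the head until empty
    assert(forward(s).empty());
    delete s;
}
```

Notice that neither insertBefore nor removeNode contains an if statement. Without a sentinel, each would need separate cases for an empty list and for the first and last nodes.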
The C design of a dlist typically consists of one struct, which represents the node in a dlist. As we have previously seen, our C definition of a dlist is as follows:
typedef struct dllist {
    struct dllist *flink;   /* next node in the list */
    struct dllist *blink;   /* previous node in the list */
    char *value;
} *Dllist;
In C++ we want to treat a dlist as an object. That means that we will view a dlist as an object that stores data as a list, that allows the list to be traversed in either direction, and that is manipulated using a set of operations provided by the dlist's interface. We will hide the dList's implementation so that we can modify it without affecting the programs that use it. In particular, we will not let the programs that use it have access to its internal representation, such as dlNodes. Therefore we will declare a dlNode as follows (for now, ignore the template syntax with the brackets -- we'll define this later in the notes):
template <class Object>
class dList;

template <class Object>
class dlNode {
    friend class dList<Object>;
protected:
    dlNode() {}
    dlNode(Object val);
    ~dlNode() {}
    Object value;
    dlNode *flink;
    dlNode *blink;
};
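The first two lines are a forward declaration: they tell the compiler that dList is a class template without yet saying what it contains. We need it because the friend declaration inside dlNode names dList before dList has been declared. The friend declaration gives dList access to dlNode's protected members, while ordinary code that uses the list cannot create or modify dlNodes. A forward declaration also allows us to declare a pointer to the class. For example, the following declaration is legal once dList has been forward declared: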
class dlNode {
protected:
    dList <int> *mylist;
};
A forward declaration does not allow us to use the class
as a by-value member of another class, that is, as an object stored
directly in the class rather than through a pointer. For example, the
following declaration is illegal:
class dlNode {
protected:
    dList <int> mylist;
};
The reason it is illegal is that since dList has not yet been
fully declared, the compiler does not know how much storage to
allocate for it. In contrast, it is ok to declare a pointer to
a dList because a pointer has a standard fixed size.
You may wonder why we do not simply declare the dList class before the dlNode class. The reason is that the dList class uses the dlNode class so we're faced with a circular definition. If we place the dList class before the dlNode class, then the compiler will complain because we have not declared the dlNode class. Consequently a forward declaration allows us to avoid circular definition problems.
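The following sketch shows the same pattern on a smaller scale. Owner and Item are hypothetical classes invented for this illustration: Item refers to Owner before Owner is defined, which is legal only because the reference is a pointer.

```cpp
#include <cassert>

class Owner;   // forward declaration: Owner is an incomplete type at this point

class Item {
public:
    explicit Item(Owner *o) : owner(o) {}
    Owner *owner;    // legal: a pointer has a known size
    // Owner copy;   // illegal if uncommented: the size of Owner is not yet known
};

// Owner uses Item, and Item uses Owner, so one of them must be forward declared.
class Owner {
public:
    Owner() : first(nullptr), count(0) {}
    void adopt(Item *it) { first = it; ++count; }
    Item *first;
    int count;
};

// Small self-check that exercises both classes together.
void forwardDeclarationDemo() {
    Owner o;
    Item it(&o);
    o.adopt(&it);
    assert(it.owner == &o);
    assert(o.first == &it);
    assert(o.count == 1);
}
```

If you remove the forward declaration, the compiler rejects the Owner pointer inside Item because it has never heard of Owner.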
Now that we have the declaration for a dlNode, we can turn our attention to defining the dList's interface. In order to make the dList as universally usable as possible, we want to define a set of essential, primitive operations that programmers can then use to assemble more complex operations. In other words, rather than trying to imagine every conceivable use for a dList and placing an operation in the dList's interface that supports that use, we try to envision a set of basic building block operations that can be used to create these more complicated operations.
We know that traversing a dList is essential, so a next and a prev operation seem essential. It is also necessary to know when we have reached the end of the list, so an endOfList operation seems useful. We also want to be able to start a list traversal at either the front or the end of the list, so a first operation that positions us at the start of the list and a last operation that positions us at the end of the list seem essential.
Before we go further you might be wondering about what has happened to the C pointer that we used to traverse a list. The answer is that we internally use a pointer to maintain our position in the list. This pointer is commonly called a cursor. The cursor points to the current element in the list that we are traversing. The name probably derives from WYSIWYG editors, in which a cursor is used to show your position on a line of text. Of course a cursor in a WYSIWYG editor is positioned between characters whereas a cursor in a list points to an object. However, hopefully you see the similarity.
In any event, the user of a dList does not use pointers. Rather the user manipulates the dList using the operations that are provided. For example, the following code fragment prints out each element of the dList named mylist:
for (mylist.first(); !mylist.endOfList(); mylist.next())
    printf("%s\n", mylist.get().c_str());
Ok, back to the design of the interface. We have come up with a set of essential operations for traversing the list. However, we also need a set of operations for inserting elements into the list. A programmer will probably want to be able to insert at the beginning of the list, at the end of the list, and before or after the cursor. We therefore need four more operations: prepend, append, insertBeforeCursor, and insertAfterCursor.
If we can insert elements into the list we also have to be able to delete them, so we define a deleteNode command that deletes the element pointed to by the cursor. We could also implement deleteFirst and deleteLast operations but we won't because the programmer can implement them and because they are not as ubiquitous as append and prepend. You may note that there is a bit of judgement that goes into deciding what operations are and are not provided in an interface.
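To see why deleteFirst and deleteLast can be left to the programmer, here is a sketch that builds them from first, last, endOfList, and deleteNode. TinyIntList is a hypothetical stand-in, built on std::list, that mimics only the handful of dList operations the helpers need; it is there so the example can be compiled by itself and is not how dList is implemented. The helper names deleteFirst, deleteLast, and deleteEndsDemo are also my own.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in that mimics a few dList operations on top of std::list.
class TinyIntList {
public:
    TinyIntList() : cursor(items.end()) {}
    void append(int v) { items.push_back(v); }
    void first() { cursor = items.begin(); }
    void last() { cursor = items.empty() ? items.end() : std::prev(items.end()); }
    bool endOfList() { return cursor == items.end(); }
    bool isEmpty() { return items.empty(); }
    int get() { return *cursor; }
    // Deletes the element under the cursor and advances to the next element.
    void deleteNode() {
        if (cursor != items.end()) cursor = items.erase(cursor);
    }
private:
    std::list<int> items;              // declared before cursor so it is built first
    std::list<int>::iterator cursor;
};

// Works with any list type that offers first, endOfList, and deleteNode.
template <class List>
void deleteFirst(List &list) {
    list.first();
    if (!list.endOfList()) list.deleteNode();
}

// Works with any list type that offers last, endOfList, and deleteNode.
template <class List>
void deleteLast(List &list) {
    list.last();
    if (!list.endOfList()) list.deleteNode();
}

// Small self-check that exercises the two helpers.
void deleteEndsDemo() {
    TinyIntList l;
    l.append(1);
    l.append(2);
    l.append(3);
    deleteFirst(l);          // list is now 2 3
    l.first();
    assert(l.get() == 2);
    deleteLast(l);           // list is now 2
    l.last();
    assert(l.get() == 2);
    deleteLast(l);           // list is now empty
    assert(l.isEmpty());
    deleteFirst(l);          // harmless on an empty list
    assert(l.isEmpty());
}
```

One design consequence is worth noticing: both helpers move the cursor, so a caller that is in the middle of a traversal loses its position. That side effect is one argument for offering such operations in the interface itself, and it illustrates the kind of judgement call described above.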
Finally we need the constructor and destructor for the list. It also helps to have an operation that tells us if the list is empty. We will call this operation isEmpty. This gives us the following class declaration for a dList:
template <class Object>
class dList {
public:
    dList();                                // initializes the dList
    dList(Object default_value);            // initializes the dList
    ~dList();                               // destroys the dList's nodes

    void insertBeforeCursor(Object value);  // inserts the value before the node
                                            // pointed to by the list cursor
    void insertAfterCursor(Object value);   // inserts the value after the node
                                            // pointed to by the list cursor
    void append(Object value);              // appends the value to the end of the list
    void prepend(Object value);             // prepends the value to the front of the list

    void next();                            // moves the list cursor to the next
                                            // element in the list
    void prev();                            // moves the list cursor to the previous
                                            // element in the list
    bool endOfList();                       // true if the list cursor points to the
                                            // sentinel node; false otherwise

    // If the list is empty, the following two operations make the
    // list cursor point to the list's sentinel node
    void first();                           // resets the list cursor to the first
                                            // element in the list
    void last();                            // resets the list cursor to the last
                                            // element in the list

    Object get();                           // returns the value of the node pointed to
                                            // by the list cursor
    void set(Object value);                 // sets the current node to the indicated value
    void deleteNode();                      // deletes the node pointed to by the list cursor
                                            // and advances the list cursor to the next
                                            // list element
    bool isEmpty();                         // returns whether the list is empty

    // sets the value in the sentinel node
    void setSentinelValue(Object value) {
        sentinel_node->value = value;
    }

protected:
    dlNode <Object> *cursor;                // the list cursor
    dlNode <Object> *sentinel_node;         // a pointer to the sentinel node
};
The implementation of the dList is pretty straightforward and mimics the C code that you've probably seen in CS140. For example, to initialize the list we need to allocate a sentinel node, make the sentinel node point to itself, and make the cursor point to the sentinel node. The code that does this is as follows:
template <class Object>
dList <Object>::dList() {
    sentinel_node = new dlNode <Object>();
    sentinel_node->flink = sentinel_node;   // an empty list points to itself
    sentinel_node->blink = sentinel_node;
    cursor = sentinel_node;
}
As another example, the next operation moves the cursor to the next element in the list. It can be implemented as follows:
template <class Object>
void dList <Object>::next() {
    cursor = cursor->flink;
}
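The remaining operations follow the same pattern. The sketch below is a trimmed copy of the dlNode and dList declarations, repeated so that the example stands alone, with several methods written inline. The bodies of insertBeforeCursor, deleteNode, and the destructor are my assumptions about how they might be written, as are the choices to make deleteNode do nothing when the cursor is on the sentinel and to disable copying. The function dListDemo is hypothetical as well.

```cpp
#include <cassert>

template <class Object> class dList;   // forward declaration, needed by the friend line

template <class Object>
class dlNode {
    friend class dList<Object>;
protected:
    dlNode() : value(), flink(nullptr), blink(nullptr) {}
    dlNode(Object val) : value(val), flink(nullptr), blink(nullptr) {}
    Object value;
    dlNode *flink;
    dlNode *blink;
};

template <class Object>
class dList {
public:
    dList() {
        sentinel_node = new dlNode<Object>();
        sentinel_node->flink = sentinel_node;
        sentinel_node->blink = sentinel_node;
        cursor = sentinel_node;
    }
    ~dList() {
        // Free every element, stopping when the walk returns to the sentinel.
        dlNode<Object> *n = sentinel_node->flink;
        while (n != sentinel_node) {
            dlNode<Object> *following = n->flink;
            delete n;
            n = following;
        }
        delete sentinel_node;
    }
    // Assumption: copying is simply disabled in this sketch.
    dList(const dList &) = delete;
    dList &operator=(const dList &) = delete;

    void insertBeforeCursor(Object value) {
        dlNode<Object> *n = new dlNode<Object>(value);
        n->flink = cursor;
        n->blink = cursor->blink;
        cursor->blink->flink = n;
        cursor->blink = n;
    }
    void deleteNode() {
        if (cursor == sentinel_node) return;   // assumption: nothing to delete here
        dlNode<Object> *doomed = cursor;
        doomed->blink->flink = doomed->flink;
        doomed->flink->blink = doomed->blink;
        cursor = doomed->flink;                // advance to the next element
        delete doomed;
    }
    void first() { cursor = sentinel_node->flink; }
    void last() { cursor = sentinel_node->blink; }
    void next() { cursor = cursor->flink; }
    void prev() { cursor = cursor->blink; }
    bool endOfList() { return cursor == sentinel_node; }
    bool isEmpty() { return sentinel_node->flink == sentinel_node; }
    Object get() { return cursor->value; }

protected:
    dlNode<Object> *cursor;
    dlNode<Object> *sentinel_node;
};

// Small self-check that exercises the sketch.
void dListDemo() {
    dList<int> l;
    // With the cursor on the sentinel, inserting before the cursor appends.
    l.insertBeforeCursor(1);
    l.insertBeforeCursor(2);
    l.insertBeforeCursor(3);
    l.first();
    l.next();                  // cursor is on 2
    l.deleteNode();            // removes 2; cursor advances to 3
    assert(l.get() == 3);
    l.prev();                  // cursor is on 1
    assert(l.get() == 1);
    l.prev();                  // wrapped around to the sentinel
    assert(l.endOfList());
}
```

Because the list is circular, first and last work unchanged on an empty list: both leave the cursor on the sentinel, which is exactly what endOfList tests for.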
You will note that in order to make his dlist library as general as possible Dr. Plank defines a Jval union and uses Jvals to store values in his dlist implementation. The reason he has to do this is that when you write a dlist implementation, the nodes have to store values of a specific type. The problem is that you often want a dlist to store values of different types, for example, ints, strings, or golf course records. One way around this problem is to define dlist libraries for each specific type. However, in that case you will have multiple copies of code that look almost identical. The only difference is that the type of the value field will change in each implementation. The drawback of this approach is that if you change the dlist implementation, you need to remember to change every dlist library you've created. This can be quite a code maintenance problem in the real world.
Dr. Plank has overcome this problem by defining a union which can hold multiple types of values. The union is ingenious from an implementation standpoint because he only needs to write the code once and the dlist can store many different types of values. However, as you may have noticed, the implementation complicates the user's life. Some of the complications include:
In C that is the trade-off you're forced to live with if you want to use a generic library. The designers of C++ tried to eliminate this trade-off by introducing templates. Templates allow the programmer to write type-independent classes and functions. In other words, the programmer writes the code once and it works with different types. For example, I can write the code for a dlist once and it will work with any type of element.
Templates are easy to use. To declare a variable to have a template class type, you need to provide the name of the class and any type parameters that the class expects. For example:
#include "Dllist.h"
dList<int> a; // A dlist of ints
dList<PayrollRec *> b; // A dlist of pointers to payroll records
HashTable<int, string> x; // A hash table in which the key is
// an integer and the value is a string
Once the variable is declared, you can forget that the variable is a template class and simply use its methods as normal. For example:
a.insertBeforeCursor(3); // inserts 3 before the cursor
PayrollRec *p = new PayrollRec("Popeye the Sailor Man", 20000.00);
b.insertBeforeCursor(p); // inserts the pointer to the new
// payroll record before the cursor
string y = x.find(4); // return the value associated with 4
Unfortunately, the syntax for writing templates is incredibly complex and is beyond the scope of this class. In the real world you will probably have to use template classes but it is doubtful you will have to write one. The reason you probably won't have to write one is that most C++ implementations provide a standard template library (STL) of common data structures. You can almost always use these data structures rather than writing your own. We may or may not use the standard template library in this class--we'll see how things go.
Although the syntax for writing templates is complex, conceptually writing a template class is easy. The code in the methods looks similar to the code in a non-template class since the code can be used with any type. The declaration of variables is the only thing that changes. Since the template class is supposed to be used with many different types, you cannot declare some of the variables (e.g., the value field in a dlist node) to be a specific type. Instead you declare them to be a parameter type. For example, the dlists in the above examples took one parameter type, the type of the value. The declaration of a template class looks a bit like the declaration of a function. The declaration includes the parameter types for that class. For example, the declaration of the dlNode class might look as follows:
template<class Value>
class dlNode {
    ...
protected:
    Value val;
    dlNode *next;
    dlNode *prev;
    ...
};
Notice that the val field is declared to be the parameter type Value, rather than being declared to be a specific type. When you declare a dlist, the parameter type that you provide is used to fill in the Value field and then the code is compiled by the C++ compiler. For each different type of dlist that you use, the C++ compiler compiles a different piece of code with the Value field replaced by the appropriate type. In this way you only have to write the code once and the C++ compiler ensures that multiple versions of the code are created as needed. Unfortunately, as has already been pointed out, writing templates is complicated and the resulting code can be hard to understand because the syntax is so cumbersome. Thus when deciding whether or not to write a template, you have to decide whether the pain of writing one is worth the effort. In this class, dlist and rbtree templates are provided for you, so there is no pain. Note that the benefit of templates is that the user doesn't have to use convoluted syntax, as is required by the generic C libraries.
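As a small illustration of a parameter type being filled in, consider the sketch below. Slot, larger, and templateDemo are hypothetical names made up for this example; Slot is a one-value container that is far simpler than a dlist but uses its type parameter in the same way.

```cpp
#include <cassert>
#include <string>

// Hypothetical one-value container, used only to illustrate a type parameter.
template <class Value>
class Slot {
public:
    explicit Slot(Value initial) : val(initial) {}
    Value get() const { return val; }
    void set(Value v) { val = v; }
protected:
    Value val;   // the compiler substitutes the concrete type here
};

// A function template is instantiated the same way, once per type it is used with.
template <class Value>
Value larger(Value a, Value b) {
    return (a < b) ? b : a;
}

// Small self-check: the same source code is compiled for int and for std::string.
void templateDemo() {
    Slot<int> count(3);
    count.set(count.get() + 1);
    assert(count.get() == 4);

    Slot<std::string> name("Popeye");
    assert(name.get() == "Popeye");

    assert(larger(2, 7) == 7);
    assert(larger(std::string("ab"), std::string("b")) == "b");
}
```

Slot<int> and Slot<std::string> are two distinct classes generated from one piece of source code, and the user of either one never has to write anything more exotic than the type name in angle brackets.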