Object-Oriented Design


Lecture Overview

An important element of good object-oriented programming is good object-oriented design. In other words, we want well-designed objects that can be re-used in different applications. This in turn means that we need to design a good interface for an object, one that provides the operations that a programmer wants. Since this course is about data structures, we will be using objects to implement data structures. Therefore, throughout the course we will illustrate object-oriented design by showing how it can be applied to data structures.

In these notes we are going to show how object-oriented design can be applied to the implementation of a doubly linked list data structure. We will also cover example uses of friend declarations and forward declarations.


Naming Conventions

You can use whatever naming conventions you like in this course. These are the conventions I will try to adhere to (when I use notes that someone else has written, however, I will keep their variable names):

  1. Class, Variable and Function Names: It seems that many job shops have adopted the convention of running the words in a variable name together without using underscores ('_'). The first letter of the first word is not capitalized but the first letter of every other word is capitalized. Hence you will see class names like dList (for doubly linked List) and rbTree (for red black Tree). One exception is the fields class which will be declared as Fields. The reason for the exception is that the 'fields' name is already taken by the fields struct and the Fields class uses the fields struct. Similarly you will see variable names like objToBeCopied or golfCourseInfo.

  2. Constants: All the letters in the names of constants should be capitalized and you should use underscores ('_') to distinguish between words. For example:
         const int BOILING_POINT_OF_WATER = 212;
         


Doubly Linked List Overview

You should recall from CS140 that a doubly linked list is a list in which each node has two pointers: 1) a flink pointer to the next node in the list, and 2) a blink pointer to the previous node in the list. The advantage of a doubly linked list (hereafter called a dlist) is that one can traverse it in either direction. This property simplifies common operations such as insertion and deletion, as well as allowing for situations where a list must be traversed in reverse.

Unless memory is at an absolute premium, a doubly linked list is preferred to a singly linked list. The reason is that the doubly linked list has a simpler implementation of operations such as deletion (a node can be unlinked without first searching for its predecessor), which leads to greater efficiency, and greater flexibility, which means that the data structure may not have to be changed if a new operation has to be added to the application. Of course each node in a doubly linked list has one more pointer than each node in a singly linked list and thus a doubly linked list requires more storage. However, this storage overhead tends to be minimal: one extra pointer per node. Since most computers have gigabytes of memory, this small additional overhead typically does not present problems.

The implementation of a dlist is further facilitated by the use of a sentinel node. A sentinel node is a header node that does not store a value. It points both to the first element of the list and the last element of the list. In other words, its flink points to the first element of the list and its blink points to the last element of the list. The sentinel node is fully integrated into the dlist by making the first element point back to the sentinel node (i.e., its blink points to the sentinel node) and by making the last element point to the sentinel node (i.e., its flink points to the sentinel node). A sentinel node simplifies the insertion and deletion procedures for a list because the programmer no longer has to worry about the special cases of inserting/deleting a node at the head or tail of the list. A sentinel node also makes it easy to determine when a traversal has reached the end of the list--it has done so when it reaches the sentinel node. Note that either a forward or reverse traversal can use this check to determine if it's reached the end of the list.
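To make the sentinel's benefit concrete, here is a minimal sketch of a circular list with a sentinel. It assumes a hypothetical non-template Node struct holding an int; the helper names makeSentinel, insertBefore, removeNode, forward, and backward are made up for this illustration and are not part of the course library. Notice that neither insertBefore nor removeNode contains an if statement for the head or tail of the list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
  int value;
  Node *flink;  // next node
  Node *blink;  // previous node
};

// An empty list is a sentinel whose flink and blink point to itself.
Node *makeSentinel() {
  Node *s = new Node;
  s->value = 0;  // the sentinel's value is never used
  s->flink = s;
  s->blink = s;
  return s;
}

// Inserts a new node before pos. The same four pointer assignments work at
// the head, at the tail, and in an empty list, because pos->blink always
// refers to a real node (possibly the sentinel itself).
Node *insertBefore(Node *pos, int value) {
  Node *n = new Node;
  n->value = value;
  n->flink = pos;
  n->blink = pos->blink;
  pos->blink->flink = n;
  pos->blink = n;
  return n;
}

// Unlinks and frees a node that is not the sentinel.
void removeNode(Node *n) {
  n->blink->flink = n->flink;
  n->flink->blink = n->blink;
  delete n;
}

// Forward traversal: stop when we arrive back at the sentinel.
std::vector<int> forward(Node *sentinel) {
  std::vector<int> out;
  for (Node *p = sentinel->flink; p != sentinel; p = p->flink)
    out.push_back(p->value);
  return out;
}

// Reverse traversal uses exactly the same termination check.
std::vector<int> backward(Node *sentinel) {
  std::vector<int> out;
  for (Node *p = sentinel->blink; p != sentinel; p = p->blink)
    out.push_back(p->value);
  return out;
}
```

Calling insertBefore(s, v) with the sentinel s as the position appends v to the list, and calling insertBefore(s->flink, v) prepends it.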


C Design of A Dlist

The C design of a dlist typically consists of one struct, which represents the node in a dlist. As we have previously seen, our C definition of a dlist is as follows:
typedef struct dllist {
  struct dllist *flink;   /* next node in the list */
  struct dllist *blink;   /* previous node in the list */
  char *value;
} *Dllist;

The program would then store a pointer to the dlist's sentinel node. The list would typically be manipulated by traversing the flink and blink pointers.


C++ Design of A Dlist

In C++ we want to treat a dlist as an object. That means that we will view a dlist as an object that stores data as a list, that allows the list to be traversed in either direction, and that is manipulated using a set of operations provided by the dlist's interface. We will hide the dList's implementation so that we can modify it without affecting the programs that use it. In particular, we will not let the programs that use it have access to its internal representation, such as dlNodes. Therefore we will declare a dlNode as follows (for now, ignore the template syntax with the brackets -- we'll define this later in the notes):
template <class Object>
class dList;

template <class Object>
class dlNode {
  friend class dList<Object>;

  protected:
    dlNode() {};
    dlNode(Object val);
    ~dlNode() {};
    Object value;
    dlNode *flink;
    dlNode *blink;
};
Several points should be made about this declaration:

  1. We are treating a dlNode as a "data warehouse". In other words, it is functioning just like a struct in C. In particular, it does not provide any methods for manipulating its data, except for initialization. Its primary purpose is to store data. If we omitted all of the constructor and destructor declarations, C++ would supply a default constructor and a destructor equivalent to the empty ones shown here. Often these definitions are omitted when using a class as a data warehouse.

  2. We have made all of dlNode's members protected. Therefore the outside world does not have access to any part of a dlNode.

  3. We have made a dList be a friend of a dlNode. Therefore a dList has complete access to dlNode's members. Remember that friend is not reciprocal, so dlNode does not have access to dList's protected members.

  4. We have used a forward declaration for dList. A forward declaration consists of the keyword class followed by the name of the class (preceded, for a template class, by the template header). A forward declaration is a promise to the C++ compiler that we will define the class at a later time. Forward declarations are used when we need to reference a class before it has been defined. In this case we need to declare a dList as a friend to dlNode. Sometimes we may have to declare pointers to a class before it has been defined. For example, we could have declared a pointer to a dList in the dlNode class:
         class dlNode {
           protected:
             dList <int> *mylist;
         };
         
    A forward declaration does not allow us to embed an object of that class by value as a member of another class. For example, the following declaration is illegal:
         class dlNode {
           protected:
             dList <int> mylist;
         };
         
    The reason it is illegal is that since dList has not yet been fully defined, the compiler does not know how much storage to allocate for it. In contrast, it is ok to declare a pointer to a dList because a pointer has a standard fixed size.

    You may wonder why we do not simply declare the dList class before the dlNode class. The reason is that the dList class uses the dlNode class so we're faced with a circular definition. If we place the dList class before the dlNode class, then the compiler will complain because we have not declared the dlNode class. Consequently a forward declaration allows us to avoid circular definition problems.
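The circular-definition situation can be sketched with two small non-template classes. The Student and Course classes and the describe function below are hypothetical, made up only to show where a forward declaration is enough (pointer members) and where it is not (members stored by value).

```cpp
#include <cassert>
#include <string>

class Course;  // forward declaration: a promise that Course is defined later

class Student {
  public:
    explicit Student(const std::string &n) : name(n), enrolled(nullptr) {}
    std::string name;
    Course *enrolled;       // legal: a pointer has a known size even though
                            // Course is still an incomplete type here
    // Course enrolledCopy; // illegal here: the size of Course is unknown
};

class Course {
  public:
    explicit Course(const std::string &t) : title(t), assistant(nullptr) {}
    std::string title;
    Student *assistant;     // Course refers back to Student: the circularity
};

// By this point both classes are fully defined, so their members can be used.
std::string describe(const Student &s) {
  if (s.enrolled == nullptr) return s.name + " is not enrolled";
  return s.name + " is enrolled in " + s.enrolled->title;
}
```

Whichever class is written first, it needs to mention the other one before that other class has been defined, so at least one forward declaration is unavoidable.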

Now that we have the declaration for a dlNode, we can turn our attention to defining the dList's interface. In order to make the dList as universally usable as possible, we want to define a set of essential, primitive operations that programmers can then use to assemble more complex operations. In other words, rather than trying to imagine every conceivable use for a dList and placing an operation in the dList's interface that supports that use, we try to envision a set of basic building block operations that can be used to create these more complicated operations.

We know that traversing a dList is essential, so a next and a prev operation seem essential. It is also necessary to know when we have reached the end of the list, so an endOfList operation seems useful. We also want to be able to start a list traversal at either the front or end of the list, so a first operation that positions us at the start of the list and a last operation that positions us at the end of the list seem essential.

Before we go further you might be wondering about what has happened to the C pointer that we used to traverse a list. The answer is that we internally use a pointer to maintain our position in the list. This pointer is commonly called a cursor. The cursor points to the current element in the list that we are traversing. The name probably derives from WYSIWYG editors in which a cursor is used to show your position on a line of text. Of course a cursor in a WYSIWYG editor is positioned between characters whereas a cursor in a list points to an object. However, hopefully you see the similarity.

In any event, the user of a dList does not use pointers. Rather the user manipulates the dList using the operations that are provided. For example, the following code fragment prints out each element of the dList named mylist:
for (mylist.first(); !mylist.endOfList(); mylist.next())
  printf("%s\n", mylist.get().c_str());
Notice that instead of using pointers, we use the dList's operations to traverse the list. The get method is another essential operation that returns the value of the list element pointed at by the cursor.

Ok, back to the design of the interface. We have come up with a set of essential operations for traversing the list. However we also need a set of operations for inserting elements into the list. A programmer will probably want to be able to insert at the beginning of the list, the end of the list, and before or after the cursor. We therefore need four more operations:

  1. prepend: inserts an element at the start of the list

  2. append: inserts an element at the end of the list

  3. insertBeforeCursor: inserts an element before the cursor

  4. insertAfterCursor: inserts an element after the cursor
Note that prepend and append are not primitive operations. The prepend operation can be implemented by a programmer using the first and insertBeforeCursor operations and the append operation can be implemented by using the last and insertAfterCursor operations. However, we choose to provide these operations because they are so ubiquitous.

If we can insert elements into the list we also have to be able to delete them, so we define a deleteNode command that deletes the element pointed to by the cursor. We could also implement deleteFirst and deleteLast operations but we won't because the programmer can implement them and because they are not as ubiquitous as append and prepend. You may note that there is a bit of judgement that goes into deciding what operations are and are not provided in an interface.
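The claim that prepend, append, deleteFirst, and deleteLast can be assembled from the primitive operations can be sketched as follows. To keep the example self-contained, it uses a stand-in class named CursorList that offers the same cursor primitives but stores its elements in a std::list; that class and the free functions prependTo, appendTo, deleteFirst, deleteLast, and contents are assumptions made for this illustration, not the course's dList.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Stand-in for dList (an assumption for this sketch). The std::list end()
// iterator plays the role of the sentinel node.
template <class Object>
class CursorList {
  public:
    CursorList() : cursor(items.end()) {}
    void first() { cursor = items.begin(); }  // end() if the list is empty
    void last() { cursor = items.empty() ? items.end() : std::prev(items.end()); }
    void next() { ++cursor; }
    bool endOfList() { return cursor == items.end(); }
    Object get() { return *cursor; }
    void insertBeforeCursor(Object v) { items.insert(cursor, v); }
    void insertAfterCursor(Object v) {
      // The element after the "sentinel" is the first element of the list.
      typename std::list<Object>::iterator after =
          endOfList() ? items.begin() : std::next(cursor);
      items.insert(after, v);
    }
    void deleteNode() {
      assert(!endOfList());          // the sentinel position cannot be deleted
      cursor = items.erase(cursor);  // cursor advances to the next element
    }
  private:
    std::list<Object> items;
    typename std::list<Object>::iterator cursor;
};

// Each composite operation is just two primitive calls. As a side effect,
// each one moves the list cursor.
template <class Object>
void prependTo(CursorList<Object> &l, Object v) {
  l.first();
  l.insertBeforeCursor(v);
}

template <class Object>
void appendTo(CursorList<Object> &l, Object v) {
  l.last();
  l.insertAfterCursor(v);
}

template <class Object>
void deleteFirst(CursorList<Object> &l) {
  l.first();
  if (!l.endOfList()) l.deleteNode();  // do nothing on an empty list
}

template <class Object>
void deleteLast(CursorList<Object> &l) {
  l.last();
  if (!l.endOfList()) l.deleteNode();
}

// Collects the elements front to back using only the traversal primitives.
template <class Object>
std::vector<Object> contents(CursorList<Object> &l) {
  std::vector<Object> out;
  for (l.first(); !l.endOfList(); l.next()) out.push_back(l.get());
  return out;
}
```

One design consequence is visible here: a composite built by the user disturbs the cursor, whereas an append or prepend provided inside the class could leave the cursor where it was.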

Finally we need the constructor and destructor for the list. It also helps to have an operation that tells us if the list is empty. We will call this operation isEmpty. This gives us the following class declaration for a dList:
template <class Object>
class dList {
  public:
    dList();                      // initializes the dList
    dList(Object default_value);  // initializes the dList
    ~dList();                     // destroys the dList's nodes

    void insertBeforeCursor(Object value); // inserts the value before the node
                                           // pointed to by the list cursor
    void insertAfterCursor(Object value); // inserts the value after the node
                                          // pointed to by the list cursor
    void append(Object value);   // appends the value to the end of the list
    void prepend(Object value);  // prepends the value to the front of the list

    void next();               // moves the list cursor to the next 
                               // element in the list
    void prev();               // moves the list cursor to the previous
                               // element in the list
    bool endOfList();          // true if the list cursor points to the 
                               // sentinel node; false otherwise

    // If the list is empty, the following two operations make the
    // list cursor point to the list's sentinel node
    void first();             // resets the list cursor to the first
                              // element in the list. 
    void last();              // resets the list cursor to the last
                              // element in the list

    Object get();               // returns the value of the node pointed to
                              // by the list cursor
 
    void set(Object value);  // sets the current node to the indicated value

    void deleteNode();        // deletes the node pointed to by the list cursor
                              // and advances the list cursor to the next
                              // list element
    bool isEmpty();           // returns whether the list is empty

    // set the value in the sentinel node
    void setSentinelValue(Object value) { 
      sentinel_node->value = value;
    }

  protected:
    dlNode <Object> *cursor;            // the list cursor
    dlNode <Object> *sentinel_node;     // a pointer to the sentinel node
};


dList Implementation

The implementation of the dList is pretty straightforward and mimics the C code that you've probably seen in CS140. For example, to initialize the list we need to allocate a sentinel node, make the sentinel node point to itself, and make the cursor point to the sentinel node. The code that does this is as follows:
template <class Object>
dList <Object>::dList() {
  sentinel_node = new dlNode <Object>();
  sentinel_node->flink = sentinel_node;
  sentinel_node->blink = sentinel_node;
  cursor = sentinel_node;
}

As another example, the next operation moves the cursor to the next element in the list. It can be implemented as follows:
template <class Object>
void dList <Object>::next() {
  cursor = cursor->flink;
}
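The remaining operations follow the same pattern. The sketch below is a trimmed-down, self-contained version of the two classes that shows how insertBeforeCursor and deleteNode might be written. The method bodies are assumptions based on the interface comments, not the course's actual implementation, and join is a hypothetical helper added only to show the result.

```cpp
#include <cassert>
#include <string>

template <class Object>
class dList;

template <class Object>
class dlNode {
  friend class dList<Object>;
  protected:
    dlNode() : value(), flink(nullptr), blink(nullptr) {}
    dlNode(Object val) : value(val), flink(nullptr), blink(nullptr) {}
    Object value;
    dlNode *flink;
    dlNode *blink;
};

template <class Object>
class dList {
  public:
    dList();
    ~dList();
    dList(const dList &) = delete;             // copying is out of scope here
    dList &operator=(const dList &) = delete;
    void insertBeforeCursor(Object value);
    void deleteNode();
    void first() { cursor = sentinel_node->flink; }
    void next() { cursor = cursor->flink; }
    bool endOfList() { return cursor == sentinel_node; }
    bool isEmpty() { return sentinel_node->flink == sentinel_node; }
    Object get() { return cursor->value; }
  protected:
    dlNode<Object> *cursor;
    dlNode<Object> *sentinel_node;
};

template <class Object>
dList<Object>::dList() {
  sentinel_node = new dlNode<Object>();
  sentinel_node->flink = sentinel_node;
  sentinel_node->blink = sentinel_node;
  cursor = sentinel_node;
}

template <class Object>
dList<Object>::~dList() {
  for (first(); !endOfList(); ) deleteNode();  // deleteNode advances the cursor
  delete sentinel_node;
}

// Links a new node between the cursor's predecessor and the cursor.
// The cursor itself does not move.
template <class Object>
void dList<Object>::insertBeforeCursor(Object value) {
  dlNode<Object> *n = new dlNode<Object>(value);
  n->flink = cursor;
  n->blink = cursor->blink;
  cursor->blink->flink = n;
  cursor->blink = n;
}

// Unlinks the cursor's node and advances the cursor to the next element.
template <class Object>
void dList<Object>::deleteNode() {
  assert(cursor != sentinel_node);  // the sentinel must never be deleted
  dlNode<Object> *doomed = cursor;
  doomed->blink->flink = doomed->flink;
  doomed->flink->blink = doomed->blink;
  cursor = doomed->flink;
  delete doomed;
}

// Hypothetical helper: concatenates the elements, each followed by a space.
std::string join(dList<std::string> &l) {
  std::string out;
  for (l.first(); !l.endOfList(); l.next()) out += l.get() + " ";
  return out;
}
```

Because a new list's cursor starts at the sentinel, calling insertBeforeCursor on a fresh list appends to it.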


Templates

You will note that in order to make his dlist library as general as possible Dr. Plank defines a Jval union and uses Jvals to store values in his dlist implementation. The reason he has to do this is that when you write a dlist implementation, the nodes have to store values of a specific type. The problem is that you often want a dlist to store values of different types, for example, ints, strings, or golf course records. One way around this problem is to define dlist libraries for each specific type. However, in that case you will have multiple copies of code that look almost identical. The only difference is that the type of the value field will change in each implementation. The drawback of this approach is that if you change the dlist implementation, you need to remember to change every dlist library you've created. This can be quite a code maintenance problem in the real world.

Dr. Plank has overcome this problem by defining a union which can hold multiple types of values. The union is ingenious from an implementation standpoint because he only needs to write the code once and the dlist can store many different types of values. However, as you may have noticed, the implementation complicates the user's life. Some of the complications include:

  1. You have to remember to use the appropriate field in the union when you retrieve a value from the value field (e.g., node->val.d).

  2. You have to remember to create a jval when you store a value in a node (e.g., dll_append(list, new_jval_d(10))).

  3. You have to remember to cast void *'s back to the appropriate typed pointer when you retrieve pointers from the value field (e.g., golfer = (golfCourseInfo *)node->val.v).

In C that is the trade-off you're forced to live with if you want to use a generic library. The designers of C++ tried to eliminate this trade-off by introducing templates. Templates allow the programmer to write type-independent classes and functions. In other words, the programmer writes the code once and it works with different types. For example, I can write the code for a dlist once and it will work with any type of element.


Using Templates

Templates are easy to use. To declare a variable to have a template class type, you need to provide the name of the class and any type parameters that the class expects. For example:

#include "Dllist.h"
     
dList<int> a;          // A dlist of ints
dList<PayrollRec *> b; // A dlist of pointers to payroll records
HashTable<int, string> x; // A hash table in which the key is
                          // an integer and the value is a string

Once the variable is declared, you can forget that the variable is a template class and simply use its methods as normal. For example:

a.insertBeforeCursor(3);  // inserts 3 before the cursor
 
PayrollRec *p = new PayrollRec("Popeye the Sailor Man", 20000.00);
b.insertBeforeCursor(p);  // inserts the pointer to the new
                          // payroll record before the cursor
string y = x.find(4);    // return the value associated with 4


Writing Templates

Unfortunately, the syntax for writing templates is complex, and its finer points are beyond the scope of this class. In the real world you will probably have to use template classes but it is doubtful you will have to write one. The reason you probably won't have to write one is that the C++ standard library provides a set of template containers for common data structures (historically known as the Standard Template Library, or STL). You can almost always use these data structures rather than writing your own. We may or may not use the standard template library in this class--we'll see how things go.
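As a point of comparison, here is a short sketch that uses std::list, the standard library's doubly linked list container. The function names buildCast and reversed are made up for this example. The comments note which dList operation each std::list call roughly corresponds to; an iterator plays the role that the cursor plays in a dList.

```cpp
#include <cassert>
#include <list>
#include <string>

// Builds a small list, exercising insertion at both ends and at an iterator.
std::list<std::string> buildCast() {
  std::list<std::string> cast;
  cast.push_back("Popeye");     // roughly append
  cast.push_back("Bluto");
  cast.push_front("Olive");     // roughly prepend

  std::list<std::string>::iterator it = cast.begin();
  ++it;                         // roughly first() followed by next()
  assert(*it == "Popeye");
  cast.insert(it, "Wimpy");     // roughly insertBeforeCursor
  cast.erase(it);               // roughly deleteNode; removes "Popeye"
  return cast;
}

// Walks the list from back to front, joining the names with spaces.
std::string reversed(const std::list<std::string> &names) {
  std::string out;
  for (std::list<std::string>::const_reverse_iterator it = names.rbegin();
       it != names.rend(); ++it) {
    if (!out.empty()) out += " ";
    out += *it;
  }
  return out;
}
```

Unlike the dList, a std::list does not keep a single built-in cursor; the caller can hold as many iterators as needed.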

Although the syntax for writing templates is complex, conceptually writing a template class is easy. The code in the methods looks similar to the code in a non-template class since the code can be used with any type. The declarations of variables are the only thing that changes. Since the template class is supposed to be used with many different types, you cannot declare some of the variables (e.g., the value field in a dlist node) to be a specific type. Instead you declare them to be a parameter type. For example, the dlists in the above examples took one parameter type, the type of the value. The declaration of a template class looks a bit like the declaration of a function. The declaration includes the parameter types for that class. For example, the declaration of the dlNode class might look as follows:

template<class Value>
class dlNode {
  ...
  protected:
    Value val;
    dlNode *next;
    dlNode *prev;
  ...
};

Notice that the val field is declared to be the parameter type Value, rather than being declared to be a specific type. When you declare a dlist, the parameter type that you provide is used to fill in the Value field and then the code is compiled by the C++ compiler. For each different type of dlist that you use, the C++ compiler compiles a different piece of code with the Value field replaced by the appropriate type. In this way you only have to write the code once and the C++ compiler ensures that multiple versions of the code are created as needed. Unfortunately, as has already been pointed out, writing templates is pretty complicated and the resulting code can be hard to understand because the syntax is so cumbersome. Thus when deciding whether or not to write a template, you have to decide whether the pain of writing one is worth the effort. In this class, dlist and rbtree templates are provided for you, so there is no pain. Note that the benefit of templates is that the user doesn't have to use convoluted syntax, as is required by the generic C libraries.
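The way the compiler fills in the parameter type can be seen in a very small template. The Cell class and swapCells function below are hypothetical names invented for this sketch. It also shows the syntax for defining a member function outside the class body: the template header is repeated and the class name is written with its parameter, as in Cell<Value>::get.

```cpp
#include <cassert>
#include <string>

// A type-independent holder, written once. Value is the parameter type.
template <class Value>
class Cell {
  public:
    Cell(Value initial);
    Value get() const;
    void set(Value v);
  protected:
    Value val;  // becomes an int, a std::string, ... when Cell is used
};

// Out-of-class definitions repeat the template header and qualify the
// member name with Cell<Value>.
template <class Value>
Cell<Value>::Cell(Value initial) : val(initial) {}

template <class Value>
Value Cell<Value>::get() const { return val; }

template <class Value>
void Cell<Value>::set(Value v) { val = v; }

// A function template is parameterized the same way. The compiler deduces
// Value from the arguments and generates one version per type used.
template <class Value>
void swapCells(Cell<Value> &a, Cell<Value> &b) {
  Value tmp = a.get();
  a.set(b.get());
  b.set(tmp);
}
```

Declaring both a Cell<int> and a Cell<std::string> in one program causes the compiler to generate two separate versions of the class, one for each type, from this single piece of source code.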