Modules


Modules provide a way to allow several related files to share the details of their implementation while hiding this implementation from the outside world. Modules provide an added layer of flexibility to classes in the way of access protection. These notes outline the differences in data encapsulation afforded by C, C++, and Java.


Module Mechanisms in C: Void *'s

C has a very weak form of data encapsulation that is provided via the generic void * pointer and the ability to declare that a struct is local to a file. Suppose I want to declare a Stack data type in C and I want to hide its implementation, including its data structures, from users. I can do this by first defining a public file called stack.h that contains my generic Stack data type and the functions that the stack data type supports:

stack.h:
    typedef void * Stack;
    Stack stack_new(int size);
    void stack_free(Stack s);
    void stack_push(Stack s, int value);
    int stack_pop(Stack s);
Note that I have prefaced all my function names with the "stack_" prefix so that I can avoid name conflicts with user-selected names. C++ and Java have ways to avoid these name conflicts and they will be discussed later.

Next I create my stack.c file that contains the implementation for my stack data type:

#include "stack.h"
#include <stdlib.h>

typedef struct {
   int size;
   int *data;
   int top;
} myStack;

Stack stack_new(int size) {
   myStack *newStack = (myStack *)malloc(sizeof(myStack));
   newStack->size = size;
   newStack->data = (int *)malloc(sizeof(int) * size);
   newStack->top = 0;
   return (Stack)newStack;  /* cast the myStack * to a (void *) */
}

void stack_push(Stack s, int value) {
   myStack *stack = (myStack *)s;
   if (stack->top == stack->size) return;  /* should really do error handling */
   stack->data[stack->top] = value;
   stack->top++;
}
...
Since the myStack struct is defined only in stack.c and does not appear in stack.h, its scope is limited to stack.c. Hence only the functions in stack.c can manipulate the myStack data structure. The user is handed a (void *), which effectively hides the stack's implementation: user code never sees the definition of myStack, so it has nothing to cast the (void *) to. Whenever the user wants to manipulate the stack, the user passes the (void *) to the appropriate stack function. That function casts the (void *) back to a myStack * and manipulates the stack in any way it wishes.

This form of data encapsulation using void *'s is fairly kludgy but it does allow several files to share their implementation, as long as each file declares its local data structures in exactly the same way. For example, I could spread the stack implementation over two files by declaring a myStack struct locally in both files. The obvious drawback to this approach is that instead of having one central declaration for the stack's data structures I have one declaration per file, which makes it much more difficult and error-prone to change the data structures.

A positive aspect of the void * implementation is that you can hand a binary implementation to a third party without divulging any proprietary implementation knowledge because the third party will only see the void * in the .h file. Hence the third party will not even know what data structures you are using.
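
As a rough end-to-end sketch of the technique above, the following single file combines the three pieces that would normally live in stack.h, stack.c, and a client file. It is written in the common subset of C and C++, so it should compile as either. The bodies of stack_pop and stack_free are elided in these notes, so the versions here are assumptions for illustration, as are the client_demo helper, the asserts on malloc, and the choice to return -1 when popping an empty stack.

```cpp
#include <assert.h>
#include <stdlib.h>

/* ---- what stack.h would expose: only an untyped handle ---- */
typedef void *Stack;
Stack stack_new(int size);
void stack_free(Stack s);
void stack_push(Stack s, int value);
int stack_pop(Stack s);

/* ---- what stack.c would hide: the real representation ---- */
typedef struct {
   int size;   /* capacity of data */
   int *data;  /* heap-allocated array of elements */
   int top;    /* index of the next free slot */
} myStack;

Stack stack_new(int size) {
   myStack *newStack = (myStack *)malloc(sizeof(myStack));
   assert(newStack != NULL);        /* sketch-level error handling */
   newStack->size = size;
   newStack->data = (int *)malloc(sizeof(int) * size);
   assert(newStack->data != NULL);
   newStack->top = 0;
   return (Stack)newStack;
}

void stack_push(Stack s, int value) {
   myStack *stack = (myStack *)s;
   if (stack->top == stack->size) return;  /* full: silently ignored */
   stack->data[stack->top] = value;
   stack->top++;
}

/* Assumed behavior: return -1 when the stack is empty. */
int stack_pop(Stack s) {
   myStack *stack = (myStack *)s;
   if (stack->top == 0) return -1;
   stack->top--;
   return stack->data[stack->top];
}

/* Assumed behavior: release the element array, then the struct. */
void stack_free(Stack s) {
   myStack *stack = (myStack *)s;
   free(stack->data);
   free(stack);
}

/* ---- client code: uses only Stack and the four functions ---- */
int client_demo(void) {
   Stack s = stack_new(4);
   stack_push(s, 10);
   stack_push(s, 20);
   int last = stack_pop(s);    /* 20 */
   int first = stack_pop(s);   /* 10 */
   stack_free(s);
   return last * 100 + first;  /* 2010 */
}
```

In a real project the client would include only stack.h, so nothing in client_demo could name myStack or its fields; here everything shares one file purely to keep the sketch compilable on its own.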


Module Mechanisms in C++: Friends and Namespaces

The public, protected, and private access specifiers in C++ provide a way to control access to the implementation of a class. Unfortunately, these specifiers are "all or nothing": a member is open to everyone (public), to the class and its subclasses (protected), or to the class alone (private). They do not provide a way to say "let classes A, B, and C have access to each other's implementation, but exclude everyone else."

C++'s developers partially address this problem by providing the friend keyword. A class can declare that other classes are its friends, which allows the other classes to examine the protected and private instance variables of the class. For example:

	  class ListNode {
	     friend class List;
		...
	  };
This declaration gives any method in List the ability to examine any variable in ListNode and to call any method in ListNode, regardless of whether or not the access protection is public.

Friendship has a number of clunky disadvantages. First, it is not two-way. When you declare List to be ListNode's friend, ListNode does not become a friend of List. List must explicitly declare ListNode to be a friend before the friendship becomes two-way. Second, subclasses do not inherit a superclass's friendship status. For example, suppose you have the following subclass declaration:

           class DList : public List { ... };
DList is not considered a friend of ListNode, despite the fact that it is a subclass of a friend of ListNode.

These restrictions are incredibly annoying and really limit the effectiveness of friends in C++. First, if you want classes A, B, and C to share their implementation, you must ensure that all the classes mutually refer to each other as friends. Second, if you want their subclasses to also be friends, which invariably you do, then you have to make sure that the subclasses mutually refer to each other as friends. In general, if you want n classes to be friends, you will need n(n-1) friend declarations. In addition, if you add a new class to the system that should be included amongst the friends, then you must remember to add 2n more friend declarations. What a mess!
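
A small sketch may make the one-way and non-inherited nature of friendship concrete. Only the class names come from these notes; the members value, next, and head, the methods push_front, sum, and first, the protected helper value_of, and the function demo_sum are all assumptions made for illustration. The commented-out line is one a compiler would be expected to reject.

```cpp
#include <cassert>

class ListNode {
   friend class List;        // List may touch ListNode's private members;
                             // ListNode gets no access to List in return
   int value;
   ListNode *next;
   ListNode(int v, ListNode *n) : value(v), next(n) {}
};

class List {
public:
   List() : head(nullptr) {}
   ~List() {
      while (head != nullptr) {
         ListNode *dead = head;
         head = head->next;  // allowed: List is a friend of ListNode
         delete dead;
      }
   }
   List(const List &) = delete;             // keep ownership simple
   List &operator=(const List &) = delete;

   void push_front(int v) { head = new ListNode(v, head); }

   int sum() const {
      int total = 0;
      for (const ListNode *p = head; p != nullptr; p = p->next)
         total += p->value;
      return total;
   }

protected:
   // One possible workaround: the friend re-exports access to subclasses.
   static int value_of(const ListNode *n) { return n->value; }
   ListNode *head;
};

class DList : public List {
public:
   int first() const {
      assert(head != nullptr);
      // return head->value;  // rejected: friendship is not inherited
      return value_of(head);  // must go through List's helper instead
   }
};

int demo_sum() {
   DList d;
   d.push_front(1);
   d.push_front(2);
   d.push_front(3);
   return d.sum() * 10 + d.first();  // 6 * 10 + 3 = 63
}
```

The value_of helper avoids adding a second friend declaration to ListNode, at the cost of widening List's protected interface for every subclass.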

The second module-related concept in C++ is that of namespaces. The namespace keyword allows a programmer to specify that a certain set of variables, functions, and classes belong to the same library or "module". For example, a programmer might write:

namespace ibm {
    class Stack { ... };
    class List { ... };
    class ListNode { ... };
    class Consult { ... };
    ...
}

namespace apple {
    class Stack { ... };
    class List { ... };
    class ListNode { ... };
    class Cut { ... };
    ...
}
Notice that the same set of names has been reused, but since the names are in two different namespaces, that is OK. There are three common ways to access members of a namespace:
  1. On an as-needed basis with the :: operator:
    ibm::Stack *s = new ibm::Stack();
    

  2. Importing specific members into the current namespace with the using keyword:
    using ibm::Stack;
    
    Stack *s = new Stack();
    
  3. Importing all members into the current namespace with the using keyword:
    using namespace ibm;
    
    Stack *s = new Stack();
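
The three access styles can be compared side by side in one compilable sketch. The name() member functions and the helper functions below are assumptions added for illustration. Each using statement is placed inside a function so that its effect stays local to that function.

```cpp
#include <cassert>
#include <string>

namespace ibm {
    class Stack {
    public:
        std::string name() const { return "ibm::Stack"; }
    };
}

namespace apple {
    class Stack {
    public:
        std::string name() const { return "apple::Stack"; }
    };
}

// 1. Fully qualified name with the :: operator.
std::string qualified() {
    ibm::Stack s;
    return s.name();
}

// 2. Using declaration: imports just ibm::Stack into this scope.
std::string using_declaration() {
    using ibm::Stack;
    Stack s;
    return s.name();
}

// 3. Using directive: makes every name in apple visible in this scope.
std::string using_directive() {
    using namespace apple;
    Stack s;
    return s.name();
}

// Small self-check tying the three styles together.
void self_check() {
    assert(qualified() == "ibm::Stack");
    assert(using_declaration() == "ibm::Stack");
    assert(using_directive() == "apple::Stack");
}
```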
    

If using directives bring conflicting names into the same scope, it is problematic only if you try to use one of the conflicting names:

using namespace ibm;
using namespace apple;

Consult *c = new Consult();  // ok--no name conflict
Stack *s = new Stack();      // compiler error because of a name conflict
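
One way such an ambiguity might be resolved, once both namespaces are in scope, is to qualify the conflicting name and leave the unique names unqualified. The describe() members and the two helper functions in this sketch are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace ibm {
    struct Stack   { std::string describe() const { return "ibm stack"; } };
    struct Consult { std::string describe() const { return "ibm consult"; } };
}

namespace apple {
    struct Stack { std::string describe() const { return "apple stack"; } };
    struct Cut   { std::string describe() const { return "apple cut"; } };
}

using namespace ibm;
using namespace apple;

std::string unique_names() {
    Consult c;  // only ibm declares Consult
    Cut k;      // only apple declares Cut
    return c.describe() + ", " + k.describe();
}

std::string conflicting_name() {
    // Stack s;      // would be rejected as ambiguous
    ibm::Stack a;    // explicit qualification removes the ambiguity
    apple::Stack b;
    assert(a.describe() != b.describe());  // two genuinely distinct types
    return a.describe() + ", " + b.describe();
}
```

Note that this lazy behavior belongs to using directives. Two using declarations for the same name in one scope (using ibm::Stack; followed by using apple::Stack;) are rejected at the second declaration, whether or not Stack is ever used.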
Namespaces can span one or more files, so you can still place declarations in a .h file and definitions in a .cpp file. For example, to define the methods for ibm's Stack class one could use any of the following three styles in an ibm.cpp file:
#include "ibm.h"
using namespace ibm;

int Stack::pop() { ... }
...
#include "ibm.h"

int ibm::Stack::pop() { ... }
...
#include "ibm.h"

namespace ibm {
   int Stack::pop() { ... }
   ...
}
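
The three styles can also be mixed, which the following single-file sketch uses to show all of them compiling together. The contents of ibm.h are not shown in these notes, so the Stack members here (push, pop, empty, a fixed CAPACITY, and the data array) are assumptions for illustration.

```cpp
#include <cassert>

// What ibm.h might declare (members assumed for illustration).
namespace ibm {
    class Stack {
    public:
        Stack() : top(0) {}
        void push(int value);
        int pop();
        bool empty() const;
    private:
        static constexpr int CAPACITY = 16;
        int data[CAPACITY];
        int top;  // index of the next free slot
    };
}

// What ibm.cpp might contain: one definition in each style.

// Style 1: using directive, then qualify with the class name only.
using namespace ibm;

bool Stack::empty() const { return top == 0; }

// Style 2: fully qualified definition.
void ibm::Stack::push(int value) {
    assert(top < CAPACITY);
    data[top++] = value;
}

// Style 3: reopen the namespace.
namespace ibm {
    int Stack::pop() {
        assert(top > 0);
        return data[--top];
    }
}
```

One caveat with the first style: it works for member functions because Stack:: names the class being defined. An unqualified free function defined after using namespace ibm; would instead declare a new function in the global namespace, so the second or third style is the safer choice for free functions.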

The C++ standard library is declared in the std namespace. This library provides a number of pre-defined data structures, such as vectors and lists.

Namespaces solve another of C's problems, which is that all variable, function, and type names end up in the same global namespace. This common grouping can create problems when you combine third-party software from two different vendors who duplicate one or more names, as shown above.

Unfortunately, C++'s developers did not create true modules with the namespace keyword. Unlike Java's packages, C++'s namespaces do not provide a way to share implementation among members of the namespace. If ListNode and List are declared in the same namespace, they still cannot access one another's private or protected members without using the friend keyword. It would have been nice if C++'s developers had also added the concept of package-level access so that one could truly create modules in C++, but they didn't. As a result Java has a much more powerful module mechanism than C++.
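
To illustrate the point, here is a minimal sketch in which three classes share the ibm namespace, yet only the one named as a friend can reach ListNode's private data. The members value and get and the methods bump, peek, and demo are hypothetical, as is placing Consult alongside the list classes for this purpose.

```cpp
#include <cassert>

namespace ibm {
    class ListNode {
    public:
        explicit ListNode(int v) : value(v) {}
        int get() const { return value; }  // public accessor: anyone may call
    private:
        friend class List;  // without this line, List::bump would not compile
        int value;
    };

    class List {
    public:
        static void bump(ListNode &n) { n.value++; }  // allowed via friendship
    };

    class Consult {
    public:
        static int peek(const ListNode &n) {
            // return n.value;  // rejected: same namespace, but not a friend
            return n.get();     // limited to the public interface
        }
    };
}

int demo() {
    ibm::ListNode n(4);
    ibm::List::bump(n);
    int seen = ibm::Consult::peek(n);
    assert(seen == n.get());
    return seen;  // 5
}
```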


Packages Versus Friends--A Summary

  1. Packages in Java provide a way to organize a group of related classes and allow them to seamlessly refer to each other's implementation while protecting their implementation from the prying eyes of outside classes.

  2. Friends provide a less elegant way in C++ to organize a group of related classes and allow them to refer to each other's implementation while protecting their implementation from the prying eyes of outside classes.

    1. A class declares its variables as protected, and may even declare its constructors as protected if it does not want outside classes to create instances of the class.

    2. A class can declare that other classes are its friends, which allows the other classes to examine the protected and private instance variables of the class. Here's an example:
         class ListNode {
            friend class List;
            ...
         };

    3. Friendship is not two-way: when you declare List to be ListNode's friend, ListNode does not become a friend of List. List must explicitly declare ListNode to be a friend before the friendship becomes two-way.

    4. Subclasses do not inherit a superclass's friendship status. For example, if you have the following class declaration:
         class DList : public List { ... };
      then DList is not considered a friend of ListNode. This restriction is incredibly annoying and really limits the effectiveness of friends in C++.
The above discussion of the drawbacks of friends should make it clear that Java provides a much cleaner way to group related classes so that they can share their implementation while protecting their implementation from the outside world.