CS202 Lecture notes -- Implementing a class with a void *


Before you read

This is not an "industry-standard" way to program in C++. In fact, it circumvents many of the nice features of C++, the most important of which is using the type-checking abilities of your compilers. When you learn inheritance and interfaces next semester, you may well feel that what I'm teaching here is wrong.

However, I don't believe it's wrong, and this is indeed how my research group's classes work. Hiding information behind a (void *) is, in my opinion, a valuable thing to do, so I teach it to you.


A motivating example: Implementing a histogram data structure

We're going to define and implement a C++ class to help us generate histograms from data. This class has three important methods:
  1. Set_Bin_Size() -- our histogram will organize data by "bins", which represent regions of data. For example, if our bin size is 10, then bin 0 will represent all data values between 0 and 10 (not including 10), bin 1 will represent all data values between 10 and 20 (not including 20), etc.
  2. Add_Value() -- this adds a value to our histogram. Basically, our data structure will figure out which bin it belongs to, and then it will add one to the bin.
  3. Get_Data() -- this gives us our histogram data: what the bins are and how many data items are in each bin. You could use this information to make graphs (like the ones that I make to show how students have done on an exam.)
Here's the entire class definition, in include/histogram.hpp:

#pragma once

#include <vector>

class Histogram {
  public:

    /* Constructors, destructor, assignment overload. */

    Histogram();
    ~Histogram();
    Histogram(const Histogram &h);
    Histogram& operator= (const Histogram &h);

    void Clear();                   // This clears the histogram's data
                                    // but retains the bin size if one has been set.

    bool   Set_Bin_Size(double bs); // Returns false if bs is <= 0, or if histogram is non empty.
    double Get_Bin_Size() const;    // Return -1 if the bin size has not been sent.
    bool   Add_Value(double d);     // Add a new value to the histogram.  Returns false if
                                    // the bin size has not been set.

    /* Get_Data() creates these two vectors.  There will be an element in each
       vector for every non-empty bin in the histogram.  bin_ids[i] will contain
       the "id" of the bin, where an id of 0 corresponds to values between [0 and
       Bin_Size), an id of 1 corresponds to values between [Bin_Size and Bin_Size*2),
       etc.  bin_ids will be sorted.  num_elts[i] contains the number of data points
       in bin i.
     */

    bool Get_Data(std::vector <int> &bin_ids, std::vector <int> &num_elts) const;

  protected:
    void *state;                  // This is so that whoever uses the data structure
                                  // does not know how it is implemented.
};

Everything is straightforward, with the exception of the one protected variable, state. You'll note it's a (void *). A (void *) is a pointer, where you don't know what it points to. This is a good thing, because the person who is reading the class definition does not know what it contains. Therefore, if you use this class, and are not privy to the source code that implements it, you don't know how it's implemented. Why is this nice? Because those who use the class cannot mess it up. And it's not really important how it's implemented, so long as the implementation is correct and efficient.


Three programs that use the class

I have three programs that use the Histogram class:
  1. src/histogram_tester.cpp -- this is one of my standard line-based testers that lets you make sure that the data structure is working as it should.
  2. src/data_to_histogram.cpp -- this takes a bin size on the command line, and then reads words from standard input. Every word that is a number is inserted into the histogram, and then the histogram is printed at the end.
  3. src/range_tester.cpp -- this generates random numbers and inserts them into the histogram. It times this activity. It then calls Get_Data(), and times that. It prints the timings, and if you want, it will also print the histogram.
For this lecture, we will only concentrate on histogram_tester. I have compiled it with an implementation of the class which is in src/histogram_vector.cpp into the binary bin/ht_vector. Let's walk through how it works:
UNIX> make clean
rm -f obj/* bin/*
UNIX> make bin/ht_vector
g++ -std=c++98 -Wall -Wextra -Iinclude -c -o obj/histogram_tester.o src/histogram_tester.cpp
g++ -std=c++98 -Wall -Wextra -Iinclude -c -o obj/histogram_vector.o src/histogram_vector.cpp
g++ -std=c++98 -Wall -Wextra -Iinclude -o bin/ht_vector obj/histogram_tester.o obj/histogram_vector.o
UNIX> bin/ht_vector                                              # Take a look at the commands
usage: histo_tester prompt(- for empty) -- commands on stdin.

commands:
  SET_BIN bin_size       - Call Set_Bin(bin_size).
  ADD_VALUES v1 v2 ...   - Call Add_Value(v) for each vi.
  PRINT                  - Get Get_Data() and print.

  CLEAR                  - Calls the Clear() method.
  DESTROY                - Call the destructor and remake an empty histogram.
  PRINT_COPY             - Print a copy, thus testing the copy constructor.
  ASSIGN                 - Use the assignment overload to make a copy.

  QUIT                   - Quit.
  ?                      - Print commands.
UNIX> bin/ht_vector '-->'
--> SET_BIN 10                          # We're using a bin size of ten.
--> ADD_VALUES 3 4 5 6 15 23 41 42      # Four values into 0-10, one each into 10-20 and 20-30, 
--> PRINT                               # plus two values into 40-50
       0        4
      10        1
      20        1
      40        2
--> CLEAR                               # This clears the histogram, but keeps the bin size at 10.
--> ADD_VALUES 101 4.5                  # One value into 0-10 and one into 100-110
--> PRINT
       0        1
     100        1
--> QUIT
UNIX> 

Implementing the class with a double and a vector

There are many ways that we can implement this class. We will explore the different implementations in the Running Time Lecture Notes. In this lecture we focus on one implementation, which is in the file src/histogram_vector.cpp. In this implementation, we define a class called Histo_Vector:

class Histo_Vector {
  public:
    vector <int> Elts;
    double Bin_Size;
};

This will implement the Histogram class as follows: When I call Add_Value(v), I will calculate the bin number from v, I will use that as an index to the vector Elts. If Elts isn't big enough, I will resize it (putting zeros in the new entries). I will then increment Elts[bin].

I can implement Get_Data() by traversing Elts, and calling push_back() on ids and num_elts whenever Elts[i] is greater than zero.

That's straightforward, but we'll get to that code later. Instead, let's look at the constructor and destructor:

/* The constructor allocates an instance of
   Histo_Vector, and then sets the state member
   variable to that instance. */ 

Histogram::Histogram()
{
  Histo_Vector *hv;

  hv = new Histo_Vector;
  hv->Bin_Size = -1;

  state = (void *) hv;
}
/* The destructor calls the Histo_Vector
   destructor, which will clear out the Elts
   vector, and release the memory of the instance.*/

Histogram::~Histogram()
{
  Histo_Vector *hv;

  hv = (Histo_Vector *) state;
  delete hv;
}

When the constructor for Histogram is called, the only variable that has been allocated it state, and it is uninitialized. We need to create an instance of Histo_Vector, so we do that with new. And then we store it in state. Whenever we call a method of the Histogram class, the very first thing we'll do is declare a pointer to a Histo_Vector, and then set it equal to state. This is how the Histogram class only uses that one (void *), that the users know is there, but they don't know what it contains. In other words, we know what the state is, but the users don't.

As you can see above, the destructor does just what I said -- it declares a variable hv, which is a pointer to a Histo_Vector, and it sets it equal to state. It then calls delete, which deallocates all of its memory.

Below, I show the implementations of the three easy methods: Clear(), Set_Bin_Size() and Get_Bin_Size():

/* Clear() gets rid of the data by 
   clearing the Elts vector. */




void Histogram::Clear()
{
  Histo_Vector *hv;

  hv = (Histo_Vector *) state;
  hv->Elts.clear();
}
/* Set_Bin_Size() error checks its argument,
   and also error checks that there no data
   (because it wouldn't make sense to change
   the bin size if we have already assigned
   data to bins).  It then sets the bin size. */

bool Histogram::Set_Bin_Size(double bs)
{
  Histo_Vector *hv;

  hv = (Histo_Vector *) state;

  if (bs <= 0) return false;
  if (hv->Elts.size() != 0) return false;
  hv->Bin_Size = bs;
  return true;
}
/* Get_Bin_Size simply returns the bin size
   from the Histo_Vector. */




double Histogram::Get_Bin_Size() const
{
  Histo_Vector *hv;

  hv = (Histo_Vector *) state;
  return hv->Bin_Size;
}

The point of this code that I want to stress to you is how in each method, the very first thing that we do is declare a pointer to a Histo_Vector, and set it from state. This is highlighted in red in each method above. Once we have done this, we can work with the (Histo_Vector *).

Add_Value() works as described above. It error checks, then it figures out the bin, then it makes sure that the vector has the bin defined, and it adds one to the bin's element in the vector:

bool Histogram::Add_Value(double d)
{
  Histo_Vector *hv;
  int bin;

  hv = (Histo_Vector *) state;

  if (d < 0) return false;                      // Error check
  if (hv->Bin_Size < 0) return false;

  bin = (int) floor(d/hv->Bin_Size);            // Calculate the bin and make sure there's room
  if (bin >= (int) hv->Elts.size()) {           // for it in the Elts vector.
    hv->Elts.resize(bin+1, 0);
  }

  hv->Elts[bin]++;                              // Add one to the bin.
  return true;
}

And Get_Data() simply creates the ids and num_elts vectors from Elts. If Elts[i] is equal to zero, then we ignore that bin:

bool Histogram::Get_Data(vector <int> &bin_ids, vector <int> &num_elts) const
{
  Histo_Vector *hv;
  size_t i;

  hv = (Histo_Vector *) state;

  if (hv->Bin_Size < 0) return false;       // If we haven't set the bin size yet, return an error..

  bin_ids.clear();                          // Clear out the vectors if they have any elements.
  num_elts.clear();

  for (i = 0; i < hv->Elts.size(); i++) {   // Push the bins onto id's and Elts[bin] onto num_elts.
    if (hv->Elts[i] != 0) {
      bin_ids.push_back(i);
      num_elts.push_back(hv->Elts[i]);
    }
  }
  return true;
};

Finally, the assignment overload simply copies the Histo_Vector from one Histogram to the other. The copy constructor sets up the state and then uses the assignment overload to copy the Histo_Vector:

/* The copy constructor creates a new 
   Histo_Vector, and then uses the
   assignment overload to copy from h. */
   
Histogram::Histogram(const Histogram &h)
{
  Histo_Vector *hv;

  hv = new Histo_Vector;
  state = (void *) hv;
  *this = h;
}
/* The assignment overload allocates 
   a new instance of Histo_Vector, and 
   copies it from h's Histo_Vector. */

Histogram& Histogram::operator= (const Histogram &h)
{
  Histo_Vector *hv_original, *hv_copy;

  hv_original = (Histo_Vector *) h.state;
  hv_copy = (Histo_Vector *) state;
  *hv_copy = *hv_original;   // This copies Elts and Bin_Size.
  return *this;
}

There's more to this lecture in the Running Time Lecture Notes, where we implement the Histogram class in five different ways and then compare them. All of them work like src/histogram_vector.cpp, creating their own classes, and then storing pointers to them in state.