CS140 Lecture notes -- Multiple Files; Information Hiding; Void *'s

Directory: /home/plank/cs140/Notes/Inf-Hiding

Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs140/Notes/Inf-Hiding

Fri Aug 31 14:08:38 EDT 2007

Interfaces, Multiple Files, Information-Hiding

We're going to motivate this with an example scenario that is only slightly contrived. Consider a low-budget music company that serves music downloads. Suppose such a company is just starting up and hires two interns to manage two different parts of the software system:

Frank is responsible for the software that reads the raw data about songs.
Luther is responsible for putting the data into HTML forms for the CEO.

Frank arrives at his job first, and starts figuring out the format of the data about songs. The data is all in one file called Music.txt. Go ahead and take a look at it. It is a text file, where each line is composed of six words:

Song-name Duration(m:ss) Artist Album-Name Genre Track-Number

No words have spaces in them -- where there would normally be a space, there is instead an underscore. For example, here are the first five lines:

Back_In_Black   4:15    AC_DC   Back_In_Black   Rock    6
Larks'_Tongues_In_Aspic,_Part_III       5:56    King_Crimson    Three_of_a_Perfect_Pair Rock    9
Ravel,_Menuet_Antique   5:28    Casadesus,_Robert       Ravel,_Complete_Piano_Music,_Disc_2     Classical       8
Pungee  3:02    Meters,_The     Look-Ka_Py_Py   Rock    3
Naima   4:24    Coltrane,_John  Giant_Steps     Jazz    6

Note the underscores instead of spaces.

Now, Frank decides to get a head start on his work and write a program to read in the data and print out some information about it -- total number of songs and average song duration. He will also print the same information about Rock songs, Classical songs, and all the others. It's a straightforward program that uses the fields library. It uses strcmp() and field #4 (zero indexed) to identify the Rock and Classical songs, and it uses atoi(), strchr() and field #1 to determine song durations. It is in genreport.c.

Take a special look at how it determines duration:

time = atoi(is->fields[1])*60;
x = strchr(is->fields[1], ':') + 1;
time += atoi(x);

The first atoi() gets the number of minutes, then multiplies it by 60. The strchr() finds the colon, and sets x to point to the first character after the colon. The last command uses atoi() to determine the number of seconds and adds it to time.
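The same three steps can be wrapped in a small function. This is only a sketch: the name duration_to_seconds() and the -1 return for a string with no colon are assumptions made for illustration, and genreport.c does not necessarily check for that case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Convert a duration string of the form "m:ss" into seconds.
   Returns -1 when there is no colon (an assumed convention, not
   something taken from genreport.c). */
int duration_to_seconds(const char *s)
{
  const char *colon;

  assert(s != NULL);
  colon = strchr(s, ':');          /* find the colon */
  if (colon == NULL) return -1;

  /* atoi() stops at the first non-digit, so atoi("4:15") is 4. */
  return atoi(s) * 60 + atoi(colon + 1);
}
```

With this, "4:15" becomes 255 seconds and "5:56" becomes 356 seconds.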

This runs fine. Like a good programmer, Frank double checks himself a little, by testing it on the first two lines. Since the first song has a duration of 4:15 (255 seconds) and the second has a duration of 5:56 (356 seconds), their average duration should be 305.5 seconds, or roughly 5:05:

UNIX> head -2 Music.txt
Back_In_Black   4:15    AC_DC   Back_In_Black   Rock    6
Larks'_Tongues_In_Aspic,_Part_III       5:56    King_Crimson    Three_of_a_Perfect_Pair Rock    9
UNIX> head -2 Music.txt | genreport
Total Songs:     2   Avg Duration:  305.50
Rock  Songs:     2   Avg Duration:  305.50
Class Songs:     0   Avg Duration:     nan
Other Songs:     0   Avg Duration:     nan
UNIX> genreport < Music.txt
Total Songs:  9126   Avg Duration:  277.42
Rock  Songs:  3340   Avg Duration:  247.67
Class Songs:  3703   Avg Duration:  310.48
Other Songs:  2083   Avg Duration:  266.34
UNIX>

Frank double-checks the last execution of genreport: 3340+3703+2083 = 9126. Moreover: (3340*247.67)+(3703*310.48)+(2083*266.34) = 2531711, and 9126*277.42 = 2531734. That's close enough for me (they'd match were we to print the averages to four decimal places).

Add Luther

Luther comes to work, and now Frank has to modify his code so that Luther can work with it. They decide to do the correct thing -- Frank works with frank.c, Luther works with luther.c, and the two files will be glued together with a header file: flcommon.h.

First take a look at that header file:

typedef struct {
  double nsongs, ttime;
  double ncsongs, tctime;
  double nrsongs, trtime;
  double nosongs, totime;
} Music_Info;

Music_Info *read_music_file(char *filename);

You'll see that the relevant information is now bundled up in a struct whose name is Music_Info. Frank must convert his code to read filename instead of standard input, and allocate/return a Music_Info struct. Luther calls read_music_file(), and then prints out a report about the file in HTML. The makefile makes sure that all three files (frank.c, luther.c and flcommon.h) compile together correctly. Copy it all over and try it out:

UNIX> make fl
gcc -g -c -I/home/plank/cs140/include luther.c
gcc -g -c -I/home/plank/cs140/include frank.c
gcc -O -o fl luther.o frank.o /home/plank/cs140/objs/libfdr.a
UNIX> fl > report.html

The output is here. I will say that I made Luther's code ugly on purpose. If you think that Luther is overpaid at this point, you're probably correct, but it's hard to find good help these days....
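frank.c is not reproduced in these notes. As a rough idea of what its read_music_file() has to do, here is a sketch that uses fgets() and sscanf() in place of the fields library. The helper read_music_stream(), the 1024-byte line buffer and the choice to skip malformed lines are all assumptions made for illustration, not Frank's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  double nsongs, ttime;
  double ncsongs, tctime;
  double nrsongs, trtime;
  double nosongs, totime;
} Music_Info;

/* Hypothetical helper: tally every six-word line of an open stream. */
Music_Info *read_music_stream(FILE *f)
{
  static const Music_Info zero = { 0 };
  Music_Info *mi;
  char line[1024], dur[64], genre[64];
  char *colon;
  double t;

  assert(f != NULL);
  mi = malloc(sizeof(Music_Info));
  if (mi == NULL) return NULL;
  *mi = zero;                          /* all counters start at zero */

  while (fgets(line, sizeof(line), f) != NULL) {
    /* Keep word 1 (duration) and word 4 (genre); skip words 0, 2 and 3. */
    if (sscanf(line, "%*s %63s %*s %*s %63s", dur, genre) != 2) continue;
    colon = strchr(dur, ':');
    if (colon == NULL) continue;       /* assumed: ignore malformed lines */
    t = atoi(dur) * 60 + atoi(colon + 1);

    mi->nsongs++;  mi->ttime += t;
    if (strcmp(genre, "Rock") == 0) {
      mi->nrsongs++;  mi->trtime += t;
    } else if (strcmp(genre, "Classical") == 0) {
      mi->ncsongs++;  mi->tctime += t;
    } else {
      mi->nosongs++;  mi->totime += t;
    }
  }
  return mi;
}

/* Matches the prototype in flcommon.h. Returns NULL if the file
   cannot be opened (an assumed convention). */
Music_Info *read_music_file(char *filename)
{
  FILE *f;
  Music_Info *mi;

  f = fopen(filename, "r");
  if (f == NULL) return NULL;
  mi = read_music_stream(f);
  fclose(f);
  return mi;
}
```

Splitting the work into a stream function and a file function is a design choice of this sketch: it lets the tallying code run on standard input or a temporary file as well as on Music.txt.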

Luther makes an ill-advised modification

Now, suppose Luther talks with the CEO, who wants to know specifically about classical music. Luther, who took a few hours writing his code and still doesn't quite understand why it works, decides to make a quick fix so that he doesn't have to touch the ugly part of his code. He changes the nsongs and ttime fields of the Music_Info struct so that they equal the classical information.

Here's luther2.c, which is the same as luther.c, except for the bold-faced lines:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "flcommon.h"

int main()
{
  Music_Info *mi;

  mi = read_music_file("Music.txt");
  mi->nsongs = mi->ncsongs;
  mi->ttime = mi->tctime;
  printf("<h2>Classical Music Data</h2>\n");
  printf("<UL>\n");
  printf("<LI> Total number of songs: %.0lf\n", mi->nsongs);
  printf("<LI> Average song duration: %.0lf seconds: (%d:%02.0lf)\n", 
           mi->ttime/mi->nsongs, ((int) (mi->ttime/mi->nsongs))/60,
           mi->ttime/mi->nsongs - (((int) (mi->ttime/mi->nsongs))/60)*60);
  printf("</UL>\n");
}

When you run fl2, the output looks as follows (in report2.html).

The problem with luther2.c

Now, suppose a third programmer wants to generate a report that contains some details after Luther's report. He gets Luther to change his code to luther3.c, which replaces the main() program with a procedure called print_report(). This takes a pointer to a Music_Info as input, and prints out the HTML output. Note, it also changes mi->nsongs and mi->ttime.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "flcommon3.h"

void print_report(Music_Info *mi)
{
  mi->nsongs = mi->ncsongs;
  mi->ttime = mi->tctime;
  printf("<h2>Classical Music Data</h2>\n");
  printf("<UL>\n");
  printf("<LI> Total number of songs: %.0lf\n", mi->nsongs);
  printf("<LI> Average song duration: %.0lf seconds: (%d:%02.0lf)\n", 
           mi->ttime/mi->nsongs, ((int) (mi->ttime/mi->nsongs))/60,
           mi->ttime/mi->nsongs - (((int) (mi->ttime/mi->nsongs))/60)*60);
  printf("</UL>\n");
}

The third programmer now writes report3.c, which appends the total number of songs to the report.

#include <stdio.h>
#include <stdlib.h>
#include "flcommon3.h"

int main()
{
  Music_Info *mi;

  mi = read_music_file("Music.txt");

  print_report(mi);

  printf("\n");
  printf("Total number of songs (classical and non-classical): %.0lf\n", mi->nsongs);
}

The output, as you can see, is incorrect, because print_report() changed mi->nsongs. It is here. The total number of songs should be 9126, not 3703.

Fixing this with Information Hiding

Frank sees this disaster, and decides that he can do something about it. First, he looks into getting Luther fired, but when he finds out that Luther is the CEO's nephew, he gives up on that approach. Instead, he decides to employ a standard C (not C++) technique of using a (void *). This is a generic pointer. When you pass someone a (void *), you are saying "Here's a pointer that holds some data -- you can't look at the data, but if you pass that pointer back to me, I'll do the right thing with it."

What happens is that Frank now moves the Music_Info struct into frank4.c, instead of having it in the header file. The header file instead declares read_music_file() to return a (void *). When frank4.c returns from read_music_file(), it returns a pointer to its Music_Info struct, but it casts it to a (void *). Since Luther's code never sees the struct definition, it cannot read or write the fields directly. Frank also augments the header file with eight accessor functions:

double get_nsongs(void *mi);
double get_ttime(void *mi);
double get_ncsongs(void *mi);
double get_tctime(void *mi);
double get_nrsongs(void *mi);
double get_trtime(void *mi);
double get_nosongs(void *mi);
double get_totime(void *mi);

Frank implements these simply:

double get_nsongs(void *mi) { 
  Music_Info *m;
  
  m = (Music_Info *) mi;
  return m->nsongs;
}

double get_ttime(void *mi) { return ((Music_Info *) mi)->ttime; }
double get_ncsongs(void *mi) { return ((Music_Info *) mi)->ncsongs; }
double get_tctime(void *mi) { return ((Music_Info *) mi)->tctime; }
double get_nrsongs(void *mi) { return ((Music_Info *) mi)->nrsongs; }
double get_trtime(void *mi) { return ((Music_Info *) mi)->trtime; }
double get_nosongs(void *mi) { return ((Music_Info *) mi)->nosongs; }
double get_totime(void *mi) { return ((Music_Info *) mi)->totime; }

The implementation of get_nsongs() is pretty standard. The (void *) is cast back to a Music_Info *, and then the proper field of the Music_Info is returned. The remaining seven accessor functions are implemented in one line, performing the cast in the return statement.

Now, Luther must modify print_report() to work with a (void *). Note, he cannot modify nsongs like he did before. He may only access it. It took him a few hours, but he did eventually get it right (in luther4.c). Similarly, the code in report4.c works with a (void *) and the accessor functions. And the output (in report4.html) is now correct.
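luther4.c and report4.c are not shown here, so the following is only a sketch of how the pieces fit together, squeezed into one listing. The struct is cut down to four fields, and new_music_info() and format_classical_summary() are hypothetical stand-ins for read_music_file() and print_report(), chosen so that the example needs no input file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* ---- What the header might declare: no struct, only functions. ---- */
void  *new_music_info(double nsongs, double ttime,
                      double ncsongs, double tctime);  /* hypothetical */
double get_nsongs(void *mi);
double get_ncsongs(void *mi);
double get_tctime(void *mi);

/* ---- Frank's side: the only code meant to know the layout. ---- */
typedef struct {
  double nsongs, ttime;
  double ncsongs, tctime;
} Music_Info;                       /* cut down to four fields */

void *new_music_info(double nsongs, double ttime,
                     double ncsongs, double tctime)
{
  Music_Info *m;

  m = malloc(sizeof(Music_Info));
  if (m == NULL) return NULL;
  m->nsongs = nsongs;    m->ttime = ttime;
  m->ncsongs = ncsongs;  m->tctime = tctime;
  return (void *) m;
}

double get_nsongs(void *mi)  { return ((Music_Info *) mi)->nsongs; }
double get_ncsongs(void *mi) { return ((Music_Info *) mi)->ncsongs; }
double get_tctime(void *mi)  { return ((Music_Info *) mi)->tctime; }

/* ---- Luther's side: uses the accessor functions only. ---- */
int format_classical_summary(void *mi, char *buf, size_t size)
{
  double n, avg;

  assert(mi != NULL && buf != NULL);
  n = get_ncsongs(mi);
  avg = (n > 0) ? get_tctime(mi) / n : 0;
  return snprintf(buf, size,
                  "classical: %.0f songs, avg %.2f s; all: %.0f songs",
                  n, avg, get_nsongs(mi));
}
```

In a single listing the separation is only a convention. The protection comes from putting Frank's side in its own .c file, so that the struct definition is never visible to the code that includes the header.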

(Void *)'s and C++

Employing a (void *) is an excellent way to perform information hiding. It is also a nice way to pass around generic pointers, which we will see very soon in this class.

In CS102, you saw accessor functions defined in C++ classes. They are indeed an integral part of any object-oriented programming methodology.

Typical proponents of C++ dislike the use of (void *)'s, because they circumvent the type system and are rather unsafe (there's nothing illegal about casting a (void *) to a pointer to any struct -- and casting it to the wrong one will likely destroy your program). They prefer templates, inheritance, and specification of variables as private, public and protected to achieve the same purpose. Both methodologies (information hiding with void *'s in C, information hiding with protected variables in C++) have merit. However, you will find that:

There are zealots of C and zealots of C++, and neither will tolerate the methodologies of the other. You will need to understand and employ both methodologies, so you should keep an open mind about it. They both have merit!!!!!

In this class, we will use the C methodology, and in CS302, we will switch to the C++ methodology.
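To make the type-safety complaint concrete, here is a small sketch showing that a (void *) parameter accepts a pointer to anything without comment from the compiler. The Unrelated struct and the function names are hypothetical. The generic function deliberately does nothing but test the pointer, because actually casting the wrong pointer to a (Music_Info *) and reading a field would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double nsongs, ttime; } Music_Info;  /* cut-down struct */
typedef struct { char name[32]; } Unrelated;          /* hypothetical */

/* Takes a typed pointer: passing an (Unrelated *) here would draw a
   diagnostic from the compiler. */
int is_set_typed(Music_Info *mi) { return mi != NULL; }

/* Takes a generic pointer: any object pointer converts silently. */
int is_set_generic(void *mi) { return mi != NULL; }

/* Compiles cleanly even though u is not a Music_Info. This is safe only
   because is_set_generic() never dereferences its argument. */
int generic_accepts_unrelated(void)
{
  Unrelated u = { "not music" };

  return is_set_generic(&u);
}
```

The compiler would accept a call such as get_nsongs(&u) just as readily, and that is the hole that C++ proponents object to.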

A Final Word

Regardless of the methodology, the ability to hide the implementation of a data structure, and present only its interface to the user, is an important one. Laying out your code so that it presents good clean interfaces, which are implemented "behind the scenes", is a useful skill to learn.