CS202 Lecture notes -- Procedures, Prototypes, Reference Parameters, Const


Procedures

Procedures allow you to perform the same task multiple times. If there is a task that you want to perform over and over, it's often good to define a procedure for it and call that procedure. Procedures also clean up your code, so that your programs are broken up into reasonable-sized pieces rather than one giant main().

When you are writing/using procedures, there are some things that you should keep in mind about how parameters are passed. That is the focus of this lecture.

I'm going to motivate this with an example that comes from my life. My wife and I have our phones store pictures into the same Dropbox folder. That's convenient, because I can get them on my laptop easily. There is an issue, though -- my wife's phone names the files one way, and my phone names them another. In the file data/filenames.txt, there is a listing of a few pictures from July and August, 2019.

2019-07-13 00.09.11.jpg
2019-07-14 21.57.36.jpg
2019-07-17 09.28.48.jpg
2019-07-26 19.06.20.jpg
2019-07-29 17.51.39.jpg
2019-07-31 16.02.56.jpg
2019-07-31 16.04.31.jpg
2019-07-31 16.05.08.jpg
2019-08-04 21.35.23.jpg
2019-08-05 14.10.06.jpg
2019-08-06 15.31.40.jpg
Photo Aug 02, 10 22 21 PM.jpg
Photo Aug 03, 9 25 04 PM.jpg
Photo Aug 04, 3 03 33 PM.jpg
Photo Aug 10, 9 56 09 AM.jpg
Photo Aug 10, 9 59 07 AM.jpg
Photo Aug 21, 9 27 02 AM.jpg
Photo Aug 21, 9 35 04 AM.jpg
Photo Jul 13, 9 57 54 AM.jpg
Photo Jul 19, 10 35 39 PM.jpg

We're going to write a program that reads filenames on standard input and does one of two things with each. If the filename begins with "2019", it emits the shell command to move the file to one with the same name, but with the space changed to a hyphen. If the filename begins with "Photo", it emits the shell command that converts it into the same format as the "2019" files. Here's the C++ code, which is in src/move_filenames_1.cpp -- the procedures make it all quite easy to read, I think.

#include <cstdlib>
#include <cstdio>
#include <sstream>
#include <string>
#include <iostream>
using namespace std;

/* This emits the shell command to move the files that start with 2019.  
   We implement it by finding the space in the line.  Then we print "mv 'line'", 
   change the space to a hyphen, and finally "line\n". */

void move_2019(string line)
{
  size_t i;        // find() returns a size_t, which is also the type of string::npos.

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }
  printf("mv '%s'", line.c_str());
  line[i] = '-';
  printf(" %s\n", line.c_str());
}
  
/* move_photo() assumes that the line is like "Photo Aug 04, 3 03 33 PM.jpg".  It uses
   an istringstream to read all of those values, and then emits the shell command
   to move the photo to one like "2019-08-04-15.03.33.jpg" */

void move_photo(string line)
{
  string p, month, comma, ampm;
  int day, hour, minute, second;
  int m;
  istringstream ss;

  ss.str(line);
  if (ss >> p >> month >> day >> comma >> hour >> minute >> second >> ampm) {
    m = (month == "Jul") ? 7 : 8;
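    if (ampm.substr(0, 2) == "PM" && hour != 12) hour += 12;   // 12 PM is already hour 12.
    if (ampm.substr(0, 2) == "AM" && hour == 12) hour = 0;     // 12 AM is hour 0.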
    printf("mv '%s' 2019-%02d-%02d-%02d.%02d.%02d.jpg\n", line.c_str(), m, day, hour, minute, second);
  } else {
    fprintf(stderr, "move_photo() - line in the wrong format.\n");
    exit(1);
  }
}

/* The main is pretty simple.  It reads in a line of text, and if the line begins with
   "2019", it calls move_2019().  Otherwise, it calls move_photo(). */

int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);
    } else {
      move_photo(l);
    }
  }
  return 0;
}

I'm going to highlight just a few things about this program. First, move_2019() uses the find() method of strings to find the space in the string. If there's no space, then find() returns "string::npos". You may not have known that, so now you do.

Second, both move_photo() and main() use the substr() method of strings to extract a substring from a string. The first argument is the index of the substring's first character, and the second argument is the substring's size. That's how you check that ampm starts with "PM" (ampm is read as "PM.jpg", because there is no space before the extension).

Third, you'll note that in move_photo() I explicitly read the comma after the day. If I didn't do that, then the rest of the "ss >> p >> ..." would fail.

This works very nicely, and enables me to have all of the photo files have the same format:

UNIX> bin/move_filenames_1 < data/filenames.txt
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
mv '2019-07-14 21.57.36.jpg' 2019-07-14-21.57.36.jpg
mv '2019-07-17 09.28.48.jpg' 2019-07-17-09.28.48.jpg
mv '2019-07-26 19.06.20.jpg' 2019-07-26-19.06.20.jpg
mv '2019-07-29 17.51.39.jpg' 2019-07-29-17.51.39.jpg
mv '2019-07-31 16.02.56.jpg' 2019-07-31-16.02.56.jpg
mv '2019-07-31 16.04.31.jpg' 2019-07-31-16.04.31.jpg
mv '2019-07-31 16.05.08.jpg' 2019-07-31-16.05.08.jpg
mv '2019-08-04 21.35.23.jpg' 2019-08-04-21.35.23.jpg
mv '2019-08-05 14.10.06.jpg' 2019-08-05-14.10.06.jpg
mv '2019-08-06 15.31.40.jpg' 2019-08-06-15.31.40.jpg
mv 'Photo Aug 02, 10 22 21 PM.jpg' 2019-08-02-22.22.21.jpg
mv 'Photo Aug 03, 9 25 04 PM.jpg' 2019-08-03-21.25.04.jpg
mv 'Photo Aug 04, 3 03 33 PM.jpg' 2019-08-04-15.03.33.jpg
mv 'Photo Aug 10, 9 56 09 AM.jpg' 2019-08-10-09.56.09.jpg
mv 'Photo Aug 10, 9 59 07 AM.jpg' 2019-08-10-09.59.07.jpg
mv 'Photo Aug 21, 9 27 02 AM.jpg' 2019-08-21-09.27.02.jpg
mv 'Photo Aug 21, 9 35 04 AM.jpg' 2019-08-21-09.35.04.jpg
mv 'Photo Jul 13, 9 57 54 AM.jpg' 2019-07-13-09.57.54.jpg
mv 'Photo Jul 19, 10 35 39 PM.jpg' 2019-07-19-22.35.39.jpg
UNIX> 

Prototypes

Suppose I put main() before move_2019() and move_photo(). I've done that in src/move_filenames_2.cpp. When you try to compile it, the compiler will exit with an error, because it doesn't know what move_2019() and move_photo() are when it is trying to compile main().

Here it is on my Mac:

UNIX> g++ src/move_filenames_2.cpp
move_filenames_2.cpp:16:7: error: use of undeclared identifier 'move_2019'
      move_2019(l);
      ^
move_filenames_2.cpp:18:7: error: use of undeclared identifier 'move_photo'
      move_photo(l);
      ^
2 errors generated.
UNIX> 

And here it is on one of the hydra machines:

UNIX> g++ src/move_filenames_2.cpp
move_filenames_2.cpp:16:7: error: 'move_2019' was not declared in this scope
      move_2019(l);
      ^
move_filenames_2.cpp:18:7: error: 'move_photo' was not declared in this scope
      move_photo(l);
      ^
UNIX> 

When you use a procedure before it is defined (or when it is defined in another file), you need to specify its prototype. The prototype is the part of the procedure's definition that comes before the opening curly brace, followed by a semicolon instead of the curly brace.

I have done this in src/move_filenames_3.cpp. Here are the two lines that I added before main():

void move_2019(string line);
void move_photo(string line);

The compiler works fine on this program.


Reference Parameters vs Regular Parameters

By default, C++ passes parameters using "Pass By Value". What that means is that when you call move_2019(line), it passes the "value" of line to the procedure, making a copy of line. While that's less efficient than just letting move_2019() use line, it also makes the semantics a little cleaner -- if move_2019() modifies line (as it does), that doesn't affect whoever is calling it.

Just to hammer this home, src/move_filenames_4.cpp is the same as src/move_filenames_1.cpp, except in main(), at the end of the while loop, it prints the original filename. Here's the main(), which is the only part that differs from src/move_filenames_1.cpp:

int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);
    } else {
      move_photo(l);
    }
    printf("The line of text: %s\n", l.c_str());    // Here's the new line of code.
  }
  return 0;
}

Let's run it on the first line of data/filenames.txt, just to confirm that we know what it's doing:

UNIX> head -n 1 data/filenames.txt | bin/move_filenames_4
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13 00.09.11.jpg
UNIX> 

Now, in C++, we are allowed to specify that we want to pass a parameter using "call by reference." This is done by putting an ampersand in front of the parameter's name in the procedure declaration. When this happens, the parameter is not copied. Instead, the procedure acts directly on the variable that it was called with. Let me illustrate by changing move_2019() to use a reference parameter, in src/move_filenames_5.cpp:

void move_2019(string &line)     // This line is the only change.
{
  size_t i;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }
  printf("mv '%s'", line.c_str());
  line[i] = '-';
  printf(" %s\n", line.c_str());
}

Here's where it's called, in main:

int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);                 // move_2019() acts on l and not a copy, so when it changes l, l is changed.
    } else {
      move_photo(l);
    }
    printf("The line of text: %s\n", l.c_str());
  }
  return 0;
}

When we run it, you'll notice that the line of text is now changed after the call to move_2019(). In the second line of output, the space has been changed to a hyphen:

UNIX> head -n 1 data/filenames.txt | bin/move_filenames_5
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13-00.09.11.jpg
UNIX> 

I understand that this is a subtle change, but it's an important one for you to know. Now, why do we have reference parameters? Two reasons:

  1. Sometimes it is convenient to have a procedure modify its parameters.
  2. You save time and space when you don't make a copy, so a reference parameter is more efficient. However, you have to pay attention to the fact that you didn't make a copy, and if your caller doesn't want the parameter changed, then you'd better not change it.

Fortunately, there is a keyword const that can help. If you declare your parameter as const, then the compiler checks and makes sure that you aren't changing it. That way you get the efficiency of not making a copy, without the danger that your data is being modified. For example, in src/move_filenames_6.cpp I have changed move_photo() so that it uses a const reference parameter:

void move_photo(const string &line)      // This is the only line that is changed.

Now, when we call it, it doesn't make a copy, but because of the const keyword, the compiler has assured us that move_photo() has not changed line. I won't bother running it, because you won't see anything exciting.

If I try to have move_2019() use the const keyword, as in src/move_filenames_7.cpp, the compiler will give me an error, because move_2019() does, in fact, change its parameter:

void move_2019(const string &line)   // This is the only line that has changed.
{
  size_t i;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }
  printf("mv '%s'", line.c_str());
  line[i] = '-';                    // Here's where we modify line, and why the compiler complains.
  printf(" %s\n", line.c_str());
}

Here's what happens when we try to compile:

UNIX> g++ src/move_filenames_7.cpp
move_filenames_7.cpp:20:11: error: cannot assign to return value because function 'operator[]'
      returns a const value
  line[i] = '-';                    // Here's where we modify line, and why the compiler complains.
  ~~~~~~~ ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/.bin/include/c++/v1/string:1460:31: note: 
      function 'operator[]' which returns const-qualified type 'const_reference'
      (aka 'const char &') declared here
    _LIBCPP_INLINE_VISIBILITY const_reference operator[](size_type __pos) const;
                              ^~~~~~~~~~~~~~~
1 error generated.
UNIX> 

Let's change move_2019() so that it doesn't modify its parameter. Here, we'll use the substr() method of strings to make copies of line before and after the space. This is in src/move_filenames_8.cpp:

void move_2019(const string &line)   // Since this code doesn't change line, it compiles.
{
  size_t i;
  string w1, w2;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }
  w1 = line.substr(0, i);
  w2 = line.substr(i+1);
  cout << "mv '" << w1 << " " << w2 << "' " << w1 << "-" << w2 << endl;
}

Now, it compiles and runs without changing the original line (you'll see that the space is back in the second line of output):

UNIX> head -n 1 data/filenames.txt | bin/move_filenames_8
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13 00.09.11.jpg
UNIX> 
If you want far more text than you care to read about "call by value" and "call by reference," read https://en.wikipedia.org/wiki/Evaluation_strategy.

Demonstrating how reference parameters improve performance

Here, we're going to demonstrate how reference parameters can improve performance, and how const gives you peace of mind. Take a look at src/total_etc_1.cpp. This is a program that defines four functions that act on vectors of doubles. They are all really simple:

/* This program defines four procedures -- total(), avg(), max() and min() -- that
   return single values calculated from a vector of doubles.  None of them use 
   reference parameters, so each one of them makes a copy of the vector, which is
   expensive.

   To test this, I shove some randomish values into a large vector, and then call
   all of them.  
 */

#include <cstdlib>
#include <cstdio>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

/* This returns the total of the values. */

double total(vector <double> v)
{
  size_t i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}

/* This returns the average of the values.  It does it by calling total to sum the values. */

double avg(vector <double> v)
{
  double size;

  size = v.size();
  return total(v)/size;
}

/* This returns the maximum of the values. */

double max(vector <double> v)
{
  size_t i;
  double mx;

  mx = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] > mx) mx = v[i];
  return mx;
}

/* This returns the minimum of the values. */

double min(vector <double> v)
{
  size_t i;
  double mn;

  mn = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] < mn) mn = v[i];
  return mn;
}

For the main(), I read the size of the vector on the command line, and then create a randomish vector with that many values. Read the code if you care how I create the values -- they are random-ish between 1 and 10. Finally, I call the four procedures and print the results:

/* We call main() with the number of values in the vector. 
   We then create the vector, and call all of the procedures. */

int main(int argc, char **argv)
{
  int i;
  int n;
  istringstream ss;
  vector <double> v;
  double val;

  /* Parse the command line. */

  if (argc != 2) { cerr << "usage: total_etc_1 number-of-elements\n"; exit(1); }
  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: total_etc_1 number-of-elements\n"; exit(1); }

  /* Create the vector -- I'm not using a random number generator here -- I'm 
     just starting with val = 10*1/7 and repeatedly squaring it, dividing by
     ten whenever it exceeds ten.  That will keep the values between 1 and 10,
     but looking kinda random. */

  val = 1/7.0 * 10.0;

  for (i = 0; i < n; i++) {
    v.push_back(val);
    val *= val; 
    if (val > 10) val /= 10.0;
  }

  /* Print the values if there are fewer than 10 */

  if (n < 10) {
    for (i = 0; i < n; i++) printf("%6.4lf\n", v[i]);
    printf("\n");
  }

  /* Call the procedures and print the results. */

  printf("Total: %12.4lf\n", total(v));
  printf("Avg:   %12.4lf\n", avg(v));
  printf("Max:   %12.4lf\n", max(v));
  printf("Min:   %12.4lf\n", min(v));
  return 0;
}

Let's make sure it works -- we'll call it with a small value of four:

UNIX> bin/total_etc_1 4
1.4286
2.0408
4.1649
1.7347

Total:       9.3690
Avg:         2.3422
Max:         4.1649
Min:         1.4286
UNIX> 

We can verify by hand that these are all correct. Now, let's time calling it with a really big value:

UNIX> time bin/total_etc_1 50000000
Total: 195385228.1964
Avg:         3.9077
Max:        10.0000
Min:         1.0000
3.736u 0.808s 0:04.55 99.5%	0+0k 0+0io 0pf+0w
UNIX> 

Using "time" has the shell print three times: the user CPU time (3.736u), the system CPU time (0.808s), and the wall clock time (0:04.55). If the machine that you're on is relatively unloaded, the first two will roughly add up to the last. When I'm reporting times, I will typically report the wall clock time, and make sure that the machine that I'm on (in this case a MacBook Pro with a 2.2 GHz processor) is not doing anything else.

Now, in src/total_etc_2.cpp, I have changed all of the parameters to const reference parameters. For example, here's total():

double total(const vector <double> &v)
{
  size_t i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}

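Now, when total() is called, it doesn't make a copy of v, and the const keyword assures us that total() does not modify v. Therefore, there is no real difference between total_etc_1.cpp and total_etc_2.cpp, except that we're not copying that vector so much. In total_etc_1.cpp, the four calls in main() make five copies of the vector, because avg() copies it and then its call to total() copies it again. In total_etc_2.cpp, they make none. Accordingly, when we run it, it is a lot faster -- nearly a factor of two (2.44 seconds of wall clock time instead of 4.55)!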

UNIX> time bin/total_etc_2 50000000
Total: 195385228.1964
Avg:         3.9077
Max:        10.0000
Min:         1.0000
2.212u 0.215s 0:02.44 99.1%	0+0k 0+0io 0pf+0w
UNIX> 

Summary of key points from this lecture