CS140 Lecture notes -- Procedures


Basics

Procedures allow you to perform the same task multiple times. If you have a task that you want to perform over and over, it's often good to define a procedure and use it. For example, suppose you have a bunch of songs and you want to burn them onto CDs. You know that a CD will hold 80 minutes of music, and you have a list of songs with their timings in MM:SS format in the file songs.txt:

Africano                                     5:10
Get to Me                                    4:06
In-A-Gadda-Da-Vida                           3:02
Can't Get Used To Losing You                 3:05
Calling All Angels                           4:02
Crosstown Traffic                            2:20
Overture - Prologue                          3:04
Funky Miracle                                2:30
Queen Bee                                    3:13
One Way or Another                           3:29
Till There Was You                           2:14
I Love The Nightlife (Disco 'Round)          2:55
Tales 3. The Ancient; Giants Under the Sun  18:38
Sit Down and Talk To Me                      3:18
Mississippi Queen                            2:31
Heartless                                    5:00
Shake A Leg                                  4:06
I Want To Know                               4:24
Sugar Babe                                   4:35
Intensity                                    3:50
It's Too Late                                3:51
You've Made Me So Very Happy                 4:23
Tumbling Dice                                3:47
Rainbow Blues                                3:39
Anyway                                       3:19
Bad Time                                     2:57
If I Can't Have You                          3:00
War Within a Breath                          3:37
High Time We Went                            4:30
I'm Coming Home                              3:01
Loving Cup                                   4:26
Darts                                        2:14
Amen                                         3:30
Are You Happy                                4:50
First Brain                                  3:40
Sexy Ida                                     2:34
Josie                                        4:31
Stairway To Heaven                           8:02
While The City Sleeps                        3:51
Do For the Others                            2:50
Fair Game                                    3:30
I'm Not Leaving                              2:32
Roll on Down the Highway                     3:57
You Get Me                                   3:54
Little Guitars (Intro)                       0:42
The Caves Of Altamira                        3:34

You'd like to turn this big list into a list of playlists where each playlist fits onto a CD. First, let's write a program that simply reads in each song and timing and creates two string vectors -- one of song titles and one of timings. This is good string practice -- we're going to assume that the last word on each line is a timing and that it is separated from the title by more than one space. The program is in readlist.cpp:

#include <cstdlib>
#include <cstdio>
#include <string>
#include <vector>
#include <iostream>
using namespace std;

int main()
{
  vector <string> songs;
  vector <string> timings;
  string line;
  int i;

  while(getline(cin, line)) {
    i = line.find("  ");
    if (i == string::npos) { cerr << "Bad line -- " << line << endl; exit(1); }
    songs.push_back(line.substr(0, i));
    while (line[i] == ' ') i++;
    timings.push_back(line.substr(i));
  }
  for (i = 0; i < songs.size(); i++) {
    printf("Song: %-40s -- Timing %5s\n", songs[i].c_str(), timings[i].c_str());
  }
}

As always, we should test -- here we do it on the first ten lines of songs.txt:

UNIX> head songs.txt | readlist
Song: Africano                                 -- Timing  5:10
Song: Get to Me                                -- Timing  4:06
Song: In-A-Gadda-Da-Vida                       -- Timing  3:02
Song: Can't Get Used To Losing You             -- Timing  3:05
Song: Calling All Angels                       -- Timing  4:02
Song: Crosstown Traffic                        -- Timing  2:20
Song: Overture - Prologue                      -- Timing  3:04
Song: Funky Miracle                            -- Timing  2:30
Song: Queen Bee                                -- Timing  3:13
Song: One Way or Another                       -- Timing  3:29
UNIX> 
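The parsing rule used by readlist.cpp (title, then a run of two or more spaces, then the timing) can also be packaged as a pair of small helpers. This is only a sketch: the names title_of() and timing_of() are hypothetical and do not appear in the lecture's programs, and returning an empty string for a bad line is an assumption of the sketch. It stores the result of find() in a string::size_type, which is the type that find() returns and that string::npos has.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: everything before the first run of two or more
// spaces, or "" if the line has no such run.
string title_of(string line)
{
  string::size_type i;

  i = line.find("  ");
  if (i == string::npos) return "";
  return line.substr(0, i);
}

// Hypothetical helper: everything after that run of spaces, or "" if the
// line has no such run.
string timing_of(string line)
{
  string::size_type i;

  i = line.find("  ");
  if (i == string::npos) return "";
  while (i < line.size() && line[i] == ' ') i++;
  return line.substr(i);
}
```

For the first line of songs.txt, title_of() returns "Africano" and timing_of() returns "5:10".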
Now, we need to turn those timings into numbers -- it's convenient if we can turn them into seconds. We can use stringstreams to do this -- the program is in timing_convert.cpp, and it's an interesting use of stringstreams:

#include <cstdlib>
#include <cstdio>
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

int main()
{
  vector <string> songs;
  vector <string> timings;
  string line;
  int i, sec, min, colon;
  istringstream ss;

  while(getline(cin, line)) {
    i = line.find("  ");
    if (i == string::npos) { cerr << "Bad line -- " << line << endl; exit(1); }
    songs.push_back(line.substr(0, i));
    while (line[i] == ' ') i++;
    timings.push_back(line.substr(i));
  }
  for (i = 0; i < songs.size(); i++) {
    colon = timings[i].find(':');
    timings[i][colon] = ' ';
    ss.clear();
    ss.str(timings[i]);
    ss >> min >> sec;
    sec += (min * 60); 
    timings[i][colon] = ':';
    
    printf("Song: %-40s -- Timing %5s -- %5d\n", songs[i].c_str(), timings[i].c_str(), sec);
  }
}

We convert the colon in timings[i] to a space, then put that string into a stringstream and extract the minutes and seconds. We multiply the minutes by 60 and add the result to the seconds, and we then convert the space back to a colon. Finally, we print out the song name, original timing and timing in seconds. Convince yourself that the output is correct:

UNIX> head songs.txt | timing_convert
Song: Africano                                 -- Timing  5:10 --   310
Song: Get to Me                                -- Timing  4:06 --   246
Song: In-A-Gadda-Da-Vida                       -- Timing  3:02 --   182
Song: Can't Get Used To Losing You             -- Timing  3:05 --   185
Song: Calling All Angels                       -- Timing  4:02 --   242
Song: Crosstown Traffic                        -- Timing  2:20 --   140
Song: Overture - Prologue                      -- Timing  3:04 --   184
Song: Funky Miracle                            -- Timing  2:30 --   150
Song: Queen Bee                                -- Timing  3:13 --   193
Song: One Way or Another                       -- Timing  3:29 --   209
UNIX> 

Defining a procedure to convert the timing to seconds

That code is a bit yucky and hard to read -- this is a good place to define a procedure that converts the timing string to a number of seconds. We do this in timing_procedure.cpp:

#include <cstdlib>
#include <cstdio>
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

int timing_seconds(string s)
{
  int colon;
  istringstream ss;
  int min, sec;

  colon = s.find(':');
  if (colon == string::npos) return -1;
  s[colon] = ' ';
  ss.str(s);
  if (!(ss >> min >> sec)) return -1;
  return sec + min*60;
}

int main()
{
  vector <string> songs;
  vector <string> timings;
  string line;
  int i;

  while(getline(cin, line)) {
    i = line.find("  ");
    if (i == string::npos) { cerr << "Bad line -- " << line << endl; exit(1); }
    songs.push_back(line.substr(0, i));
    while (line[i] == ' ') i++;
    timings.push_back(line.substr(i));
  }
  for (i = 0; i < songs.size(); i++) {
    printf("Song: %-40s -- Timing %5s -- %5d\n", songs[i].c_str(), timings[i].c_str(), 
           timing_seconds(timings[i]));
  }
}

We've defined the procedure timing_seconds() which takes one parameter, a string, and returns an integer. It assumes that the string is in the format "mm:ss", converts the colon to a space and extracts the minutes and seconds using a stringstream as before. It also performs some tests to make sure that the parameter is in the proper format -- if there is no colon or if the extraction of minutes and seconds fails, it returns -1. That's a good habit to acquire -- writing code so that if something unexpected happens (like s not being in the correct format), it is discovered and handled appropriately.

In the printf() statement, we simply call timing_seconds(timings[i]) -- that returns the number of seconds.

UNIX> head songs.txt | timing_procedure
Song: Africano                                 -- Timing  5:10 --   310
Song: Get to Me                                -- Timing  4:06 --   246
Song: In-A-Gadda-Da-Vida                       -- Timing  3:02 --   182
Song: Can't Get Used To Losing You             -- Timing  3:05 --   185
Song: Calling All Angels                       -- Timing  4:02 --   242
Song: Crosstown Traffic                        -- Timing  2:20 --   140
Song: Overture - Prologue                      -- Timing  3:04 --   184
Song: Funky Miracle                            -- Timing  2:30 --   150
Song: Queen Bee                                -- Timing  3:13 --   193
Song: One Way or Another                       -- Timing  3:29 --   209
UNIX> 
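The -1 return value is only useful if the caller looks at it. The following is a minimal sketch, not one of the lecture's programs, of a hypothetical procedure total_seconds() that adds up a vector of timings and gives up as soon as timing_seconds() reports a bad string. The timing_seconds() here uses the same logic as the one in timing_procedure.cpp, except that colon is declared as a string::size_type; it is repeated so that the sketch stands alone.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Same logic as timing_seconds() in timing_procedure.cpp:
// "mm:ss" -> seconds, or -1 if the string is not in that format.
int timing_seconds(string s)
{
  string::size_type colon;
  istringstream ss;
  int min, sec;

  colon = s.find(':');
  if (colon == string::npos) return -1;
  s[colon] = ' ';
  ss.str(s);
  if (!(ss >> min >> sec)) return -1;
  return sec + min*60;
}

// Hypothetical procedure: the total of all the timings in seconds, or -1
// if any one of them is malformed.
int total_seconds(vector <string> timings)
{
  int total, sec;
  size_t i;

  total = 0;
  for (i = 0; i < timings.size(); i++) {
    sec = timing_seconds(timings[i]);
    if (sec < 0) return -1;      // Pass the error up to our own caller.
    total += sec;
  }
  return total;
}
```

Whether to return an error code, print a message, or exit is a design choice; the point of the sketch is simply that the check happens at the call.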
Now, you may notice a few differences between timing_convert.cpp and timing_procedure.cpp. One glaring difference is that timing_seconds() never changes the space back to a colon. Why? The answer is a little subtle, but one that you need to pay attention to -- when you call a procedure, copies are made of the parameters. These copies exist for the lifetime of the procedure call, and they go away when the procedure call returns. The same is true of the local variables -- they exist for the lifetime of the procedure call, and they too go away when the procedure call returns. This is why changing the colon to a space does not affect the string in timings[i].

This is a very important thing for you to understand -- by default, parameters to procedures are copied. Most of the time, it's not a problem. Sometimes it is.
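Here is a small sketch that isolates the copying. The procedure blank_colon() is a hypothetical name introduced only for this illustration; it overwrites the colon in its parameter and returns the result.

```cpp
#include <string>
using namespace std;

// Hypothetical procedure: s is a copy of the caller's string, so
// overwriting the colon here changes only the copy.
string blank_colon(string s)
{
  string::size_type colon;

  colon = s.find(':');
  if (colon != string::npos) s[colon] = ' ';
  return s;
}
```

If a string t holds "5:10", then blank_colon(t) returns "5 10", while t itself still holds "5:10" after the call.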

Regardless, let's finish our CD partitioning program: The final program, cd_partition.cpp, keeps track of the total time of each CD, and when a song makes that time greater than 80 minutes, it starts a new CD:

#include <cstdlib>
#include <cstdio>
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

int timing_seconds(string s)
{
  int colon;
  istringstream ss;
  int min, sec;

  colon = s.find(':');
  if (colon == string::npos) return -1;
  s[colon] = ' ';
  ss.str(s);
  if (!(ss >> min >> sec)) return -1;
  return sec + min*60;
}

int main()
{
  vector <string> songs;
  vector <string> timings;
  string line;
  int i, ttime, cd_number;

  while(getline(cin, line)) {
    i = line.find("  ");
    if (i == string::npos) { cerr << "Bad line -- " << line << endl; exit(1); }
    songs.push_back(line.substr(0, i));
    while (line[i] == ' ') i++;
    timings.push_back(line.substr(i));
  }
 
  cd_number = 0;
  ttime = 0;
  printf("CD %d\n\n", cd_number);
  for (i = 0; i < songs.size(); i++) {
    ttime += timing_seconds(timings[i]);
    if (ttime > 80*60) {
      cd_number++;
      printf("\nCD %d\n\n", cd_number);
      ttime = timing_seconds(timings[i]);
    }
    printf("Song: %-50s   Timing %5s -- Total: %2d:%02d\n", 
           songs[i].c_str(), timings[i].c_str(), 
           ttime/60, ttime%60);
  }
}

Take a good look at the printf() statement -- when it prints the total time, it prints the minutes in two digits, right justified, then a colon, then the seconds in two digits with leading zeros. Here's the final output. Make sure you understand the flow of control and how the program works.

UNIX> cd_partition < songs.txt
CD 0

Song: Africano                                             Timing  5:10 -- Total:  5:10
Song: Get to Me                                            Timing  4:06 -- Total:  9:16
Song: In-A-Gadda-Da-Vida                                   Timing  3:02 -- Total: 12:18
Song: Can't Get Used To Losing You                         Timing  3:05 -- Total: 15:23
Song: Calling All Angels                                   Timing  4:02 -- Total: 19:25
Song: Crosstown Traffic                                    Timing  2:20 -- Total: 21:45
Song: Overture - Prologue                                  Timing  3:04 -- Total: 24:49
Song: Funky Miracle                                        Timing  2:30 -- Total: 27:19
Song: Queen Bee                                            Timing  3:13 -- Total: 30:32
Song: One Way or Another                                   Timing  3:29 -- Total: 34:01
Song: Till There Was You                                   Timing  2:14 -- Total: 36:15
Song: I Love The Nightlife (Disco 'Round)                  Timing  2:55 -- Total: 39:10
Song: Tales 3. The Ancient; Giants Under the Sun           Timing 18:38 -- Total: 57:48
Song: Sit Down and Talk To Me                              Timing  3:18 -- Total: 61:06
Song: Mississippi Queen                                    Timing  2:31 -- Total: 63:37
Song: Heartless                                            Timing  5:00 -- Total: 68:37
Song: Shake A Leg                                          Timing  4:06 -- Total: 72:43
Song: I Want To Know                                       Timing  4:24 -- Total: 77:07

CD 1

Song: Sugar Babe                                           Timing  4:35 -- Total:  4:35
Song: Intensity                                            Timing  3:50 -- Total:  8:25
Song: It's Too Late                                        Timing  3:51 -- Total: 12:16
Song: You've Made Me So Very Happy                         Timing  4:23 -- Total: 16:39
Song: Tumbling Dice                                        Timing  3:47 -- Total: 20:26
Song: Rainbow Blues                                        Timing  3:39 -- Total: 24:05
Song: Anyway                                               Timing  3:19 -- Total: 27:24
Song: Bad Time                                             Timing  2:57 -- Total: 30:21
Song: If I Can't Have You                                  Timing  3:00 -- Total: 33:21
Song: War Within a Breath                                  Timing  3:37 -- Total: 36:58
Song: High Time We Went                                    Timing  4:30 -- Total: 41:28
Song: I'm Coming Home                                      Timing  3:01 -- Total: 44:29
Song: Loving Cup                                           Timing  4:26 -- Total: 48:55
Song: Darts                                                Timing  2:14 -- Total: 51:09
Song: Amen                                                 Timing  3:30 -- Total: 54:39
Song: Are You Happy                                        Timing  4:50 -- Total: 59:29
Song: First Brain                                          Timing  3:40 -- Total: 63:09
Song: Sexy Ida                                             Timing  2:34 -- Total: 65:43
Song: Josie                                                Timing  4:31 -- Total: 70:14
Song: Stairway To Heaven                                   Timing  8:02 -- Total: 78:16

CD 2

Song: While The City Sleeps                                Timing  3:51 -- Total:  3:51
Song: Do For the Others                                    Timing  2:50 -- Total:  6:41
Song: Fair Game                                            Timing  3:30 -- Total: 10:11
Song: I'm Not Leaving                                      Timing  2:32 -- Total: 12:43
Song: Roll on Down the Highway                             Timing  3:57 -- Total: 16:40
Song: You Get Me                                           Timing  3:54 -- Total: 20:34
Song: Little Guitars (Intro)                               Timing  0:42 -- Total: 21:16
Song: The Caves Of Altamira                                Timing  3:34 -- Total: 24:50
UNIX> 
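The flow of control in the main loop can itself be pulled out into a procedure. The sketch below is not part of cd_partition.cpp: assign_cds() is a hypothetical name, and it works on timings that have already been converted to seconds. It applies the same rule as the program (a song starts a new CD when adding it would push the running total over the capacity) and, like the program, it assumes that every individual song fits on a CD.

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// Hypothetical procedure: given song lengths in seconds and a CD capacity
// in seconds, return the CD number assigned to each song, in order.
vector <int> assign_cds(vector <int> seconds, int capacity)
{
  vector <int> cd;
  int ttime, cd_number;
  size_t i;

  cd_number = 0;
  ttime = 0;
  for (i = 0; i < seconds.size(); i++) {
    ttime += seconds[i];
    if (ttime > capacity) {      // This song does not fit: start a new CD.
      cd_number++;
      ttime = seconds[i];
    }
    cd.push_back(cd_number);
  }
  return cd;
}
```

A caller would pass 80*60 as the capacity to get the same grouping that cd_partition prints.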

Prototypes

Take a look at cd_partition_below.cpp. It has moved the definition of the procedure from before main() to after main(). This causes a compilation error from the C++ compiler:
UNIX> g++ -o cd_partition_below  cd_partition_below.cpp
cd_partition_below.cpp: In function 'int main()':
cd_partition_below.cpp:27: error: 'timing_seconds' was not declared in this scope
UNIX> 
The C++ compiler is strict -- it must see a declaration of a procedure before any call to it. Defining the procedure above its first use satisfies that, but there are times when you can't arrange it, and to handle that, you can declare a placeholder for the procedure. This is called a prototype -- it is like the procedure definition, except you only include the first line and you end it with a semicolon. The example is in cd_partition_prototype.cpp; only the prototype is shown here:

int timing_seconds(string s);

Sometimes you see the prototype preceded by the word extern. This means that the procedure definition may be in another file. That's convenient.

Sometimes you see the prototype preceded by the word static. This means that the procedure definition is most definitely in this file, and other files cannot use it. That's convenient too, sometimes.
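As a sketch of how a prototype lets the definition come after its first use, consider the fragment below. The procedure fits_on_cd() is hypothetical and is there only to give timing_seconds() a caller that appears earlier in the file than its definition.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Prototype: gives the compiler the name, parameter type and return type
// of timing_seconds() before its definition appears.
int timing_seconds(string s);

// Hypothetical caller, defined before timing_seconds().  It compiles
// because of the prototype above.
bool fits_on_cd(string timing)
{
  int sec;

  sec = timing_seconds(timing);
  return (sec >= 0 && sec <= 80*60);
}

// The definition can now come after its first use.
int timing_seconds(string s)
{
  string::size_type colon;
  istringstream ss;
  int min, sec;

  colon = s.find(':');
  if (colon == string::npos) return -1;
  s[colon] = ' ';
  ss.str(s);
  if (!(ss >> min >> sec)) return -1;
  return sec + min*60;
}
```

Removing the prototype line reproduces the "was not declared in this scope" error, this time inside fits_on_cd().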


Reference Parameters in C++: Swallowing the Red Pill

This text comes from 2009, when I was still coming to grips with reference parameters. It reads well though, so I'm keeping it verbatim.

Reference parameters appall me, but they are ubiquitous in the STL, so unlike previous years, I will not pretend that they don't exist, but instead address them head-on.

You may declare a procedure parameter to take a reference to a variable. You do this by putting an ampersand (&) between the parameter's type and its name. When you do this, a reference is passed to the parameter rather than a copy, which means that if you change the parameter's value, it will change that value in the caller as well.

Let's see an example in refparam.cpp:

#include <cstdio>
#include <cstdlib>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;

void add_5_to_i_non_ref(int i)
{
  i += 5;
}

void add_5_to_i_ref(int &i)
{
  i += 5;
}

int main(int argc, char **argv)
{
  int i;
  istringstream ss;

  if (argc != 2) {
    cerr << "usage: refparam number\n";
    exit(1);
  }

  ss.str(argv[1]);
  if (!(ss >> i)) {
    cerr << "usage: refparam number\n";
    exit(1);
  }

  cout << "I is " << i << endl;

  add_5_to_i_non_ref(i);
  cout << "After calling add_5_to_i_non_ref(i).  I is " << i << endl;
  
  add_5_to_i_ref(i);
  cout << "After calling add_5_to_i_ref(i).      I is " << i << endl;
  
  cout << "I feel ill." << endl;
  exit(0);
}

As you can see, the procedures add_5_to_i_non_ref(i) and add_5_to_i_ref(i) are identical except that add_5_to_i_ref(i) declares i as a reference parameter. That explains the output:

UNIX> refparam 3
I is 3
After calling add_5_to_i_non_ref(i).  I is 3
After calling add_5_to_i_ref(i).      I is 8
I feel ill.
UNIX> 
Since add_5_to_i_ref(i) declares i as a reference parameter, the value of i is changed in main().

I believe that reference parameters are used for convenience, when you want to pass an object to a procedure that is not going to modify it, and evidently you are too lazy to pass a pointer. Then reference parameters are nice because you get efficiency and you can live in denial that pointers exist for a reason.

Why do they make me sick? Because when I see a procedure call like:

  add_5_to_i_ref(i);
I believe that i will not change, and that is enforced by a well-designed language like C. I don't want to have to go hunting down the prototype of add_5_to_i_ref() to see that i's value might get changed from under me. It is an atrocity.

End of 2009 text.
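The objection in the 2009 text is easier to see next to the pointer-based alternative that it has in mind. This is a sketch: add_5_to_i_ptr() is a hypothetical name that does not appear in refparam.cpp.

```cpp
// Hypothetical pointer version: the caller must pass an address, so the
// call site itself shows that i may be modified.
void add_5_to_i_ptr(int *i)
{
  *i += 5;
}

// Reference version from refparam.cpp, for comparison.
void add_5_to_i_ref(int &i)
{
  i += 5;
}
```

The two calls read add_5_to_i_ptr(&i) and add_5_to_i_ref(i). Both add 5 to the caller's i, but only the first one says so at the point of the call.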

Now, acceptance is often the first part of recovery, so since I first started programming in C++ (2005 or so), I have grown to accept reference parameters for what they are. There are reasons to use reference parameters, and there are reasons not to use them. The reasons to use them pretty much boil down to one thing:

You want the convenience of using a procedure, but you don't want to make copies, because that is expensive.

I'll give you two examples to illustrate.


Example #1: Using a procedure to calculate a function of a large vector

Take a look at total_etc_1.cpp:

#include <cstdlib>
#include <cstdio>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

double total(vector <double> v)
{
  int i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}

double avg(vector <double> v)
{
  double size;

  size = v.size();
  return total(v)/size;
}

double max(vector <double> v)
{
  int i;
  double mx;

  mx = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] > mx) mx = v[i];
  return mx;
}

double min(vector <double> v)
{
  int i;
  double mn;

  mn = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] < mn) mn = v[i];
  return mn;
}

int main(int argc, char **argv)
{
  int i;
  int n, seed;
  istringstream ss;
  vector <double> v;

  if (argc != 3) {
    cerr << "usage: total_etc_1 number-of-elements seed\n";
    exit(1);
  }

  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: total_etc_1 number-of-elements seed\n"; exit(1); }

  ss.clear(); ss.str(argv[2]);
  if (!(ss >> seed)) { cerr << "usage: total_etc_1 number-of-elements seed\n"; exit(1); }

  if (n <= 0) exit(0);

  srand48(seed);

  for (i = 0; i < n; i++) v.push_back(drand48());

  if (n < 10) {
    for (i = 0; i < n; i++) printf("%6.4lf\n", v[i]);
    printf("\n");
  }

  printf("Total: %12.4lf\n", total(v));
  printf("Avg:   %12.4lf\n", avg(v));
  printf("Max:   %12.4lf\n", max(v));
  printf("Min:   %12.4lf\n", min(v));
  exit(0);
}

This is a program that generates a given number of random doubles, then prints out their total, average, max and min, using procedures to calculate each of them. This is a very natural task -- one that lends itself very cleanly to procedures. First, as always, you should make sure it actually works on a small input value like 3:

UNIX> total_etc_1 3 1
0.0416
0.4545
0.8348

Total:       1.3309
Avg:         0.4436
Max:         0.8348
Min:         0.0416
UNIX>
You can confirm by hand that all those values are correct. Now, let's create a second program, total_etc_2.cpp, that uses reference parameters in the definition of total(), avg(), max() and min(). When we call both on a very large value like 50,000,000, you can see a significant difference in the running times (this is on my Linux box in January, 2011):
UNIX> time total_etc_1 50000000 1
Total: 25000209.0714
Avg:         0.5000
Max:         1.0000
Min:         0.0000
2.720u 2.450s 0:05.24 98.6%	0+0k 208+0io 2pf+0w
UNIX> time total_etc_2 50000000 1
Total: 25000209.0714
Avg:         0.5000
Max:         1.0000
Min:         0.0000
1.770u 0.950s 0:02.72 100.0%	0+0k 0+0io 0pf+0w
UNIX> 
The second one runs in 2.72 seconds, while the first runs in 5.24 seconds. That's nearly half (51.9%) the running time, and is significant. Why? Because v contains 50 million doubles, which is roughly 400 MB of storage. Making copies takes a significant amount of time and memory.
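These notes do not list total_etc_2.cpp, so the following is only a sketch of what reference-parameter versions of two of the procedures might look like. The const qualifier is an addition of this sketch rather than something the notes state: it lets the compiler reject any accidental modification of v inside the procedure, which a plain reference would allow.

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// v refers to the caller's vector; no copy of its elements is made.
double total(const vector <double> &v)
{
  size_t i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}

// Passing v along to total() does not copy it either.
double avg(const vector <double> &v)
{
  return total(v) / v.size();
}
```

The calls in main() do not change at all: total(v) and avg(v) look exactly the same whether the parameter is a copy or a reference.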

Example #2: Using a procedure to modify a large amount of data

Take a look at rev_1.cpp:

#include <cstdlib>
#include <cstdio>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

void reverse(vector <double> &v)
{
  double tmp;
  int i;

  for (i = 0; i < v.size()/2; i++) {
    tmp = v[i];
    v[i] = v[v.size()-i-1];
    v[v.size()-i-1] = tmp;
  }
}

int main(int argc, char **argv)
{
  int i;
  int n, seed;
  istringstream ss;
  vector <double> v;

  if (argc != 3) {
    cerr << "usage: rev_1 number-of-elements seed\n";
    exit(1);
  }

  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: rev_1 number-of-elements seed\n"; exit(1); }

  ss.clear(); ss.str(argv[2]);
  if (!(ss >> seed)) { cerr << "usage: rev_1 number-of-elements seed\n"; exit(1); }

  if (n <= 0) exit(0);

  srand48(seed);

  for (i = 0; i < n; i++) v.push_back(drand48());

  if (n < 10) {
    for (i = 0; i < n; i++) printf("%6.4lf\n", v[i]);
  }

  reverse(v);
  printf("\n");

  if (n < 10) {
    for (i = 0; i < n; i++) printf("%6.4lf\n", v[i]);
  }
  exit(0);
}
  

This is very much like the previous program, only the procedure reverses the values in the vector. It uses a reference parameter -- otherwise, the values in v would be unmodified after the call. We can verify that it works with a simple example:

UNIX> rev_1 3 1
0.0416
0.4545
0.8348

0.8348
0.4545
0.0416
UNIX> 
Suppose we didn't want to use a reference parameter (and we didn't know about pointers yet). Then we'd have to write something like the following for reverse:

vector <double> reverse(vector <double> v)
{
  vector <double> rv;
  int i;

  for (i = v.size()-1; i >= 0; i--) rv.push_back(v[i]);
  return rv;
}

And we'd call it with:

  v = reverse(v);

This is done in the program rev_2.cpp. Again, we can verify that it works with a small value:

UNIX> rev_2 4 1
0.0416
0.4545
0.8348
0.3360

0.3360
0.8348
0.4545
0.0416
UNIX> 
As before, let's time them on 50 million doubles:
UNIX> time rev_1 50000000 1

1.520u 0.970s 0:03.05 81.6%	0+0k 0+0io 0pf+0w
UNIX> time rev_2 50000000 1

2.000u 2.230s 0:08.16 51.8%	0+0k 328+8io 2pf+0w
UNIX> 
Once again, the copies matter. Using the reference parameter (rev_1) takes 37.4 percent of the time that making the copy and returning the reversed vector (rev_2) takes.
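The copies can also be observed directly instead of being inferred from running times. The sketch below uses a hypothetical type, Tracked, whose copy constructor increments a counter; none of these names come from the lecture's programs.

```cpp
// Hypothetical type that counts how many times it is copied.
struct Tracked {
  static int copies;
  Tracked() {}
  Tracked(const Tracked &) { copies++; }
};

int Tracked::copies = 0;

// The parameter is a copy of the caller's object.
void by_value(Tracked t) { (void) t; }

// The parameter refers to the caller's object; nothing is copied.
void by_reference(Tracked &t) { (void) t; }
```

Calling by_reference(x) leaves Tracked::copies unchanged, while each call to by_value(x) adds one to it. For a vector of 50 million doubles, that one copy is the 400 MB that the timings above are measuring.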

Caution about using reference parameters

If you use reference parameters, it should be for the right reason (the one above). There is no reason to use a reference parameter for a small value like an integer or a double. None. You use it when your data structure (like those vectors) can be large.