When you are writing/using procedures, there are some things that you should keep in mind about how parameters are passed. That is the focus of this lecture.
I'm going to motivate this with an example that comes from my life. My wife and I have our phones store pictures into the same Dropbox folder. That's convenient, because I can get them on my laptop easily. There is an issue, though -- my wife's phone names the files one way, and my phone names them another. In the file data/filenames.txt, there is a listing of a few pictures from July and August, 2019.
2019-07-13 00.09.11.jpg
2019-07-14 21.57.36.jpg
2019-07-17 09.28.48.jpg
2019-07-26 19.06.20.jpg
2019-07-29 17.51.39.jpg
2019-07-31 16.02.56.jpg
2019-07-31 16.04.31.jpg
2019-07-31 16.05.08.jpg
2019-08-04 21.35.23.jpg
2019-08-05 14.10.06.jpg
2019-08-06 15.31.40.jpg
Photo Aug 02, 10 22 21 PM.jpg
Photo Aug 03, 9 25 04 PM.jpg
Photo Aug 04, 3 03 33 PM.jpg
Photo Aug 10, 9 56 09 AM.jpg
Photo Aug 10, 9 59 07 AM.jpg
Photo Aug 21, 9 27 02 AM.jpg
Photo Aug 21, 9 35 04 AM.jpg
Photo Jul 13, 9 57 54 AM.jpg
Photo Jul 19, 10 35 39 PM.jpg
We're going to write a program that reads filenames on standard input and does one of two things with each. If the filename begins with "2019", it emits the shell command to move the file to one with the same name, but with the space replaced by a hyphen. If the filename begins with "Photo", it emits the shell command that converts it into the same format as the "2019" files. Here's the C++ code -- the procedures make it all quite easy to read, I think. (It's in src/move_filenames_1.cpp).
#include <cstdlib>
#include <cstdio>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;

/* This emits the shell command to move the files that start with 2019.
   We implement it by finding the space in the line.  Then we print "mv 'line'",
   change the space to a hyphen, and finally print " line\n". */

void move_2019(string line)
{
  size_t i;     // find() returns a size_t, so that is what we compare with string::npos.

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }

  printf("mv '%s'", line.c_str());
  line[i] = '-';
  printf(" %s\n", line.c_str());
}

/* move_photo() assumes that the line is like "Photo Aug 04, 3 03 33 PM.jpg".
   It uses an istringstream to read all of those values, and then emits the
   shell command to move the photo to one like "2019-08-04-15.03.33.jpg" */

void move_photo(string line)
{
  string p, month, comma, ampm;
  int day, hour, minute, second;
  int m;
  istringstream ss;

  ss.str(line);
  if (ss >> p >> month >> day >> comma >> hour >> minute >> second >> ampm) {
    m = (month == "Jul") ? 7 : 8;
    if (ampm.substr(0, 2) == "PM") hour += 12;
    printf("mv '%s' 2019-%02d-%02d-%02d.%02d.%02d.jpg\n", line.c_str(),
           m, day, hour, minute, second);
  } else {
    fprintf(stderr, "move_photo() - line in the wrong format.\n");
    exit(1);
  }
}

/* The main is pretty simple.  It reads in a line of text, and if the line
   begins with "2019", it calls move_2019().  Otherwise, it calls move_photo(). */

int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);
    } else {
      move_photo(l);
    }
  }
  return 0;
}
I'm going to highlight just a few things about this program. First, move_2019() uses the find() method of strings to find the space in the string. If there's no space, then find() returns "string::npos". You may not have known that, so now you do.
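Here is a minimal sketch of that behavior. The helper name replace_first_space() is made up for illustration and is not part of the lecture's files. Note that find() returns a size_t, so it is safest to store the result in a size_t (rather than an int) before comparing it with string::npos.

```cpp
#include <string>
using namespace std;

/* Hypothetical helper: returns a copy of s with its first space changed to
   a hyphen.  If s has no space, find() returns string::npos and s comes
   back unchanged. */

string replace_first_space(string s)
{
  size_t i;               // find() returns a size_t, not an int.

  i = s.find(' ');
  if (i == string::npos) return s;
  s[i] = '-';
  return s;
}
```

Calling replace_first_space("2019-07-13 00.09.11.jpg") gives back "2019-07-13-00.09.11.jpg", while a string with no space is returned as-is.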
Second, both move_photo() and main() use the substr() method of strings to extract a substring from a string. The first argument is the starting character of the substring, and the second argument is the size. That's how you check that ampm starts with "PM".
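As a small sketch of substr() in isolation (the helper name begins_with() is hypothetical, not something from the lecture's files): if the requested size runs past the end of the string, substr() simply returns the characters that are there, so this kind of prefix check is safe even on strings shorter than the prefix.

```cpp
#include <string>
using namespace std;

/* Hypothetical helper: returns true if s begins with prefix.
   substr(0, n) returns at most n characters starting at index 0. */

bool begins_with(string s, string prefix)
{
  return s.substr(0, prefix.size()) == prefix;
}
```

For example, begins_with("PM.jpg", "PM") is true, and begins_with("20", "2019") is false rather than an error.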
Third, you'll note that in move_photo() I explicitly read the comma after the day. If I didn't do that, then the rest of the "ss >> p >> ..." would fail.
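To see why the comma matters, here is a sketch with two hypothetical helpers (neither is in the lecture's files). Both try to read the hour from a "Photo" line and return -1 if the stream fails. The first reads the comma into a string; the second does not, so the extraction of the hour runs into the ',' character and fails.

```cpp
#include <sstream>
#include <string>
using namespace std;

/* Hypothetical helper: reads the comma into a string, as move_photo() does. */

int hour_reading_comma(string line)
{
  string p, month, comma;
  int day, hour;
  istringstream ss;

  ss.str(line);
  if (ss >> p >> month >> day >> comma >> hour) return hour;
  return -1;
}

/* Hypothetical helper: skips the comma.  After reading the day, the next
   character is ',', which is not part of an integer, so reading hour fails. */

int hour_skipping_comma(string line)
{
  string p, month;
  int day, hour;
  istringstream ss;

  ss.str(line);
  if (ss >> p >> month >> day >> hour) return hour;
  return -1;
}
```

On the line "Photo Aug 04, 3 03 33 PM.jpg", the first helper returns 3 and the second returns -1.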
This works very nicely, and enables me to have all of the photo files have the same format:
UNIX> bin/move_filenames_1 < data/filenames.txt
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
mv '2019-07-14 21.57.36.jpg' 2019-07-14-21.57.36.jpg
mv '2019-07-17 09.28.48.jpg' 2019-07-17-09.28.48.jpg
mv '2019-07-26 19.06.20.jpg' 2019-07-26-19.06.20.jpg
mv '2019-07-29 17.51.39.jpg' 2019-07-29-17.51.39.jpg
mv '2019-07-31 16.02.56.jpg' 2019-07-31-16.02.56.jpg
mv '2019-07-31 16.04.31.jpg' 2019-07-31-16.04.31.jpg
mv '2019-07-31 16.05.08.jpg' 2019-07-31-16.05.08.jpg
mv '2019-08-04 21.35.23.jpg' 2019-08-04-21.35.23.jpg
mv '2019-08-05 14.10.06.jpg' 2019-08-05-14.10.06.jpg
mv '2019-08-06 15.31.40.jpg' 2019-08-06-15.31.40.jpg
mv 'Photo Aug 02, 10 22 21 PM.jpg' 2019-08-02-22.22.21.jpg
mv 'Photo Aug 03, 9 25 04 PM.jpg' 2019-08-03-21.25.04.jpg
mv 'Photo Aug 04, 3 03 33 PM.jpg' 2019-08-04-15.03.33.jpg
mv 'Photo Aug 10, 9 56 09 AM.jpg' 2019-08-10-09.56.09.jpg
mv 'Photo Aug 10, 9 59 07 AM.jpg' 2019-08-10-09.59.07.jpg
mv 'Photo Aug 21, 9 27 02 AM.jpg' 2019-08-21-09.27.02.jpg
mv 'Photo Aug 21, 9 35 04 AM.jpg' 2019-08-21-09.35.04.jpg
mv 'Photo Jul 13, 9 57 54 AM.jpg' 2019-07-13-09.57.54.jpg
mv 'Photo Jul 19, 10 35 39 PM.jpg' 2019-07-19-22.35.39.jpg
UNIX>
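One case that this sample data never exercises is the hour 12. Adding 12 to every PM hour would turn a 12 PM photo into hour 24, and a 12 AM photo would stay at 12 instead of becoming 0. None of the filenames above hit that case, so the output is correct for them. The following is only a sketch of a conversion that covers it, assuming the phone follows the usual 12-hour clock convention; the helper name to_24_hour() is hypothetical and is not in src/move_filenames_1.cpp.

```cpp
#include <string>
using namespace std;

/* Hypothetical helper: converts a 12-hour clock hour to a 24-hour one.
   ampm is the word read from the filename, such as "PM.jpg", so we only
   look at its first two characters. */

int to_24_hour(int hour, string ampm)
{
  bool pm;

  pm = (ampm.substr(0, 2) == "PM");
  if (hour == 12) return pm ? 12 : 0;    // 12 PM is noon; 12 AM is midnight.
  return pm ? hour + 12 : hour;
}
```

With this helper, to_24_hour(10, "PM.jpg") is 22 and to_24_hour(9, "AM.jpg") is 9, matching the output above, while to_24_hour(12, "PM.jpg") is 12 and to_24_hour(12, "AM.jpg") is 0.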
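In src/move_filenames_2.cpp, main() calls move_2019() and move_photo() before the compiler has seen them declared, so the program fails to compile. Here it is on my Mac: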
UNIX> g++ src/move_filenames_2.cpp
move_filenames_2.cpp:16:7: error: use of undeclared identifier 'move_2019'
      move_2019(l);
      ^
move_filenames_2.cpp:18:7: error: use of undeclared identifier 'move_photo'
      move_photo(l);
      ^
2 errors generated.
UNIX>

And here it is on one of the hydra machines:
UNIX> g++ src/move_filenames_2.cpp
move_filenames_2.cpp:16:7: error: 'move_2019' was not declared in this scope
      move_2019(l);
      ^
move_filenames_2.cpp:18:7: error: 'move_photo' was not declared in this scope
      move_photo(l);
      ^
UNIX>

When you don't define a procedure before it is used (or you define it in another file), you need to specify its prototype. The prototype is the part of the procedure before the opening curly brace, followed by a semicolon instead of the curly brace.
I have done this in src/move_filenames_3.cpp. Here are the two lines that I added before main():
void move_2019(string line);
void move_photo(string line);
The compiler works fine on this program.
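Here is a small self-contained sketch of the same idea. The procedure names add_hyphen() and shell_command() are hypothetical and do not appear in the lecture's files. The prototype lets shell_command() call add_hyphen() even though add_hyphen() is defined later in the file.

```cpp
#include <string>
using namespace std;

/* The prototype: the procedure's header, followed by a semicolon. */

string add_hyphen(string line);

/* This compiles because the compiler has already seen the prototype above. */

string shell_command(string line)
{
  return "mv '" + line + "' " + add_hyphen(line);
}

/* The definition comes after its first use. */

string add_hyphen(string line)
{
  size_t i;

  i = line.find(' ');
  if (i != string::npos) line[i] = '-';
  return line;
}
```

If you delete the prototype, the compiler reports that add_hyphen was not declared at the point where shell_command() calls it, which is the same kind of error as the ones shown above.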
Just to hammer home that move_2019() works on a copy of its argument, src/move_filenames_4.cpp is the same as src/move_filenames_1.cpp, except that in main(), at the end of the while loop, it prints the original filename. Here's the main(), which is the only part that differs from src/move_filenames_1.cpp:
int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);
    } else {
      move_photo(l);
    }
    printf("The line of text: %s\n", l.c_str());     // Here's the new line of code.
  }
  return 0;
}
Let's run it on the first line of data/filenames.txt, just to confirm that we know what it's doing:
UNIX> head -n 1 data/filenames.txt | bin/move_filenames_4
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13 00.09.11.jpg
UNIX>

Now, in C++, we are allowed to specify that we want to pass a parameter using "call by reference." This is done by putting an ampersand in front of the parameter's name in the procedure declaration. When this happens, the parameter is not copied; instead, the procedure acts directly on the variable it was called with. Let me illustrate by changing move_2019() to use a reference parameter, in src/move_filenames_5.cpp:
void move_2019(string &line)      // This line is the only change.
{
  size_t i;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }

  printf("mv '%s'", line.c_str());
  line[i] = '-';
  printf(" %s\n", line.c_str());
}
Here's where it's called, in main:
int main()
{
  string l;

  while (getline(cin, l)) {
    if (l.substr(0, 4) == "2019") {
      move_2019(l);     // move_2019() acts on l and not a copy, so when it changes l, l is changed.
    } else {
      move_photo(l);
    }
    printf("The line of text: %s\n", l.c_str());
  }
  return 0;
}
When we run it, you'll notice that now the line of text is changed after the call to move_2019() (the changed line is the last line of output: the space has been changed to a hyphen):
UNIX> head -n 1 data/filenames.txt | bin/move_filenames_5
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13-00.09.11.jpg
UNIX>

I understand that this is a subtle change, but it's an important one for you to know. Now, why do we have reference parameters? Two reasons:
void move_photo(const string &line)      // This is the only line that is changed.
Now, when we call it, it doesn't make a copy, but because of the const keyword, the compiler has assured us that move_photo() has not changed line. I won't bother running it, because you won't see anything exciting.
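To put the three kinds of parameter side by side, here is a minimal sketch. The procedure names are hypothetical and are not from the lecture's files; the point is only what each one can do to the caller's string.

```cpp
#include <string>
using namespace std;

/* By value: line is a copy, so the caller's string is never changed. */

void hyphenate_copy(string line)
{
  size_t i;

  i = line.find(' ');
  if (i != string::npos) line[i] = '-';   // Changes the copy only.
}

/* By reference: line is the caller's string, so the change is visible
   to the caller after the procedure returns. */

void hyphenate_in_place(string &line)
{
  size_t i;

  i = line.find(' ');
  if (i != string::npos) line[i] = '-';
}

/* By const reference: no copy is made, and the compiler rejects any
   attempt to modify line inside the procedure. */

size_t count_spaces(const string &line)
{
  size_t i, n;

  n = 0;
  for (i = 0; i < line.size(); i++) if (line[i] == ' ') n++;
  return n;
}
```

If a string s holds "a b", then after hyphenate_copy(s) it still holds "a b", and after hyphenate_in_place(s) it holds "a-b". count_spaces() can read its argument but cannot change it.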
If I try to have move_2019() use the const keyword, as in src/move_filenames_7.cpp, the compiler will give me an error, because move_2019() does, in fact, change its parameter:
void move_2019(const string &line)      // This is the only line that has changed.
{
  size_t i;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }

  printf("mv '%s'", line.c_str());
  line[i] = '-';            // Here's where we modify line, and why the compiler complains.
  printf(" %s\n", line.c_str());
}
Here's what happens when we try to compile:
UNIX> g++ src/move_filenames_7.cpp
move_filenames_7.cpp:20:11: error: cannot assign to return value because function 'operator[]' returns a const value
  line[i] = '-';            // Here's where we modify line, and why the compiler complains.
  ~~~~~~~ ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/.bin/include/c++/v1/string:1460:31: note: function 'operator[]' which returns const-qualified type 'const_reference' (aka 'const char &') declared here
    _LIBCPP_INLINE_VISIBILITY const_reference operator[](size_type __pos) const;
                              ^~~~~~~~~~~~~~~
1 error generated.
UNIX>

Let's change move_2019() so that it doesn't modify its parameter. Here, we'll use the substr() method of strings to make copies of line before and after the space. This is in src/move_filenames_8.cpp:
void move_2019(const string &line)      // Since this code doesn't change line, it compiles.
{
  size_t i;
  string w1, w2;

  i = line.find(' ');
  if (i == string::npos) {
    fprintf(stderr, "move_2019() - the line of text has no space.\n");
    exit(1);
  }

  w1 = line.substr(0, i);
  w2 = line.substr(i+1);
  cout << "mv '" << w1 << " " << w2 << "' " << w1 << "-" << w2 << endl;
}
Now, it compiles and runs without changing the original line (you'll see that the space is back in the last line of output):
UNIX> head -n 1 data/filenames.txt | bin/move_filenames_8
mv '2019-07-13 00.09.11.jpg' 2019-07-13-00.09.11.jpg
The line of text: 2019-07-13 00.09.11.jpg
UNIX>

If you want far more text than you care to read about "call by value" and "call by reference," read https://en.wikipedia.org/wiki/Evaluation_strategy.
/* This program defines four procedures -- total(), avg(), max() and min() --
   that return single values calculated from a vector of doubles.  None of
   them use reference parameters, so each one of them makes a copy of the
   vector, which is expensive.  To test this, I shove some randomish values
   into a large vector, and then call all of them. */

#include <cstdlib>
#include <cstdio>
#include <vector>
#include <sstream>
#include <iostream>
using namespace std;

/* This returns the total of the values. */

double total(vector <double> v)
{
  size_t i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}

/* This returns the average of the values.  It does it by calling total to sum the values. */

double avg(vector <double> v)
{
  double size;

  size = v.size();
  return total(v)/size;
}

/* This returns the maximum of the values. */

double max(vector <double> v)
{
  size_t i;
  double mx;

  mx = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] > mx) mx = v[i];
  return mx;
}

/* This returns the minimum of the values. */

double min(vector <double> v)
{
  size_t i;
  double mn;

  mn = v[0];
  for (i = 1; i < v.size(); i++) if (v[i] < mn) mn = v[i];
  return mn;
}
For the main(), I read the size of the vector on the command line, and then create a randomish vector with that many values. Read the code if you care how I create the values -- they are random-ish between 1 and 10. Finally, I call the four procedures and print the results:
/* We call main() with the number of values in the vector.  We then create
   the vector, and call all of the procedures. */

int main(int argc, char **argv)
{
  int i;
  int n;
  istringstream ss;
  vector <double> v;
  double val;

  /* Parse the command line. */

  if (argc != 2) { cerr << "usage: total_etc_1 number-of-elements\n"; exit(1); }
  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: total_etc_1 number-of-elements\n"; exit(1); }

  /* Create the vector -- I'm not using a random number generator here --
     I'm just starting with val = 10*1/7 and repeatedly squaring it, dividing
     by ten when it exceeds ten.  That will keep the values between 1 and 10,
     but looking kinda random. */

  val = 1/7.0 * 10.0;
  for (i = 0; i < n; i++) {
    v.push_back(val);
    val *= val;
    if (val > 10) val /= 10.0;
  }

  /* Print the values if there are fewer than 10 */

  if (n < 10) {
    for (i = 0; i < n; i++) printf("%6.4lf\n", v[i]);
    printf("\n");
  }

  /* Call the procedures and print the results. */

  printf("Total: %12.4lf\n", total(v));
  printf("Avg: %12.4lf\n", avg(v));
  printf("Max: %12.4lf\n", max(v));
  printf("Min: %12.4lf\n", min(v));

  return 0;
}
Let's make sure it works -- we'll call it with a small value of four:
UNIX> bin/total_etc_1 4
1.4286
2.0408
4.1649
1.7347

Total:       9.3690
Avg:       2.3422
Max:       4.1649
Min:       1.4286
UNIX>

We can verify by hand that these are all correct. Now, let's time calling it with a really big value:
UNIX> time bin/total_etc_1 50000000
Total: 195385228.1964
Avg:       3.9077
Max:      10.0000
Min:       1.0000
3.736u 0.808s 0:04.55 99.5% 0+0k 0+0io 0pf+0w
UNIX>

Using "time" has the shell print three times: the user CPU time (3.736u), the system CPU time (0.808s), and the elapsed wall-clock time (0:04.55).
Now, in src/total_etc_2.cpp, I have changed all of the parameters to const reference parameters. For example, here's total():
double total(const vector <double> &v)
{
  size_t i;
  double t;

  t = 0;
  for (i = 0; i < v.size(); i++) t += v[i];
  return t;
}
Now, when total() is called, it doesn't make a copy of v, and the const keyword assures us that total() does not modify v. Therefore, there is no real difference between total_etc_1.cpp and total_etc_2.cpp, except that we're not copying that vector so much. Accordingly, when we run it, it is a lot faster -- nearly a factor of two!
UNIX> time bin/total_etc_2 50000000
Total: 195385228.1964
Avg:       3.9077
Max:      10.0000
Min:       1.0000
2.212u 0.215s 0:02.44 99.1% 0+0k 0+0io 0pf+0w
UNIX>