CS302 Lecture Notes - STL Review with some Topcoder Problems


Recalling the STL

In COSC140/COSC202, we went over several data structures from the C++ Standard Template Library. In this lecture, we go over some examples of vectors, sets and maps, and we also review some string and stream functionality.

Taking the Topcoder out of Topcoder (as of 2020)

Since Topcoder's servers are sporadic, I have made these lecture notes independent of Topcoder. The organization of the code is different from Topcoder as well, because I'd rather adhere to a more structured coding format with include files and compilation into object files. In each case, you can implement this yourself in src/problem.cpp, and the makefile will let you compile it. You can test yourself with scripts/test_problem.sh and check yourself against the correct answers. The outputs should be identical.

SRM 347, D2, 250-pointer: CarBuyer

The problem's original description may be found on Topcoder's web servers at: http://community.topcoder.com/stat?c=problem_statement&pm=7652&rd=10671. I make it a lot pithier in class, but I summarize the problem precisely as follows:

Examples

Example 0:
---------
fuel_price: 2
annual_distance: 5000
years: 2
Car     0: 10000 50 50
Car     1: 12000 500 10
Car     2: 15000 100 65
Car     3: 20000 20 80
Car     4: 25000 10 90
Return value: 10500.00
Example 1:
---------
fuel_price: 8
annual_distance: 25000
years: 10
Car     0: 10000 50 50
Car     1: 12000 500 10
Car     2: 15000 100 65
Car     3: 20000 20 80
Car     4: 25000 10 90
Return value: 45200.0
Example 2:
---------
fuel_price: 33
annual_distance: 8673
years: 64
Car     0: 8426 774 19
Car     1: 29709 325 31
Car     2: 30783 853 68
Car     3: 27726 4 81
Car     4: 20788 369 69
Car     5: 17554 359 34
Car     6: 6264 230 69
Car     7: 14151 420 65
Car     8: 25115 528 70
Car     9: 2050 926 40
Car    10: 18618 714 29
Car    11: 173 358 57
Return value: 254122.444444


Header File and Main

I have included a header file in include/car_buyer.hpp:

/* CS302 Header file for Topcoder SRM 347, D2, 250-Pointer: CarBuyer. */

#include <string>
#include <vector>

class CarBuyer {
  public:
    double lowestCost(const std::vector <std::string> &cars, 
                      int fuel_price, 
                      int annual_distance, 
                      int years);
};

Your job is to implement the class.

I have a skeleton implementation in src/car_buyer.cpp. It simply prints its arguments and returns zero.

I also have a driver program in src/car_buyer_main.cpp. If you call it with an argument of 0, 1 or 2, it does that topcoder example. If you call it with "-", then you enter fuel_price, annual_distance, and years, and then the information for the cars. If you call it with a different numerical argument, then it uses the number to seed a random number generator, and it generates random input, which can be quite large.

You can use the makefile to compile:

UNIX> make bin/car_buyer
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/car_buyer.o src/car_buyer.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/car_buyer_main.o src/car_buyer_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/car_buyer obj/car_buyer.o obj/car_buyer_main.o
UNIX> bin/car_buyer 0   # This is example 0.
fuel_price: 2
annual_distance: 5000
years: 2
Car     0: 10000 50 50
Car     1: 12000 500 10
Car     2: 15000 100 65
Car     3: 20000 20 80
Car     4: 25000 10 90
0.000000                    # It always returns zero.
UNIX> echo 4 100 5 10000 25 28 | bin/car_buyer -  # Here I enter input on standard input:
fuel_price: 4
annual_distance: 100
years: 5
Car     0: 10000 25 28
0.000000
UNIX>
When you type make, it will compile the following programs:

Testing Yourself

If you want to implement this, then simply edit src/car_buyer.cpp. When you're done, you can "git stash" to restore it to what it was (or do what I do: copy this whole directory to a temporary directory, and then delete the directory when you're done).

To test yourself, use the shell script scripts/test_car_buyer.sh. You run it using the name of your executable, and it runs 50 tests, printing only the last line of each test. Here it is on the skeleton implementation:

UNIX> sh scripts/test_car_buyer.sh bin/car_buyer | head -n 5
0.000000
0.000000
0.000000
0.000000
0.000000
UNIX> 
I have the correct output in txt/car_buyer_answers.txt. When you solve this, compare your answer to this using diff or openssl md5.

CarBuyer: A solution with stringstreams

The reason I did this problem was to review stringstreams. Solving this problem is pretty straightforward. First, convert all of those ints to doubles, since you'll be using floating point. Next, for each string in cars, use a stringstream to extract the price, tax and efficiency. Then calculate the cost and store the minimum.

Some smaller points to remember:

Here's the code, in src/car_buyer_stringstream.cpp:

#include "car_buyer.hpp"
#include <string>
#include <vector>
#include <sstream>
using namespace std;

/* This program uses a stringstream to extract the values from each of
   the car strings.  I then calculate the projected cost of each car,
   and return the minimum.  I convert all integers to doubles, because
   we will be performing floating point operations. */

double CarBuyer::lowestCost(const vector <string> &cars, 
                            int fuelPrice, 
                            int annualDistance, 
                            int years)
{
  size_t i;
  istringstream ss;
  double fp, ad, y, price, tax, efficiency, cost, min;
 
  /* Convert all of those integers to doubles. */

  fp = fuelPrice;
  ad = annualDistance; 
  y = years;

  /* Use a stringstream to extract the values from each string, 
     and compute the cost.  Keep track of the minimum cost. */

  min = -1;  /* I don't technically need this line, but some compilers
                will yell at me if I don't have it. */

  for (i = 0; i < cars.size(); i++) {
    ss.clear();
    ss.str(cars[i]);
    ss >> price >> tax >> efficiency;
    cost = price + (y * tax) + (y * ad * fp / efficiency);
    if (i == 0 || cost < min) min = cost;
  }
  return min;
}

Here it is in action. You can verify that it gets the three examples correct, and that it matches txt/car_buyer_answers.txt:

UNIX> make bin/car_buyer_stringstream
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/car_buyer_stringstream.o src/car_buyer_stringstream.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/car_buyer_main.o src/car_buyer_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/car_buyer_stringstream obj/car_buyer_stringstream.o obj/car_buyer_main.o
UNIX> bin/car_buyer_stringstream 0                                                    # Do examples 0, 1 and 2
10500.000000
UNIX> bin/car_buyer_stringstream 1
45200.000000
UNIX> bin/car_buyer_stringstream 2
254122.444444
UNIX> sh scripts/test_car_buyer.sh bin/car_buyer_stringstream | head -n 5             # See the first 5 answers when you run the testing script
10500.000000
45200.000000
254122.444444
2217138.040000
236390.000000
UNIX> sh scripts/test_car_buyer.sh bin/car_buyer_stringstream | openssl md5           # Show that the testing script matches the correct answers.
(stdin)= 88113b1f919987b342786171bca7ff09
UNIX> openssl md5 txt/car_buyer_answers.txt 
MD5(txt/car_buyer_answers.txt)= 88113b1f919987b342786171bca7ff09
UNIX> 

CarBuyer: A solution with sscanf()

In class, I may have shown you how to use sscanf() in place of the stringstream. Here's that code. Since sscanf() does not use reference variables, you have to pass pointers that get filled in. I personally prefer sscanf() to stringstreams, but that's mostly because I'm old and have been using sscanf() for 30+ years. The return value of sscanf() tells you how many matches were made correctly. In this program, that would be three, since Topcoder guarantees that your input is correct. (This code is in src/car_buyer_sscanf.cpp):

#include "car_buyer.hpp"
#include <string>
#include <vector>
#include <cstdio>
using namespace std;

/* This program uses sscanf() instead of the stringstream. */

double CarBuyer::lowestCost(const vector <string> &cars, 
                            int fuelPrice, 
                            int annualDistance, 
                            int years)
{
  size_t i;
  double fp, ad, y, price, tax, efficiency, cost, min;
 
  fp = fuelPrice;
  ad = annualDistance; 
  y = years;
  min = -1;

  for (i = 0; i < cars.size(); i++) {
    sscanf(cars[i].c_str(), "%lf %lf %lf", &price, &tax, &efficiency);
    cost = price + (y * tax) + (y * ad * fp / efficiency);
    if (i == 0 || cost < min) min = cost;
  }
  return min;
}

As you can see, the output of scripts/test_car_buyer.sh, when piped to "openssl md5" is the same as before. That means that the outputs are identical (with exceptionally high probability, if you want to get technical about it):

UNIX> make bin/car_buyer_sscanf
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/car_buyer_sscanf.o src/car_buyer_sscanf.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/car_buyer_sscanf obj/car_buyer_sscanf.o obj/car_buyer_main.o
UNIX> bin/car_buyer_sscanf 0
10500.000000
UNIX> bin/car_buyer_sscanf 1
45200.000000
UNIX> bin/car_buyer_sscanf 2
254122.444444
UNIX> sh scripts/test_car_buyer.sh bin/car_buyer_sscanf | openssl md5
(stdin)= 88113b1f919987b342786171bca7ff09
UNIX> 

CarBuyer: Bottom Lines From This Program

This was just about input processing: Practice with stringstreams and sscanf(). It is good practice for you to write these yourself and test that they're correct.

SRM 551, D2, 250-Pointer: ColorfulBricks

The problem's original description may be found on Topcoder's web servers at: http://community.topcoder.com/stat?c=problem_statement&pm=12136&rd=15173. I will summarize the problem as follows:

Examples

Example 0:
---------
bricks: "ABAB"
Return value: 2

The strings are
"AABB" and "BBAA".
Example 1:
---------
bricks: "AAA"
Return value: 1

There is only one way
to arrange the string,
and it is "nice."
Example 2:
---------
bricks: "WXYZ"
Return value: 0


Header File and Main

The header file for the problem is in include/colorful_bricks.hpp:

/* Header file for Topcoder SRM 551, D2, 250-Pointer: ColorfulBricks */

#include <string>

class ColorfulBricks {
  public:
    int countLayouts(const std::string &bricks);
};

Here are other files for working through this problem:


Solution #1: Sorting the string to count the distinct letters

It should be pretty clear that the solution here is to determine the number of distinct letters in bricks. One distinct letter means one nice string. Two distinct letters mean two nice strings. More than two distinct letters means no nice strings.

Now, determining the number of distinct letters is an easy matter, and there are a lot of ways to write ColorfulBricks. One of the easiest is to sort bricks, and then count the number of times adjacent letters don't equal each other. That's in src/colorful_1_sort.cpp. Remember, you need to include algorithm to use the STL's sort() procedure:

/* ColorfulBricks solution #1: Sort the bricks, and determine how
   many adjacent characters differ. */ 

#include "colorful_bricks.hpp"
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;

int ColorfulBricks::countLayouts(const string &bricks)
{
  size_t i;
  int nc;
  string sorted_bricks; // Since bricks is const, I have to make a copy to sort.

  /* Sort the bricks using sort() from the C++ algorithms library. */

  sorted_bricks = bricks;
  sort(sorted_bricks.begin(), sorted_bricks.end());

  /* Determine how many different characters there are by
     examining adjacent characters in the sorted string. */

  nc =  1;
  for (i = 1; i < sorted_bricks.size(); i++) {
    if (sorted_bricks[i] != sorted_bricks[i-1]) nc++;
  }

  /* There are three outcomes: 
      - One character = one nice string.
      - Two characters = two nice strings.
      - More than two characters = zero nice strings. */

  if (nc == 1) return 1;
  if (nc == 2) return 2;
  return 0;
}

As highlighted in the comments, you have to make a copy of bricks in order to sort it. Keep that in the back of your mind. Let's compile and test:

UNIX> make bin/colorful_1_sort
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/colorful_1_sort.o src/colorful_1_sort.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/colorful_bricks_main.o src/colorful_bricks_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/colorful_1_sort obj/colorful_1_sort.o obj/colorful_bricks_main.o
UNIX> bin/colorful_1_sort 0       # The examples match topcoder.
2                                 # It really doesn't take us much testing to convince
UNIX> bin/colorful_1_sort 1       # ourselves that this is working
1
UNIX> bin/colorful_1_sort 2
0                                 # The test script does 300 tests:
UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_1_sort | wc
300                               # Below we show that the program is correct
UNIX> openssl md5 txt/colorful_bricks_answer.txt 
MD5(txt/colorful_bricks_answer.txt)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_1_sort | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_1_sort > /dev/null"
                       # I'm using bash as my shell, so this is the timing output.
real	0m1.399s        # Does that seem slow or fast to you?
user	0m1.098s
sys	0m0.489s
UNIX> 

Solution #2: Using a set to count distinct letters

Alternatively, you can use a set to store the characters in bricks, and the size of the set will be the number of distinct characters. This is in src/colorful_2_set.cpp:

/* Here we create a set of the characters in bricks.  Since
   sets do not store duplicate elements, the size of the set
   will equal the number of distinct characters. */

#include "colorful_bricks.hpp"
#include <string>
#include <set>
#include <iostream>
using namespace std;

int ColorfulBricks::countLayouts(const string &bricks)
{
  size_t i;
  set <char> s;

  for (i = 0; i < bricks.size(); i++) s.insert(bricks[i]);
  if (s.size() == 1) return 1;
  if (s.size() == 2) return 2;
  return 0;
}

Remember, sets don't store duplicate values and maps don't store duplicate keys -- use multisets and multimaps if you want that functionality. Here, we don't want duplicates, so a set is what we want. We use the MD5 hash of the testing script to determine that this program works the same as src/colorful_1_sort.cpp. It's slower, though:

UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_2_set | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_2_set > /dev/null"
real	0m1.868s
user	0m1.555s
sys	0m0.506s
UNIX> 

Solution #3: Using an unordered_set

We don't care that our data structure is sorted, so we can use an unordered_set instead of a set. That should be faster, because O(n log n) is reduced to O(n). I won't show the code -- it's exactly the same as above, except the set has been changed to an unordered_set. The code is in src/colorful_3_uset.cpp.
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_3_uset > /dev/null"
real	0m1.930s           # I really thought this would be faster. 
user	0m1.619s
sys	0m0.502s
UNIX> 
Can you think of a reason why this solution would be slower than using a set? The only thing I can come up with is that perhaps their initial hash table is decently big, say, 500 elements. That would explain why this is slower than the previous solution. I'm not saying that this is the reason, but it is plausible.

Solution #4: Using a map for convenience

You can use a map, too, instead of a set, which allows you to leverage the syntax where you can treat the map like an associative array. The code is in src/colorful_4_map.cpp:

/* This solution is like the set, but uses a map because the
   syntax is easier.  The map counts occurrences of each 
   character, but that's not used to solve the problem. */

#include "colorful_bricks.hpp"
#include <string>
#include <map>
#include <iostream>
using namespace std;

int ColorfulBricks::countLayouts(const string &bricks)
{
  size_t i;
  map <char, int> s;

  for (i = 0; i < bricks.size(); i++) s[bricks[i]]++;
  if (s.size() == 1) return 1;
  if (s.size() == 2) return 2;
  return 0;
}

It works, but in the timing below it is slower than both the set and the sorting solutions:

UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_4_map | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_4_map > /dev/null"
real	0m2.040s
user	0m1.723s
sys	0m0.508s
UNIX> 

Solutions #5 & #6: Using a vector to count the number of strings

You can instead leverage the fact that there are only 26 potential values of bricks[i], to use a vector. In the code below (src/colorful_5_vec.cpp), we have a vector s, where we set s[i] to one if ('A'+i) is in the string. Then you count up the elements of s which equal one:

/* In this implementation, we maintain an array indexed by characters,
   and set a character's value to 1 when we see it.  At the end, we count
   up the characters that have ones to determine the answer. */

#include "colorful_bricks.hpp"
#include <string>
#include <vector>
#include <iostream>
using namespace std;

int ColorfulBricks::countLayouts(const string &bricks)
{
  size_t i;
  int nc;
  vector <int> s;

  s.resize(26, 0);
  nc = 0;
  for (i = 0; i < bricks.size(); i++) s[bricks[i]-'A'] = 1;
  for (i = 0; i < 26; i++) if (s[i] != 0) nc++;
  if (nc == 1) return 1;
  if (nc == 2) return 2;
  return 0;
}

You can tweak the above to use the characters themselves as indices into s (src/colorful_6_vec.cpp). The maximum character that you'll see is 'Z', so that's why we resize s to be ('Z'+1).

  s.resize('Z'+1, 0);
  nc = 0;
  for (i = 0; i < bricks.size(); i++) s[bricks[i]] = 1;
  for (i = 'A'; i <= 'Z'; i++) if (s[i] != 0) nc++;
  if (nc == 1) return 1;
  if (nc == 2) return 2;
  return 0;
}

I'm not sure which is easier to read -- this program or the last one. They are both about the same to me. They are both correct, though, and run at roughly the same speed:

UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_5_vec | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_6_vec | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_5_vec > /dev/null"
real	0m1.306s
user	0m1.003s
sys	0m0.496s
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_6_vec > /dev/null"
real	0m1.299s
user	0m0.994s
sys	0m0.493s
UNIX> 

Solution #7: No Data Structures -- Just Two Variables

Finally, you can simply use one variable to keep track of the first letter and another to keep track of the second. If you read a character that is different from these, then return zero. This is in src/colorful_7_two_chars.cpp:

/* This is the simplest implementation, data-structure wise -- we just
   keep track of two characters -- the first one that we encounter, and
   the second one that we encounter that is different from the first.
   If we see a third character, we're done and return zero.  Otherwise, 
   we return 1 or 2 depending on whether or not we saw a second character. */

#include "colorful_bricks.hpp"
#include <string>
#include <iostream>
using namespace std;

int ColorfulBricks::countLayouts(const string &bricks)
{
  size_t i;
  int n_distinct;
  char c1, c2;

  /* c1 is the first character.
     c2 is the second character.
     n_distinct is the number of distinct characters we have seen so far. */

  c1 = bricks[0];
  n_distinct = 1;
  
  for (i = 1; i < bricks.size(); i++) {
    if (n_distinct == 1 && bricks[i] != c1) {
      n_distinct = 2;
      c2 = bricks[i];
    } else if (n_distinct == 2 && bricks[i] != c1 && bricks[i] != c2) return 0;
  }

  if (n_distinct == 1) return 1;
  if (n_distinct == 2) return 2;
  return 0;
}

This one, of course, works as well, and it is the fastest, but not by much:

UNIX> sh scripts/test_colorful_bricks.sh bin/colorful_7_two_chars | openssl md5
(stdin)= b3f7a19dc41d8e46a7ffa7cea70daca9
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_7_two_chars > /dev/null"

real	0m1.185s
user	0m0.882s
sys	0m0.486s
UNIX> 

Evaluating all of those solutions

They all work, so which is the best? I'll hedge a bit. First, let's think about running time and space complexity. Suppose that there are n characters in bricks. Then here are the running times of the various solutions:

To create the graph below, I also wrote src/colorful_0_control.cpp, which does nothing. In that way, I can subtract out everything but solving the problem, to better compare the solutions. To be specific, let's consider bin/colorful_1_sort. I first time bin/colorful_0_control, and then bin/colorful_1_sort:
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_0_control > /dev/null"

real	0m1.049s
user	0m0.750s
sys	0m0.488s
UNIX> time sh -c "sh scripts/test_colorful_bricks.sh bin/colorful_1_sort > /dev/null"

real	0m1.398s
user	0m1.090s
sys	0m0.498s
UNIX>
I subtract the user times, to say that bin/colorful_1_sort took 0.340 seconds to solve the problem. Here are all of the solutions:

Back to my hedging -- which one is best? Well, I'm hedging because you need to consider multiple factors when you ask about the "best." Speed is one thing. Memory is another (the last one wins clearly on both counts). Readability is a third (I think the set solution is the most readable). And of course, in Topcoder, speed of programming and avoiding bugs are also important. Since the Topcoder constraints ensure that all implementations will be super-fast (n is less than or equal to 50), these last considerations may be the most important, in which case the one that uses sorting is the best (in my opinion). That's what I used when I solved this one for fun (it took 3 minutes for 246.70 points).

Regardless of which one is "best," you should be able to tell me the running time of all of these, and you should be able to discuss their various characteristics.


Bottom Lines From This Program

There are many ways to solve a problem. The "best" way depends on what is important for the problem at hand, such as:

SRM 353, D1, 250-pointer: Glossary

The problem description is on Topcoder's web site at: http://community.topcoder.com/stat?c=problem_statement&pm=7838&rd=10710, but you can read the description below if you want. I know from experience that y'all hate these problems, because there's so much detail in the formatting. However, specifications are specifications. What I'm going to do is build this one up and show you how I'd attack it. There's a temptation to do everything at once -- build a big data structure and then traverse it and print. That really doesn't work well here. Instead, it's better to go small, and build up, testing as you go.

Here's a summary of the problem:

So, in Topcoder example 0, you have:

terms = {"Canada", "France", "Germany", "Italy", "Japan", "Russia", 
         "United Kingdom", "United States"}

And the output needs to look like:

{"C                    R                  ",
 "-------------------  -------------------",
 "  Canada               Russia           ",
 "F                    U                  ",
 "-------------------  -------------------",
 "  France               United Kingdom   ",
 "G                      United States    ",
 "-------------------                     ",
 "  Germany                               ",
 "I                                       ",
 "-------------------                     ",
 "  Italy                                 ",
 "J                                       ",
 "-------------------                     ",
 "  Japan                                 " }

Example two is another good one. Here, the terms are:

terms = {"AVL tree", "backtracking", "array", "balanced tree", "binary search"}

and the output is:

{"A                                       ",
 "-------------------                     ",
 "  array                                 ",
 "  AVL tree                              ",
 "B                                       ",
 "-------------------                     ",
 "  backtracking                          ",
 "  balanced tree                         ",
 "  binary search                         " }


Header and Main

The header is in include/glossary.hpp:

#include <string>
#include <vector>

class Glossary {
  public:
    std::vector <std::string> buildGlossary(const std::vector <std::string> &items);
};

There is a driver program in src/glossary_main.cpp, which works like the previous ones:

There is a skeleton implementation in src/glossary.cpp, which simply prints the input and exits:
UNIX> bin/glossary 0
Canada
France
Germany
Italy
Japan
Russia
United Kingdom
United States
UNIX> bin/glossary 8
PNiscrus
OxWAPJUF
MycwqFCi
IBsxQTjp
zpIKLEWD
UNIX> 
And there is a testing script and answer file.

Solving the problem

Let's attack the problem:

The first thing to notice is that you have to sort the output, ignoring case. So, the first thing that I'm going to do is read all the terms, convert them to uppercase, and then store them in a map. The key will be the uppercase term, and the val is the original term. Then, I'll print it out to make sure I've got it right. This is in src/glossary_1_sort.cpp:

/* In my first pass at a solution, I create a map from the items.  I
   convert each string to upper-case, and then store that string in 
   the map as a key, with the value being the original string.  I
   print it out at the end. */

#include "glossary.hpp"
#include <map>
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;

vector <string> Glossary::buildGlossary(const vector <string> &items)
{
  size_t i, j;
  string s;
  map <string, string> g;
  map <string, string>::iterator git;
  vector <string> rv;

  for (i = 0; i < items.size(); i++) {
    s = items[i];                            // s is the upper-case string.
    for (j = 0; j < s.size(); j++) {
      if (s[j] >= 'a' && s[j] <= 'z') s[j] += ('A'-'a');
    }
    g[s] = items[i];                         // This puts s and items[i] into the map.
  }

  for (git = g.begin(); git != g.end(); git++) {     // Print the map
    cout << git->first << " " << git->second << endl;
  }
  return rv;
}

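Remember, the line "g[s] = items[i]" has the same effect as "g.insert(make_pair(s, items[i]))" as long as s is not already in the map; if it is, the assignment overwrites the old value, while insert() leaves the old value alone. I return an empty vector so that everything compiles nicely. When we run it, it works as anticipated: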

UNIX> make bin/glossary_1_sort
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_1_sort.o src/glossary_1_sort.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_main.o src/glossary_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/glossary_1_sort obj/glossary_1_sort.o obj/glossary_main.o
UNIX> bin/glossary_1_sort 0
CANADA Canada
FRANCE France
GERMANY Germany
ITALY Italy
JAPAN Japan
RUSSIA Russia
UNITED KINGDOM United Kingdom
UNITED STATES United States
UNIX> bin/glossary_1_sort 8
IBSXQTJP IBsxQTjp
MYCWQFCI MycwqFCi
OXWAPJUF OxWAPJUF
PNISCRUS PNiscrus
ZPIKLEWD zpIKLEWD
UNIX> 
Next, I'm going to add a second map. It has single characters as its keys, and vectors of strings as vals. These are the glossary entries. Since we create this by traversing g, we can be guaranteed that the vectors are in the proper order. The code is in src/glossary_2_keymap.cpp. Here are the relevant code changes.

vector <string> Glossary::buildGlossary(const vector <string> &items)
{
  size_t i, j;
  string s;
  map <string, string> g;                    // Map with upper-case keys, original strings as vals.
  map <string, string>::iterator git;
  map <char, vector <string> > k;            // Map with first letters as keys
  map <char, vector <string> >::iterator kit;
  vector <string> rv;

  /* Omitted code to create the first map... */

  /* Create the second map from the first letters of the upper-case strings: */

  for (git = g.begin(); git != g.end(); git++) {     
    k[git->first[0]].push_back(git->second);
  }

  for (kit = k.begin(); kit != k.end(); kit++) {     // Print the second map
    for (i = 0; i < kit->second.size(); i++) {
      cout << kit->first << " " << kit->second[i] << endl;
    }
  }
  return rv;
}

We print out this new map and test again:

UNIX> make bin/glossary_2_keymap
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_2_keymap.o src/glossary_2_keymap.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_main.o src/glossary_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/glossary_2_keymap obj/glossary_2_keymap.o obj/glossary_main.o
UNIX> bin/glossary_2_keymap 0
C Canada
F France
G Germany
I Italy
J Japan
R Russia
U United Kingdom
U United States
UNIX> bin/glossary_2_keymap 8
I IBsxQTjp
M MycwqFCi
O OxWAPJUF
P PNiscrus
Z zpIKLEWD
UNIX> 
Now, instead of printing out those entries, let's create strings from them that are in the proper format, and then put them into one of two vectors -- a vector for A-M, and a vector for N-Z. This is in src/glossary_3_makestrings.cpp:

/* Here are the new variable declarations: */
  vector <string> atm, ntz;                  // The two string vectors for A-M & N-Z
  vector <string> rv, *v;                    // Return value and a pointer that makes life easier.
  char cs[100];                              // A buffer so we can use sprintf to create strings.


  /* And here is the new code:
     Create the formatted strings for each column.  Put the
     strings onto a vector for each column. */

  for (kit = k.begin(); kit != k.end(); kit++) {

    v = (kit->first < 'N') ? &atm : &ntz;    // V points to the proper string vector.
    s.clear();                               // Create the key string and the dashes.
    s.resize(19, ' ');
    s[0] = kit->first;
    v->push_back(s);
    s.clear();
    s.resize(19, '-');
    v->push_back(s);

    for (i = 0; i < kit->second.size(); i++) {         // Now, for each term, format it and push
      sprintf(cs, "  %-17s", kit->second[i].c_str());  // it onto the string vector.
      s = cs;
      v->push_back(s);
    }
  }

  /* Print the vectors to error check. */

  cout << "A to M:" << endl;
  for (i = 0; i < atm.size(); i++) cout << atm[i] << endl;
  cout << "N to Z:" << endl;
  for (i = 0; i < ntz.size(); i++) cout << ntz[i] << endl;

  return rv;
}

A few things -- first, note how I use a pointer to a vector of strings to make sure that I'm putting the strings onto the correct list. Second, I'm using sprintf() to create the strings for the terms. I do that because it's easier than using an ostringstream. The only subtlety is that I need to make sure the memory is allocated for cs. It could be only 20 characters (19 for the string and one for the null character), but I'm using 100 just to be safe. The string "  %-17s" says to create the string with two spaces, and then pad the argument to 17 characters, left justified. That makes the resulting string 19 characters. Saying "s = cs" creates a C++ string from cs.

When we compile and run it, all looks good. We can pipe the output to cat -e to make sure that all of the strings are 19 characters:

UNIX> make bin/glossary_3_makestrings
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_3_makestrings.o src/glossary_3_makestrings.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_main.o src/glossary_main.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/glossary_3_makestrings obj/glossary_3_makestrings.o obj/glossary_main.o
UNIX> bin/glossary_3_makestrings 0 | cat -e
A to M:$
C                  $
-------------------$
  Canada           $
F                  $
-------------------$
  France           $
G                  $
-------------------$
  Germany          $
I                  $
-------------------$
  Italy            $
J                  $
-------------------$
  Japan            $
N to Z:$
R                  $
-------------------$
  Russia           $
U                  $
-------------------$
  United Kingdom   $
  United States    $
UNIX> bin/glossary_3_makestrings 8 | cat -e
A to M:$
I                  $
-------------------$
  IBsxQTjp         $
M                  $
-------------------$
  MycwqFCi         $
N to Z:$
O                  $
-------------------$
  OxWAPJUF         $
P                  $
-------------------$
  PNiscrus         $
Z                  $
-------------------$
  zpIKLEWD         $
UNIX> 
Finally, let's create rv. The first thing we do is make sure that both atm and ntz are the same size, by adding 19-character strings to the smaller one. Then we simply traverse them and create the strings in rv using string concatenation. The final program is in src/glossary_4_finish.cpp. I only include the loop that creates rv:

  /* Make atm and ntz the same size. */

  s.clear();
  s.resize(19, ' ');
  while (atm.size() < ntz.size()) atm.push_back(s);
  while (ntz.size() < atm.size()) ntz.push_back(s);

  /* Make the final strings. */

  s = "  ";
  for (i = 0; i < atm.size(); i++) {
    rv.push_back(atm[i] + s + ntz[i]);
  }
  return rv;
}

UNIX> make bin/glossary_4_finish
g++ -std=c++11 -Wall -Wextra -Iinclude -c -o obj/glossary_4_finish.o src/glossary_4_finish.cpp
g++ -std=c++11 -Wall -Wextra -Iinclude -o bin/glossary_4_finish obj/glossary_4_finish.o obj/glossary_main.o
UNIX> bin/glossary_4_finish 0 | cat -e
C                    R                  $
-------------------  -------------------$
  Canada               Russia           $
F                    U                  $
-------------------  -------------------$
  France               United Kingdom   $
G                      United States    $
-------------------                     $
  Germany                               $
I                                       $
-------------------                     $
  Italy                                 $
J                                       $
-------------------                     $
  Japan                                 $
UNIX> bin/glossary_4_finish 8 | cat -e
I                    O                  $
-------------------  -------------------$
  IBsxQTjp             OxWAPJUF         $
M                    P                  $
-------------------  -------------------$
  MycwqFCi             PNiscrus         $
                     Z                  $
                     -------------------$
                       zpIKLEWD         $
UNIX> 
Time to submit!!

Bottom Lines From This Program


Review of STL Running Times

Operation                    vector       deque        list           set/map        unordered set/map
---------------------------  -----------  -----------  -------------  -------------  -----------------
Accessing with an index      O(1)         O(1)         Not supported  Not supported  Not supported
  v[i]
Appending                    O(1)         O(1)         O(1)           Not supported  Not supported
  push_back()
Prepending                   O(n)         O(1)         O(1)           Not supported  Not supported
  push_front()
  v.insert(v.begin(),...)
General Insertion            O(n)         O(n)         O(1)           O(log n)       O(1)
Deleting from the back       O(1)         O(1)         O(1)           O(log n)       O(1)
  pop_back()
  v.erase(v.rbegin())
Deleting from the front      O(n)         O(1)         O(1)           O(log n)       O(1)
  pop_front()
  v.erase(v.begin())
General Deletion             O(n)         O(n)         O(1)           O(log n)       O(1)
Finding an element           O(n)         O(n)         O(n)           O(log n)       O(1)
Traversing                   O(n)         O(n)         O(n)           O(n)           O(n)
Clearing                     O(n)         O(n)         O(n)           O(n)           O(n)
  v.clear()
Creating from n elements     O(n)         O(n)         O(n)           O(n log n)     O(n)
  (using a loop with)        push_back()  push_back()  push_back()    insert()       insert()
Sorted                       No           No           No             Yes            No