CS302 Lecture Notes - Dynamic Programming
Example program #5: PageNumbers

James S. Plank

Original Notes: Thu Nov 20 09:06:39 EST 2014
Latest Revision: Mon Nov 9 11:21:10 EST 2020

This is from the 2009 Topcoder Algorithm Qualifier, Round 2, 500-pointer.
Problem Statement.

You can type "make page" to compile the programs for this file.


Description

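In case Topcoder's servers are not working, here is a summary of the problem: A book has pages numbered 1 through N. Return a ten-element vector RV, where RV[d] is the number of times that the digit d appears in all of the page numbers put together. N can be as big as 1,000,000,000.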

Examples

Num         N      RV[0]     RV[1]     RV[2]     RV[3]     RV[4]     RV[5]     RV[6]     RV[7]     RV[8]     RV[9]
---  --------  --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
 0          7          0         1         1         1         1         1         1         1         0         0
Comment: These are pages 1, 2, 3, 4, 5, 6, 7 -- one of each digit.

Num         N      RV[0]     RV[1]     RV[2]     RV[3]     RV[4]     RV[5]     RV[6]     RV[7]     RV[8]     RV[9]
---  --------  --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
 1         11          1         4         1         1         1         1         1         1         1         1
Comment: Now pages 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.  That's why we have four ones.

Num         N      RV[0]     RV[1]     RV[2]     RV[3]     RV[4]     RV[5]     RV[6]     RV[7]     RV[8]     RV[9]
---  --------  --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
 2         19          1        12         2         2         2         2         2         2         2         2
Comment: All of the single digit numbers, and all of the teens.  That's a lot of ones.

Num         N      RV[0]     RV[1]     RV[2]     RV[3]     RV[4]     RV[5]     RV[6]     RV[7]     RV[8]     RV[9]
---  --------  --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
 3        999        189       300       300       300       300       300       300       300       300       300
Comment: There's a nice symmetry here.  Topcoder gave this example to throw you off your game.

Num         N      RV[0]     RV[1]     RV[2]     RV[3]     RV[4]     RV[5]     RV[6]     RV[7]     RV[8]     RV[9]
---  --------  --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
 4  543212345  429904664 541008121 540917467 540117067 533117017 473117011 429904664 429904664 429904664 429904664
Comment: At least there's a big test case to test.


Approach that doesn't work

This was a tricky one: only 20 percent of the coders in the qualification tournament got it. It shows the power of thinking recursively, and of dynamic programming.

The simplest solution would be to loop from 1 to N, call sprintf() and add up the digits. The problem with that solution is that it is linear in N, and N can be as big as 1,000,000,000. So that won't do.

On the flip side, it's useful to program it up so that you can verify your faster solution on small values of N -- that's what I did when I solved this one in real time.
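
The linear-time approach described above might look like the following. This is a minimal sketch rather than code from the lecture's src directory: the function name bruteForceCounts is made up for illustration, and it uses snprintf() in place of sprintf() so that the buffer size is checked.

```cpp
#include <cstdio>
#include <vector>
using namespace std;

/* Brute-force reference (hypothetical name, not part of the notes):
   format every page number from 1 to N and tally its digits.
   This is linear in N, so it is only practical for small N. */

vector <int> bruteForceCounts(int N)
{
  vector <int> rv(10, 0);
  char buf[20];
  int page, i;

  for (page = 1; page <= N; page++) {
    snprintf(buf, sizeof(buf), "%d", page);
    for (i = 0; buf[i] != '\0'; i++) rv[buf[i] - '0']++;
  }
  return rv;
}
```

Calling bruteForceCounts(11) should reproduce example 1 above (one zero, four ones, and one of everything else), which makes it a handy oracle for checking a faster solution.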


Recursion

What we're going to do is structure our recursion on the first digit of the number. Our base case will be when N is a one-digit number. We can solve that case directly. In fact, let's do that first.

As with the other problems in these lecture notes, I solve this incrementally with a number of programs. The first is src/page-numbers-1.cpp. I have a main() in this file which reads N from the command line, and then uses the Topcoder structure to call PageNumbers::getCounts():

#include <string>
#include <vector>
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;

class PageNumbers {
  public:
   vector <int> getCounts(int N);
};

vector <int> PageNumbers::getCounts(int N)
{
  vector <int> rv;
  int i;

  /* We're only solving the base case -- when N is a one-digit number. */

  if (N < 10) {
    rv.resize(10, 0);
    for (i = 1; i <= N; i++) rv[i] = 1;
    return rv;
  }
 
  printf("We haven't solved the problem for N >= 10 yet.\n");
  return rv; 
}

/* Our main() reads N from the command line, and calls getCounts().
   It prints the return vector. */

int main(int argc, char **argv)
{
  size_t i;
  PageNumbers c;
  int N; 
  vector <int> retval;

  if (argc != 2) {
    fprintf(stderr, "usage: PageNumbers N\n");
    exit(1);
  }

  N = atoi(argv[1]);

  retval = c.getCounts(N);
  if (retval.size() == 0) exit(0);
  printf("Answer:");
  for (i = 0; i < retval.size(); i++) printf(" %d", retval[i]);
  cout << endl;

  exit(0);
}

We test it out, and it looks good. That's good for our self-esteem:

UNIX> bin/page-numbers-1 0
Answer: 0 0 0 0 0 0 0 0 0 0
UNIX> bin/page-numbers-1 3
Answer: 0 1 1 1 0 0 0 0 0 0
UNIX> bin/page-numbers-1 9
Answer: 0 1 1 1 1 1 1 1 1 1
UNIX> bin/page-numbers-1 10
We haven't solved the problem for N >= 10 yet.
UNIX> 
Now, suppose N is two digits or more. As I said above, we're going to structure our recursion around the first digit of N. Let's call that first_digit. We're going to use first_digit to define another number, which we'll call middle_number. This number has the same number of digits as N (which we'll call digits), and it has the same first_digit. However, its remaining digits are zeros. Finally, we'll define another number called remainder, which is defined to be (N - middle_number).

Let's give an example. Suppose N is 3659. Then first_digit is 3, digits is 4, middle_number is 3000, and remainder is 659.

Let's write the code to set these variables. That is in src/page-numbers-2.cpp. Here's getCounts().

vector <int> PageNumbers::getCounts(int N)
{
  vector <int> rv;
  int i;
  char buf[20];
  string n_str;
  int first_digit;        /* The first digit of N. */
  int digits;             /* The number of digits in N. */
  int middle_number;      /* This number has the same first digit of N, followed by zeros. */
  int remainder;          /* This is (N-middle_number). */

  /* Base case -- when N is a single-digit number. */

  if (N < 10) {
    rv.resize(10, 0);
    for (i = 1; i <= N; i++) rv[i] = 1;
    return rv;
  }
 
  /* Convert N to a string using sprintf(). */

  sprintf(buf, "%d", N);
  n_str = buf;

  /* Now calculate first_digit, digits, middle_number and remainder. */

  first_digit = n_str[0] - '0';
  digits = n_str.size();
  for (i = 1; i < digits; i++) n_str[i] = '0';
  middle_number = atoi(n_str.c_str());
  remainder = N - middle_number;

  /* Print them out and exit. */

  printf("First digit   = %10d\n", first_digit);
  printf("Digits        = %10d\n", digits);
  printf("Middle number = %10d\n", middle_number);
  printf("Remainder     = %10d\n", remainder);

  return rv;
}

As you can see, I used sprintf() to convert N to a string, and then atoi to create middle_number from the string. You could use stringstreams to do this, or you could use div and mod. It's up to you.
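
For comparison, here is one way the div-and-mod alternative could go. It is only a sketch: the Split struct and the splitNumber() function are hypothetical names introduced for this example, and the code assumes that N is at least 1 and fits in an int.

```cpp
/* Hypothetical helper (not in the original programs): compute the
   four values with division and modulus instead of sprintf()/atoi().
   Assumes N >= 1. */

struct Split {
  int first_digit;
  int digits;
  int middle_number;
  int remainder;
};

Split splitNumber(int N)
{
  Split s;
  int power;           /* Ends up as 10^(digits-1). */

  power = 1;
  s.digits = 1;
  while (N / power >= 10) {
    power *= 10;
    s.digits++;
  }

  s.first_digit = N / power;
  s.middle_number = s.first_digit * power;
  s.remainder = N % power;
  return s;
}
```

With N equal to 3659, the loop stops with power at 1000, so the fields come out as 3, 4, 3000 and 659, matching the sprintf() version. Testing N / power rather than multiplying power past N keeps power from overflowing when N is near 1,000,000,000.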

Again, we test it and see that all is as it should be:

UNIX> bin/page-numbers-2 3659
First digit   =          3
Digits        =          4
Middle number =       3000
Remainder     =        659
Answer:
UNIX> bin/page-numbers-2 987654321
First digit   =          9
Digits        =          9
Middle number =  900000000
Remainder     =   87654321
Answer:
UNIX> bin/page-numbers-2 10
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          0
Answer:
UNIX> 
Now, we're going to split our problem into three subproblems:
  1. Calculate the digit counts for pages 1 to (middle_number-1).
  2. Calculate the digit counts for page middle_number.
  3. Calculate the digit counts for pages (middle_number+1) to N.
Hopefully, you can see that the first one is a simple call to getCounts(middle_number-1). The second one you can do directly from first_digit and the size of n_str.

The third one is a little trickier, so let's solve the second one first and test it. That code is in src/page-numbers-3.cpp:

  /* Calculate the answer for middle_number and return it.  */

  rv.resize(10, 0);
  rv[first_digit]++;
  for (i = 0; i < digits-1; i++) rv[0]++;

  return rv;
}

We test it, and all looks good:

UNIX> bin/page-numbers-3 3659
First digit   =          3
Digits        =          4
Middle number =       3000
Remainder     =        659
Answer: 3 0 0 1 0 0 0 0 0 0           # This isn't the correct answer.  It's just the digits in 3000.
UNIX> bin/page-numbers-3 987654321
First digit   =          9
Digits        =          9
Middle number =  900000000
Remainder     =   87654321
Answer: 8 0 0 0 0 0 0 0 0 1           # This isn't the correct answer.  It's just the digits in 900000000.
UNIX> bin/page-numbers-3 10
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          0
Answer: 1 1 0 0 0 0 0 0 0 0           # This isn't the correct answer.  It's just the digits in 10.
UNIX> 
Ok -- let's do the hard case -- solving the problem for the pages from (middle_number+1) to N. First, how many of these numbers are there? The answer is remainder. Second, what digit do they all start with? The answer is first_digit. So, we can add remainder digits whose values are first_digit to the return value, and now we only have to worry about the remaining digits.

These make up a subproblem which is almost like getCounts(). You want to count the digits for all of the pages from 1 to remainder; however, you need to include leading zeros. Think about the case where N is 1002. Then remainder is 2, and you want to solve the subproblem for pages 1001 to 1002. You'll do that by adding two '1' digits, and then you'd like to call getCounts(2). However, you need those four zeros, and getCounts(2) is not going to calculate them.

What you do is use the following observation: You know exactly how many digits are going to be in pages (middle_number+1) to N: (remainder * digits). We've already shown that remainder of these digits are equal to first_digit, which leaves ((digits-1) * remainder) digits to account for. To calculate those, we can call getCounts(remainder). The return vector of that call will have all of the digits except for the leading zeros. Since you know how many total digits there should be, you know that the ones not calculated by the recursive getCounts(remainder) call must be zeros. That lets you solve the problem.
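
The observation can be checked against brute force on small inputs. The sketch below is for illustration only and is not one of the lecture's programs: rangeCounts() and tailCounts() are hypothetical names, and rangeCounts(1, remainder) stands in for the recursive getCounts(remainder) call.

```cpp
#include <cstdio>
#include <vector>
using namespace std;

/* Brute force: tally the digits of pages lo through hi. */

vector <int> rangeCounts(int lo, int hi)
{
  vector <int> rv(10, 0);
  char buf[20];
  int page, i;

  for (page = lo; page <= hi; page++) {
    snprintf(buf, sizeof(buf), "%d", page);
    for (i = 0; buf[i] != '\0'; i++) rv[buf[i] - '0']++;
  }
  return rv;
}

/* Digit counts for pages (middle_number+1) through N, built from
   the counts for pages 1 through remainder plus the missing zeros. */

vector <int> tailCounts(int first_digit, int digits, int remainder)
{
  vector <int> rv;
  int i, d;

  rv = rangeCounts(1, remainder);      /* Stands in for getCounts(remainder). */
  d = 0;
  for (i = 0; i < 10; i++) d += rv[i];

  rv[0] += (digits-1)*remainder - d;   /* The leading zeros. */
  rv[first_digit] += remainder;        /* One first_digit per page. */
  return rv;
}
```

For N equal to 1002, tailCounts(1, 4, 2) and rangeCounts(1001, 1002) should both give four zeros, three ones and a single two.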

Let's use 3659 as an example. We're going to solve the three subproblems as follows:

  1. We'll call getCounts(2999) to get all of the page numbers from 1 to (middle_number-1).
  2. We'll add one to rv[3] and three zeros to rv[0] to account for middle_number.
  3. We'll add 659 to rv[3], and then we'll call getCounts(659) recursively. We'll add up the digits in that return vector and subtract that number from (3*659). That is the number of extra zeros that we add to rv[0].
This code is in src/page-numbers-4.cpp. Here is the relevant code:

vector <int> PageNumbers::getCounts(int N)
{
  vector <int> rv, rv2;  /* I've added rv2 for the recursion. */

  ...

  /* Make the first recursive call to middle_number-1 */

  rv = getCounts(middle_number-1);

  /* Add in the answer for middle_number. */

  rv[first_digit]++;
  for (i = 0; i < digits-1; i++) rv[0]++;

  /* Add the first digit of (middle_number+1) to N: */

  rv[first_digit] += remainder;

  /* Now, call this recursively on remainder, and count up how
     many digits that is.  Subtract this from (digits-1)*remainder 
     to get the number of leading zeros that you're missing.  
     Then add everything to the final return value. */

  rv2 = getCounts(remainder);
  d = 0;
  for (i = 0; i < rv2.size(); i++) d += rv2[i];
  rv[0] += ((digits-1)*remainder - d);
  for (i = 0; i < rv2.size(); i++) rv[i] += rv2[i];

  return rv;
}

We'll test it on examples 1-3 from Topcoder. Example 3, where N equals 999, makes a ton of recursive calls, so I just print out the last line, to confirm that we have the right answer:

UNIX> bin/page-numbers-4 11
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          1
Answer: 1 4 1 1 1 1 1 1 1 1
UNIX> bin/page-numbers-4 19
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          9
Answer: 1 12 2 2 2 2 2 2 2 2
UNIX> bin/page-numbers-4 999 | tail -n 1
Answer: 189 300 300 300 300 300 300 300 300 300
UNIX> bin/page-numbers-4 999 | wc
     397    1397   10740
UNIX> 
Looks like we have to memoize. This turns out to be really easy, because getCounts() only returns its answer in one place after the base case. The final code is in src/page-numbers-5.cpp:

/* Add a cache to PageNumbers */

class PageNumbers {
  public:
   vector <int> getCounts(int N);
   map < int, vector <int> > Cache;    
};

vector <int> PageNumbers::getCounts(int N)
{
  [... Variable declarations]

  /* Base case -- when N is a single-digit number. */

  if (N < 10) {
    rv.resize(10, 0);
    for (i = 1; i <= N; i++) rv[i] = 1;
    return rv;
  }

  /* Get the answer from the Cache if it's there. */

  if (Cache.find(N) != Cache.end()) return Cache[N];

  [... The rest of the code]

  /* Insert the answer into the cache before returning. */

  Cache[N] = rv;

  return rv;
}

UNIX> bin/page-numbers-5 11
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          1
Answer: 1 4 1 1 1 1 1 1 1 1
UNIX> bin/page-numbers-5 19
First digit   =          1
Digits        =          2
Middle number =         10
Remainder     =          9
Answer: 1 12 2 2 2 2 2 2 2 2
UNIX> bin/page-numbers-5 999 | tail -n 1
Answer: 189 300 300 300 300 300 300 300 300 300
UNIX> bin/page-numbers-5 999 | wc
      73     263    1992
UNIX> bin/page-numbers-5 543212345 | tail -n 1
Answer: 429904664 541008121 540917467 540117067 533117017 473117011 429904664 429904664 429904664 429904664
UNIX> bin/page-numbers-5 543212345 | wc
     301    1061    8208
UNIX> 
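
For reference, the pieces can be assembled into one short class. What follows is a consolidated sketch and not the contents of src/page-numbers-5.cpp: it assumes the structure described in these notes, leaves out main() and the tracing printf() calls, and computes middle_number with division rather than sprintf() and atoi(). The variable named power is an addition for this sketch.

```cpp
#include <map>
#include <vector>
using namespace std;

class PageNumbers {
  public:
    vector <int> getCounts(int N);
    map < int, vector <int> > Cache;
};

vector <int> PageNumbers::getCounts(int N)
{
  vector <int> rv, rv2;
  map < int, vector <int> >::iterator cit;
  int i, d, power, digits, first_digit, middle_number, remainder;

  /* Base case -- when N is a single-digit number. */

  if (N < 10) {
    rv.resize(10, 0);
    for (i = 1; i <= N; i++) rv[i] = 1;
    return rv;
  }

  /* Get the answer from the cache if it's there. */

  cit = Cache.find(N);
  if (cit != Cache.end()) return cit->second;

  /* Calculate first_digit, digits, middle_number and remainder. */

  power = 1;
  digits = 1;
  while (N / power >= 10) { power *= 10; digits++; }
  first_digit = N / power;
  middle_number = first_digit * power;
  remainder = N - middle_number;

  /* Pages 1 to (middle_number-1), then middle_number itself. */

  rv = getCounts(middle_number-1);
  rv[first_digit]++;
  rv[0] += digits-1;

  /* Pages (middle_number+1) to N: the first digits, the digits from
     the recursive call, and the leading zeros that the call misses. */

  rv[first_digit] += remainder;
  rv2 = getCounts(remainder);
  d = 0;
  for (i = 0; i < 10; i++) d += rv2[i];
  rv[0] += ((digits-1)*remainder - d);
  for (i = 0; i < 10; i++) rv[i] += rv2[i];

  Cache[N] = rv;
  return rv;
}
```

Declaring a PageNumbers object and calling getCounts(543212345) on it should return the same ten numbers as example 4.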

What's the running time complexity?

Well, your first set of recursive calls is always going to be to numbers of the form d99999... With N capped at 1,000,000,000, there are fewer than 100 of those. The second set of recursive calls is going to be restricted to suffixes of N. For example, when N is 3658, you will make recursive calls to 658, 58 and 8. So, it looks like there will be only 110 or so entries in the cache. That's pretty efficient!
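
One way to check that estimate is to follow the recursion while recording only its arguments. The sketch below does that with a set. The names collectArgs() and cacheEntries() are hypothetical, and the code assumes that getCounts() caches exactly the arguments that are 10 or greater, as in the memoized version above.

```cpp
#include <set>
using namespace std;

/* Visit N and every argument that the recursion would reach from it,
   recording each distinct argument that is >= 10.  Those are the
   ones that would get a cache entry. */

void collectArgs(int N, set <int> &seen)
{
  int power;

  if (N < 10) return;                       /* Base case: never cached. */
  if (seen.find(N) != seen.end()) return;   /* Already visited. */
  seen.insert(N);

  power = 1;
  while (N / power >= 10) power *= 10;

  collectArgs((N / power) * power - 1, seen);   /* middle_number-1 */
  collectArgs(N % power, seen);                 /* remainder */
}

/* Hypothetical wrapper: the number of cache entries for a given N. */

int cacheEntries(int N)
{
  set <int> seen;

  collectArgs(N, seen);
  return (int) seen.size();
}
```

cacheEntries(999) comes to 18 (999 down to 199, then 99 down to 19). That lines up with the 73 lines of output from page-numbers-5 above, if each uncached call prints four lines ahead of the final Answer line.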