CS302 Lecture Notes - Debuggers


If you're going through this as a lab

Log into a lab machine, and do:
UNIX> cp -r /home/jplank/cs302/Notes/GDB .
UNIX> cd GDB
UNIX> make clean ; make

Debuggers

I'll quote Wikipedia on the definition of a debugger:

I only use a debugger in two cases:

  1. 1. My program is seg-faulting, and I can't reason as to why.
  2. 2. I've received code from someone else, and I can't make sense of it.
Neither of these is a good place to be, but sometimes it happens, and a debugger is a good tool to help you, so you should have some basic weapons in your arsenal.

I'll go over gdb, as it's a basic, largely ubiquitous command line debugger. Here's a good source of reference material for gdb: https://sourceware.org/gdb/current/onlinedocs/gdb/index.html.

Before I do so, let me quote some text from Dr. Ruoti, who finds great value in GUI-based debuggers:

I include this because it's good to hear from someone who thinks that debuggers are valuable, and to get some recommendations from them. Also, if you feel as though gdb is clunky, you have support for that feeling above, and can attempt to move on to a GUI-based debugger if you want.

Compiling for GDB

To compile for gdb, use the -g flag. This puts a bunch of scaffolding into your code, which the debugger uses to help you debug. It also makes your code slower, so once you're done debugging, you should compile again without the flag (and with optimization) to make your code fast.

Where's my seg fault happening?

I have a buggy program in src/sf1.cpp. This program reads "events" into a multimap, sorted on time. It treats events that are termed "Arrival" differently from the other events, counting them in addition to putting them onto the multimap. At the end of the program, it prints out the multimap.

#include <iostream>
#include <map>
#include <cstdio>
#include <cstdlib>
using namespace std;

class Event {
  public: 
    string type;
    double time;
};

int main()
{
  multimap <double, Event *> events;
  multimap <double, Event *>::iterator eit;
  Event *e;
  string type;
  double time;
  int arrival_events;

  arrival_events = 0;
  while (cin >> type >> time) {
    if (type != "Arrival") {
      e = new Event;
      e->type = type;
      e->time = time;
      events.insert(make_pair(time, e));
    } else {
      arrival_events++;
      e->type = "Arrival";
      e->time = time;
      events.insert(make_pair(time, e));
    }
  }

  for (eit = events.begin(); eit != events.end(); eit++) {
    e = eit->second;
    printf("%7.4lf %s\n", e->time, e->type.c_str());
  }
  printf("Arrival events: %d\n", arrival_events);

  return 0;
}

The makefile compiles it with the debugging flag:

UNIX> make clean
rm -f a.out bin/*
UNIX> make bin/sf1
g++ -g -o bin/sf1 src/sf1.cpp
UNIX> 
When we run it on txt/input0.txt, it seg faults:
UNIX> bin/sf1 < txt/input0.txt 
Segmentation fault (core dumped)
UNIX> 
Here's a common use for gdb. First, you run the program under the control of gdb:
 
UNIX> gdb bin/sf1
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
# ...
Reading symbols from /home/jplank/cs302/Notes/GDB/bin/sf1...done.
(gdb) run < txt/input0.txt
Starting program: /home/jplank/cs302/Notes/GDB/bin/sf1 < txt/input0.txt

Program received signal SIGSEGV, Segmentation fault.
std::string::assign (this=0x400bd0 <_start>, __s=0x401b00 "Arrival", __n=7)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:262
262       _M_check_length(this->size(), __n, "basic_string::assign");
(gdb)
The output is a little noisy, but you can see that there has been a segmentation violation. It appears to be in line 262 of the file basic_string.tcc, which of course is a file that you don't use directly, but is part of the C++ string library. There won't be a bug there -- however, where it means is that you've made a bug which causes a segmentation violation in their code. The command backtrace will tell you what procedures have been called in the execution path to the segmentation violation:
 
(gdb) backtrace
#0  std::string::assign (this=0x400bd0 <_start>, __s=0x401b00 "Arrival", __n=7)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:262
#1  0x0000000000400dd7 in main () at src/sf1.cpp:32
(gdb)
This is the number one helpful thing about gdb -- your segmentation violation is happening in line 32 of src/sf1.cpp. You can go to your favorite editor and see line 32, but you can also see it in gdb. Your procedure call stack is partitioned into frames, which are numbered in the output to backtrace. Frame 0 is in std::string::assign(). Frame 1 is in your main(). You are currently "at" frame 0. To "go" to frame 1, type "up", or perhaps more clearly, "frame 1".
 
(gdb) frame 1
#1  0x0000000000400dd7 in main () at src/sf1.cpp:32
32            e->type = "Arrival";
(gdb) 
You'll note, it shows you line 32, which looks pretty harmless. How can that line cause a segmentation violation? There is really only one way -- e must be a bad pointer. Often that's all you need to find your bug. You don't need to muck around with a debugger -- figure out why e is a bad pointer.

It shouldn't be hard to realize that you forgot to make e a new Event. But let's have the debugger help us. Maybe. First, you can print the local variables in your frame with "info locals":

 
(gdb) info locals
e = 0x400bd0 <_start>
type = "Arrival"
events = std::multimap with 0 elements
eit = 
time = 80.213791000000001
arrival_events = 1
(gdb) print e->type
That address for e looks suspicious, doesn't it? Of course, hex digits can be anything. You can print individual variables -- that will help us see that e is a bad address:
(gdb) print e
$2 = (Event *) 0x400bd0 <_start>
(gdb) print e->type
$3 = 
(gdb) 
Bingo! Something's wrong with e.

Setting breakpoints and setting variables to take a look at what's going on

Let's confirm that e is being used uninitialized. To do that, we're going to do the following: First, we'll use the "list" command to see numbered lines of main():
(gdb) list main
9     string type;
10      double time;
11  };
12  
13  int main()
14  {
15    multimap  events;
16    multimap ::iterator eit;
17    Event *e;
18    string type;
(gdb) list
19    double time;
20    int arrival_events;
21  
22    arrival_events = 0;
23  
24    while (cin >> type >> time) {
25      if (type != "Arrival") {
26        e = new Event;
27        e->type = type;
28        e->time = time;
(gdb) list
29        events.insert(make_pair(time, e));
30      } else {
31        arrival_events++;
32        e->type = "Arrival";
33        e->time = time;
34        events.insert(make_pair(time, e));
35      }
36    }
37  
38    for (eit = events.begin(); eit != events.end(); eit++) {
(gdb) 
Then, we'll put "breakpoints" at lines 22 and 31:
(gdb) b 22
Breakpoint 1 at 0x400cf5: file src/sf1.cpp, line 22.
(gdb) b 31
Breakpoint 2 at 0x400dc2: file src/sf1.cpp, line 31.
(gdb) 
And run the program on txt/input0.txt again. You'll note that the previous run hasn't completed, so we restart it. That will throw away the current run of the program:
(gdb) run < txt/input0.txt
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/jplank/cs302/Notes/GDB/bin/sf1 < txt/input0.txt

Breakpoint 1, main () at src/sf1.cpp:22
22    arrival_events = 0;
(gdb) 
It has stopped at our breakpoint. We can examine e, which is uninitialized:
(gdb) print e
$4 = (Event *) 0x400bd0 <_start>
(gdb) 
Let's set it to NULL -- you can do that with "set". Of course, the debugger doesn't recognize NULL, but we're smart enough to know that NULL is zero:
(gdb) set var e=0
(gdb) print e
$6 = (Event *) 0x0                           # The dollars sign numbers may differ for you.
(gdb) print e->type
Cannot access memory at address 0x0
(gdb) 
We type "cont" to continue to the next breakpoint, which is where we get the first "Arrival" event. You'll note that events has 0 elements, so this is the first line of input. That means that e is still NULL (which you can see in the "info locals" command):
(gdb) cont
Continuing.

Breakpoint 2, main () at src/sf1.cpp:31
31        arrival_events++;
(gdb) info locals
e = 0x0
type = "Arrival"
events = std::multimap with 0 elements
eit = 
time = 80.213791000000001
arrival_events = 0
(gdb) 
Typing "cont" will now generate the segmentation violation, as you try to set e->type on line 32:
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
std::string::assign (this=0x0, __s=0x401b00 "Arrival", __n=7)
    at /usr/src/debug/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/basic_string.tcc:262
262       _M_check_length(this->size(), __n, "basic_string::assign");
(gdb) 
The segfault made finding that bug easier, didn't it? A corollary to this is that you may want to initialize your pointers to NULL, even though a correct program will never read them. The reason is that if you have a bug, where the pointer gets read uninitialized, you'll find it easier with the segmentation violation.


Setting watchpoints

Now, let's try to find the same bug, but in a more difficult scenario. The input file in txt/input1.txt doesn't start with an "Arrival" event. That means that when you do get to an "Arrival" event, you're going to modify a previous event which is already on the map. The map will print out too many "Arrival" events, and some non-arrival events will be eaten. In particular, the "Ava-Rafael" event on line 6 will be turned into an Arrival event. Let's confirm all of that with some standard Unix command line tools:
UNIX> cat -n txt/input1.txt | grep Arrival
     7  Arrival 66.629794
    13  Arrival 82.201580
    24  Arrival 77.338365
    39  Arrival 84.262522
    69  Arrival 20.367907
UNIX> cat -n txt/input1.txt | sed -n 6p                     # This line is one before an Arrival event.
     6  Ava-Rafael 62.063342
UNIX> bin/sf1 < txt/input1.txt > txt/output1-wrong.txt     
UNIX> bin/sf2 < txt/input1.txt > txt/output1-correct.txt    # sf2 is the correct version of the program
UNIX> cat -n txt/output1-correct.txt | grep Ava-R           # In the correct output, Ava-Rafael is on line 64.
    64  62.0633 Ava-Rafael
UNIX> cat -n txt/output1-wrong.txt | sed -n 64p             # In the incorrect output, this has been turned into the Arrival event from line 7.
    64  66.6298 Arrival
UNIX> cat -n txt/output1-correct.txt | grep 66.6298         # Looking at this in another way, the Arrival event is line 69 of the correct output.
    69  66.6298 Arrival
UNIX> cat -n txt/output1-wrong.txt | grep 66.6298           # In the incorrect output, it's on both lines 69 and 64.
    64  66.6298 Arrival
    69  66.6298 Arrival
UNIX> 
Great. All of this makes sense, because we already know about the bug. Now, let's suppose that we didn't know about the bug. We've written our program, and it's getting a gradescript wrong, because your output is not matching the correct output:
 
UNIX> openssl md5 txt/output1*               # This confirms that the two outputs differ.
MD5(txt/output1-correct.txt)= 8e7d6cb48c0a5891955254337826f3b1
MD5(txt/output1-wrong.txt)= bee14bd84a63482581997c3b148a6852  
UNIX> diff -y txt/output1* | grep '|'        # This shows the lines where they differ.                   
61.8246 Gabriel-Writeup                                       | 20.3679 Arrival
62.0633 Ava-Rafael                                            | 66.6298 Arrival
87.8991 Sophia-Whiz                                           | 84.2625 Arrival
91.0791 Riley-Inception                                       | 77.3384 Arrival
92.9222 Zoe-Checkerberry                                      | 82.2016 Arrival
UNIX> 
So, there are five lines in the output where the correct output is being turned into an "Arrival". Let's use a watchpoint to figure out where this is happening. First, here are the first ten lines of the input file. As you can see, the "Ava-Rafael" event is on line 6, and there's an "Arrival" event on line 7.
UNIX> cat -n txt/input1.txt | head
     1  Abigail-Johnson 80.213791
     2  Bailey-Concordant 93.932086
     3  Grace-Pentagon 94.684299
     4  Riley-Disposal 57.560029
     5  Jack-Goer 84.855482
     6  Ava-Rafael 62.063342
     7  Arrival 66.629794
     8  Joseph-Nostalgic 19.156007
     9  Nicholas-Deneb 99.282122
    10  Cameron-Impunity 16.390366
UNIX> 
What we want to do is have the debugger stop when we see "Ava-Rafael". We'll do that by putting a breakpoint at line 27, and then going to the 6th time that we get to that line. At that point, type should equal "Ava-Rafael":
(gdb) cat -n src/sf1.cpp | sed -n 25,29p
    25      if (type != "Arrival") {
    26        e = new Event;
    27        e->type = type;
    28        e->time = time;
    29        events.insert(make_pair(time, e));
(gdb) gdb bin/sf1
# .......
Reading symbols from /home/jplank/cs302/Notes/GDB/bin/sf1...done.
(gdb) break 27                    # Set a breakpoint at line 27
Breakpoint 1 at 0x400d36: file src/sf1.cpp, line 27.
(gdb) run < txt/input1.txt        # Run this on the input file.
Starting program: /home/jplank/cs302/Notes/GDB/bin/sf1 < txt/input1.txt

Breakpoint 1, main () at src/sf1.cpp:27
27        e->type = type;
(gdb) print type                  # It stops on line 27.
$1 = "Abigail-Johnson"            # We see it's on the "Abigail-Johnson" event.
(gdb) cont 5                      # This says to run, but skip the next 4 breakpoints
Will ignore next 4 crossings of breakpoint 1.  Continuing.

Breakpoint 1, main () at src/sf1.cpp:27
27        e->type = type;
(gdb) print type                  # This is the 6th iteration, so type = "Ava-Rafael"
$2 = "Ava-Rafael"
(gdb) 
The "next" command performs the next line. If the line is a procedure or method call, it will make the call and resume control after the call is done. The "step" command is the same as "next", but it goes "into" functions and lets you step through them. We won't call "step" here -- just "next":
(gdb) info locals
e = 0x604430
type = "Ava-Rafael"
events = std::multimap with 5 elements = {[57.560029] = 0x6042f0, [80.213791000000001] = 0x604100, 
  [84.855481999999995] = 0x604390, [93.932085999999998] = 0x6041f0, 
  [94.684298999999996] = 0x604250}
eit = 
time = 62.063341999999999
arrival_events = 0
(gdb) next
28        e->time = time;
(gdb) next
29        events.insert(make_pair(time, e));
(gdb) next
24    while (cin >> type >> time) {
(gdb) info locals
e = 0x604430
type = "Ava-Rafael"
events = std::multimap with 6 elements = {[57.560029] = 0x6042f0, [62.063341999999999] = 0x604430, 
  [80.213791000000001] = 0x604100, [84.855481999999995] = 0x604390, 
  [93.932085999999998] = 0x6041f0, [94.684298999999996] = 0x604250}
eit = 
time = 62.063341999999999
arrival_events = 0
(gdb) 
As you can see, we've put e onto the multimap, and that added an element to the multimap. Now, what I want to do is have the debugger tell me when something happens to "Ava-Rafael"'s event. That's harder than I'd like it to be. What I'm going to do is "watch" the address of e->time. If it gets overwritten, the debugger will tell me. Here we go, with commentary inline:
(gdb) print &(e->time)    # Show me the address of e->time
$5 = (double *) 0x604438
(gdb) watch *0x604438     # Tell the debugger to watch it.
Hardware watchpoint 2: *0x604438
(gdb) delete 1            # Delete the breakpoint on line 27
(gdb) cont                # Continue running
Continuing.
Hardware watchpoint 2: *0x604438                            # The hardware has detected that e->time has changed.

Old value = -1758119093
New value = -1954656796
main () at src/sf1.cpp:34
34        events.insert(make_pair(time, e));
(gdb) info locals         # Where are we?  We have just read an "Arrival" event, and as you can see,
e = 0x604430             # e is the same as it was before, so we've overwritten "Ava-Rafael"
type = "Arrival"
events = std::multimap with 6 elements = {[57.560029] = 0x6042f0, [62.063341999999999] = 0x604430, 
  [80.213791000000001] = 0x604100, [84.855481999999995] = 0x604390, 
  [93.932085999999998] = 0x6041f0, [94.684298999999996] = 0x604250}
eit = 
time = 66.629794000000004
arrival_events = 1
(gdb) next           # I call next, which inserts e into the map
24    while (cin >> type >> time) {
(gdb) print events   # The map now has two events at 0x604430.  That's a bug!
$6 = std::multimap with 7 elements = {[57.560029] = 0x6042f0, [62.063341999999999] = 0x604430, 
  [66.629794000000004] = 0x604430, [80.213791000000001] = 0x604100, 
  [84.855481999999995] = 0x604390, [93.932085999999998] = 0x6041f0, 
  [94.684298999999996] = 0x604250}
(gdb) 
Why did I watch the address of e->time and not e or e->type? The reason is that e->type is a C++ string, and its semantics are a little different. The time is simply 8 bytes, so it's easy to watch.


Commentary

I hope this gives you a little of the flavor of using a command-line debugger. As Dr. Ruoti says above, there are better debuggers with much better features. Whenever you use an IDE, you should explore its features for debuggin.

Lab Project1

I have taken the program sudoku5.cpp from the Recursion Review lecture notes, and annotated it so that it prints out the number of times Solve() is called. When you run it on the puzzle in txt/s_test.txt, it takes 7,789 calls.

-7- --5 ---
-2- --- -35
--- 74- 9--

--8 -1- 3--
7-- 5-3 --8
--5 -6- 1--

--3 -72 ---
19- --- -6-
--- 9-- -8-

UNIX> bin/sudoku5 < txt/s_test.txt 
976 235 814
824 196 735
351 748 926

268 417 359
719 523 648
435 869 172

583 672 491
192 384 567
647 951 283

# of calls to Solve: 7789
UNIX> 
Your job is to run bin/sudoku5 under the control of gdb, and within gdb, after you've read in the puzzle, but before you call Solve, set the first element of the puzzle to '9'. Then see how many calls to Solve() it takes to solve the puzzle. The answer is 1956.

Show your TA how you did it to get credit for the lab. At the end of lab, if you didn't get it, your TA will show you.