CS140 Lecture notes -- Pointers, String Routines, Malloc

  • Jim Plank
  • Directory: ~cs140/www-home/notes/PointMalloc
  • Lecture notes: http://www.cs.utk.edu/~cs140/notes/PointMalloc
  • Thu Sep 10 11:08:07 EDT 1998

    Little Quiz

    Behold the following C program (in quiz.c):
    #include <stdio.h>
    
    main()
    {
      int i, array[10];
      int *ip, *a1;
      int **ipp;
    
      ip = &i;
      ipp = &ip;
      a1 = &(array[1]);
    
      for (i = 0; i < 10; i++) array[i] = i;
    
      i = 11;
    
      printf("ip: 0x%x, &ip: 0x%x, array: 0x%x\n", ip, &ip, array);
      printf("\n");
      
    
      printf("&i: 0x%x\n", &i);
      printf("ipp: 0x%x, *ipp: 0x%x, **ipp: 0x%x\n", ipp, *ipp, **ipp);
      printf("\n");
      printf("a1: 0x%x, *a1: 0x%x\n", a1, *a1);
    
      a1 += 4;
      *a1 = 500;
      
      for (i = 0; i < 10; i++) {
        printf("%d ", array[i]);
      }
      printf("\n");
    }
      
    
    When you run this, the first line of output is:
    UNIX> quiz
    ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0
    
    
    What is the rest of the output?

    This is tricky, but you should be able to do it with all you currently know about pointers. This is the kind of question I am fond of asking on tests. Here's the answer. If you want to make sure you're doing things right, try to draw a picture of memory and fill in what that first line tells you.

    ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0
    
    &i: 0xeffff9fc
    ipp: 0xeffff9cc, *ipp: 0xeffff9fc, **ipp: 0xb
    
    a1: 0xeffff9d4, *a1: 0x1
    0 1 2 3 4 500 6 7 8 9 
    
    The statement ``a1 += 4'' is what's known as pointer arithmetic. It sets a1 ahead four ints. Therefore, it adds 16 to the value of a1 -- 16 because ints are 4 bytes: 4*4 = 16.

    String routines

    I assume you have seen many of these before, but I'll go over them again because they are extremely useful:

    strcpy()

    char *strcpy(char *s1, char *s2);
    
    Strcpy() assumes that s2 is a null-terminated string, and that s1 is a (char *) with enough characters to hold s2, including the null character at the end. Strcpy() then copies s2 to s1. It also returns s1. Why would you return your first argument? We'll get to that when we show an easy way to implement strdup().

    Here's a simple program that uses strcpy() to initialize three strings and print them out (this is in strcpy.c):

    #include <stdio.h>
    #include <string.h>
    
    main()
    {
      char give[5];
      char him[5];
      char six[5];
    
      strcpy(give, "Give");
      strcpy(him, "Him");
      strcpy(six, "Six!");
    
      printf("%s %s %s\n", give, him, six);
    }
    
    It runs fine:
    UNIX> strcpy
    Give Him Six!
    UNIX>
    
    Suppose I try to copy a string that's too big. For example, look at strcpy2.c:
    #include <stdio.h>
    #include <string.h>
    
    main()
    {
      char give[5];
      char him[5];
      char six[5];
    
      printf("give: 0x%x  him: 0x%x  six: 0x%x\n", give, him, six);
    
      strcpy(give, "Give");
      strcpy(him, "Him");
      strcpy(six, "Six!");
    
      printf("%s %s %s\n", give, him, six);
    
      strcpy(him, "Jamal Lewis");
    
      printf("%s %s %s\n", give, him, six);
    }
    
    Now run it:
    UNIX> strcpy2
    give: 0xeffff9f8  him: 0xeffff9f0  six: 0xeffff9e8
    Give Him Six!
    wis Jamal Lewis Six!
    UNIX> 
    
    Take a minute and try to figure out what's going on. Look at the following picture of memory. When we start, space has been allocated for give, him and six:
                        |----4 bytes----|           
                   
                        |               |           
                        |               | 0xeffff9e0
                        |               | 0xeffff9e4
         six----------> |               | 0xeffff9e8
                        |               | 0xeffff9ec
         him----------> |               | 0xeffff9f0
                        |               | 0xeffff9f4
         give---------> |               | 0xeffff9f8
                        |               | 0xeffff9fc
                        |               | 0xeffffa00
                        |               | 0xeffffa04
                        |               | 0xeffffa08
                        |               | 0xeffffa0c
    
    Now, we make the first three strcpy() calls. At the point of the first printf() statement, memory looks like:
                        |----4 bytes----|           
                   
                        |               |           
                        |               | 0xeffff9e0
                        |               | 0xeffff9e4
         six----------> |'S'|'i'|'x'|'!'| 0xeffff9e8
                        | 0 |   |   |   | 0xeffff9ec
         him----------> |'H'|'i'|'m'| 0 | 0xeffff9f0
                        |   |   |   |   | 0xeffff9f4
         give---------> |'G'|'i'|'v'|'e'| 0xeffff9f8
                        | 0 |   |   |   | 0xeffff9fc
                        |               | 0xeffffa00
                        |               | 0xeffffa04
                        |               | 0xeffffa08
                        |               | 0xeffffa0c
    
    Now, we make the call strcpy(him, "Jamal Lewis"). What happens is that the entire string is copied to him. It does not fit in the memory allocated for him, so the copy runs on into the memory allocated for give:
                        |----4 bytes----|           
                   
                        |               |           
                        |               | 0xeffff9e0
                        |               | 0xeffff9e4
         six----------> |'S'|'i'|'x'|'!'| 0xeffff9e8
                        | 0 |   |   |   | 0xeffff9ec
         him----------> |'J'|'a'|'m'|'a'| 0xeffff9f0
                        |'l'|' '|'L'|'e'| 0xeffff9f4
         give---------> |'w'|'i'|'s'| 0 | 0xeffff9f8
                        | 0 |   |   |   | 0xeffff9fc
                        |               | 0xeffffa00
                        |               | 0xeffffa04
                        |               | 0xeffffa08
                        |               | 0xeffffa0c
    
    So this means that him is indeed "Jamal Lewis", but give has been modified as well, to be "wis". This accounts for the printout of:
    wis Jamal Lewis Six!
    
    The bottom line is that when you modify memory that you have not allocated (as I did when I called strcpy(him, "Jamal Lewis");), then strange things will happen. They have explanations, but until you figure them out, they will be confusing. If you're lucky, you get a segmentation violation or a bus error. If you're unlucky, you get weird, inexplicable output. A corollary of this is that when you get a segmentation violation, a bus error, or weird, inexplicable output, then chances are you have modified memory that you didn't allocate.

    strcat()

    char *strcat(char *s1, char *s2);
    
    Strcat() assumes that s1 and s2 are both null-terminated strings. Strcat() then concatenates s2 to the end of s1. Like strcpy(), it returns s1. Strcat() assumes that there is enough space in s1 to hold these extra characters. Otherwise, you'll start stomping over memory that you didn't allocate. Here is a simple example (this is in strcat.c):
    #include <stdio.h>
    #include <string.h>
    
    main()
    {
      char givehimsix[15];
    
      strcpy(givehimsix, "Give");
      printf("%s\n", givehimsix);
      strcat(givehimsix, " Him");
      printf("%s\n", givehimsix);
      strcat(givehimsix, " Six!");
      printf("%s\n", givehimsix);
    }
    
    The output is predictable:
    UNIX> strcat
    Give
    Give Him
    Give Him Six!
    UNIX> 
    
    Look at strcat2.c. Can you explain why the output is the way that it is? Try filling memory as in the strcpy2 example above.
    UNIX> strcat2
    give: 0xeffff9f0  him: 0xeffff9e8  six: 0xeffff9e0
    Give Him Six!
    wis Jamal Lewis Six!
    wis Help! Jamal Lewis Help! Six!
    UNIX> 
    

    strlen()

    size_t strlen(const char *s);
    
    Strlen() assumes that s is a null-terminated string. It returns the number of characters before the null character. Strlen() is pretty obvious: (this is in strlen.c):
    #include <stdio.h>
    #include <string.h>
    
    main()
    {
      char give[5];
      char him[5];
      char six[5];
    
      strcpy(give, "Give");
      strcpy(him, "Him");
      strcpy(six, "Six!");
    
      printf("%s %s %s\n", give, him, six);
      printf("%d %d %d\n", (int) strlen(give), (int) strlen(him), (int) strlen(six));
    }
    
    Output:
    UNIX> strlen
    Give Him Six!
    4 3 4
    
    Note, strlen() doesn't care about allocation -- it just reads bytes until it hits a null character. If you give it a bad argument, you'll get weird results. Try strlen2.c. Can you explain its output?
    UNIX> strlen2
    s1: 0xeffff988
    0
    &i: 0xeffff98c   &j: 0xeffff988
    s1: 0xeffff988
    8
    UNIX> 
    

    strchr()

    char *strchr(char *s, int c);
    
    Strchr() assumes that s is a null-terminated string. The argument c is an integer, but it is treated as a character. Strchr() returns a pointer to the first occurrence of the character equal to c in s. If s does not contain c, then it returns NULL.

    Here is a simple program that prints out whether each line of standard input contains a space (this is in strchr.c):

    #include <stdio.h>
    #include <string.h>
    
    main()
    {
      char line[100];
      char *ptr;
    
      while (gets(line) != NULL) {
        ptr = strchr(line, ' ');
        if (ptr == NULL) {
          printf("No spaces\n");
        } else {
          printf("Space at character %d\n", (int) (ptr-line));
        }
      }
    }
    
    Note, I'm doing a little pointer arithmetic here -- ptr-line returns the number of characters between line and ptr. Here's an example of this running:
    UNIX> strchr
    Jim
    No spaces
    Jim Plank
    Space at character 3
    Jamal Lewis
    Space at character 5
     HI!
    Space at character 0
       HI!
    Space at character 0
    UNIX> 
    
    We can modify this to print out where all the spaces are. Check out strchr2.c:
    UNIX> strchr2
    Jim
    No spaces
    Jim Plank
    Space at character 3
    Jim  Plank
    Space at character 3
    Space at character 4
       Give   Him   Six!!!
    Space at character 0
    Space at character 1
    Space at character 2
    Space at character 7
    Space at character 8
    Space at character 9
    Space at character 13
    Space at character 14
    Space at character 15
    UNIX> 
    
    Go over the code -- why do I say
            ptr = strchr(ptr+1, ' ');
    
    instead of
            ptr = strchr(ptr, ' ');
    
    If you don't know, copy the code, modify it, and see for yourself!

    tail10

    Suppose you want to write a program called tail10, that prints out the last ten lines of standard input. A straightforward design of this is to have an array of ten (char *)'s. Each time we read in a line of text, we set the next array entry to be that line. After filling in the 10th entry, we go back to the beginning. At the end, we print out the last 10 entries filled. Here is tail10a.c:
    #include <stdio.h>
    
    main()
    {
      char line[1000];
      char *last10[10];
      int nlines, i;
    
      nlines = 0;
      while (gets(line) != NULL) {
        last10[nlines%10] = line;
        nlines++;
      }
    
      if (nlines <= 10) {
        for (i = 0; i < nlines; i++) puts(last10[i]);
      } else {
        for (i = nlines-10; i < nlines; i++) {
          puts(last10[i%10]);
        }
      }
    }
    
    Go over this until you understand what's going on. Unfortunately, it won't work right:
    UNIX> cat tailinput
    Line 1
    Line 2
    Line 3
    Line 4
    Line 5
    Line 6
    Line 7
    Line 8
    Line 9
    Line 10
    Line 11
    UNIX> tail10a < tailinput
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    Line 11
    UNIX> 
    
    Why? Because you're setting each entry of last10 to be line, which is getting overwritten at each gets() call.

    One easy way to fix this is to make last10 an array of 10 arrays of 1000 characters each, and then to use strcpy(). This is in tail10b.c. It works correctly:

    UNIX> tail10b < tailinput
    Line 2
    Line 3
    Line 4
    Line 5
    Line 6
    Line 7
    Line 8
    Line 9
    Line 10
    Line 11
    UNIX> tail10b < tail10b.c
      }
    
      if (nlines <= 10) {
        for (i = 0; i < nlines; i++) puts(last10[i]);
      } else {
        for (i = nlines-10; i < nlines; i++) {
          puts(last10[i%10]);
        }
      }
    }
    UNIX> 
    

    tailany

    Now, suppose you want to modify tail10 so that it prints the last n lines, where n is defined on the command line. This is a problem because you can't statically allocate the last10 array as you did in tail10.c. This is an example where you can either

      1. assume a maximum value for n and statically allocate an array of that size, or
      2. allocate the array dynamically, once you know n.

    In general, it is best to use the second approach. An example of where we use the first approach is whenever we assume that an input line is less than 1000 characters. For the purposes of this class, this is a reasonable assumption. However, if you are writing production code, you should never make such assumptions, because they will always come back to haunt you.

    So we'll use malloc(). Malloc(n) returns a pointer to n bytes of memory, given to you by the operating system. We say that this memory comes from the ``heap.'' Once you get this memory, you can use it anywhere -- you can pass it to and from procedure calls without worrying about it going away, as you must with local variables.

    Below, we show tailany1.c, which uses malloc() to allocate the array of 1000 character strings. Note how the code basically looks like tail10b.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    main(int argc, char **argv)
    {
      char line[1000];
      char **lastn;
      int nlines, i, n;
    
      /* Error check the input */
    
      if (argc != 2) {
        fprintf(stderr, "usage: tailany1 n\n");
        exit(1);
      }
      n = atoi(argv[1]);
      if (n <= 0) exit(0);
    
      /* Allocate the array */
    
      lastn = (char **) malloc(sizeof(char *)*n);
      for (i = 0; i < n; i++) {
        lastn[i] = (char *) malloc(sizeof(char)*1000);
      }
      
      /* Read the input */
    
      nlines = 0;
      while (gets(line) != NULL) {
        strcpy(lastn[nlines%n], line);
        nlines++;
      }
    
      /* Print the last n lines */
    
      if (nlines <= n) {
        for (i = 0; i < nlines; i++) puts(lastn[i]);
      } else {
        for (i = nlines-n; i < nlines; i++) {
          puts(lastn[i%n]);
        }
      }
    }
    
    Note that when we call malloc(), we cast its return value to be the type we want. This is to keep the compiler happy -- malloc() returns a pointer -- we make the type cast statement to tell the compiler that we know what kind of pointer we want.

    This works just like we think it should:

    UNIX> tailany1 < tailinput
    usage: tailany1 n
    UNIX> tailany1 1 < tailinput
    Line 11
    UNIX> tailany1 2 < tailinput
    Line 10
    Line 11
    UNIX> tailany1 50 < tailinput
    Line 1
    Line 2
    Line 3
    Line 4
    Line 5
    Line 6
    Line 7
    Line 8
    Line 9
    Line 10
    Line 11
    UNIX> tailany1 100000 < tailinput
    Line 1
    Line 2
    Line 3
    Line 4
    Line 5
    Line 6
    Line 7
    Line 8
    Line 9
    Line 10
    Line 11
    UNIX> 
    
    You'll note that if you run tailany1 100000 < tailinput, it will take a little while. That time goes to all of those malloc() calls. How many bytes is it allocating? 100000*4 for lastn (pointers are 4 bytes on this machine), and 1000 for each entry of lastn. That makes 400000+100000*1000 = 100400000. That's roughly 100 megabytes, which seems kind of wasteful. It is. Try it with a bigger number, and see if you can get tailany1 to break!

    Here's one solution. Instead of allocating arrays of 1000 characters, let's go back to using (char *)'s instead. Then after reading line, we'll call strdup() instead of strcpy().

    Strdup(s) basically does the following:

    char *strdup(char *s)
    {
      char *news;
    
      news = (char *) malloc((strlen(s)+1)*sizeof(char));
      strcpy(news, s);
      return news;
    }
    
    In other words, it gives you a new copy of s that it has malloc()'d for you.

    Tailany2.c uses strdup() -- check it out and make sure you understand how it works. Now you'll see that tailany2 100000 < tailinput runs much faster.

    There's one last problem with tailany2: if nlines is much larger than n, then you are wasting a lot of memory, because the copies of lines that have been overwritten in lastn are never given back. If I have time, I'll talk about it in class. The fix simply involves calling free() on an entry of lastn before you overwrite it. The code is in tailany3.c.