CS360 Lecture notes -- Pointer Arithmetic (Small lecture)

  • James S. Plank
  • Directory: /home/plank/cs360/notes/Pointer-Arithmetic
  • Lecture notes: http://web.eecs.utk.edu/~plank/plank/classes/cs360/360/notes/Pointer-Arithmetic/index.html
  • Lecture notes directory: /home/plank/cs360/notes/Pointer-Arithmetic
  • Bitbucket: https://bitbucket.org/jimplank/cs360-lecture-notes.
  • Original lecture notes ("PointMalloc"): Fri Aug 31 10:39:16 EDT 2007.
  • Last modified: Wed Jan 17 16:45:00 EST 2018
    There's really nothing new in this small lecture, just some reinforcement from the last lecture.
    We've used pointers in CS140 and CS302. If you want some review, please see pointer lecture notes from CS140. For additional reinforcement, I have a set of old lecture notes from CS302 where I set up more STL data structures that point to each other with pointers. Both of these are in C++.

    There is no standard template library in C. This means that vectors, lists, sets and maps are gone. We will replace all of them in the next few lectures. We'll start with vectors. In C, we use arrays instead of vectors. You can statically declare an array by putting [size] in the variable declaration. For example, the following variable declaration will create an array iarray of ten integers:

    int iarray[10];
    

    You access the elements of iarray with square brackets. Unlike a C++ vector, iarray has no methods. In particular, the size of iarray is not stored anywhere: you have to keep track of it yourself.

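    For most purposes, iarray behaves as a pointer to the first element of the array: there are 40 bytes allocated for the array (since integers are four bytes each on this machine), and iarray points to the first of these. (Strictly speaking, an array is not a pointer. In most expressions the name iarray is converted to a pointer to its first element, but sizeof(iarray) is the size of the whole array, not the size of a pointer.) If we want, we can set a second pointer to iarray, and we can print the elements of iarray by incrementing the pointer and dereferencing it. We do all of that in ptr1.c: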

    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
      int iarray[10];
      int *ip;
      int i;
    
      for (i = 0; i < 10; i++) iarray[i] = 100+i;
    
      printf("iarray = 0x%lx\n", (unsigned long) iarray);
    
      ip = iarray;
    
      for (i = 0; i < 10; i++) {
        printf("i=%d.  iarray[i]=%d.  ip = 0x%lx.  *ip=%d.  (ip-iarray)=%d\n", 
            i, iarray[i], (unsigned long) ip, *ip, (int) (ip-iarray));
        ip++;
      }
    }
    

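    In the for loop, we print out five quantities:

      • i, the loop index.
      • iarray[i], the element at index i.
      • ip, the value of the pointer, printed in hexadecimal.
      • *ip, the integer that ip points to.
      • (ip-iarray), the difference between ip and iarray.

    Let's take a look at the output: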
    UNIX> ./ptr1
    iarray = 0x7fff5fbfdc40
    i=0.  iarray[i]=100.  ip = 0x7fff5fbfdc40.  *ip=100.  (ip-iarray)=0
    i=1.  iarray[i]=101.  ip = 0x7fff5fbfdc44.  *ip=101.  (ip-iarray)=1
    i=2.  iarray[i]=102.  ip = 0x7fff5fbfdc48.  *ip=102.  (ip-iarray)=2
    i=3.  iarray[i]=103.  ip = 0x7fff5fbfdc4c.  *ip=103.  (ip-iarray)=3
    i=4.  iarray[i]=104.  ip = 0x7fff5fbfdc50.  *ip=104.  (ip-iarray)=4
    i=5.  iarray[i]=105.  ip = 0x7fff5fbfdc54.  *ip=105.  (ip-iarray)=5
    i=6.  iarray[i]=106.  ip = 0x7fff5fbfdc58.  *ip=106.  (ip-iarray)=6
    i=7.  iarray[i]=107.  ip = 0x7fff5fbfdc5c.  *ip=107.  (ip-iarray)=7
    i=8.  iarray[i]=108.  ip = 0x7fff5fbfdc60.  *ip=108.  (ip-iarray)=8
    i=9.  iarray[i]=109.  ip = 0x7fff5fbfdc64.  *ip=109.  (ip-iarray)=9
    UNIX> 
    
    The hex values will change from machine to machine (and possibly from run to run), but their relationship to one another will always be the same. In the for loop, i, iarray[i] and *ip should all be straightforward and require no explanation. I'll explain the other two in detail.

    When this program starts to run, the operating system has set it up so that the 40 bytes starting with 0x7fff5fbfdc40 are where iarray is stored. That is why iarray is equal to 0x7fff5fbfdc40. If iarray[0] is the four bytes that start at 0x7fff5fbfdc40, then iarray[1] must be the four bytes that start at 0x7fff5fbfdc44. This is why ip is equal to 0x7fff5fbfdc44 on the second iteration of the for loop.

    While this is all logical, it is a little confusing: adding one to ip actually adds four to the value of the pointer. This is called "pointer arithmetic": when you add x to a pointer, the address actually increases by x*s bytes, where s is the size of the data type to which the pointer points (here, sizeof(int), which is 4).

    The last column printed in the for loop is a little confusing too. Again, focus on the line where i equals one. ip is equal to 0x7fff5fbfdc44, so you would think that (ip-iarray) would equal four. It does not, because the compiler is doing pointer arithmetic -- from the point of view of the compiler, when you say "ip-iarray," you are asking for the number of elements between ip and iarray. That will be the difference between the pointers, divided by the size of the element. In this case, it is (0x7fff5fbfdc44-0x7fff5fbfdc40)/4, which equals one.

    To help hammer this home a little further, I have three other programs that are nearly identical to ptr1.c.

    The last of these, sptr.c, is worth looking at in its entirety:

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct {
      double d1;
      double d2;
    } Two;
    
    int main()
    {
      Two iarray[10];
      Two *ip;
      int i;
    
      for (i = 0; i < 10; i++) { 
        iarray[i].d1 = 100+i;
        iarray[i].d2 = 200+i;
      }
    
      printf("iarray = 0x%lx\n", (unsigned long) iarray);
    
      ip = iarray;
    
      for (i = 0; i < 10; i++) {
        printf("i=%d.  iarray[i]={%.2lf,%.2lf}.  ip = 0x%lx.  *ip={%.2lf,%.2lf}.  (ip-iarray)=%d\n", 
            i, iarray[i].d1, iarray[i].d2, (unsigned long) ip, ip->d1, ip->d2, (int) (ip-iarray));
        ip++;
      }
    }
    

    In particular, I use the arrow operator to access the fields of the struct through the pointer: ip->d1 accesses d1, and ip->d2 accesses d2.

    You can have gcc build a 32-bit executable with the -m32 flag (assuming your system has 32-bit support installed). Let's do that with the last program:

    UNIX> gcc -m32 -o sptr sptr.c
    UNIX> ./sptr
    iarray = 0xbfffdd98
    i=0.  iarray[i]={100.00,200.00}.  ip = 0xbfffdd98.  *ip={100.00,200.00}.  (ip-iarray)=0
    i=1.  iarray[i]={101.00,201.00}.  ip = 0xbfffdda8.  *ip={101.00,201.00}.  (ip-iarray)=1
    i=2.  iarray[i]={102.00,202.00}.  ip = 0xbfffddb8.  *ip={102.00,202.00}.  (ip-iarray)=2
    i=3.  iarray[i]={103.00,203.00}.  ip = 0xbfffddc8.  *ip={103.00,203.00}.  (ip-iarray)=3
    i=4.  iarray[i]={104.00,204.00}.  ip = 0xbfffddd8.  *ip={104.00,204.00}.  (ip-iarray)=4
    i=5.  iarray[i]={105.00,205.00}.  ip = 0xbfffdde8.  *ip={105.00,205.00}.  (ip-iarray)=5
    i=6.  iarray[i]={106.00,206.00}.  ip = 0xbfffddf8.  *ip={106.00,206.00}.  (ip-iarray)=6
    i=7.  iarray[i]={107.00,207.00}.  ip = 0xbfffde08.  *ip={107.00,207.00}.  (ip-iarray)=7
    i=8.  iarray[i]={108.00,208.00}.  ip = 0xbfffde18.  *ip={108.00,208.00}.  (ip-iarray)=8
    i=9.  iarray[i]={109.00,209.00}.  ip = 0xbfffde28.  *ip={109.00,209.00}.  (ip-iarray)=9
    UNIX> 
    
    As you can see, the pointers are smaller (8 hex digits). However, their interrelationship is the same, and each time you increment ip, its value increases by 16. That is because a Two holds two 8-byte doubles, so sizeof(Two) is 16 in both modes.

    Little Quiz

    More pointers: Behold the following C program (in quiz.c):

    /* Line 1 */   #include <stdio.h>
    /* Line 2 */  
    /* Line 3 */   int main()
    /* Line 4 */   {
    /* Line 5 */     int i, array[10];
    /* Line 6 */     int *ip, *a1;
    /* Line 7 */     int **ipp;
    /* Line 8 */   
    /* Line 9 */     ip = &i;
    /* Line 10 */    ipp = &ip;
    /* Line 11 */    a1 = &(array[1]);
    /* Line 12 */  
    /* Line 13 */    for (i = 0; i < 10; i++) array[i] = i;
    /* Line 14 */  
    /* Line 15 */    i = 11;
    /* Line 16 */  
    /* Line 17 */    printf("ip: 0x%lx, &ip: 0x%lx, array: 0x%lx\n", (unsigned long) ip, (unsigned long) &ip, (unsigned long) array);
    /* Line 18 */    printf("\n");
    /* Line 19 */    
    /* Line 20 */  
    /* Line 21 */    printf("&i: 0x%lx\n", (unsigned long) &i);
    /* Line 22 */    printf("ipp: 0x%lx, *ipp: 0x%lx, **ipp: 0x%lx\n", (unsigned long) ipp, (unsigned long) *ipp, (unsigned long) **ipp);
    /* Line 23 */    printf("\n");
    /* Line 24 */    printf("a1: 0x%lx, *a1: 0x%lx\n", (unsigned long) a1, (unsigned long) *a1);
    /* Line 25 */  
    /* Line 26 */    a1 += 4;
    /* Line 27 */    *a1 = 500;
    /* Line 28 */    
    /* Line 29 */    for (i = 0; i < 10; i++) {
    /* Line 30 */      printf("%d ", array[i]);
    /* Line 31 */    }
    /* Line 32 */    printf("\n");
    /* Line 33 */  }
    

    When you run this, the first line of output is:

    UNIX> ./quiz
    ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0
    
    
    What is the rest of the output?

    (In class, I used a drawing to help illustrate this. You can get it in Little-Quiz-Helper.odp (Open Office) or Little-Quiz-Helper.pdf (PDF).)

    This is tricky, but you should be able to do it with all you currently know about pointers. This is the kind of question I am fond of asking on tests. The answer is worked out below. If you want to make sure you're doing things right, try to draw a picture of memory and fill in what that first line tells you. Here is my picture. We'll start with a blank slate containing the relevant addresses from the first line of output:

    0xeffff9cc: |                        |
    0xeffff9d0: |                        |
    0xeffff9d4: |                        |
    0xeffff9d8: |                        |
    0xeffff9dc: |                        |
    0xeffff9e0: |                        |
    0xeffff9e4: |                        |
    0xeffff9e8: |                        |
    0xeffff9ec: |                        |
    0xeffff9f0: |                        |
    0xeffff9f4: |                        |
    0xeffff9f8: |                        |
    0xeffff9fc: |                        |
    

    Now, what do we know from the first line of output? Well, the address of ip is 0xeffff9cc, and its value is 0xeffff9fc. So we can draw in its value at that address:

    0xeffff9cc: | ip = 0xeffff9fc        |
    0xeffff9d0: |                        |
    0xeffff9d4: |                        |
    0xeffff9d8: |                        |
    0xeffff9dc: |                        |
    0xeffff9e0: |                        |
    0xeffff9e4: |                        |
    0xeffff9e8: |                        |
    0xeffff9ec: |                        |
    0xeffff9f0: |                        |
    0xeffff9f4: |                        |
    0xeffff9f8: |                        |
    0xeffff9fc: |                        |
    

    From line 9, we know that ip holds the address of i, so i is stored at 0xeffff9fc. From line 15, i's value is 11, so we can draw that in:

    0xeffff9cc: | ip = 0xeffff9fc        |
    0xeffff9d0: |                        |
    0xeffff9d4: |                        |
    0xeffff9d8: |                        |
    0xeffff9dc: |                        |
    0xeffff9e0: |                        |
    0xeffff9e4: |                        |
    0xeffff9e8: |                        |
    0xeffff9ec: |                        |
    0xeffff9f0: |                        |
    0xeffff9f4: |                        |
    0xeffff9f8: |                        |
    0xeffff9fc: | i = 11                 |
    

    Now, the value of array is the address of its first element, 0xeffff9d0. Each int takes four bytes, and line 13 sets array[i] to i, so we can draw in all ten elements of the array:

    0xeffff9cc: | ip = 0xeffff9fc        |
    0xeffff9d0: | array[0] = 0           |
    0xeffff9d4: | array[1] = 1           |
    0xeffff9d8: | array[2] = 2           |
    0xeffff9dc: | array[3] = 3           |
    0xeffff9e0: | array[4] = 4           |
    0xeffff9e4: | array[5] = 5           |
    0xeffff9e8: | array[6] = 6           |
    0xeffff9ec: | array[7] = 7           |
    0xeffff9f0: | array[8] = 8           |
    0xeffff9f4: | array[9] = 9           |
    0xeffff9f8: |                        |
    0xeffff9fc: | i = 11                 |
    

    Now we know all we need to know. Since &i equals ip, the first line after the blank line is "&i: 0xeffff9fc". Next, from line 10 of the program, we know that ipp equals the address of ip, so *ipp is the value of ip and **ipp is i. So the next line is:

    ipp: 0xeffff9cc, *ipp: 0xeffff9fc, **ipp: 0xb
    
    Note that the last value is 0xb, and not 11, because we are printing 11 in hexadecimal.

    Now, since line 11 makes a1 a pointer to array[1], its value is 0xeffff9d4, and *a1 is array[1], which is 1. Thus, our next line of output (after the blank line) is:

    a1: 0xeffff9d4, *a1: 0x1
    
    Finally, the statement "a1 += 4" on line 26 is pointer arithmetic: it moves a1 ahead four ints. Since ints are 4 bytes, that adds 4*4 = 16 to the value of a1, so a1 becomes 0xeffff9e4 and points to array[5]. Line 27 stores 500 there, so the last line is:
    0 1 2 3 4 500 6 7 8 9 
    
    Here is the entire output:

    ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0
    
    &i: 0xeffff9fc
    ipp: 0xeffff9cc, *ipp: 0xeffff9fc, **ipp: 0xb
    
    a1: 0xeffff9d4, *a1: 0x1
    0 1 2 3 4 500 6 7 8 9 
    

    The output can differ from machine to machine, but it is completely determined by the first line. Here it is compiled in 64-bit mode on my MacBook:

    UNIX> ./quiz
    ip: 0x7fff5fbfdc7c, &ip: 0x7fff5fbfdc70, array: 0x7fff5fbfdc30
    
    &i: 0x7fff5fbfdc7c
    ipp: 0x7fff5fbfdc70, *ipp: 0x7fff5fbfdc7c, **ipp: 0xb
    
    a1: 0x7fff5fbfdc34, *a1: 0x1
    0 1 2 3 4 500 6 7 8 9 
    UNIX> 
    

    At this point, I urge you to study exam questions from old pointer midterm exams. Reading memory and pointers is an important part of systems programming and these questions give you very good practice with it.