CS360 Lecture notes -- Pointer Arithmetic (Small lecture)


There's really nothing new in this small lecture - just some reinforcement from the last lecture.
We've used pointers in CS140 and CS302. If you want some review, please see the pointer lecture notes from CS140. For additional reinforcement, I have a set of old lecture notes from CS302 where I set up more STL data structures that point to each other with pointers. Both of these are in C++.

C has no Standard Template Library, so vectors, lists, sets and maps are gone. We will replace all of them in the next few lectures, starting with vectors. In C, we use arrays instead of vectors. You can declare a fixed-size array by putting [size] after the variable name in its declaration. For example, the following declaration creates an array iarray of ten integers:

int iarray[10];

You access the elements of iarray with square brackets. Unlike a C++ vector, iarray has no methods. In particular, the size of iarray is not stored anywhere: you have to keep track of it yourself.

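In practice, iarray acts as a pointer to the first element of the array. (Strictly speaking, an array is not a pointer: whenever you use iarray in an expression, it is converted to a pointer to its first element; the main exceptions are sizeof and &.) In other words, there are 40 bytes allocated for the array (integers are four bytes each on the machines in these notes), and iarray points to the first of them. If we want, we can set a second pointer equal to iarray, and we can print the elements of iarray by dereferencing that pointer and incrementing it. We do all of that in src/iptr.c: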

/* This program sets a pointer to an array, and then dereferences each of the
   elements of the array using the pointer and pointer arithmetic.  It prints
   the pointers in hexadecimal while it does so. */

#include <stdio.h>
#include <stdlib.h>

typedef unsigned long (UL);

int main()
{
  int iarray[10];
  int *ip;
  int i;

  /* Set the 10 elements of iarray to be 100 to 109, and print the array's address. */

  for (i = 0; i < 10; i++) iarray[i] = 100+i;
  printf("iarray = 0x%lx\n", (UL) iarray);

  /* Set ip equal to iarray, and then print the 10 elements using both iarray and ip.
     The following quantities will be printed for each element:

      - The index i (goes from 0 to 9)
      - The value of iarray[i] (goes from 100 to 109)
      - The pointer ip (will start at iarray and increment by four each time).
      - The value *ip, which is what ip points to (this will be 100 to 109 again)
      - Pointer arithmetic: (ip-iarray) -- this will be the value of i.
   */

  ip = iarray;

  for (i = 0; i < 10; i++) {
    printf("i=%d.  ",            i              );
    printf("iarray[i]=%d.  ",    iarray[i]      );
    printf("ip = 0x%lx.  ",      (UL) ip        );
    printf("*ip=%d.  ",          *ip            );
    printf("(ip-iarray)=%ld.\n", (long) (ip-iarray));   /* %ld needs a long, so cast to long */
    ip++;
  }

  return 0;
}

In the for loop, we print out five quantities; please read the inline comments for what gets printed. Let's take a look at the output:

UNIX> bin/iptr
iarray = 0x7fff5fbfdc40
i=0.  iarray[i]=100.  ip = 0x7fff5fbfdc40.  *ip=100.  (ip-iarray)=0
i=1.  iarray[i]=101.  ip = 0x7fff5fbfdc44.  *ip=101.  (ip-iarray)=1
i=2.  iarray[i]=102.  ip = 0x7fff5fbfdc48.  *ip=102.  (ip-iarray)=2
i=3.  iarray[i]=103.  ip = 0x7fff5fbfdc4c.  *ip=103.  (ip-iarray)=3
i=4.  iarray[i]=104.  ip = 0x7fff5fbfdc50.  *ip=104.  (ip-iarray)=4
i=5.  iarray[i]=105.  ip = 0x7fff5fbfdc54.  *ip=105.  (ip-iarray)=5
i=6.  iarray[i]=106.  ip = 0x7fff5fbfdc58.  *ip=106.  (ip-iarray)=6
i=7.  iarray[i]=107.  ip = 0x7fff5fbfdc5c.  *ip=107.  (ip-iarray)=7
i=8.  iarray[i]=108.  ip = 0x7fff5fbfdc60.  *ip=108.  (ip-iarray)=8
i=9.  iarray[i]=109.  ip = 0x7fff5fbfdc64.  *ip=109.  (ip-iarray)=9
UNIX> 

The hexadecimal values will change from machine to machine (and on most modern systems, from run to run). However, their relationships to each other will always be the same. In the for loop, i, iarray[i] and *ip should be straightforward and need no explanation. I'll explain the other two, ip and (ip-iarray), in detail.

When this program starts to run, the operating system has set it up so that the 40 bytes starting with 0x7fff5fbfdc40 are where iarray is stored. That is why iarray is equal to 0x7fff5fbfdc40. If iarray[0] is the four bytes that start at 0x7fff5fbfdc40, then iarray[1] must be the four bytes that start at 0x7fff5fbfdc44. This is why ip is equal to 0x7fff5fbfdc44 on the second iteration of the for loop.

While this is all logical, it is a little confusing: adding one to ip actually adds four to the value of the pointer. This is called "pointer arithmetic": when you add x to a pointer, it really adds s*x to its value, where s is the size, in bytes, of the type to which the pointer points.

The last column printed in the for loop is a little confusing too. Again, focus on the line where i equals one. ip is equal to 0x7fff5fbfdc44, so you would think that (ip-iarray) would equal four. It does not, because the compiler is doing pointer arithmetic -- from the point of view of the compiler, when you say "ip-iarray," you are asking for the number of elements between ip and iarray. That will be the difference between the pointers, divided by the size of the element. In this case, it is (0x7fff5fbfdc44-0x7fff5fbfdc40)/4, which equals one.

To help hammer this home a little further, I have three other programs that are nearly identical to iptr.c. The last one, which uses an array of structs, is worth looking at in its entirety:

/* This is the same as src/iptr.c, except iarray is an array of "Two" structs instead
   of ints.  The "Two" struct is simply a struct of two doubles. */

#include <stdio.h>
#include <stdlib.h>

typedef struct {
  double d1;
  double d2;
} Two;

typedef unsigned long (UL);

int main()
{
  Two iarray[10];
  Two *ip;
  int i;

  for (i = 0; i < 10; i++) iarray[i].d1 = 100+i;   /* Set the d1 field to be 100 + i */
  for (i = 0; i < 10; i++) iarray[i].d2 = 200+i;   /* Set the d2 field to be 200 + i */

  printf("iarray = 0x%lx\n", (UL) iarray);

  ip = iarray;

  for (i = 0; i < 10; i++) {
    printf("i=%d.  ",                    i                              );
    printf("iarray[i]={%.2lf,%.2lf}.  ", iarray[i].d1, iarray[i].d2     );
    printf("ip = 0x%lx.  ",              (UL) ip                        );
    printf("*ip={%.2lf,%.2lf}.  ",       ip->d1, ip->d2                 );
    printf("(ip-iarray)=%ld.\n",         (long) (ip-iarray)             );
    ip++;
  }

  return 0;
}

In particular, I use the arrow operator to access the struct's fields through the pointer: ip->d1 accesses d1, and ip->d2 accesses d2. Let's take a look:

UNIX> bin/sptr
iarray = 0x7ffeed0e10f0
i=0.  iarray[i]={100.00,200.00}.  ip = 0x7ffeed0e10f0.  *ip={100.00,200.00}.  (ip-iarray)=0.
i=1.  iarray[i]={101.00,201.00}.  ip = 0x7ffeed0e1100.  *ip={101.00,201.00}.  (ip-iarray)=1.
i=2.  iarray[i]={102.00,202.00}.  ip = 0x7ffeed0e1110.  *ip={102.00,202.00}.  (ip-iarray)=2.
i=3.  iarray[i]={103.00,203.00}.  ip = 0x7ffeed0e1120.  *ip={103.00,203.00}.  (ip-iarray)=3.
i=4.  iarray[i]={104.00,204.00}.  ip = 0x7ffeed0e1130.  *ip={104.00,204.00}.  (ip-iarray)=4.
i=5.  iarray[i]={105.00,205.00}.  ip = 0x7ffeed0e1140.  *ip={105.00,205.00}.  (ip-iarray)=5.
i=6.  iarray[i]={106.00,206.00}.  ip = 0x7ffeed0e1150.  *ip={106.00,206.00}.  (ip-iarray)=6.
i=7.  iarray[i]={107.00,207.00}.  ip = 0x7ffeed0e1160.  *ip={107.00,207.00}.  (ip-iarray)=7.
i=8.  iarray[i]={108.00,208.00}.  ip = 0x7ffeed0e1170.  *ip={108.00,208.00}.  (ip-iarray)=8.
i=9.  iarray[i]={109.00,209.00}.  ip = 0x7ffeed0e1180.  *ip={109.00,209.00}.  (ip-iarray)=9.
UNIX> 

As you can see, the addresses differ from the run of iptr. However, their relationships are the same: each time you increment ip, its value increases by 16 (0x10), because the struct holds two 8-byte doubles and is therefore 16 bytes.

Little Quiz

More pointers: Behold the following C program (in src/quiz.c):

/* Line 1 */   #include <stdio.h>
/* Line 2 */  
/* Line 3 */   int main()
/* Line 4 */   {
/* Line 5 */     int i, array[10];
/* Line 6 */     int *ip, *a1;
/* Line 7 */     int **ipp;
/* Line 8 */   
/* Line 9 */     ip = &i;
/* Line 10 */    ipp = &ip;
/* Line 11 */    a1 = &(array[1]);
/* Line 12 */  
/* Line 13 */    for (i = 0; i < 10; i++) array[i] = i;
/* Line 14 */  
/* Line 15 */    i = 11;
/* Line 16 */  
/* Line 17 */    printf("ip: 0x%lx, &ip: 0x%lx, array: 0x%lx\n", (unsigned long) ip, (unsigned long) &ip, (unsigned long) array);
/* Line 18 */    printf("\n");
/* Line 19 */    
/* Line 20 */  
/* Line 21 */    printf("&i: 0x%lx\n", (unsigned long) &i);
/* Line 22 */    printf("ipp: 0x%lx, *ipp: 0x%lx, **ipp: 0x%lx\n", (unsigned long) ipp, (unsigned long) *ipp, (unsigned long) **ipp);
/* Line 23 */    printf("\n");
/* Line 24 */    printf("a1: 0x%lx, *a1: 0x%lx\n", (unsigned long) a1, (unsigned long) *a1);
/* Line 25 */  
/* Line 26 */    a1 += 4;
/* Line 27 */    *a1 = 500;
/* Line 28 */    
/* Line 29 */    for (i = 0; i < 10; i++) {
/* Line 30 */      printf("%d ", array[i]);
/* Line 31 */    }
/* Line 32 */    printf("\n");
/* Line 33 */  }

When you run this, the first line of output is:

UNIX> ./quiz
ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0

What is the rest of the output?

(In class, I used a drawing to help illustrate this. You can get it in img/Little-Quiz-Helper.odp (Open Office) or img/Little-Quiz-Helper.pdf (PDF).)

This is tricky, but you should be able to solve it with what you already know about pointers. It's the kind of question I am fond of asking on tests. The answer is worked out below. To make sure you're doing things right, draw a picture of memory and fill in what the first line of output tells you. Here is my picture. We'll start with a blank slate, labeled with the relevant addresses from the first line of output:

0xeffff9cc: |                        |
0xeffff9d0: |                        |
0xeffff9d4: |                        |
0xeffff9d8: |                        |
0xeffff9dc: |                        |
0xeffff9e0: |                        |
0xeffff9e4: |                        |
0xeffff9e8: |                        |
0xeffff9ec: |                        |
0xeffff9f0: |                        |
0xeffff9f4: |                        |
0xeffff9f8: |                        |
0xeffff9fc: |                        |

Now, what do we know from the first line of output? The address of ip is 0xeffff9cc, and its value is 0xeffff9fc, so we can draw that value in at that address:

0xeffff9cc: | ip = 0xeffff9fc        |
0xeffff9d0: |                        |
0xeffff9d4: |                        |
0xeffff9d8: |                        |
0xeffff9dc: |                        |
0xeffff9e0: |                        |
0xeffff9e4: |                        |
0xeffff9e8: |                        |
0xeffff9ec: |                        |
0xeffff9f0: |                        |
0xeffff9f4: |                        |
0xeffff9f8: |                        |
0xeffff9fc: |                        |

From line 9, we know that the address of i equals ip, so i is stored at 0xeffff9fc. From line 15, i's value is 11 when the printing starts, so we can draw that in:

0xeffff9cc: | ip = 0xeffff9fc        |
0xeffff9d0: |                        |
0xeffff9d4: |                        |
0xeffff9d8: |                        |
0xeffff9dc: |                        |
0xeffff9e0: |                        |
0xeffff9e4: |                        |
0xeffff9e8: |                        |
0xeffff9ec: |                        |
0xeffff9f0: |                        |
0xeffff9f4: |                        |
0xeffff9f8: |                        |
0xeffff9fc: | i = 11                 |

Now, array evaluates to a pointer to the first element of the 10-element array. Since that value is 0xeffff9d0 and each int is 4 bytes, we can draw in all ten elements of the array, along with the values that line 13 stores in them:

0xeffff9cc: | ip = 0xeffff9fc        |
0xeffff9d0: | array[0] = 0           |
0xeffff9d4: | array[1] = 1           |
0xeffff9d8: | array[2] = 2           |
0xeffff9dc: | array[3] = 3           |
0xeffff9e0: | array[4] = 4           |
0xeffff9e4: | array[5] = 5           |
0xeffff9e8: | array[6] = 6           |
0xeffff9ec: | array[7] = 7           |
0xeffff9f0: | array[8] = 8           |
0xeffff9f4: | array[9] = 9           |
0xeffff9f8: |                        |
0xeffff9fc: | i = 11                 |

Now we know all we need to know. Since &i equals ip, the next line of output (after the blank line) is "&i: 0xeffff9fc". Next, from line 10 of the program, we know that ipp equals the address of ip, so *ipp is ip, and **ipp is i. So the next line is:

ipp: 0xeffff9cc, *ipp: 0xeffff9fc, **ipp: 0xb

Note that the last value is 0xb, not 11, because we are printing 11 in hexadecimal.

Now, since a1 is a pointer to array[1], its value is 0xeffff9d4. Thus, our next line of output (after the blank line) is:

a1: 0xeffff9d4, *a1: 0x1

Finally, the statement "a1 += 4" on line 26 is pointer arithmetic: it moves a1 ahead four ints. Since ints are 4 bytes, that adds 4*4 = 16 to the value of a1, taking it from 0xeffff9d4 to 0xeffff9e4, which is the address of array[5]. Line 27 then stores 500 there. Therefore, the last line is:

0 1 2 3 4 500 6 7 8 9 
Here is the entire output:

ip: 0xeffff9fc, &ip: 0xeffff9cc, array: 0xeffff9d0

&i: 0xeffff9fc
ipp: 0xeffff9cc, *ipp: 0xeffff9fc, **ipp: 0xb

a1: 0xeffff9d4, *a1: 0x1
0 1 2 3 4 500 6 7 8 9 

The output can differ from machine to machine, but everything after the first line is determined by the first line. Here it is compiled in 64-bit mode on my MacBook:

UNIX> ./quiz
ip: 0x7fff5fbfdc7c, &ip: 0x7fff5fbfdc70, array: 0x7fff5fbfdc30

&i: 0x7fff5fbfdc7c
ipp: 0x7fff5fbfdc70, *ipp: 0x7fff5fbfdc7c, **ipp: 0xb

a1: 0x7fff5fbfdc34, *a1: 0x1
0 1 2 3 4 500 6 7 8 9 
UNIX>