Pointers exercises

Maria A. Hernandez Rivero

All examples are based on a 64-bit system

The C file for this program is in mherna21.c.

First example

Recall the pointer arithmetic formula: base + offset * size, where base is the initial memory address, offset is the number of elements to move, and size is the size in bytes of the pointed-to type.
Let's declare an array of four long values. Recall that on a 64-bit system such as Linux or macOS, each long is 8 bytes (16 hexadecimal digits). Thus, the following array will occupy 32 bytes in memory.

long array[] = {0xbeefdeabdeadbeef, 2, 3, 4};

The first number is written in hexadecimal representation. The other 3 are written in decimal representation. Also, recall that in most expressions the name array evaluates to the memory address of its first element.

array will interpret the bytes in memory like this:

0xbeefdeabdeadbeef The first eight bytes store a large hexadecimal number.
0x0000000000000002 The next eight bytes store the number two.
0x0000000000000003 The next eight bytes store the number three.
0x0000000000000004 The last eight bytes store the number four.

NOTE: In the previous table, we placed 8 bytes on each row, but remember that each memory cell contains only one byte. Since our machine is little endian, the least significant byte of each long is stored first. Hence, a more accurate representation of the way the bytes are stored in memory would be:

0xef
0xbe
0xad
0xde
0xab
0xde
0xef
0xbe
0x02
0x00
...

Next, let's reinterpret the memory address where array is stored as a pointer to an integer, and assign this address to the pointer pi:

int *pi = (int *)array;

The bytes remain unchanged in memory; what changes is the way we access these bytes through pi. Initially, pi points to the same memory address that array does. However, pi interprets the bytes as 4-byte integers. Since our machine is little endian, each original long is broken into two parts: the lower four bytes come first, and the higher four bytes come last. So this is how pi will interpret the bytes in memory:

0xdeadbeef pi[0]: the lower four bytes of 0xbeefdeabdeadbeef
0xbeefdeab pi[1]: the higher four bytes of 0xbeefdeabdeadbeef
0x00000002 pi[2]: the lower four bytes of 0x0000000000000002
0x00000000 pi[3]: the higher four bytes of 0x0000000000000002
0x00000003 pi[4]: the lower four bytes of 0x0000000000000003
0x00000000 pi[5]: the higher four bytes of 0x0000000000000003
0x00000004 pi[6]: the lower four bytes of 0x0000000000000004
0x00000000 pi[7]: the higher four bytes of 0x0000000000000004


Recall that we can index pointers like arrays. Thus, if we print pi[4], we will get 3, because pi[4] is the fifth 4-byte integer in the list above.

If we do pi++ , then pi will point to the memory address given by: array + 1*4 . For instance, assuming that array initially points to the memory address 1000 , then after doing pi++ , pi will point to the memory address 1000 + 1*4 = 1004

In other words, when doing pi++ we are moving 4 bytes forward in memory. Hence, pi now points at the memory address containing 0xab, and since pi is a pointer to an integer it spans that byte and the 3 bytes after it. Now, when we dereference pi and print it in hexadecimal, we obtain: beefdeab

Second example

Let's declare an array of unsigned long:

unsigned long array2[] = {4294967295, 250, 3, 4};

The maximum number that can be represented with a 32-bit unsigned integer is 2^32 - 1 = 4294967295.
Thus, 4294967295 = 1111_1111_1111_1111_1111_1111_1111_1111 == 0xffff_ffff.
In other words, if we print array2[0] in hexadecimal we will get 0xffffffff. Stored as an 8-byte unsigned long, 4294967295 is: 0000_0000_0000_0000_0000_0000_0000_0000_1111_1111_1111_1111_1111_1111_1111_1111 (the upper 32 bits (4 bytes) are all zero and the lower 32 bits are all 1).

array2 will interpret the bytes stored in memory like this:

0x00000000ffffffff Upper 32 bits (4 bytes) are all zero and lower 32 bits are all 1 (0xffffffff in hexadecimal)
0x00000000000000fa The next eight bytes store the number 250 (0xfa in hexadecimal)
0x0000000000000003 The next eight bytes store the number three.
0x0000000000000004 The last eight bytes store the number four.


Now, let's reinterpret the memory address where array2 is stored as a pointer to an integer, and assign this address to the pointer pi2:

int *pi2 = (int *)array2;

As in the first example, after doing pi2++, pi2 points 4 bytes past the start of the array and spans the next 4 bytes, which are the upper half of array2[0]. These 4 bytes are all zeroes. The same would be true for any other first element that fits in 4 bytes (1, 2, ... up to 4294967295): since such a value needs only the lower 4 bytes, the upper 4 bytes are zero. Thus, the print statement printf("%d\n", *pi2); outputs 0.

Notice that if we had INITIALLY (not after typecasting) declared array2 as an array of unsigned int, then doing pi2++ would take us to the memory address where 250 is located. In that case, the previous print statement would print 250 instead of 0.

This is because, in that case, we would have allocated only 16 bytes on the stack when declaring array2, and array2 would interpret the bytes stored in memory like this:
0xffffffff
0x000000fa
0x00000003
0x00000004

Third example

Let's declare an array similar to the array above, but now the first element of the array will be 4294967296 (that is, 2^32):

long array3[] = {4294967296, 250, 3, 4};

When interpreted as a long,
4294967296 = 0000_0000_0000_0000_0000_0000_0000_0001_0000_0000_0000_0000_0000_0000_0000_0000 Notice that the lower 4 bytes, which come first in memory on a little-endian machine, are all zeroes. Thus, with pi3 declared as shown further down (int *pi3 = (int *)array3;), the print statement printf("0x%04x\n", *pi3); outputs 0x0000 . NOTE: These 4 zeroes do not represent 4 bytes; each byte is represented by two hexadecimal digits. We get 4 digits in the output because %04x asks for a minimum width of 4 digits, padded with zeroes.

array3 will interpret the bytes stored in memory like this:

0x0000000100000000 Bit 32 (counting from bit 0) is one, all the others are 0.
0x00000000000000fa The next eight bytes store the number 250 (0xfa in hexadecimal)
0x0000000000000003 The next eight bytes store the number three.
0x0000000000000004 The last eight bytes store the number four.


int *pi3 = (int *)array3;

pi3 will interpret the bytes as follows:

0x00000000 pi3[0]: the lower four bytes of 0x0000000100000000
0x00000001 pi3[1]: the higher four bytes of 0x0000000100000000
0x000000fa pi3[2]: the lower four bytes of 0x00000000000000fa
0x00000000 pi3[3]: the higher four bytes of 0x00000000000000fa
0x00000003 pi3[4]: the lower four bytes of 0x0000000000000003
0x00000000 pi3[5]: the higher four bytes of 0x0000000000000003
0x00000004 pi3[6]: the lower four bytes of 0x0000000000000004
0x00000000 pi3[7]: the higher four bytes of 0x0000000000000004


After doing pi3++ , pi3 will be pointing to the memory address where the 1 is located. Thus, the following print statement

printf("0x%04x\n", *pi3);

prints: 0x0001

Fourth example

Let's declare an array of four long:

long array4[] = {0xbeefdeabdeadbeef, 2, 3, 4};

Now, let's reinterpret the memory address where array4 is stored as a pointer to a char, and assign this address to the pointer pc:

char *pc = (char *)array4;

pc will interpret the bytes in memory as follows (least significant byte of each long first):

0xef The first byte is the first byte of 0xbeefdeabdeadbeef
0xbe The second byte is the second byte of 0xbeefdeabdeadbeef
0xad The third byte is the third byte of 0xbeefdeabdeadbeef
0xde The fourth byte is the fourth byte of 0xbeefdeabdeadbeef
0xab The fifth byte is the fifth byte of 0xbeefdeabdeadbeef
0xde The sixth byte is the sixth byte of 0xbeefdeabdeadbeef
0xef The seventh byte is the seventh byte of 0xbeefdeabdeadbeef
0xbe The eighth byte is the eighth byte of 0xbeefdeabdeadbeef
0x02 The ninth byte is the first byte of 0x0000000000000002
0x00 The tenth byte is the second byte of 0x0000000000000002
...


After doing pc = pc + 5 , pc will point to the sixth byte ( 0xde ). Now, in the following print statement

printf("0x%08x\n", *((int *)pc));

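we are typecasting pc to a pointer to integer. This means that pc still holds the memory address of the byte 0xde , but the dereference now reads 4 bytes starting there (since integers are 4 bytes): 0xde, 0xef, 0xbe and 0x02. Read in little-endian order, these bytes give the output: 0x02beefde. Note that this address is not 4-byte aligned; the read works on common 64-bit machines such as x86-64, but the C standard does not guarantee that a misaligned access like this works everywhere.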

Fifth example

In the following statement, we are typecasting 1000 to a pointer to integer:

int *z = (int *)1000;

This simply means: treat 1000 as a memory address. It also means that z will treat memory addresses as follows:

Memory Address Pointer Arithmetic
1000 z
1004 z + 1
1008 z + 2
1012 z + 3
...


NOTE: Here, we are NOT allocating any memory. So, we shouldn't try to dereference z; doing so will most likely cause a segmentation fault.

Notice the difference between allocating memory and not.

Here, we are allocating memory on the stack for a (4 bytes) and also for the pointer to integer pi (8 bytes), which holds the memory address of a :

int a = 10;
int *pi = &a;


Here, we are allocating memory on the stack for one pointer to integer pi2 (8 bytes) and memory on the heap for one integer (4 bytes):

int *pi2 = (int *)malloc(sizeof(int));
*pi2 = 20;


In both cases, it is safe to dereference pi and pi2 because we allocated memory for them. Additionally, since we initialized the memory, dereferencing it will not yield garbage values. However, in this statement int *z = (int *)1000; we have not allocated the memory 1000 == 0x0000_0000_0000_03e8, thus, we cannot dereference z.

Even though we have not allocated memory, we can do some pointer arithmetic with it.

After doing this: z++ ; z will point to the memory address 1004 (1000 + 1*4) .

Sixth example

In the following statement, we are typecasting 1000 to a pointer to pointer:

char **pointer_to_pointer = (char **)1000;

This simply means: treat 1000 as a memory address. It also means that pointer_to_pointer will treat memory addresses as follows:

Memory Address Pointer Arithmetic
1000 pointer_to_pointer
1008 pointer_to_pointer + 1
1016 pointer_to_pointer + 2
1024 pointer_to_pointer + 3
1032 pointer_to_pointer + 4
...


This is because pointer_to_pointer will move through the memory addresses following the formula: base + offset*size_of_pointer . In this case: (1000 + offset*8)

Consequently, after performing the following operation:

pointer_to_pointer = pointer_to_pointer + 4;

pointer_to_pointer will point to the memory address 1032.

This may be confusing, so NOTE AGAIN: pointer_to_pointer is a pointer to a POINTER, not a pointer to a char. Each element it steps over is a char * (8 bytes), hence the pointer arithmetic is 1000 + 4*8 rather than 1000 + 4*1

Seventh example

First part

Let's declare the following 2D array:

short matrix[4][2] = {{0x1234,0x5678},{0x9012,0xabcd},{0xdf01,0xfedc},{0xba21,0x9876}};

Recall that elements are stored contiguously in memory and the smallest addressable unit of memory is one byte - each memory cell stores 1 byte. Thus, a visual representation of the memory layout of the previous 2D array is something like this:

Memory address Row Col Elements
0x16f186f90 0 0 0x34 (little-endian - least significant byte stored first)
0x16f186f91 0 0 0x12
0x16f186f92 0 1 0x78
0x16f186f93 0 1 0x56
0x16f186f94 1 0 0x12
0x16f186f95 1 0 0x90
0x16f186f96 1 1 0xcd
0x16f186f97 1 1 0xab
0x16f186f98 2 0 0x01
0x16f186f99 2 0 0xdf
0x16f186f9a 2 1 0xdc
0x16f186f9b 2 1 0xfe
0x16f186f9c 3 0 0x21
0x16f186f9d 3 0 0xba
0x16f186f9e 3 1 0x76
0x16f186f9f 3 1 0x98


And the way that matrix interprets these bytes in memory is the following:

Memory address Elements
0x16f186f90 0x1234
0x16f186f92 0x5678
0x16f186f94 0x9012
0x16f186f96 0xabcd
0x16f186f98 0xdf01
0x16f186f9a 0xfedc
0x16f186f9c 0xba21
0x16f186f9e 0x9876


The following statement simply gives you the memory address where the first byte ( 0x34 ) is stored.

printf("base address of matrix: 0x%lx\n", (UL)matrix);

Now, let's evaluate the expression *(&(matrix[1][0]) + 4) step by step:

&(matrix[1][0]) gives you the address (&) of the element at row 1 and col 0. Then, when adding 4, we perform pointer arithmetic as follows: base + offset*size_of_short. Our base is the memory address where the element matrix[1][0] is located, which is 4 bytes after the start of the matrix (two shorts of 2 bytes each). The offset of 4 shorts adds 4*2 = 8 bytes, so (&(matrix[1][0]) + 4) takes us 4 + 8 = 12 bytes from the start of the matrix. In other words, it takes us to the memory address 0x16f186f9c

Then, if we want to access the actual element at that memory address, we need to dereference it. Thus, the statement printf("*(&(matrix[1][0]) + 4): 0x%hx\n", *(&(matrix[1][0]) + 4)); will give you the actual element at the memory address 0x16f186f9c , which is 0xba21

Second part

Now, after performing the following typecasting:

unsigned char* cp = (unsigned char *)matrix;

cp will interpret the bytes in memory as follows:

Memory address Elements
0x16f186f90 0x34
0x16f186f91 0x12
0x16f186f92 0x78
0x16f186f93 0x56
0x16f186f94 0x12
0x16f186f95 0x90
0x16f186f96 0xcd
0x16f186f97 0xab
0x16f186f98 0x01
0x16f186f99 0xdf
0x16f186f9a 0xdc
0x16f186f9b 0xfe
0x16f186f9c 0x21
0x16f186f9d 0xba
0x16f186f9e 0x76
0x16f186f9f 0x98


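In other words, the pointer arithmetic is now: base + offset*size_of_char, where the size of a char is 1 byte.

Hence, cp + 7 is 7 bytes from the start of the matrix, and printing printf("*(cp + 7): 0x%hx\n", *(cp + 7)); gives 0xab . (Dereferencing cp itself, without the offset, would give the first byte, 0x34 .)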

Notice that doing *(cp + 7) is equivalent to doing cp[7] : the subscript operator first calculates the offset in memory and then dereferences the resulting address.