Pointers exercises
Maria A. Hernandez Rivero
All examples assume a 64-bit, little-endian system where long and pointers are 8 bytes, int is 4 bytes, and short is 2 bytes.
The C file for this program is mherna21.c.
First example
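Recall the pointer arithmetic formula: base + offset * size, where base is the initial memory address, offset is how many elements we move, and size is the size in bytes of the type the pointer points to.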
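A minimal sketch of the formula in C, assuming the 8-byte long used throughout these notes. The helper names measured_offset and formula_offset are made up for this illustration: the first asks the compiler how far &arr[offset] is from arr, and the second applies offset * size by hand (the base cancels out, so only the distance is compared).

```c
#include <assert.h>
#include <stddef.h>

/* These notes assume an LP64 system. */
static_assert(sizeof(long) == 8, "these notes assume 8-byte longs");

/* Byte distance between arr and &arr[offset], as computed by the compiler.
 * Converting both pointers to const char * makes the subtraction count bytes. */
ptrdiff_t measured_offset(const long *arr, size_t offset)
{
    return (const char *)&arr[offset] - (const char *)arr;
}

/* The same distance from the formula: offset * size. */
size_t formula_offset(size_t offset, size_t size)
{
    return offset * size;
}
```

For instance, measured_offset(arr, 3) gives 24 for a long array, the same as 3 * 8.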
Let's declare an array of four longs. Recall that on a 64-bit system each long is 8 bytes (16 hexadecimal digits), so the following array occupies 32 bytes in memory:
long array[] = {0xbeefdeabdeadbeef, 2, 3, 4};
The first number is written in hexadecimal representation. The other 3 are written in decimal representation.
Also, recall that in most expressions array decays to (is converted to) the memory address of its first element.
array will interpret the bytes in memory like this:
Value              | Meaning
0xbeefdeabdeadbeef | The first eight bytes store a big hex number
0x0000000000000002 | The next eight bytes store the number two
0x0000000000000003 | The next eight bytes store the number three
0x0000000000000004 | The last eight bytes store the number four
NOTE: In the previous table, each row shows 8 bytes at a time, but remember that each memory cell contains only one byte.
Hence, a more accurate representation of the way the bytes are stored in memory (little-endian, least significant byte first) would be:
Offset | Byte
0      | 0xef
1      | 0xbe
2      | 0xad
3      | 0xde
4      | 0xab
5      | 0xde
6      | 0xef
7      | 0xbe
8      | 0x02
9      | 0x00
...    | ...
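To check this byte order on your own machine, you can read the array one byte at a time through an unsigned char pointer. This is a sketch; is_little_endian and byte_at are hypothetical helper names, and the array in the usage is declared as unsigned long (rather than long) so that 0xbeefdeabdeadbeef fits without an implementation-defined conversion.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(long) == 8, "these notes assume 8-byte longs");

/* Returns 1 on a little-endian machine: the least significant byte
 * of the int 1 is then stored at the lowest address. */
int is_little_endian(void)
{
    unsigned int one = 1;
    return *(const unsigned char *)&one == 1;
}

/* Returns the byte stored i bytes after p. Any object may be read
 * through an unsigned char pointer, so this is well defined. */
unsigned char byte_at(const void *p, size_t i)
{
    const unsigned char *bytes = p;
    return bytes[i];
}
```

With unsigned long array[] = {0xbeefdeabdeadbeefUL, 2, 3, 4}; on a little-endian machine, byte_at(array, 0) is 0xef and byte_at(array, 4) is 0xab, matching the listing above.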
Next, let's reinterpret the memory address where array is stored as a pointer to an integer, and assign this address to the pointer pi:
int *pi = (int *)array;
The bytes in memory remain unchanged; what changes is the way we access them through pi.
Initially, pi points to the same memory address that array does. However, pi interprets the bytes as 4-byte integers, so each original 8-byte long is split into two ints. Since our machine is little-endian, the lower four bytes of each long come first and the higher four bytes come last.
So this is how pi will interpret the bytes in memory:
Index | Value      | Meaning
pi[0] | 0xdeadbeef | Lower four bytes of 0xbeefdeabdeadbeef
pi[1] | 0xbeefdeab | Higher four bytes of 0xbeefdeabdeadbeef
pi[2] | 0x00000002 | Lower four bytes of 0x0000000000000002
pi[3] | 0x00000000 | Higher four bytes of 0x0000000000000002
pi[4] | 0x00000003 | Lower four bytes of 0x0000000000000003
pi[5] | 0x00000000 | Higher four bytes of 0x0000000000000003
pi[6] | 0x00000004 | Lower four bytes of 0x0000000000000004
pi[7] | 0x00000000 | Higher four bytes of 0x0000000000000004
Recall that we can index a pointer as if it were an array; thus, printing pi[4] gives 3.
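The int view that pi gives can be reproduced as below. This is a sketch under the same size assumptions; int_view is a hypothetical helper, and it copies bytes with memcpy rather than dereferencing pi, because reading long storage through an int pointer breaks C's strict aliasing rule (it usually works in practice, but compilers are allowed to assume it never happens).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static_assert(sizeof(long) == 8 && sizeof(int) == 4,
              "these notes assume 8-byte longs and 4-byte ints");

/* Returns what the notes read as pi[i] after int *pi = (int *)array:
 * the i-th group of 4 bytes inside the n_bytes bytes at base.
 * memcpy copies exactly those 4 bytes without an aliasing violation. */
unsigned int int_view(const void *base, size_t n_bytes, size_t i)
{
    unsigned int value;
    assert((i + 1) * sizeof value <= n_bytes);  /* stay inside the array */
    memcpy(&value, (const unsigned char *)base + i * sizeof value, sizeof value);
    return value;
}
```

With the array from this example, int_view(array, sizeof array, 4) returns 3, the same value as pi[4].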
If we do pi++, then pi will point to the memory address given by: (address of array) + 1*4, because an int is 4 bytes. For instance, assuming that array is stored at memory address 1000, after doing pi++, pi will point to the memory address
1000 + 1*4 = 1004
In other words, pi++ moves pi 4 bytes up in memory. Hence, pi now points at the memory address containing 0xab, and since pi is a pointer to an integer, dereferencing it reads that byte and the next three (4 bytes in total).
Now, when we dereference pi and print it in hexadecimal, we obtain: beefdeab
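A sketch of what pi++ does, assuming 4-byte ints. int_step_bytes and value_after_increment are hypothetical names: the first only compares addresses, so it shows the 4-byte step, and the second reads the 4 bytes at the new address with memcpy, which avoids reading long storage through an int pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static_assert(sizeof(int) == 4, "these notes assume 4-byte ints");

/* Number of bytes that one pi++ moves an int pointer: the compiler
 * scales the 1 by sizeof(int). Only addresses are compared here. */
ptrdiff_t int_step_bytes(const void *array)
{
    const int *pi = array;      /* same starting address as the array */
    const int *next = pi + 1;   /* the address pi++ would produce */
    return (const char *)next - (const char *)pi;
}

/* Value found at the address reached by pi++, i.e. bytes 4..7. */
unsigned int value_after_increment(const void *array)
{
    unsigned int value;
    memcpy(&value, (const unsigned char *)array + int_step_bytes(array), sizeof value);
    return value;
}
```

With unsigned long array[] = {0xbeefdeabdeadbeefUL, 2, 3, 4};, int_step_bytes returns 4 and value_after_increment returns 0xbeefdeab.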
Second example
Let's declare an array of unsigned longs:
unsigned long array2[] = {4294967295, 250, 3, 4};
The maximum number that can be represented with a 32-bit unsigned int is 2^32 - 1 = 4294967295.
Thus, 4294967295 = 1111_1111_1111_1111_1111_1111_1111_1111 = 0xffff_ffff.
In other words, if we print array2[0] in hexadecimal, we get 0xffffffff. As a 64-bit unsigned long, 4294967295 is:
0000_0000_0000_0000_0000_0000_0000_0000_1111_1111_1111_1111_1111_1111_1111_1111
(the upper 32 bits (4 bytes) are all zero and the lower 32 bits are all one).
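Splitting a 64-bit value into its two 32-bit halves can be sketched with a mask and a shift. These hypothetical helpers work on the value itself, so they give the same answer on any machine; endianness only decides which half is stored at the lower address.

```c
#include <assert.h>

static_assert(sizeof(unsigned long) == 8, "these notes assume 8-byte longs");

/* Lower 32 bits of a 64-bit value (the part a 32-bit unsigned int can hold). */
unsigned long lower_half(unsigned long x)
{
    return x & 0xffffffffUL;
}

/* Upper 32 bits of a 64-bit value. */
unsigned long upper_half(unsigned long x)
{
    return x >> 32;
}
```

For example, lower_half(4294967295UL) is 0xffffffff and upper_half(4294967295UL) is 0.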
array2 will interpret the bytes stored in memory like this:
Value              | Meaning
0x00000000ffffffff | Upper 32 bits (4 bytes) are all zero and lower 32 bits are all one (0xffffffff in hexadecimal)
0x00000000000000fa | The next eight bytes store the number 250 (0xfa in hexadecimal)
0x0000000000000003 | The next eight bytes store the number three
0x0000000000000004 | The last eight bytes store the number four
Now, let's reinterpret the memory address where array2 is stored as a pointer to an integer, and assign this address to the pointer pi2:
int *pi2 = (int *)array2;
As in the first example, after doing pi2++, pi2 will point to the byte following the initial 4 bytes and will span 4 bytes.
These 4 bytes are the upper half of 4294967295, so they are all zeroes. The same would be true for any other value from 0 up to 4294967295: all of these numbers fit in 4 bytes, so the upper 4 bytes of the long are zero.
Thus, the following print statement
printf("%d\n", *pi2);
outputs 0.
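The claim that pi2++ lands on four zero bytes for every value up to 4294967295 can be checked with a small sketch. second_int is a hypothetical helper that copies the 4 bytes following the first 4 bytes of an unsigned long, which is what *pi2 reads after one pi2++; on a little-endian machine these are the upper 32 bits.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(unsigned long) == 8 && sizeof(unsigned int) == 4,
              "these notes assume 8-byte longs and 4-byte ints");

/* Returns the 4 bytes that follow the first 4 bytes of value in memory,
 * read as an unsigned int (the view *pi2 has after one pi2++). */
unsigned int second_int(unsigned long value)
{
    unsigned int upper;
    memcpy(&upper, (const unsigned char *)&value + sizeof upper, sizeof upper);
    return upper;
}
```

On a little-endian machine, second_int(4294967295UL) and second_int(250UL) both return 0.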
Notice that if we had INITIALLY (not through a typecast) declared array2 as an array of unsigned int, then doing pi2++ would take us to the memory address where 250 is located, and the previous print statement would print 250 instead of 0.
This is because in that case we would have allocated only 16 bytes on the stack when declaring array2 (four 4-byte elements), and array2 would interpret the bytes stored in memory like this:
0xffffffff
0x000000fa
0x00000003
0x00000004
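When array2 is declared as unsigned int from the start, a sketch like the following shows the element spacing and the 250 reached by one increment. after_one_increment is a hypothetical name, and the sketch assumes 4-byte ints.

```c
#include <assert.h>

static_assert(sizeof(unsigned int) == 4, "these notes assume 4-byte ints");

/* With array2 declared as unsigned int, consecutive elements are 4 bytes
 * apart, so one increment moves from element 0 to element 1. */
unsigned int after_one_increment(const unsigned int *array2)
{
    const unsigned int *pi2 = array2;
    pi2++;
    return *pi2;
}
```

With unsigned int array2[] = {4294967295u, 250, 3, 4};, sizeof array2 is 16 and after_one_increment(array2) returns 250.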
Third example
Let's declare an array similar to the one above, but now the first element of the array will be 4294967296:
long array3[] = {4294967296, 250, 3, 4};
When interpreted as a long,
4294967296 = 0000_0000_0000_0000_0000_0000_0000_0001_0000_0000_0000_0000_0000_0000_0000_0000
Notice that the lower 4 bytes, which a little-endian machine stores first, are all zeroes. Thus, once pi3 points to the start of array3 (see the declaration of pi3 below), the following print statement
printf("0x%04x\n", *pi3);
outputs 0x0000. NOTE: these 4 zeroes do not represent 4 bytes; each byte is represented by two hexadecimal digits. We get 4 zeroes in the output because we requested a minimum width of 4 digits with %04x.
array3 will interpret the bytes stored in memory like this:
Value              | Meaning
0x0000000100000000 | Bit 32 (counting from bit 0) is one; all the others are zero
0x00000000000000fa | The next eight bytes store the number 250 (0xfa in hexadecimal)
0x0000000000000003 | The next eight bytes store the number three
0x0000000000000004 | The last eight bytes store the number four
int *pi3 = (int *)array3;
pi3 will interpret the bytes as follows:
Index  | Value      | Meaning
pi3[0] | 0x00000000 | Lower four bytes of 0x0000000100000000
pi3[1] | 0x00000001 | Higher four bytes of 0x0000000100000000
pi3[2] | 0x000000fa | Lower four bytes of 0x00000000000000fa
pi3[3] | 0x00000000 | Higher four bytes of 0x00000000000000fa
pi3[4] | 0x00000003 | Lower four bytes of 0x0000000000000003
pi3[5] | 0x00000000 | Higher four bytes of 0x0000000000000003
pi3[6] | 0x00000004 | Lower four bytes of 0x0000000000000004
pi3[7] | 0x00000000 | Higher four bytes of 0x0000000000000004
After doing pi3++, pi3 will point to the memory address where the upper half of 4294967296 (the value 1) is located.
Thus, the following print statement
printf("0x%04x\n", *pi3);
prints: 0x0001
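The two print statements in this example can be reproduced with snprintf so the output can be checked. format_int_view is a hypothetical helper: it copies the i-th 4-byte group of array3 with memcpy (instead of dereferencing an int pointer into long storage) and formats it with the same "0x%04x" format used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static_assert(sizeof(long) == 8 && sizeof(unsigned int) == 4,
              "these notes assume 8-byte longs and 4-byte ints");

/* Formats the i-th 4-byte group of array3's memory as "0x%04x".
 * i must be less than twice the number of elements in array3. */
void format_int_view(const long *array3, size_t i, char *out, size_t out_size)
{
    unsigned int value;
    memcpy(&value, (const unsigned char *)array3 + i * sizeof value, sizeof value);
    snprintf(out, out_size, "0x%04x", value);
}
```

Index 0 gives "0x0000" and index 1 (where pi3 points after pi3++) gives "0x0001".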
Fourth example
Let's declare an array of four longs:
long array4[] = {0xbeefdeabdeadbeef, 2, 3, 4};
Now, let's reinterpret the memory address where array4 is stored as a pointer to a char, and assign this address to the pointer pc:
char *pc = (char *)array4;
pc will interpret the bytes in memory as follows:
Index | Byte | Meaning
pc[0] | 0xef | Byte 0 (least significant) of 0xbeefdeabdeadbeef
pc[1] | 0xbe | Byte 1 of 0xbeefdeabdeadbeef
pc[2] | 0xad | Byte 2 of 0xbeefdeabdeadbeef
pc[3] | 0xde | Byte 3 of 0xbeefdeabdeadbeef
pc[4] | 0xab | Byte 4 of 0xbeefdeabdeadbeef
pc[5] | 0xde | Byte 5 of 0xbeefdeabdeadbeef
pc[6] | 0xef | Byte 6 of 0xbeefdeabdeadbeef
pc[7] | 0xbe | Byte 7 (most significant) of 0xbeefdeabdeadbeef
pc[8] | 0x02 | Byte 0 (least significant) of 0x0000000000000002
pc[9] | 0x00 | Byte 1 of 0x0000000000000002
...   | ...  | ...
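After doing pc = pc + 5, pc will point to the sixth byte (0xde).
Now, in the following print statement
printf("0x%08x\n", *((int *)pc));
we are typecasting pc to a pointer to integer. This means that pc still points to the same memory address containing 0xde, but the dereference now spans 4 bytes (since integers are 4 bytes): 0xde, 0xef, 0xbe, and 0x02.
Thus, the output of that print statement is: 0x02beefde
Note that pc + 5 is not a multiple of 4, so this is a misaligned int access. x86-64 handles it, but in standard C it is undefined behavior, and some other architectures can fault on it.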
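A sketch of the char-pointer step and the 4-byte read, assuming the same sizes and a little-endian machine. int_at_byte_offset is a hypothetical helper; it reads the int with memcpy because pc + 5 is not a multiple of 4, and memcpy performs an unaligned read safely on any architecture.

```c
#include <assert.h>
#include <string.h>

static_assert(sizeof(long) == 8 && sizeof(unsigned int) == 4,
              "these notes assume 8-byte longs and 4-byte ints");

/* Reads the 4 bytes that start offset bytes into the array as an
 * unsigned int, like *((int *)(pc + offset)) in the notes, but without
 * relying on the CPU tolerating a misaligned int access. */
unsigned int int_at_byte_offset(const void *array, size_t offset)
{
    const char *pc = array;     /* char pointer: each +1 moves one byte */
    unsigned int value;
    memcpy(&value, pc + offset, sizeof value);
    return value;
}
```

With unsigned long array4[] = {0xbeefdeabdeadbeefUL, 2, 3, 4};, int_at_byte_offset(array4, 5) returns 0x02beefde.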
Fifth example
In the following statement, we are typecasting 1000 to a pointer to integer:
int *z = (int *)1000;
This simply means: treat 1000 as a memory address. It also means that z will step through memory addresses as follows:
Memory address | Pointer arithmetic
1000           | z
1004           | z + 1
1008           | z + 2
1012           | z + 3
...            | ...
NOTE: Here, we are NOT allocating any memory, so we must not dereference z. Doing so is undefined behavior, and since address 1000 is normally not mapped into the process, it will most likely crash with a segmentation fault.
Notice the difference between allocating memory and not.
Here, we are allocating memory on the stack for a (4 bytes) and also for the pointer to integer pi (8 bytes), which points to the memory address of a:
int a = 10;
int *pi = &a;
Here, we are allocating memory on the stack for one pointer to integer, pi2 (8 bytes), and memory on the heap for one integer (4 bytes):
int *pi2 = (int *)malloc(sizeof(int));
*pi2 = 20;
In both cases, it is safe to dereference pi and pi2 because they point to memory we allocated (for pi2, provided malloc did not return NULL). Additionally, since we initialized that memory, dereferencing it will not yield garbage values.
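The two safe cases above can be written as small functions. This is a sketch with hypothetical function names; unlike the snippet above, the heap version checks malloc's result for NULL and frees the memory once it is done, and it omits the (int *) cast, which C does not require for the result of malloc.

```c
#include <assert.h>
#include <stdlib.h>

static_assert(sizeof(int) == 4, "these notes assume 4-byte ints");

/* Stack case: a lives on the stack and pi points to it. */
int read_through_stack_pointer(void)
{
    int a = 10;
    int *pi = &a;
    return *pi;             /* safe: pi points to memory we own */
}

/* Heap case: malloc reserves sizeof(int) bytes; the pointer itself is
 * a local variable on the stack. */
int read_through_heap_pointer(void)
{
    int *pi2 = malloc(sizeof *pi2);
    if (pi2 == NULL)
        return -1;          /* hypothetical error value for this sketch */
    *pi2 = 20;              /* initialize before reading */
    int value = *pi2;
    free(pi2);              /* give the heap memory back */
    return value;
}
```

read_through_stack_pointer() returns 10, and read_through_heap_pointer() returns 20 whenever the allocation succeeds.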
However, in the statement int *z = (int *)1000; we have not allocated the memory at address 1000 (0x0000_0000_0000_03e8), so we cannot dereference z.
Even though we have not allocated memory, we can still do some pointer arithmetic with z (this works in practice, although the C standard only defines pointer arithmetic within an allocated array). After doing z++, z will point to the memory address 1004 (1000 + 1*4).
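Since z must never be dereferenced, one way to explore its arithmetic without touching unallocated memory is to do the same computation on plain integers. This sketch uses uintptr_t (an integer type that can hold an address) and a hypothetical helper, int_pointer_address, that returns base + k * sizeof(int).

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) == 4, "these notes assume 4-byte ints");

/* Address that z + k would hold if z started at address base.
 * The arithmetic is done on integers, mirroring base + offset * size
 * without forming or dereferencing a pointer to unallocated memory. */
uintptr_t int_pointer_address(uintptr_t base, uintptr_t k)
{
    return base + k * sizeof(int);
}
```

int_pointer_address(1000, 1) is 1004, matching z++ above.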
Sixth example
In the following statement, we are typecasting 1000 to a pointer to a pointer:
char **pointer_to_pointer = (char **)1000;
This simply means: treat 1000 as a memory address. It also means that pointer_to_pointer will step through memory addresses as follows:
Memory address | Pointer arithmetic
1000           | pointer_to_pointer
1008           | pointer_to_pointer + 1
1016           | pointer_to_pointer + 2
1024           | pointer_to_pointer + 3
1032           | pointer_to_pointer + 4
...            | ...
This is because pointer_to_pointer moves through memory addresses following the formula: base + offset*size_of_pointer. In this case: 1000 + offset*8.
Consequently, after performing the following operation:
pointer_to_pointer = pointer_to_pointer + 4;
pointer_to_pointer will point to the memory address 1032.
This may be confusing, so NOTE AGAIN: pointer_to_pointer is a pointer to a POINTER, not to a char; hence the pointer arithmetic is 1000 + 4*8 rather than 1000 + 4*1.
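To see the 8-byte step with a pointer that is safe to use, the sketch below applies the same arithmetic to a real array of five char pointers. pointer_to_pointer_step is a hypothetical helper, and the sketch assumes 8-byte pointers as these notes do.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(char *) == 8, "these notes assume 8-byte pointers");

/* Byte distance covered by pointer_to_pointer + k. The pointed-to type
 * is char *, so each step is sizeof(char *) bytes, not sizeof(char).
 * A real array of pointers keeps the arithmetic in bounds. */
ptrdiff_t pointer_to_pointer_step(size_t k)
{
    char *pointers[5] = {0};
    char **pointer_to_pointer = pointers;
    assert(k <= 5);   /* one past the end is the furthest we may go */
    return (const char *)(pointer_to_pointer + k) - (const char *)pointer_to_pointer;
}
```

pointer_to_pointer_step(4) returns 32, so starting from 1000 we would land on 1032.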
Seventh example
First part
Let's declare the following 2D array:
short matrix[4][2] = {{0x1234,0x5678},{0x9012,0xabcd},{0xdf01,0xfedc},{0xba21,0x9876}};
Recall that the elements are stored contiguously in memory, row by row (row-major order), and that the smallest addressable unit of memory is one byte: each memory cell stores 1 byte. Thus, a visual representation of the memory layout of the previous 2D array is something like this:
Memory address | Row | Col | Byte
0x16f186f90    | 0   | 0   | 0x34 (little-endian: least significant byte stored first)
0x16f186f91    | 0   | 0   | 0x12
0x16f186f92    | 0   | 1   | 0x78
0x16f186f93    | 0   | 1   | 0x56
0x16f186f94    | 1   | 0   | 0x12
0x16f186f95    | 1   | 0   | 0x90
0x16f186f96    | 1   | 1   | 0xcd
0x16f186f97    | 1   | 1   | 0xab
0x16f186f98    | 2   | 0   | 0x01
0x16f186f99    | 2   | 0   | 0xdf
0x16f186f9a    | 2   | 1   | 0xdc
0x16f186f9b    | 2   | 1   | 0xfe
0x16f186f9c    | 3   | 0   | 0x21
0x16f186f9d    | 3   | 0   | 0xba
0x16f186f9e    | 3   | 1   | 0x76
0x16f186f9f    | 3   | 1   | 0x98
And this is the way matrix interprets these bytes in memory:
Memory address | Element
0x16f186f90    | 0x1234
0x16f186f92    | 0x5678
0x16f186f94    | 0x9012
0x16f186f96    | 0xabcd
0x16f186f98    | 0xdf01
0x16f186f9a    | 0xfedc
0x16f186f9c    | 0xba21
0x16f186f9e    | 0x9876
The following statement simply gives you the memory address where the first byte (0x34) is stored:
printf("base address of matrix: 0x%lx\n", (UL)matrix);
Now, let's perform the following operations and explain the process.
&(matrix[1][0]) gives you the address (&) of the element at row 1, column 0. Then, when adding 4, we perform pointer arithmetic as follows: base + offset*size_of_short.
Our base is the memory address where matrix[1][0] is located, which is 4 bytes after the start of the matrix (row 0 holds two 2-byte shorts). Then, (&(matrix[1][0]) + 4) takes us
4 bytes + 4*2 bytes = 12 bytes
from the start of the matrix, since the size of a short is 2 bytes. In other words, it takes us to the memory address 0x16f186f9c, where matrix[3][0] is stored.
Then, to access the actual element at that memory address, we need to dereference it. Thus, the statement
printf("*(&(matrix[1][0]) + 4): 0x%hx\n", *(&(matrix[1][0]) + 4));
gives you the actual element at the memory address 0x16f186f9c, which is 0xba21.
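The offset calculation can be checked with a sketch like this, assuming 2-byte shorts. It declares the matrix as unsigned short because 0x9012, 0xabcd, 0xdf01, 0xfedc, 0xba21 and 0x9876 do not fit in a signed 16-bit short (with plain short their conversion is implementation-defined, though common compilers keep the bit pattern). target_offset and short_at are hypothetical helpers; short_at steps with an unsigned char pointer because, strictly speaking, &matrix[1][0] + 4 goes past the end of row 1, which the C standard leaves undefined even though common compilers handle it as shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static_assert(sizeof(short) == 2, "these notes assume 2-byte shorts");

/* Byte offset of &matrix[1][0] + 4 from the start of the matrix:
 * row 1 starts 2 * sizeof(short) = 4 bytes in, and + 4 adds 4 * 2 bytes. */
size_t target_offset(void)
{
    return 1 * 2 * sizeof(short) + 4 * sizeof(short);
}

/* Reads the 2-byte element stored offset bytes into the matrix. Byte-wise
 * stepping stays inside the whole 16-byte object. */
unsigned short short_at(const void *matrix, size_t offset)
{
    unsigned short value;
    memcpy(&value, (const unsigned char *)matrix + offset, sizeof value);
    return value;
}
```

With the matrix above declared as unsigned short, short_at(matrix, target_offset()) returns 0xba21.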
Second part
Now, after performing the following typecast:
unsigned char* cp = (unsigned char *)matrix;
cp will interpret the bytes in memory as follows:
Memory address | Byte
0x16f186f90    | 0x34
0x16f186f91    | 0x12
0x16f186f92    | 0x78
0x16f186f93    | 0x56
0x16f186f94    | 0x12
0x16f186f95    | 0x90
0x16f186f96    | 0xcd
0x16f186f97    | 0xab
0x16f186f98    | 0x01
0x16f186f99    | 0xdf
0x16f186f9a    | 0xdc
0x16f186f9b    | 0xfe
0x16f186f9c    | 0x21
0x16f186f9d    | 0xba
0x16f186f9e    | 0x76
0x16f186f9f    | 0x98
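In other words, the pointer arithmetic for cp is: base + offset*size_of_char, and the size of a char is 1 byte.
Hence, while cp still points at the base address, printf("*cp 0x%hx\n", *cp); prints 0x34; after moving cp forward 7 bytes (for example, with cp = cp + 7), the same statement prints 0xab.
Notice that *(cp + 7) is equivalent to cp[7]: the subscript operator first calculates the offset in memory and then dereferences.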
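The equivalence of cp[7] and *(cp + 7) can be sketched as follows; byte_via_subscript is a hypothetical helper that computes both forms, asserts that they match, and returns the byte.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte i positions after the start of the object. In C,
 * cp[i] is defined as *(cp + i), so both spellings read the same byte. */
unsigned char byte_via_subscript(const void *base, size_t i)
{
    const unsigned char *cp = base;
    assert(cp[i] == *(cp + i));   /* the two spellings are the same access */
    return cp[i];
}
```

With the matrix from this example on a little-endian machine, byte_via_subscript(matrix, 7) returns 0xab and byte_via_subscript(matrix, 0) returns 0x34.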