These should all be familiar to you. You can declare a scalar variable in one of three places: as a global variable, as a procedure parameter, or as a local variable. For example, look at the program below in p1.c:
(In this and all other lecture notes, you can copy the programs and the makefile into your own directory, and then compile them by using make. For example, to make the program p1, you say ``make p1''.)
    #include <stdio.h>

    int i;

    main(int argc, char **argv)
    {
      int j;

      j = argc;
      i = j;
      printf("%d\n", i);
    }

There are three scalar int variables here -- i, j, and argc. The variable i is a global variable, j is a local variable, and argc is a parameter. Scalars are pretty straightforward. You can pass them as parameters to procedures, and return them from procedures, without worrying about anything going awry.
    #include <stdio.h>

    char s1[15];

    main(int argc, char **argv)
    {
      char s2[4];
    }

Here s1 is a global array of 15 chars and s2 is a local array of 4 chars.
If an array has been statically declared, then you cannot assign a new value to it. For example, look at p2.c:
    #include <stdio.h>

    char s1[15];

    main(int argc, char **argv)
    {
      char s2[4];

      s2 = "Jim";
    }

The statement ``s2 = "Jim"'' is illegal in C, because s2 has been statically declared as an array. If you try to compile this program, gcc will give you an error:

    UNIX> gcc -o p2 p2.c
    p2.c: In function `main':
    p2.c:10: incompatible types in assignment
    UNIX>

This is a good rule to bear in mind -- if x is a statically declared array, then you can NEVER say ``x = something''. It will always give you an error. (Structs are different: C does let you assign one struct to another struct of the same type.)
However, you can say ``something = x''. We'll discuss this below.
You can view memory as one huge array of bytes (chars). With 4-byte pointers, this array has 4294967296 elements (on other machines it is some other huge number). Usually, we write the indices of this array in hexadecimal. In other words, the array goes from 0x0 to 0xffffffff.
A pointer is simply an index of this array. Whenever we allocate x bytes of memory, we are reserving x contiguous elements from the memory array. If we set a pointer to these bytes, then that pointer will be the index of the first allocated byte in memory.
For example, look at the following program (in p3.c):
    main()
    {
      int i;
      char j[14];
      int *ip;
      char *jp;

      ip = &i;
      jp = j;
      printf("ip = 0x%x. jp = 0x%x\n", ip, jp);
    }

This program allocates one integer (i), an array of 14 characters (j), and two pointers (ip and jp). It then sets the pointers so that they point to the memory allocated for i and j. Finally, it prints out the values of those pointers -- these are indices into the memory array. When we run it, we get:

    UNIX> p3
    ip = 0xefffe924. jp = 0xefffe910
    UNIX>

What this means is that when we view memory as an array, elements 0xefffe924, 0xefffe925, 0xefffe926, and 0xefffe927 are allocated for the local variable i, and elements 0xefffe910 through 0xefffe91d are allocated for the array j.
Note that I said ``jp = j'' and not ``jp = &j''. This is because when it is used in an expression, an array name is converted to a pointer to its first element. The main difference from a pointer variable is that you cannot assign a value to an array variable. Thus, you can say ``jp = j'', but you cannot say ``j = jp''. Moreover, ``&j'' is not the same thing as ``j'': it is legal in ANSI C, but its type is pointer-to-array-of-14-chars rather than char *, so ``jp = &j'' is a type mismatch even though the address is the same.
Pointers are a little like scalars -- they too can be declared as globals, locals or parameters, and can be assigned values, passed as parameters, and returned from procedures. On our machines all pointers are 4 bytes. Thus, in p3.c, there are 26 bytes of local variables allocated in the main() procedure -- 4 for i, 14 for j, 4 for ip, and 4 for jp.
    main()
    {
      char c;
      int i;
      float f;

      c = 'a';
      i = c;
      f = i;
      printf("c = %d (%c). i = %d (%c). f = %f\n", c, c, i, i, f);
    }

The statement ``i = c'' is a type cast, as is the statement ``f = i''.
Some type castings, like the one above, are very natural. The C compiler will do these for you without complaining. Most others, however, the C compiler will complain about, unless you specifically tell it that you are doing a type cast (this is a way of telling the compiler ``Yes, I know what I'm doing.'').
For example, think about the procedure call: malloc(n). It allocates and returns n bytes of memory to the programmer. Look at the program p5.c:
    main()
    {
      char *s;

      s = malloc(10);
      strcpy(s, "Jim");
      printf("s = %s\n", s);
    }

When you try to compile p5.c, you get a warning from the C compiler:

    UNIX> gcc -o p5 p5.c
    p5.c: In function `main':
    p5.c:5: warning: assignment makes pointer from integer without a cast
    UNIX>

What's going on? Well, all procedures in C are assumed to return integers unless they are specified otherwise. Thus, the statement ``s = malloc(10)'' is trying to set s, which is a pointer, to the return value of malloc, which is assumed to be an integer. The compiler actually does create p5, but it lets you know that you're doing something strange -- that is, assigning an integer to a pointer.
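What's the proper thing to do here? Well, the compiler needs to be told that malloc() returns a pointer. In ANSI C you do that by including stdlib.h, which declares malloc() as returning a void *. These notes use the older approach of declaring malloc() yourself as returning a char *, as in p6.c: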
    extern char *malloc();

    main()
    {
      char *s;

      s = malloc(10);
      strcpy(s, "Jim");
      printf("s = %s\n", s);
    }

This tells the compiler that you are using the procedure malloc() which returns a char *, and which is defined elsewhere. You'll note that p6.c compiles without any warnings, but that both p5 and p6 do the same thing when you run them:

    UNIX> p5
    s = Jim
    UNIX> p6
    s = Jim
    UNIX>

Most people do not write code like p6.c, though. Instead, they write it as in p7.c:
    main()
    {
      char *s;

      s = (char *) malloc(10);
      strcpy(s, "Jim");
      printf("s = %s\n", s);
    }

This says to the compiler ``Yes, I know malloc() is returning an int, but I want it to be treated like a char *''. You'll notice that p7.c compiles without warning and runs just like p5 and p6.
You should also notice that on our machines, both pointers and ints are 4 bytes. This has led many people to treat pointers and ints as interchangeable. For example, look at the code in p8.c:
    main()
    {
      char s[4];
      int i;
      char *s2;

      strcpy(s, "Jim");
      i = (int) s;
      printf("i = %ld (0x%lx)\n", i, i);
      printf("s = %ld (0x%lx)\n", s, s);
      i++;
      s2 = (char *) i;
      printf("s = 0x%lx. s2 = 0x%lx, i = 0x%lx, s[0] = %c, s[1] = %c, *s2 = %c\n",
             s, s2, i, s[0], s[1], *s2);
    }

Treating pointers and ints as interchangeable is a bad assumption, however, because on some machines, like the DEC alpha, ints are 4 bytes and pointers are 8. Thus, p8 does not run correctly on an alpha. This is because when we said ``i = (int) s'', we lost 4 bytes of the pointer s. Then when we said ``s2 = (char *) i'', the four extra bytes of s2 were set to zero, giving us different addresses for *s2 and s[1]. In fact, s2 becomes an illegal address, which results in a segmentation fault:
On our sparcs:
    UNIX> p8
    i = -268441312 (0xefffe920)
    s = -268441312 (0xefffe920)
    s = 0xefffe920. s2 = 0xefffe921, i = 0xefffe921, s[0] = J, s[1] = i, *s2 = i

On the alpha:

    UNIX> p8
    i = 536864720 (0x1fffe7d0)
    s = 4831832016 (0x11fffe7d0)
    Segmentation fault (core dumped)

If we use a long for i instead of an int, everything works fine on the alpha, since longs and pointers are both 8 bytes:
On the alpha:
    UNIX> p9
    i = 4831832016 (0x11fffe7d0)
    s = 4831832016 (0x11fffe7d0)
    s = 0x11fffe7d0. s2 = 0x11fffe7d1, i = 0x11fffe7d1, s[0] = J, s[1] = i, *s2 = i
    UNIX>

In this class we will assume that we are working on a machine where ints and pointers are both 4 bytes. However, in general you should make sure that your code will work when pointers and ints are different sizes.
The program pa.c generates a segmentation violation by trying to dereference NULL:
    #include <stdio.h>

    main()
    {
      char *s;

      s = NULL;
      printf("%d\n", s[0]);
    }
    main()
    {
      int *i;

      i = (int *) 1;
      printf("%d\n", *i);
    }
Moreover, the compiler always lays out structs so that the fields are aligned. Thus, in the following struct:
    struct {
      char b;
      int i;
    }

The whole struct will be 8 bytes -- 1 for b, 3 unused, and 4 for i. The 3 unused bytes are necessary so that i will be aligned. The compiler does not shuffle around the fields so that they pack into memory better. So, for example, if you have:

    struct {
      char b1;
      int i1;
      char b2;
      int i2;
    }

The struct will be 16 bytes: 1 for b1, 3 unused, 4 for i1, 1 for b2, 3 unused, and 4 for i2.
However, if you order them differently, you can get all of those fields into 12 bytes:
    struct {
      char b1;
      char b2;
      int i1;
      int i2;
    }

Now the struct will have 12 bytes: 1 for b1, 1 for b2, 2 unused, 4 for i1, and 4 for i2.
    main()
    {
      char c;
      int i;
      int j;

      i = 10000;
      c = i;
      j = c;
      printf("I: %d, J: %d, C: %d\n", i, j, c);
      printf("I: 0x%04x, J: 0x%04x, C: 0x%04x\n", i, j, c);
    }
The second bug is a typical one when you deal with math routines. If you say ``man log10,'' you'll see that it takes a double and returns a double:
    double log10(double x);

So, suppose you write the following program:

    main()
    {
      double x;

      x = log10(100);
      printf("%lf\n", x);
    }

    UNIX> pd
    -1035.000000

Why? This is because you didn't include math.h in your C program, and therefore the compiler assumed that you were passing log10 an integer, and that it returned an integer. And the compiler doesn't worry about converting ints to doubles when it has not seen a declaration. So you get the bug. You can fix this by including math.h, as in
    #include <math.h>

    main()
    {
      double x;

      x = log10(100);
      printf("%lf\n", x);
    }

    UNIX> pd
    2.000000
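Finally, look at what happens when the format specifiers in a printf() call do not match the types of its arguments: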
    main()
    {
      double x;
      int y;
      int z;

      x = 4000.0;
      y = 20;
      z = -17;
      printf("%d %d %d\n", x, y, z);
      printf("%f %d %d\n", x, y, z);
      printf("%lf %d %d\n", x, y, z);
      printf("%lf %lf %lf\n", x, y, z);
    }

    UNIX> pf
    1085227008 0 20
    4000.000000 20 -17
    4000.000000 20 -17
    4000.000000 0.000000 -3566985184068214263610043868633531298423160069569428047775
    20030203482592393258067630813913494098481449525958709939145371702732604277129148
    77019863534390180062158966919576508126277491063615751217181296481290794579216716
    39726032966871746925158515232719273883094320046823318866372976525388441556587623
    1667712.000000