CS302 Lecture notes -- General C and Unix Things

  • Jim Plank
  • Directory: /home/bvz/courses/302/notes/GeneralCStuff
  • Lecture notes: http://www.cs.utk.edu/~bvz/courses/302/notes/GeneralStuff/index.html
    This is a lecture that covers some basics of C and Unix that you should know.

    Compiling C programs

    In CS102, you learned how to compile a C program into an a.out file. For example, in this directory (/home/bvz/courses/302/notes/GeneralCStuff), there is the C program hw.c:

    #include <stdio.h>
    
    int main()
    {
      printf("Hello world!\n");
      return 0;
    }
    

    In CS102 you learned a simple way of compiling this program:

    UNIX> gcc hw.c
    
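    This creates a file called a.out, which you can execute (if your shell cannot find it, the current directory is not in your search path, and you should type ./a.out instead):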
    UNIX> ls a.out
    a.out*
    UNIX> a.out
    Hello world!
    UNIX>
    
    When you typed a.out, the operating system loaded the contents of the file a.out into memory, and then the CPU started to execute the instructions of a.out. Those instructions told the operating system to print ``Hello world!'' to the screen. Fortunately, the programming language C makes this pretty straightforward.

    Now, you don't have to always compile in this simple way. You can use the -o option to gcc to name the executable file something besides a.out. For example, here we'll name it hw:

    UNIX> gcc -o hw hw.c
    UNIX> ls hw
    hw*
    UNIX> hw
    Hello world!
    UNIX>
    

    Compiling Multiple C Programs into One Executable

    You don't have to have all of your C program in one file. If your program contains multiple procedures, you can spread them over multiple files. For example, look at the files callprintme.c and printme.c. The first one defines a main() procedure that calls the procedure printme() three times. In printme.c, the procedure printme() is defined to print ``Hi, I'm Jim''. Thus, if we compile these two files together, we should get an executable that prints ``Hi, I'm Jim'' three times.

    The simplest way to do this is to call gcc with both C files, which creates an a.out file:

    UNIX> gcc callprintme.c printme.c
    UNIX> a.out
    Hi, I'm Jim
    Hi, I'm Jim
    Hi, I'm Jim
    UNIX>
    
    However, doing this kind of compilation for bigger programs ends up taking a long time, because every C file is recompiled each time, even if only one of them has changed. To solve this problem, gcc can be used to turn C files into object files. These have the .o extension. Like the a.out files, the object files are unreadable by humans, but they are written in such a way that gcc can make executables from them very quickly. To make an object file from a C file, use the -c option to gcc:
    UNIX> gcc -c callprintme.c
    UNIX> ls call*
    callprintme.c   callprintme.o
    UNIX> gcc -c printme.c
    UNIX> ls printme*
    printme.c   printme.o
    UNIX>
    
    To make an executable from object files, simply call gcc on them just like you would .c files. If one of the object files defines the main() procedure, and all procedures that you use are defined in the object files, then it will create an executable for you. This is called linking:
    UNIX> gcc -o printme callprintme.o printme.o
    UNIX> printme
    Hi, I'm Jim
    Hi, I'm Jim
    Hi, I'm Jim
    UNIX>
    
    If you try to make an executable and you don't include enough object files, then gcc will give you an error saying that a procedure is ``undefined.'' For example, suppose I try to link callprintme.o without printme.o. This means that the printme() procedure is undefined, and gcc will flag this as an error:
    UNIX> gcc -o printme callprintme.o
    Undefined                       first referenced
     symbol                             in file
    printme                             callprintme.o
    ld: fatal: Symbol referencing errors. No output written to printme
    UNIX>
    
    Get used to compiling C files into object files and then linking the object files into executables. You'll be doing this a lot.

    Header Files / Function Prototypes

    Above I had a program in one C file call a procedure in another C file. However, I did it in a non-standard way. Typically, when a procedure in one file calls a procedure in another C file, a header file is shared by both files that gives them both information on how the procedure calls should be made. For example, look at retstring.c:

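    #include <stdio.h>
    #include <stdlib.h>
    #include "retstring.h"
    
    /* The caller owns the returned string and should free() it when done. */
    char *return_string(int i, double d)
    {
      char *string;
    
      string = (char *) malloc(80);
    
      sprintf(string, "%d %.2lf", i, d);
      return string;
    }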
    

    This defines a simple procedure that takes an integer and double, and converts them to a string. For example, if I call return_string(1, 2.333), it will return the string "1 2.33". (Note, saying .2 in the sprintf() statement means that d will be printed to 2 decimal places).

    Now, suppose I want to call return_string() from another file. How does the compiler know what arguments return_string() takes, and what its return value is? The way we typically tell the compiler these things is by using a header file. Look at retstring.h:

    extern char *return_string(int i, double d);
    
    This statement is called a function prototype. It looks like the first line of a procedure definition, except that it ends with a semicolon instead of the body of the procedure. Ignore the extern for now. The function prototype tells the compiler how we should be using return_string(). It says ``return_string() should take an integer as its first argument, and a double as its second argument. It will return a char *.'' You don't need to have the variable names -- for example, you could equally have said:
    extern char *return_string(int, double);
    
    Now, if we want to use return_string() in another C file, we include retstring.h. For example, here is a program called print12.c:

    #include <stdio.h>
    #include "retstring.h"
    
    int main()
    {
      char *s;
    
      s = return_string(1, 2.33);
      printf("%s\n", s);
      return 0;
    }
    

    To compile it, we make print12.o and retstring.o, and then link them together to make the executable print12:

    UNIX> gcc -c print12.c
    UNIX> gcc -c retstring.c
    UNIX> gcc -o print12 print12.o retstring.o
    UNIX> print12
    1 2.33
    UNIX> 
    
    Now, suppose we didn't include retstring.h. For example, look at print12a.c -- it is the same as print12.c but it does not include retstring.h. When we try to compile it, we get a compiler warning:
    UNIX> gcc -c print12a.c
    print12a.c: In function `main':
    print12a.c:7: warning: assignment makes pointer from integer without a cast
    UNIX>
    
    This warning comes because older versions of C assume that a procedure returns an int if you don't say otherwise, so the compiler thinks that return_string() returns an int and that you are trying to use it as a char *. (Newer C standards removed this rule, and recent compilers may report an error instead of a warning.) If your compiler only gives you warnings, you can still go ahead and make an executable, and it may work just fine:
    UNIX> gcc -o print12a print12a.o retstring.o
    UNIX> print12a
    1 2.33
    UNIX> 
    
    However, you should look at the warning and try to get rid of it. In other words, if you are calling functions from other files, you should put prototypes in a header file and include the header file.

    One of the nice things about header files is that they warn you that you might be doing something wrong. For example, look at print12b.c:

    #include <stdio.h>
    #include "retstring.h"
    
    int main()
    {
      char *s;
    
      s = return_string("Hello World", 2.33);
      printf("%s\n", s);
      return 0;
    }
    

    You'll note that it calls return_string() wrong -- the first argument should be an integer, not a string. When we compile this, the compiler warns us, because the use of return_string() does not match the prototype:

    UNIX> gcc -c print12b.c
    print12b.c: In function `main':
    print12b.c:8: warning: passing arg 1 of `return_string' makes integer from pointer without a cast
    UNIX>
    
    Since it's a warning, we can keep compiling, and it will make an executable, which we can run. It will give us weird output:
    UNIX> gcc -o print12b print12b.o retstring.o
    UNIX> print12b 
    67760 2.33
    UNIX> 
    
    Hmmmm. This is one of those things about C. Sometimes it lets you do things that it shouldn't. The lesson to learn is to look at those compiler warnings and eliminate them.

    Now, the extern declaration in retstring.h tells the compiler that the procedure is defined elsewhere. You don't need it, but I usually include it in header files.


    Standard Header Files

    C comes with a bunch of header files and libraries that are standard. You can find the header files in the directory /usr/include, but they are hard to read. Procedures like printf() and gets() are declared in stdio.h, which is why you include it in most C programs. Also, a library called libc.a (a library is a collection of object files, and its name ends with .a instead of .o) is always automatically linked with your executable. You can find it in the directory /usr/lib.

    Other header files and object files are not automatically included when you compile. For example, if you want to print the log (base 10) of the number 30000, you'll need to use the function log10(). Type ``man log10'' to see the definition of log10():
    UNIX> man log10

    exp(3M)               Mathematical Library                exp(3M)
    
    NAME
         exp, expm1, log, log1p, log10, pow - exponential, logarithm,
         power
    
    SYNOPSIS
         cc [ flag ... ] file ...  -lm [ library ... ]
    
         #include <math.h>
    
         double exp(double x);
    
         double expm1(double x);
    
         double log(double x);
    
         double log1p(double x);
    
         double log10(double x);
    
         double pow(double x, double y);
    
    MT-LEVEL
         MT-Safe
    
    DESCRIPTION
         exp(x) computes the exponential function e**x.
    
         expm1(x) computes (e**x)-1 accurately even for tiny x.
    
         log(x) computes the natural logarithm of x.
    
         log1p(x) computes log(1+x) accurately even for tiny x.
    
         log10(x) computes the base-10 logarithm of x.
    
    ...
    
    This man page tells you a lot. First, it says that when you want to use log10(), you must include the file math.h. You include it with angle brackets rather than double-quotes because it is in the directory /usr/include. Second, it says that it takes a double as its argument, and returns a double. Third, it says that when you link your executable, you should say -lm. This is shorthand for linking with the file /usr/lib/libm.a.

    Ok -- let's try it. First, let's call it incorrectly. Look at badlog1.c:

    #include <stdio.h>
    
    int main()
    {
      printf("%lf\n", log10(30000));
      return 0;
    }
    

    You'll note that it doesn't include math.h. When we compile it without -lm, the linker complains that it can't find any definition of log10():

    UNIX> gcc -o badlog1 badlog1.c
    Undefined                       first referenced
     symbol                             in file
    log10                               /var/tmp/cca007Fp1.o
    ld: fatal: Symbol referencing errors. No output written to badlog1
    UNIX> 
    
    We can fix that by including -lm on the command line. Now the linker will find log10 in /usr/lib/libm.a:
    UNIX> gcc -o badlog1 badlog1.c -lm
    UNIX> badlog1
    0.000000
    UNIX> 
    
    Ok -- it compiled fine, but that's not what we were expecting as output to the program. The log base 10 of 30000 should be some number between 4 and 5 (if you didn't know that, you need to brush up on your logarithms). What happened? Well, we didn't give the compiler a function prototype for log10() because we didn't include math.h. Therefore, the C compiler assumed that log10() returned an int, and it passed 30000 as an int instead of converting it to a double. And we got a weird result.

    Now, goodlog1.c is just like badlog1.c except it includes math.h. Now the compiler knows what to do -- it will convert 30000 to a double when passing it to log10(), and it knows that log10() returns a double, so the return value will be printed properly:

    UNIX> gcc -o goodlog1 goodlog1.c -lm
    UNIX> goodlog1
    4.477121
    UNIX> 
    

    Defining variables and types in header files

    Sometimes we would like to define a global variable to use in multiple C files. The way to do this is to define the variable in one file, and then declare it as an extern variable in another. For example, look at share1.c:

    #include <stdio.h>
    
    extern int GV;
    
    int main()
    {
      printf("%d\n", GV);
      return 0;
    }
    

    and share2.c:

    #include <stdio.h>
    
    extern int GV;
    
    int GV = 45;
    

    We can compile these together and the program will print 45:

    UNIX> gcc -c share1.c
    UNIX> gcc -c share2.c
    UNIX> gcc -o share share1.o share2.o
    UNIX> share
    45
    UNIX> 
    
    If we didn't define GV as a regular global variable in share2.c, then the linker would yell at us. For example, look at share3.c, which is just like share2.c, but it does not define GV:
    #include <stdio.h>
    
    extern int GV;
    
    If we try to compile share1.c and share3.c and link them together, then the linker will give us an error:
    UNIX> gcc -c share3.c
    UNIX> gcc -o share share1.o share3.o
    Undefined                       first referenced
     symbol                             in file
    GV                                  share1.o
    ld: fatal: Symbol referencing errors. No output written to share
    UNIX> 
    
    Moreover, if we define GV twice, then the linker will give us another error. Here, share4.c is identical to share2.c:
    UNIX> cp share2.c share4.c
    UNIX> gcc -c share4.c
    UNIX> gcc -o share share1.o share2.o share4.o
    ld: fatal: symbol `GV' is multiply defined:
            (file share2.o and file share4.o);
    ld: fatal: File processing errors. No output written to share
    UNIX> 
    
    Typically, we put the extern declarations into a header file, and then the programs simply need to include the header file. For example, twoshare1.c and twoshare2.c share the variables GV1 and GV2 through the header file twoshare.h:
    UNIX> gcc -c twoshare1.c
    UNIX> gcc -c twoshare2.c
    UNIX> gcc -o twoshare twoshare1.o twoshare2.o
    UNIX> twoshare
    45 99
    UNIX> 
    
    You can also define types in header files, and then multiple C files can use the same type. For example, typeshare1.c and typeshare2.c share the definition of the type Person through the header file typeshare.h:
    UNIX> gcc -c typeshare1.c
    UNIX> gcc -c typeshare2.c
    UNIX> gcc -o typeshare typeshare1.o typeshare2.o
    UNIX> typeshare
    Jim Plank
    UNIX> 
    

    Static declarations

    If you want to define a global variable or procedure that code in other C files may not use, then you declare it as static. For example, look at static_ex1.c and static_ex2.c. They share a header file static_ex.h which declares a procedure print_int(). You'll note that both static_ex1.c and static_ex2.c define a static procedure called get_gv(), and the two versions do different things. However, since they are static, there is no confusion. Each file simply uses its own version of get_gv().
    UNIX> gcc -c static_ex1.c
    UNIX> gcc -c static_ex2.c
    UNIX> gcc -o static_ex static_ex1.o static_ex2.o
    UNIX> static_ex
    50 100
    UNIX> 
    

    Where does the compiler find header files?

    Header files that are included with angle brackets (like stdio.h) are found in /usr/include. Header files that are included with double-quotes are found in the current directory. If you specify a directory with -I, then the compiler will also look for those header files in the given directory. Thus, when you say:
    UNIX> gcc -I/blugreen/homes/plank/cs140/include -c maxmin.c
    
    then it will look in /blugreen/homes/plank/cs140/include to find header files like fields.h.

    Make

    Make is a wonderful command to help you compile things. It assumes that you have a file called makefile or Makefile in the current directory. The makefile needs to be in a specific format, which many people find confusing. Basically, if a file can be created automatically, then you can specify in the makefile how to create it, and then make will automatically create it for you. Better yet, you can specify that one file is made from others, and then if you don't change the other files between make calls, then make won't remake the file.

    Look at the makefile in this directory. This is a very simplistic makefile, but it should help you out. Look at the first few lines:

    hw: hw.o
            gcc -o hw hw.o
    
    hw.o: hw.c
            gcc -c hw.c
    
    The first line tells make that the executable hw is made from the object file hw.o. The second line says that you make hw with the command
            gcc -o hw hw.o
    
    This line must start with a tab character, which is one of those confusing things about make. You can specify as many commands as you want as long as they all start with tabs.

    Now, the other two lines say how hw.o is made from hw.c. If you want to make hw, you type:

    UNIX> make hw
    gcc -c hw.c
    gcc -o hw hw.o
    UNIX> hw
    Hello world!
    UNIX> 
    
    You see that to make hw, it first makes hw.o and then it makes hw. If you try it again:
    UNIX> make hw
    `hw' is up to date.
    UNIX> 
    
    It tells you that nothing has changed, so we didn't need to recompile. If we modify hw.c or delete hw.o, it will remake it:
    UNIX> rm hw.o
    UNIX> make hw
    gcc -c hw.c
    gcc -o hw hw.o
    UNIX> 
    
    Better yet, if we only delete hw, then it will remake it from hw.o, which does not need to be remade:
    UNIX> rm hw
    UNIX> make hw
    gcc -o hw hw.o
    UNIX> 
    
    You'll see that programs like printme, which take more than one C file to create, can be handled easily with make. Here are the relevant lines of the makefile:
    printme: callprintme.o printme.o
            gcc -o printme callprintme.o printme.o
    
    printme.o: printme.c
            gcc -c printme.c
    
    callprintme.o: callprintme.c
            gcc -c callprintme.c
    
    And that make does the right thing:
    UNIX> make printme
    gcc -c callprintme.c
    gcc -c printme.c
    gcc -o printme callprintme.o printme.o
    UNIX> 
    
    If I change printme.c or delete printme.o, you'll note that make figures out that it does not need to recompile callprintme.c:
    UNIX> rm printme.o
    UNIX> make printme
    gcc -c printme.c
    gcc -o printme callprintme.o printme.o
    UNIX> 
    
    If a program file uses a header file, you can specify that the object file that it compiles into depends on the header file, so that if the header file changes, you recompile that program file. For example, look at the following lines of the makefile:
    print12: print12.o retstring.o
            gcc -o print12 print12.o retstring.o
    
    print12.o: print12.c retstring.h
            gcc -c print12.c
    
    retstring.o: retstring.c retstring.h
            gcc -c retstring.c
    
    So, now, suppose I make print12:
    UNIX> make print12
    gcc -c print12.c
    gcc -c retstring.c
    gcc -o print12 print12.o retstring.o
    UNIX> 
    
    And then I modify retstring.h (below I do this by editing it with vi). Since I specified that print12.o and retstring.o both depend on retstring.h, they will be recompiled due to the changing of retstring.h:
    UNIX> vi retstring.h
    ...
    UNIX> make print12
    gcc -c print12.c
    gcc -c retstring.c
    gcc -o print12 print12.o retstring.o
    UNIX> 
    
    Note that if I need to do something special in compilation, I can do it in the makefile. For example, the compilation of goodlog1 includes the math library:
    goodlog1: goodlog1.c
            gcc -o goodlog1 goodlog1.c -lm
    
    Most of the other definitions in the makefile are straightforward. The last two require special attention:
    all: hw printme print12 print12a print12b goodlog1 \
         share typeshare twoshare
     
    clean:
            rm -f hw printme print12 print12a print12b goodlog1 \
                share typeshare twoshare *.o a.out core
    
    First, the backslash simply continues one line to the next.

    The all specification says that if you type make all it will make all of those executables. The clean specification says that if you type make clean, it will remove all of the executables, plus all .o files, plus the files a.out and core. It is an excellent idea to have all and clean specifications in all of your makefiles.

    UNIX> make clean
    rm -f hw printme print12 print12a print12b goodlog1 \
                share typeshare twoshare *.o a.out core
    UNIX> make all
    gcc -c hw.c
    gcc -o hw hw.o
    gcc -c callprintme.c
    gcc -c printme.c
    gcc -o printme callprintme.o printme.o
    gcc -c print12.c
    gcc -c retstring.c
    gcc -o print12 print12.o retstring.o
    gcc -c print12a.c
    print12a.c: In function `main':
    print12a.c:7: warning: assignment makes pointer from integer without a cast
    gcc -o print12a print12a.o retstring.o
    gcc -c print12b.c
    print12b.c: In function `main':
    print12b.c:8: warning: passing arg 1 of `return_string' makes integer from pointer without a cast
    gcc -o print12b print12b.o retstring.o
    gcc -o goodlog1 goodlog1.c -lm
    gcc -c share1.c
    gcc -c share2.c
    gcc -o share share1.o share2.o
    gcc -c typeshare1.c
    gcc -c typeshare2.c
    gcc -o typeshare typeshare1.o typeshare2.o
    gcc -c twoshare1.c
    gcc -c twoshare2.c
    gcc -o twoshare twoshare1.o twoshare2.o
    UNIX> rm hw
    UNIX> make all
    gcc -o hw hw.o
    UNIX> 
    
    If you type make with no arguments, then it will make the first thing specified. In the given makefile, that will be hw. It is a good idea to have your all specification be the first one in your makefile, so that you can make everything by typing make.

    Finally, there are many tricky things that you can do with make, such as variable declarations and default making. My makefiles do this. Read the lecture notes on make from the Scripts and Utilities class for more information.


    File permissions

    In Unix, you can specify who can see your files. For example, if I do a ``long listing'' of the files in this directory, the first column of information tells me who can see my files:
    UNIX> ls -l
    total 57
    -rw-r--r--   1 plank          64 Jan 19 09:16 badlog1.c
    -rw-r--r--   1 plank          70 Jan 19 08:31 callprintme.c
    -rw-r--r--   1 plank          82 Jan 19 09:20 goodlog1.c
    -rwxr-xr-x   1 plank        5108 Jan 19 10:24 hw
    -rw-r--r--   1 plank          59 Jan 19 08:31 hw.c
    -rw-r--r--   1 plank         712 Jan 19 10:23 hw.o
    -rw-r--r--   1 plank       26570 Jan 19 10:23 index.html
    -rw-r--r--   1 plank        1647 Jan 19 10:19 makefile
    -rw-r--r--   1 plank         118 Jan 19 08:48 print12.c
    -rw-r--r--   1 plank          95 Jan 19 08:52 print12a.c
    -rw-r--r--   1 plank         130 Jan 19 09:01 print12b.c
    -rw-r--r--   1 plank          62 Jan 19 08:31 printme.c
    -rw-r--r--   1 plank         189 Jan 19 08:37 retstring.c
    -rw-r--r--   1 plank          45 Jan 19 10:11 retstring.h
    -rw-r--r--   1 plank          69 Jan 19 09:28 share1.c
    -rw-r--r--   1 plank          49 Jan 19 09:28 share2.c
    -rw-r--r--   1 plank          17 Jan 19 09:31 share3.c
    -rw-r--r--   1 plank          49 Jan 19 09:33 share4.c
    -rw-r--r--   1 plank          28 Jan 19 09:48 static_ex.h
    -rw-r--r--   1 plank         115 Jan 19 09:48 static_ex1.c
    -rw-r--r--   1 plank         142 Jan 19 09:48 static_ex2.c
    -rw-r--r--   1 plank          32 Jan 19 09:36 twoshare.h
    -rw-r--r--   1 plank          84 Jan 19 09:35 twoshare1.c
    -rw-r--r--   1 plank          71 Jan 19 09:37 twoshare2.c
    -rw-r--r--   1 plank         102 Jan 19 09:39 typeshare.h
    -rw-r--r--   1 plank         134 Jan 19 09:40 typeshare1.c
    -rw-r--r--   1 plank         124 Jan 19 09:40 typeshare2.c
    UNIX>
    
    The first four characters in the listing tell me whether I can read/write/execute my files. The very first character is the file type (- for a regular file), and the next three are my own permissions. For example, the -rwx in hw tells me that I can read, write and execute hw. Similarly, I can read and write callprintme.c, but not execute it. This is because it is not in the proper format for the operating system to load into memory (in order to get it into the proper format, I have to compile it into an executable).

    The last three characters in that first column tell me what others can do with this file. For example, the r-x in hw tells me that others can read and execute, but not write, the hw file. Similarly, others can read, but neither write nor execute, callprintme.c. (The middle three characters do the same for members of the file's group.)

    You can change the protections on your files with the chmod command. Learn how to use it (ask your TAs if you don't know). You should not let anyone read your lab files. If they can, then they can plagiarize, and that has happened many times in previous classes. If someone plagiarizes from you, you will get a zero because you could have prevented it. You have been warned.