"C Stuff 1:" Getting Started with C, Scalar Types and Aggregate Types


As a caveat, this lecture and the next used to be bundled into one. I have unbundled them to reflect more closely what I'm doing in lecture.

Moving from C++ to C

This class is taught in C, rather than C++. The reasoning is as follows: Because C hides so much less from you than C++, you have a much easier time figuring out what's going on when you run one or more programs. This will be a little painful, because you lose so many of the wonderful things about C++ on which you have grown to rely, like cin, strings, objects with methods, and the standard template library. Sorry.

These lecture notes detail the parts of C++ that you lose when you migrate to C, and how you replace them.

You have to use gcc to compile programs in this class. You cannot use g++. Don't give the TAs C++ code and say you didn't know. You know.

Time to learn C.


Header files

As with C++, you include standard header files with #include. You include the file name in less-than/greater-than signs, and you include the .h extension. Instead of starting your programs with:

#include <iostream>
using namespace std;

you start them with:

#include <stdio.h>
#include <stdlib.h>

I never liked that "using namespace std" stuff anyway.


Comments

Comments in C are delimited by "/*" and "*/". The former starts the comment, which can span multiple lines, and the latter ends the comment. (C++-style "//" comments were added to the C standard in C99, so you can use them, but I don't -- you never know when you're going to be running on that 1979 VM....)

Bye-bye, cin and cout

Frankly, this isn't too painful, and will be less so when you learn the fields library. I'm assuming that you already know printf() from previous classes. That handles output. For input, we'll use three procedures that are defined in stdio.h: scanf(), fscanf() and fgets(). However, we'll cover them in a later lecture. For now, we'll be writing some non-interactive code.
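Since printf() takes over from cout for output, here is a small sketch of the conversion specifiers you'll use most: %s for a string, %d for an int, and %.2f for a double printed with two digits after the decimal point. It uses snprintf() (C99), which formats into a buffer instead of printing, only so that the result can be checked; format_record() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical helper: formats a name, an int and a double into buf.
   snprintf() never writes more than n bytes (including the '\0'), and it
   returns the length that the complete output would have had. */
int format_record(char *buf, size_t n, const char *name, int age, double gpa)
{
  assert(buf != NULL && n > 0);
  return snprintf(buf, n, "%s %d %.2f", name, age, gpa);
}
```

In a real program you would usually just call printf("%s %d %.2f\n", name, age, gpa) directly.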

Types in C

In C, there are three kinds of types that variables can have -- scalars, aggregates, and pointers. Half of the game in getting things right in C is keeping yourself from being confused about types. This lecture tries to elaborate on this a little.

Scalar Types

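There are 7 scalar types in C that we'll use. The sizes below are the typical ones; of these, the C standard only fixes the size of a char, which is 1 byte by definition: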
  • char -- 1 byte
  • short -- 2 bytes
  • int -- 4 bytes
  • long -- 4 or 8 bytes, depending on the system and compiler
  • float -- 4 bytes
  • double -- 8 bytes
  • (pointer -- 4 or 8 bytes, depending on the system and compiler)

    These should all be familiar to you (ok, maybe not short, but the rest should). If you want to verify or use the size of a type in C, you use the sizeof operator (it is built into the language; it is not a macro or a procedure). For example, sizeof(long) evaluates to either 4 or 8, depending on how big a long is on your system.
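    Because sizeof is evaluated by the compiler, the sizes can be collected into a table. The sketch below is one way to do it; the SIZES table, scalar_size() and print_sizes() are hypothetical names. It prints with %zu, the C99 printf() conversion for size_t (the type that sizeof produces), instead of copying the value into an int as p1.c does below.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical table pairing each scalar type's name with its size. */
struct type_size {
  const char *name;
  size_t size;
};

static const struct type_size SIZES[] = {
  { "char",   sizeof(char) },
  { "short",  sizeof(short) },
  { "int",    sizeof(int) },
  { "long",   sizeof(long) },
  { "float",  sizeof(float) },
  { "double", sizeof(double) },
  { "int *",  sizeof(int *) },
};

#define NSIZES (sizeof(SIZES) / sizeof(SIZES[0]))

/* Returns the size of the named type, or 0 if the name is not in the table. */
size_t scalar_size(const char *name)
{
  size_t i;

  assert(name != NULL);
  for (i = 0; i < NSIZES; i++) {
    if (strcmp(SIZES[i].name, name) == 0) return SIZES[i].size;
  }
  return 0;
}

/* Prints every entry; %zu is the printf() conversion for a size_t. */
void print_sizes(void)
{
  size_t i;

  for (i = 0; i < NSIZES; i++) printf("sizeof(%s): %zu\n", SIZES[i].name, SIZES[i].size);
}
```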

    You can declare a scalar variable in one of three places: as a global variable, as a procedure parameter, or as a local variable. For example, look at the program below in p1.c:

    (In this and all other lecture notes, you can copy the programs and the makefile into your own directory, and then compile them by using make. E.g. to make the program p1, you say ``make p1'').

    #include <stdio.h>
    #include <stdlib.h>
    
    int i;
    
    int main(int argc, char **argv)
    {
      int j;
    
      j = argc;
      i = j;
      printf("Argc:          %d\n", i);
    
      j = sizeof(long);
      printf("Sizeof(long):  %d\n", j);
    
      j = sizeof(int *);
      printf("Sizeof(int *): %d\n", j);
    
      exit(0);
    }
    

    There are three scalar int variables here -- i, j, and argc. Variable i is global, j is local, and argc is a parameter. Scalars are pretty straightforward: you can pass them as parameters to procedures, and return them from procedures, without worrying about anything going awry. The program prints argc (which it has copied to the variables j and i), and then the sizes of longs and pointers. Here are two runs on my Mac (in 2018). As you can see, longs and pointers are 8 bytes:
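    To see what "without worrying about anything going awry" means, here is a small sketch (counter and add_five() are hypothetical names). A scalar parameter is a copy of the caller's value, so changing it inside the procedure leaves the caller's variable alone, and a scalar return value is likewise copied back. A global such as i in p1.c, on the other hand, is one variable shared by every procedure.

```c
#include <assert.h>

int counter = 0;          /* A global: every procedure sees the same variable. */

/* x is a local copy of the caller's argument. */
int add_five(int x)
{
  x += 5;                 /* Changes only the copy. */
  counter++;              /* Changes the one shared global. */
  return x;               /* The result is copied back to the caller. */
}
```

Calling k = add_five(j) with j equal to 10 leaves j at 10, sets k to 15, and bumps counter by one.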

    UNIX> ./p1
    Argc:          1
    Sizeof(long):  8
    Sizeof(int *): 8
    UNIX> ./p1 using many arguments
    Argc:          4
    Sizeof(long):  8
    Sizeof(int *): 8
    UNIX> 
    
    Some machines allow you to compile in 32-bit mode, which forces pointers and longs to be four bytes. On my Macintosh, that's the "-m32" option to gcc:
    UNIX> gcc -m32 -o p1-32 p1.c
    UNIX> ./p1-32
    Argc:          1
    Sizeof(long):  4
    Sizeof(int *): 4
    UNIX> 
    
    Some machines don't have 8-byte longs or pointers. One example is the Raspberry Pi running a 32-bit operating system. When I pulled the repo and compiled on my Pi, I got the following:
    pi@raspberrypi:~/CS360/cs360-lecture-notes/Cstuff-1$ ./p1
    Argc:          1
    Sizeof(long):  4
    Sizeof(int *): 4
    pi@raspberrypi:~/CS360/cs360-lecture-notes/Cstuff-1$ 
    
    The Pi's compiler doesn't have the "-m32" option. Such is life.
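    Because the size of a long or a pointer depends on how the program was compiled, code that cares can simply ask. Here is a minimal sketch that relies only on sizeof and the standard CHAR_BIT macro from limits.h (the number of bits in a char); the function names are hypothetical. Compiled normally on the 2018 Mac it should report 64-bit pointers, and compiled with -m32 or on the 32-bit Pi, 32-bit pointers.

```c
#include <limits.h>
#include <stdio.h>
#include <assert.h>

/* Number of bits in a pointer: 64 on the Mac runs above, 32 with -m32 or on the Pi. */
int pointer_bits(void)
{
  return (int) (sizeof(void *) * CHAR_BIT);
}

/* Number of bits in a long, which follows the same pattern on those machines. */
int long_bits(void)
{
  return (int) (sizeof(long) * CHAR_BIT);
}

void report_mode(void)
{
  printf("Compiled with %d-bit pointers and %d-bit longs.\n",
         pointer_bits(), long_bits());
}
```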

    Aggregate Types

    Arrays and structs are aggregate types in C. They are more complex than scalars. You can statically declare an array as a global or local variable -- I do both below in p2.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    char s1[15];
    
    int main(int argc, char **argv)
    {
      char s2[4];
    
    ...
    

    s1 is a global array of 15 chars, and s2 is a local array of 4 chars.
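    Since s1 and s2 are declared as arrays, sizeof reports the size of the whole array, which gives a common idiom for counting elements. The macro name ARRAY_LEN and the grades array below are hypothetical. The idiom only works on a variable that really is an array; it does not work on an array parameter, because a parameter is a pointer.

```c
#include <stdio.h>
#include <assert.h>

/* Number of elements in a true array: total bytes / bytes per element. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

char s1[15];                    /* Global array of 15 chars, as in p2.c. */
double grades[10];              /* Hypothetical: 10 doubles. */

void show_lengths(void)
{
  char s2[4];                   /* Local array of 4 chars, as in p2.c. */

  printf("s1: %zu bytes, %zu elements\n", sizeof(s1), ARRAY_LEN(s1));
  printf("s2: %zu bytes, %zu elements\n", sizeof(s2), ARRAY_LEN(s2));
  printf("grades: %zu bytes, %zu elements\n", sizeof(grades), ARRAY_LEN(grades));
}
```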

    If an array has been statically declared, then you cannot assign anything to it, whether another array or a string. For example, look at the rest of p2.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    char s1[15];
    
    int main(int argc, char **argv)
    {
      char s2[4];
      
      s2 = "Jim";
      exit(0);
    }
    

    The statement ``s2 = "Jim"'' is illegal in C, because s2 has been statically declared. If you try to compile this program, gcc will give you an error:

    UNIX> gcc -o p2 p2.c
    p2.c: In function `main':
    p2.c:10: incompatible types in assignment
    UNIX>
    
    This is a good rule to bear in mind -- if x is an array, then you should NEVER say ``x = something''. It will usually give you an error. However, if it doesn't (because they change the compiler yet again), you are writing bad C code.
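    If what you actually want from ``s2 = "Jim"'' is for s2 to hold the characters J, i, m and the terminating '\0', you copy them into the array instead of assigning to it. Below is a minimal sketch using snprintf() (C99) from stdio.h, which never writes more than the size it is given and always terminates the string; set_string() is a hypothetical name. strcpy() from string.h would also work for "Jim", but it does no bounds checking, so a longer source would overrun s2.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Copies src into the array dst, which holds size bytes.  At most size-1
   characters are copied, and dst is always '\0'-terminated.  Returns 1 if
   all of src fit, 0 if it was truncated. */
int set_string(char *dst, size_t size, const char *src)
{
  int len;

  assert(size > 0);
  len = snprintf(dst, size, "%s", src);
  return len >= 0 && (size_t) len < size;
}
```

In p2.c, the illegal line could become set_string(s2, sizeof(s2), "Jim"); "Jim" needs exactly 4 bytes, so it fits.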

    However, you can say ``something = x'' when something is a pointer. We'll discuss this later in the lecture.

    When you pass an array as a parameter, a pointer to the array's first element is passed, not a copy of the array. This is unlike passing a vector by value in C++, which copies the vector. Here's an example (in p2a.c):

    #include <stdio.h>
    #include <stdlib.h>
    
    /* This sets any lower-case letter in a to upper case. */
    
    void change_case(char a[20])
    {
      int i;
    
      for (i = 0; a[i] != '\0'; i++) {
        if (a[i] >= 'a' && a[i] <= 'z') a[i] += ('A' - 'a');
      }
    }
    
    /* This sets a string to 19 characters and then calls change_case(). */
    
    int main()
    {
      int i;
      char s[20];
      
      /* Set s to "abcdefghijklmnopqrs". */
    
      for (i = 0; i < 19; i++) s[i] = 'a' + i;
      s[19] = '\0';
    
      printf("First, S is %s.\n", s);
      change_case(s);
      printf("Now, S is   %s.\n", s);
    
      return 0;
    }
    

    You'll note that even though you declare a to be an array of 20 chars, it is simply the pointer that gets passed to the procedure. For that reason, change_case() operates on the array, and not on a copy:

    UNIX> ./p2a
    First, S is abcdefghijklmnopqrs.
    Now, S is   ABCDEFGHIJKLMNOPQRS.
    UNIX>
    
    In fact, the compiler ignores the "20" inside the declaration of change_case(), because an array parameter is treated as a pointer. The following declarations are all equivalent:
    void change_case(char a[20])
    void change_case(char a[19])
    void change_case(char a[5000])
    void change_case(char a[])
    void change_case(char *a)
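    One way to convince yourself that the "20" is ignored is to ask sizeof inside the procedure: it reports the size of a char pointer, not 20. The consequence is that a procedure receiving an array cannot find out its length, so the usual fix is to pass the length as a separate parameter. In the sketch below, param_size() and change_case_n() are hypothetical names; gcc is likely to warn about the sizeof in param_size(), and that warning is exactly the point.

```c
#include <stddef.h>
#include <assert.h>

/* The [20] is ignored: a is really a char *, so this returns sizeof(char *). */
size_t param_size(char a[20])
{
  return sizeof(a);
}

/* Upper-cases at most n characters of a, stopping early at a '\0'.
   Because the array's size cannot be recovered from a, the caller passes it. */
void change_case_n(char *a, size_t n)
{
  size_t i;

  for (i = 0; i < n && a[i] != '\0'; i++) {
    if (a[i] >= 'a' && a[i] <= 'z') a[i] += ('A' - 'a');
  }
}
```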
    

    Structs

    The second way to aggregate data is with a struct. A struct looks a little like a C++ class, with some notable omissions (for example, it has no methods). Suppose we want to aggregate an int and a double. We can do that as in id1.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      struct intdouble id1;
    
      id1.i = 5;
      id1.d = 3.14;
    
      printf("%d %.2lf\n", id1.i, id1.d);
      exit(0);
    }
    

    UNIX> ./id1
    5 3.14
    UNIX> 
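    Setting the fields one at a time, as id1.c does, is not the only option: a struct can also be given its values when it is declared, using an initializer list in field order. A minimal sketch follows; the form that names .i and .d explicitly (a designated initializer) requires C99 or later, and the variable names id_a, id_b, id_c and the procedure print_ids() are hypothetical.

```c
#include <stdio.h>
#include <assert.h>

struct intdouble {
  int i;
  double d;
};

/* Fields listed in declaration order: i gets 5, d gets 3.14. */
struct intdouble id_a = { 5, 3.14 };

/* C99 designated initializer: fields named explicitly, in any order. */
struct intdouble id_b = { .d = 2.5, .i = 7 };

/* Fields that are not mentioned are set to zero. */
struct intdouble id_c = { 9 };

void print_ids(void)
{
  printf("%d %.2lf\n", id_a.i, id_a.d);
  printf("%d %.2lf\n", id_b.i, id_b.d);
  printf("%d %.2lf\n", id_c.i, id_c.d);
}
```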
    
    You can use a typedef to make it a little more readable (id2.c):

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct intdouble {
      int i;
      double d;
    } ID;
    
    int main()
    {
      ID id1;
    
      id1.i = 5;
      id1.d = 3.14;
    
      printf("1: %d %.2lf\n", id1.i, id1.d);
      exit(0);
    }
    

    You may have some confusion with structs, because they exist in C++ with different semantics. Let's take a look at the following code in C++, which declares two intdoubles, sets one's variables, and then copies one to the other (id3.cpp):

    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      intdouble id1, id2;
    
      id1.i = 5;
      id1.d = 3.14;
    
      id2 = id1;
      id2.i += 5;
      id2.d += 5;
    
      printf("1: %d %.2lf\n", id1.i, id1.d);
      printf("2: %d %.2lf\n", id2.i, id2.d);
    }
    

    Straightforward when it runs:

    UNIX> ./id3
    1: 5 3.14
    2: 10 8.14
    UNIX> 
    
    Let's change this to C. If we simply fix the headers (id4.c), the program will not compile. That is because C++ creates "intdouble" as a type name, and C does not: in C, the declaration only creates the type "struct intdouble". When we try to compile, it fails:
    UNIX> gcc -o id4 id4.c
    id4.c: In function 'main':
    id4.c:11: error: 'intdouble' undeclared (first use in this function)
    id4.c:11: error: (Each undeclared identifier is reported only once
    id4.c:11: error: for each function it appears in.)
    id4.c:11: error: expected ';' before 'id1'
    id4.c:13: error: 'id1' undeclared (first use in this function)
    id4.c:16: error: 'id2' undeclared (first use in this function)
    UNIX> 
    
    Now, if you put "struct" in front of "intdouble," that will fix the problem. It is in id5.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      struct intdouble id1, id2;
    
      id1.i = 5;
      id1.d = 3.14;
    
      id2 = id1;  /* THIS IS THE OFFENDING LINE */
      id2.i += 5;
      id2.d += 5;
    
      printf("1: %d %.2lf\n", id1.i, id1.d);
      printf("2: %d %.2lf\n", id2.i, id2.d);
      exit(0);
    }
    

    This runs identically to the C++ version. I can tell you that I disapprove of this code. Why? Because the statement ``id2 = id1'' offends me. It is the only part of C where you can copy an arbitrarily large number of bytes with an assignment statement. It is a weakness of the language. For example, take a look at id5a.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct {
      int a[1000];
    } SID;
    
    int main()
    {
      SID s1, s2;
      int i;
    
      for (i = 0; i < 1000; i++) s1.a[i] = i;
      s2 = s1;
    
      for (i = 0; i < 1000; i++) printf("%4d %4d\n", s1.a[i], s2.a[i]);
     
      exit(0);
    }
    

    The ``s2 = s1'' line copies 4000 bytes (1000 ints, assuming 4-byte ints). While I expect such garbage in C++, I am surprised that it's legal in C. Why does it allow you to copy the struct, but not to copy the array? Who knows -- anyway, I want you to be aware of it. You will never see me use that feature of the language because I don't approve. As a corollary, you can pass a struct as an argument to a procedure, and in C++ fashion, it makes a copy of the entire thing. You'll note that in id5b.c, I define a procedure a(), which changes the last element of the array, and when we run it, s1.a[999] is unchanged. Make a mental note of it.

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct {
      int a[1000];
    } SID;
    
    void a(SID s)    /* Although this procedure changes element 999 of s, */
    {                /* s is a copy of the calling parameter, so it is    */
      s.a[999] = -1; /* deleted at the end of the procedure.              */
    }                /* In other words, the procedure does nothing.       */
    
    
    int main()
    {
      SID s1, s2;
      int i;
    
      for (i = 0; i < 1000; i++) s1.a[i] = i;
      s2 = s1;
    
      a(s1);
      for (i = 0; i < 1000; i++) printf("%4d %4d\n", s1.a[i], s2.a[i]);
     
      exit(0);
    }
    

    UNIX> ./id5b | tail
     990  990
     991  991
     992  992
     993  993
     994  994
     995  995
     996  996
     997  997
     998  998
     999  999
    UNIX> 
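    For comparison, here is a sketch of two explicit alternatives to the hidden copies in id5a.c and id5b.c. memcpy() from string.h copies a struct with the byte count spelled out via sizeof, and passing the struct's address (only the pointer gets copied) lets a procedure change the caller's struct, which a() in id5b.c cannot do. The names sid_copy() and sid_set_last() are hypothetical, and the sketch uses ->, where s->a[999] is shorthand for (*s).a[999].

```c
#include <string.h>
#include <assert.h>

typedef struct {
  int a[1000];
} SID;

/* Copies *src into *dst; the number of bytes is visible as sizeof(SID). */
void sid_copy(SID *dst, const SID *src)
{
  assert(dst != NULL && src != NULL);
  memcpy(dst, src, sizeof(SID));
}

/* Receives the address of the caller's SID, so the change sticks.
   Only the pointer is copied, not the 1000 ints. */
void sid_set_last(SID *s)
{
  s->a[999] = -1;
}
```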
    

    A final note about C++ structs. They are basically classes (the only difference is that their members are public by default) -- you can put methods in them, and then implement the methods using the struct's variables, as in id6.cpp:

    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    struct intdouble {
      int i;
      double d;
      void Print();
    };
    
    void intdouble::Print()
    {
      printf("   %d %.2lf\n", i, d);
    }
    
    int main()
    {
      intdouble id1, id2;
    
      id1.i = 5;
      id1.d = 3.14;
    
      id2 = id1;
      id2.i += 5;
      id2.d += 5;
    
      id1.Print();
      id2.Print();
    }
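    C has no methods, so the usual C counterpart to id6.cpp's Print() is an ordinary procedure that takes a pointer to the struct, playing the role that the implicit this pointer plays in C++. The sketch below is one way to write it; intdouble_format() and intdouble_print() are hypothetical names, and the formatting is split out with snprintf() only so the result can be checked.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

struct intdouble {
  int i;
  double d;
};

/* Writes the same text that id6.cpp's Print() prints into buf (size bytes). */
int intdouble_format(const struct intdouble *id, char *buf, size_t size)
{
  assert(id != NULL);
  return snprintf(buf, size, "   %d %.2lf\n", id->i, id->d);
}

/* The C stand-in for id1.Print(): call it as intdouble_print(&id1). */
void intdouble_print(const struct intdouble *id)
{
  char buf[100];

  intdouble_format(id, buf, sizeof(buf));
  fputs(buf, stdout);
}
```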
    

    Whoever designed structs in C++ did the world a disservice, because you have the same construct in two fairly similar languages with completely different semantics. This means that tons of people are going to be confused going from C++ to C, or even from C to C++, when their struct semantics are off. It is for this reason that I don't teach structs as a C++ construct in either CS140 or CS302. Just use a class.

    (When your job interviewer asks you why you are using classes rather than structs in C++, you should respond as follows: "Structs in C++ are totally different constructs than they are in C. However, they have enough similarity that one can easily get confused when writing code in both languages. For that reason, I choose to use structs in C, but not in C++. In C++, I use a class. The compiler should be smart enough to make the class code as efficient as struct code.")