"C Stuff 1:" Getting Started with C, Scalar Types and Aggregate Types


As a caveat, this lecture and the next used to be bundled into one. I have unbundled them to reflect more closely what I'm doing in lecture.

Moving from C++ to C

This class is taught in C, rather than C++. The reasoning is as follows: Because C hides so much less from you than C++, you have a much easier time figuring out what's going on when you run one or more programs. This will be a little painful, because you lose so many of the wonderful things about C++ on which you have grown to rely, like cin, strings, objects with methods, and the standard template library. Sorry.

These lecture notes detail the parts of C++ that you lose when you migrate to C, and how you replace them.

You have to use gcc to compile programs in this class. You cannot use g++. Don't give the TAs C++ code and say you didn't know. You know.

Time to learn C.


Header files

As with C++, you include standard header files with #include. You include the file name in less-than/greater-than signs, and you include the .h extension. Instead of starting your programs with:

#include <iostream>
using namespace std;

you start them with:

#include <stdio.h>
#include <stdlib.h>

I never liked that "using namespace std" stuff anyway.


Comments

Comments in C are delimited by "/*" and "*/". The former starts the comment, which can span multiple lines, and the latter ends the comment. (C++-style "//" comments were added to the C standard in C99, so you can use them, but I don't -- you never know when you're going to be running on that 1979 VM....)

Bye-bye, cin and cout

Frankly, this isn't too painful, and will be less so when you learn the fields library. I'm assuming that you already know printf() from previous classes. That handles output. For input, we'll use three procedures that are defined in stdio.h: scanf(), fscanf() and fgets(). However, we'll cover them in a later lecture. For now, we'll be writing some non-interactive code.

Types in C

In C, there are three kinds of types that variables can have -- scalars, aggregates, and pointers. Half of the game in getting things right in C is keeping yourself from being confused about types.

Scalar Types

There are 7 scalar types in C (the sizes listed are typical of current machines; the C standard only guarantees minimums):
  • char -- 1 byte
  • short -- 2 bytes
  • int -- 4 bytes
  • long -- 4 or 8 bytes, depending on the system and compiler
  • float -- 4 bytes
  • double -- 8 bytes
  • (pointer -- 4 or 8 bytes, depending on the system and compiler)

    These should all be familiar to you (ok, maybe not a short, but the rest should). If you want to verify or use the size of a type in C, you use sizeof(). It looks like a procedure call, but it is an operator built into the language, not a macro or a procedure. For example, sizeof(long) will be either 4 or 8, depending on how big a long is on your system.

    You can declare a scalar variable in one of three places: As a global variable, as a procedure parameter, and as a local variable. For example, look at the program in src/p1.c (I've reproduced it below).

    (This is not a bad time to clone the lecture note repo, so that you can do all of these actions on your own. If you've done so, go ahead and type "make" to compile the lecture's programs:)

    UNIX> make
    gcc -o bin/id1 src/id1.c
    gcc -o bin/id2 src/id2.c
    c++ -o bin/id3 src/id3.cpp
    gcc -o bin/id5 src/id5.c
    gcc -o bin/id5a src/id5a.c
    gcc -o bin/id5b src/id5b.c
    c++ -o bin/id6 src/id6.cpp
    gcc -o bin/p1 src/p1.c
    gcc -o bin/p2a src/p2a.c
    UNIX> 
    
    Back to src/p1.c:

    /* A program to demonstrate three places that you
       can declare a scalar variable: as a global (i),
       as a local (j) or as a procedure parameter (argc). */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    int i;
    
    int main(int argc, char **argv)
    {
      int j;
    
      /* Copy argc to j to i and print i */
    
      j = argc;
      i = j;  
      printf("Argc:          %d\n", i);
    
      /* Print the size of a long. */
    
      j = sizeof(long);
      printf("Sizeof(long):  %d\n", j);
    
      /* Print the size of a pointer. */
    
      j = sizeof(int *);
      printf("Sizeof(int *): %d\n", j);
    
      return 0;
    }
    

    There are three scalar int variables here -- i, j, and argc. The variable i is a global, j is a local, and argc is a parameter. Scalars are pretty straightforward. You can pass them as parameters to procedures, and return them from procedures without worrying about anything going awry. The program prints argc (which it has copied to the variables j and i), and then the size of longs and pointers. Here are two runs on my Mac (in 2021). As you can see, longs and pointers are 8 bytes:

    UNIX> bin/p1
    Argc:          1
    Sizeof(long):  8
    Sizeof(int *): 8
    UNIX> bin/p1 using many arguments
    Argc:          4
    Sizeof(long):  8
    Sizeof(int *): 8
    UNIX> 
    
    Some machines allow you to compile in 32-bit mode, which forces pointers and longs to be four bytes. On my Macintosh, that's the "-m32" option to gcc:
    UNIX> gcc -m32 -o bin/p1-32 src/p1.c
    UNIX> bin/p1-32
    Argc:          1
    Sizeof(long):  4
    Sizeof(int *): 4
    UNIX> 
    
    Some machines don't have 8-byte longs or pointers. One example is the Raspberry Pi. When I pulled the repo and compiled on my Pi, I got the following:
    pi@raspberrypi:~/CS360/cs360-lecture-notes/Cstuff-1$ bin/p1
    Argc:          1
    Sizeof(long):  4
    Sizeof(int *): 4
    pi@raspberrypi:~/CS360/cs360-lecture-notes/Cstuff-1$ 
    
    The Pi's compiler doesn't have the "-m32" option. Such is life.

    Aggregate Types

    Arrays and structs are aggregate types in C. They are more complex than scalars. You can statically declare an array as a global or local variable -- I do both below in src/p2.c:

    #include <stdio.h>
    #include <stdlib.h>
    
    char s1[15];
    
    int main(int argc, char **argv)
    {
      char s2[4];
    
    ...
    

    S1 is a global array of 15 chars and s2 is a local array of 4 chars.

    If an array has been statically declared, then you cannot assign it to another array. Let's look at all of src/p2.c:

    /* This program statically declares two arrays of characters:
        - A global variable s1, with 15 characters.
        - A local variable s2, with 4 characters.
        - It then tries to set s2 to "Jim", which will fail, because
          you can't copy arrays in C like you can in C++. */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    char s1[15];
    
    int main(int argc, char **argv)
    {
      char s2[4];
      
      s2 = "Jim";         // This line will not compile.
      return 0;
    }
    

    The statement ``s2 = "Jim"'' is illegal in C, because s2 has been statically declared. If you try to compile this program, gcc will give you an error:

    UNIX> gcc -o bin/p2 src/p2.c
    src/p2.c:16:6: error: array type 'char [4]' is not assignable
      s2 = "Jim";         // This line will not compile.
      ~~ ^
    1 error generated.
    UNIX> 
    
    This is a good rule to bear in mind -- if x is an array, then you should NEVER say ``x = something''. It will usually give you an error. However, if it doesn't (because they change the compiler yet again), you are writing bad C code.

    However, you can always say ``something = x''. We'll discuss this later in the lecture.

    When you pass arrays as parameters, pointers to them are passed, and not the arrays themselves. This is unlike passing a vector by value in C++, which makes a copy of the vector. Here's an example (in src/p2a.c):

    #include <stdio.h>
    #include <stdlib.h>
    
    /* This sets all lower-case letters in a to upper case. */
    
    void change_case(char a[20])
    {
      int i;
    
      for (i = 0; a[i] != '\0'; i++) {
        if (a[i] >= 'a' && a[i] <= 'z') a[i] += ('A' - 'a');
      }
    }
    
    /* This initializes a 19-character string of lower-case letters, and then calls change_case(). */
    
    int main()
    {
      int i;
      char s[20];
      
      /* Set s to "abcdefghijklmnopqrs". */
    
      for (i = 0; i < 19; i++) s[i] = 'a' + i;
      s[19] = '\0';
    
      /* Print, call change_case() and print again. */
    
      printf("First, S is %s.\n", s);
      change_case(s);
      printf("Now, S is   %s.\n", s);
    
      return 0;
    }
    

    You'll note that even though you declare a to be an array of 20 chars, it is simply the pointer that gets passed to the procedure. For that reason, change_case() operates on the array, and not on a copy:

    UNIX> bin/p2a
    First, S is abcdefghijklmnopqrs.
    Now, S is   ABCDEFGHIJKLMNOPQRS.
    UNIX>
    
    In fact, you can't trust that the compiler cares about the fact you put the "20" inside the declaration of change_case. The following are in fact all equivalent:
    void change_case(char a[20])
    void change_case(char a[19])
    void change_case(char a[5000])
    void change_case(char a[])
    void change_case(char *a)
    

    Structs

    The second way to aggregate data is with a struct. A struct looks a little like a C++ class with some notable omissions. Suppose we want to aggregate an int and a double. We can do that as in src/id1.c:

    /* A very simple program to show a struct 
       that aggregates an integer and a double. */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      struct intdouble id1;
    
      id1.i = 5;
      id1.d = 3.14;
    
      printf("%d %.2lf\n", id1.i, id1.d);
      return 0;
    }
    

    UNIX> bin/id1
    5 3.14
    UNIX> 
    
    You can use a typedef to make it a little more readable (src/id2.c):

    /* This program is identical to src/id1.c,
       except it uses a typedef so that you can
       assign a type to the struct. */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct intdouble {
      int i;
      double d;
    } ID;
    
    int main()
    {
      ID id1;
    
      id1.i = 5;
      id1.d = 3.14;
    
      printf("%d %.2lf\n", id1.i, id1.d);
      return 0;
    }
    

    You may have some confusion with structs, because they exist in C++ with different semantics. Let's take a look at the following code in C++, which declares two intdouble's, sets one's variables, and then copies one to the other (src/id3.cpp):

    /* This is a C++ program, which shows how you can copy one struct to another. */
    
    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      intdouble id1, id2;
    
      id1.i = 5;          /* Set id1 to 5 and 3.14 as before. */
      id1.d = 3.14;
    
      id2 = id1;          /* This makes a copy of id1 and then adds 5 to each field. */
      id2.i += 5;
      id2.d += 5;
    
      printf("1: %d %.2lf\n", id1.i, id1.d);   /* Print them out. */
      printf("2: %d %.2lf\n", id2.i, id2.d);
    
      return 0;
    }
    

    Straightforward when it runs:

    UNIX> bin/id3
    1: 5 3.14
    2: 10 8.14
    UNIX> 
    
    Let's change this to C. If we simply fix the headers (src/id4.c), it will not compile. That is because C++ creates "intdouble" as a type, and C does not. When we try to compile, it fails:
    UNIX> gcc -o bin/id4 src/id4.c
    src/id4.c:14:3: error: must use 'struct' tag to refer to type 'intdouble'
      intdouble id1, id2;       // This line won't compile.
      ^
      struct 
    1 error generated.
    UNIX> 
    
    Now, if you put "struct" in front of "intdouble," that will fix the problem. It is in src/id5.c:

    /* Copying src/id3.cpp to src/id5.c, and fixing the use of intdouble
       so that it compiles.  It works as in C++, copying the struct, but you
       should be wary of it.  */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    struct intdouble {
      int i;
      double d;
    };
    
    int main()
    {
      struct intdouble id1, id2;
    
      id1.i = 5;
      id1.d = 3.14;
    
      id2 = id1;  /* THIS IS THE OFFENDING LINE */
      id2.i += 5;
      id2.d += 5;
    
      printf("1: %d %.2lf\n", id1.i, id1.d);
      printf("2: %d %.2lf\n", id2.i, id2.d);
      return 0;
    }
    

    This runs identically to the C++ version. I can tell you that I disapprove of this code. Why? Because the statement ``id2 = id1'' offends me. It is the only part of C where you can copy an unspecified number of bytes with an assignment statement. It is a weakness of the language. For example, take a look at src/id5a.c:

    /* While C doesn't let you copy arrays, it lets you copy a struct that holds
       an array.  I don't think this really makes sense, but there it is.  In this
       code, we copy 4000 bytes in a single statement. */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct {
      int a[1000];
    } SID;
    
    int main()
    {
      SID s1, s2;
      int i;
    
      for (i = 0; i < 1000; i++) s1.a[i] = i;       /* Set s1. */
    
      s2 = s1;       /* This statement copies 4000 bytes. */
    
      for (i = 0; i < 1000; i++) printf("%4d %4d\n", s1.a[i], s2.a[i]);  /* Print s1 and s2. */
     
      return 0;
    }
    

    The ``s2 = s1'' line copies 4000 bytes. While I expect such garbage in C++, I am surprised that it's legal in C. Why does it allow you to copy the struct, but not to copy the array? Who knows -- anyway, I want you to be aware of it. You will never see me use that feature of the language because I don't approve. As a corollary, you can pass a struct as an argument to a procedure, and in C++ fashion, it makes a copy of the entire thing. In src/id5b.c, I define a procedure a(), which changes the last element of the array, and when we run it, s1.a[999] is unchanged. Make a mental note of it.

    #include <stdio.h>
    #include <stdlib.h>
    
    typedef struct {
      int a[1000];
    } SID;
    
    void a(SID s)    /* Although this procedure changes element 999 of s, */
    {                /* s is a copy of the calling parameter, so it is    */
      s.a[999] = -1; /* deleted at the end of the procedure.              */
    }                /* In other words, the procedure does nothing.       */
    
    
    int main()
    {
      SID s1;
      int i;
    
      for (i = 0; i < 1000; i++) s1.a[i] = i;    /* Set the elements of s1. */
    
      a(s1);                /* This does nothing, because it modifies a copy of s1 */
    
      printf("Element 999: %d\n", s1.a[999]);
     
      return 0;
    }
    

    UNIX> bin/id5b
    Element 999: 999
    UNIX> 
    

    A final note about C++ structs. They are basically stripped down classes -- you can put methods in them, and then implement the methods using the struct's variables, as in src/id6.cpp:

    /* Unlike C structs, you can put methods in C++ structs. */
    #include <cstdio>
    #include <iostream>
    using namespace std;
    
    struct intdouble {
      int i;
      double d;
      void Print();
    };
    
    void intdouble::Print()
    {
      printf("   %d %.2lf\n", i, d);
    }
    
    int main()
    {
      intdouble id1, id2;
    
      id1.i = 5;
      id1.d = 3.14;
    
      id2 = id1;
      id2.i += 5;
      id2.d += 5;
    
      id1.Print();
      id2.Print();
      return 0;
    }
    

    Whoever designed structs in C++ did the world a disservice, because you have the same construct in two fairly similar languages with completely different semantics. That means that tons of people are going to be confused going from C++ to C, or even from C to C++, when their struct semantics are off. It is for this reason that I don't teach structs as a C++ construct in either CS202 or CS302. Just use a class.

    (When your job interviewer asks you why you are using classes rather than structs in C++, you should respond as follows: "Structs in C++ are totally different constructs than they are in C. However, they have enough similarity that one can easily get confused when writing code in both languages. For that reason, I choose to use structs in C, but not in C++. In C++, I use a class. The compiler should be smart enough to make the class code as efficient as struct code.")