CS140 Lecture notes -- Migration from C++ to C


We're going with the assumption that you learned C++ in your introductory course (ECE 206 or CS102). In this course, we will use the language C, which is the predecessor of C++. C is very nearly a subset of C++, and unlike C++, it is not object-oriented. Programs written in C tend to be more efficient than those written in C++, but they take longer to write. Hence you tend to use C when run-time efficiency is an important issue, as it is in operating systems and other "system" software, such as file system software and networking software. You tend to use C++ when programmer efficiency is more important, as it is in most applications written on top of the system software, such as spreadsheet or database applications.

C is more efficient because many of its commands and data structures map more closely to the hardware's primitive commands and data structures. This allows you to custom tailor programs to take advantage of the hardware. Think of C as being akin to a stick shift automobile--you can get more performance out of the car by manually shifting at the exact right time, but you have to work harder to get this performance. C++ is more programmer efficient because it provides higher-level commands and data structures, such as objects, that package together some of the lower-level commands or data structures. You can now accomplish more with one statement, but you cannot tailor your solution as closely to the underlying hardware, hence you sacrifice performance. Think of C++ as being akin to an automatic transmission automobile--it is easier to use than a stick shift, but it does not always shift at the optimal times, thus giving you decreased performance. Actually, since C is very nearly a subset of C++, the better analogy is that C++ is like a car with automatic transmission that allows you to switch to manual transmission when you think it might be more efficient to do so.

In this course you are going to learn about several of the language constructs that C provides which are lower-level, but in many cases more efficient, constructs than the alternative ones provided by C++.


G++ vs. GCC

To compile a C++ program, you use the g++ compiler. For C, you use the gcc compiler. You will find that you can compile C programs generally with g++. However in this class, you will use gcc. You will not get credit for programs that compile and run correctly with g++ but not with gcc. Sorry.

The American National Standards Institute (ANSI) has established a standard specification for C to which compilers are expected to adhere. The standard is periodically updated, and the version we use in this class is the one published in 1999. C that conforms to this standard is commonly called C99, to distinguish it from earlier C standards. In this class we will be teaching C99. However, you should be aware that not every compiler supports all the features of C99, and that includes some versions of the gcc compiler. Hence the course notes will often indicate what features were added by C99, so that you will not be mystified if you receive a compiler error for a C99 feature. If you receive such an error, it means that the compiler you are using does not support that feature. If you are trying to write code that you want to be portable, then you should not use some of the more obscure C99 features, since it is likely that one or more compilers will not support them. However, the major features should be supported by all commercial-worthy compilers.


Comments

C++ has two types of comments, line-based comments that start with "//" and are terminated by the end of the line, and paragraph-based comments that start with "/*" and are terminated with "*/". As you might expect, line-based comments only include one line while paragraph-based comments span several lines. C99 supports both types of comments but prior to C99, C provided only paragraph-based comments. C++ added the line-based comments to save the programmer the trouble of having to add a "*/" at the end of a line-based comment. Line-based comments proved so popular that they were added to the 1999 C standard.

Here is a simple example of a line-based comment and a paragraph-based comment.

/* Here is a typical paragraph-based comment. Note how it 
   spans multiple lines. */
int main()
{
  // Here is a typical line-based comment. Note that it spans only one line
  return(0);
}

Header Files

In C++, all of our programs started with some header stuff, such as:

#include <iostream>
using namespace std;

In C, our include files contain a .h suffix. C also does not support namespaces, so you will never see a namespace statement in C header code:

#include <stdio.h>
#include <stdlib.h>

The most common C++ header file you used in CS102 was iostream, for handling input and output. The comparable C header file is stdio.h. C also contains a number of commonly used functions in a header file named stdlib.h. Get used to including both stdio.h and stdlib.h in all your C programs.


Boolean Variables

To use boolean variables in a C program, include stdbool.h at the top of your program and declare your variables as type bool. The constants are true and false.


Character Strings

C++ makes string handling very convenient by providing the string class. Using the string class you can easily assign strings to variables using the assignment operator (=), compare strings using the relational operators (e.g., <, ==), and determine their length using either the size or length methods. C++ eliminates the need for the programmer to worry about memory management for strings. In particular, it hides from the programmer how a string is actually stored and it automatically allocates the right amount of memory to store a string.

C String Storage and Representation

C requires you to explicitly allocate memory for strings and store them in character arrays (C++ allocates memory for strings and stores them in the same way--it's just that the string class hides all the messy details from you).
  1. Example declarations:
    char name[10]; 
    char name[] = "brad";
    char name[10] = "brad";  
    

  2. The null character, '\0': C does not magically know where your string ends, so it terminates strings with a special null character, '\0'. For example, "brad" is stored as "brad\0".

    1. When you type a string in your program, you should not terminate it with a null character. C will do that for you automatically. Hence you would type "brad" and C would automatically append a '\0' character to create the string "brad\0".
    2. C needs the null character so that functions such as strlen and printf can recognize where the string ends. The quotes ("") that delimit a string appear only in your program text; they are not stored with the string, so C needs another way of recognizing the end of a string.

  3. Always declare your string arrays to be at least one character longer than the string you want to store, since C needs room to append the '\0' character. For example:
    char name[5] = "brad";
    
    A common convention when initializing an array to a string is to omit the array size. C will automatically allocate an array of the appropriate size. Thus the following declaration allocates an array of five characters, four for "brad" and one for the '\0':
    char name[] = "brad";
    
  4. If you want to include a " in a string, place a \ in front of it.

    1. Example: "The band played \"Hail to the Chief\""
    2. The \ character is called an escape character that nullifies the usual meaning of the character. Hence \" says to treat the " as a normal character rather than as a string delimiter.
    3. Use \\ to get a backslash into a string.

  5. You must include string.h in any program that calls the C string functions described below (strcpy, strcmp, strlen, and strcat).

String assignment

  1. C++ allows you to use the assignment operator, =, to assign one string to another.
  2. C does not allow the contents of one array to be assigned to another using the assignment operator so we must use strcpy instead (string_assign.c):
    char name[10] = "brad";
    char student[10];
    strcpy(student, name);  /* copy name to student */
         
  3. general syntax: strcpy(dest, source)
  4. The destination array must be large enough to hold the source string plus its '\0' character. Otherwise strcpy will write past the end of the destination array and clobber whatever memory follows it.

Comparing strings

  1. In C++ you could use the relational operators, such as < and ==, to compare strings.
  2. In C the relational operators compare the addresses of two arrays rather than their contents, hence you must use strcmp to compare two strings (string_compare.c):

    int strcmp(string1, string2): compares two strings and returns

    1. negative integer if string1 < string2
    2. 0 if string1 = string2
    3. positive integer if string1 > string2

Finding the length of a string

  1. In C++, the size and length methods return the size of a string.

  2. In C, you call strlen to return the length of a string (string_length.c).

    1. strlen counts all characters up to, but not including, the null character.
    2. It includes newline characters ('\n') in the count.

String concatenation

  1. C++ allows you to concatenate two strings using the '+' operator
  2. C cannot concatenate arrays using the '+' operator so you must use strcat

    1. strcat(string1, string2): concatenates string2 to the end of string1.
    2. string1 must have enough memory to store the concatenated string or else memory will get clobbered.

Printing Output: putchar() and printf()

One of the reasons that C++ is an attractive beginning language is that it makes the task of input and output relatively easy. With C, it's a little harder. In particular, you have none of the C++ stream facilities, such as cin and cout, to use.

In exchange for the increased complexity, C will perform I/O more quickly--2.5 times faster on the hydra machines and 10 times faster on my Mac (as of July, 2009).

In this section we will consider two functions that allow you to output data in C: putchar() and printf().


putchar()

First, take a look at putchar(). This is a procedure that takes a char as an argument and prints it on standard output. So, take a look at the putchar() version of "Hello World" (phw.c):

#include <stdio.h>
#include <stdlib.h>

int main()
{
  putchar('H');
  putchar('e');
  putchar('l');
  putchar('l');
  putchar('o');
  putchar(' ');
  putchar('W');
  putchar('o');
  putchar('r');
  putchar('l');
  putchar('d');
  putchar('!');
  putchar('\n');

  return 0;
}

It works, but it's yucky:

UNIX> gcc -o phw phw.c
UNIX> ./phw
Hello World!
UNIX> 

Here is a second program that uses putchar() to print a character string:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  char hw[] = "Hello World!\n";
  int i;

  for (i = 0; hw[i] != '\0'; i++) putchar(hw[i]);
  return 0;
}

As you can see, putchar() is a cumbersome way to print strings, since you must do it a character at a time. Additionally, it does not provide a way to print numbers. You'll find putchar() to be useful in a variety of situations, although probably not in CS140. I find that I use it primarily in systems programming, which you will encounter in CS360.


printf()

Most of the time when you print output you will want to use printf(). You can think of printf as an acronym for "(print f)ormatted output". As the name suggests, printf() allows you to output data in a formatted manner.

Printf() is an unusual procedure in that it can take a variable number of parameters. The first parameter is called a format string and it tells C how the printed output should appear. The remaining parameters are the variables that you want to print.

In its simplest form, printf() takes only a format string with no formatting characters. In this case it simply prints the contents of the format string. So, for example, the printf() version of "Hello World" (in pfhw.c) is:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  printf("Hello World!\n");
}

UNIX> gcc -o pfhw pfhw.c
UNIX> pfhw
Hello World!
UNIX> 

Note the presence of the newline character '\n' at the end of "Hello World!". If you forget to put the newline character at the end of the format string, then you will not get a newline. For example, if I had written

printf("Hello World!");

then my output would have looked like:

UNIX> pfhw
Hello World!UNIX> 

Conversion Specifiers

Typically you will want printf() to print the values of some variables, in addition to whatever character strings are being printed by the format string. Unlike C++'s cout stream, C's printf() function is unable to determine the type of the variables you want to print, and instead needs your help. You tell C the types of the variables by placing what are called conversion specifications into the format string. These conversion specifications tell printf what type of variable it is going to print next. C replaces the conversion specification in the format string with the value of the variable. C has to have some way of distinguishing a conversion specification from an ordinary character and so you always start a specification with a percent sign. The common ones are:

  %d or %i   an integer (int)
  %c         a single character (char)
  %s         a character string (an array of characters ending in '\0')
  %f         a floating point number (float or double)

Regrettably there is no conversion specification for a bool. You need to either print it as an integer (it will print 0 for false and 1 for true), or use a conditional to print it as a true/false string.

Here is an example of a program that prints the name, age, and sex of an individual (the output is shown below the program):

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int age = 63;
  char name[] = "brad" ;
  char sex = 'm';

  printf("%s %i %c\n", name, age, sex);
  return 0;
}

brad 63 m

Note that the printf() statement contains three conversion specifications, and therefore there need to be three arguments following the format string. Also note that the spaces between the conversion specifiers have been copied directly to the output. If I had forgotten to put spaces between the conversion specifiers, then the output would have been scrunched together:

printf("%s%i%c\n", name, age, sex);

brad63m

It might be nice to label the variables that I have output, and I can do so by adding character strings to the format string that act as labels:

printf("name = %s age = %i sex = %c\n", name, age, sex);

name = brad age = 63 sex = m


Format Specifiers

While the printf statements in the previous section produce nicely formatted output for a single line of text, suppose that we wanted to print the age, sex, and names of a group of people, such as the students in this class. We might like to produce a nice looking table, with headers and fixed width columns. We also might like to be able to justify information in the columns. For example, we might want to left-justify the names and right-justify the ages. Finally, if we are printing floating point numbers, such as gpa, we might want to control the number of decimal places that get printed. For example, here is how we might like the sample output to appear (note that we can only produce primitive tables with C--we cannot produce nice looking tables as we might with HTML):

name       age sex  gpa
bradley     45   m 4.00
amy         15   f 3.66
joseph      25   m 2.59

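We can accomplish all of these tasks--fixed width columns, justification, limited number of decimal places--with format specifiers that go after the % sign and before the conversion specifier:

  1. A number gives the minimum width of the field, in characters. The value is right-justified in the field. Example: %3i
  2. A minus sign (-) in front of the width left-justifies the value in the field. Example: %-10s
  3. A period followed by a number gives the number of decimal places to print for a floating point number. Example: %4.2f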

Now that we know how to use format specifiers, we can produce the table shown above by using the following program:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int num_students = 3;
  int age[] = {45, 15, 25};
  char name[][11] = {"bradley", "amy", "joseph"} ;
  char sex[] = {'m', 'f', 'm'};
  float gpa[] = {4, 3.6589, 2.5868};
  int i;
  
  // print the headers
  printf("%-10s %3s %3s %4s\n", "name", "age", "sex", "gpa");

  /* print each student's information, one student per line as follows:
     name: left-justified in a field that is minimally 10 characters wide
     age: right-justified in a field that is minimally 3 characters wide
     sex: right-justified in a field that is minimally 3 characters wide--the
          field is only one character wide but the heading is three characters
          and hence we need our field to be 3 characters. Notice that we 
          could play with the spacing to center the student's sex.
     gpa: right-justified in a field that is minimally 4 characters wide with
          the gpa rounded to two decimal places
  */
  for (i = 0; i < num_students; i++)
    printf("%-10s %3i %3c %4.2f\n", name[i], age[i], sex[i], gpa[i]);

  return 0;
}


Outputting Special Characters: %, ", and \

Because the % sign and the " are special characters (the % sign indicates the start of a conversion specifier and the " will terminate the format string), you need to do something special to get C to print a % sign or a " in the output.

If you want to print a percent sign, simply use two percent signs in the string. Similarly, if you want a double-quote, prepend it with a backslash. For example:

printf("Brad \"The Man\" answered 50%% of the problems correctly\n");

produces the output

Brad "The Man" answered 50% of the problems correctly

Backslashes are a little confusing, because they can either denote a special character, such as a newline (\n) or a tab (\t), or cause a special character to be printed, such as ". Sometimes you may also want a backslash to appear in the output. In that case prepend the backslash with a second backslash (\\). I know it may seem confusing. Just memorize these special cases and you will be okay.

Some Common Printf Mistakes

Take a look at pfmistake.c. It has examples of five common printf() mistakes:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int i;
  double d;

  /* Forgetting a parameter */

  printf("1. %d\n");

  /* Trying to print an integer as a double without casting it */

  i = 5;
  printf("2. %lf\n", i);

  /* Trying to print a double as integer without casting it */

  d = 5;
  printf("3. %d\n", d);

  /* Trying to print a string as an integer, even though it's a string of an integer */

  printf("4. %d\n", "5");

  /* Trying to print an integer as a string */

  printf("5. %s\n", i);

  return 0;
}

Here's the program running on my MacBook:

UNIX> gcc -o pfmistake pfmistake.c
UNIX> ./pfmistake
1. -1881117246
2. 0.000000
3. 0
4. 8152
Bus error
UNIX> 

And here it is running on hydra3:

UNIX> gcc -o pfmistake pfmistake.c
UNIX> ./pfmistake
1. 1075338880
2. 0.000000
3. 0
4. 134513994
Segmentation fault
UNIX> 

On the first four mistakes, you're likely to get a random answer, depending on the kind of machine on which you are running. On the last mistake, you are going to get a bus error or segmentation violation in almost all cases, because you are treating an integer as a pointer.

Simple reading of standard input: scanf()

Scanf() allows you to read formatted input from standard input. It is like printf() in that it takes a format string as its first argument. However, that string is usually composed simply of conversion specifications. In fact, I will recommend that you always use only one conversion specification in each of your scanf() statements. You will never see me do otherwise.

The big difference between printf() and scanf() is that you have to provide a pointer to the variable that you want filled in from standard input. So, for example, the following program reads in an integer. Study it carefully (scex.c):

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int i;

  printf("Enter an integer: ");
  scanf("%d", &i);
  
  printf("You entered: %d\n", i);
  return 0;
}

Note, we passed &i rather than i. This is because scanf() needs the pointer to "fill in" the value of i. If you put i instead of &i, you'll probably get a segmentation violation, which, believe it or not is a good thing since it will alert you to your problem.

Here is a quick example of the program running:

UNIX> gcc -o scex scex.c
UNIX> ./scex
Enter an integer: 5
You entered: 5
UNIX> 

That's nice. Here are some weirder examples:

UNIX> ./scex
Enter an integer: 55.99
You entered: 55
UNIX> ./scex
Enter an integer: 45Fred
You entered: 45
UNIX> ./scex
Enter an integer: 0000000000000005
You entered: 5
UNIX> ./scex
Enter an integer: Fred
You entered: -1073746852
UNIX> ./scex
Enter an integer: <CNTL-D>
You entered: -1073746852
UNIX> 

If you specify for scanf() to read an integer, it will read the next word and try to convert it to an integer as long as it starts like one. When it hits some characters that don't make sense, it stops reading and uses what it has read so far. This is why it reads 55 and 45 in the first two examples. The next example shows that it doesn't care about leading zeroes. The last two examples show what happens when you don't give it an integer -- it returns and does not modify i. Since you never set i, it has a random value.

So how do you deal with getting bad input or EOF? You use scanf()'s return value. Scanf() returns how many matches it made. If you only ever call it with one conversion specification (as I do), then if it returns 1, you had a match. If it doesn't, you don't. The following program (scex2.c) calls scanf() repeatedly to read integers until it fails:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  int i;
  int n;

  n = 1;
  while (1) {
    if (scanf("%d", &i) != 1) return 0;
    printf("Integer %d. %d\n", n, i);
    n++;
  }
}

When we run it, we see a few things. First, scanf() does not care about line breaks -- it simply keeps reading until it finds the next integer, whether there are multiple integers on one line or not. It also does not care about blank lines. Finally, it stops as soon as it reaches input that is not an integer, such as Fred.

UNIX> ./scex2
44 33 22
Integer 1. 44
Integer 2. 33
Integer 3. 22
-5 
Integer 4. -5



Fred
UNIX> 

Reading Strings with scanf()

When you read a string with scanf(), you need to pass an array of characters. Why not a pointer to the array? Because the name of an array is treated as a pointer to the first element of the array, and that's good enough for C. Scanf() will read the next word of standard input into that array, and turn it into a C-style string. For that reason, the array must be big enough to accommodate the input string, including the null character. If not, you're in big trouble, which is why you should only read strings with scanf() if you know that your input is going to be small enough to fit into your array.

The following program (scex3.c) gives an example:

#include <stdio.h>
#include <stdlib.h>

int main()
{
  char s[16];
  int i;

  i = 261303714;

  while (1) {
    if (scanf("%s", s) != 1) return 0;
    printf("s = %s.  i = %d\n", s, i);
  }
}

This program works fine as long as we enter words with 15 characters or fewer. That happens in the first few examples. However, when we give it 1234567890123456, which has 16 characters plus the null character, scanf() writes past the end of the array, and starts overwriting i. That happens in the last example too.

UNIX> gcc -o scex3 scex3.c
UNIX> ./scex3
Jim
s = Jim.  i = 261303714
Jim Plank
s = Jim.  i = 261303714
s = Plank.  i = 261303714
123456789012345
s = 123456789012345.  i = 261303714
1234567890123456
s = 1234567890123456.  i = 261303552
12345678901234578
s = 12345678901234578.  i = 261292088
<CNTL-D>
UNIX> 

Bugs of this sort are disastrous, and if you have ever heard of a buffer overflow attack, this is one of the sources. It is why you should be quite circumspect about using scanf() with strings.

No classes, no new, no delete

Finally, in C, we don't have classes, and we don't have new or delete. There are also no private/public/protected keywords, no string class, and no constructors or destructors. We'll get to the replacements for each of these in subsequent lectures.