The way that programs talk to the operating system is via "system calls." A system call looks like a procedure call (see below), but it's different -- it is a request to the operating system to perform some activity.
System calls are expensive. While a procedure call can usually be performed in a few machine instructions, a system call requires the computer to save your program's state, let the operating system take control of the CPU, have the operating system perform the requested function, have the operating system save its own state, and then restore your program's state and give control of the CPU back to you. This concept is important, and you will see it time and time again in this class.
Unix provides five basic system calls for file I/O:

1. int open(const char *path, int flags [ , int mode ] );
2. int close(int fd);
3. ssize_t read(int fd, void *buf, size_t count);
4. ssize_t write(int fd, const void *buf, size_t count);
5. off_t lseek(int fd, off_t offset, int whence);

You'll note that they look like regular procedure calls. This is how you program with them: like regular procedure calls. However, you should know that they are different. A system call makes a request to the operating system, while a procedure call just jumps to a procedure defined elsewhere in your program. That procedure may itself make a system call (for example, fopen() calls open()), but the procedure call and the system call are different calls.
The reason the operating system controls I/O is for safety -- the computer must ensure that if my program has a bug in it, then it doesn't crash the system, and it doesn't mess up other people's programs that may be running at the same time or later. Thus, whenever you do disk or screen or network I/O, you must go through the operating system and use system calls.
These five system calls are defined fully in their man pages (do 'man -s 2 open', 'man -s 2 close', etc., or simply 'man 2 open' on Linux). All those irritating types like ssize_t and off_t are just signed integer types: ints or longs, depending on the system. They used to all be ints, but as machines and files have grown, so have they.
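lseek() from the list above is not used by the examples that follow. It moves a descriptor's current position in the file and returns the new position as an off_t. As a minimal sketch, assuming you only want the size of a file you already have open, a helper (the name file_size is hypothetical) can seek to the end and then seek back:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size of the open file fd in bytes, or -1 on error.
   The current offset is saved and restored, so later read() or
   write() calls continue where they left off. */
off_t file_size(int fd)
{
  off_t here, end;

  here = lseek(fd, 0, SEEK_CUR);                  /* remember where we are */
  if (here < 0) return -1;
  end = lseek(fd, 0, SEEK_END);                   /* offset of the end == size */
  if (end < 0) return -1;
  if (lseek(fd, here, SEEK_SET) < 0) return -1;   /* go back */
  return end;
}
```

You would call it from your own main() on a descriptor returned by open(). Restoring the old offset matters because read() and write() both start at the descriptor's current position.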
All actions that you will perform on files will be done through the operating system. Whenever you want to do file I/O directly with the operating system, you specify the file by its file descriptor. Thus, whenever you want to do file I/O on a specific file, you must first open that file to get a file descriptor.
Example: src/o1.c opens the file txt/in1.txt for reading, and prints the value of the file descriptor. If txt/in1.txt doesn't exist, or you don't have permissions to open it, then it will print -1, since the open() call fails. If txt/in1.txt does exist, then it will print 3, meaning that the open() request has been granted (i.e. a non-negative integer was returned). It is 3 because open() returns the lowest descriptor not already in use, and descriptors 0, 1 and 2 are already open as standard input, standard output and standard error.
```c
/* This program opens the file "txt/in1.txt" in the current directory,
   and prints out the return value of the open() system call.
   If "txt/in1.txt" exists, open() will return a non-negative integer (three).
   If "txt/in1.txt" does not exist, then it will return -1. */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
  int fd;

  fd = open("txt/in1.txt", O_RDONLY);
  printf("%d\n", fd);
  return 0;
}
```
Note the value of 'flags' -- the man page for open() will give you a description of the flags and how they work. They are also defined in fcntl.h, which can be found in the directory /usr/include. (On some systems, fcntl.h merely includes another header such as /usr/include/sys/fcntl.h, so you'll have to look at that file to see what O_RDONLY and the other flags really mean.)
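The flags are small integer constants meant to be combined with the binary-or operator. One wrinkle is worth seeing in code: the access modes O_RDONLY, O_WRONLY and O_RDWR are not separate bits (O_RDONLY is 0 on Linux and macOS), so POSIX provides the mask O_ACCMODE to extract them. The sketch below needs only fcntl.h; access_mode() and has_flag() are hypothetical helper names:

```c
#include <fcntl.h>
#include <stdlib.h>

/* Name the access mode stored in an open() flags value.
   Because O_RDONLY is usually 0, (flags & O_RDONLY) tells you nothing;
   mask with O_ACCMODE and compare instead. */
const char *access_mode(int flags)
{
  switch (flags & O_ACCMODE) {
    case O_RDONLY: return "O_RDONLY";
    case O_WRONLY: return "O_WRONLY";
    case O_RDWR:   return "O_RDWR";
    default:       return "unknown";
  }
}

/* Other flags such as O_CREAT and O_TRUNC are single bits, so & tests them. */
int has_flag(int flags, int bit)
{
  return (flags & bit) != 0;
}
```

For example, access_mode(O_WRONLY | O_CREAT | O_TRUNC) yields "O_WRONLY", and has_flag() reports that O_CREAT and O_TRUNC are both set.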
Here are a few examples of calling bin/o1. Initially, I have a file called txt/in1.txt in my directory, so the open() call is successful, returning 3. I then rename it to tmp.txt, and now the open() call fails, returning -1. I rename it back, and the open() call succeeds again, returning 3:
```
UNIX> ls -l txt/in1.txt
-rw-r--r--  1 plank  staff  22 Jan 31 12:50 txt/in1.txt
UNIX> bin/o1
3                               # The open call succeeds here.
UNIX> mv txt/in1.txt tmp.txt
UNIX> bin/o1
-1                              # The open call fails here.
UNIX> mv tmp.txt txt/in1.txt
UNIX> bin/o1
3                               # The open call succeeds again.
UNIX>
```

Second example: src/o2.c tries to open the file "txt/out1.txt" for writing. That fails because txt/out1.txt does not already exist. Here's the code. You'll note that it uses perror() to print why the error occurred:
```c
/* This program attempts to open the file "txt/out1.txt" for writing in the
   current directory. Note that this fails because "txt/out1.txt" does not
   exist already. See src/o3.c for an example of opening "txt/out1.txt" properly. */

#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
  int fd;

  fd = open("txt/out1.txt", O_WRONLY);
  if (fd < 0) {
    perror("txt/out1.txt");
    exit(1);
  }
  return 0;
}
```
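perror(s) writes s, a colon, a space, and a description of the current value of errno to standard error, followed by a newline. errno is the variable in which a failed system call such as open() records why it failed. As an illustration only (error_message() is a hypothetical helper, not part of the C library), the same message can be built into a buffer with strerror():

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format the message perror(prefix) would print, minus the trailing
   newline: "prefix: reason", where the reason comes from errno.
   Call it right after the failing call, before anything else can
   change errno. */
void error_message(const char *prefix, char *buf, size_t size)
{
  snprintf(buf, size, "%s: %s", prefix, strerror(errno));
}
```

Because errno can be changed by later library calls, perror() or strerror() should be called immediately after the call that failed.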
We run it a few times below -- see the inline comments for commentary on what's happening.
```
UNIX> ls -l txt                  # As you can see, there's no txt/out1.txt
total 8
-rw-r--r--  1 plank  staff  22 Jan 30  2018 in1.txt
-rw-r--r--  1 plank  staff   0 Jan 30  2018 out2.txt
UNIX> bin/o2                     # Accordingly, the open() call fails.
txt/out1.txt: No such file or directory
UNIX> echo Hi > txt/out1.txt     # I create txt/out1.txt
UNIX> bin/o2                     # And now the open() call succeeds
UNIX> cat txt/out1.txt           # The program did not change the file.
Hi
UNIX> chmod 0400 txt/out1.txt    # Here I change the permissions so that I can't open for writing.
UNIX> bin/o2                     # And the open() call fails.
txt/out1.txt: Permission denied
UNIX> chmod 0644 txt/out1.txt
UNIX> rm txt/out1.txt            # I remove the file
UNIX> bin/o2                     # And the open() call fails again.
txt/out1.txt: No such file or directory
UNIX>
```

In order to open a new file for writing, you should open it with (O_WRONLY | O_CREAT | O_TRUNC) as the flags argument. The binary-or is how you aggregate these arguments: each is an integer with a different bit set, so the binary-or combines them all.
```c
/* This program opens the file "txt/out2.txt" for writing in the current directory.
   It uses O_CREAT to create the file if it does not exist already, and O_TRUNC
   to truncate the file to zero bytes if it does exist. */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
  int fd;

  fd = open("txt/out2.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    perror("txt/out2.txt");
    exit(1);
  }
  return 0;
}
```
Below, I run bin/o3 in various situations -- you can see that if the file doesn't exist, it creates it. If the file does exist, then it truncates it:
```
UNIX> ls -l txt/out2.txt     # txt/out2.txt has zero bytes and was last changed in 2018
-rw-r--r--  1 plank  staff  0 Jan 30  2018 txt/out2.txt
UNIX> bin/o3
UNIX> ls -l txt/out2.txt     # It still has zero bytes, but the modification time has updated.
-rw-r--r--  1 plank  staff  0 Feb  3 14:56 txt/out2.txt
UNIX> rm txt/out2.txt
UNIX> bin/o3
UNIX> ls -l txt/out2.txt     # Now it created the file anew.
-rw-r--r--  1 plank  staff  0 Feb  3 14:57 txt/out2.txt
UNIX> echo "Hi" > txt/out2.txt
UNIX> ls -l txt/out2.txt     # The echo command has put "Hi" and a newline into the file.
-rw-r--r--  1 plank  staff  3 Feb  3 14:57 txt/out2.txt
UNIX> bin/o3
UNIX> ls -l txt/out2.txt     # bin/o3 has truncated the file.
-rw-r--r--  1 plank  staff  0 Feb  3 14:57 txt/out2.txt
UNIX> echo "Hi Again" > txt/out2.txt
UNIX> chmod 0400 txt/out2.txt
UNIX> ls -l txt/out2.txt     # I have put 9 bytes into the file using echo, but the permission is read-only.
-r--------  1 plank  staff  9 Feb  3 14:57 txt/out2.txt
UNIX> bin/o3                 # As such, bin/o3 fails to open the file.
txt/out2.txt: Permission denied
UNIX> ls -l txt/out2.txt     # And the file is unchanged.
-r--------  1 plank  staff  9 Feb  3 14:57 txt/out2.txt
UNIX> chmod 0644 txt/out2.txt
UNIX> bin/o3
UNIX> ls -l txt/out2.txt     # When I change the permissions back to R/W, bin/o3 truncates the file again.
-rw-r--r--  1 plank  staff  0 Feb  3 14:58 txt/out2.txt
UNIX>
```

Finally, the 'mode' argument is only used when open() may create a new file, that is, when O_CREAT is among the flags, and in that case you must supply it. It specifies the protection mode of the new file; if the file already exists, its mode is left alone. 0644 is the most typical value: it says "I can read and write it; everyone else can only read it." The leading zero in 0644 tells the compiler to interpret the number in octal. (Later, when you learn about the umask, you'll use a different mode, but for now, we'll use 0644.)
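To make the octal notation concrete, here is a small sketch (the names MODE_644 and permission_bits are hypothetical) that spells 0644 with the permission constants from sys/stat.h and reads a file's permission bits back with stat(). It assumes a umask of 022, which leaves 0644 unchanged; the umask itself is covered later:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* 0644 is octal: owner rw- (6), group r-- (4), others r-- (4).
   The same value spelled with the symbolic constants: */
#define MODE_644 (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)

/* Return the permission bits of a file, or -1 if stat() fails. */
int permission_bits(const char *path)
{
  struct stat st;

  if (stat(path, &st) < 0) return -1;
  return st.st_mode & 0777;    /* drop the file-type bits */
}
```

In decimal, 0644 is 6*64 + 4*8 + 4 = 420, which is why writing 644 without the leading zero would give a very different mode.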
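You can open the same file more than once, and you will get a different fd each time. Each fd keeps its own position in the file, so if you open the same file for writing more than once at a time, writes through one fd can overwrite writes through another, and you may get bizarre results.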
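The bizarre results come from the fact that each open() call gets its own file offset. A minimal sketch, with write_twice() as a hypothetical helper, opens one file twice for writing and writes through both descriptors:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open path twice for writing and write a through the first descriptor,
   then b through the second.  Each open() has its own offset starting at
   0, so b lands on top of a instead of after it.
   Returns 0 on success, -1 on error. */
int write_twice(const char *path, const char *a, size_t alen,
                const char *b, size_t blen)
{
  int fd1, fd2, rv = 0;

  fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd1 < 0) return -1;
  fd2 = open(path, O_WRONLY);
  if (fd2 < 0) { close(fd1); return -1; }

  if (write(fd1, a, alen) != (ssize_t) alen) rv = -1;  /* fd1 moves to offset alen */
  if (write(fd2, b, blen) != (ssize_t) blen) rv = -1;  /* fd2 is still at offset 0 */

  close(fd1);
  close(fd2);
  return rv;
}
```

Writing "AAAA" through the first descriptor and then "BB" through the second leaves the file containing "BBAA", not "AAAABB".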
close() releases a file descriptor, returning 0 on success and -1 on error. src/c1.c opens and closes txt/in1.txt in a variety of ways:

```c
/* This program opens and closes the file "txt/in1.txt" in a variety of ways.
   Make sure you understand this program, especially the return values of the open calls. */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>    /* for close() */

int main()
{
  int fd1, fd2;

  /* First open txt/in1.txt twice for reading. Print out the file descriptors. */

  fd1 = open("txt/in1.txt", O_RDONLY);
  if (fd1 < 0) { perror("c1"); exit(1); }

  fd2 = open("txt/in1.txt", O_RDONLY);
  if (fd2 < 0) { perror("c1"); exit(1); }

  printf("Opened the file txt/in1.txt twice: Fd's are %d and %d.\n", fd1, fd2);

  /* Close the file descriptors. */

  if (close(fd1) < 0) { perror("c1"); exit(1); }
  if (close(fd2) < 0) { perror("c1"); exit(1); }

  printf("Closed both fd's.\n");

  /* Open txt/in1.txt again, to see that it will reuse the first file descriptor. */

  fd2 = open("txt/in1.txt", O_RDONLY);
  if (fd2 < 0) { perror("c1"); exit(1); }

  printf("Reopened txt/in1.txt into fd2: %d.\n", fd2);

  /* Close the file descriptor twice. The second causes an error, which usually
     goes unnoticed, because programmers rarely look at the return value of close(). */

  if (close(fd2) < 0) { perror("c1"); exit(1); }

  printf("Closed fd2. Now, calling close(fd2) again.\n");
  printf("This should cause an error.\n\n");

  if (close(fd2) < 0) { perror("c1"); exit(1); }

  return 0;
}
```
```
UNIX> bin/c1
Opened the file txt/in1.txt twice: Fd's are 3 and 4.
Closed both fd's.
Reopened txt/in1.txt into fd2: 3.
Closed fd2. Now, calling close(fd2) again.
This should cause an error.

c1: Bad file descriptor
UNIX>
```
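The 3 that comes back on the reopen is not a coincidence: open() returns the lowest-numbered descriptor that is not currently open, so a number freed by close() is handed out again. A sketch of a check for this (fd_is_reused() is a hypothetical name) follows:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open path twice, close the first descriptor, and open again.
   Since open() picks the lowest unused number, the new descriptor
   should reuse the number just freed.
   Returns 1 if it was reused, 0 if not, -1 on error. */
int fd_is_reused(const char *path)
{
  int fd1, fd2, fd3;

  fd1 = open(path, O_RDONLY);
  if (fd1 < 0) return -1;
  fd2 = open(path, O_RDONLY);
  if (fd2 < 0) { close(fd1); return -1; }

  close(fd1);                   /* fd1's number is now free */
  fd3 = open(path, O_RDONLY);   /* should get that number back */
  close(fd2);
  if (fd3 < 0) return -1;
  close(fd3);
  return fd3 == fd1;
}
```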
read() copies bytes from a file descriptor into memory that you provide. src/r1.c shows some simple examples:

```c
/* This program shows some simple examples of using the system call read()
   to read from a file. */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  char *c;
  int fd, sz;

  /* Allocate 100 bytes, and then open txt/in1.txt: */

  c = (char *) malloc(100 * sizeof(char));
  if (c == NULL) { perror("malloc"); exit(1); }

  fd = open("txt/in1.txt", O_RDONLY);
  if (fd < 0) { perror("r1"); exit(1); }

  /* Read ten bytes from the file. Print the return value, add the NULL character,
     and print the bytes as a string. Check for -1 first, since c[-1] is not legal. */

  sz = read(fd, c, 10);
  if (sz < 0) { perror("r1"); exit(1); }
  printf("called read(%d, c, 10). returned that %d bytes were read.\n", fd, sz);
  c[sz] = '\0';
  printf("Those bytes are as follows: %s\n", c);

  /* Now, read 99 bytes and do the same thing. You'll note that since there
     were only 12 more bytes in the file, that read() returns 12. Also, you'll
     note that read() does not NULL terminate anything. It simply reads the
     bytes. So you need to NULL terminate before printing. */

  sz = read(fd, c, 99);
  if (sz < 0) { perror("r1"); exit(1); }
  printf("called read(%d, c, 99). returned that %d bytes were read.\n", fd, sz);
  c[sz] = '\0';
  printf("Those bytes are as follows: %s\n", c);

  /* As with freeing memory, this is unnecessary, since we are exiting.
     The operating system will make sure that open files are closed
     when the process exits. */

  close(fd);
  return 0;
}
```
When executed, you get the following:
```
UNIX> bin/r1
called read(3, c, 10). returned that 10 bytes were read.
Those bytes are as follows: Jim Plank

called read(3, c, 99). returned that 12 bytes were read.
Those bytes are as follows: Claxton 221

UNIX>
```

There are a few things to note about this program. First, buf should point to valid memory. In src/r1.c, this is achieved by malloc()-ing space for c. Alternatively, I could have declared c to be a fixed-size array with 100 characters:
```c
char c[100];
```

Second, I NULL terminate c after the read() calls so that printf() knows where the string ends. This is important: read() does not NULL terminate what it reads, and text files contain no NULL characters to do it for you. If you are going to use the characters as strings in C, you'll need to NULL terminate them yourself.
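Third, when read() returns 0, the end of the file has been reached. When you are reading from a regular file, if read() returns fewer bytes than you requested, then you have reached the end of the file as well. This is what happens in the second read() in src/r1.c. (Reads from pipes, terminals and network connections can return fewer bytes than requested well before the end, so in general only a return value of 0 means end of file.)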
Fourth, note that the 10th character in the first read() call and the 12th character in the second are both newline characters. That is why each "Those bytes are as follows" line is followed by a blank line: one newline is in c, and the other is in the printf() format string.
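Putting the third point into practice, a common pattern is to keep calling read() until it returns 0, rather than trusting a short read to mean end of file. The helper below is a sketch under those assumptions (read_all() is a hypothetical name, and chunk stands in for the 10 and 99 used in src/r1.c). It also retries when a signal interrupts read(), which shows up as -1 with errno set to EINTR:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read from fd until end of file or until cap bytes are stored,
   asking for at most chunk bytes per read() call.
   Returns the number of bytes stored, or -1 on error. */
ssize_t read_all(int fd, char *buf, size_t cap, size_t chunk)
{
  size_t total = 0;
  ssize_t n;

  while (total < cap) {
    size_t want = cap - total;
    if (want > chunk) want = chunk;
    n = read(fd, buf + total, want);
    if (n == 0) break;                /* end of file */
    if (n < 0) {
      if (errno == EINTR) continue;   /* interrupted by a signal: retry */
      return -1;
    }
    total += (size_t) n;
  }
  return (ssize_t) total;
}
```

On txt/in1.txt with a chunk of 10, this makes three reads returning 10, 10 and 2 bytes, then a fourth that returns 0, for a total of 22. Like read() itself, it does not NULL terminate.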
To reiterate, read() does not add a NULL character. It simply copies bytes from the file, and this file does not contain any NULL characters. This is why you have to put the NULL character explicitly into your string. Let's take a look at a similar program, which doesn't NULL terminate (src/r2.c):
```c
/* Showing what happens when you don't NULL terminate. */

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
  char c[100];
  int fd;

  strcpy(c, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");

  fd = open("txt/in1.txt", O_RDONLY);
  if (fd < 0) { perror("r2"); exit(1); }

  read(fd, c, 10);      /* I read 10 bytes, but I don't NULL terminate. */
  printf("%s\n", c);    /* So this printf() will also print the characters from K to Z. */

  read(fd, c, 99);      /* This reads 12 bytes, so it also prints M to Z. */
  printf("%s\n", c);
  return 0;
}
```
Because I didn't NULL terminate after reading, printf() prints every character of c until it encounters the NULL character after 'Z'. That's why you get the stray uppercase letters at the end of each printf() statement:
```
UNIX> bin/r2
Jim Plank
KLMNOPQRSTUVWXYZ
Claxton 221
MNOPQRSTUVWXYZ
UNIX>
```
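One way to avoid the stray letters is to wrap read() in a helper that always leaves room for, and stores, the NULL character. This is a sketch, not code from the class (read_string() is a hypothetical name):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read up to size-1 bytes from fd into buf and NULL terminate them,
   so buf can be passed to printf("%s").
   Returns the number of bytes read (0 at end of file), or -1 on error,
   in which case buf holds the empty string. */
ssize_t read_string(int fd, char *buf, size_t size)
{
  ssize_t n;

  if (size == 0) return -1;        /* no room even for the '\0' */
  n = read(fd, buf, size - 1);
  buf[n < 0 ? 0 : n] = '\0';       /* never index with -1 */
  return n;
}
```

With this helper, the first read in src/r2.c would become read_string(fd, c, 11), which reads 10 bytes and stores '\0' in c[10], so printf() stops after "Jim Plank\n".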
src/w1.c writes the string "cs360\n" to the file txt/out3.txt:
```c
/* This program opens the file "txt/out3.txt" in the current directory
   for writing, and writes the string "cs360\n" to it. */

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

int main()
{
  int fd, sz;

  fd = open("txt/out3.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) { perror("txt/out3.txt"); exit(1); }

  sz = write(fd, "cs360\n", strlen("cs360\n"));

  /* strlen() returns a size_t, which printf() prints with %zu. */
  printf("called write(%d, \"cs360\\n\", %zu). it returned %d\n", fd, strlen("cs360\n"), sz);

  close(fd);
  return 0;
}
```
```
UNIX> bin/w1
called write(3, "cs360\n", 6). it returned 6
UNIX> cat txt/out3.txt
cs360
UNIX>
```

You should think about different combinations of O_CREAT and O_TRUNC, and their effect on write(). In particular, take a look at src/w2.c, which lets you specify the combination of O_WRONLY, O_CREAT and O_TRUNC that you use in your open() call:
```c
/* This program opens the file "txt/out3.txt" in the current directory for writing,
   allows you to specify the combination of O_CREAT and O_TRUNC, plus what you
   write to the file. */

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int fd, sz, flags, len;

  if (argc != 3) {
    fprintf(stderr, "usage: w2 w|wc|wt|wct input-word\n");
    exit(1);
  }

  /* Figure out what the "flags" argument will be to the open() call. */

  if (strcmp(argv[1], "w") == 0) {
    flags = O_WRONLY;
  } else if (strcmp(argv[1], "wc") == 0) {
    flags = (O_WRONLY | O_CREAT);
  } else if (strcmp(argv[1], "wt") == 0) {
    flags = (O_WRONLY | O_TRUNC);
  } else if (strcmp(argv[1], "wct") == 0) {
    flags = (O_WRONLY | O_CREAT | O_TRUNC);
  } else {
    fprintf(stderr, "Bad first argument. Must be one of w, wc, wt, wct.\n");
    exit(1);
  }

  /* Open the file with the given flags. */

  fd = open("txt/out3.txt", flags, 0644);
  if (fd < 0) { perror("txt/out3.txt"); exit(1); }

  len = strlen(argv[2]);
  sz = write(fd, argv[2], len);    /* Write the input word to the file. */
  printf("called write(%d, \"%s\", %d). It returned %d\n", fd, argv[2], len, sz);

  close(fd);
  return 0;
}
```
Take a look at all of the following executions of the program. You should be able to explain them all. You should also notice that there is no newline in the write call, which is why the resulting file has no newline in it. There is also no NULL character being written to the file, because you are writing strlen() bytes, which does not include the NULL character:
```
UNIX> bin/w2
usage: w2 w|wc|wt|wct input-word
UNIX> rm -f txt/out3.txt                  # Make sure there's no txt/out3.txt
UNIX> ls -l txt/out*
-rw-r--r--  1 plank  staff  0 Feb  3 14:58 txt/out2.txt
UNIX> bin/w2 w Hi                         # The open() fails because the file doesn't exist, and we didn't specify O_CREAT
txt/out3.txt: No such file or directory
UNIX> ls txt/out*
txt/out2.txt
UNIX> bin/w2 wc ABCDEFG                   # Because of O_CREAT, the file is created.
called write(3, "ABCDEFG", 7). It returned 7
UNIX> ls -l txt/out*.txt
-rw-r--r--  1 plank  staff  0 Feb  3 14:58 txt/out2.txt
-rw-r--r--  1 plank  staff  7 Feb  4 17:14 txt/out3.txt      # It's 7 bytes because of the write().
UNIX> cat txt/out3.txt
ABCDEFGUNIX>                              # We didn't write a newline, so it doesn't print one.
UNIX> bin/w2 w XYZ                        # I type ENTER to get the prompt looking nice,
called write(3, "XYZ", 3). It returned 3  # and I write three bytes
UNIX> ls -l txt/out3.txt
-rw-r--r--  1 plank  staff  7 Feb  4 17:14 txt/out3.txt      # The file is still 7 bytes, because I didn't call with O_TRUNC
UNIX> cat txt/out3.txt
XYZDEFGUNIX>                              # It overwrote the "ABC" with "XYZ".
UNIX> bin/w2 wc ---                       # O_CREAT is specified, but the file exists, so it does nothing. I didn't truncate.
called write(3, "---", 3). It returned 3
UNIX> ls -l txt/out3.txt
-rw-r--r--  1 plank  staff  7 Feb  4 17:15 txt/out3.txt      # Still 7 bytes.
UNIX> cat txt/out3.txt
---DEFGUNIX>                              # And the "XYZ" is replaced with "---".
UNIX> bin/w2 wt abcde                     # Now, I specify O_TRUNC
called write(3, "abcde", 5). It returned 5
UNIX> ls -l txt/out3.txt
-rw-r--r--  1 plank  staff  5 Feb  4 17:16 txt/out3.txt      # And the file is 5 bytes now, rather than 7
UNIX> cat txt/out3.txt
abcdeUNIX>                                # Still no newline.
UNIX> rm txt/out3.txt
UNIX> bin/w2 wt fghij                     # This fails because the file doesn't exist, and I didn't specify O_CREAT.
txt/out3.txt: No such file or directory
UNIX>
```
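write() returns the number of bytes it actually wrote, which can be fewer than you asked for, for example when writing to a pipe or a network connection. A common remedy is a loop like the following sketch (write_all() is a hypothetical name). Like src/w2.c, it only writes; whether the file is truncated or overwritten in place is still decided by the flags given to open():

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Keep calling write() until all count bytes of buf are written.
   Returns count on success, -1 on error. */
ssize_t write_all(int fd, const char *buf, size_t count)
{
  size_t done = 0;
  ssize_t n;

  while (done < count) {
    n = write(fd, buf + done, count - done);
    if (n < 0) {
      if (errno == EINTR) continue;   /* interrupted by a signal: retry */
      return -1;
    }
    done += (size_t) n;
  }
  return (ssize_t) done;
}
```

Repeating the "wc ABCDEFG" and "w XYZ" runs above with write_all() in place of write() leaves the file holding "XYZDEFG", just as in the session.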
Armed with that information, we can write a very simple cat program (one that copies standard input to standard output) with essentially one line of code. File descriptor 0 is standard input, and file descriptor 1 is standard output. This is in src/simpcat.c:
```c
#include <unistd.h>

int main()
{
  char c;

  while (read(0, &c, 1) == 1) write(1, &c, 1);
  return 0;
}
```
You'll note that because I am only calling the system calls read() and write(), I only need unistd.h, and don't need to include stdio.h or stdlib.h.
```
UNIX> bin/simpcat < txt/in1.txt
Jim Plank
Claxton 221
UNIX>
```
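simpcat makes two system calls for every byte it copies, and system calls are expensive. A sketch of the usual improvement reads a block at a time; the 4096-byte buffer size and the name copy_fd() are assumptions for illustration, not part of src/simpcat.c:

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE 4096   /* assumed block size; any positive size works */

/* Copy everything from descriptor in to descriptor out, up to BUFSIZE
   bytes per read() instead of one byte per read(), so far fewer system
   calls are made.  Handles short writes.  Returns 0 on success, -1 on error. */
int copy_fd(int in, int out)
{
  char buf[BUFSIZE];
  ssize_t n, w, done;

  while ((n = read(in, buf, BUFSIZE)) != 0) {
    if (n < 0) {
      if (errno == EINTR) continue;           /* interrupted: retry the read */
      return -1;
    }
    for (done = 0; done < n; done += w) {     /* write until the block is out */
      w = write(out, buf + done, n - done);
      if (w < 0) {
        if (errno == EINTR) { w = 0; continue; }
        return -1;
      }
    }
  }
  return 0;
}
```

Calling copy_fd(0, 1) from main() copies standard input to standard output, as simpcat does.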