UNIX> awk program [ file ]
or
UNIX> awk -f program-file [ file ]
Like sed, awk can work on standard input or on a file. Like the shell, if you start an awk program with
#!/bin/awk -f
then you can execute the program directly from the shell.
Most systems also have nawk, which stands for ``new awk.'' Nawk has many more features than awk and is generally more useful. I am just going to cover awk, but you should check out nawk too in your own time. Nawk has some nice features, such as a random number generator, that awk doesn't have.
pattern { action }
What such a statement does is apply the action to all lines that match the pattern. If there is no pattern, then it applies the action to all lines. If there is no action, then the default action is to copy the line to standard output. Patterns can be regular expressions enclosed in slashes (they can be more than that, but for now, just assume that they are regular expressions).
So, for example, the program awkgrep works just like ``grep Jim''.
UNIX> cat awkgrep
#!/bin/awk -f
/Jim/
UNIX> cat input
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awkgrep input
Jimmy Carter
UNIX> awkgrep < input
Jimmy Carter
UNIX>
Awk breaks up each line into fields, which are basically whitespace-separated words. You can get at word i by specifying $i. The variable NF contains the number of words on the line. The variable $0 is the line itself.
So, to print out the first and last words on each line, you can do:
UNIX> cat input
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awk '{ print $1, $NF }' input
Which belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>
An alternative awkgrep prints out $0 when it finds the pattern:
UNIX> cat awkgrep2
#!/bin/awk -f
/Jim/ { print $0 }
UNIX> awkgrep2 input
Jimmy Carter
UNIX>
Awk has a printf just like C. You don't have to use parentheses when you call it (although you can if you'd like). Unlike print, printf does not add a newline unless you put one in the format string. So, for example, awkrev reverses the words on each line of a file:
UNIX> cat awkrev
#!/bin/awk -f
{ for (i = NF; i > 0; i-- ) printf "%s ", $i
  printf "\n" }
UNIX> awkrev input
belong: doesn't lines these of Which

Clinton Bill
Bush George
Reagan Ronald
Carter Jimmy
Stallone Sylvester
UNIX>
A few things that you'll notice about awkrev: Actions can be multiline. You don't need semicolons to separate lines like in C. However, you can specify multiple commands on a line and separate them with semi-colons as in C. And you can block commands with curly braces as in C. If you want a command to span two lines (this often happens with complex printf statements), you need to end the first line with a backslash.
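As a small sketch of the last two points, here is a made-up helper (summarize is not one of the programs in these notes) that puts two commands on one line with a semicolon and continues a printf onto a second line with a backslash:

```shell
# Hypothetical helper for illustration only.
# "n = NF; last = $NF" shows two commands separated by a semicolon;
# the trailing backslash continues the printf onto the next line.
summarize() {
  awk '{ n = NF; last = $NF
         printf "%d fields, last is %s\n", \
                n, last }' "$@"
}

printf 'Bill Clinton\nGeorge Bush Sr\n' | summarize
```

On that two-line input this should print "2 fields, last is Clinton" and then "3 fields, last is Sr".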
Also, you'll notice that awkrev didn't declare the variable i. Awk variables don't need declarations: awk creates i on first use and treats it as a number because of how it is used.
For example, awkcast prints each word on a line as a number and as a string, and then does the same with a 0 appended:
UNIX> echo "4 Jim" | awkcast
Word 1: as a number: 4, as a string: 4. 0 appended: number: 40, string 40
Word 2: as a number: 0, as a string: Jim. 0 appended: number: 0, string Jim0
UNIX>
Casting a string to an integer gives it its atoi() value.
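The awkcast script itself does not appear here. The following is only a guess at how such a script could be written, wrapped in a shell function for convenience. It assumes the usual awk idioms: adding 0 forces a value to a number, and concatenating the empty string forces it to a string.

```shell
# Hypothetical reconstruction of awkcast; the original script is not shown.
#   $i + 0        forces word i to a number
#   $i ""         forces word i to a string
#   ($i "0") + 0  appends "0" as a string, then converts the result to a number
awkcast() {
  awk '{ for (i = 1; i <= NF; i++)
           printf "Word %d: as a number: %d, as a string: %s. 0 appended: number: %d, string %s\n", \
                  i, $i + 0, $i "", ($i "0") + 0, $i "0"
       }' "$@"
}

echo "4 Jim" | awkcast
```

The parentheses around ($i "0") matter: + binds more tightly than string concatenation, so without them the 0 would be added before the strings are joined.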
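The special patterns BEGIN and END match before the first line is read and after the last line has been processed, respectively. For example, awkwc counts lines and words like wc:
UNIX> cat awkwc
#!/bin/awk -f

BEGIN { nl = 0; nw = 0 }
{ nl++ ; nw += NF }
END { print "Lines:", nl, "words:", nw }
UNIX> awkwc awkwc
Lines: 5 words: 26
UNIX> wc awkwc
      5      26     103 awkwc
UNIX>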
Here are some simple examples. awkpo prints out only the odd numbered lines (note that this is an awkward way to do this, but it works):
UNIX> cat awkpo
#!/bin/awk -f
BEGIN { ln=0 }
{ ln++
  if (ln%2 == 0) next
  print $0
}
UNIX> cat -n input
     1  Which of these lines doesn't belong:
     2
     3  Bill Clinton
     4  George Bush
     5  Ronald Reagan
     6  Jimmy Carter
     7  Sylvester Stallone
UNIX> cat -n input | awkpo
     1  Which of these lines doesn't belong:
     3  Bill Clinton
     5  Ronald Reagan
     7  Sylvester Stallone
UNIX>
awkptR prints out all lines until it reaches a line with a capital R:
UNIX> cat awkptR
#!/bin/awk -f
/R/ { exit }
{ print $0 }
UNIX> awkptR input
Which of these lines doesn't belong:

Bill Clinton
George Bush
UNIX>
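As noted, awkpo is an awkward way to print the odd-numbered lines. A shorter sketch is possible if you assume two things that have not been covered above: the built-in variable NR, which holds the number of the current input line, and the fact that a pattern can be an arithmetic test rather than a regular expression. The function name print_odd is made up for this example.

```shell
# Print only the odd-numbered lines.
# The pattern is a test on NR (the current line number); with no action
# given, awk falls back to its default action of printing the line.
print_odd() {
  awk 'NR % 2 == 1' "$@"
}

printf 'one\ntwo\nthree\nfour\nfive\n' | print_odd
```

This should print one, three and five, each on its own line, with no counter variable and no call to next.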
Take a look at awkgolf. This is typical of quick-and-dirty awk programs that you sometimes write to look at data. This one processes golf scores. Suppose you have some score files, as in the files usopen, masters, kemper and memorial. These files first have the name of the tournament in all caps, and then scores for a bunch of golfers. Suppose you'd like to see all the golfers with scores for each tournament in a readable form. This is what awkgolf does. Let's break it into its four parts.
The first part is the BEGIN line:
BEGIN { nt = 0 ; np = 0 }
This simply initializes two variables: nt is the number of tournaments, and np is the number of players.
The next line looks a little cryptic:
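/^[A-Z]*$/ { this = $0; tourn[nt] = $0 ; nt++; next }
This only works on lines that consist entirely of capital letters. These are the lines that identify tournaments. On these lines, it saves the tournament name in the variable this, stores the name in tourn[nt], increments nt, and then calls next to skip the rest of the program for that line.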
The next part works on all lines that contain the pattern '--'. These are the lines with golfers' scores:
/--/ { golfer = $1
       for (i = 2; $i != "--" ; i++) golfer = golfer" "$i
       if (isgolfer[golfer] != "yes") {
         isgolfer[golfer] = "yes"
         g[np] = golfer
         np++;
       }
       score[golfer" "this] = $(i+1)
     }
The first two lines of this action set the golfer variable to be the golfer's name, by concatenating every word that comes before the '--'. Note that you can do string comparison in awk using the standard comparison operators, unlike in C where you would have to use strcmp().
The next 5 lines use awk's associative arrays: The array isgolfer is checked to see if it contains the string ``yes'' under the golfer's name. If so, we have processed this golfer before. If not, we set the golfer's entry in isgolfer to ``yes,'' set the np-th entry of the array g to be the golfer, and increment np.
Finally, we set the golfer's score for the tournament in the score array. Note that we don't use double-indirection. Instead, we simply concatenate the golfer's name and the tournament's name, and use that as the index for the array.
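The concatenated-key idea can be tried in isolation. This is a minimal sketch using made-up data, not the golf files: each input line is assumed to hold a player, an event and a score, and composite_demo is a hypothetical name.

```shell
# Sketch of a two-part key built by string concatenation.
# Assumed input format (invented for this example): player event score
composite_demo() {
  awk '{ score[$1 " " $2] = $3 }
       END {
         # A key that was stored comes back as its value...
         printf "[%s]\n", score["ann open"]
         # ...and a key that was never stored yields the empty string.
         printf "[%s]\n", score["bob open"]
       }' "$@"
}

printf 'ann open 72\nbob masters 70\n' | composite_demo
```

This should print [72] followed by []. The second line shows that looking up a missing key gives back an empty string rather than an error.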
The last part of the program does the final formatting:
END { printf("%-25s", " ");
      for (j = 0; j < nt; j++) printf("%9s", tourn[j])
      printf("\n")
      for (i = 0; i < np; i++) {
        printf("%-25s", g[i])
        for (j = 0; j < nt; j++) printf("%9s", score[g[i]" "tourn[j]])
        printf("\n")
      }
    }
The first three lines print out 25 spaces, and then the names of the tournaments as held in the tourn array. Then we loop through each golfer, and print the golfer's name, padded to 25 characters, and then his score in each tournament. Note that if the golfer didn't play in the tournament, that entry of the score array will be the null string. This is quite convenient, because we don't have to test for whether the golfer played the tournament -- we can just use awk's default values.
OK, let's try awkgolf:
UNIX> awkgolf kemper            # Note that the output is only sorted because it's
                                # sorted in the input file
                            KEMPER
Justin Leonard                 -10
Greg Norman                     -7
Nick Faldo                      -7
Nick Price                      -7
Loren Roberts                   -6
Jay Haas                        -5
Paul Stankowski                 -5
Lee Janzen                      -4
Phil Mickelson                  -4
Davis Love III                  -3
Tom Lehman                       0
Vijay Singh                      0
Kirk Triplett                    1
Steve Jones                      2
Mark O'Meara                     5
Don Pooley                  missed
Ernie Els                   missed
Fred Couples                missed
Hal Sutton                  missed
Jesper Parnevik             missed
Scott McCarron              missed
Steve Stricker              missed
UNIX> cat masters usopen kemper memorial | awkgolf
                           MASTERS   USOPEN   KEMPER MEMORIAL
Tiger Woods                    281        6                 5
Tommy Tolles                   283        2               -11
Tom Watson                     284       16                 0
Paul Stankowski                285        6       -5       -3
Fred Couples                   286       13   missed
Davis Love III                 286        5       -3       -7
Justin Leonard                 286        9      -10        0
Steve Elkington                287        7
Tom Lehman                     287       -2        0       -3
Ernie Els                      288       -4   missed       -1
Vijay Singh                    288       21        0      -14
Jesper Parnevik                289       11   missed       -4
Lee Westwood                   291        6
Nick Price                     291        6       -7
Lee Janzen                     292       13       -4      -11
Jim Furyk                      293        2               -12
Mark O'Meara                   294        9        5       -2
Scott McCarron                 294        3   missed   missed
Scott Hoch                     298        3               -11
Jumbo Ozaki                    300   missed
Frank Nobilo                   303        9               -10
Bob Tway                    missed        2                -7
Brad Faxon                  missed       17                 2
David Duval                 missed       11                -5
Greg Norman                 missed   missed       -7      -12
Loren Roberts               missed        4       -6
Nick Faldo                  missed       11       -7
Phil Mickelson              missed       10       -4
Steve Jones                 missed       15        2        3
Steve Stricker              missed        9   missed       -1
Jay Haas                                  2       -5       -4
Billy Andrade                             4                -7
Hal Sutton                                6   missed       -1
Kirk Triplett                                      1       -2
Don Pooley                                    missed       -4
UNIX>
Awk can also write output to a file from within the program, by redirecting print with >:
UNIX> awk '{print $0 > "f1"}' < input
UNIX> cat f1
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>
UNIX> shwc awkwc
Lines: 5 words: 26
UNIX> shwc < awkwc
Lines: 5 words: 26
UNIX> shwc awkwc awkwc
usage: shwc [ file ]
UNIX>
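The shwc script is not reproduced here. Judging only from the transcript, it appears to be a shell script that checks its arguments and then hands the work to awk. The sketch below is a guess along those lines, written as a shell function. Sending the usage message to standard error is an assumption.

```shell
# Hypothetical reconstruction of shwc; the original script is not shown.
shwc() {
  # Accept at most one file name, as the usage message in the transcript suggests.
  if [ "$#" -gt 1 ]; then
    echo "usage: shwc [ file ]" >&2
    return 1
  fi
  # With no argument, "$@" expands to nothing and awk reads standard input.
  # Adding 0 makes the counts print as 0 rather than as empty strings on empty input.
  awk '{ nl++; nw += NF }
       END { print "Lines:", nl + 0, "words:", nw + 0 }' "$@"
}

printf 'one two\nthree\n' | shwc
```

Run as shown, this should print "Lines: 2 words: 3". In a standalone script you would use exit 1 in place of return 1.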
Awk is not a good language for string processing. Irritatingly, it doesn't let you get at string elements with array operations. I.e. the following will fail:
UNIX> cat sp.awk
{ s = $1 ; s[0] = 'a' ; print s }
UNIX> awk -f sp.awk input
awk: syntax error near line 1
awk: illegal statement near line 1
UNIX>
Of course, sed is ideal for string processing, so often you can get what you want with a combination of sed and awk.
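Within awk itself, one workaround is to rebuild the string around the character you want to change. This sketch assumes your awk provides the substr() function, which is part of POSIX awk; the name set_first_char is made up for the example. Note that awk string positions start at 1, not 0.

```shell
# Replace the first character of the first word with "a".
# substr(s, 2) is everything in s from position 2 to the end.
# The NF > 0 pattern skips blank lines.
set_first_char() {
  awk 'NF > 0 { s = $1; s = "a" substr(s, 2); print s }' "$@"
}

echo "Jimmy Carter" | set_first_char
```

For the input line "Jimmy Carter" this should print "aimmy".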
Nawk has much more built into it than awk, and accepts awk as a subset, so if you want to do something that awk can't, check out nawk. I'm not a big nawk user, so I won't give you a big sell on nawk, but you should look at the man page.