Different languages use different syntax to specify regular expressions. These notes use the Unix syntax, since a wide variety of Unix tools rely on it and Perl uses it as well. Unfortunately, Python and Emacs use a different syntax, so it does not carry over to those two tools.
UNIX> grep pattern [ files ]
If you don't specify files on the command line, grep reads standard input. It prints every line in the specified files that contains the pattern. If you specify more than one file on the command line, it prefixes each line with the name of the file that it came from. Examples:
UNIX> grep penny md
He will get but a penny a day
UNIX> grep penny < md
He will get but a penny a day
UNIX> grep all md sth
sth:Our shadow's taller than our souls
sth:There walks a lady we all know
sth:Why all that glitters is not gold
sth:These lyrics are all old as mold
UNIX>
The pattern is a "regular expression." While they're not exactly the same as regular expressions in something like CS380, they're pretty close. I'll borrow from the grep and ed man pages to define regular expressions:
The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.
A list of characters enclosed by [ and ] matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit. A range of ASCII characters may be specified by giving the first and last characters, separated by a hyphen, so [0-9] also matches any single digit.
UNIX> cat greptest
Jim Plank
This string contains no numbers
This string does though (1)
-9.00
G0 V0LS
UNIX> grep '[Gg]' greptest
This string contains no numbers
This string does though (1)
G0 V0LS
UNIX> grep '[0-9]' greptest
This string does though (1)
-9.00
G0 V0LS
UNIX> grep '[A-Z]' greptest
Jim Plank
This string contains no numbers
This string does though (1)
G0 V0LS
UNIX> grep '[^A-Za-z ]' greptest
This string does though (1)
-9.00
G0 V0LS
UNIX>
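The bracket expressions above can be checked with a small script. This is only a sketch: it recreates greptest in a scratch directory made with mktemp (the directory and the variable names are assumptions for illustration, not part of the notes), and it uses grep's -c option to count matching lines rather than print them. It also sets LC_ALL=C, on the assumption that you want ranges such as [A-Z] interpreted in plain ASCII order; in other locales a range may cover different characters.

```shell
#!/bin/sh
# Sketch: rebuild the greptest file from the notes in a scratch directory.
LC_ALL=C; export LC_ALL      # assume plain ASCII ordering for ranges like [A-Z]
tmp=$(mktemp -d)
printf '%s\n' \
    'Jim Plank' \
    'This string contains no numbers' \
    'This string does though (1)' \
    '-9.00' \
    'G0 V0LS' > "$tmp/greptest"

# grep -c prints the number of matching lines instead of the lines themselves.
g_lines=$(grep -c '[Gg]' "$tmp/greptest")            # a G or a g anywhere
digit_lines=$(grep -c '[0-9]' "$tmp/greptest")       # at least one digit
upper_lines=$(grep -c '[A-Z]' "$tmp/greptest")       # at least one capital letter
other_lines=$(grep -c '[^A-Za-z ]' "$tmp/greptest")  # something that is not a letter or space

echo "Gg=$g_lines digits=$digit_lines upper=$upper_lines other=$other_lines"
rm -rf "$tmp"
```

The counts should agree with the number of lines printed by the corresponding commands in the transcript above.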
The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line.
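The period . matches any single character, so the following greps for lines with exactly 9 characters:
UNIX> grep '^.........$' greptest
Jim Plank
UNIX>
To grep for lines with at least 9 characters, do: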
UNIX> grep '.........' greptest
Jim Plank
This string contains no numbers
This string does though (1)
UNIX>
To grep for lines that end with two digits, do:
UNIX> grep '[0-9][0-9]$' greptest
-9.00
UNIX>
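As a rough check of how ^, $, and the period combine, the sketch below reruns the three patterns above and adds one more, '^This', which is not in the notes and is included only to show ^ used on its own. The scratch directory and variable names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: rebuild greptest, then try the anchors from the examples above.
tmp=$(mktemp -d)
printf '%s\n' \
    'Jim Plank' \
    'This string contains no numbers' \
    'This string does though (1)' \
    '-9.00' \
    'G0 V0LS' > "$tmp/greptest"

exactly9=$(grep '^.........$' "$tmp/greptest")     # nine characters between ^ and $
at_least9=$(grep -c '.........' "$tmp/greptest")   # nine characters anywhere on the line
two_digits=$(grep '[0-9][0-9]$' "$tmp/greptest")   # the last two characters are digits
starts_this=$(grep -c '^This' "$tmp/greptest")     # hypothetical extra: line begins with "This"

echo "exactly 9: $exactly9"
echo "at least 9: $at_least9 lines"
echo "ends in two digits: $two_digits"
echo "starts with This: $starts_this lines"
rm -rf "$tmp"
```

Without the anchors, a pattern only has to match somewhere in the line, which is why nine periods alone select every line of nine or more characters.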
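The sequence \< constrains the one-character RE immediately following it to match only at the beginning of a "word." The sequence \> constrains the one-character RE immediately preceding it to match only at the end of a "word."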
UNIX> grep all sth
Our shadow's taller than our souls
There walks a lady we all know
Why all that glitters is not gold
These lyrics are all old as mold
UNIX> grep '\<.ll\>' sth
There walks a lady we all know
Why all that glitters is not gold
These lyrics are all old as mold
UNIX> grep 'dow\>' sth
Our shadow's taller than our souls
UNIX> grep '\<.\>' sth
Our shadow's taller than our souls      (matching the s in "shadow's")
There walks a lady we all know
UNIX>
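The word-boundary examples can be reproduced with the sketch below. It rebuilds sth from the four lines shown in the transcript and assumes a grep that supports \< and \> (GNU and BSD grep both do, but they are extensions rather than part of the basic POSIX syntax). The pattern '\<all\>' is an extra illustration, not one of the commands in the notes.

```shell
#!/bin/sh
# Sketch: rebuild the sth file from the transcript in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
    "Our shadow's taller than our souls" \
    'There walks a lady we all know' \
    'Why all that glitters is not gold' \
    'These lyrics are all old as mold' > "$tmp/sth"

anywhere=$(grep -c 'all' "$tmp/sth")     # "all" anywhere, including inside "taller"
whole=$(grep -c '\<all\>' "$tmp/sth")    # "all" only as a complete word
dow_end=$(grep 'dow\>' "$tmp/sth")       # "dow" at the end of a word
one_char=$(grep -c '\<.\>' "$tmp/sth")   # a one-character word

echo "all anywhere: $anywhere, all as a word: $whole"
echo "word ending in dow: $dow_end"
echo "lines with a one-character word: $one_char"
rm -rf "$tmp"
```

The apostrophe in "shadow's" is not a word character, which is why "dow" counts as the end of a word there and the trailing "s" counts as a one-character word.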
A one-character RE followed by * matches zero or more occurrences of that RE. Since zero occurrences count as a match, Z* matches every line:
UNIX> grep 'Z*' md
See Saw, Margery Daw
Johnny will have a new Master
He will get but a penny a day
Because this poem is a disaster!
UNIX>
Here are some more examples. The first greps for lines made up of two words separated by a single space (actually, since * can match zero characters, it also matches a line that is just a single space, or a single word preceded or followed by a space). The second greps for a period followed by any number of zeros, and then the end of the line. The last greps for any line with two zeros somewhere.
UNIX> grep '^[^ ]* [^ ]*$' greptest
Jim Plank
G0 V0LS
UNIX> grep '\.0*$' greptest
-9.00
UNIX> grep '0.*0' greptest
-9.00
G0 V0LS
UNIX>
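The point about * matching zero occurrences is easy to test directly. The sketch below reruns the star examples against greptest instead of md, and pipes in a line holding only a space to confirm the parenthetical remark above. The scratch directory, the variable names, and the one-space input line are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: rebuild greptest, then exercise the * examples.
tmp=$(mktemp -d)
printf '%s\n' \
    'Jim Plank' \
    'This string contains no numbers' \
    'This string does though (1)' \
    '-9.00' \
    'G0 V0LS' > "$tmp/greptest"

every=$(grep -c 'Z*' "$tmp/greptest")                 # Z* can match zero Zs, so every line matches
trailing=$(grep '\.0*$' "$tmp/greptest")              # a period, then only zeros up to end of line
two_zeros=$(grep -c '0.*0' "$tmp/greptest")           # two zeros with anything (or nothing) between
one_space=$(grep -c '^[^ ]* [^ ]*$' "$tmp/greptest")  # exactly one space on the whole line
lone_space=$(printf ' \n' | grep -c '^[^ ]* [^ ]*$')  # a line that is only a space matches too

echo "Z*: $every lines, 0.*0: $two_zeros lines, one space: $one_space lines"
echo "period then zeros: $trailing"
echo "single-space line matched: $lone_space"
rm -rf "$tmp"
```

If you want to require at least one character on each side of the space, one option is to repeat the bracket expression, as in '^[^ ][^ ]* [^ ][^ ]*$'.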
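A one-character RE followed by \{m\} matches exactly m occurrences of that RE, \{m,\} matches at least m occurrences, and \{m,n\} matches between m and n occurrences. Grepping for exactly one zero finds every line that contains a zero:
UNIX> grep '0\{1\}' greptest
-9.00
G0 V0LS
UNIX>
The next command is equivalent to grepping for 000*: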
UNIX> grep '0\{2,\}' greptest
-9.00
UNIX>
Here we grep for 5-letter words containing just lower case letters, then 5-letter words, then words of at least 5 letters:
UNIX> grep '\<[a-z]\{5\}\>' greptest
UNIX> grep '\<[A-Za-z]\{5\}\>' greptest
Jim Plank
UNIX> grep '\<[A-Za-z]\{5,\}\>' greptest
Jim Plank
This string contains no numbers
This string does though (1)
UNIX>
If you want to make sure that grep prints out the file name of the file that the line comes from, include /dev/null on the command line. Then you'll have at least two files on the command line, and grep will be sure to print the file name:
UNIX> grep '\<.ld\>' sth
These lyrics are all old as mold
UNIX> grep '\<.ld\>' sth /dev/null
sth:These lyrics are all old as mold
UNIX>
grep can do far more than this -- you need to read the man page to figure it all out. Also you should read about egrep and fgrep.
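Two of the later points can be checked together: that '0\{2,\}' and '000*' select the same lines, and that adding /dev/null forces the file name prefix. The sketch below is one way to do that, assuming a grep that supports \< and \>. The scratch directory and variable names are made up for illustration, and the greps run inside the directory so that the prefix is just "sth" rather than a full path.

```shell
#!/bin/sh
# Sketch: rebuild both example files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' \
    'Jim Plank' \
    'This string contains no numbers' \
    'This string does though (1)' \
    '-9.00' \
    'G0 V0LS' > "$tmp/greptest"
printf '%s\n' \
    "Our shadow's taller than our souls" \
    'There walks a lady we all know' \
    'Why all that glitters is not gold' \
    'These lyrics are all old as mold' > "$tmp/sth"

# Each grep runs in a subshell so the script's own directory is unchanged.
interval=$(cd "$tmp" && grep '0\{2,\}' greptest)    # two or more zeros in a row
starred=$(cd "$tmp" && grep '000*' greptest)        # two zeros, then zero or more zeros
plain=$(cd "$tmp" && grep '\<.ld\>' sth)            # one file named: no prefix
named=$(cd "$tmp" && grep '\<.ld\>' sth /dev/null)  # two files named: prefix appears

echo "interval: $interval"
echo "starred:  $starred"
echo "plain:    $plain"
echo "named:    $named"
rm -rf "$tmp"
```

Since /dev/null is always empty, it never contributes a matching line; it only changes how many files grep sees on the command line.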