The Scott book gives a nice overview of different scripting languages, and the features they share. These notes go into more detail about one of these languages, Python. Python is an interpreted language that looks a lot like C, and I tend to prefer it to most of the other scripting languages. Why do I think Python is so wonderful? Well:

    if age < 20:
        if age < 10:
            print("pre-teen")
    else:
        print("adult")

There is no doubt as to which block the else is attached to because of the indentation--it is attached to the conditional statement age < 20. Python's insistence on using indentation is something that causes religious wars--you tend to either love it or hate it. I love it but you can and will decide for yourself.
To try out Python, type python at the Unix prompt. You will see something like:

    Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18)
    [GCC 4.0.1 (Apple Inc. build 5493)] on darwin
    Type "copyright", "credits" or "license()" for more information.
    >>>

The starting messages on your machine will be somewhat different. The standard first program in a language is a hello world program, so give it a try. Try typing:

    >>> print("hello world")
    hello world
    >>>

Now try some simple arithmetic and string operations:

    >>> a = 10
    >>> b = 20
    >>> c = a + b
    >>> print(c)
    30
    >>> c = "brad"
    >>> c += " " + "vander zanden"
    >>> print(c)
    brad vander zanden

This simple example shows several neat characteristics of Python, most of which are shared by other scripting languages:
When writing quick and dirty Python programs, it is nice to be able to easily get input from stdin and print output to stdout. The input function reads a single line from stdin and returns it as a string. It takes a prompt as an optional argument. The print function writes a comma delimited sequence of values to the screen. It separates each value with a space and terminates the line with a newline character. For example:

    >>> name = input("enter a name: ")
    enter a name: brad
    >>> print("name =", name)
    name = brad

Prior to version 3.0, the input function was called raw_input and print was a statement that did not require parentheses. If your output looks like

    ('name =', 'brad')

it is because you are using a pre 3.0 version of Python, where the parentheses are interpreted as a tuple and print displays that tuple.
Python programs are placed in files ending with the suffix .py. They can be run as standalone programs using the python interpreter. For example, here is a sample python program:
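
    import sys
    # sys.argv[0] is the script name; sys.argv[1] is the first command line argument
    print(sys.argv[1], "degrees Celsius is", 9/5*int(sys.argv[1])+32, "Fahrenheit")

The import statement loads the module sys and makes its names available through a namespace named sys; sys.argv is a list of the command line arguments. If this program is stored in celciusFahrenheitConverter.py, then here is a sample invocation:

    unix> python celciusFahrenheitConverter.py 0
    0 degrees Celsius is 32.0 Fahrenheit

The result prints as 32.0 rather than 32 because the / operator always produces a floating point value in Python 3.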
Often you will write a set of functions in a file and wish to load them into either the Python interpreter or a Python program. You can load a Python file using the import statement:

    # use #'s for both single line and block comments
    >>> import stack     # do not use the .py extension

Files can contain both executable statements and function definitions. Python treats the loaded file as a module named stack and places all of the functions and variables in a namespace called stack. To access a specific name, say the function pop, you must use the fully qualified name stack.pop(). You may instead import all the names in stack into the top-level namespace using the from keyword:

    >>> from stack import *

Or you could import a comma separated series of names if you did not want to risk having name conflicts:

    >>> from stack import pop, push, top

There are also handy system libraries, such as math:

    >>> import math
    >>> math.sin(math.pi/2)
    1.0

When your program imports a module named stack, the interpreter searches for a file named stack.py in 1) the current directory, and then in 2) the list of directories specified by the environment variable PYTHONPATH, and finally in 3) an installation dependent set of directories. You can query and modify the list of directories being searched by accessing or editing the variable sys.path. This variable is created as a concatenation of your current directory, PYTHONPATH, and the installation dependent set of directories.
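
As a small sketch of what editing sys.path looks like, the helper below appends a directory to the search path only if it is not already present. The function name add_search_dir and the directory /tmp/my_modules are made up for illustration; substitute whatever directory holds your own modules.

```python
import sys

def add_search_dir(directory):
    """Append directory to the module search path if it is not already there."""
    if directory not in sys.path:
        sys.path.append(directory)
    return sys.path

# "/tmp/my_modules" is a hypothetical directory used only for illustration
add_search_dir("/tmp/my_modules")
add_search_dir("/tmp/my_modules")   # the second call changes nothing
print(sys.path[-1])
```

Appending puts the new directory at the end of the search order, so modules with the same name that live earlier on the path are still found first.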
Often you will find an error in your file when you run it, and will want to fix the error and reload the file. You must use the reload function to do this, not the import command. A second import of a module that is already loaded acts as a no-op and you will be left wondering what happened to your changes. In Python 3.4 and later, reload lives in the importlib module. So if I discover an error in my stack's pop command, I can edit the pop function and then reload the stack.py file as follows:

    import importlib
    importlib.reload(stack)

In pre 3.0 versions of Python, reload was a built-in function and you could type reload(stack) directly. That no longer works in Python 3.x. Versions 3.0 through 3.3 provided the function as imp.reload instead; the imp module was deprecated in version 3.4 and removed in version 3.12.
Here is a brief list of Python's types:
For arithmetic, there are a couple differences from C syntax:

    >>> 2**3
    8
One feature that sets most scripting languages apart from compiled languages like C, C++, and Java is that strings are a built-in type, rather than an array of characters or a class. Python has many built-in operators for manipulating strings. Python strings are represented using either single, double, or triple quotes. Thus both 'brad' and "brad" represent the string brad. Double-quoted strings are useful when you want to get a quote mark into a string, such as "brad's apple". Triple quoted strings are useful when you want a string to span multiple lines, as in:

    >>> a = '''brad went
    ... to the store'''
    >>> a
    'brad went\nto the store'

Strings have a number of useful methods that are described in the following table:
Operation | Meaning |
---|---|
s.capitalize() | capitalizes the first letter of s |
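string.capwords(s) | capitalizes the first letter of each word in s. This is a function in the string module rather than a string method, so you must import string first |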
s.lower() | converts s to lowercase letters |
s.upper() | converts s to uppercase letters |
s.count(substring) | counts the number of occurrences of substring in s |
s.find(substring) | returns the index of the first occurrence of substring in s, or -1 if it is not found |
s.split() | returns a list of words in s, using whitespace (space, tab, newline) as a delimiter. Ignores leading and trailing whitespace and allows unlimited whitespace between words. |
s.join(list) | joins a list of words into a single string with s as a separator. Very useful for creating comma-separated lists for a spreadsheet program. |
s.strip() | strips leading and trailing whitespace from s |
s.replace(old,new) | replaces all occurrences of old with new in s |
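
Here is a short sketch that strings several of these methods together. The input line is made up for illustration; the point is that split, join, and capitalize each return a new value and leave the original string alone.

```python
# A hypothetical line of input, as it might be read from a file
line = "  brad  vander   zanden \n"

words = line.split()                  # whitespace-delimited words
csv_row = ",".join(words)             # comma-separated, ready for a spreadsheet
title = " ".join(w.capitalize() for w in words)

print(words)
print(csv_row)
print(title)
print(csv_row.find("zanden"))         # index of the first occurrence, or -1
print(csv_row.find("smith"))
```

Because strings are immutable, line itself still contains its leading spaces and trailing newline after all of these calls.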
If you need more sophisticated manipulation of a string, you can use Python's regular expression mechanism.
You can interpolate variables into a string or format a string using a string's format method. Older code uses the % operator to format strings with C-style format specifiers. The % operator still works in Python 3, but the format method is the more flexible of the two and is the one used in these notes, even though its format syntax is somewhat idiosyncratic. Here are a few concrete examples:

    >>> first = 'brad'
    >>> last = 'vander zanden'
    >>> "{0} {1}".format(first,last)
    'brad vander zanden'
    >>> "{0:>10}".format(first)
    '      brad'
    >>> "{0:^10}".format(first)
    '   brad   '
    >>> 'The price is {0:6.2f}'.format(8.2586)
    'The price is   8.26'
    >>> "{0:06.2f}".format(8.2586)
    '008.26'
    >>> "{0:*^10.2f}".format(8.2586)
    '***8.26***'

The examples illustrate how you positionally specify arguments with a set of curly braces {}. If you omit further formatting information, then Python assumes that you have applied the str function to the argument, which converts any variable to its string representation. You can specify further formatting information by following the argument index with a colon.
A simplified syntax for the formatting string is:

    [[fill]align][0][width][.precision][type]

There are a few additional formatting options allowed, and you can look these up in the Python reference manual if you are interested. Notice that if you specify a fill character, then you must also specify an alignment character. Here are the meanings of each of these options:
Option | Meaning |
---|---|
fill | The padding character used if the value is smaller than the field width |
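align | The alignment of the value within the field if the value is smaller than the field width. The allowable alignment options are: '<' (left-aligned), '>' (right-aligned), '^' (centered), and '=' (padding placed between the sign and the digits, for numeric values only) |
0 | Pads numeric fields with 0's between the sign and the digits. Equivalent to using '=' for the alignment operator and 0 for the fill character. |
width | The minimum width of the field |
precision | For floating point values, the number of digits displayed after the decimal point. For strings, the maximum number of characters displayed |
type | The type of the field to be displayed. By default it is a string. The most common types of values are: d (decimal integer), f (fixed-point number), e (exponent notation), s (string), and % (percentage). There are other types for binary, octal, hexadecimal, and more complicated types of floating point presentations. Feel free to look them up in the Python reference manual. |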
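
To see how the options combine, here is a small sketch that lines up a two-column price list. The rows and the helper name format_row are invented for illustration; the format strings use only the fill, align, width, precision, and type options from the table.

```python
# Hypothetical (item, price) rows used only for illustration
rows = [("apple", 0.5), ("watermelon", 3.25), ("fig", 12.0)]

def format_row(name, price):
    # {0:<12}   name left-aligned in a field at least 12 wide
    # {1:>8.2f} price right-aligned, 8 wide, 2 digits after the point
    return "{0:<12}{1:>8.2f}".format(name, price)

for name, price in rows:
    print(format_row(name, price))

banner = "{0:*^11}".format("total")   # centered, padded with '*'
print(banner)
```

Every row comes out exactly 20 characters wide, so the prices line up in a column no matter how long the item names are (as long as they fit in 12 characters).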
Lists are probably Python's most useful compound data structure. Python does not provide a general array data structure (it has specialized arrays for integers) but a list can be subscripted just like an array so I always use lists to simulate arrays. Remember that while using lists to simulate arrays may not be the most efficient thing in the world, I am interested in rapid prototyping and program development, not efficiency.
Lists are written as a list of comma-separated elements enclosed in square brackets:

    >>> a = [10, 20, 'brad', 'knoxville']
    >>> a
    [10, 20, 'brad', 'knoxville']
    >>> a[1]
    20
    >>> a[-1]     # a negative subscript starts from the end of the list
    'knoxville'

Note that lists can contain heterogeneous types. A really neat thing about lists is that I can reference a sublist using a slicing mechanism. A slice is written as two indices separated by a colon:

    >>> a[1:3]
    [20, 'brad']

If you omit the first index it defaults to 0. If you omit the second index, it defaults to the length of the list:

    >>> a[:2]
    [10, 20]
    >>> a[2:]
    ['brad', 'knoxville']

This slice notation makes it easier to pass portions of arrays to functions. If I am writing the partition routine for quicksort and I want to pass the portion of the array from i to j (including i but not including j), I can write:

    partition(a[i:j])

When the array arrives in partition, I can start my subscripts at 0. The length of the array can be obtained using the len function. Hence I do not need to pass in the subscripts i or j. The ability to handle a subarray as a normal array with subscripts beginning at 0 makes the partition code easier to write and easier to read. Keep in mind that a slice is a new list, so any changes partition makes to its argument do not affect a; partition must return its result.
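
Here is one way such a routine might look. It is only a sketch under a simplifying assumption: instead of rearranging elements in place as a textbook quicksort partition does, it returns new lists, which fits the fact that a slice is a copy. The pivot choice (the first element) is likewise just for illustration.

```python
def partition(values):
    """Split values around its first element; return (smaller, pivot, larger)."""
    pivot = values[0]                 # subscripts start at 0 inside the function
    smaller = [v for v in values[1:] if v < pivot]
    larger = [v for v in values[1:] if v >= pivot]
    return smaller, pivot, larger

a = [50, 30, 10, 40, 20, 60]
i, j = 1, 5
smaller, pivot, larger = partition(a[i:j])   # passes [30, 10, 40, 20]
print(smaller, pivot, larger)
print(a)                                     # a itself is unchanged
```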
Here is a table of commonly used operations on lists. Operations that change a list modify the original list in place, unless otherwise noted. Operations such as max, min, len, in, and index only inspect the list.
Operation | Description |
---|---|
list[3] = 'brad' | changes the 4th element of the list to 'brad' |
del list[3] | deletes the 4th element of the list |
max(list) | returns the maximum element in list |
min(list) | returns the minimum element in list |
len(list) | returns the length of the list |
list.append(3) | appends 3 to list |
list.insert(3, 'brad') | inserts 'brad' at the 4th element in the list, and pushes all remaining elements, including the previous 4th element, one element to the right. |
list + [8, 10] | returns a new list with 8 and 10 appended to the end. The original list remains unchanged. |
list.extend([8,10]) | appends the list [8,10] to the end of list |
3 in list | returns True/False depending on whether or not 3 is in the list |
list.index(3) | returns the index of the first occurrence of 3. Throws a ValueError exception if 3 is not in the list |
list.pop() | returns and removes the last element from list |
list.pop(3) | returns and removes the 4th element from list |
list.remove(35) | searches for and removes the first occurrence of 35 from list. Throws a ValueError exception if 35 is not in the list. |
list.reverse() | reverses the elements of list |
list.sort() | sorts a list in ascending order. The list sorting section has a description of more sophisticated sorting techniques, but to understand it you must read the intervening sections that follow. |
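
The following sketch exercises several of the operations in the table by treating a list as a stack. The variable names are my own; the comments say what each call does to the list.

```python
stack = []                 # an empty list used as a stack
stack.append(10)           # push one value
stack.append(20)
stack.extend([30, 40])     # push several values at once

top = stack[-1]            # peek at the last element without removing it
popped = stack.pop()       # remove and return the last element
stack.insert(0, 5)         # insert 5 at the front, shifting the rest right
stack.remove(20)           # remove the first occurrence of 20

print(stack, top, popped)
print(30 in stack, stack.index(30), len(stack))
```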
Tuples are similar to lists, but unlike lists, they are immutable. Tuples are declared using parentheses, (), rather than square brackets. For example:

    x = ('brad', 'm', '2/3/64')

You can also "unpack" the elements of a tuple:

    name, sex, birthdate = x

One of the most useful things you can do with tuples is use them to return multiple values from a function and then unpack the values by assigning the function result to multiple variables. For example, a simple function to return the minimum and maximum values of a list could be written as:

    def minmax(values):
        return (min(values), max(values))

    >>> lo, hi = minmax([10, 30, 40, 25, 50, 5])   # lo and hi avoid hiding the built-in min and max
    >>> lo
    5
    >>> hi
    50
Dictionaries are hash tables that store key/value pairs. It is usually best to limit keys to primitive types, such as integers or strings. Dictionaries are written as a list of comma-separated pairs enclosed in curly brackets:

    >>> phone = {'brad': '974-1875', 'nancy' : '818-5868', 'george' : '385-8685'}
    >>> phone['brad']
    '974-1875'
    >>> phone['yifan'] = '396-6858'
    >>> 'yifan' in phone
    True
    >>> 'sue' in phone
    False
    >>> del phone['george']
    >>> 'george' in phone
    False
    >>> age = {}
    >>> age['brad'] = 30
    >>> age['smiley'] = 4
    >>> age['brad']
    30

Notice that keys may be used like array indices, and hence dictionaries are often called "associative arrays".
You can iterate through the keys of a dictionary, just like a list:
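
    >>> for name in age:
    ...     print(name)
    ...
    brad
    smiley

Since version 3.7, a dictionary is iterated in the order in which its keys were inserted; earlier versions may present the keys in an arbitrary order. Here is a table of useful dictionary operators and methods: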
Operation | Meaning |
---|---|
dict['brad'] | return the value associated with the key 'brad' |
dict['brad'] = 30 | associate the value 30 with the key 'brad' |
del dict['brad'] | delete the key 'brad' from dict |
dict.clear() | remove all elements from dict |
len(dict) | return the number of elements in dict |
'brad' in dict | returns True/False depending on whether or not the key 'brad' is in dict |
dict.keys() | return a "view" of the keys in dict |
dict.values() | return a "view" of the values in dict |
dict.items() | return a "view" of (key,value) pairs in dict. The key/value pairs are tuples |
Prior to version 3.0, the keys, values, and items methods returned lists. You can treat a view just like a list in for statements. For example:

    for value in age.values():   # a separate loop variable keeps the age dictionary intact
        print(value)

However, if you want a true list, you need to call the list function on the view:
>>> ages = list(age.values())
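
A typical use of a dictionary is counting things. The sketch below tallies how often each word appears in a string; the function name count_words and the sample sentence are made up for illustration.

```python
def count_words(text):
    """Return a dictionary mapping each word in text to its frequency."""
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

counts = count_words("the cat and the hat and the bat")
for word, n in sorted(counts.items()):   # items() yields (key, value) tuples
    print(word, n)
```

Sorting the items view produces a new list of tuples ordered by key, which gives a predictable order for printing.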
This portion of the notes provides a quick introduction to classes, showing how they can be used like C's version of structs. Their more object-oriented features are covered later in the notes.
Python does not have the C equivalent of struct, but it does support classes and a class can be used to mimic a struct. For example, suppose we want to define a node data structure for a depth-first search. In C the declaration might look as follows:

    struct node {
        int visited;
        struct neighbor_list *neighbors;
    };

In Python, one would define a node class. One could just write:

    class node:
        visited = False
        neighbors = []

    >>> node.visited
    False

Note that unlike C or C++, a class is actually an object and you can query its fields.
To create an instance of a class, you call the class as though it were a function:

    x = node()
    x.visited = True
    x.neighbors = myNeighbors
    x.name = 'brad'

The above code illustrates how one can dynamically add new instance variables, such as name, to an object. Unlike a static class language, such as C++ or Java, instances of a class are allowed to add their own instance variables, and hence two instances of a class may contain different sets of instance variables.
You are also allowed to delete instance variables from an object, but the effect may not be what you expect. While the instance variable will be deleted from the immediate object, Python will attempt to traverse the inheritance hierarchy to find a value for the instance variable if the programmer requests it. For example:

    >>> x.visited
    True
    >>> del x.visited
    >>> x.visited
    False

visited was successfully deleted as a local instance variable of x, but when I now request visited from x, Python traverses the inheritance hierarchy, finds the visited variable in node, and returns its value, which is False.
The reason that Python has this flexibility to add and delete instance variables is that in each object, it stores all instance variables in a dictionary variable called __dict__ (unlike most object-oriented languages, Python does not have protection mechanisms for instance variables, so by convention Python's designers put __ before and after variables that have special meaning, so that you do not accidentally clobber them). Hence an assignment of a value to an instance variable simply updates the dictionary entry for that instance variable and a deletion of an instance variable simply deletes the dictionary entry for that instance variable. Each object also has a variable called __class__ that contains a pointer to the object's class object, which is what allows Python to traverse the inheritance hierarchy, looking for an instance variable. Although it is not generally recommended, for rapid prototyping you can change an object's class by changing its __class__ variable.
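
You can watch this mechanism at work by printing __dict__ as instance variables come and go. This is only a sketch; the class name Node is my own stand-in for the node class used above.

```python
class Node:
    visited = False            # class variable, acts as a shared default

x = Node()
before = dict(x.__dict__)      # no instance variables yet
x.visited = True               # creates an entry in x's own dictionary
during = dict(x.__dict__)
del x.visited                  # removes the entry from x's dictionary
after = dict(x.__dict__)

print(before, during, after)
print(x.visited)               # lookup now falls back to the class variable
print(x.__class__ is Node)
```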
You can write a constructor using the __init__ method. The init method is automatically called when an instance of a class is allocated. The __init__ method must be declared inside the class and must take at least one argument, which is a reference to the instance being allocated. Typically this argument is called self (for those of you who know C++, self is equivalent to the this pointer). For example, one could write:

    class node:
        def __init__(self, myNeighbors):
            self.visited = False
            self.neighbors = myNeighbors

It is important to prefix your instance variables with self. If you fail to do so, then the method will think that you are creating a local variable.
Since __init__ is automatically invoked when an instance of node is created, one could now allocate an instance of node and initialize it using the statement:

    x = node(neighborsList)

Unlike C++ or Java, you may not overload the constructor. However, you may use keyword arguments. For example:

    class node:
        def __init__(self, name, visited=False, neighbors=None):
            self.name = name
            self.visited = visited
            # a default of [] would be shared by every node created without neighbors
            self.neighbors = neighbors if neighbors is not None else []

    >>> x = node('brad', neighbors=neighborsList)

Keyword parameters give you the flexibility to require some parameters, such as name, and make other parameters optional. All required, unnamed parameters must appear first in the parameter list. These parameters are often called positional parameters. When you invoke a function, the unnamed arguments that you provide are matched serially with the positional parameters, and if there are more unnamed arguments than positional parameters, the remaining unnamed arguments will be matched serially with the keyword parameters. Keyword arguments may be presented in any order, and do not have to match the order in which they appear in the function's parameter list.
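
The matching rules are easier to see with a function that simply reports how its arguments were bound. The function describe and its parameter names are invented for this sketch.

```python
def describe(name, visited=False, label="none"):
    """Return a tuple showing how the arguments were bound."""
    return (name, visited, label)

only_required = describe("brad")                          # both defaults used
extra_positional = describe("brad", True)                 # True is matched with visited
any_order = describe("brad", label="x", visited=True)     # keywords in any order

print(only_required)
print(extra_positional)
print(any_order)
```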
The syntax of the control structures is very C-like, with a few new, simplifying features added in. Instead of using {}'s, you use colons to start blocks. For example:

    if age < 13:
        print("pre-teen")
Conditionals are much like C conditionals, except you can use the elif construct to simplify nested conditions:

    if age < 13:
        print("pre-teen")
    elif age < 18:
        print("adolescent")
    else:
        print("adult")

The above code could also have been written more verbosely as:

    if age < 13:
        print("pre-teen")
    else:
        if age < 18:
            print("adolescent")
        else:
            print("adult")

but the nested if/else statements make the logic less obvious.

Note that no parentheses are needed for the conditional expressions since the colons make it clear where the expressions end.
In C, one is forever using && and || operators to construct boolean expressions. Often these operators obfuscate relatively simple expressions. For example, if I want to test if a grade is greater than 90 and at most 100, I have to write:

    if ((grade > 90) && (grade <= 100))

Python allows the programmer to chain together equality and inequality comparisons. Consequently, I can write the above expression more elegantly and concisely in Python as:

    if 90 < grade <= 100:

I would argue that the above conditional is far easier to read and understand than the corresponding C conditional.
Python can also make boolean or conditions easier to write and understand. Suppose I want to determine if the variable x is equal to one of the integers 10, 20, or 30. In C I would have to write:

    if ((x == 10) || (x == 20) || (x == 30))

Python provides the convenient operator in that can be used with lists or tuples. Hence, I can write the equivalent conditional in Python as:

    if x in (10, 20, 30):
        print("yes")
    else:
        print("no")

Again, I would argue that the intent of the latter conditional is more evident than the intent of the previous conditional.
Functions are introduced with the def keyword:

    def celciusConverter(celcius):
        return 9.0/5 * celcius + 32

Variables that are assigned inside a function are local variables by default. A function can read a global variable without any declaration, but if it assigns to a global variable it must declare the variable with the global keyword:

    x = 10
    def addX(y):
        global x    # optional here because x is only read; required if addX assigned to x
        return x + y

You can use tuples to return multiple values from a function. For example, suppose you have a top function for a stack that should return a value and a flag indicating whether or not the operation succeeded. In Python you can write the function as:

    def top(stack):
        if len(stack) == 0:
            return (0, False)
        else:
            return (stack[-1], True)

Note how much simpler and more readable this code is than the corresponding C or C++ code. With either a C or C++ function, you have to do something kludgy, like pass a pointer to an argument, so that you can then modify it. In C you might write your function prototype as:

    int top(Stack *s, int *successFlag)

and in C++ you might write the function prototype as:

    int top(Stack &s, int &successFlag)

You can access the elements of a tuple with indices, just as in a list. You can also unpack a tuple by putting multiple variables to the left of an assignment statement:
>>> value, success = top(myStack)
The principal loops in Python are the while and for loops. The break statement provides a structured way to exit a loop and the continue statement provides a structured way to skip the rest of the current iteration of a loop.
While loops are just like C:

    while celcius < 100:
        print("fahrenheit = ", 9.0/5*celcius + 32)
        celcius += 1

While Python supports C's "+=" notation, it does not support C's "++" notation.

For loops are quite different from those in C. In C they are typically used as counting loops. In Python they are used to iterate over collections of values, such as lists. You can obtain the functionality of a counting loop in Python by iterating over the sequence produced by the range function. The range function takes a begin and end value, and produces the values starting with the begin value and ending 1 short of the end value. For example, range(1,6) produces the values 1, 2, 3, 4, 5. In Python 3 it returns a lazy range object rather than a list, so call list(range(1,6)) if you need the actual list [1,2,3,4,5]. A simple loop for summing the numbers from 1 to n would be:

    total = 0
    for i in range(1,n+1):   # remember that range ends 1 short of the end value
        total += i
One nice thing about Python's syntax for a for loop is that it looks a great deal like the syntax used by textbooks to present pseudo-code for an algorithm. For example, a depth-first search of a graph starts from a node and visits each of its neighbors in a depth-first fashion. If we assume that each node contains a visited field and a neighbors field, which is a list of neighbors, then a depth-first search procedure can be written succinctly in Python as follows:

    def dfs(node):
        node.visited = True
        for neighbor in node.neighbors:
            if not neighbor.visited:
                dfs(neighbor)

There is no way that you could write such a succinct and elegant procedure for depth-first search in C.
Loop statements may have an else clause attached to them that is executed if either a list is exhausted in the case of a for loop or a condition becomes false in the case of a while loop. The else clause is not executed if a break causes a premature exit from the loop. Combining a break statement with an else clause allows one to avoid the inelegant code often associated with setting and testing the value of a flag variable. For example, suppose I want to write a naive procedure that determines whether a variable is a prime number or not. If the number is prime, the procedure should print that the number is prime. If the number is not prime, the procedure should print the number's factors. In C, the procedure might look as follows:

    void is_prime(int n) {
        int i, prime_flag;
        prime_flag = 1;
        for (i = 2; (i < n) && (prime_flag == 1); i++) {
            if ((n % i) == 0) {
                printf("%d equals %d * %d\n", n, i, n / i);
                prime_flag = 0;
            }
        }
        if (prime_flag == 1)
            printf("%d is a prime number\n", n);
    }

In Python the function could be written more succinctly and elegantly as:

    def is_prime(n):
        for i in range(2, n):   # range yields the numbers 2, 3, ..., n-1
            if n % i == 0:
                print("{0} equals {1} * {2}".format(n, i, n//i))
                break
        else:
            print(n, 'is a prime number')
You can sort a list using either the built-in function sorted or a list object's sort method. sorted builds and returns a new list while sort modifies the original list. By default both functions sort a list in ascending order. For example:

    >>> list = [30, 40, 49, 15, 40, 80]
    >>> sortedList = sorted(list)
    >>> sortedList
    [15, 30, 40, 40, 49, 80]
    >>> list
    [30, 40, 49, 15, 40, 80]     # note that the original list does not change
    >>> list.sort()
    >>> list
    [15, 30, 40, 40, 49, 80]     # now the original list has changed

You can cause a list to be sorted in descending order by using the reverse keyword argument:

    >>> list.sort(reverse=True)
    >>> list
    [80, 49, 40, 40, 30, 15]

If the types to be compared are compound types, such as a user-defined class or a tuple (recall that tuples are immutable lists), you have a couple of options for specifying a comparison function. One option is to pass a function through the key keyword argument; it is called on each element to extract the value to compare. For example, to sort a list of student objects by their name attribute you could write:

    >>> from operator import attrgetter
    >>> studentList.sort(key=attrgetter('name'))

and if you had tuples with two elements and wanted to use the second element as the key, you could write:

    >>> from operator import itemgetter
    >>> tuplesList = [('smiley', 70), ('joey', 55), ('sarah', 70), ('mary', 60)]
    >>> tuplesList.sort(key=itemgetter(1))
    >>> tuplesList
    [('joey', 55), ('mary', 60), ('smiley', 70), ('sarah', 70)]

attrgetter and itemgetter are functions defined in the operator module that retrieve the specified attribute value from a class or the value at the specified index location from a tuple. More complicated functions can be used as well, but you should refer to the Python reference manual for details.
This is a good time to point out that both sort and sorted provide stable sorts, which means that if two values are equal, they will appear in the same order in the sorted list as they appeared in the original list. Hence since the 'smiley' tuple appeared before the 'sarah' tuple in the original list, it also appears before the 'sarah' tuple in the sorted list.
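Alternatively, a class can define the __lt__ (less than) method, which sort and sorted use to compare two instances. The following class orders students by grade and breaks ties by name:

    class Student:
        def __init__(self, name, grade):
            self.name = name
            self.grade = grade
        def __lt__(self, otherStudent):
            # a trailing \ allows you to continue a line
            return self.grade < otherStudent.grade \
                or (self.grade == otherStudent.grade and self.name < otherStudent.name)

If you need to specify a primary and secondary key for some other type of compound type, such as a tuple, one common approach is to pass a key function that returns a tuple of the form (primary, secondary), since tuples are compared element by element.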
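
For tuples, one approach to primary and secondary keys that I find workable is to have the key function return a tuple, because Python compares tuples element by element. The sketch below uses made-up (name, score) data and shows both an ascending sort and a mixed sort (descending score, ascending name).

```python
from operator import itemgetter

# (name, score) tuples; hypothetical data for illustration
scores = [('smiley', 70), ('joey', 55), ('sarah', 70), ('mary', 60)]

# Primary key: score (index 1); secondary key: name (index 0)
by_score_then_name = sorted(scores, key=itemgetter(1, 0))
print(by_score_then_name)

# Descending score, ascending name: negate the numeric key
by_high_score = sorted(scores, key=lambda t: (-t[1], t[0]))
print(by_high_score)
```

Negating the key only works for numeric values; for other types you can instead sort twice, secondary key first, and rely on the sort being stable.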
Classes were introduced earlier, but we treated them primarily as structs. As in C++ or Java, classes in Python may contain methods. These methods may either be defined when the class is created, or defined elsewhere and then assigned to the class object as a function pointer. Here are the two different ways to create methods:

    class node:
        name = 'brad'
        visited = False
        def printName(self):
            print(self.name)

    def printVisited(self):
        print("visited =", self.visited)

    >>> node.printer = printVisited
    >>> x = node()
    >>> x.printer()
    visited = False

Note that as with __init__, the first argument must be a pointer to the instance object. Also note that Python treats class and instance objects differently, so the function printVisited is considered a function object in node but a method object in x. If I tried to type:

    >>> node.printer()

I would get an error message. However,

    >>> node.printer(x)

works just fine.
You can define one or more superclasses for a new class by including the superclasses in parentheses:

    class nameNode(node):            # nameNode is subclass of node
        pass                         # pass is a no-op for when I don't want to do anything

    class nameNode(node, student):   # nameNode is subclass of both node and student
        pass

Name conflicts between superclasses are handled by traversing the superclass list in sequential order and returning the first name found. Hence if visited is defined in both node and student, then node's visited will be returned if we ask for nameNode's visited, and visited is not stored locally in nameNode.
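
Here is a small sketch of that lookup rule. The classes and their variables are invented for illustration; the __mro__ attribute of a class shows the order in which Python searches the classes.

```python
class Node:
    visited = False
    kind = 'node'

class Student:
    visited = 'never'
    grade = 90

class NameNode(Node, Student):   # Node is listed first
    pass

n = NameNode()
print(n.visited)   # found in Node, which comes first in the superclass list
print(n.grade)     # only Student defines grade
search_order = [c.__name__ for c in NameNode.__mro__]
print(search_order)
```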
Ordinarily you will get a pointer reference if you try to print an instance of a class:

    >>> x = node()
    >>> print(x)
    <__main__.node object at 0x1255750>

If you would like a pretty-printed representation of the instance, you can override the __str__ method and have it return a string:

    class node:
        name = 'brad'
        visited = False
        neighbors = []
        def __str__(self):
            stringRepresentation = "name: " + self.name + "\n"
            stringRepresentation += "visited: " + str(self.visited) + "\n"
            stringRepresentation += "neighbors:\n"
            for neighbor in self.neighbors:
                stringRepresentation += "\t" + str(neighbor) + "\n"
            return stringRepresentation

    >>> x = node()
    >>> x.name = 'smiley'
    >>> print(x)
    name: smiley
    visited: False
    neighbors:
You have already seen the simple functions input for reading a single value from stdin and print for writing values to stdout. Python defines three file objects named sys.stdin, sys.stdout, and sys.stderr that allow you to access the conventional standard i/o streams. In order to manipulate these file objects, you need to read on and learn about file objects.
Python's file objects behave much like the streams in C's stdio package. The following table shows the most commonly used commands for file objects:
Operation | Meaning |
---|---|
open("filename") | Open a file for reading and return a pointer to a file object |
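open("filename", "r|w|a") | Open a file with the given access mode and return a pointer to a file object. The meanings of the access modes are: "r" opens the file for reading, "w" opens it for writing (discarding any existing contents), and "a" opens it for appending to the end of the file. |
f.readline() | read a single line of input. The line will be terminated with a '\n' character, except possibly the last line of the file. Returns the empty string '' when end of file is reached. |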
f.readlines() | return all of the file as a list. All lines will be terminated with a '\n' character. |
f.write(s) | write the string s to the file. You must include the '\n' character if you want a newline. |
f.writelines(list) | write a list of strings to the file. You must terminate each string with a '\n' character if you want newlines. |
f.close() | close the file |
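
The sketch below runs through the table's operations on a scratch file. The file name notes.txt is arbitrary, and I use the tempfile module to obtain a directory that is safe to write in.

```python
import os
import tempfile

# Write three lines to a temporary file (the directory is chosen by tempfile)
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
f = open(path, "w")
f.writelines(["first\n", "second\n"])   # newlines must be supplied explicitly
f.write("third\n")
f.close()

# Read the lines back, stripping the trailing newline from each one
f = open(path)
lines = [line.strip() for line in f.readlines()]
at_eof = f.readline()                   # nothing is left to read
f.close()
print(lines, repr(at_eof))
```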
You can strip newline characters using a string's strip method, which removes all leading and trailing whitespace. Here is an example of reading a line of input and stripping the newline character:

    >>> my_file = open('/azure/homes/bvz/temp', 'r')
    >>> line = my_file.readline()
    >>> line
    'Dear Dr. Tripathi, \n'
    >>> line = line.strip()
    >>> line
    'Dear Dr. Tripathi,'
You now should know how to read input from stdin:

    >>> import sys
    >>> a = sys.stdin.readline()

reads the next line of standard input into a.

Once a string has been read into a variable, fields may be extracted from the string using either Python's string methods or its regular expression package. If the fields are separated by whitespace (spaces, tabs, and newline characters), then one can use the string's split method to extract fields. For example:

    >>> line = ' Dear Dr. Tripathi,\n'
    >>> line.split()
    ['Dear', 'Dr.', 'Tripathi,']

Notice that split ignores leading and trailing whitespace and automatically strips newline characters.
You can also easily replace one substring with another substring using the replace method:

    >>> line = 'brad is the professor and brad is also the exam grader'
    >>> line.replace("brad","bvz")
    'bvz is the professor and bvz is also the exam grader'

If you want to do more sophisticated splitting, such as splitting comma separated fields, or you want to type check strings, or you want to extract information from a file, then you will need to use Python's regular expressions, which are much like Perl regular expressions. Regular expressions allow you to construct patterns that you want matched against a string. Python's reference manual on regular expressions provides an excellent description of its regular expression syntax. You should take a moment to read it before continuing here.
To use regular expressions, first import the regular expression module:

    >>> import re

Next you will need to create the pattern which you wish to search for in your string. For example, suppose I want to type check a string to determine whether it is a date of the form mm/dd/yy, where the month and day may be one or two digits. You could write either:

    success = re.match(r'\d{1,2}/\d{1,2}/\d{2}', dateString)

or

    success = re.search(r'\d{1,2}/\d{1,2}/\d{2}', dateString)

The patterns are written as raw strings (prefixed with r) so that the backslashes reach the regular expression module unchanged. The only difference between the two commands is that match requires the string to match the pattern starting at the beginning of the string, whereas search will search through the entire string to find a substring that matches the pattern. Both commands will return a match object if they succeed and None if they fail. When used in a boolean expression, None evaluates to false.
You will often want to extract data from the substring that matches the pattern, and you can do so by placing parentheses around the parts of the pattern where you want to capture data. You can then access the data by calling the match object's group method and passing it the parameters 1 through n, where 1 through n represent each of the n parenthesized groups. The index 0 will return the entire substring that matches the pattern. For example:

    >>> success = re.search(r'(\d{1,2})/(\d{1,2})/(\d{2})', "2/28/64")
    >>> success.group(0)
    '2/28/64'
    >>> success.group(1)
    '2'
    >>> success.group(2)
    '28'
    >>> success.group(3)
    '64'

Notice that I did not want to capture the slashes (/) and so I did not place parentheses around them.
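
Putting the pieces together, here is a sketch of a function that type checks a date string and extracts its fields. The names DATE_PATTERN and parse_date are my own, and I have added a $ to the pattern so that trailing characters after the year cause the match to fail, which goes a bit beyond the pattern shown above.

```python
import re

# Raw string so the backslashes reach the re module unchanged
DATE_PATTERN = re.compile(r'(\d{1,2})/(\d{1,2})/(\d{2})$')

def parse_date(text):
    """Return (month, day, year) as integers, or None if text is not mm/dd/yy."""
    m = DATE_PATTERN.match(text)      # match anchors at the start of text
    if m is None:
        return None
    month, day, year = (int(g) for g in m.groups())
    return (month, day, year)

print(parse_date("2/28/64"))
print(parse_date("born 2/28/64"))     # does not start with a date
print(parse_date("2/28/1964"))        # four-digit year is rejected
```

The function checks only the shape of the string; it does not verify that the month and day are in a valid range.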
I can also use regular expressions to split a string into substrings based on a pattern, and have the substrings returned as a list:

    >>> re.split(r"\s+", "brad vander zanden")
    ['brad', 'vander', 'zanden']

I used "\s+" to soak up one or more white spaces between fields in the string. The split command provides a much easier way to separate an input string into its constituent fields:

    line = sys.stdin.readline()
    # strip first so that the trailing newline does not produce an empty final field
    fields = re.split(r"\s+", line.strip())

A final convenient thing you can do with regular expressions is to substitute one pattern for another in a string. For example, suppose I want to substitute all occurrences of "h1" with "h2" in a string. I could write:

    >>> re.sub("h1", "h2", "<h1>brad the man</h1> and <h1>joe</h1>")
    '<h2>brad the man</h2> and <h2>joe</h2>'

Of course I would not only change the tags but any occurrences of h1 in the content as well, but you get the picture.
You can also use parentheses to match data and then alter the data. For example, to convert a name of the form "firstname lastname" into a name of the form "lastname, firstname", one could write:
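
    >>> re.sub(r'(\w+)\s+(.+)', r'\2, \1', "brad vander zanden")
    'vander zanden, brad'

Notice that the patterns had to be prefixed with the letter 'r', which makes them raw strings in which backslashes are not treated as escape characters. If you fail to do this, Python interprets \2 and \1 as octal character escapes and you get garbage:

    >>> re.sub('(\w+)\s+(.+)', '\2, \1', "brad vander zanden")
    '\x02, \x01'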