Python


  1. The developer of Python, Guido van Rossum, has written an excellent introductory tutorial to Python, which you can find at http://www.python.org/doc/current/tut/tut.html. These notes will summarize the most important features of Python, as well as showing some of the really neat things that I've found that I can do in Python.

  2. Python manuals

  3. These lecture notes refer to Python 3.0 and later. Please use a version 3.0 or higher interpreter when trying the examples in these notes, because earlier versions either lack certain features (for example, the string format method is missing before version 2.6) or behave differently (for example, print is a statement rather than a function).

  4. Version 3.0 became the first non-backwards compatible version of Python. Where appropriate, these notes indicate important differences between Python 3.0 and earlier 2.x versions of Python.

Introduction

The Scott book gives a nice overview of different scripting languages, and the features they share. These notes go into more detail about one of these languages, Python. Python is an interpreted language that looks a lot like C, and I tend to prefer it to most of the other scripting languages. Why do I think Python is so wonderful? Well:

  1. It can be used as a scripting language, like Awk or Perl. However, unlike either Awk or Perl, Python reads like a real programming language. Consequently, someone with C experience can look at a Python program and figure out what it is doing. Python is also aimed at supporting general computing, whereas languages like PHP and JavaScript are more domain-specific languages aimed at server-side scripting and client-side scripting respectively.

  2. Python is interpreted, which can save you considerable time during program development because compiling and linking are unnecessary. Python also has a command-line interpreter so you can try out Python language features and perform quick computations. I used the command-line interpreter to try out my examples as I wrote these notes, and I suggest you use it as well to try out your own examples as you read through these notes.

  3. Python programs can be developed much more rapidly than C programs, are more concise than C programs, and are easier to read than C programs, because:


Getting Started With Python

To try out Python, type python at the Unix prompt. You will see something like:

Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>>
The starting messages on your machine will be somewhat different. The standard first program in a language is a hello world program, so give it a try. Try typing:
>>> print ("hello world")
hello world
>>>
Now try some simple arithmetic and string operations:
>>> a = 10
>>> b = 20
>>> c = a + b
>>> print(c)
30
>>> c = "brad"
>>> c += " " + "vander zanden"
>>> print(c)
brad vander zanden
This simple example shows several neat characteristics of Python, most of which are shared by other scripting languages:

  1. Variables do not have to be declared before they are used.
  2. Variables are dynamically typed, with their type being determined by the value currently assigned to them. Hence c is initially an int but changes to a string once "brad" is assigned to it.
  3. The meaning of built-in operators depends on the types of the variables to which they are applied. In the above example, + initially adds two integers, but later concatenates three strings ("brad", " ", and "vander zanden").
  4. Statements are executed immediately by the command line interpreter and their results can be immediately queried (most scripting languages do not have a command line interpreter--this is a nice feature of Python).
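
The first three points can be checked directly with the built-in type function. This is only a small sketch; the names type_before and type_after are made up for illustration:

```python
# c mirrors the variable used in the session above
c = 10 + 20
type_before = type(c)        # int, because c currently holds an integer

c = "brad"
c += " " + "vander zanden"   # + now concatenates strings
type_after = type(c)         # str, because c now holds a string

print(type_before.__name__, type_after.__name__, c)
```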


Quick Start to Python I/O

When writing quick and dirty Python programs, it is nice to be able to easily get input from stdin and print output to stdout. The input function reads a single line from stdin and returns it as a string, without the trailing newline. It takes a prompt as an optional argument. The print function writes a comma-delimited sequence of values to the screen. By default, print separates each value with a space and terminates the line with a newline character. For example:

>>> name = input("enter a name: ")
enter a name: brad
>>> print ("name =", name)
name = brad
Prior to version 3.0, the input function was called raw_input and print was a statement that did not require parentheses. If your output looks like
('name =', 'brad')
it is because you are using a pre 3.0 version of Python, in which the parentheses build a tuple and the print statement prints that tuple.

Invoking Python from the Command Line

Python programs are placed in files ending with the suffix .py. They can be run as standalone programs using the python interpreter. For example, here is a sample python program:

import sys
print(sys.argv[1], "degrees Celsius is", 9/5*int(sys.argv[1])+32, "Fahrenheit")
The import statement imports the functions in the module sys into a namespace named sys and argv is a list of the command line arguments. sys.argv[0] is the name of the script, so the first argument is sys.argv[1]. If this program is stored in celciusFahrenheitConverter.py, then here is a sample invocation:
unix> python celciusFahrenheitConverter.py 0
0 degrees Celsius is 32.0 Fahrenheit
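
For anything longer than a one-liner it helps to check the number of arguments before using them. The following is a sketch only: the function names celsius_to_fahrenheit and convert are hypothetical, and convert takes the argument list as a parameter so that it can be tried without a real command line.

```python
def celsius_to_fahrenheit(celsius):
    # same formula as the one-line program above
    return 9 / 5 * celsius + 32

def convert(argv):
    # argv plays the role of sys.argv: argv[0] is the script name,
    # so the first real argument is argv[1]
    if len(argv) != 2:
        return "usage: python celciusFahrenheitConverter.py degrees"
    celsius = int(argv[1])
    return "{0} degrees Celsius is {1} Fahrenheit".format(
        celsius, celsius_to_fahrenheit(celsius))

# simulate two invocations without touching the real command line
print(convert(["celciusFahrenheitConverter.py", "0"]))  # 0 degrees Celsius is 32.0 Fahrenheit
print(convert(["celciusFahrenheitConverter.py"]))       # prints the usage message
```

In a real script you would pass sys.argv to convert after importing sys.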

Loading Python Programs

Often you will write a set of functions in a file and wish to load them into either the Python interpreter or a Python program. You can load a Python file using the import statement:

# use #'s for both single line and block comments
>>> import stack   # do not use the .py extension
Files can contain both executable statements and function definitions. Python treats the loaded file as a module named stack and places all of the functions and variables in a namespace called stack. To access a specific name, say the function pop, you must use the fully qualified name stack.pop(). You may instead import all the names in stack into the top-level namespace using the from keyword:
>>> from stack import *
Or you could import a comma separated series of names if you did not want to risk having name conflicts:
>>> from stack import pop, push, top
There are also handy system libraries, such as math:
>>> import math
>>> math.sin(math.pi/2)
1.0
When your program imports a module named stack, the interpreter searches for a file named stack.py in 1) the current directory, and then in 2) the list of directories specified by the environment variable PYTHONPATH, and finally in 3) an installation dependent set of directories. You can query and modify the list of directories being searched by accessing or editing the variable sys.path. This variable is created as a concatenation of your current directory, PYTHONPATH, and the installation dependent set of directories.
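
Since sys.path is an ordinary list, it can be inspected and extended like any other list. A minimal sketch, in which the directory /tmp/my_modules is a hypothetical name used only for illustration:

```python
import sys

# look at the first few directories that import will search
for directory in sys.path[:3]:
    print(repr(directory))

# "/tmp/my_modules" is a made-up directory, not something Python provides
extra_directory = "/tmp/my_modules"
if extra_directory not in sys.path:
    sys.path.append(extra_directory)   # searched last, after the existing entries
```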

Reloading a Program/Module

Often you will find an error in your file when you run it, and will want to fix the error and reload the file. You must use the importlib.reload function to do this, not the import statement. A second import of an already loaded module acts as a no-op and you will be left wondering what happened to your changes. So if I discover an error in my stack's pop function, I can edit the pop function and then reload the stack.py file as follows:

import importlib
importlib.reload(stack)
In pre 3.0 versions of Python, reload was a built-in function and you could type reload(stack). That no longer works in Python 3.x. Versions 3.0 through 3.3 provide the function as imp.reload; the imp module has since been deprecated and was removed in version 3.12.

Python Types

Here is a brief list of Python's types:

  1. Python's built-in types include integers, floats, booleans (True/False), and strings. There is no separate character type; a character is simply a string of length 1.
  2. None is the equivalent of C's NULL.
  3. There are a number of higher-level, built-in data structures that include lists, dictionaries (hash tables), tuples (immutable lists), and sets (which are not described in these notes).
  4. There is a class mechanism for creating objects. You must also use the class mechanism if you want to create C-style structs.
  5. Python does not have built-in arrays. Instead you must use lists, which are much less efficient, but support the familiar array notation.

For arithmetic, there are a couple of differences from C syntax:

  1. Starting with version 3.0, the division operator (/) returns a floating point result. You must use the // operator to get integer (floor) division. For example, 5/2 returns 2.5 while 5//2 returns 2. Note that // rounds toward negative infinity, whereas C's integer division (since C99) truncates toward zero, so the two differ when exactly one operand is negative: -5//2 returns -3 in Python, but -5/2 is -2 in C. Prior to version 3.0, the division operator performed floor division when both operands were integers. Apparently the change was made so that the division operator would always return a float, regardless of the types of the operands, and because there were occasional ambiguities in resolving what type of division was being requested when the two operands had different types.

  2. The ** operator is the power operator. For example:
    >>> 2**3
    8
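
A short sketch of these operators, including a negative operand, where floor division and truncation give different answers (the variable names are made up for illustration):

```python
true_quotient = 5 / 2        # 2.5: / always produces a float
floor_quotient = 5 // 2      # 2: floor division
negative_floor = -5 // 2     # -3: floor division rounds toward negative infinity
truncated = int(-5 / 2)      # -2: int() truncates toward zero
power = 2 ** 10              # 1024

print(true_quotient, floor_quotient, negative_floor, truncated, power)
```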
    

Strings

One feature that sets most scripting languages apart from compiled languages like C, C++, and Java is that strings are a built-in type, rather than an array of characters or a class. Python has many built-in operators for manipulating strings. Python strings are written using either single, double, or triple quotes. Thus both 'brad' and "brad" represent the string brad. Double-quoted strings are useful when you want to get a single quote mark (apostrophe) into a string, such as "brad's apple". Triple-quoted strings are useful when you want a string to span multiple lines, as in:

>>> a = '''brad went
... to the store'''
>>> a
'brad went\nto the store'
Strings have a number of useful methods that are described in the following table:

Operation            Meaning
s.capitalize()       capitalizes the first letter of s and converts the rest to lowercase
string.capwords(s)   capitalizes the first letter of each word in s (a function in the string module rather than a method, so it requires import string)
s.lower()            converts s to lowercase letters
s.upper()            converts s to uppercase letters
s.count(substring)   counts the number of occurrences of substring in s
s.find(substring)    returns the index of the first occurrence of substring in s, or -1 if it is not found
s.split()            returns a list of words in s, using whitespace (space, tab, newline) as a delimiter. Ignores leading and trailing whitespace and allows unlimited whitespace between words.
s.join(list)         joins a list of words into a single string with s as a separator. Very useful for creating comma-separated lists for a spreadsheet program.
s.strip()            strips leading and trailing whitespace from s
s.replace(old,new)   replaces all occurrences of old with new in s
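
Here is a brief sketch that exercises several of these operations. Strings are immutable, so each call returns a new string and leaves s unchanged. The variable names are made up for illustration.

```python
import string

s = "  brad vander zanden  "

words = s.split()                      # ['brad', 'vander', 'zanden']
csv_line = ",".join(words)             # 'brad,vander,zanden'
name = s.strip()                       # 'brad vander zanden'
capitalized = name.capitalize()        # 'Brad vander zanden'
title_case = string.capwords(name)     # 'Brad Vander Zanden'
occurrences = name.count("an")         # 2
position = name.find("vander")         # 5
not_found = name.find("smiley")        # -1
renamed = name.replace("brad", "nancy")  # 'nancy vander zanden'

print(csv_line, capitalized, title_case, occurrences, position, not_found, renamed)
```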

If you need more sophisticated manipulation of a string, you can use Python's regular expression mechanism.

Formatting a String

You can interpolate variables into a string or format a string using a string's format method. Python still supports the older % operator, which formats strings using C-style format specifiers, but these notes use the format method and Python's idiosyncratic format syntax. Here are a few concrete examples:

>>> first = 'brad'
>>> last = 'vander zanden'
>>> "{0} {1}".format(first,last)
'brad vander zanden'
>>> "{0:>10}".format(first)
'      brad'
>>> "{0:^10}".format(first)
'   brad   '
>>> 'The price is {0:6.2f}'.format(8.2586)
'The price is   8.26'
>>> "{0:06.2f}".format(8.2586)
'008.26'
>>> "{0:*^10.2f}".format(8.2586)
'***8.26***'
The examples illustrate how you positionally specify arguments with a set of curly braces {}. If you omit further formatting information, then Python assumes that you have applied the str function to the argument, which converts any variable to its string representation. You can specify further formatting information by following the argument index with a colon.

A simplified syntax for the formatting string is:

[[fill]align][0][width][.precision][type]
There are a few additional formatting options allowed, and you can look these up in the Python reference manual if you are interested. Notice that if you specify a fill character, then you must also specify an alignment character. Here are the meanings of each of these options:

Option      Meaning
fill        The padding character used if the value is smaller than the field width
align       The alignment of the value within the field if the value is smaller than the field width. The allowable alignment options are:
              • '<': left aligns the value (the default for strings)
              • '>': right aligns the value (the default for numbers)
              • '^': centers the value
              • '=': forces the fill character to be placed after the sign but before the digits in a number (e.g., -00035). Valid only for numeric types.
0           Pads numeric fields with 0's between the sign and the digits. Equivalent to using '=' for the alignment operator and 0 for the fill character.
width       The minimum width of the field
precision     • For floating point numbers, the number of digits after the decimal point.
              • For non-numeric types, the maximum field size. Takes the first precision characters from the field.
              • Not valid with integers.
type        The presentation type of the field. If it is omitted, Python picks a default that suits the value (for example, string presentation for strings). The most common types of values are:
              • c: character (converts an integer to the corresponding character)
              • d: integer
              • f: float

There are other types for binary, octal, hexadecimal, and more complicated types of floating point presentations. Feel free to look them up in the Python reference manual.
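
The following sketch tries one example of each option described above. The square brackets in the first three format strings are literal text, added only to make the padding visible. The variable names are made up for illustration.

```python
left = "[{0:<8}]".format("brad")         # '[brad    ]'
right = "[{0:>8}]".format("brad")        # '[    brad]'
centered = "[{0:-^8}]".format("brad")    # '[--brad--]', using - as the fill character
zero_padded = "{0:06d}".format(-35)      # '-00035'
same_thing = "{0:0=6d}".format(-35)      # '-00035', fill 0 with = alignment
truncated = "{0:.3}".format("knoxville") # 'kno', precision limits a string's length
rounded = "{0:8.3f}".format(3.14159)     # '   3.142'
letter = "{0:c}".format(65)              # 'A'

print(left, right, centered, zero_padded, same_thing, truncated, rounded, letter)
```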

Lists

Lists are probably Python's most useful compound data structure. Python does not provide a general array data structure (the array module provides specialized arrays for numeric types) but a list can be subscripted just like an array, so I always use lists to simulate arrays. Remember that while using lists to simulate arrays may not be the most efficient thing in the world, I am interested in rapid prototyping and program development, not efficiency.

Lists are written as a list of comma-separated elements enclosed in square brackets:

>>> a = [10, 20, 'brad', 'knoxville']
>>> a
[10, 20, 'brad', 'knoxville']
>>> a[1]
20
>>> a[-1]   # a negative subscript starts from the end of the list
'knoxville'
Note that lists can contain heterogeneous types. A really neat thing about lists is that I can reference a sublist using a slicing mechanism. A slice is written as two indices separated by a colon:
>>> a[1:3]
[20, 'brad']
If you omit the first index it defaults to 0. If you omit the second index, it defaults to the length of the list:
>>> a[:2]
[10, 20]
>>> a[2:]
['brad', 'knoxville']
This slice notation makes it easier to pass portions of arrays to functions. If I am writing the partition routine for quicksort and I want to pass the portion of the array from i to j (including i but not including j), I can write:
partition(a[i:j])
When the array arrives in partition, I can start my subscripts at 0. The end of the array can be found using the len function. Hence I do not need to pass in the subscripts i or j. The ability to handle a subarray as a normal array with subscripts beginning at 0 makes the partition code easier to write and easier to read. Keep in mind that a slice is a new list holding copies of the references in a[i:j], so if partition rearranges its argument, the original list a is not rearranged.
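
A small sketch of passing a slice to a function. The partition function here is a hypothetical, simplified version that returns new lists rather than rearranging its argument; it is not the in-place routine usually used in quicksort.

```python
a = [30, 10, 40, 20, 50]

part = a[1:3]     # a new list holding a[1] and a[2]
part[0] = 99      # changes the copy only
print(part)       # [99, 40]
print(a)          # [30, 10, 40, 20, 50]

def partition(values):
    # values[0] is the pivot; subscripts start at 0 and len gives the end
    pivot = values[0]
    smaller = []
    larger = []
    for i in range(1, len(values)):
        if values[i] < pivot:
            smaller.append(values[i])
        else:
            larger.append(values[i])
    return (smaller, pivot, larger)

smaller, pivot, larger = partition(a[0:4])
print(smaller, pivot, larger)   # [10, 20] 30 [40]
```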

Here is a table of commonly used operations on lists. Assignment to an element, del, append, insert, extend, pop, remove, reverse, and sort modify the original list; the other operations leave it unchanged.

Operation                Description
list[3] = 'brad'         changes the 4th element of the list to 'brad'
del list[3]              deletes the 4th element of the list
max(list)                returns the maximum element in list
min(list)                returns the minimum element in list
len(list)                returns the length of the list
list.append(3)           appends 3 to list
list.insert(3, 'brad')   inserts 'brad' at the 4th element in the list, and pushes all remaining elements, including the previous 4th element, one element to the right.
list + [8, 10]           returns a new list with 8 and 10 appended to the end. The original list remains unchanged.
list.extend([8,10])      appends the list [8,10] to the end of list
3 in list                returns True/False depending on whether or not 3 is in the list
list.index(3)            returns the index of the first occurrence of 3. Throws a ValueError exception if 3 is not in the list
list.pop()               returns and removes the last element from list
list.pop(3)              returns and removes the 4th element from list
list.remove(35)          searches for and removes the first occurrence of 35 from list. Throws a ValueError exception if 35 is not in the list.
list.reverse()           reverses the elements of list
list.sort()              sorts a list in ascending order. The list sorting section has a description of more sophisticated sorting techniques, but to understand it you must read the intervening sections that follow.
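
The sketch below walks through most of these operations on one small list, with the contents after each step shown in a comment. The list is called items here (a made-up name) so that the built-in name list stays available.

```python
items = [10, 20, 30]
items.append(40)             # [10, 20, 30, 40]
items.insert(1, 'brad')      # [10, 'brad', 20, 30, 40]
combined = items + [8, 10]   # new list; items itself is unchanged
items.extend([8, 10])        # [10, 'brad', 20, 30, 40, 8, 10]
last = items.pop()           # returns 10; items is [10, 'brad', 20, 30, 40, 8]
second = items.pop(1)        # returns 'brad'; items is [10, 20, 30, 40, 8]
items.remove(30)             # [10, 20, 40, 8]
del items[0]                 # [20, 40, 8]
items.reverse()              # [8, 40, 20]
items.sort()                 # [8, 20, 40]
where = items.index(20)      # 1
found = 40 in items          # True

print(items, combined, last, second, where, found)
```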

Tuples

Tuples are similar to lists, but unlike lists, they are immutable. Tuples are declared using parentheses, (), rather than square brackets. For example:

x = ('brad', 'm', '2/3/64')
You can also "unpack" the elements of a tuple:
name, sex, birthdate = x
One of the most useful things you can do with tuples is use them to return multiple values from a function and then unpack the values by assigning the function result to multiple variables. For example, a simple function to return the minimum and maximum values of a list could be written as:
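def minmax(values):
  return (min(values), max(values))

# do not name the results min and max, or you will hide the built-in functions
>>> smallest, largest = minmax([10, 30, 40, 25, 50, 5])
>>> smallest
5
>>> largest
50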

Dictionaries (Hash Tables)

Dictionaries are hash tables that store key/value pairs. Keys must be hashable, so it is usually best to limit keys to immutable types, such as integers or strings. Dictionaries are written as a list of comma-separated key: value pairs enclosed in curly brackets:

>>> phone = {'brad': '974-1875', 'nancy' : '818-5868', 'george' : '385-8685'}
>>> phone['brad']
'974-1875'
>>> phone['yifan'] = '396-6858'
>>> 'yifan' in phone
True
>>> 'sue' in phone
False
>>> del phone['george']
>>> 'george' in phone
False
>>> age = {}
>>> age['brad'] = 30
>>> age['smiley'] = 4
>>> age['brad']
30
Notice that keys may be used like array indices, and hence dictionaries are often called "associative arrays".

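You can iterate through the keys of a dictionary, just like a list. As of version 3.7, the keys are visited in the order in which they were inserted; earlier versions visit them in an arbitrary order:

>>> for name in age:
...   print(name)
...
brad
smiley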
Here is a table of useful dictionary operators and methods:

Operation           Meaning
dict['brad']        return the value associated with the key 'brad'
dict['brad'] = 30   associate the value 30 with the key 'brad'
del dict['brad']    delete the key 'brad' from dict
dict.clear()        remove all elements from dict
len(dict)           return the number of elements in dict
'brad' in dict      returns True/False depending on whether or not the key 'brad' is in dict
dict.keys()         return a "view" of the keys in dict
dict.values()       return a "view" of the values in dict
dict.items()        return a "view" of (key,value) pairs in dict. The key/value pairs are tuples
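
A short sketch of the view methods. One property worth knowing is that a view is live: it reflects later changes to the dictionary. The variable names, and the entry for 'nancy', are made up for illustration.

```python
age = {'brad': 30, 'smiley': 4}

# items() yields (key, value) tuples, which can be unpacked in the for statement
for name, years in age.items():
    print(name, years)

keys_view = age.keys()
size_before = len(keys_view)     # 2
age['nancy'] = 28                # hypothetical extra entry
size_after = len(keys_view)      # 3: the view sees the new key

names = sorted(age)              # a true list of the keys, in sorted order
total_years = sum(age.values())  # views can be passed to functions that expect a sequence
print(size_before, size_after, names, total_years)
```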

Prior to version 3.0, the keys, values, and items methods returned lists. You can treat a view just like a list in for statements. For example:

# use a loop variable other than age, or the loop would replace the dictionary
for years in age.values():
  print(years)
However, if you want a true list, you need to call the list function on the view:
>>> ages = list(age.values())

Introduction to Classes

This portion of the notes provides a quick introduction to classes, showing how they can be used like C's version of structs. Their more object-oriented features are covered later in the notes.

Python does not have the C equivalent of struct, but it does support classes and a class can be used to mimic a struct. For example, suppose we want to define a node data structure for a depth-first search. In C the declaration might look as follows:

struct node {
    int visited;
    struct neighbor_list *neighbors;
};
In Python, one would define a node class. One could just write:
class node:
    visited = False
    neighbors = []

>>> node.visited
False
Note that unlike C or C++, a class is actually an object and you can query its fields.

To create an instance of a class, you call the class as though it were a function:

x = node()
x.visited = True
x.neighbors = myNeighbors
x.name = 'brad'
The above code illustrates how one can dynamically add new instance variables, such as name, to an object. Unlike a static class language, such as C++ or Java, instances of a class are allowed to add their own instance variables, and hence two instances of a class may contain different sets of instance variables.

You are also allowed to delete instance variables from an object, but the effect may not be what you expect. While the instance variable will be deleted from the immediate object, Python will attempt to traverse the inheritance hierarchy to find a value for the instance variable if the programmer requests it. For example:

>>> x.visited
True
>>> del x.visited
>>> x.visited
False
visited was successfully deleted as a local instance variable of x, but when I now request visited from x, Python traverses the inheritance hierarchy, finds the visited variable in node, and returns its value, which is False.

The reason that Python has this flexibility to add and delete instance variables is that in each object, it stores all instance variables in a dictionary variable called __dict__ (unlike most object-oriented languages, Python does not have protection mechanisms for instance variables, so by convention Python's designers put __ before and after variables that have special meaning, so that you do not accidentally clobber them). Hence an assignment of a value to an instance variable simply updates the dictionary entry for that instance variable and a deletion of an instance variable simply deletes the dictionary entry for that instance variable. Each object also has a variable called __class__ that contains a pointer to the object's class object, which is what allows Python to traverse the inheritance hierarchy, looking for an instance variable. Although it is not generally recommended, for rapid prototyping you can change an object's class by changing its __class__ variable.
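
The role of __dict__ can be seen directly in a short sketch. The class below repeats the earlier node class with only the visited field, to keep the output small.

```python
class node:
    visited = False          # stored in the class object, not in each instance

x = node()
print(x.__dict__)            # {}: x has no instance variables of its own yet

x.visited = True
print(x.__dict__)            # {'visited': True}

del x.visited
print(x.__dict__)            # {}: the entry is gone from x's dictionary
print(x.visited)             # False: the lookup falls back to the class
print(x.__class__ is node)   # True
```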

You can write a constructor using the __init__ method. The __init__ method is automatically called when an instance of a class is created. It must be declared inside the class and must take at least one argument, which is a reference to the instance being created. By convention this argument is called self (for those of you who know C++, self is equivalent to the this pointer). For example, one could write:

class node:
    def __init__(self, myNeighbors):
        self.visited = False
        self.neighbors = myNeighbors
It is important to prefix your instance variables with self. If you fail to do so, then the method will think that you are creating a local variable.

Since __init__ is automatically invoked when an instance of node is created, one could now allocate an instance of node and initialize it using the statement:

x = node(neighborsList)
Unlike C++ or Java, you may not overload the constructor. However, you may use keyword arguments. For example:
class node:
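  def __init__(self, name, visited=False, neighbors=None):
    self.name = name
    self.visited = visited
    if neighbors is None:
      neighbors = []   # a fresh list per node; a default of [] would be shared by all nodes
    self.neighbors = neighbors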

>>> x = node('brad', neighbors=neighborsList)
Keyword parameters give you the flexibility to require some parameters, such as name, and make other parameters optional. All required, unnamed parameters must appear first in the parameter list. These parameters are often called positional parameters. When you invoke a function, the unnamed arguments that you provide are matched serially with the positional parameters, and if there are more unnamed arguments than positional parameters, the remaining unnamed arguments will be matched serially with the keyword parameters. Keyword arguments may be presented in any order, and do not have to match the order in which they appear in the function's parameter list.
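
The matching rules can be tried out with an ordinary function. This is a sketch only; the function describe is hypothetical and simply reports how its arguments were bound.

```python
def describe(name, visited=False, neighbors=None):
    if neighbors is None:
        neighbors = []
    return "{0} visited={1} neighbors={2}".format(name, visited, len(neighbors))

only_required = describe('brad')                # keyword parameters take their defaults
second_positional = describe('brad', True)      # extra unnamed argument is matched to visited
any_order = describe('brad', neighbors=['nancy'], visited=True)  # keywords in any order

print(only_required)
print(second_positional)
print(any_order)
```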

Python Control Structures

The syntax of the control structures is very C-like, with a few new, simplifying features added in. Instead of using {}'s, you use a colon to start a block and indentation to delimit it. For example:

if age < 13:
  print("pre-teen")

Conditional Statements

Conditionals are much like C conditionals, except you can use the elif construct to simplify nested conditions:

if age < 13:
  print("pre-teen")
elif age < 18:
  print("adolescent")
else:
  print("adult")
The above code could also have been written more verbosely as:
if age < 13:
  print("pre-teen")
else:
  if age < 18:
    print("adolescent")
  else:
    print("adult")
but the nested if/else statements make the logic less obvious.

Note that no parentheses are needed for the conditional expressions since the colons make it clear where the expressions end.

Simplified Boolean Expressions

In C, one is forever using the && and || operators to construct boolean expressions. Often these operators obfuscate relatively simple expressions. For example, if I want to test if a grade is greater than 90 and at most 100, I have to write:

if ((grade > 90) && (grade <= 100))
Python allows the programmer to string together equality and inequality statements. Consequently, I can write the above expression more elegantly and concisely in Python as:
if 90 < grade <= 100:
I would argue that the above conditional is far easier to read and understand than the corresponding C conditional.

Python can also make boolean or conditions easier to write and understand. Suppose I want to determine if the variable x is equal to one of the integers 10, 20, or 30. In C I would have to write:

if ((x == 10) || (x == 20) || (x == 30))
Python provides the convenient operator in that can be used with lists or tuples. Hence, I can write the equivalent conditional in Python as:
if x in (10, 20, 30):
  print ("yes")
else:
  print ("no")
Again, I would argue that the intent of the latter conditional is more evident than the intent of the previous conditional.
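
Both shortcuts are ordinary expressions, so their results can be stored and compared with the longer forms. A brief sketch, with made-up variable names:

```python
grade = 95
chained = 90 < grade <= 100                     # True
spelled_out = (90 < grade) and (grade <= 100)   # the same test written with and

boundary = 90 < 90 <= 100                       # False: 90 is not greater than 90

x = 20
is_member = x in (10, 20, 30)                   # True
spelled_out_or = (x == 10) or (x == 20) or (x == 30)
not_member = 25 in (10, 20, 30)                 # False

print(chained, spelled_out, boundary, is_member, spelled_out_or, not_member)
```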

Functions

Functions are introduced with the def keyword:

def celciusConverter(celcius):
  return 9.0/5 * celcius + 32
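Variables that a function assigns to are by default local variables. A function can read a global variable without any declaration, but if it assigns to a global variable, you need to declare the variable with the global keyword:
x = 10
def addX(y):
  global x    # optional here because addX only reads x; required if addX assigned to x
  return x + y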
You can use tuples to return multiple values from a function. For example, suppose you have a top function for a stack that should return a value and a flag indicating whether or not the operation succeeded. In Python you can write the function as:
def top(stack):
  if len(stack) == 0:
    return (0, False)
  else:
    return (stack[-1], True)
Note how much simpler and more readable this code is than the corresponding C or C++ code. With either a C or C++ function, you have to do something kludgy, like pass a pointer to an argument, so that you can then modify it. In C you might write your function prototype as:
int top(Stack *s, int *successFlag)
and in C++ you might write the function prototype as:
int top(Stack &s, int &successFlag)
You can access the elements of a tuple with indices, just as in a list. You can also unpack a tuple by putting multiple variables to the left of an assignment statement:
>>> value, success = top(myStack)

Loops

The principal loops in Python are the while and for loops. The break statement provides a structured way to exit a loop and the continue statement provides a structured way to skip the rest of the current iteration of a loop.

While loops are just like C:

while celcius < 100:
  print("fahrenheit =", 9.0/5*celcius + 32)
  celcius += 1
While Python supports C's "+=" notation, it does not support C's "++" notation.

For loops are much different than in C. In C they are used as counting loops. In Python they are used to iterate over collections of values, such as lists. You can obtain the functionality of a counting loop in Python by iterating over the values produced by the range function. The range function takes a begin and end value, and produces the sequence of integers starting with the begin value and ending 1 short of the end value. For example, range(1,6) produces the values 1, 2, 3, 4, 5. (Prior to version 3.0, range returned an actual list; it now returns a range object that generates the values on demand.) A simple loop for summing the numbers from 1 to n would be:

total = 0   # named total rather than sum, which would hide the built-in sum function
for i in range(1,n+1):  # remember that range ends 1 short of the end value
  total += i
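
A quick sketch of what range produces. Wrapping it in list shows the values explicitly. The optional third argument, a step, is not discussed in these notes and is included here only as an aside.

```python
values = range(1, 6)
as_list = list(values)            # [1, 2, 3, 4, 5]
how_many = len(values)            # 5: a range knows its length without building a list
stepped = list(range(0, 10, 3))   # [0, 3, 6, 9]

n = 10
total = 0
for i in range(1, n + 1):
    total += i

print(as_list, how_many, stepped, total)   # total is 55
```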

One nice thing about Python's syntax for a for loop is that it looks a great deal like the syntax used by textbooks to present pseudo-code for an algorithm. For example, a depth-first search of a graph starts from a node and visits each of its neighbors in a depth-first fashion. If we assume that each node contains a visited field and a neighbors field, which is a list of neighbors, then a depth-first search procedure can be written succinctly in Python as follows:

def dfs(node):
    node.visited = True
    for neighbor in node.neighbors:
        if not neighbor.visited:
            dfs(neighbor)
It would be difficult to write such a succinct and elegant procedure for depth-first search in C.

Else Clauses on Loops

Loop statements may have an else clause attached to them that is executed if either a list is exhausted in the case of a for loop or a condition becomes false in the case of a while loop. The else clause is not executed if a break causes a premature exit from the loop. Combining a break statement with an else clause allows one to avoid the inelegant code often associated with setting and testing the value of a flag variable. For example, suppose I want to write a naive procedure that determines whether a variable is a prime number or not. If the number is prime, the procedure should print that the number is prime. If the number is not prime, the procedure should print the number's factors. In C, the procedure might look as follows:

#include <stdio.h>

void is_prime(int n) {
    int i, prime_flag;

    prime_flag = 1;
    for (i = 2; (i < n) && (prime_flag == 1); i++) {
        if ((n % i) == 0) {
            printf("%d equals %d * %d\n", n, i, n / i);
            prime_flag = 0;
        }
    }
    if (prime_flag == 1)
        printf("%d is a prime number\n", n);
}
In Python the function could be written more succinctly and elegantly as:
def is_prime(n):
    for i in range(2, n):  # range produces the numbers 2, 3, ..., n-1
        if n % i == 0:
            print("{0} equals {1} * {2}".format(n, i, n//i))
            break
    else:
        print(n, 'is a prime number')
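
The same break/else pattern works for any search loop. As a sketch, the hypothetical function first_factor below returns the smallest factor of n instead of printing it, which makes it easy to see when the else clause runs.

```python
def first_factor(n):
    for i in range(2, n):
        if n % i == 0:
            factor = i
            break            # skips the else clause
    else:
        factor = None        # runs only if the loop finished without a break
    return factor

print(first_factor(15))      # 3
print(first_factor(13))      # None
print(first_factor(2))       # None: the loop body never runs, so the else clause still runs
```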

Sorting a List

You can sort a list using either the built-in function sorted or a list object's sort method. sorted builds and returns a new list while sort modifies the original list. By default both functions sort a list in ascending order. For example:

>>> numbers = [30, 40, 49, 15, 40, 80]   # avoid the name list, which would hide the built-in list function
>>> sortedList = sorted(numbers)
>>> sortedList
[15, 30, 40, 40, 49, 80]
>>> numbers
[30, 40, 49, 15, 40, 80]  # note that the original list does not change
>>> numbers.sort()
>>> numbers
[15, 30, 40, 40, 49, 80]  # now the original list has changed
You can cause a list to be sorted in descending order by using the reverse keyword argument:
>>> numbers.sort(reverse=True)
>>> numbers
[80, 49, 40, 40, 30, 15]
If the types to be compared are compound types, such as a user-defined class or a tuple (recall that tuples are immutable lists), you have a couple of options for specifying a comparison function:

  1. Use the key keyword argument to specify a function that takes the type as an argument and returns the value of the element you want compared. For example, if you have a class called Student and the comparison should be on a field called name, then you could write:
    >>> from operator import attrgetter
    >>> studentList.sort(key=attrgetter('name'))
    
    and if you had tuples with two elements and wanted to use the second element as the key, you could write:
    >>> from operator import itemgetter
    >>> tuplesList = [('smiley', 70), ('joey', 55), ('sarah', 70), ('mary', 60)]
    >>> tuplesList.sort(key=itemgetter(1))
    >>> tuplesList
    [('joey', 55), ('mary', 60), ('smiley', 70), ('sarah', 70)]
    
    attrgetter and itemgetter are functions defined in the operator module that retrieve the specified attribute value from a class or the value at the specified index location from a tuple. More complicated functions can be used as well, but you should refer to the Python reference manual for details.

    This is a good time to point out that both sort and sorted provide stable sorts, which means that if two values are equal, they will appear in the same order in the sorted list as they appeared in the original list. Hence since the 'smiley' tuple appeared before the 'sarah' tuple in the original list, it also appears before the 'sarah' tuple in the sorted list.

  2. If you want to use both a primary and a secondary key, and you are sorting class objects, you can define the __lt__ method for a class. For example, to sort a set of students by grade and then by name, one might write:
    class Student:
      def __init__(self, name, grade):
        self.name = name
        self.grade = grade
      def __lt__(self, otherStudent):
        # a trailing \ continues the statement on the next line; nothing may follow the \
        return self.grade < otherStudent.grade \
          or (self.grade == otherStudent.grade and self.name < otherStudent.name)
    
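    If you need to specify a primary and secondary key for some other type of compound type, such as a tuple, one common approach is to pass a key function that returns a tuple of the form (primary, secondary), because tuples are compared element by element. The Python documentation on sorting has further examples.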
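As a minimal sketch of sorting tuples on a primary and a secondary key, the following uses a key function that returns a (grade, name) tuple. The records list and the variable names are made up for illustration and are not part of the earlier examples.

```python
# Hypothetical data: (name, grade) tuples.
records = [('smiley', 70), ('joey', 55), ('sarah', 70), ('mary', 60)]

# The key function returns a (primary, secondary) tuple. Tuples compare
# element by element, so grade decides first and name breaks ties.
by_grade_then_name = sorted(records, key=lambda record: (record[1], record[0]))

print(by_grade_then_name)
# [('joey', 55), ('mary', 60), ('sarah', 70), ('smiley', 70)]
```

Note that 'sarah' now comes before 'smiley' because the name is part of the key, whereas sorting on the grade alone would have kept their original order.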
Historically the list sort method took an optional comparison function as a parameter. However, as of version 3.0, the sort method no longer takes such a parameter, and instead you control sorting using the key keyword argument. If you must reuse an old-style comparison function, functools.cmp_to_key will convert it into a key function.

Object-Oriented Programming

Classes were introduced earlier, but we treated them primarily as structs. As in C++ or Java, classes in Python may contain methods. These methods may either be defined when the class is created, or defined elsewhere and then assigned to the class object as a function pointer. Here are the two different ways to create methods:

class node:
  name = 'brad'
  visited = False
  def printName(self):
    print(self.name)

def printVisited(self):
  print("visited =", self.visited)

>>> node.printer = printVisited
>>> x = node()
>>> x.printer()
visited = False
Note that as with __init__, the first argument must be a pointer to the instance object. Also note that Python treats class and instance objects differently, so the function printVisited is considered a function object in node but a method object in x. If I tried to type:
>>> node.printer()
I would get an error message. However,
>>> node.printer(x)
works just fine.
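The difference between the function object and the method object can be seen directly by asking Python for their types. This is only a small sketch: the class Node and the function print_visited are stand-ins for the node and printVisited examples above, and the function returns its string rather than printing it so the result is easy to compare.

```python
class Node:
    visited = False

def print_visited(self):
    return "visited = " + str(self.visited)

Node.printer = print_visited    # attach the function to the class object
x = Node()

# Looked up on the class, printer is still a plain function ...
print(type(Node.printer).__name__)      # function
# ... but looked up on an instance, it is a method bound to that instance.
print(type(x.printer).__name__)         # method
print(x.printer.__self__ is x)          # True

# The bound method supplies x as the first argument automatically.
print(x.printer() == Node.printer(x))   # True
```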

Inheritance

You can define one or more superclasses for a new class by including the superclasses in parentheses:

class nameNode(node):   # nameNode is subclass of node
  pass                  # pass is a no-op for when I don't want to do anything

class nameNode(node, student):   # nameNode is subclass of both node and student
  pass
Name conflicts between superclasses are handled by searching the superclasses in the order in which they are listed and returning the first name found. (More precisely, Python follows the class's method resolution order, which for a simple hierarchy like this one is left to right.) Hence if visited is defined in both node and student, and visited is not stored locally in nameNode, then node's visited will be returned if we ask for nameNode's visited.
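The lookup order can be checked with a short experiment. The classes below are simplified stand-ins for node, student, and nameNode, with attribute values chosen only for illustration.

```python
class Node:
    visited = False

class Student:
    visited = True
    grade = 90

class NameNode(Node, Student):   # Node is listed first, so it is searched first
    pass

print(NameNode.visited)   # False, found in Node before Student is searched
print(NameNode.grade)     # 90, only Student defines grade

# __mro__ lists the classes in the order Python searches them.
print([cls.__name__ for cls in NameNode.__mro__])
# ['NameNode', 'Node', 'Student', 'object']
```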

Printing a Class

Ordinarily you will get a pointer reference if you try to print an instance of a class:

>>> x = node()
>>> print(x)
<__main__.node object at 0x1255750>
If you would like a pretty-printed representation of the class, you can override the __str__ method and have it return a string:
class node:
  name='brad'
  visited=False
  neighbors=[]
  def __str__(self):
    stringRepresentation = "name: " + self.name + "\n"
    stringRepresentation += "visited: " + str(self.visited) + "\n"
    stringRepresentation += "neighbors:\n"
    for neighbor in self.neighbors:
      stringRepresentation += "\t" + str(neighbor) + "\n"
    return stringRepresentation

>>> x = node()
>>> x.name='smiley'
>>> print(x)
name: smiley
visited: False
neighbors:

Files and Standard I/O

You have already seen the simple functions input for reading a single value from stdin and print for writing values to stdout. Python defines three file objects named sys.stdin, sys.stdout, and sys.stderr that allow you to access the conventional standard i/o streams. In order to manipulate these file objects, you need to read on and learn about file objects.

Python 3 implements file objects in its io module (Python 2 built them on top of C's stdio package). The following table shows the most commonly used operations for file objects:

Operation                  Meaning
open("filename")           Open a file for reading and return a file object.
open("filename", mode)     Open a file with the given mode and return a file object. The mode is one of the following strings:

  • "r": open a file for reading
  • "w": open a file for writing and clobber any previous file contents
  • "a": open a file and append the writes to any previous file contents
f.readline()               Read a single line of input. The line will be terminated with a '\n' character, except possibly the last line of the file. Returns the empty string '' when end of file is reached.
f.readlines()              Return all of the lines in the file as a list of strings. Each line will be terminated with a '\n' character.
f.write(s)                 Write the string s to the file. You must include the '\n' character if you want a newline.
f.writelines(list)         Write a list of strings to the file. You must terminate each string with a '\n' character if you want newlines.
f.close()                  Close the file.
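Here is a short sketch that exercises most of the operations in the table. The file name demo.txt and the temporary directory are made up for the example, and I use the with statement, which closes the file automatically at the end of the block, in place of explicit calls to close.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:          # "w" clobbers any previous contents
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

with open(path, "a") as f:          # "a" appends to the existing contents
    f.write("appended line\n")

with open(path) as f:               # the default mode is "r"
    first = f.readline()            # one line, including its '\n'
    rest = f.readlines()            # all of the remaining lines
    at_eof = f.readline()           # '' because nothing is left to read

print(repr(first))    # 'first line\n'
print(rest)           # ['second line\n', 'third line\n', 'appended line\n']
print(repr(at_eof))   # ''
```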

You can strip newline characters using the string's strip method, which removes leading and trailing whitespace. Here is an example of reading a line of input and stripping the newline character:

>>> my_file = open('/azure/homes/bvz/temp', 'r')
>>> line = my_file.readline()
>>> line
'Dear Dr. Tripathi, \n'
>>> line = line.strip()
>>> line
'Dear Dr. Tripathi,'

You now should know how to read input from stdin:

>>> import sys
>>> a = sys.stdin.readline()
reads the next line of standard input into a.
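To read every line rather than just the next one, you can call readline repeatedly until it returns the empty string. The sketch below assumes a helper function named read_stripped_lines, which is my own invention, and it uses io.StringIO as a stand-in for sys.stdin so that it runs without any typed input.

```python
import io

def read_stripped_lines(stream):
    """Return every line of stream with surrounding whitespace removed."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":          # readline returns '' only at end of file
            break
        lines.append(line.strip())
    return lines

# io.StringIO behaves like an open text file; in a real script you would
# import sys and call read_stripped_lines(sys.stdin) instead.
fake_stdin = io.StringIO("Dear Dr. Tripathi, \n\nSincerely,\n")
result = read_stripped_lines(fake_stdin)
print(result)   # ['Dear Dr. Tripathi,', '', 'Sincerely,']
```

Notice that a blank line in the input is read as '\n', not '', so it does not end the loop.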


String Operations and Regular Expressions

Once a string has been read into a variable, fields may be extracted from the string using either Python's string methods or its regular expression module. If the fields are separated by whitespace (spaces, tabs, and newline characters), then one can use the string's split method to extract fields. For example:

>>> line = '   Dear Dr.   Tripathi,\n'
>>> line.split()
['Dear', 'Dr.', 'Tripathi,']
Notice that split ignores leading and trailing whitespace and automatically strips newline characters.

You can also easily replace one substring with another substring using the replace method:

>>> line = 'brad is the professor and brad is also the exam grader'
>>> line.replace("brad","bvz")
'bvz is the professor and bvz is also the exam grader'
If you want to do more sophisticated splitting, such as splitting comma separated fields that may have whitespace around the commas, or you want to type check strings, or you want to extract information from a file, then you will want to use Python's regular expressions, which are much like Perl regular expressions. Regular expressions allow you to construct patterns that you want matched against a string. Python's reference manual on regular expressions provides an excellent description of its regular expression syntax. You should take a moment to read it before continuing here.

To use regular expressions, first import the regular expression module:

>>> import re
Next you will need to create the pattern which you wish to search for in your string. For example, suppose I want to type check a string to determine whether it is a date of the form mm/dd/yy, where the month and day may be one or two digits. Writing the pattern as a raw string (the r prefix), so that Python passes its backslashes to the re module unchanged, I could write either:
success = re.match(r'\d{1,2}/\d{1,2}/\d{2}', dateString)
   or
success = re.search(r'\d{1,2}/\d{1,2}/\d{2}', dateString)
The only difference between the two commands is that match requires the string to match the pattern starting at the beginning of the string, whereas search will search through the entire string to find a substring that matches the pattern. Both commands will return a match object if they succeed and None if they fail. When used in a boolean expression, None evaluates to false.
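One caveat when type checking: neither match nor search requires the match to extend to the end of the string, so a string with extra trailing characters still succeeds. The sketch below compares them with re.fullmatch, which succeeds only if the entire string matches the pattern. The sample strings are made up for illustration.

```python
import re

date_pattern = r'\d{1,2}/\d{1,2}/\d{2}'

# match must succeed at the start of the string; search may start anywhere.
print(re.match(date_pattern, "due 2/28/64"))              # None
print(re.search(date_pattern, "due 2/28/64").group(0))    # 2/28/64

# match accepts a four-digit year because it stops after '2/28/19'.
print(re.match(date_pattern, "2/28/1964").group(0))       # 2/28/19

# fullmatch requires the whole string to fit the pattern.
print(re.fullmatch(date_pattern, "2/28/1964"))            # None
print(re.fullmatch(date_pattern, "2/28/64") is not None)  # True
```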

You will often want to extract data from the substring that matches the pattern, and you can do so by placing parentheses around the parts of the pattern where you want to capture data. You can then access the data by calling the match object's group method and passing it the parameters 1 through n, where 1 through n represent each of the n parenthesized groups. The index 0 will return the entire substring that matches the pattern. For example:

>>> success = re.search(r'(\d{1,2})/(\d{1,2})/(\d{2})', "2/28/64")
>>> success.group(0)
'2/28/64'
>>> success.group(1)
'2'
>>> success.group(2)
'28'
>>> success.group(3)
'64'
Notice that I did not want to capture the slashes (/) and so I did not place parentheses around them.

I can also use regular expressions to split a string into substrings based on a pattern, and have the substrings returned as a list:

>>> re.split(r"\s+", "brad vander     zanden")
['brad', 'vander', 'zanden']
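I used "\s+" to soak up one or more white spaces between fields in the string. The split command provides an easy way to separate an input line into its constituent fields. Strip the line first, because otherwise the trailing newline produces an empty string at the end of the list:
import re, sys
line = sys.stdin.readline()
fields = re.split(r"\s+", line.strip())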
A final convenient thing you can do with regular expressions is to substitute one pattern for another in a string. For example, suppose I want to substitute all occurrences of "h1" with "h2" in a string. I could write:
>>> re.sub("h1", "h2", "<h1>brad the man</h1> and <h1>joe</h1>")
'<h2>brad the man</h2> and <h2>joe</h2>'
Of course I would not only change the tags but any occurrences of h1 in the content as well, but you get the picture.

You can also use parentheses to match data and then alter the data. For example, to convert a name of the form "firstname lastname" into a name of the form "lastname, firstname", one could write:

>>> re.sub(r'(\w+)\s+(.+)', r'\2, \1', "brad vander zanden")
'vander zanden, brad'
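Notice that the patterns had to be prefixed with the letter 'r', which makes them raw strings. If you fail to do this, Python converts '\2' and '\1' into the control characters with codes 2 and 1 before re.sub ever sees them, and you get garbage: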
>>> re.sub('(\w+)\s+(.+)', '\2, \1', "brad vander zanden")
'\x02, \x01'
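You can see what the 'r' prefix changes by comparing the two replacement strings directly. This is just a small sketch, and the variable names plain and raw are mine.

```python
plain = '\2, \1'     # Python turns \2 and \1 into single control characters
raw = r'\2, \1'      # the raw string keeps each backslash and digit

print(len(plain))                 # 4
print(len(raw))                   # 6
print(plain == '\x02, \x01')      # True
print(raw == '\\2, \\1')          # True, the same string written with escaped backslashes
```

The last line shows the alternative to a raw string, which is to double every backslash, but the raw string is much easier to read.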