Scripts and Utilities -- Python lecture

Brad Vander Zanden

Directory: /home/cs494/notes/Python

This file: http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Python/lecture.html

Lecture links: http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Python/links.html

Python manuals: http://tecfa.unige.ch/guides/python (This lecture is taken from the Python tutorial manual)

Email questions and answers

Why is Python Great?

Python is an interpreted language that looks a lot like C. Why is Python so great? Well:

- It can be used as a scripting language, like Awk or Perl. However, Python has more general data types than either Awk or Perl and hence is applicable to a much broader range of problems. Python also reads like a conventional programming language, so someone with C experience can usually look at a Python program and figure out what it is doing, which is often not the case for Awk or Perl programs.
- Python is interpreted, which can save you considerable time during program development because there is no compile-and-link step. Python also has an interactive command-line interpreter, so you can try out language features and perform quick computations.
- Python programs can be developed much more rapidly than C programs, and they are more concise and easier to read, because:
  - Python contains very high-level data types, such as lists and hash tables (called dictionaries), that allow operations requiring long sequences of C code to be expressed in single statements in Python.
  - Python groups statements using indentation rather than begin/end braces. Hence, you do not encounter those mysterious C errors in which you have indented the code the way you intend it to run, but because of badly placed braces, it runs some other way.
  - Python does not require variable, argument, or type declarations.
  - Python has some neat syntactic tricks that allow you to do incredibly useful things to lists and strings.
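As a small illustration of the point about high-level data types, here is a word-frequency count sketched in modern Python 3 (an assumption: the rest of these notes use Python 1.4, whose syntax differs). The sample text is made up. In C the same task would need a hand-written hash table or a sorted array of strings.

```python
text = 'the cat saw the other cat'

# A dictionary maps each word to the number of times it occurs
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1   # get() supplies 0 for a new word

print(counts['cat'], counts['the'], counts['saw'])   # 2 2 1

# Listing the distinct words in sorted order is a single expression
print(sorted(counts))   # ['cat', 'other', 'saw', 'the']
```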

Getting Started With Python

To try out Python, type python at the Unix prompt. You will see something like:

Python 1.4 (Jun 30 1997)  [GCC 2.7.2.1]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>>

Learning about Python

The developer of Python, Guido van Rossum, has written an excellent introductory tutorial to Python. Rather than try to regurgitate it in the lecture notes, I will just point you to the appropriate link, http://tecfa.unige.ch/guides/python/tut.

In these lecture notes, I will concentrate on some of the really neat things that I've found I can do in Python:

Using Lists as Arrays

Lists are probably Python's most useful compound data structure. Python does not provide a general array data structure (the array module provides specialized arrays, but only for basic numeric and character types), but a list can be subscripted just like an array, so I always use lists to simulate arrays. Using lists to simulate arrays may not be the most efficient approach, but I am interested in rapid prototyping and program development, not efficiency.

Lists are written as a list of comma-separated elements enclosed in square brackets:

>>> a = [10, 20, 'brad', 'knoxville']
>>> a
[10, 20, 'brad', 'knoxville']
>>> a[1]
20
>>> a[-1]   # a negative subscript starts from the end of the list
'knoxville'
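A sketch of how lists stand in for one- and two-dimensional arrays, in Python 3 syntax (an assumption; these notes use Python 1.4). The sizes are arbitrary. The main thing to watch for is that a two-dimensional array needs a separate list object for each row.

```python
n = 5

# A one-dimensional "array" of n zeros
counts = [0] * n
counts[2] = 7
print(counts, len(counts))   # [0, 0, 7, 0, 0] 5

# A 3 x 4 two-dimensional "array": build each row as its own list
rows, cols = 3, 4
grid = [[0] * cols for _ in range(rows)]
grid[1][2] = 9
print(grid)     # [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0]]

# Pitfall: this repeats ONE row object three times, so all rows change together
shared = [[0] * cols] * rows
shared[1][2] = 9
print(shared)   # [[0, 0, 9, 0], [0, 0, 9, 0], [0, 0, 9, 0]]

# Unlike a fixed-size C array, a list can grow on demand
counts.append(1)
print(len(counts))   # 6
```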

A really neat thing about lists is that I can reference a sublist using a slicing mechanism. A slice is written as two indices separated by a colon; it includes the element at the first index but not the element at the second:

>>> a[1:3]
[20, 'brad']

If you omit the first index, it defaults to 0. If you omit the second index, it defaults to the length of the list:

>>> a[:2]
[10, 20]
>>> a[2:]
['brad', 'knoxville']

This slice notation makes it easier to pass portions of arrays to functions. If I am writing the partition routine for quicksort and I want to pass the portion of the array from i to j (including i but not including j), I can write:

partition(a[i:j])

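When the array arrives in partition, I can start my subscripts at 0, and I can find the end of the array using the len function. Hence I do not need to pass in the subscripts i or j. The ability to handle a subarray as a normal array with subscripts beginning at 0 makes the partition code easier to write and easier to read. One caveat: a slice is a new list (a shallow copy of that portion of a), so any rearranging that partition does happens in the copy, not in a itself. To keep the changes, the caller must copy the rearranged elements back into a, for example with a slice assignment of the form a[i:j] = new_elements.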
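The following sketch fills in the quicksort idea in Python 3 (an assumption; the notes use Python 1.4). The partition function here is hypothetical: it uses the Lomuto scheme with the last element as the pivot and returns both the rearranged slice and the pivot's position within it. Because a slice is a copy, the caller writes the result back into the original list with slice assignment. Copying slices costs extra time and memory, which fits the notes' preference for clarity over efficiency.

```python
def partition(seg):
    """Hypothetical Lomuto partition of seg around its last element.

    Rearranges seg in place and returns (seg, p), where p is the pivot's
    final position in seg; subscripts start at 0 whatever slice seg came from.
    """
    pivot = seg[-1]
    store = 0                                  # next slot for an element < pivot
    for k in range(len(seg) - 1):
        if seg[k] < pivot:
            seg[k], seg[store] = seg[store], seg[k]
            store += 1
    seg[-1], seg[store] = seg[store], seg[-1]  # move the pivot into place
    return seg, store

def quicksort(a, i=0, j=None):
    """Sort the portion a[i:j] of list a in place."""
    if j is None:
        j = len(a)
    if j - i <= 1:
        return
    seg, p = partition(a[i:j])   # a[i:j] is a new list, so partition sees a copy
    a[i:j] = seg                 # slice assignment copies the result back into a
    quicksort(a, i, i + p)       # elements before the pivot
    quicksort(a, i + p + 1, j)   # elements after the pivot

data = [5, 3, 8, 1, 9, 2]
quicksort(data)
print(data)   # [1, 2, 3, 5, 8, 9]
```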

Simplified Boolean Expressions

In C, one is forever using the && and || operators to construct boolean expressions. Often these operators obscure relatively simple conditions. For example, if I want to test whether a grade is greater than 90 and at most 100, I have to write:

if ((grade > 90) && (grade <= 100))

Python allows the programmer to chain comparison operators. Consequently, I can write the above expression more elegantly and concisely in Python as:

if (90 < grade <= 100):

I would argue that the above conditional is far easier to read and understand than the corresponding C conditional.
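A chained comparison such as a < b <= c means a < b and b <= c, except that b is evaluated only once. The sketch below (Python 3; the letter-grade cutoffs are invented for illustration, and grades are assumed to lie between 0 and 100) uses one chained comparison per range.

```python
def letter_grade(grade):
    """Map a numeric grade to a letter (illustrative cutoffs only)."""
    if 90 < grade <= 100:      # same as (90 < grade) and (grade <= 100)
        return 'A'
    elif 80 < grade <= 90:
        return 'B'
    elif 70 < grade <= 80:
        return 'C'
    elif 60 < grade <= 70:
        return 'D'
    else:
        return 'F'

print(letter_grade(95), letter_grade(90), letter_grade(61), letter_grade(42))
# A B D F
```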

Python can also make or conditions easier to write and understand. Suppose I want to determine whether the variable x is equal to one of the integers 10, 20, or 30. In C I would have to write:

if ((x == 10) || (x == 20) || (x == 30))

Python provides the convenient in operator, which tests whether a value is an element of a list or tuple. Hence, I can write the equivalent conditional in Python as:

if x in [10, 20, 30]:

Again, I would argue that the intent of the latter conditional is more evident than the intent of the previous conditional.
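A Python 3 sketch of membership tests (Python 3 is an assumption, and the function names are hypothetical). A tuple works just as well as a list for a fixed set of values, and not in gives the negated test.

```python
def is_special(x):
    # True exactly when x equals 10, 20, or 30
    return x in [10, 20, 30]

def is_weekend(day):
    # Membership in a tuple of fixed values
    return day in ('Sat', 'Sun')

print(is_special(20), is_special(25))                   # True False
print(is_weekend('Sun'), 'Mon' not in ('Sat', 'Sun'))   # True True
```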

Iterating over Sets and Lists

A lot of algorithms that you see in textbooks require that you process every element in a set or list. Python's for statement iterates over the items in a sequence and hence makes it easy to mimic the set and list iterations one finds in a textbook. For example, a depth-first search of a graph starts from a node and visits each of its neighbors in a depth-first fashion. If we assume that each node contains a visited field and a neighbors field, which is a list of neighbors, then a depth-first search procedure can be written succinctly in Python as follows:

true = 1
false = 0

def dfs(node):
    node.visited = true
    for neighbor in node.neighbors:
        if neighbor.visited == false:
            dfs(neighbor)

It would be hard to write an equally succinct and readable depth-first search procedure in C.
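To try the dfs procedure on an actual graph, here is a self-contained Python 3 sketch. The Node class, its name field, and the extra order parameter (which records the visit order) are hypothetical additions for illustration, and Python 3's built-in True and False replace the true and false constants defined above.

```python
class Node:
    """Hypothetical node with the visited and neighbors fields dfs expects."""
    def __init__(self, name):
        self.name = name
        self.visited = False
        self.neighbors = []

def dfs(node, order):
    """Depth-first search; appends each node's name to order when first visited."""
    node.visited = True
    order.append(node.name)
    for neighbor in node.neighbors:
        if not neighbor.visited:
            dfs(neighbor, order)

# A small directed graph: a -> b, a -> c, b -> d, c -> d
a, b, c, d = Node('a'), Node('b'), Node('c'), Node('d')
a.neighbors = [b, c]
b.neighbors = [d]
c.neighbors = [d]

order = []
dfs(a, order)
print(order)   # ['a', 'b', 'd', 'c']
```

Note that d is visited only once even though two nodes point to it; the visited flag is what prevents the second visit.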

Mimicking Structs

Python does not have the C equivalent of struct, but it does support classes, and a class can be used to mimic a struct. Unlike a C struct, a class can have functions associated with it (called methods), including one that initializes each instance of the class when it is allocated. For example, suppose we want to define the node data structure for the depth-first search described above. In C the declaration might look as follows:

struct node {
    int visited;
    struct neighbor_list *neighbors;
};

In Python, one would define a node class. One could just write:

class node:
    pass

You do not have to declare the fields of a node when you define the node class, since attributes can be added to a node object dynamically, at any time. Hence one could write:

x = node()
x.visited = false
x.neighbors = []

The above code illustrates how one often wants to initialize the fields of a record to certain values. Python allows you to define a method called __init__ that is called automatically when an instance of a class is allocated. The __init__ method must be defined inside the class and must take at least one argument, which is a reference to the instance being allocated. By convention this argument is called self (for those of you who know C++, self plays the role of the this pointer, except that it is declared explicitly). For example, one could write:

class node:
    def __init__(self):
        self.visited = false
        self.neighbors = []

Since __init__ is automatically invoked when an instance of node is created, one could now allocate and initialize an instance of node using the statement:

x = node()
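Why initialize fields in __init__ rather than listing them in the class body? A variable assigned in the class body is a class attribute, shared by every instance, while an attribute assigned through self belongs to one instance. The Python 3 sketch below (the Node class, its name parameter, and the shared_neighbors attribute are hypothetical) shows the difference for a mutable list.

```python
class Node:
    # Class attribute: one list object shared by every Node
    shared_neighbors = []

    def __init__(self, name):
        # Instance attributes: each Node gets its own values
        self.name = name
        self.visited = False
        self.neighbors = []

x = Node('x')
y = Node('y')
x.neighbors.append(y)
x.shared_neighbors.append(y)   # modifies the list that y sees too

print(len(x.neighbors), len(y.neighbors))                # 1 0
print(len(x.shared_neighbors), len(y.shared_neighbors))  # 1 1
```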

Else Clauses on Loops

Loop statements may have an else clause, which is executed when the loop ends normally: when a for loop exhausts its list, or when the condition of a while loop becomes false. The else clause is not executed if a break causes a premature exit from the loop. Combining a break statement with an else clause lets one avoid the inelegant code often associated with setting and testing a flag variable. For example, suppose I want to write a naive procedure that determines whether a number is prime. If the number is prime, the procedure should print that the number is prime. If the number is not prime, the procedure should print a factorization of it using its smallest factor. In C, the procedure might look as follows:

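#include <stdio.h>

/* Print either a factorization of n or a message that n is prime. */
void is_prime(int n) {
    int i, prime_flag;

    prime_flag = 1;
    for (i = 2; (i < n) && (prime_flag == 1); i++) {
        if ((n % i) == 0) {
            printf("%d equals %d * %d\n", n, i, n / i);
            prime_flag = 0;    /* found a factor: the loop test stops the loop */
        }
    }
    if (prime_flag == 1)
        printf("%d is a prime number\n", n);
}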

In Python the function could be written more succinctly and elegantly as:

def is_prime(n):
    for i in range(2, n):
        if n % i == 0:
            print n, 'equals', i, '*', n/i
            break
    else:
        print n, 'is a prime number'
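The while form of the else clause works the same way. In this Python 3 sketch (the function and its -1 "not found" convention are assumptions for illustration), the else branch runs only when the loop condition becomes false without a break, which includes the case of an empty list.

```python
def find_first_negative(values):
    """Return the index of the first negative number in values, or -1."""
    i = 0
    while i < len(values):
        if values[i] < 0:
            break          # found one: the else clause is skipped
        i += 1
    else:
        return -1          # condition became false: no negative number
    return i

print(find_first_negative([3, -1, 4]))   # 1
print(find_first_negative([1, 2]))       # -1
print(find_first_negative([]))           # -1
```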

Reading from a File or Stdin

Python does not have the C equivalent of scanf. However, it does provide a way to read lines from a file or stdin, and then to extract information from these strings using regular expressions. Consequently, Python has a more powerful input and scanning mechanism than C, but it is somewhat more difficult to learn.

Files

Python supports a file object, which is described in section 2.1.7 of the Python reference manual. File objects are implemented using C's stdio package. Perhaps the most useful method is readline(), which returns the next line in a file as a string. For example:

>>> my_file = open('/azure/homes/bvz/temp', 'r')
>>> line = my_file.readline()
>>> line
'Dear Dr. Tripathi, \012'

Note that the newline character is included in the string; this version of Python displays it as the octal escape \012.
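Because readline() keeps the newline, a blank line comes back as '\n' and only the end of the file comes back as the empty string ''. The Python 3 sketch below (Python 3 and the read_lines helper are assumptions; the temporary file and its contents are made up) relies on that distinction to read a whole file and strip the newlines.

```python
import os
import tempfile

def read_lines(path):
    """Return the lines of the file at path with trailing newlines removed."""
    lines = []
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if line == '':                     # '' means end of file
                break
            lines.append(line.rstrip('\n'))    # a blank line arrives as '\n'
    return lines

# Demo on a throwaway file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('Dear Dr. Tripathi, \n\nThanks.\n')
result = read_lines(path)
os.remove(path)
print(result)   # ['Dear Dr. Tripathi, ', '', 'Thanks.']
```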

Standard Input

The stdin file object may be found in sys.stdin, so you must first import the sys module. For example:

>>> import sys
>>> a = sys.stdin.readline()

reads the next line of standard input into a.
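As a stand-in for a simple scanf loop, the Python 3 sketch below (Python 3 and the sum_numbers function are assumptions) reads one integer per line and adds them up. It accepts any file-like object so that it can be tried without typing at standard input; io.StringIO plays that role in the demo.

```python
import io
import sys

def sum_numbers(stream=None):
    """Add up one integer per line read from stream (standard input by default).

    Blank lines are skipped; int() raises ValueError on any other line
    that is not an integer.
    """
    if stream is None:
        stream = sys.stdin          # looked up at call time, not definition time
    total = 0
    for line in stream:             # iterating over a file object yields its lines
        line = line.strip()         # drop the newline and surrounding blanks
        if line:
            total += int(line)
    return total

print(sum_numbers(io.StringIO('10\n20\n\n12\n')))   # 42
```

Called with no argument, sum_numbers() reads sys.stdin instead.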

Regular Expressions and String Operations

Once a string has been read into a variable, fields may be extracted from the string using either Python's string module or its regular expression module. If the fields are separated by blanks, commas, etc., then one can simply use the find function in the string module to extract fields (see section 4.1 of the Python reference manual). For example:

>>> import string
>>> blank_position = string.find(line, ' ')
>>> field1 = line[0 : blank_position]
>>> field1
'Dear'
>>> old_blank_position = blank_position
>>> blank_position = string.find(line, ' ', blank_position+1)
>>> field2 = line[old_blank_position+1 : blank_position]
>>> field2
'Dr.'
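In later versions of Python, the string module's functions became methods on string objects. Here is a Python 3 sketch (an assumption: the transcript above is Python 1.4) of the same field extraction, followed by split(), which handles the common case of whitespace- or comma-separated fields in one call. The comma-separated record is made up.

```python
line = 'Dear Dr. Tripathi, \n'

# The same two fields as above, using the str.find method
blank = line.find(' ')
field1 = line[:blank]
next_blank = line.find(' ', blank + 1)
field2 = line[blank + 1:next_blank]
print(field1, field2)       # Dear Dr.

# split() with no argument breaks the line on runs of whitespace
# (including the trailing newline) and returns all the fields at once
print(line.split())         # ['Dear', 'Dr.', 'Tripathi,']

# For comma-separated input, pass the separator explicitly
record = 'Vander Zanden,Brad,Knoxville'
print(record.split(','))    # ['Vander Zanden', 'Brad', 'Knoxville']
```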

If your input is more complicated, you will need to use Python's regular expressions, which, unfortunately, are like Emacs regular expressions rather than Unix regular expressions. You will need to read sections 4.2 and 4.3 of the Python reference manual and know something about Emacs to use this facility.
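Later Python releases introduced the re module, which uses Perl-style syntax and is the standard regular expression module in Python 3. Assuming Python 3, the sketch below does roughly what scanf("%s %d", ...) does for one line: it accepts a non-blank name followed by an integer. The pattern and the parse_score function are illustrative, not taken from the manual.

```python
import re

# A non-blank name, at least one blank, then an optionally negative integer
LINE_PATTERN = re.compile(r'\s*(\S+)\s+(-?\d+)\s*$')

def parse_score(line):
    """Return (name, score) parsed from line, or None if it does not match."""
    m = LINE_PATTERN.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

print(parse_score('brad 95\n'))       # ('brad', 95)
print(parse_score('no score here'))   # None
```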