Perl: Web Stuff


These notes discuss two topics:

  1. How to get your cgi scripts to run for labs and common errors that you may make in trying to get them to run, and
  2. How to keep track of state information during a session with a user. A session is loosely defined as one or more web interactions with a browser that have a common goal, such as completing a survey or an e-commerce transaction.


Getting CGI Scripts to Run at UTK

Here are some detailed directions to use for storing and testing your Perl .cgi scripts:

  1. Create a cgi-bin subdirectory in your www-home directory
  2. Put your perl .cgi scripts in the cgi-bin directory. Make sure they end with the extension .cgi rather than .pl.
  3. Make the scripts world readable and executable. The easiest way to do that is to use "chmod 0755 filename.cgi" where filename is the filename of your .cgi script.
  4. Make sure the chain of directories leading to your cgi scripts is world readable and executable, including your home directory, your www-home directory, and your cgi-bin directory.
  5. The URL for reaching your scripts should be "http://web.eecs.utk.edu/~your_user_id/cgi-bin/filename.cgi". You will put this URL in the action attribute of your form element.
  6. You may also need to access your initial web page (i.e., icecream.html), by typing in web.eecs.utk.edu/ and the path to your icecream.html file.
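
One way to confirm the setup before involving your lab code is a tiny script that prints a fixed page. The sketch below is only an illustration and is not part of the lab: the file name test.cgi and the helper test_page are made up here, and the script prints the Content-type line by hand so that it depends on nothing but the server setup.

```perl
#!/usr/bin/perl
# test.cgi (hypothetical name) -- smallest possible page for checking the setup
use strict;
use warnings;

# Build the complete response: a Content-type line, a blank line, then the page.
sub test_page {
    my ($message) = @_;
    return "Content-type: text/html\n\n"
         . "<html><head><title>CGI test</title></head>\n"
         . "<body><p>$message</p></body></html>\n";
}

print test_page("cgi-bin is set up correctly");
```

If this script is saved in cgi-bin, made executable with chmod 0755, and reached through the URL form given in step 5, the browser should show the one-line message. If it shows a server error instead, the problem is more likely in the setup than in your own code.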

Debugging your CGI Scripts

Here are a number of frequently encountered errors that can prevent your .cgi scripts from working properly:

  1. forgetting to put #!/usr/bin/perl at the top of the .cgi file: The server does not magically know how to run your program. In particular, a .cgi file could just as easily be a php file as a perl file so it cannot even guess what interpreter to run based on the file extension. If you do not put the above line at the top of your program, then our server will treat your program as a shell script and you will get a server error.
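  2. forgetting to call the header, start_html, and end_html functions: the header function prints the Content-type line that must come before any other output. If you do not call it, the server will not know how to interpret the text it receives from your script and it will generate a server error. Omitting start_html or end_html does not usually cause a server error, but it leaves the browser with an incomplete HTML page.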
  3. permissions not set properly on root (i.e., ~username), www-home, or cgi-bin: It is not enough to make your cgi scripts world readable and executable. You must make sure that the chain of directories that are followed to get to your cgi scripts are world readable and executable. This means your top-level directory, www-home, and cgi-bin directories must be world readable and executable.
  4. not using a .cgi extension. Our servers won't accept .pl or .perl extensions.
  5. not putting your cgi scripts in ~username/www-home/cgi-bin. Putting your cgi scripts anywhere else will cause a server error.
  6. not using web.eecs.utk.edu as your URL (www does not work)
  7. syntax errors: This one is a surprisingly common cause of server errors. The browser will not tell you that the cause is a simple syntax error in your program. You can check this one by running your cgi script from the command line using "perl filename.cgi", or by using "perl -c filename.cgi" to check the syntax without running the script.
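
The permission problems in item 3 can be checked from the command line. The following is a rough sketch rather than an official tool: the helper name world_rx is invented for this example, and the directory names are the ones assumed in the directions above (home directory, www-home, cgi-bin).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if "other" users have both read and execute permission on $path.
# A path that does not exist counts as a failure.
sub world_rx {
    my ($path) = @_;
    my @st = stat($path);
    return 0 unless @st;
    my $mode = $st[2] & 07777;          # keep only the permission bits
    return ( $mode & 0005 ) == 0005;    # r (4) and x (1) for "other"
}

# Assumed layout: ~/www-home/cgi-bin
my $home = $ENV{HOME} // '.';
for my $path ( $home, "$home/www-home", "$home/www-home/cgi-bin" ) {
    printf "%-40s %s\n", $path,
        world_rx($path) ? "ok" : "NOT world readable and executable (or missing)";
}
```

Any line that is not marked ok points at a directory that needs "chmod 0755" (or at a directory that has not been created yet).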

Maintaining State Information in Web Interactions

We've seen how Perl can be used to handle simple CGI forms. More complicated forms may span several pages and require the maintenance of state information. For example, a survey might take several pages, with a Next button being used to advance to each page of the survey. As another example, an e-commerce site might allow you to add items to a basket and then take you through several forms during the checkout process.

The problem that multi-page forms pose is that the HTTP protocol is not designed with the notion of a session, and hence it is not possible to keep a script running as a user moves through a multi-page form. Instead HTTP is designed so that each page interaction starts a new script. Even if the same script is called by a form, it will be a new invocation of the script rather than a continuation of an earlier one. The problem then is for the set of scripts associated with a particular multi-page form to maintain state information.

There are both client-side and server-side approaches to saving this state information. The easiest client-side approaches involve either hidden fields or cookies. The easiest server-side approaches involve maintaining a session id, storing state information in a file or database, and using the session id to access this information.

Client-Side State Management

We have already seen how hidden fields can encode information on a form. The basic idea is that information is cumulatively built up and stored in hidden fields on the form that the user is currently interacting with. When a script is activated it can access the hidden fields and restore any state information it needs.

For example, suppose at an e-commerce site that a user starts adding items to a shopping cart. Each item that is added to the cart can be concatenated to the end of a string, along with its quantity. For example, if a user has selected 2 tubes of suntan lotion with stock code 368 and 1 red polo shirt with stock code 5831, a script could create the following hidden field and insert it into subsequent forms:

<input type="hidden" name="cart" value="368 2 5831 1" />

A Perl script can easily break this string into an array and then initialize a hash table with the array elements to determine what the user ordered.
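
A minimal sketch of that step follows. The helper names parse_cart and format_cart are made up for this illustration; in a real script the string would come from param("cart") rather than from a literal.

```perl
use strict;
use warnings;

# Turn "code qty code qty ..." into a hash that maps stock code => quantity.
sub parse_cart {
    my ($value) = @_;
    my @fields = split ' ', $value;
    die "cart string must contain code/quantity pairs\n" if @fields % 2;
    my %cart = @fields;
    return %cart;
}

# Turn the hash back into the string that is stored in the hidden field.
sub format_cart {
    my (%cart) = @_;
    return join ' ', map { "$_ $cart{$_}" } sort { $a <=> $b } keys %cart;
}

my %cart = parse_cart("368 2 5831 1");
$cart{368}++;                         # the user adds one more tube of lotion
print format_cart(%cart), "\n";       # prints "368 3 5831 1"
```

The updated string is what the next script would write into the hidden field of the next form.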

Hidden values have a number of disadvantages:

  1. The hidden values can be easily changed by a malicious user. The user can simply store the page source, modify the hidden value fields, and then re-load the html file. Thus a user could change the number of polo shirts ordered from 1 to 100 without changing the price that is charged.

  2. If the information to be saved is large, then the page could take a while to load.

  3. If the user goes to a static part of the web-site then the hidden information will get lost.

A second approach involves using cookies. A cookie is essentially a name-value pair that gets stored on the user's computer and that can be accessed by specified scripts. A cookie can be created using a Perl CGI function named cookie. For example:

$cart = cookie( -name => "cart",
                -value => "368 2 5831 1" );

The parameters are named parameters and hence can be presented in any order to the function.

You can now transmit the cookie to the browser using the header function:

print header(-cookie => $cart);

You must transmit all cookies to the browser before transmitting any HTML markup to the browser. In other words, create and transmit all your cookies before calling start_html.

You can transmit multiple cookies using an anonymous array:

print header(-cookie => [$cart, $prices]);

A script can also retrieve the cookie using the cookie function:

$cart_value = cookie("cart");  # returns the value of cart

The cookie name may be omitted, in which case the cookie function returns the names of all the cookies that the browser sent with the request:

@cookie_names = cookie();    # returns the names of all cookies sent by the browser

When cookies are created in the manner just shown, they are accessible by any script stored on the server, not just the script that created the cookie. Hence, a script associated with one form can store information in a cookie and a script associated with another form can subsequently retrieve the cookie.
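
To make the retrieval step less mysterious, here is a rough sketch of the information the cookie function works from. The browser sends its cookies as "name=value" pairs separated by semicolons, and a CGI script sees them in the HTTP_COOKIE environment variable. The helper name parse_cookie_header and the sample string are assumptions made for this illustration; the real cookie function handles more cases than this sketch does.

```perl
use strict;
use warnings;

# Split a Cookie header such as "cart=368%202%205831%201; color=yellow"
# into a hash of name => value, undoing the %XX escapes in each value.
sub parse_cookie_header {
    my ($header) = @_;
    my %cookies;
    for my $pair ( split /;\s*/, $header // '' ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && length $name && defined $value;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        $cookies{$name} = $value;
    }
    return %cookies;
}

# Use the real header when run as a CGI script, otherwise a sample string.
my %cookies = parse_cookie_header(
    $ENV{HTTP_COOKIE} // 'cart=368%202%205831%201; color=yellow' );
print "$_ = $cookies{$_}\n" for sort keys %cookies;
```

Run from the command line with no HTTP_COOKIE set, the sample string yields a cart cookie with the value "368 2 5831 1" and a color cookie with the value "yellow".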

Cookies created in the manner just shown also exist for the life of the browser session and are then deleted when the user quits the browser. To extend the life of a cookie across browser sessions, one can provide the -expires parameter to the cookie function:

$cart = cookie( -name => "cart",
                -value => "368 2 5831 1",
		-expires => "+7d" );  # create a cookie that lasts a week

The -expires parameter is a request to the browser, not an absolute command. The browser can choose to delete the cookie for any number of reasons, such as limited cookie space or a clearing of the cookie cache by the user. Options for the -expires parameter include the following:

Time Period      Example                             Meaning
Seconds          +30s                                Save for 30 seconds
Minutes          +30m                                Save for 30 minutes
Hours            +30h                                Save for 30 hours
Days             +30d                                Save for 30 days
Months           +30M                                Save for 30 months
Years            +30y                                Save for 30 years
Specific Time    Sunday, 03-Feb-2008 06:00:00 GMT    Format: Day, DD-Mon-YYYY HH:MM:SS GMT
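
The relative forms are easy to confuse, especially lowercase m (minutes) and uppercase M (months). The sketch below shows how such a string can be turned into a number of seconds. It is an illustration only: the helper name expires_offset is invented, and it assumes that a month counts as 30 days and a year as 365 days.

```perl
use strict;
use warnings;

# Seconds per unit letter. Assumption: 1 month = 30 days, 1 year = 365 days.
my %SECONDS = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 24 * 60 * 60,
    M => 30 * 24 * 60 * 60,
    y => 365 * 24 * 60 * 60,
);

# Convert a relative expiration such as "+7d" into an offset in seconds.
# Returns undef for anything that is not in the relative format.
sub expires_offset {
    my ($spec) = @_;
    return unless $spec =~ /^([+-]?\d+)([smhdMy])$/;
    return $1 * $SECONDS{$2};
}

printf "%-5s %d seconds\n", $_, expires_offset($_) for qw(+7d +30m +30M);
# +7d  is 604800 seconds
# +30m is 1800 seconds (30 minutes)
# +30M is 77760000 seconds (30 months of 30 days each)
```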

You can also restrict the scripts on the server that have access to the cookie by specifying the -path parameter to the cookie function. For example:

# restrict the cookie to scripts residing in the /books directory
$cart = cookie( -name => "cart",
                -value => "368 2 5831 1",
		-path => '/books' ); 
# restrict the cookie to this particular script. It does not restrict access
# however to only this invocation of the script. Any invocation of this
# script will have access to the cookie
$cart = cookie( ...
                -path => script_name() ); 
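
The browser matches the path against the beginning of the path portion of the requested URL, not against a directory on disk. For example, if the -path parameter is '/books', then the cookie is sent only with requests whose URL path begins with /books. My cgi directory is ~bvz/www-home/cgi-bin/ and its scripts are reached through URLs whose path begins with /~bvz/cgi-bin/, so to restrict a cookie to the scripts stored in ~bvz/www-home/cgi-bin/books the -path parameter would have to be '/~bvz/cgi-bin/books'.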


An Example

The web-page cookie-demo gives you an opportunity to create a simple cookie. Enter a color and then press the "Submit Form" button. A script called cookie1.cgi creates a cookie and then dynamically generates a web-page that gives information about the cookie. If you press the Press Me button a script called cookie2.cgi retrieves the cookie and prints its value. The two scripts are shown below:


#!/usr/bin/perl
# cookie1.cgi -- create the cookie

use CGI ':standard', '-debug';

$color = param("color");
$color_cookie = cookie( -name => 'color',
                        -value => $color,
                        -expires => "+4h");

# transmit all cookies to the page before sending any data to the page
print header(-cookie=>$color_cookie);

# now start sending data to the page
print start_html( -title => "Form 2" );

# The <<NAME; idiom allows me to transmit a multi-line text string without
# having to put quotes around each line
print<<START_FORM2;
<b>You created a cookie called $color_cookie</b>
<form method="post" action="books/cookie2.cgi">
<input type="submit" value="Press Me" />
<input type="hidden" name="repeat" value="yes" />
</form>
START_FORM2

# close the page
print end_html();


#!/usr/bin/perl
# cookie2.cgi -- use the cookie

use CGI ':standard', '-debug';

print header();
print start_html( -title => "Form 3" );

# retrieve the cookie
$color = cookie("color");

print<<START_FORM3;
<b>Your favorite color is $color</b>
START_FORM3

print end_html();

The cookie created by cookie1.cgi and stored on my computer looks as follows (the cookie is stored using XML markup, which we will discuss later in this course):

<dict>
    <key>Created</key>
    <real>223495191.341405</real>
    <key>Domain</key>
    <string>www.cs.utk.edu</string>
    <key>Expires</key>
    <date>2008-01-31T21:59:51Z</date>
    <key>Name</key>
    <string>color</string>
    <key>Path</key>
    <string>/</string>
    <key>Value</key>
    <string>yellow</string>
</dict>

Sizing Up Cookies

Cookies have the advantage of requiring less time to load a web page because some of the information the web page requires is already stored on the client's computer. Unfortunately cookies still have a number of disadvantages including:

  1. They are typically stored in a plain text file where a user can edit them, thus compromising the integrity of the cookie's data.
  2. The browser can erase them at any time, even if the server has requested that they be retained for a specific amount of time.
  3. Many users disable them because of privacy issues. Cookies can be used to see which parts of a web-site have been explored by a user, and many users view the collection of this information as a violation of their privacy.

The obvious approach would be to store the state information in a file. Unfortunately, browsers do not


Server-Side State Management

Server-side state management involves assigning a browser a session id, storing the session information in a file or database, and using that session id as a key to access the information. The session id will typically be stored on the client side in either a cookie, or, if cookies are disabled, using a fairly complicated technique known as URL encoding or URL re-writing. URL encoding is beyond the scope of this course but typically involves embedding a session id with each link in pages that are served to the user's browser. In other words, the URL for each link will have a session id associated with it. When the user clicks the link, the URL is sent to the server and the server can extract the session id. Many companies have developed tools that make URL re-writing easier to perform, but it is a poor second choice to using cookies.
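
The file-based version of this idea can be sketched in a few lines. Everything named here is an assumption made for illustration: the helpers new_session_id, session_file, save_session, and load_session are invented, the state is kept in one file per session using the Storable module, and the session id is generated in a simple way that would not be strong enough for a production site. In a real script the id would travel to and from the browser in a cookie, as described above.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Spec;
use File::Temp qw(tempdir);
use Digest::SHA qw(sha256_hex);

# Make a 64-character hexadecimal session id (illustrative, not secure).
sub new_session_id {
    return sha256_hex( join '|', time, $$, rand() );
}

# Map a session id to its file, rejecting ids that are not 64 hex digits
# so that an id taken from a cookie can never name some other file.
sub session_file {
    my ( $dir, $id ) = @_;
    die "bad session id\n" unless $id =~ /^[0-9a-f]{64}$/;
    return File::Spec->catfile( $dir, "sess_$id" );
}

# Write the state (a hash reference) to the session's file.
sub save_session {
    my ( $dir, $id, $state ) = @_;
    store( $state, session_file( $dir, $id ) );
    return;
}

# Read the state back, or return an empty hash for an unknown session.
sub load_session {
    my ( $dir, $id ) = @_;
    my $file = session_file( $dir, $id );
    return -e $file ? retrieve($file) : {};
}

# Demonstration using a temporary directory in place of a real session store.
my $dir = tempdir( CLEANUP => 1 );
my $id  = new_session_id();
save_session( $dir, $id, { cart => { 368 => 2, 5831 => 1 } } );
my $state = load_session( $dir, $id );
print "item 368: quantity $state->{cart}{368}\n";    # prints quantity 2
```

Because only the session id is stored on the client, the user cannot edit the cart contents directly, which addresses the tampering problem that hidden fields and plain cookies share.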


High-Level Support for Sessions

These notes have provided a brief introduction to the basic concepts used in maintaining state information during a session. Rather than implementing these concepts from scratch, it may be easier to use one of the high-level Perl modules that have been developed over the years to deal with session variables, cookies, or hidden data. If you look at the Comprehensive Perl Archive Network (CPAN) site, you will find different modules that may accomplish your intended task.