CS 365


CS 365 - Programming Languages
Spring Semester 2005, Section 31252

LBEA Software Specification

Our company, Little Brother Email Analysis (LBEA), specializes in monitoring email activity to identify non-work-related overuse of facilities. It also gives upper management a vehicle for understanding the issues, problems, and concerns that dominate an organization's culture over a period of time. The product is not designed to catch individual users for using company facilities for personal purposes; we believe such a product would be counterproductive to employee morale and efficiency. We have created a software environment that can classify large volumes of email messages into meaningful clusters and identify which specific emails are mapped to those clusters. Sample search terms can be found in Example A and Example C.

However, one of our current problems is that going from word lists to identifying which emails contain those words is a cumbersome and time-consuming task. At a minimum (Level I), we are looking for a programming team to create a tool that allows us to navigate (or link) between words and documents. The tool must be written in Java and be compatible across platforms. The user should be able to type in a list of up to 10 words. The search for emails using these terms is understood to use the Boolean OR operation; that is, every email containing at least one of the terms should be returned. The returned emails should be ranked by decreasing total frequency of the terms found (each occurrence of a term counts, including repeats).
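The Level I search and ranking rule can be sketched roughly as follows. This is only an illustration under stated assumptions, not a required design: the class and method names (OrSearch, countHits, rank) are hypothetical, emails are assumed to be already loaded into memory as an id-to-text map, and a "match" is assumed to be a case-insensitive whole-word occurrence, which the specification does not define.

```java
import java.util.*;
import java.util.regex.*;

// Sketch of the Level I search: Boolean OR over up to 10 terms, ranked by the
// total number of term occurrences per email. Class and method names are
// hypothetical; emails are assumed to be loaded in memory as id -> full text.
class OrSearch {
    static final int MAX_TERMS = 10;

    // Counts case-insensitive, whole-word occurrences of every term
    // (an assumption; the specification does not define what counts as a match).
    static int countHits(String text, List<String> terms) {
        int total = 0;
        for (String term : terms) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                        Pattern.CASE_INSENSITIVE).matcher(text);
            while (m.find()) total++;   // repeated occurrences each count
        }
        return total;
    }

    // Returns the ids of emails containing at least one term, most hits first.
    static List<String> rank(Map<String, String> emails, List<String> terms) {
        if (terms.size() > MAX_TERMS)
            throw new IllegalArgumentException("at most " + MAX_TERMS + " terms");
        Map<String, Integer> hits = new HashMap<>();
        for (Map.Entry<String, String> e : emails.entrySet()) {
            int n = countHits(e.getValue(), terms);
            if (n > 0) hits.put(e.getKey(), n);   // OR: any single hit qualifies
        }
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int c = Integer.compare(hits.get(b), hits.get(a)); // decreasing hits
            return c != 0 ? c : a.compareTo(b);                // tie-break by id
        });
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> emails = new LinkedHashMap<>();
        emails.put("e1", "Budget meeting today; budget review at noon; budget vote Friday.");
        emails.put("e2", "Lunch on Friday?");
        emails.put("e3", "Energy prices and the budget.");
        System.out.println(rank(emails, Arrays.asList("budget", "energy")));
    }
}
```

Running main prints [e1, e3]: e1 has three hits, e3 has two (one per term), and e2 has none, so it is not returned. Breaking ties by id is an arbitrary choice that simply keeps the output order deterministic.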

The user should also be presented with hyperlinks to all returned emails, so that clicking any email in the list opens a window or pop-up containing the full text of that email (including the header). All cluster terms (originally supplied by the user) should be highlighted in the selected email. At any time, the user should be able to save the ranked list of returned emails to a text file (for subsequent processing in any spreadsheet or word-processing program). All fields in the output text file should be tab-separated. Finally, the user should be prompted to save the current ranked email list before exiting.
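Two of the Level I features above, term highlighting and the tab-separated save, might be supported by small helpers like the ones below. This is a hedged sketch: the class name ResultHelpers, the column layout (rank, hits, email id), and the header row are assumptions, not part of the specification. The highlighter only computes character offsets. In a Swing viewer, one option is to pass each offset pair to the text component's Highlighter via addHighlight together with a DefaultHighlighter.DefaultHighlightPainter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;

// Sketch of two Level I helpers: finding term spans to highlight in the email
// viewer, and saving the ranked list as a tab-separated text file. The class
// name, the column layout (rank, hits, email) and the header row are hypothetical.
class ResultHelpers {

    // Returns [start, end) offsets of every case-insensitive, whole-word match,
    // sorted in document order.
    static List<int[]> highlightSpans(String text, List<String> terms) {
        List<int[]> spans = new ArrayList<>();
        for (String term : terms) {
            Matcher m = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                        Pattern.CASE_INSENSITIVE).matcher(text);
            while (m.find()) spans.add(new int[] {m.start(), m.end()});
        }
        spans.sort((a, b) -> Integer.compare(a[0], b[0]));
        return spans;
    }

    // Writes a header line, then one tab-separated line per ranked email.
    static void saveRankedList(Path out, List<String> ids, Map<String, Integer> hits)
            throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("rank\thits\temail");
        for (int i = 0; i < ids.size(); i++) {
            String id = ids.get(i);
            lines.add((i + 1) + "\t" + hits.get(id) + "\t" + id);
        }
        Files.write(out, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String body = "Subject: Budget\n\nThe budget review moved.";
        for (int[] s : highlightSpans(body, Arrays.asList("budget"))) {
            // In a Swing viewer, each span could be passed to
            // textArea.getHighlighter().addHighlight(s[0], s[1], painter).
            System.out.println(s[0] + "-" + s[1]);
        }
        Map<String, Integer> hits = new HashMap<>();
        hits.put("e1", 3);
        hits.put("e3", 2);
        Path out = Files.createTempFile("ranked", ".txt");
        saveRankedList(out, Arrays.asList("e1", "e3"), hits);
        System.out.println(String.join("\n", Files.readAllLines(out, StandardCharsets.UTF_8)));
    }
}
```

Because the term inside the header ("Subject: Budget") is matched too, highlighting covers the full text the viewer shows, as the specification asks.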

Level II Specifications

Our software management system has already identified certain emails that are associated with certain clusters of words. A sample is available in Example B. For this more advanced specification (Level II), the specific list of emails provided with a word cluster should be identified (highlighted) within the potentially larger return list of all emails containing words from that cluster. Emails from the input list (such as Example B) that do not appear in the return list produced under the Level I specification should also be indicated. Note that the records in Example B give the directory path for each email without the usual forward-slash (/) characters and without the final period (.) of the filename.
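The Level II comparison might be sketched as follows. It takes the note above literally: each Example B record is assumed to be an email's directory path with every "/" removed and the trailing "." of the filename dropped. The exact record format should be confirmed against Example B itself. The class and method names, as well as the sample paths in main, are hypothetical.

```java
import java.util.*;

// Sketch of the Level II comparison between the Level I return list and a
// list of previously identified emails (such as Example B). Assumes each
// record is a path with all '/' removed and the filename's final '.' dropped.
// Class, method names, and sample paths are hypothetical.
class PriorClusterMatch {

    // Normalizes a real file path the same way the records are assumed to be written.
    static String normalize(String path) {
        String s = path.replace("/", "");
        return s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
    }

    // Return-list entries that also appear in the input list (to be highlighted).
    static List<String> flagged(List<String> returnPaths, Set<String> records) {
        List<String> out = new ArrayList<>();
        for (String p : returnPaths)
            if (records.contains(normalize(p))) out.add(p);
        return out;
    }

    // Input-list records that never appear in the return list (to be reported).
    static Set<String> missing(List<String> returnPaths, Set<String> records) {
        Set<String> found = new HashSet<>();
        for (String p : returnPaths) found.add(normalize(p));
        Set<String> out = new TreeSet<>(records);   // sorted for stable display
        out.removeAll(found);
        return out;
    }

    public static void main(String[] args) {
        List<String> returned = Arrays.asList("mail/user1/inbox/12.", "mail/user1/sent/3.");
        Set<String> records = new HashSet<>(Arrays.asList("mailuser1inbox12", "mailuser2inbox7"));
        System.out.println("highlight: " + flagged(returned, records));
        System.out.println("not returned: " + missing(returned, records));
    }
}
```

Normalizing the real paths, rather than trying to restore the missing slashes in the records, avoids having to guess where the directory boundaries were.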

Level III Specifications

A final, more advanced specification would facilitate pruning of the email return list. By this we mean that the user should be able to apply either the Boolean AND or NOT operator with an additional search term, so that the returned emails satisfy that Boolean constraint (i.e., under AND they must also contain the additional term; under NOT they must not contain it).
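One way to realize this pruning is to filter the existing ranked OR result rather than re-running the search. The sketch below assumes that approach, reuses the case-insensitive whole-word matching assumption, and keeps the original Level I ranking order. Whether the extra term should also affect the ranking under AND is left open by the specification. The names BooleanPrune, Op, and prune are hypothetical.

```java
import java.util.*;
import java.util.regex.*;

// Sketch of Level III pruning: the ranked OR result is filtered by one extra
// term combined with AND (keep emails that also contain it) or NOT (drop
// emails that contain it). Enum, class, and method names are hypothetical.
class BooleanPrune {
    enum Op { AND, NOT }

    // Case-insensitive, whole-word containment test (an assumed matching rule).
    static boolean contains(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                               Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    // Filters the ranked ids, preserving their existing order.
    static List<String> prune(List<String> rankedIds, Map<String, String> emails,
                              Op op, String extraTerm) {
        List<String> kept = new ArrayList<>();
        for (String id : rankedIds) {
            boolean has = contains(emails.get(id), extraTerm);
            if ((op == Op.AND && has) || (op == Op.NOT && !has)) kept.add(id);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> emails = new HashMap<>();
        emails.put("e1", "Budget review and energy prices.");
        emails.put("e3", "Energy trading update.");
        List<String> ranked = Arrays.asList("e1", "e3");
        System.out.println("AND budget: " + prune(ranked, emails, Op.AND, "budget"));
        System.out.println("NOT budget: " + prune(ranked, emails, Op.NOT, "budget"));
    }
}
```

Filtering an already-ranked list means AND and NOT can never add emails, only remove them, which matches the "pruning" described above.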

Although LBEA has set minimum requirements (being able to display the documents), it is open to any alternative approach that achieves the same goals.
  • Example A (sample word clusters from March 2001)
  • Example B (sample email clusters from March 2001)
  • Example C (another set of word clusters from August 2001)

Evaluation Criteria
  • Spec 1 (Ranked Return List) - 20 points
  • Spec 1 (Term Highlighting) - 20 points
  • Spec 1 (Email Hyperlinks) - 20 points
  • Spec 1 (Save Ranked Email List) - 20 points
  • Spec 1 (Help facility) - 10 points
  • Spec 1 (Ease of use) - 10 points
  • Spec 2 (Previous Email highlighting) - 20 points (extra credit)
  • Spec 3 (Boolean AND/NOT searching) - 20 points (extra credit)