April 13, 2002 Workshop
Hyatt Regency, Crystal City
Arlington, Virginia

To be Held in Conjunction with
Second SIAM International Conference on Data Mining (SDM 2002)


Theme Statement

Extracting content from text continues to be an important research problem for information processing and management. Approaches to capturing the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. No doubt we have moved beyond the bag-of-words notion of text documents to exploit not only the patterns but also the structure of term usage. As digital libraries and the World Wide Web continue to multiply the already enormous volume of online textual material, effective yet scalable approaches to text mining will be needed. How can we know what a document is about without having to read it? How do we automatically cluster or categorize documents from diverse sources? What are the best ways to clean text? Can semantic models of text be adequately visualized? These are some of the simple yet fundamental questions we must be able to address.

A one-day workshop on Text Mining is being held in conjunction with SDM 2002 in Arlington, VA (April '02) to bring together researchers from a variety of disciplines to present their current approaches and results in text mining. One of the main themes of the workshop will be the clustering of unstructured document collections and the attendant challenges of high dimensionality and sparsity.

Topics of interest include:


Attendees are required to register for SDM 2002; no separate registration is needed for this workshop.

Submission Requirements

To submit a paper for consideration, send four copies of the manuscript to Ms. Peggy Stewart (see address below). Electronic submissions (PostScript or PDF versions printable on 8.5 x 11 inch paper only) are strongly encouraged. To guarantee consideration, manuscripts must be received by December 28, 2001 (extended from December 21, 2001), and must be no more than 12 pages excluding figures, tables, and references. A two-column format with 1-inch margins should be used, and all selected papers will need to be converted to PDF format for online posting (by SIAM).

Submission of work in progress is also encouraged.

Send all submissions to:

Ms. Peggy Stewart
Attn: Text Mining Workshop
Army High Performance Computing Research Center
1100 South Washington Avenue
Minneapolis, MN 55415
Tel: (612) 626-8079
Fax: (612) 626-1596

Special Volume

All papers accepted and received by the March 1, 2002 deadline (see below) will appear in the workshop proceedings, which will be bound by SIAM (and also posted on their website) and distributed to workshop attendees at the conference. In addition, selected workshop papers will be published (along with other solicited papers on Text Mining) in a special volume entitled A Comprehensive Survey of Text Mining, to be published by Springer-Verlag. An outline of the proposed volume is available as a PDF file.

Important Dates

Papers Due:
Dec 28th (extended from Dec 21st). Deadline has passed.

Notification of Acceptance:
Feb. 8 (originally Jan 30th). Deadline has passed.

Camera ready:
Mar. 1 (originally Feb 22nd). Deadline has passed.

Workshop:
Apr 13th

Workshop Schedule

Select either Postscript or PDF formats (updated on April 4, 2002).

Program Committee

Katy Börner, Indiana University
Malu Castellanos, HP Labs, Palo Alto
Chris Ding, Lawrence Berkeley National Lab. (NERSC)
Rick Fierro, California State University at San Marcos
Efim Gendler, iBoogie.tv
Kyle Gallivan, Florida State
Liz Jessup, University of Colorado
Haesun Park, University of Minnesota
Dulce Ponceleon, IBM Almaden
Bill Pottenger, Lehigh University
Padma Raghavan, Pennsylvania State University
Flavio Sartoretto, Univ. of Venezia (Italy)
Malcolm Slaney, IBM Almaden
Marc Teboulle, Tel-Aviv University (Israel)
Layne Watson, Virginia Tech
Jason Wu, Boeing

Organizing Committee

Michael W. Berry
Department of Computer Science
203 Claxton Complex
University of Tennessee
Knoxville, TN 37996-3450
Phone: (865) 974-3838
Fax: (865) 974-4404

Inderjit Dhillon
Department of Computer Science
University of Texas
Austin, TX 78712-1188
Phone: (512) 471-9725
Fax: (512) 471-8885


Jacob Kogan
Department of Mathematics and Statistics
Univ. of Maryland, Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3297
Fax: (410) 455-1066

Justin T. Giles
Department of Computer Science
203 Claxton Complex
University of Tennessee
Knoxville, TN 37996-3450
Phone: (865) 974-4196
Fax: (865) 974-4404


Last modified on Jan. 8, 2002.