Left to right: Brett Bader, Dave Engel, and Pu Wang
Left to right: Andrey Puretskiy, Wenyin Tang, and April Kontostathis
John Ascuaga's Nugget
May 2, 2009
Left to right: Ziqiu Su, Andreas Janecek, and Eric Jiang
to be held in conjunction with
History of the Workshop
This is the seventh in the series of Text Mining workshops held in conjunction with SDM. Previous workshops took place in 2001, 2002, 2003, 2006, 2007, and 2008. Last year in Atlanta, 44 authors representing industry, academia, and national research laboratories from 7 different countries submitted a total of 18 papers. After careful review, 10 papers were selected for publication and presentation. In addition, SAS and BeliefNetworks Inc. sponsored the workshop and provided funds for student travel expenses. The photos shown above were taken at the 2009 SDM Text Mining Workshop.
The proliferation of digital computing devices and their use in
communication has resulted in an increased demand for systems and
algorithms capable of mining textual data. Thus, the development of
techniques for mining unstructured, semi-structured, and fully
structured textual data has become quite important in both academia and
industry. As a result, this Workshop will survey the emerging field of
Text Mining - the application of techniques of machine learning in
conjunction with natural language processing, information extraction
and algebraic/mathematical approaches to computational information
retrieval. Many issues are being addressed in this field ranging from
the development of new learning approaches to the parallelization of
existing algorithms. The goal of this workshop is to provide a venue
for researchers to share initial approaches and preliminary results of
recent research in Text Mining. Through the careful selection and
review of submitted workshop papers, we hope to provide a suitable
selection of topics that will both generate interest and provide
insight into the state of the field of Text Mining.
Special Topics - Text Mining with the Enron Data Set and VAST 2007/2008 Contest Data
Because of the continued interest generated by the availability of the Enron data set of 1.3 million email messages (see Enron Email Dataset) and its versatility in terms of potential research topics (link analysis, pattern matching), researchers are encouraged to submit papers on this data set to the workshop. In addition, the text-based datasets of news events and scenario definition used in the IEEE Symposium on Visual Analytics Science and Technology (VAST) 2007 and 2008 Contests form an interesting corpus for research in topic detection/tracking, role playing, and scenario analysis (see VAST 2007 or VAST 2008 contests for more details on those datasets). Researchers whose work focuses more on social networking models of the Enron and VAST 2007/2008 datasets should contact the organizers of the SDM Link Analysis (SLA) Workshop. With the authors' permission, a paper may be reassigned to the SLA workshop, especially if the Program Committee recommends this based on the content of the paper.
Other Specific Topics of Interest Include:
required to register for SDM 2009 so that no separate registration is
needed for this workshop.
A one-day registration for the conference is available. Workshop attendees do not have to register at the complete conference rate.
To submit a paper, upload your paper in PDF format (papers should be printable
on 8.5 × 11 paper only and be roughly 10 pages in length, using an
11pt font in two-column format with 1-inch margins)
by accessing the review system via
In the Authors section you will find the instructions:
1. Use the abstract submission interface to provide the main information
on your paper. You will be given an id/password which must later be used
to access the system during the following steps, so save the login information message that you will receive from the system.
2. Once an abstract has been submitted, you can upload your paper.
To guarantee consideration, manuscripts must be received by January 16, 2009. Submission of work in progress is also encouraged.
January 16, 2009: deadline has passed.
February 6, 2009: notifications have been made.
ready: Final Papers due to workshop: February 13, 2009
Title of Presentation: Algebraic Techniques for Multilingual Document Clustering
Abstract: Text documents in multiple languages pose a problem if one wants to cluster them by topic. One approach of translating everything to a common language is not feasible when dealing with a large corpus or many languages. This presentation will show a variety of novel algebraic methods for efficiently clustering multilingual text documents. The methods use a multilingual parallel corpus as a 'Rosetta Stone' from which algorithmic variations of Latent Semantic Analysis (LSA) are able to learn concepts in a multilingual term space. New documents are projected into this concept space to produce language-independent feature vectors for subsequent use in similarity calculations or machine learning applications. Our numerical experiments show that the new methods have better performance than LSA and can be used in machine learning tasks.
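The projection step described in the abstract can be illustrated with a minimal sketch of the plain LSA baseline, not the new algebraic variations presented in the talk. It assumes NumPy, a hypothetical four-document English/Spanish toy corpus, raw term counts without weighting, and a rank of k = 2; the function names are illustrative only.

```python
import numpy as np

# Hypothetical toy parallel corpus: each pair holds the same text in
# English and Spanish. Real parallel corpora are far larger.
PARALLEL = [
    ("dog cat pet", "perro gato mascota"),
    ("cat pet food", "gato mascota comida"),
    ("stock market bank", "bolsa mercado banco"),
    ("bank money market", "banco dinero mercado"),
]

def build_vocab(pairs):
    """Map every term from both languages to a row index."""
    terms = sorted({w for pair in pairs for text in pair for w in text.split()})
    return {t: i for i, t in enumerate(terms)}

def count_vector(text, vocab):
    """Raw term-count vector over the multilingual vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def fit_concept_space(pairs, k):
    """Stack each translation pair into one column, then keep the top-k
    left singular vectors of the term-by-document matrix."""
    vocab = build_vocab(pairs)
    X = np.column_stack([count_vector(en + " " + es, vocab) for en, es in pairs])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return vocab, U[:, :k]

def project(text, vocab, U_k):
    """Project a new monolingual document into the concept space."""
    return U_k.T @ count_vector(text, vocab)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

vocab, U_k = fit_concept_space(PARALLEL, k=2)
english = project("dog pet", vocab, U_k)
spanish_same_topic = project("perro mascota", vocab, U_k)
spanish_other_topic = project("mercado banco", vocab, U_k)

sim_same = cosine(english, spanish_same_topic)
sim_other = cosine(english, spanish_other_topic)
print(sim_same, sim_other)
```

On this toy data the English and Spanish documents about pets end up with a cosine similarity close to 1, while the English pet document and the Spanish finance document end up close to 0, even though the queries share no terms. The resulting vectors could then feed a clustering or classification step.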
Biography: Brett W. Bader received his B.S. and M.S. degrees in chemical engineering from the Massachusetts Institute of Technology. Subsequently, he worked in the chemical industry, developing mathematical models of chemical plants for online, real-time optimization. He received his Ph.D. in computer science from the University of Colorado at Boulder, studying higher-order methods for optimization and solving systems of nonlinear equations. In 2003, Brett received the John von Neumann Research Fellowship at Sandia National Laboratories, where he now develops algorithms for multi-way data analysis and machine learning for informatics applications in networks and text.
Michael W. Berry, University of Tennessee
and Jacob Kogan, University of Maryland, Baltimore County
Roger Bilisoly, Central Connecticut State University
Daniel Boley, University of Minnesota
Murray Browne, Turner Broadcasting Systems, Inc.
Malu Castellanos, Hewlett-Packard Laboratories
Anirban Chatterjee, The Pennsylvania State University
Carlotta Domeniconi, George Mason University
Kyle Gallivan, Florida State University
Efstratios Gallopoulos, University of Patras, Greece
Wilfried Gansterer, University of Vienna
Efim Gendler, iboogie.tv
Peg Howland, Utah State University
April Kontostathis, Ursinus College
Choudur Lakshminarayan, Hewlett-Packard Laboratories
Bill Pottenger, DIMACS, Rutgers
Padma Raghavan, Penn State University
Andrea Tagarelli, University of Calabria, Italy
Judith Vogel, Stockton College
Zeev Volkovich, Ort Braude College, Israel
Yu Xia, University of Birmingham, UK
Michael W. Berry
Department of Electrical Engineering & Computer Science
203 Claxton Complex
University of Tennessee
Knoxville, TN 37996-3450
Phone: (865) 974-3838
Fax: (865) 974-4404
berry AT eecs DOT utk DOT edu
Jacob Kogan
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3297
Fax: (410) 455-1066
kogan AT math DOT umbc DOT edu