Disney's Paradise Pier Hotel

April 28, 2012


to be held in conjunction with

Twelfth SIAM International Conference on Data Mining (SDM 2012)


History of the Workshop

This is the tenth in the series of Text Mining workshops held in conjunction with SDM. Previous workshops took place in 2001, 2002, 2003, 2006, 2007, 2008, 2009, 2010, and 2011. At the most recent workshop (2011), in Mesa, AZ, 25 authors representing industry, academia, and national research laboratories from 3 different countries submitted a total of 9 papers. After careful review, 7 papers were selected for publication and presentation. In addition, SAS and the Center for Intelligent Systems and Machine Learning (CISML) at UT-Knoxville sponsored the workshop and provided funds for student travel expenses.

Books published from these Text Mining Workshops...

General Topics

The proliferation of digital computing devices and their use in communication has increased the demand for systems and algorithms capable of mining textual data. The development of techniques for mining unstructured, semi-structured, and fully structured textual data has therefore become important in both academia and industry. This Workshop tracks new developments in the field of Text Mining: the application of machine learning techniques in conjunction with natural language processing, information extraction, and algebraic/mathematical approaches to computational information retrieval. Issues addressed range from the development of new learning approaches to the parallelization of existing algorithms. The goal of this workshop is to provide a venue for researchers to share initial approaches and preliminary results of recent research in Text Mining. Through the careful selection and review of submitted workshop papers, we hope to provide a selection of topics that will both generate interest and provide insight into the state of the field of Text Mining.

Special Topics - Text Mining with Email (Enron), Blogs, Tweets, and VAST 2008/2009/2010/2011 Contest Data

Because of the continued interest generated by the availability of the Enron data set of 1.3 million email messages (see Enron Email Dataset) and its versatility in terms of potential research topics (link analysis, pattern matching), researchers are encouraged to submit papers that use it. In addition, the text-based datasets of news events and scenario definitions used in the IEEE Symposium on Visual Analytics Science and Technology (VAST) 2008 through 2011 Contests are interesting corpora for research in topic detection/tracking, role playing, and scenario analysis (see the VAST 2008, VAST 2009, VAST 2010, and VAST 2011 contests for more details on those datasets). Papers on text classification and clustering models for social media repositories such as Twitter and Facebook are also encouraged.
Other Specific Topics of Interest Include:
    Algorithms and Models

  • Bayesian Models
  • Concept Decomposition
  • Orthogonal Decomposition
  • Probabilistic Models
  • Vector Space Models
  • Latent Semantic Indexing
  • Graph-based Models
  • Text Streaming Models
  • Clustering
  • Factor Analysis
  • Visualization Techniques
  • Metadata Generation
  • Information Extraction
  • Text Classification
  • Text Purification
  • Text Segmentation
  • Text Summarization
  • Query Structures
  • Trend Detection
  • Distributed Storage and Retrieval

Registration

Workshop attendees must register for SDM 2012; no separate registration is needed for this workshop.
A one-day conference registration is available, so workshop attendees do not have to pay the full conference rate.

Submission Requirements

Papers must be uploaded in PDF format. They should be printable on 8.5 × 11 inch paper and be roughly 10 pages long, using an 11 pt font in a two-column format with 1-inch margins. The SIAM LaTeX macros (soda2e.all) can be used to format your two-column paper.

Abstracts and manuscripts are uploaded through the MyReview system.
Note: You must create a MyReview account to upload your files. The Authors section of the system gives the following instructions:

1. Use the abstract submission interface to provide the main information about your paper. You will be given an ID and password, which you must use to access the system in the following steps, so save the login message that the system sends you.

2. Once an abstract has been submitted, you can upload your paper.

To guarantee consideration, manuscripts must be received by January 13, 2012. Submission of work in progress is also encouraged.

Important Dates

Papers due: January 13, 2012 (deadline passed)

Notifications sent: February 3, 2012 (deadline passed)

Camera-ready (final) papers due to workshop: February 10, 2012 (deadline passed)

Keynote speaker: Dr. Malu Castellanos, Information Analytics Lab, HP Laboratories, Palo Alto, CA
Title of Presentation:

Tapping Social Media for Sentiments with Live Customer Intelligence (LCI)


The explosion of Web opinion data brought about by Web 2.0 and its increasingly popular social sites, such as Twitter, Facebook, blogs, and review sites, has made automatic tools for analyzing and understanding sentiment toward different topics essential. This has fueled the emerging field of sentiment analysis, whose goal is to translate the vagaries of human emotion into hard data. Live Customer Intelligence (LCI) is a system that taps into what is being said in order to understand the sentiment expressed, with the particular ability to do so in near real time. LCI integrates novel algorithms for sentiment analysis with a configurable dashboard offering different kinds of charts, including dynamic ones that change as new data are ingested. LCI has been researched and prototyped at HP Labs in close interaction with business divisions and a few selected customers. In this talk I give an overview of LCI, focusing in particular on its challenging issues and illustrating its capabilities with selected use cases.


Dr. Malu Castellanos is a senior researcher in the Information Analytics Lab at Hewlett-Packard Laboratories in Palo Alto, CA, USA. Since 1998 she has been applying data management and data analytics technologies to develop novel solutions to different kinds of business-related problems. She received a B.S. in Computer Engineering from the National University of Mexico and a Ph.D. in Computer Science from the Polytechnic University of Catalunya. Prior to joining Hewlett-Packard she was on the faculty of the Information Systems Department of the Polytechnic University of Catalunya. She has more than 60 publications in international conferences, journals, and book chapters, and has served on numerous program committees and journal review boards. She has participated in the organization of prestigious international conferences and workshops in different areas of data management in various chair roles, including General Chair of ICDE 2008. Her current interests are new technologies and methods for gaining insights from big data, real-time business intelligence, text analytics, automatic database tuning, business process intelligence, and data interoperability. She is a member of the Executive Committee of the IEEE Technical Committee on Data Engineering (TCDE).

Program in PDF format (Posted April 25)

A special issue of the online journal Algorithms, published by MDPI, will be devoted to the accepted papers of this year's Text Mining 2012 Workshop.
Presentation Slides: Azzopardi (pdf), Koessler (pdf), Mahapatra (pdf), Rankel (pdf), Sapozhnikov (pdf), Skillicorn (pptx)
Sponsor: SAS Institute Inc. of Cary, NC
Program Committee

Co-Chairs: Michael W. Berry, University of Tennessee and Jacob Kogan, University of Maryland, Baltimore County

Loulwah AlSumait, Kuwait University
Brett Bader, DigitalGlobe
Malu Castellanos, Hewlett-Packard Laboratories
Efstratios Gallopoulos, University of Patras, Greece
Wilfried Gansterer, University of Vienna
Efim Gendler, iboogie.tv
April Kontostathis, Ursinus College
Choudur Lakshminarayan, Hewlett-Packard Laboratories
Alan Ratner, Northrop Grumman
Andrea Tagarelli, University of Calabria, Italy
Dvora Toledano-Kitai, Ort Braude College, Israel
Judith Vogel, Stockton College
Zeev Volkovich, Ort Braude College, Israel

Organizational Committee

Michael W. Berry
Department of Electrical Engineering & Computer Science
Min H. Kao Building, Suite 401
1520 Middle Drive
University of Tennessee
Knoxville, TN 37996
Phone: (865) 974-3838
Fax:     (865) 974-4404
berry AT eecs DOT utk DOT edu

Jacob Kogan
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3297
Fax:     (410) 455-1066
kogan AT math DOT umbc DOT edu

Last modified on March 29, 2012