Original URL: http://web.eecs.utk.edu/~mberry/projects/
Computational Science Projects
Latent Semantic Indexing (or LSI) is a concept-based information retrieval model. Terms and documents are both encoded for vector space representation so that documents may be clustered (semantically) near each other yet share no common terms. LSI addresses the two fundamental problems which plague traditional lexical-matching indexing schemes: synonymy and polysemy. Content Analyst Company, LLC owns the original patent to LSI: "Computer information retrieval using latent semantic structure," U.S. Patent No. 4,839,853, June 13, 1989.
SVDPACK comprises four numerical (iterative) methods for computing the singular value decomposition (SVD) of large sparse matrices using double precision ANSI Fortran-77. A compatible ANSI-C version (SVDPACKC) is also available. SVDPACK and SVDPACKC implement Lanczos and subspace iteration-based methods for determining several of the largest singular triplets for large sparse matrices. The development of SVDPACK was motivated by the need to compute large-rank approximations to sparse term-document matrices from information retrieval applications such as Latent Semantic Indexing (described above). SVDPACKC was used in the InfoMap project developed in the Computational Semantics Laboratory at Stanford University.
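As a rough illustration of what "computing a largest singular triplet" means, the sketch below uses plain power iteration, which can be viewed as subspace iteration with a single vector. It is not SVDPACK's code or API; the function name `top_singular_triplet`, the dense list-of-lists storage, and the toy term-document matrix are all assumptions made for this example. Production codes such as SVDPACK work on sparse storage and compute several triplets at once.

```python
import math

def top_singular_triplet(a, iterations=200):
    """Estimate the largest singular triplet (sigma, u, v) of a dense matrix.

    Hypothetical helper: repeats v <- normalize(A^T (A v)), i.e. power
    iteration on A^T A. Assumes `a` is non-zero and that the uniform
    start vector is not orthogonal to the dominant right singular vector.
    """
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # w = A v
        w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # z = A^T w
        z = [sum(a[i][j] * w[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in z))
        v = [x / norm for x in z]
    # Recover sigma and the left singular vector from A v = sigma u.
    w = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w]
    return sigma, u, v

# Toy term-document matrix (rows = terms, columns = documents); made-up counts.
term_doc = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
sigma, term_vector, doc_vector = top_singular_triplet(term_doc)
print(sigma, term_vector, doc_vector)
```

In an LSI setting, the entries of `u` and `v` place terms and documents along the same latent dimension, which is how two documents can end up close together without sharing a term.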
The Integrated Modeling Project (IMP) sponsored by the
Environmental Impacts Program of the USDA Forest Service
is an integrated forest health and productivity
assessment of southern and southeastern forests in relation to
changing climate, air quality, and land use changes. The
primary research focus of Prof. Michael W. Berry and Research
Associate Karen S. Minser (Dept. of Computer Science)
is the development of a problem-solving environment or PSE
which facilitates the horizontal integration of
forest responses to environmental stresses and disturbances
through the use of micro-scale cellular automata.
ICAT: The Interactive Cluster Analysis Toolkit (or ICAT) utilizes the Enhanced Hoshen-Kopelman algorithm to provide a highly adaptable method for cluster analysis. Within the context of diabetic retinopathy, different neighborhood rules implemented within ICAT provide better approaches for classifying retinal features such as neovascularization and exudates. The flexible design of ICAT allows new metrics for characterizing cluster geometry or new neighborhood rules for cluster identification to be easily incorporated.
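To make the role of a neighborhood rule concrete, here is a minimal sketch of the classic Hoshen-Kopelman labeling scheme (a raster scan with union-find), not the Enhanced variant that ICAT uses and not ICAT's own code. The names `label_clusters`, `FOUR_NEIGHBORS`, and `EIGHT_NEIGHBORS` are hypothetical, and the grid is assumed to be a list of rows of 0/1 values.

```python
# Offsets (d_row, d_col) to already-scanned neighbors in a raster scan.
FOUR_NEIGHBORS = ((0, -1), (-1, 0))                      # left, up
EIGHT_NEIGHBORS = ((0, -1), (-1, -1), (-1, 0), (-1, 1))  # plus the two upper diagonals

def find(parent, x):
    """Return the root label of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def label_clusters(grid, offsets=FOUR_NEIGHBORS):
    """Label occupied cells so that connected cells share one label (0 = empty)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 is a placeholder; real labels start at 1
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            roots = set()
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                    roots.add(find(parent, labels[nr][nc]))
            if not roots:
                parent.append(len(parent))  # start a new cluster
                labels[r][c] = len(parent) - 1
            else:
                keep = min(roots)
                for root in roots:          # merge clusters that meet here
                    parent[root] = keep
                labels[r][c] = keep
    # Second pass: replace provisional labels with their final roots.
    return [[find(parent, x) if x else 0 for x in row] for row in labels]

def count_clusters(labels):
    return len({x for row in labels for x in row if x})

diagonal = [[1, 0],
            [0, 1]]
print(count_clusters(label_clusters(diagonal, FOUR_NEIGHBORS)))
print(count_clusters(label_clusters(diagonal, EIGHT_NEIGHBORS)))
```

Swapping the `offsets` tuple is the only change needed to move between neighborhood rules, which is one plausible way a toolkit could keep such rules pluggable.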
RSim: The Regional Simulation model (RSim) is designed to integrate the environmental effects of on-base military training and testing as well as off-base development. Effects considered include air and water quality, noise, and habitats for endangered and game species. A risk assessment approach is being used to determine impacts of single and integrated risks. The RSim simulation will eventually be available on the Web and will be used in a gaming mode so that users can explore repercussions of military and land-use decisions. RSim is currently being developed for the region around Fort Benning, Georgia but is broadly applicable. This project is sponsored by the Strategic Environmental Research & Development Program (SERDP), an initiative funded by the U.S. Departments of Energy and Defense and the U.S. Environmental Protection Agency (EPA). A user interface for RSim is currently under development.
LUCAS: Land-Use Change Analysis System for the simulation of land-cover changes on a heterogeneous (distributed) computing environment. LUCAS generates new maps of land cover representing the amount of land-cover change so that issues such as biodiversity conservation, assessing the importance of landscape elements to meet conservation goals, and long-term landscape integrity can be addressed.
Encyclopedia of Computer Science and Engineering: Dr. Michael W. Berry is serving as the Applications area editor of the Encyclopedia of Computer Science and Engineering (Wiley Interscience) which is being edited by Prof. Benjamin Wah at the University of Illinois at Urbana-Champaign. Publication anticipated for 2004.
Three-Day Seminar Course on Information Retrieval, Facultad de Matemáticas, Universidad Autónoma de Yucatán (UADY), Mérida, México, March 10-12, 2004.
News Release. All links are PowerPoint files (password protected).
Whole Genome Phylogeny:
As whole genome sequences continue to expand in number and
complexity, effective methods for comparing and categorizing both genes
and species represented within extremely large datasets are required.
Current methods have generally utilized incomplete (and likely
insufficient) subsets of the available data even as additional data
becomes available at
a rapid rate. In collaboration with Prof. Gary Stuart at Indiana
State University, an accurate and efficient method for
producing robust gene and species phylogenies using very large whole genome
protein datasets has been developed.
This method relies on multidimensional protein vector
definitions supplied by the singular value decomposition (SVD) of
large sparse data matrices in which each protein is uniquely represented as
a vector of overlapping tetrapeptide frequencies. The link above is to
presentation slides shown on March 23 at the
Bioinformatics Summit 2002; an updated presentation
was made at a
School of Informatics Colloquium on Nov. 14, 2003 (audio/slides).
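The protein representation described above can be sketched as follows. This only illustrates counting overlapping tetrapeptides and arranging them as a peptide-by-protein matrix; it is not the authors' implementation. The function names, the toy sequences, and the use of raw counts (no normalization or weighting) are assumptions. In the described method, a matrix of this kind would then be passed to a sparse SVD.

```python
from collections import Counter

def tetrapeptide_counts(sequence, k=4):
    """Count every overlapping window of k residues in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def build_matrix(proteins, k=4):
    """Return (peptides, names, matrix) with one row per peptide, one column per protein.

    `proteins` is a dict mapping a protein name to its amino-acid string.
    """
    names = sorted(proteins)
    counts = {name: tetrapeptide_counts(proteins[name], k) for name in names}
    peptides = sorted(set().union(*counts.values()))
    matrix = [[counts[name][pep] for name in names] for pep in peptides]
    return peptides, names, matrix

# Hypothetical toy sequences, far shorter than real proteins.
toy = {
    "protA": "MKVLAAGKVL",
    "protB": "MKVLSAGKVL",
}
peptides, names, matrix = build_matrix(toy)
for pep, row in zip(peptides, matrix):
    print(pep, row)
```

With 20 amino acids there are 20^4 = 160,000 possible tetrapeptides, and any one protein contains only a small fraction of them, which is why the resulting matrices are large and sparse.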
SGO: Understanding the functional relationships between genes remains a major challenge in the interpretation of genomic data. Bioinformatics tools to automate the extraction and utilization of gene information from biological databases and the scientific literature are being developed. We present a new software environment called Semantic Gene Organizer © (SGO) which utilizes Latent Semantic Indexing (LSI), a concept-based vector space model, to automatically extract gene relationships from titles and abstracts in MEDLINE citations.
FAUN: We have developed a Web-based bioinformatics tool called Feature Annotation Using Nonnegative matrix factorization (FAUN) to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and parameterization of nonnegative matrix factorization (NMF) for processing gene sets are currently being investigated. FAUN has been tested on several manually constructed gene collections (ranging in size from 50 to 800 genes) and has been specifically engineered to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and identification of new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for the validation and analysis of gene associations suggested by microarray experimentation. A video about NIMBioS with Elina Tjioe demonstrating FAUN is also available. This project is supported by the Gene Regulation in Time & Space project (funded by the NIH).
GST Retreat Poster (March 14, 2008, 4.7MB ppt); UT-ORNL-KBRIN Poster (March 28-30, 2008); published in BMC Bioinformatics, July 8, 2008.
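For readers unfamiliar with NMF, the following is a minimal sketch of one widely used formulation, the Lee-Seung multiplicative updates for the squared (Frobenius) error. Whether FAUN uses this particular update rule is not stated here, so treat it as an assumption. The helper names, the fixed random seed, and the tiny term-by-gene matrix are likewise made up for illustration.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

# Hypothetical term-by-gene weights: rows = terms, columns = genes.
v = [
    [2.0, 1.0, 0.0],
    [4.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
]
w, h = nmf(v, rank=2)
print(squared_error(v, w, h))
```

Because every factor entry stays nonnegative, each column of `W` can be read as an additive "feature" over terms and each column of `H` as a gene's loading on those features. The choice of `rank` and the initialization are the kind of parameterization questions the paragraph above mentions.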
Firemodel: The Grid Computing for Ecological Modeling and Spatial Control of Wildfires project was a National Science Foundation (NSF) funded research project which began in 2005 and concluded in 2008. The project involved students and postdoctoral fellows who developed several fire spread models and several methods for evaluating how spatial control might be used to limit the spread of a wildfire. The software simulated a fire starting at a variety of possible burnable locations on a map. In the simplest case, the fire would then spread based upon burnable/non-burnable (green/black) areas in the map, with the possibility of including a local fire load which would affect the magnitude of local burns as well as the probability of spread. The unique aspect of this project was the computation of the optimal placement of a fire break, with the objective of enclosing the fire and sparing as much of the region as possible from burning. The overall goal of the project was to improve the accuracy of responses to fire spread, to develop effective control strategies, and to produce a method that might be useful in training fire suppression personnel.
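The simplest case described above (deterministic spread over a burnable/non-burnable map) can be sketched as a breadth-first flood from the ignition cell. This is an illustrative toy, not the project's software. It ignores fire load and spread probability, and it replaces the optimization with a brute-force comparison of a few candidate fire breaks. The function names and the cost measure (burned cells plus cleared cells) are assumptions.

```python
from collections import deque

def burned_cells(burnable, ignition, firebreak=frozenset()):
    """Return the set of (row, col) cells reached by a fire started at `ignition`.

    `burnable` is a grid of truthy/falsy values; cells in `firebreak` never burn.
    Fire spreads to the four edge-adjacent neighbors of a burning cell.
    """
    rows, cols = len(burnable), len(burnable[0])
    if not burnable[ignition[0]][ignition[1]] or ignition in firebreak:
        return set()
    burned = {ignition}
    queue = deque([ignition])
    while queue:
        r, c = queue.popleft()
        for cell in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = cell
            if (0 <= nr < rows and 0 <= nc < cols and burnable[nr][nc]
                    and cell not in firebreak and cell not in burned):
                burned.add(cell)
                queue.append(cell)
    return burned

def best_firebreak(burnable, ignition, candidates):
    """Pick the candidate break that minimizes burned cells plus cleared cells."""
    def cost(brk):
        return len(burned_cells(burnable, ignition, brk)) + len(brk)
    return min(candidates, key=cost)

forest = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
column_1 = frozenset({(0, 1), (1, 1), (2, 1)})
column_2 = frozenset({(0, 2), (1, 2), (2, 2)})
print(len(burned_cells(forest, (0, 0))))
print(best_firebreak(forest, (0, 0), [column_1, column_2]) == column_1)
```

A probabilistic variant would draw a random number before each spread step, and a realistic optimizer would search over break shapes rather than a hand-written candidate list.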
Python for Biologists: This tutorial, created from a COSC 670 course project during Spring Semester 2012, is intended to introduce computational biologists to some of the novel features of the Python programming language for problem solving. The material is designed to accompany a one-day, in-person, hands-on workshop and to serve as a post-workshop resource for attendees.