Maximal Biclique Enumeration
Background
Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge
for modern systems biology. Bipartite graphs can be useful for representing relationships
across pairs of disparate data types, with the interpretation of these relationships
accomplished through an enumeration of maximal bicliques. Most previously-known techniques
are generally ill-suited to this foundational task, because they are relatively inefficient
and without effective scaling. In recent work, we have devised a powerful new algorithm that
produces all maximal bicliques in a bipartite graph. Unlike most previous approaches, the
new method neither places undue restrictions on its input nor inflates the problem size.
Efficiency is achieved through an innovative exploitation of bipartite graph structure,
and through computational reductions that rapidly eliminate non-maximal candidates from
the search space. An iterative selection of vertices for consideration based on non-decreasing
common neighborhood sizes boosts efficiency and leads to more balanced recursion trees.
Results
The new technique was implemented in C on Ubuntu Linux and compared to previously published
approaches from graph theory and data mining. Formal time and space bounds were derived.
Experiments were performed on both random graphs and graphs constructed from functional
genomics data. It was found that the new method substantially outperforms the best previous
alternatives.
Conclusions
The new method is streamlined, efficient, and particularly well-suited to the study of huge
and diverse biological data. A robust implementation has been incorporated into GeneWeaver,
an online tool for integrating and analyzing functional genomics experiments. The enormous
increase in scalability it provides empowers users to study complex and previously
unassailable gene-set associations between genes and their biological functions in a
hierarchical fashion and on a genome-wide scale. This practical computational resource is
adaptable to almost any applications environment in which bipartite graphs can be used to
model relationships between pairs of heterogeneous entities.
Related Site
GeneWeaver
Citation
Y. Zhang, C. A. Phillips, G. L. Rogers, E. J. Baker, E. J. Chesler and M. A. Langston, "On
Finding Bicliques in Bipartite Graphs: a Novel Algorithm and Its Application to the Integration
of Diverse Biological Data Types," BMC Bioinformatics, accepted for publication.
Software
Available upon request.