Protein Sorting Motif Analysis and Protein Subcellular Localization

This ongoing project aims to develop algorithms for identifying protein sorting motifs and predicting protein subcellular localization.

metaP, a heuristic ensemble algorithm for protein subcellular localization prediction

LRensemble, a machine-learning-based ensemble algorithm for protein subcellular localization prediction

BayesMotif, a de novo algorithm for identifying anchor-based sorting signal motifs

SortMotifDB, a database of protein sorting motifs, supporting motif search, retrieval, and motif model comparison

Computational prediction of protein binding residues

We have developed two web servers for computational prediction of protein binding residues: 1) HemeBind, which predicts heme-binding residues by integrating structural and sequence information; 2) HemeNet, which predicts heme-binding residues by exploiting the topological properties of these residues in the residue interaction networks derived from three-dimensional structures.

HemeBind Web server

HemeNet Web server

iMISS: Integrative Missing Value Estimation for Microarray Data

iMISS (Integrative Missing Value Estimation) is an algorithmic framework that improves microarray missing value estimation by incorporating information from multiple reference microarray datasets. For each gene with missing data, we derive a consistent neighbor-gene list by taking the reference datasets into consideration. To determine whether the given reference datasets are sufficiently informative for integration, we use a submatrix imputation approach. In our benchmark tests, iMISS significantly and consistently improved the accuracy of the state-of-the-art Local Least Squares (LLS) imputation algorithm, by up to 15%.
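To make the idea of neighbor-gene imputation concrete, here is a minimal sketch. It is not iMISS and not LLS: it fills a missing value with the plain average over the k most similar genes in a single dataset, whereas iMISS also draws on reference datasets when choosing neighbors. The function names, the toy matrix, and the choice of k are assumptions made for illustration.

```python
import math

def distance(a, b):
    """Root-mean-square difference over columns observed in both rows.
    Returns infinity when the two rows share no observed column."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def impute_knn(matrix, k=2):
    """Fill each missing entry (None) with the mean of that column over the
    k nearest rows (genes) that have a value there. The input is not modified."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other genes with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy expression matrix (hypothetical values): rows are genes, columns are arrays.
expression = [
    [1.0, 2.0, 3.0, None],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [5.0, 1.0, 0.0, 9.0],
]
imputed = impute_knn(expression, k=2)
print(imputed[0][3])  # average of the two closest genes' values in column 3
```

In this toy matrix the second and third genes track the first one closely, so they are chosen as neighbors and the dissimilar fourth gene is ignored.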

Download iMISS Here

J. Hu, Haifeng Li, Michael S. Waterman, Xianghong Jasmine Zhou. Integrative Missing Value Estimation for Microarray Data. BMC Bioinformatics, 2006

EMD: Ensemble Algorithms for Motif Discovery

We proposed EMD, a novel clustering-based ensemble algorithm for de novo motif discovery that combines predictions from multiple runs of one or more base component algorithms. This is the first application of the ensemble approach to the motif discovery problem. The algorithm was tested on a benchmark dataset generated from E. coli RegulonDB, where it achieved a 22.4% improvement in nucleotide-level prediction accuracy over the best stand-alone component algorithm. The advantage of EMD is more significant for shorter input sequences, but most importantly, it always matches or outperforms the stand-alone component algorithms, even for longer sequences.
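The sketch below illustrates the general idea of combining site predictions from several runs. It is a simple per-position vote, not the clustering procedure used by EMD, and the function names, the motif width, and the vote threshold are all assumptions made for illustration.

```python
from collections import Counter

def vote_positions(predictions, motif_width, min_votes):
    """Return the sequence positions covered by at least `min_votes` predicted sites.

    `predictions` holds one predicted site start position per run.
    """
    coverage = Counter()
    for start in predictions:
        for pos in range(start, start + motif_width):
            coverage[pos] += 1
    return sorted(pos for pos, votes in coverage.items() if votes >= min_votes)

def consensus_site(predictions, motif_width, min_votes):
    """Return the start of the longest contiguous run of well-supported positions,
    or None when no position reaches the vote threshold."""
    supported = vote_positions(predictions, motif_width, min_votes)
    if not supported:
        return None
    best_start, best_len = supported[0], 1
    run_start, run_len = supported[0], 1
    for prev, cur in zip(supported, supported[1:]):
        if cur == prev + 1:
            run_len += 1
        else:
            run_start, run_len = cur, 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start

# Hypothetical start positions predicted by five runs on one input sequence.
runs = [12, 13, 12, 40, 11]
print(consensus_site(runs, motif_width=6, min_votes=3))  # prints 12
```

The outlying prediction at position 40 receives only one vote and is discarded, while the overlapping predictions around position 12 reinforce each other.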

Supplementary material

J. Hu, Yifeng David Yang and Daisuke Kihara, "EMD: an Ensemble Algorithm for discovering regulatory motifs in DNA sequences", BMC Bioinformatics, 7:342. 2006. Click to download PDF File
J. Hu, Bin Li, and Daisuke Kihara, "Limitations and Potentials of Current Motif Discovery Algorithms", Nucleic Acids Research, 33: 4899-4913, 2005

Evograph: Evolving Graphs using Genetic Programming

Evograph uses Genetic Programming to evolve arbitrary types of graphs. Users only need to define the fitness/evaluation function of the graphs and the evolutionary search will try to find an optimum graph. It has been applied to the wireless access point configuration problem. Source code is available for downloading.

Visit the Evograph website

J. Hu, E. Goodman, “Wireless Access Point Configuration by Genetic Programming”, Proc. IEEE Congress on Evolutionary Computation CEC2004

GPBG: Evolving Bond Graphs Using Genetic Programming

Machines or dynamic systems such as electronic circuits, mechanical vibration absorbers, or MEMS devices can be represented by bond graph models, so designing such a machine is equivalent to designing a bond graph model. GPBG is a complete framework for automated evolutionary synthesis of bond graph models using Genetic Programming. It has been applied to analog filter circuits, printer redesign, vibration absorbers, MEMS filters, and robust circuits.

Visit the GPBG website

K. Seo, J. Hu, Z. Fan, E. D. Goodman, and R. C. Rosenberg, "Automated Design Approaches for Multi-Domain Dynamic Systems Using Bond Graphs and Genetic Programming," The International Journal of Computers, Systems and Signals, vol. 3, no. 1, pp. 55-70, 2002.

Bond Graph C++ Library

We have open-sourced our bond graph C++ package for dynamic system simulation. It can generate the state-space model of a system, i.e., its A, B, C, and D matrices.
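For readers unfamiliar with the A/B/C/D form, the following sketch shows how such matrices are used once they are available: the state evolves as x' = A x + B u and the output is y = C x + D u. It does not use the C++ library or its API. The RC low-pass example, the forward-Euler integrator, and all names are assumptions chosen for illustration.

```python
import math

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vec_add(a, b):
    """Add two vectors element-wise."""
    return [x + y for x, y in zip(a, b)]

def simulate(A, B, C, D, x0, u, dt, steps):
    """Forward-Euler integration of x' = A x + B u with a constant input u.
    Returns the final state and the final output y = C x + D u."""
    x = list(x0)
    for _ in range(steps):
        dx = vec_add(mat_vec(A, x), mat_vec(B, u))
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    y = vec_add(mat_vec(C, x), mat_vec(D, u))
    return x, y

# Hypothetical RC low-pass filter: the state is the capacitor voltage.
R, Cap = 1.0, 1.0            # assumed component values, time constant R*Cap = 1 s
A = [[-1.0 / (R * Cap)]]
B = [[1.0 / (R * Cap)]]
C = [[1.0]]
D = [[0.0]]

# Unit step input, simulated for five time constants.
x_final, y_final = simulate(A, B, C, D, x0=[0.0], u=[1.0], dt=0.001, steps=5000)
print(y_final[0], 1 - math.exp(-5))  # numerical result next to the analytic step response
```

Forward Euler is used only to keep the sketch short; a stiff or higher-order system would call for a more robust integrator.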

Download the C++ library here

HFC: Hierarchical Fair Competition EC Framework

Many current Evolutionary Algorithms (EAs) tend to converge prematurely or stagnate without progress on complex problems. The Hierarchical Fair Competition (HFC) model is a generic framework for sustainable evolutionary search that transforms the convergent nature of the current EA framework into a non-convergent search process. The significant gains in robustness, scalability, and efficiency achieved by HFC with little additional computing effort, together with its tolerance of small population sizes, demonstrate its effectiveness on these problems and show its promise for improving other existing EAs on difficult problems. A paradigm shift from that of most EAs is proposed: rather than trying to escape from local optima or delay convergence at a local optimum, HFC allows new optima to emerge continually in a bottom-up manner, maintaining low local selection pressure at all fitness levels while fostering exploitation of high-fitness individuals through promotion to higher levels.
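The promotion step described above can be sketched as follows. This is a simplified illustration, not the published HFC implementation: the fixed admission thresholds, the one-way migration, the refill of the bottom level with fresh random individuals, and all names are assumptions.

```python
import random

def promote(levels, thresholds, fitness):
    """Move each individual to the highest level whose admission threshold it meets.

    levels[i] is the population at fitness level i; thresholds[i] is the minimum
    fitness admitted to level i and is assumed to increase with i.
    Individuals never move down in this sketch.
    """
    new_levels = [[] for _ in levels]
    for current, population in enumerate(levels):
        for ind in population:
            target = current
            for lvl in range(current + 1, len(levels)):
                if fitness(ind) >= thresholds[lvl]:
                    target = lvl
            new_levels[target].append(ind)
    return new_levels

def refill_bottom(levels, size, new_individual):
    """Top up the lowest level with fresh individuals so that new genetic
    material keeps entering the hierarchy from the bottom."""
    while len(levels[0]) < size:
        levels[0].append(new_individual())
    return levels

# Toy setup (hypothetical): an individual is a number and its fitness is itself.
rng = random.Random(0)
thresholds = [float("-inf"), 0.5, 0.9]
levels = [[0.1, 0.6, 0.95], [0.5, 0.92], [0.9]]

promoted = promote(levels, thresholds, fitness=lambda ind: ind)
print(promoted)  # [[0.1], [0.6, 0.5], [0.95, 0.92, 0.9]]

refilled = refill_bottom(promoted, size=3, new_individual=lambda: rng.random() * 0.5)
print(len(refilled[0]))  # 3
```

Because individuals only compete with others at a similar fitness level, newly created low-fitness individuals are not immediately eliminated by the best individuals found so far.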

Visit the HFC homepage

J. Hu, E. Goodman, K. Seo, Z. Fan, R. Rosenberg, "The Hierarchical Fair Competition (HFC) Framework for Sustainable Evolutionary Algorithms", Evolutionary Computation, 13 (2), MIT Press, 2005. Click to download PDF File