Deep Learning and its Application in Bioinformatics: Case Study on Protein-peptide Binding Prediction

Friday, December 1, 2017 - 2:20pm to 3:10pm
Swearingen room 2A14

Speaker: Dr. Jianjun Hu

Abstract: Deep learning has led to tremendous progress in computer vision, speech recognition, and natural language processing. It has now crossed over into bioinformatics and brought breakthroughs there as well. One interesting problem is developing accurate models for predicting the binding affinities of peptides to protein receptors such as MHC (Major Histocompatibility Complex). Such models can shed light on adverse drug reactions and autoimmune diseases, and can lead to more effective protein therapies and vaccine design.

We proposed DeepMHC, a deep convolutional neural network (CNN) based peptide binding prediction algorithm that achieves substantially higher accuracy, as tested on MHC-I peptide binding affinity prediction. Our model takes raw binding peptide sequences and affinity scores or binding labels as input, without needing any human-designed features. The back-propagation training algorithm allows it to learn nonlinear relationships among the amino acid positions of the peptides. It can also naturally handle peptide length variation, MHC polymorphism, and unbalanced training samples across MHC proteins with different alleles via a simple amino acid padding scheme. Our experiments showed that DeepMHC can achieve state-of-the-art prediction performance on most of the IEDB benchmark datasets with a single model architecture and without using any consensus or composite ensemble classifier models.
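
To make the idea of feeding raw sequences through a padding scheme concrete, here is a minimal Python sketch. It is an illustration only, not the DeepMHC implementation: the abstract does not specify the padding symbol, the padding side, or the encoding, so the `PAD` character, right-padding, one-hot encoding, and the helper names `pad_peptide` and `one_hot` are all assumptions.

```python
# Illustrative sketch only: the padding symbol, padding side, and one-hot
# encoding are assumptions, not details taken from the talk.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "X"                             # hypothetical padding symbol
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def pad_peptide(peptide, max_len):
    """Right-pad a peptide with PAD so every input has length max_len."""
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    return peptide + PAD * (max_len - len(peptide))


def one_hot(peptide, max_len):
    """Encode a peptide as a max_len x 21 matrix of 0/1 values."""
    matrix = []
    for aa in pad_peptide(peptide, max_len):
        row = [0] * len(ALPHABET)
        row[INDEX[aa]] = 1
        matrix.append(row)
    return matrix


# An 8-residue peptide padded to 11 positions, ready for a fixed-size CNN input.
encoded = one_hot("SIINFEKL", 11)
print(len(encoded), len(encoded[0]))
```

With a fixed-size matrix like this, peptides of different lengths can share one network input shape, which is one plausible way a single architecture could cover length variation.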

Bio: I joined the CSE department of the University of South Carolina in August 2007. I am now working on integrative functional genomics, especially the integrative analysis of microarray data. I am also interested in motif discovery for understanding the gene expression mechanisms involved in diseases. I received my Ph.D. in Computer Science, in the area of machine learning and particularly evolutionary computation, at the Genetic Algorithm Research and Application Group (GARAGe) of Michigan State University. My dissertation focused on sustainable evolutionary computation algorithms and automated computational synthesis. I worked on the DNA motif discovery problem as a postdoc at the Kihara Bioinformatics Lab, Purdue University, and on microarray analysis at the Computational Molecular Biology Division of the University of Southern California (another USC).