Phylogeny, Ancestral Genome, and Disease Diagnoses Models Constructions using Biological Data

Monday, November 12, 2018 - 12:00pm to 1:00pm
Meeting room 2267, Innovation Center

DISSERTATION DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author : Bing Feng
Advisor : Dr. Jijun Tang
Date : Nov 12th, 2018
Time : 12:00 pm
Place : Meeting room 2267, Innovation Center

Abstract

Bioinformatics research develops methods and software tools to analyze biological data and provide insight into the mechanisms of biological processes. Machine learning techniques have been widely used by researchers for disease prediction, disease diagnosis, and biomarker identification. Using machine learning algorithms to diagnose diseases has several advantages. Rather than relying solely on doctors’ experience and stereotyped formulas, researchers can use learning algorithms to analyze sophisticated, high-dimensional, and multimodal biomedical data, and construct prediction/classification models that make decisions even when some information is incomplete, unknown, or contradictory. In this study, we first build an automated computational pipeline to reconstruct phylogenies and ancestral genomes for two high-resolution real yeast whole-genome datasets. Comparison of our results with recent studies and publications shows that the reconstructed phylogenies and ancestors are accurate and robust. We also identify and analyze conserved syntenic blocks among the reconstructed ancestral genomes and present-day yeast species.

Next, we analyze a metabolic-level dataset obtained from positive mass spectrometry of human blood samples. We apply machine learning and feature selection algorithms to construct diagnosis models for chronic kidney disease (CKD). We also identify the most critical metabolite features and study the correlations between these features and the progression of CKD stages. The selected metabolite features provide insights into early-stage CKD diagnosis, pathophysiological mechanisms, CKD treatment, and drug development.
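
As a rough illustration of how feature selection and a diagnosis model can fit together, the following sketch ranks features by a simple two-class separation score and then classifies with a nearest-centroid rule. Both techniques are stand-ins chosen for brevity; the abstract does not say which feature selection or learning algorithms were used. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math

def rank_features(samples, labels):
    """Rank features by |difference of class means| / pooled standard deviation."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        groups = {0: [], 1: []}
        for row, y in zip(samples, labels):
            groups[y].append(row[j])
        means = {y: sum(v) / len(v) for y, v in groups.items()}
        variances = {
            y: sum((x - means[y]) ** 2 for x in v) / max(len(v) - 1, 1)
            for y, v in groups.items()
        }
        # Guard against a zero denominator for constant features.
        pooled = math.sqrt((variances[0] + variances[1]) / 2) or 1e-12
        scores.append(abs(means[0] - means[1]) / pooled)
    # Indices of the highest-scoring features come first.
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def fit_centroids(samples, labels, features):
    """Compute one mean vector per class, restricted to the selected features."""
    centroids = {}
    for y in set(labels):
        rows = [row for row, lab in zip(samples, labels) if lab == y]
        centroids[y] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return centroids

def predict(centroids, features, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    point = [row[j] for j in features]
    return min(centroids, key=lambda y: math.dist(point, centroids[y]))

# Toy data with made-up values: three features, only feature 1 separates the classes.
samples = [
    [1.0, 0.2, 5.0],
    [1.1, 0.1, 4.8],
    [0.9, 0.3, 5.1],
    [1.0, 2.1, 5.0],
    [1.2, 1.9, 4.9],
    [0.8, 2.0, 5.2],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(samples, labels)
selected = ranking[:1]  # keep only the top-ranked feature
centroids = fit_centroids(samples, labels, selected)
```

In this toy setting the ranking puts the informative feature first, and a new sample is classified using only the selected feature, for example `predict(centroids, selected, [1.0, 1.8, 5.0])`.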

Finally, we use deep learning techniques to build accurate Down Syndrome (DS) prediction/screening models based on the analysis of a newly introduced Illumina human genome genotyping array. We propose a bi-stream convolutional neural network (CNN) architecture with nine layers and two merged CNN models, which takes two chromosome SNP maps as combined input. We evaluate and compare the performance of our CNN DS prediction models with conventional machine learning algorithms and single-stream CNN models. We visualize the feature maps and trained filter weights from intermediate layers of our trained CNN model. We further discuss the advantages of our method and the underlying reasons for the performance differences.
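
To make the bi-stream idea concrete, the sketch below shows a forward pass in which two input maps pass through separate convolutional streams and the resulting features are merged before a single output unit. It is a deliberately small stand-in, not the nine-layer architecture described above. Several choices are assumptions made only for illustration: one convolutional layer per stream, global max pooling, merging by concatenation, 8x8 input maps with genotype codes 0/1/2, and random untrained weights. All function and parameter names are hypothetical.

```python
import math
import random

def conv2d_relu(image, kernel):
    """Valid 2-D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def stream_features(image, kernels):
    """One stream: apply each filter, then global max pooling (one value per filter)."""
    return [max(max(r) for r in conv2d_relu(image, k)) for k in kernels]

def bi_stream_forward(map_a, map_b, params):
    """Run both streams, concatenate their features, and apply a sigmoid output unit."""
    feats_a = stream_features(map_a, params["kernels_a"])
    feats_b = stream_features(map_b, params["kernels_b"])
    merged = feats_a + feats_b  # merge step (assumed here to be concatenation)
    z = sum(w * x for w, x in zip(params["w_out"], merged)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-z)), merged

def init_params(n_filters=4, ksize=3, seed=0):
    """Random, untrained weights; each stream has its own filters."""
    rng = random.Random(seed)
    def kernel():
        return [[rng.uniform(-1, 1) for _ in range(ksize)] for _ in range(ksize)]
    return {
        "kernels_a": [kernel() for _ in range(n_filters)],
        "kernels_b": [kernel() for _ in range(n_filters)],
        "w_out": [rng.uniform(-1, 1) for _ in range(2 * n_filters)],
        "b_out": 0.0,
    }

# Hypothetical 8x8 SNP maps filled with random genotype codes (0, 1, or 2).
data_rng = random.Random(1)
map_a = [[data_rng.choice([0, 1, 2]) for _ in range(8)] for _ in range(8)]
map_b = [[data_rng.choice([0, 1, 2]) for _ in range(8)] for _ in range(8)]

params = init_params()
prob, merged = bi_stream_forward(map_a, map_b, params)
```

Here `merged` holds four features from each stream, and `prob` is the untrained network's output score. A single-stream baseline of the kind mentioned above would correspond to using only one of the two feature lists.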