Thursday, April 2, 2015 - 03:00 pm
Swearingen, 3A00 (Dean’s Conference Room
DISSERTATION DEFENSE Department of Computer Science and Engineering, University of South Carolina Candidate: Shuai Jiang Advisor: Dr. Jijun Tang Date: Thursday, April 2, 2015 Time: 3:00pm Place: Swearingen, 3A00 (Dean’s Conference Room) Abstract Genome rearrangements are evolutionary events that shuffle genomic material. Since large-scale rearrangements happen rarely and have dramatic effect on the genomes, the evolutionary history between two genomes usually corresponds to a shortest rearrangement scenario (minimal number of rearrangements) between them. However, there may exist multiple shortest rearrangement scenarios between two genomes, making it difficult to find the true history. Furthermore, there also exist "complex" rearrangement operations such as transpositions (that cut off a segment of a chromosome and insert it into some other place in the genome), whose effect on the shortest rearrangement scenario is still unclear. There exist several rearrangement models that simplify the rearrangement analysis, such as multi-break (also known as k-breaks, k=2,3) and Double-Cut-and-Join, none of which is biologically adequate for analysis of transpositions among other types of genome rearrangements. In the first part of this dissertation, we employ the multi-break rearrangement model to study the shortest rearrangement scenarios between two genomes. We show the appearance of transpositions in the shortest 2-break scenarios and derive the upper and lower bounds of their proportion in such scenarios. The results imply that transpositions may exist in large proportion in shortest 2-break scenarios. Then we prove that the introduction of a relative large weight to transpositions cannot achieve its purpose of bounding the proportion of transpositions in shortest 3-break scenarios. The ability of finding a shortest rearrangement scenario can be used for reconstruction of ancestral genomes from the given genomes, so that the total numbers of rearrangements between ancestral genomes and given genomes is minimal. This problem is known to be NP-complete even in the "simplest" case of three genome, called the genome median problem. Currently, there exist several tools for reconstruction of ancestral genomes, such as Multiple Genome Rearrangements and Ancestors (MGRA). Most of them are limited to genomes on the same set of genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many important biological datasets or degrades the resolution of ancestral reconstructions in others. Furthermore even if the given genomes are linear (i.e., consist only of linear chromosomes), the ancestral genomes reconstructed by existing tools may contain circular chromosomes, which is biologically inadequate. In the second part of this dissertation, we present a next-generation tool MGRA2 that extends MGRA functionality to support gene insertions and deletions, making it applicable for more complex genomic datasets. We also propose an algorithm that uses the existing solution for ancestral genome reconstruction problem to obtain linear ancestral genomes in some optimal way.