Author: Ahmed Al-Qari
Advisor: Dr. John Rose
Proteomics has made major progress in recent years, following the sequencing of the genomes of a substantial number of organisms. A typical method for identifying peptides relies on a database of peptides previously identified by tandem mass spectrometry (MS/MS): the accurate mass and elution time (AMT) profile of each peptide to be identified is compared against this database. Restricting the search to those peptides detectable by MS reduces processing time and, more importantly, increases accuracy. It also has significant implications for clinical studies. Proteotypic peptides are those peptides in a protein sequence that are most likely to be confidently observed by current MS-based proteomics methods. Prediction of proteotypic peptides for AMT studies has improved rapidly, based on amino acid properties such as amino acid content, polarity, charge, and hydrophobicity, using a support vector machine (SVM) classification approach. Our goal is to improve proteotypic peptide prediction further. We describe the development of a classifier based on amino acid usage that achieved a classification sensitivity of 90% and a specificity of 81% on the Yersinia pestis proteome (using 3-AAU). Using the Ordered Amino Acid Usage (AAU) feature, we were able to identify a set of peptides that was not identified by the 35 peptide features used by STEP (Webb-Robertson, 2010). This suggests that the Ordered AAU feature could complement the features used by STEP to improve identification accuracy. Building on this result, we combined the 35 STEP features with the Ordered AAU feature in order to enhance overall accuracy.
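
As an illustration only, the sketch below shows one possible reading of an ordered amino acid usage feature and how sensitivity and specificity are computed from classifier output. The abstract does not define the feature precisely, so the interpretation of "3-AAU" as counts of ordered runs of three contiguous residues is an assumption, and the names ordered_aau, feature_vector, and sensitivity_specificity are hypothetical helpers, not part of this work or of STEP.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ordered_aau(peptide, k=3):
    """Count every ordered run of k contiguous residues in a peptide.

    ASSUMPTION: reading "k-AAU" as contiguous k-residue counts is
    illustrative; the abstract does not give the exact definition.
    """
    peptide = peptide.upper()
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


def feature_vector(peptide, k=3):
    """Expand the counts into a fixed-length vector of 20**k entries.

    Runs containing non-standard residue letters are ignored.
    A vector like this could be passed to an SVM classifier.
    """
    counts = ordered_aau(peptide, k)
    return [counts.get("".join(kmer), 0)
            for kmer in product(AMINO_ACIDS, repeat=k)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 = proteotypic (observed), label 0 = not observed.
    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy peptide and toy labels, invented purely for demonstration.
    print(ordered_aau("ACDAC"))
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

With k = 3 the vector has 20^3 = 8000 entries, most of them zero for any single peptide, so a sparse representation would likely be preferable in practice.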