Prediction of Phosphorylation Sites from Protein Sequences
Seymen, Nogayhan (2016-06-15)
Aineistoon ei liity tiedostoja.
Phosphorylation is amongst the most crucial and well-studied post-translational modifications. It is involved in multiple cellular processes which makes phosphorylation prediction vital for understanding protein functions. However, wet-lab techniques are labour and time intensive. Thus, computational tools are required for efficiency. This project aims to provide a novel way to predict phosphorylation sites from protein sequences by adding flexibility and Sezerman Grouping amino acid similarity measure to previous methods, as discovering new protein sequences happens at a greater rate than determining protein structures. The predictor – NOPAY - relies on Support Vector Machines (SVMs) for classification. The features include amino acid encoding, amino acid grouping, predicted secondary structure, predicted protein disorder, predicted protein flexibility, solvent accessibility, hydrophobicity and volume. As a result, we have managed to improve phosphorylation prediction accuracy for Homo sapiens by 3% and 6.1% for Mus musculus. Sensitivity at 99% specificity was also increased by 6% for Homo sapiens and for Mus musculus by 5% on independent test sets. In this study, we have managed to increase phosphorylation prediction accuracy for Homo sapiens and Mus musculus. When there is enough data, future versions of the software may also be able to predict other organisms.