Statistical Methods for Conservation and Alignment Quality in Proteins

dc.contributor: Matemaattis-luonnontieteellinen tiedekunta / Faculty of Mathematics and Natural Sciences, Department of Statistics
dc.contributor.author: Ahola, Virpi
dc.contributor.department: fi=Matematiikan ja tilastotieteen laitos | en=Department of Mathematics and Statistics
dc.contributor.faculty: fi=Matemaattis-luonnontieteellinen tiedekunta | en=Faculty of Mathematics and Natural Sciences
dc.date.accessioned: 2008-10-24T05:30:55Z
dc.date.available: 2008-10-24T05:30:55Z
dc.date.issued: 2008-11-07
dc.description.abstract: Construction of multiple sequence alignments is a fundamental task in bioinformatics. Multiple sequence alignments are a prerequisite for many bioinformatics methods, so the quality of those methods can depend critically on the quality of the alignment. However, automatically constructing a multiple sequence alignment for a set of remotely related sequences does not always yield a biologically relevant alignment. Therefore, an objective approach is needed for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In a profile hidden Markov model, symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause overfitting. Both of these research problems are related to conservation. We have developed statistical measures for quantifying conservation in multiple sequence alignments. Two types of methods are considered: those identifying the conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was used in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of an emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores were highly correlated with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. Comparing the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model decreased dramatically while the same level of accuracy was maintained.
To conclude, we have shown that conservation can be used successfully in a statistical model for alignment quality assessment and in the estimation of emission probabilities in profile hidden Markov models. [en]
dc.description.accessibilityfeature: No information on accessibility
dc.description.notification: Transferred from Doria
dc.format.content: fulltext
dc.identifier: ISBN 978-951-29-3726-4 [en]
dc.identifier.olddbid: 44072
dc.identifier.oldhandle: 10024/42522
dc.identifier.uri: https://www.utupub.fi/handle/11111/28327
dc.identifier.urn: URN:ISBN:978-951-29-3726-4
dc.language.iso: eng
dc.publisher: fi=Turun yliopisto | en=University of Turku
dc.publisher: Annales Universitatis Turkuensis AII 228 [en]
dc.relation.ispartofseries: Turun yliopiston julkaisuja. Sarja AII, Biologica - Geographica – Geologica (Publications of the University of Turku. Series AII, Biologica - Geographica – Geologica)
dc.relation.issn: 2343-3183
dc.relation.numberinseries: 228
dc.source.identifier: https://www.utupub.fi/handle/10024/42522
dc.title: Statistical Methods for Conservation and Alignment Quality in Proteins [en]
dc.type.ontasot: fi=Artikkeliväitöskirja | en=Doctoral dissertation (article-based) [en]
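The abstract contrasts conservation scores with maximum likelihood estimation of profile HMM emission probabilities. The following minimal Python sketch shows only the standard baselines. It is not the thesis's positional or residue conservation measure, and it is not the thesis's emission estimation method.

- `ml_emissions` returns the maximum likelihood estimate for one match state, which is the relative frequency of each residue in an alignment column.
- `entropy_conservation` is a common Shannon-entropy conservation score, swapped in here purely for illustration.

Both function names are hypothetical. The sketch also assumes a 20-letter amino acid alphabet and gaps written as `-`, which are simply ignored.

```python
import math
from collections import Counter

# Assumption: the 20 standard amino acids; gap symbols such as '-' are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ml_emissions(column, alphabet=AMINO_ACIDS):
    """Hypothetical helper: maximum likelihood emission probabilities for one
    match state, i.e. the relative frequency of each residue in the column."""
    residues = [r for r in column if r in alphabet]  # drop gaps and unknowns
    n = len(residues)
    if n == 0:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    # Unseen residues get probability 0 under the ML estimate.
    return {a: counts[a] / n for a in alphabet}


def entropy_conservation(column, alphabet=AMINO_ACIDS):
    """Hypothetical helper: entropy-based positional conservation score in [0, 1].
    1.0 means a single residue type; 0.0 means all residues equally frequent."""
    probs = ml_emissions(column, alphabet)
    h = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return 1.0 - h / math.log(len(alphabet))  # normalise by maximum entropy


if __name__ == "__main__":
    column = "LLLLIV-L"  # hypothetical alignment column, one symbol per sequence
    probs = ml_emissions(column)
    print({a: round(p, 3) for a, p in probs.items() if p > 0})
    print(round(entropy_conservation(column), 3))
```

With a 20-letter alphabet, each match state estimated this way has 19 free parameters (20 probabilities constrained to sum to 1), and residues absent from the column receive probability zero. This illustrates the kind of parameter growth and overfitting risk the abstract describes.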

Files

Name: AII228.pdf
Size: 539.61 KB
Format: Adobe Portable Document Format