Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization

dc.contributorMatemaattis-luonnontieteellinen tiedekunta / Faculty of Mathematics and Natural Sciences, Department of Information Technology-
dc.contributor.authorTuikkala, Johannes
dc.contributor.departmentfi=Tulevaisuuden teknologioiden laitos|en=Department of Future Technologies|
dc.contributor.facultyfi=Matemaattis-luonnontieteellinen tiedekunta|en=Faculty of Mathematics and Natural Sciences|-
dc.date.accessioned2014-10-31T07:49:21Z
dc.date.available2014-10-31T07:49:21Z
dc.date.issued2014-11-20
dc.description.abstractThe amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable to produce massive amounts of biomedical data in a single experiment. As the amount of the data is rapidly growing there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments. First, we will study the ways to improve the quality of microarray data by replacing (imputing) the missing data entries with the estimated values for these entries. Missing value imputation is a method which is commonly used to make the original incomplete data complete, thus making it easier to be analyzed with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on the downstream data analysis methods like clustering. We compared multiple recent imputation algorithms against 8 publicly available microarray data sets. It was observed that the missing value imputation indeed is a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there were also needs for more advanced imputation methods, such as Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments such as using the gene microarray techniques. Such networks are typically very large and highly connected, thus there is a need for fast algorithms for producing visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force directed graph layout algorithm.-
dc.description.accessibilityfeatureei tietoa saavutettavuudesta
dc.description.notificationSiirretty Doriasta
dc.format.contentfulltext
dc.identifierISBN 978-952-12-3126-1-
dc.identifier.olddbid114235
dc.identifier.oldhandle10024/100993
dc.identifier.urihttps://www.utupub.fi/handle/11111/28447
dc.identifier.urnURN:ISBN:978-952-12-3126-1-
dc.language.isoeng-
dc.publisherTurku Centre for Computer Science
dc.relation.ispartofseriesTUCS Dissertations
dc.relation.issn1239-1883
dc.relation.numberinseries185-
dc.source.identifierhttps://www.utupub.fi/handle/10024/100993
dc.titleAlgorithmic Techniques in Gene Expression Processing. From Imputation to Visualization-
dc.type.ontasotfi=Artikkeliväitöskirja|en=Doctoral dissertation (article-based)|

Tiedostot

Näytetään 1 - 1 / 1
Ladataan...
Name:
TUCSDissertationD185.pdf
Size:
1.66 MB
Format:
Adobe Portable Document Format