Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization

Tuikkala, Johannes

dc.contributor	Matemaattis-luonnontieteellinen tiedekunta / Faculty of Mathematics and Natural Sciences, Department of Information Technology	-
dc.contributor.author	Tuikkala, Johannes
dc.contributor.department	fi=Tulevaisuuden teknologioiden laitos\|en=Department of Future Technologies\|
dc.contributor.faculty	fi=Matemaattis-luonnontieteellinen tiedekunta\|en=Faculty of Mathematics and Natural Sciences\|	-
dc.date.accessioned	2014-10-31T07:49:21Z
dc.date.available	2014-10-31T07:49:21Z
dc.date.issued	2014-11-20
dc.description.abstract	The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is growing rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we studied ways to improve the quality of microarray data by replacing (imputing) missing data entries with estimated values. Missing value imputation is commonly used to make incomplete data complete, which makes the data easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Second, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. We observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but some data sets called for more advanced imputation methods, such as Bayesian Principal Component Analysis (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, so there is a need for fast algorithms that produce visually pleasing layouts. We developed a computationally efficient way to produce layouts of large biological interaction networks. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.	-
dc.description.accessibilityfeature	no information on accessibility
dc.description.notification	Transferred from Doria
dc.format.content	fulltext
dc.identifier	ISBN 978-952-12-3126-1	-
dc.identifier.olddbid	114235
dc.identifier.oldhandle	10024/100993
dc.identifier.uri	https://www.utupub.fi/handle/11111/28447
dc.identifier.urn	URN:ISBN:978-952-12-3126-1	-
dc.language.iso	eng	-
dc.publisher	Turku Centre for Computer Science
dc.relation.ispartofseries	TUCS Dissertations
dc.relation.issn	1239-1883
dc.relation.numberinseries	185	-
dc.source.identifier	https://www.utupub.fi/handle/10024/100993
dc.title	Algorithmic Techniques in Gene Expression Processing. From Imputation to Visualization	-
dc.type.ontasot	fi=Artikkeliväitöskirja\|en=Doctoral dissertation (article-based)\|

Files

Name: TUCSDissertationD185.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format

Collections

Doctoral dissertations