On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses

dc.contributor.author: Fischer D.
dc.contributor.author: Nordhausen K.
dc.contributor.author: Oja H.
dc.contributor.organization: fi=matematiikka|en=Mathematics|
dc.contributor.organization-code: 1.2.246.10.2458963.20.41687507875
dc.converis.publication-id: 51366315
dc.converis.url: https://research.utu.fi/converis/portal/Publication/51366315
dc.date.accessioned: 2022-10-28T13:13:12Z
dc.date.available: 2022-10-28T13:13:12Z
dc.description.abstract: Dimension reduction is often a preliminary step in the analysis of data sets with a large number of variables. Most classical, both supervised and unsupervised, dimension reduction methods such as principal component analysis (PCA), independent component analysis (ICA) or sliced inverse regression (SIR) can be formulated using one, two or several different scatter matrix functionals. Scatter matrices can be seen as different measures of multivariate dispersion and might highlight different features of the data and when compared might reveal interesting structures. Such analysis then searches for a projection onto an interesting (signal) part of the data, and it is also important to know the correct dimension of the signal subspace. These approaches usually make either no model assumptions or work in wide classes of semiparametric models. Theoretical results in the literature are however limited to the case where the sample size exceeds the number of variables which is hardly ever true for data sets encountered in bioinformatics. In this paper, we briefly review the relevant literature and explore if the dimension reduction tools can be used to find relevant and interesting subspaces for small-n-large-p data sets. We illustrate the methods with a microarray dataset of prostate cancer patients and healthy controls.
dc.identifier.eissn: 2405-8440
dc.identifier.jour-issn: 2405-8440
dc.identifier.olddbid: 180572
dc.identifier.oldhandle: 10024/163666
dc.identifier.uri: https://www.utupub.fi/handle/11111/31885
dc.identifier.url: https://doi.org/10.1016/j.heliyon.2020.e05732
dc.identifier.urn: URN:NBN:fi-fe2021042821856
dc.language.iso: en
dc.okm.affiliatedauthor: Oja, Hannu
dc.okm.discipline: 111 Mathematics [en_GB]
dc.okm.discipline: 112 Statistics and probability [en_GB]
dc.okm.discipline: 111 Matematiikka [fi_FI]
dc.okm.discipline: 112 Tilastotiede [fi_FI]
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Elsevier Ltd
dc.publisher.country: United Kingdom [en_GB]
dc.publisher.country: Britannia [fi_FI]
dc.publisher.country-code: GB
dc.relation.articlenumber: e05732
dc.relation.doi: 10.1016/j.heliyon.2020.e05732
dc.relation.ispartofjournal: Heliyon
dc.relation.issue: 12
dc.relation.volume: 6
dc.source.identifier: https://www.utupub.fi/handle/10024/163666
dc.title: On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses
dc.year.issued: 2020

Files

Name: On_linear_dimension.pdf
Size: 683.33 KB
Format: Adobe Portable Document Format
Description: Publisher's PDF