Dealing with dimensionality: the application of machine learning to multi-omics data

dc.contributor.author	Feldner-Busztin, Dylan
dc.contributor.author	Nisantzis, Panos F.
dc.contributor.author	Edmunds, Shelley J.
dc.contributor.author	Boza, Gergely
dc.contributor.author	Racimo, Fernando
dc.contributor.author	Gopalakrishnan, Shyam
dc.contributor.author	Limborg, Morten T.
dc.contributor.author	Lahti, Leo
dc.contributor.author	de Polavieja, Gonzalo G.
dc.contributor.organization	fi=Data-analytiikka|en=Data Analytics|
dc.contributor.organization-code	1.2.246.10.2458963.20.68940835793
dc.converis.publication-id	178948715
dc.converis.url	https://research.utu.fi/converis/portal/Publication/178948715
dc.date.accessioned	2025-08-28T00:18:46Z
dc.date.available	2025-08-28T00:18:46Z
dc.description.abstract	Motivation: Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This is an attractive approach for the integrative joint analysis of the vast amounts of omics data produced by next-generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets.
Results: Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count, alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments.
Availability and implementation: All data and processing scripts are available in the GitLab repository https://gitlab.com/polavieja_lab/ml_multi-omics_review/ and on Zenodo: https://doi.org/10.5281/zenodo.7361807.
Supplementary information: Supplementary data are available at Bioinformatics online.
dc.identifier.eissn	1367-4811
dc.identifier.jour-issn	1367-4803
dc.identifier.olddbid	205503
dc.identifier.oldhandle	10024/188530
dc.identifier.uri	https://www.utupub.fi/handle/11111/54907
dc.identifier.url	https://doi.org/10.1093/bioinformatics/btad021
dc.identifier.urn	URN:NBN:fi-fe2023032132634
dc.language.iso	en
dc.okm.affiliatedauthor	Lahti, Leo
dc.okm.discipline	113 Computer and information sciences	en_GB
dc.okm.internationalcopublication	international co-publication
dc.okm.internationality	International publication
dc.okm.type	A2 Scientific Article
dc.publisher	OXFORD UNIV PRESS
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country-code	GB
dc.relation.articlenumber	btad021
dc.relation.doi	10.1093/bioinformatics/btad021
dc.relation.ispartofjournal	Bioinformatics
dc.relation.issue	2
dc.relation.volume	39
dc.source.identifier	https://www.utupub.fi/handle/10024/188530
dc.title	Dealing with dimensionality: the application of machine learning to multi-omics data
dc.year.issued	2023

Files

Name: btad021.pdf
Size: 4.1 MB
Format: Adobe Portable Document Format

Collections

Rinnakkaistallenteet (self-archived publications)