Dirichlet process mixture models for single-cell RNA-seq clustering

Adossa Nigatu A.; Rytkönen Kalle T; Elo Laura

dc.contributor.author	Adossa Nigatu A.
dc.contributor.author	Rytkönen Kalle T
dc.contributor.author	Elo Laura
dc.contributor.organization	fi=biolääketieteen laitos|en=Institute of Biomedicine|
dc.contributor.organization-code	2609201
dc.converis.publication-id	175721725
dc.converis.url	https://research.utu.fi/converis/portal/Publication/175721725
dc.date.accessioned	2022-10-28T13:17:17Z
dc.date.available	2022-10-28T13:17:17Z
dc.description.abstract	Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is that the number of clusters is unknown, and there is still no comprehensive solution to this issue. To help define a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the user to supply the number of clusters as an input parameter. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP on four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (DB-index) and extrinsic (ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
dc.identifier.olddbid	181057
dc.identifier.oldhandle	10024/164151
dc.identifier.uri	https://www.utupub.fi/handle/11111/36933
dc.identifier.url	https://doi.org/10.1242/bio.059001
dc.identifier.urn	URN:NBN:fi-fe2022081154533
dc.language.iso	en
dc.okm.affiliatedauthor	Adossa, Nigatu
dc.okm.affiliatedauthor	Rytkönen, Kalle
dc.okm.affiliatedauthor	Elo, Laura
dc.okm.discipline	1182 Biochemistry, cell and molecular biology	en_GB
dc.okm.discipline	1182 Biokemia, solu- ja molekyylibiologia	fi_FI
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	The Company of Biologists Ltd.
dc.publisher.country	United Kingdom	en_GB
dc.publisher.country	Britannia	fi_FI
dc.publisher.country-code	GB
dc.relation.articlenumber	bio059001
dc.relation.doi	10.1242/bio.059001
dc.relation.ispartofjournal	Biology Open
dc.relation.issue	4
dc.relation.volume	11
dc.source.identifier	https://www.utupub.fi/handle/10024/164151
dc.title	Dirichlet process mixture models for single-cell RNA-seq clustering
dc.year.issued	2022

Files

Name: bio059001.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format

Collections

Rinnakkaistallenteet (self-archived publications)