KANN: estimation of genetic ancestry profiles by nearest neighbor regression

dc.contributor.author: Riikonen, Juha
dc.contributor.author: Kerminen, Sini
dc.contributor.author: Havulinna, Aki
dc.contributor.author: Pirinen, Matti
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 516225679
dc.converis.url: https://research.utu.fi/converis/portal/Publication/516225679
dc.date.accessioned: 2026-04-24T18:00:12Z
dc.description.abstract: State-of-the-art methods for inferring individual-level genetic ancestry are based on statistical models for haplotype data. Unfortunately, these methods are computationally demanding, making them impractical for biobank-scale analyses. In this paper, we describe KANN, an efficient k-nearest neighbor regression method for individual-level ancestry estimation with respect to predefined source populations using only principal components of genetic structure. Contrary to the existing tools that can only use reference samples with discrete source population assignment, KANN enables the use of reference samples with continuous ancestry profiles across multiple source populations. We observe that KANN’s ancestry estimates agree well with the haplotype-based method SOURCEFIND when estimating ancestry profiles across up to 10 Finnish source populations on a dataset of 18 125 Finnish samples from THL Biobank. In the 1000 Genomes Project data containing globally diverse genetic backgrounds, KANN produces highly similar results to the ADMIXTURE software. Based on our results, KANN is a promising tool for ancestry estimation in large-scale genomic studies.
dc.identifier.eissn: 1362-4962
dc.identifier.jour-issn: 0305-1048
dc.identifier.uri: https://www.utupub.fi/handle/11111/59128
dc.identifier.url: https://doi.org/10.1093/nar/gkag209
dc.identifier.urn: URN:NBN:fi-fe2026042333068
dc.language.iso: en
dc.okm.affiliatedauthor: Havulinna, Aki
dc.okm.discipline: 3111 Biomedicine (en_GB)
dc.okm.discipline: 3111 Biolääketieteet (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Oxford University Press (OUP)
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country: Britannia (fi_FI)
dc.publisher.country-code: GB
dc.relation.articlenumber: gkag209
dc.relation.doi: 10.1093/nar/gkag209
dc.relation.ispartofjournal: Nucleic Acids Research
dc.relation.issue: 5
dc.relation.volume: 54
dc.title: KANN: estimation of genetic ancestry profiles by nearest neighbor regression
dc.year.issued: 2026
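
Illustrative note (not part of the record): the abstract describes k-nearest neighbor regression on principal components, with reference samples that may carry continuous ancestry profiles. The sketch below shows only that generic idea in plain Python and is not the authors' implementation. The function name `knn_ancestry`, the unweighted averaging, the Euclidean distance, the value of `k`, and the toy data are all assumptions made for illustration; the paper may use different choices.

```python
import math

def knn_ancestry(query_pcs, ref_pcs, ref_profiles, k=3):
    """Estimate an ancestry profile as the plain average of the profiles of
    the k reference samples nearest to the query in PC space.

    Hypothetical helper: unweighted mean and Euclidean distance are
    assumptions, not details taken from the KANN paper.
    """
    if not 1 <= k <= len(ref_pcs):
        raise ValueError("k must be between 1 and the number of reference samples")
    # Euclidean distance from the query to every reference sample.
    dists = [math.dist(query_pcs, pcs) for pcs in ref_pcs]
    # Indices of the k closest reference samples.
    nearest = sorted(range(len(ref_pcs)), key=dists.__getitem__)[:k]
    n_sources = len(ref_profiles[0])
    # Average each source-population proportion over the k neighbors.
    return [sum(ref_profiles[i][j] for i in nearest) / k for j in range(n_sources)]

# Toy reference panel (made-up numbers): two PCs, two source populations.
# The last reference sample has a continuous (50/50) profile.
ref_pcs = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1), (0.5, 0.5)]
ref_profiles = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]

estimate = knn_ancestry((0.45, 0.5), ref_pcs, ref_profiles, k=3)
print(estimate)
```

Because each reference profile sums to 1 and the estimate is an average of such profiles, the estimated proportions also sum to 1 without any extra normalization.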

Files

Name: gkag209.pdf
Size: 1.12 MB
Format: Adobe Portable Document Format