Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data

dc.contributor.author: Wang Ning
dc.contributor.author: Lysenkov Vladislav
dc.contributor.author: Orte Katri
dc.contributor.author: Kairisto Veli
dc.contributor.author: Aakko Juhani
dc.contributor.author: Khan Sofia
dc.contributor.author: Elo Laura L
dc.contributor.organization: fi=InFLAMES Lippulaiva|en=InFLAMES Flagship|
dc.contributor.organization: fi=Turun biotiedekeskus|en=Turku Bioscience Centre|
dc.contributor.organization: fi=lääketieteellinen tiedekunta|en=Faculty of Medicine|
dc.contributor.organization: fi=tyks, vsshp|en=tyks, varha|
dc.contributor.organization-code: 1.2.246.10.2458963.20.13290506867
dc.contributor.organization-code: 1.2.246.10.2458963.20.18586209670
dc.contributor.organization-code: 1.2.246.10.2458963.20.68445910604
dc.contributor.organization-code: 2609200
dc.contributor.organization-code: 2609201
dc.converis.publication-id: 175018398
dc.converis.url: https://research.utu.fi/converis/portal/Publication/175018398
dc.date.accessioned: 2025-08-28T00:31:25Z
dc.date.available: 2025-08-28T00:31:25Z
dc.description.abstract: Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.

Author summary

The development of next generation sequencing (NGS) technologies and computational algorithms has enabled the large-scale, simultaneous detection of a wide range of genetic variants, such as single nucleotide variants as well as insertions and deletions (indels), which may have clinical significance. Recently, many studies have been conducted to evaluate variant calling tools for indel calling. However, the optimal indel size range for different variant calling tools remains unclear. A good benchmarking dataset for indel calling evaluation should contain biologically representative, high-confidence indels with a wide size range and preferably come from various sequencing settings. In this article, we created a semi-simulated whole genome sequencing dataset where the sequencing data were computationally generated. The indels in the semi-simulated genome were incorporated from a real human sample to represent biologically realistic indels and to avoid the inclusion of variants due to potential technical sequencing errors. Furthermore, we used three real-world NGS datasets generated by whole genome or targeted sequencing to further evaluate our candidate tools. Our results demonstrated that variant calling tools vary greatly in calling different sizes of indels. Deletion calling and insertion calling also showed differences among the tools. The sequencing settings in coverage and read length also had a great impact on indel calling. Our results suggest that the accuracy of indel calling depends on the combination of variant calling tool, indel size range, and sequencing settings.
dc.identifier.eissn: 1553-734X
dc.identifier.jour-issn: 1553-7358
dc.identifier.olddbid: 205870
dc.identifier.oldhandle: 10024/188897
dc.identifier.uri: https://www.utupub.fi/handle/11111/35553
dc.identifier.url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009269
dc.identifier.urn: URN:NBN:fi-fe2022081153821
dc.language.iso: en
dc.okm.affiliatedauthor: Wang, Ning
dc.okm.affiliatedauthor: Lysenkov, Vladislav
dc.okm.affiliatedauthor: Khan, Sofia
dc.okm.affiliatedauthor: Kairisto, Veli
dc.okm.affiliatedauthor: Aakko, Juhani
dc.okm.affiliatedauthor: Elo, Laura
dc.okm.affiliatedauthor: Dataimport, tyks, vsshp
dc.okm.discipline: 1184 Genetics, developmental biology, physiology [en_GB]
dc.okm.discipline: 318 Medical biotechnology [en_GB]
dc.okm.discipline: 1184 Genetiikka, kehitysbiologia, fysiologia [fi_FI]
dc.okm.discipline: 318 Lääketieteen bioteknologia [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: PUBLIC LIBRARY SCIENCE
dc.publisher.country: United States [en_GB]
dc.publisher.country: Yhdysvallat (USA) [fi_FI]
dc.publisher.country-code: US
dc.relation.articlenumber: e1009269
dc.relation.doi: 10.1371/journal.pcbi.1009269
dc.relation.ispartofjournal: PLoS Computational Biology
dc.relation.volume: 18
dc.source.identifier: https://www.utupub.fi/handle/10024/188897
dc.title: Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data
dc.year.issued: 2022

Files

Name: WangEtAl2022ToolEvaluationForTheDetection.pdf
Size: 4.08 MB
Format: Adobe Portable Document Format