VarSCAT: A computational tool for sequence context annotations of genomic variants

dc.contributor.author: Wang Ning
dc.contributor.author: Khan Sofia
dc.contributor.author: Elo Laura L
dc.contributor.organization: fi=InFLAMES Lippulaiva|en=InFLAMES Flagship|
dc.contributor.organization: fi=MediCity|en=MediCity|
dc.contributor.organization: fi=Turun biotiedekeskus|en=Turku Bioscience Centre|
dc.contributor.organization-code: 1.2.246.10.2458963.20.18586209670
dc.contributor.organization-code: 1.2.246.10.2458963.20.68445910604
dc.contributor.organization-code: 2607003
dc.contributor.organization-code: 2609201
dc.converis.publication-id: 181158858
dc.converis.url: https://research.utu.fi/converis/portal/Publication/181158858
dc.date.accessioned: 2025-08-27T22:49:19Z
dc.date.available: 2025-08-27T22:49:19Z
dc.description.abstract: The sequence contexts of genomic variants play an important role in understanding the biological significance of variants and potential sequencing-related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants, such as tandem repeats and unambiguous annotations, have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotations with user-customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than the currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Variant sequence context annotations of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we illustrate that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlated with STR motif sizes. VarSCAT is available from .
dc.identifier.eissn: 1553-734X
dc.identifier.jour-issn: 1553-7358
dc.identifier.olddbid: 202870
dc.identifier.oldhandle: 10024/185897
dc.identifier.uri: https://www.utupub.fi/handle/11111/50507
dc.identifier.url: https://doi.org/10.1371/journal.pcbi.1010727
dc.identifier.urn: URN:NBN:fi-fe2025082785877
dc.language.iso: en
dc.okm.affiliatedauthor: Wang, Ning
dc.okm.affiliatedauthor: Khan, Sofia
dc.okm.affiliatedauthor: Elo, Laura
dc.okm.discipline: 1182 Biochemistry, cell and molecular biology (en_GB)
dc.okm.discipline: 1182 Biokemia, solu- ja molekyylibiologia (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: PUBLIC LIBRARY SCIENCE
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.articlenumber: e1010727
dc.relation.doi: 10.1371/journal.pcbi.1010727
dc.relation.ispartofjournal: PLoS Computational Biology
dc.relation.issue: 8
dc.relation.volume: 19
dc.source.identifier: https://www.utupub.fi/handle/10024/185897
dc.title: VarSCAT: A computational tool for sequence context annotations of genomic variants
dc.year.issued: 2023

Files

Name: journal.pcbi.1010727.pdf
Size: 2.2 MB
Format: Adobe Portable Document Format