Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

Smolander Johannes; Khan Sofia; Singaravelu Kalaimathy; Kauko Leni; Lund Riikka J.; Laiho Asta; Elo Laura L.

doi:10.1186/s12864-021-07686-z

Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2021093048825

Abstract

Background

Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data, which is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection.

Results

Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array-based and karyotyping kit-based validations were used as benchmarks in the evaluation. Additionally, ultra-low-coverage WGS data were simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when detecting smaller CNVs (< 2 Mbp). There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with FREEC (~ 3 min), which we considered the second-best method.
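
The abstract does not state how called CNVs were matched against the validated ones. As a purely illustrative sketch, the snippet below scores calls against a truth set by requiring the same gain/loss type and at least 50% reciprocal overlap; this matching rule, the `CNV` class, the `reciprocal_overlap` and `evaluate` helpers, and all coordinates are hypothetical and are not taken from the study. The toy data show how a single small spurious call lowers precision.

```python
from dataclasses import dataclass

# Hypothetical evaluation scheme for illustration only; the matching rule
# (same type, reciprocal overlap >= 50%) is an assumption, not the study's.

@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive
    kind: str   # "gain" or "loss"

    @property
    def length(self) -> int:
        return self.end - self.start

def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Overlap divided by the longer interval (the smaller of the two fractions)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.length, b.length)

def evaluate(calls, truth, min_ro=0.5):
    """Return (precision, recall) of calls against a validated truth set."""
    def matches(x: CNV, y: CNV) -> bool:
        return x.kind == y.kind and reciprocal_overlap(x, y) >= min_ro

    true_calls = sum(any(matches(c, t) for t in truth) for c in calls)
    found = sum(any(matches(t, c) for c in calls) for t in truth)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall

# Toy data (hypothetical coordinates).
truth = [CNV("chr1", 10_000_000, 15_000_000, "gain"),
         CNV("chr7", 50_000_000, 53_000_000, "loss")]
calls = [CNV("chr1", 10_200_000, 15_100_000, "gain"),
         CNV("chr3", 1_000_000, 1_800_000, "loss")]  # small spurious call

print(evaluate(calls, truth))   # → (0.5, 0.5)
large = [c for c in calls if c.length >= 2_000_000]
print(evaluate(large, truth))   # → (1.0, 0.5)
```

In this toy case, keeping only calls of at least 2 Mbp raises precision without changing recall. A real benchmark would also have to decide how to treat fragmented calls and breakpoint tolerance, which this sketch ignores.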

Conclusions

Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for detecting large copy number variations when they span millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
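
Back-of-the-envelope arithmetic helps show why accuracy depends on CNVs spanning millions of base pairs. Assume, hypothetically, 150 bp reads, Poisson-distributed read counts with no GC or mappability bias, and a diploid baseline. Under these assumptions a region of length R at coverage c receives on average λ = c·R / read length reads. A one-copy gain raises that count by 50%, which is a shift of 0.5·λ against a Poisson standard deviation of √λ, or 0.5·√λ standard deviations. The helper names below are hypothetical, and this is not the model used by any of the evaluated tools.

```python
import math

# Back-of-the-envelope model (assumptions, not from the study):
# fixed 150 bp reads, Poisson read counts, no GC/mappability bias,
# diploid baseline. Function names are hypothetical.

def expected_reads(coverage: float, region_bp: int, read_len: int = 150) -> float:
    """Mean number of reads in a region: coverage * region length / read length."""
    return coverage * region_bp / read_len

def gain_signal_in_sd(coverage: float, region_bp: int, read_len: int = 150) -> float:
    """Read-count shift of a one-copy gain (2 -> 3 copies) in Poisson SDs."""
    lam = expected_reads(coverage, region_bp, read_len)
    shift = 0.5 * lam              # +1 copy over a diploid baseline = +50% reads
    return shift / math.sqrt(lam)  # Poisson SD of the baseline count is sqrt(lam)

for coverage in (0.0005, 0.01):
    for region_bp in (200_000, 2_000_000, 20_000_000):
        lam = expected_reads(coverage, region_bp)
        sd = gain_signal_in_sd(coverage, region_bp)
        print(f"{coverage}x, {region_bp / 1e6:g} Mbp: "
              f"{lam:.1f} reads, gain = {sd:.2f} SD")
```

Under these assumptions, a one-copy gain over 2 Mbp stands only about 1.3 SD above the noise at 0.0005×, compared with about 5.8 SD at 0.01×. Longer regions or higher coverage are therefore needed before single-copy changes separate cleanly from counting noise.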
