Evaluation of variant calling tools for detecting structural variants using real and simulated genomic datasets
Wang, Ning (2018-09-12)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.
Closed
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2018100537611
Abstract
Structural variants are generally defined as DNA variations larger than 50 bp. They have
been recognized as the largest source of inter-individual genetic variation and shown to
play an important role in human life. Genomic structural variants come in various
types, and next-generation sequencing now makes it possible to screen for them.
To date, many variant calling tools with different underlying detection algorithms
have been published for this purpose. However, little information is available on
how these tools perform when used to call structural variants.
Therefore, five state-of-the-art variant calling tools were comprehensively
evaluated in this thesis. The results of the evaluation can help researchers choose
the most suitable variant calling tool for their specific data types and research
questions.
In summary, the first three chapters review the biological and computational background
of DNA sequencing technologies and previous variant calling tool evaluation
studies. The fourth chapter introduces the materials and methods
used in this thesis. The results of the variant calling tool evaluation are presented
in the fifth chapter, and the discussion and conclusion follow in the sixth and seventh
chapters.
In this study, the performance of five open-source, widely used variant calling tools,
namely Pindel, ScanIndel, Fermikit, VarDict and VarScan, was evaluated
using both real and simulated genomic data.
Performance was measured using metrics such as precision, recall,
detected variant length, running time and the similarity of variant calls between the tools.
The results of this thesis indicate that there is no single “multipurpose” tool; instead,
different tools are good at detecting specific variant types within a specific size range.
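
To make the precision, recall and call-similarity metrics concrete, the following is a minimal Python sketch. It is not the evaluation code of the thesis: the `Call` record, the matching rule (same chromosome and variant type, both breakpoints within a tolerance of `tol` base pairs), the default tolerance of 10 bp and the toy coordinates are all assumptions made for illustration only.

```python
from typing import NamedTuple, Sequence, Tuple


class Call(NamedTuple):
    """Hypothetical minimal record for one structural variant call."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP"


def matches(a: Call, b: Call, tol: int = 10) -> bool:
    # Assumed matching rule: same chromosome and type, and both
    # breakpoints agree to within `tol` base pairs.
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def precision_recall(
    calls: Sequence[Call], truth: Sequence[Call], tol: int = 10
) -> Tuple[float, float]:
    # Precision: fraction of reported calls that match a true variant.
    # Recall: fraction of true variants recovered by at least one call.
    matched_calls = sum(any(matches(c, t, tol) for t in truth) for c in calls)
    matched_truth = sum(any(matches(t, c, tol) for c in calls) for t in truth)
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


def call_similarity(a: Sequence[Call], b: Sequence[Call], tol: int = 10) -> float:
    # Jaccard-style overlap between two tools' call sets: shared calls
    # divided by the size of the combined set (an approximation, because
    # matching is fuzzy rather than exact).
    shared = sum(any(matches(x, y, tol) for y in b) for x in a)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


# Toy data: four simulated variants and three calls from one tool.
truth = [
    Call("chr1", 1000, 1200, "DEL"),
    Call("chr1", 5000, 5300, "DUP"),
    Call("chr2", 800, 900, "DEL"),
    Call("chr2", 3000, 3060, "INS"),
]
calls = [
    Call("chr1", 1003, 1198, "DEL"),  # matches within tolerance
    Call("chr2", 800, 900, "DEL"),    # exact match
    Call("chr3", 100, 400, "DEL"),    # false positive
]

precision, recall = precision_recall(calls, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With simulated data the truth set is known exactly, whereas with real data it would have to come from a curated reference call set. The choice of tolerance (or of a reciprocal-overlap criterion instead) can change the reported numbers noticeably, so it should be stated alongside any results.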