Comparison of de novo metagenomic assembly tools
Paulin, Niklas (2019-05-24)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
The field of metagenomics studies the composition and function of microbial communities using DNA isolated from a variety of environments, such as soil, water and the human gut. Current-generation sequencing instruments produce millions of short read fragments (75-300 bp), whereas bacterial genomes typically range from 1 to 7 Mbp, so recovering meaningful information from the fragments is a challenge. To help overcome this, a number of de novo sequence assembly tools have been published, which assemble the short fragments into contigs. This thesis aims to provide a comprehensive evaluation of five assemblers (DISCO, Faucet, IDBA-UD, MEGAHIT and metaSPAdes) using a previously published benchmarking dataset and an assembly evaluation tool (metaQUAST). The resulting metaQUAST report was evaluated together with practical aspects such as assembly time and documentation to determine the strengths and limitations of each tool. The best performers were MEGAHIT and metaSPAdes, which had the fastest assembly times and the longest contigs, while Faucet had the worst performance in almost every evaluation category. IDBA-UD and DISCO were unable to finish the assembly of the dataset due to technical difficulties. In conclusion, MEGAHIT is the first recommendation because of its fast assembly times. metaSPAdes is worth using if contig length is important, while Faucet is recommended only when storage space is limited.
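To make the scale mismatch between read length and genome size concrete, the following is a minimal arithmetic sketch, not part of the thesis. It uses only the figures quoted above (75-300 bp reads, 1-7 Mbp genomes). The function name `reads_for_coverage` and the `coverage` parameter are illustrative assumptions, and the calculation ignores sequencing errors, uneven sampling and read overlap.

```python
import math


def reads_for_coverage(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Return the number of reads whose total length equals coverage * genome_bp.

    Hypothetical helper for illustration only: 'coverage' is an assumed
    parameter (average number of times each base is sequenced).
    """
    return math.ceil(coverage * genome_bp / read_bp)


# Genome sizes and read lengths are the ranges quoted in the abstract.
for genome_bp in (1_000_000, 7_000_000):
    for read_bp in (75, 300):
        n = reads_for_coverage(genome_bp, read_bp)
        print(f"{genome_bp} bp genome, {read_bp} bp reads: {n} reads at 1x coverage")
```

Even at a single pass over one genome, this idealized count falls between roughly 3,300 and 93,000 reads, which is why the fragments have to be joined into contigs before they can be interpreted.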