Comparison of de novo metagenomic assembly tools

dc.contributor.author: Paulin, Niklas
dc.contributor.department: Department of Future Technologies
dc.contributor.faculty: Faculty of Science and Engineering
dc.contributor.studysubject: Bioinformatics
dc.date.accessioned: 2019-06-17T21:00:42Z
dc.date.available: 2019-06-17T21:00:42Z
dc.date.issued: 2019-05-24
dc.description.abstract: The field of metagenomics involves studying the composition and function of microbial communities via DNA isolated from a variety of environments such as soil, water and the human gut. Current-generation sequencing instruments produce millions of short reads (75-300 bp), whereas bacterial genomes typically range from 1 to 7 Mbp, so recovering meaningful information from the fragments is a challenge. To help overcome this, a number of de novo sequence assembly tools have been published, which assemble the short fragments into contigs. This thesis aims to provide a comprehensive evaluation of five assemblers (DISCO, Faucet, IDBA-UD, MEGAHIT and metaSPAdes) using a previously published benchmarking dataset and an assembly evaluation tool (metaQUAST). The resulting metaQUAST report, together with practical considerations such as assembly time and documentation, was evaluated to determine the strengths and limitations of each tool. The best performers were MEGAHIT and metaSPAdes, which had the fastest assembly times and longest contigs, while Faucet had the worst performance in almost every evaluation category. IDBA-UD and DISCO were unable to finish the assembly of the dataset due to technical difficulties. In conclusion, MEGAHIT would be the first recommendation for an assembler because of its fast assembly times. metaSPAdes is worth using if contig length is important, while Faucet is recommended only when storage space is limited.
dc.format.extent: 66
dc.identifier.olddbid: 164794
dc.identifier.oldhandle: 10024/147953
dc.identifier.uri: https://www.utupub.fi/handle/11111/21784
dc.identifier.urn: URN:NBN:fi-fe2019061720807
dc.language.iso: eng
dc.rights: This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights: closed
dc.source.identifier: https://www.utupub.fi/handle/10024/147953
dc.subject: Assembly, Algorithms, Computational Biology, Metagenomics, Sequencing
dc.title: Comparison of de novo metagenomic assembly tools
dc.type.ontasot: Master's thesis

Files

Name: Paulin_Niklas_Thesis.pdf
Size: 812.79 KB
Format: Adobe Portable Document Format