MUFFIN
MUFFIN is a hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis.
If you use MUFFIN in your research, please cite our paper
INDEX
- Introduction
- Figure :
- Installation :
- Test the pipeline
- Manual configuration
- Usage :
- Troubleshooting
- Options
- Complete help and options
- Bibliography
- License
Introduction
MUFFIN aims at being a reproducible pipeline for metagenome assembly of crossed illumina and nanopore reads.
MUFFIN uses the following software
Task | Software | Version | Docker | Image version |
---|---|---|---|---|
QC illumina | fastp | 0.20.0 | LINK | 0.20.0--78a7c63 |
QC ont | automated way to discard shortest reads | |||
filtlong | 0.2.0 | LINK | v0.2.0--afa175e | |
metagenomic composition of ont | sourmash | 2.0.1 | LINK | 2.0.1--6970ddc |
Hybrid assembly | Meta-spades | 3.13.1 | LINK | 3.13.1--2c2a4c0 |
unicycler | 0.4.7 | LINK | 0.4.7-0--c0404e6 | |
Long read assembly | MetaFlye | 2.7 | LINK | 2.7--957a1a1 |
polishing | racon | 1.4.13 | LINK | 1.4.13--bb8a908 |
medaka | 1.0.3 | LINK | 1.0.3--7c62d67 | |
pilon | 1.23 | LINK | 1.23--b21026d | |
mapping | minimap2 | 2.17 | LINK | 2.17--caba7af |
bwa | 0.7.17 | LINK | 1.23--b21026d | |
samtools | 1.9 | LINK | 2.17--caba7af | |
retrieve reads mapped to contig | seqtk | 1.3 | LINK | 1.3--dc0d16b |
Binning | Metabat2 | 2.13 | LINK | 2.13--0e2577e |
maxbin2 | 2.2.7 | LINK | 2.2.7--b643a6b | |
concoct | 1.1.0 | LINK | 1.1.0--03a3888 | |
metawrap | 1.2.2 | LINK | 1.2.2--de94241 | |
qc binning | checkm | 1.0.13 | LINK | 1.0.13--248242f |
Taxonomic Classification | sourmash using the gt-DataBase | 2.0.1 | LINK | 2.0.1--6970ddc |
GTDB | version r89 | |||
Annotations (bin and RNA) | eggNOG | 2.0.1 | LINK | 2.0.1--d5e0c8c |
eggNOG DB | v5.0 | |||
De novo transcript and quantification | Trinity | 2.9.1 | LINK | 2.9.1--82fe26c |
Salmon | 0.15.0 | LINK | 2.9.1--82fe26c |
Figure
The Workflow
The parser output
BIBLIOGRAPHY
BWA: Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60. [PMID: 19451168]
CheckM: Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043–1055.
Concoct: Johannes Alneberg, Brynjar Smári Bjarnason, Ino de Bruijn, Melanie Schirmer, Joshua Quick, Umer Z Ijaz, Leo Lahti, Nicholas J Loman, Anders F Andersson & Christopher Quince. 2014. Binning metagenomic contigs by coverage and composition. Nature Methods, doi: 10.1038/nmeth.3103
Fastp: Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
Filtlong: https://github.com/rrwick/Filtlong
Flye: Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8
HMMER: http://hmmer.org/
Maxbin2: Wu YW, Tang YH, Tringe SG, Simmons BA, and Singer SW, "MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm", Microbiome, 2:26, 2014.
Medaka: https://github.com/nanoporetech/medaka
Metabat2: Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 2015;3:e1165. doi:10.7717/peerj.1165
Metawrap: Uritskiy, G.V., DiRuggiero, J. and Taylor, J. (2018). MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome, 6(1). https://doi.org/10.1186/s40168-018-0541-1
Minimap2: Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191
Pilon: Bruce J. Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A. Cuomo, Qiandong Zeng, Jennifer Wortman, Sarah K. Young, Ashlee M. Earl (2014) Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE 9(11): e112963. doi:10.1371/journal.pone.0112963
pplacer: Matsen FA, Kodner RB, Armbrust EV. 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11: doi:10.1186/1471-2105-11-538.
prodigal: Hyatt D, Locascio PF, Hauser LJ, Uberbacher EC. 2012. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28: 2223–2230.
Racon: Vaser R, Sovic I, Nagarajan N, Sikic M. 2017. Fast and accurate de novogenome assembly from long uncorrected reads. Genome Res 27:737–746.https://doi.org/10.1101/gr.214270.116
Samtools: Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics (2009) 25(16) 2078-9 [19505943]
Seqtk: https://github.com/lh3/seqtk
Sourmash: Brown et al, (2016), sourmash: a library for MinHash sketching of DNA, Journal of Open Source Software, 1(5), 27, doi:10.21105/joss.00027
Spades: Lapidus A., Antipov D., Bankevich A., Gurevich A., Korobeynikov A., Nurk S., Prjibelski A., Safonova Y., Vasilinetc I., Pevzner P. A. New Frontiers of Genome Assembly with SPAdes 3.0. (poster), 2014
Unicycler: Wick RR, Judd LM, Gorrie CL, Holt KE (2017) Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 13(6): e1005595. https://doi.org/10.1371/journal.pcbi.1005595
License
Code is GPL-3.0
Contributing
We welcome contributions from the community! See our Contributing guidelines
Logo creator
The Muffin logo has been made by Tanguy Desmarez and is CC BY (Version 4) compliant