Usage

Automated usage

To avoid writing all the parameter in the CLI you can use the additional "-params-file" and provide a .yml file that contains all the parameters available for MUFFIN and described below. You can find the MUFFIN_params.yml file in the base of MUFFIN directory.

Exemple: MUFFIN_params.yml

assembler   : "metaspades"
ouptut      : "path/to/resultdir"
illumina    : "fastq_ill/"
ont         : "fastq_ont/"
cpus        : 16
memory      : "64g"
modular     : "full"

MUFFIN comand:

path/to/nextflow run $MUFFIN_pipeline -params-file MUFFIN_params.yml -profile local,conda,test

$MUFFIN_pipeline is either "path/to/MUFFIN/main.nf" or "RVanDamme/MUFFIN"

Basic usage

path/to/nextflow run $MUFFIN_pipeline --output results_dir --assembler $assembler --illumina fastq_ill/ --ont fastq_ont/ --cpus 16 --memory 64g --modular full -profile $profile_executor,$profile_engine

$MUFFIN_pipeline is either "path/to/MUFFIN/main.nf" or "RVanDamme/MUFFIN"

$assembler is either: - "metaspades" for hybrid assembly - "metaflye" for long-read assembly with short-reads polishing

What assembly approach should i use? (metaspades vs flye)

Chose your assembly approach based on the amount of data (in Gigabases). If you have more short reads go for meta-spades, more long-reads? Go for flye. However, if you have over 15 Gigabases of long read data, flye might always be the better option regardless of Illumina throughput as you get good complete genome drafts. As the each sample influences the outcome heavily we recommend in trying both if you are unsure.

$profile_executor can be: - "local" to run on your computer - "gcloud" to run on google life science cloud computing (you need to setup your project in nextflow.config) - "slurm" to run on HPC using slurm (e.g. UPPMAX)

$profile_engine can be: - "local_engine" to execute the software installed locally - "conda" to execute using conda installation - "docker" and "singularity" to execute using the docker container

You can add "-resume" at the end of the command to restart it while keeping the process that succeeded This is often used in case or error in the pipeline or if you modify slightly the command and want to avoid running everything again. One exemple is to run the pipeline without RNA and rerun it adding RNA data, in this specific case the second time you add:

--rna path/to/rna -resume

Only the transcript processes and final parsing will be run

Advanced usage

You can use RNA data to have transcriptomics analysis to do so add "--rna path/to/fastq_rna/"

You can run only partially the pipeline to do so change --modular to the right parameter: - "full" run the 3 steps of MUFFIN (assemble, classify, annotate) - "assemble" run the assembly and binning - "classify" run the classification of the bins (require a different input) - "annotate" run the annotation step of the bins (require a different input) and RNA if provided - "assemb-class" run assemble and classify step - "assemb-annot" run assemble and classify step - "class-annot" run classify and annotate step (require a different input)

To run classify and annotate independently from the assemble you need to provide a CSV file of the bins The structure of the file should correspond to:

Samplename,path/to/bin1.fa
Samplename,path/to/bin2.fa
...
Samplename,path/to/binX.fa
Samplename,path/to/binY.fa

If you run "classify" with or without "annotation" use "--bin_classify" If you run "annotate" without "classify" use "--bin_annotate"

Options

--cpus

number of thread available
default 2

--mem

number of memory available
default 16g

--assembler

which method to use
can be Hybrid with metaspades or long read + polishing with metaflye
required

--illumina

location of the dir containing the forward and reverse illumina reads in fasta or fastq
required

--nanopore

location of the dir containing the nanopore reads in fasta or fastq
required

--output

output directory
required

-profile

engine and executor
required
engine: local_engine, conda, docker, singularity
executor: local, gcloud, slurm

Complete help and options

    *********hybrid assembly and differential binning workflow for metagenomics, transcriptomics and pathway analysis*********

    MUFFIN is still under development please wait until the first non edge version realease before using it.
    Please cite us using https://www.biorxiv.org/content/10.1101/2020.02.08.939843v1

    Mafin is composed of 3 part the assembly of potential metagenome assembled genomes (MAGs); the classification of the MAGs; and the annotation of the MAGs.

        Usage example:
    nextflow run RVanDamme/MUFFIN --output result --ont nanopore/ --illumina illumina/ --assembler metaspades --rna rna/ -profile local,docker
    or 
    nextflow run RVanDamme/MUFFIN --output result --ont nanopore/ --illumina illumina/ --assembler metaflye -profile local,docker

        Input:
    --ont                       path to the directory containing the nanopore read file (fastq) (default: $params.ont)
    --illumina                  path to the directory containing the illumina read file (fastq) (default: $params.illumina)
    --rna                       path to the directory containing the RNA-seq read file (fastq) (default: none)
    --bin_classify              path to the directory containing the bins files to classify (default: none)
    --bin_annotate              path to the directory containing the bins files to annotate (default: none)
    --assembler                 the assembler to use in the assembly step (default: $params.assembler)

        Optional input:
    --check_db                  path to the checkm database
    --check_tar_db              path to the checkm database tar compressed
    --sourmash_db               path to the LCA database for sourmash (default: GTDB LCA formated)
    --eggnog_db                 path to the eggNOG database

        Output:
    --output                    path to the output directory (default: $params.output)

        Outputed files:
        You can see the output structure at https://osf.io/a6hru/
    QC                          The reads file after qc
    Assembly                    The assembly contigs file 
    Bins                        The bins produced by CONCOCT, MetaBAT2, MaxBin2 and MetaWRAP (the refining of bins)
    Mapped bin reads            The fastq files containing the reads mapped to each metawrap bin
    Unmapped bin reads          The fastq files containing the unmmaped reads of illumina and nanopore
    Reassembly                  The reassembly files of the bins (.fa and .gfa)
    Checkm                      Various file outputed by CheckM (summary, taxonomy, plots and output dir)
    Sourmash                    The classification done by sourmash
    Classify summary            The summary of the classification and quality control of the bins (csv file)
    RNA output                  The de novo assembled transcript and the quantification by Salmon
    Annotation                  The annotations files from eggNOG (tsv format)
    Parsed output               HTML files that summarize the annotations and show graphically the pathways




        Basic Parameter:
    --cpus                      max cores for local use [default: $params.cpus]
    --memory                    80% of available RAM in GB for --metamaps [default: $params.memory]


        Workflow Options:
    --skip_ill_qc               skip quality control of illumina files
    --skip_ont_qc               skip quality control of nanopore file
    --short_qc                  minimum size of the reads to be kept (default: $params.short_qc )
    --filtlong                  use filtlong to improve the quality furthermore (default: false)
    --model                     the model medaka will use (default: $params.model)
    --polish_iteration          number of iteration of pilon in the polish step (default: $params.polish_iteration)
    --extra_ill                 a list of additional ill sample file (with full path with a * instead of _R1,2.fastq) to use for the binning in Metabat2 and concoct
    --extra_ont                 a list of additional ont sample file (with full path) to use for the binning in Metabat2 and concoct
    --skip_metabat2             skip the binning using metabat2 (advanced)
    --skip_maxbin2              skip the binning using maxbin2 (advanced)
    --skip_concoct              skip the binning using concoct (advanced)

        Nextflow options:
    -profile                    change the profile of nextflow both the engine and executor more details on github README
    -resume                     resume the workflow where it stopped
    -with-report rep.html       cpu / ram usage (may cause errors)
    -with-dag chart.html        generates a flowchart for the process tree
    -with-timeline time.html    timeline (may cause errors)