Configuration

This page provides a complete reference for all PopMAG parameters and configuration options.

Parameter Reference

Input/Output Options

Parameters for specifying input data and output locations.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --outdir | string | results | The output directory where results will be saved. Use absolute paths for cloud storage. |
| --mag_paths | string | mag_paths.tsv | Path to the samplesheet with MAG IDs and file locations. |
| --reads_paths | string | reads_paths.tsv | Path to the samplesheet with reads IDs and file locations. |
| --metadata_file | string | metadata.csv | Path to the metadata file. The sample_id column is required. |
| --publish_dir_mode | string | copy | How to publish output files. Options: copy, copyNoFollow, link, move, rellink, symlink. |
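
Because the metadata file must contain a sample_id column, a quick header check before launching can catch a missing column early. The sketch below is an illustration rather than part of PopMAG: the helper name has_sample_id is hypothetical, and it assumes a comma-separated file with an unquoted header row. The other column names in the demo file are made up.

```shell
# Hypothetical helper: succeeds if the header row of a CSV contains a
# column named exactly "sample_id". Assumes comma-separated, unquoted headers.
has_sample_id() {
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep -qx 'sample_id'
}

# Throwaway demo file; every column except sample_id is invented.
demo_csv="$(mktemp)"
printf 'sample_id,site,depth\nS1,A,10\n' > "$demo_csv"

if has_sample_id "$demo_csv"; then
    echo "sample_id column found"
else
    echo "sample_id column missing" >&2
fi

rm -f "$demo_csv"
```

In practice you would call has_sample_id on the file passed to --metadata_file.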

Publish Directory Modes

  • copy: Copy files to output directory (safest, uses more disk space)
  • symlink: Create symbolic links (saves space, but links break if work directory is deleted)
  • link: Create hard links (saves space, works only on same filesystem)
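
The trade-offs between these modes can be reproduced with ordinary filesystem commands. The sketch below does not run Nextflow. It simulates a work directory and a results directory in a temporary location (all paths are made up) and shows what happens to a copy, a hard link, and a symbolic link once the work directory is deleted.

```shell
# Simulate a work directory and a results directory in a temp location.
tmp="$(mktemp -d)"
mkdir -p "$tmp/work" "$tmp/results"
echo "data" > "$tmp/work/output.txt"

# copy: an independent file in the results directory.
cp "$tmp/work/output.txt" "$tmp/results/copied.txt"
# link: a hard link sharing the same data (same filesystem only).
ln "$tmp/work/output.txt" "$tmp/results/hard.txt"
# symlink: a pointer back into the work directory.
ln -s "$tmp/work/output.txt" "$tmp/results/linked.txt"

# Clean up the work directory, as one might do after a finished run.
rm -rf "$tmp/work"

cat "$tmp/results/copied.txt"   # still readable
cat "$tmp/results/hard.txt"     # still readable
[ -e "$tmp/results/linked.txt" ] || echo "symlink is now broken"
```

If you plan to delete the work directory after a run, copy (the default) is the safe choice.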

Quality Control Options

Parameters for MAG quality assessment and filtering.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --checkm2_db | string | null | Path to a pre-downloaded CheckM2 database. If not provided, the database will be downloaded automatically. |
| --checkm2_db_version | integer | 14897628 | Specific CheckM2 database version to download (Zenodo record ID). |
| --skip_checkm2 | boolean | false | Skip CheckM2 quality assessment. Use if MAGs are already quality-filtered. |
| --min_completeness | integer | 90 | Minimum completeness percentage for a MAG to pass filtering. |
| --max_contamination | integer | 5 | Maximum contamination percentage for a MAG to pass filtering. |

Quality Thresholds

The default thresholds (min_completeness: 90, max_contamination: 5) correspond to "high-quality" MAGs according to MIMAG standards. Adjust these based on your analysis requirements.
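
To see how the two thresholds interact, the sketch below applies them to a small made-up table. It is not the pipeline's own filtering code. The helper name filter_mags, the three-column TSV layout, and the inclusive comparisons (>= and <=) are all assumptions chosen for illustration, and the relaxed values in the second call are only an example.

```shell
# Hypothetical helper: print the names of MAGs that pass both thresholds.
# Input: TSV with a header row and columns name, completeness, contamination.
# Inclusive comparisons are an assumption, not confirmed PopMAG behaviour.
filter_mags() {
    awk -F '\t' -v min_comp="$2" -v max_cont="$3" \
        'NR > 1 && $2 + 0 >= min_comp + 0 && $3 + 0 <= max_cont + 0 { print $1 }' "$1"
}

# Made-up quality table with three MAGs.
quality_tsv="$(mktemp)"
printf 'name\tcompleteness\tcontamination\n' >  "$quality_tsv"
printf 'MAG_A\t97.5\t1.2\n'                  >> "$quality_tsv"
printf 'MAG_B\t82.0\t3.0\n'                  >> "$quality_tsv"
printf 'MAG_C\t95.1\t7.4\n'                  >> "$quality_tsv"

echo "Defaults (90, 5):"
filter_mags "$quality_tsv" 90 5     # prints MAG_A only

echo "Relaxed (50, 10):"
filter_mags "$quality_tsv" 50 10    # prints MAG_A, MAG_B and MAG_C
```

With the defaults, MAG_B fails on completeness and MAG_C fails on contamination. Relaxing both thresholds keeps all three.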

Functional Annotation Options

Parameters for gene prediction and functional annotation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --metacerberus_db | string | null | Path to a pre-downloaded MetaCerberus database. If not provided, the database will be downloaded automatically. |
| --skip_metacerberus | boolean | false | Skip MetaCerberus functional annotation. |
| --annotation_db | string | ALL | Which annotation databases to use in MetaCerberus. |

SNV/VCF Options

Parameters for variant calling and VCF generation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --min_coverage | integer | 0 | Minimum read coverage for an SNV to be included in the VCF. |
| --min_var_freq | integer | 0 | Minimum variant frequency for an SNV to be included. |
| --include_nonvariant | boolean | false | Include positions without variants in the VCF output. |
| --vcf_prefix | string | "" | Prefix string for VCF filenames. |
| --vcf_suffix | string | "" | Suffix string for VCF filenames. |

InStrain Options

Parameters for population genomics analysis with InStrain.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --skip_instrain_compare | boolean | false | Skip InStrain compare step (comparison between samples). |

Visualization Options

Parameters for the Shiny dashboard.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --shiny_timeout | integer | 1500 | Timeout in seconds for the Shiny app. The app will automatically close after this duration. |
| --skip_shiny | boolean | false | Skip running the Shiny App. |

Resource Configuration

PopMAG uses process labels to allocate resources. The default configuration is defined in conf/base.config:

Process Labels

| Label | CPUs | Memory | Time |
| --- | --- | --- | --- |
| process_single | 1 | 6 GB | 4 h |
| process_low | 2 | 12 GB | 4 h |
| process_medium | 6 | 36 GB | 8 h |
| process_high | 12 | 72 GB | 16 h |
| process_long | - | - | 20 h |
| process_high_memory | - | 200 GB | - |

Dynamic Resources

Resources scale with retry attempts. For example, process_low requests 12 GB * task.attempt, so the first attempt requests 12 GB and the first retry (task.attempt = 2) requests 24 GB.
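
The scaling rule is plain multiplication, so the request for any attempt is easy to work out in advance. The sketch below uses the 12 GB base of process_low from the Process Labels table. The helper name scaled_memory_gb is hypothetical, and listing three attempts is only for illustration, since the number of retries the pipeline allows is not stated here.

```shell
# Hypothetical helper: memory (in GB) requested on a given attempt when the
# base value is multiplied by the attempt number.
scaled_memory_gb() {
    base_gb="$1"
    attempt="$2"
    echo $(( base_gb * attempt ))
}

# 12 GB is the process_low base; attempts 1-3 are shown for illustration.
for attempt in 1 2 3; do
    echo "process_low, attempt ${attempt}: $(scaled_memory_gb 12 "$attempt") GB"
done
```

This prints 12 GB, 24 GB, and 36 GB for attempts 1, 2, and 3. Make sure your scheduler or machine can satisfy the largest request you expect.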

Custom Resource Configuration

Create a custom config file to override default resources:

// custom.config
process {
    withLabel: process_high {
        cpus = 24
        memory = 128.GB
        time = 24.h
    }

    withName: INSTRAIN_PROFILE {
        cpus = 16
        memory = 96.GB
    }
}

Run with your custom config:

nextflow run daasabogalro/PopMAG \
    -profile docker \
    -c custom.config \
    ...

Special Profiles

| Profile | Description |
| --- | --- |
| debug | Enable debug output and hash dumping |

Combining Profiles

Profiles can be combined with commas:

# Docker with debug output
nextflow run daasabogalro/PopMAG -profile docker,debug ...

# Singularity with debug output
nextflow run daasabogalro/PopMAG -profile singularity,debug ...

Using Parameters Files

For complex configurations, use a YAML parameters file:

# params.yml

# Input/Output
mag_paths: "/data/project/mags.tsv"
reads_paths: "/data/project/reads.tsv"
metadata_file: "/data/project/metadata.csv"
outdir: "/data/project/results"

# Quality Control
min_completeness: 90
max_contamination: 5
checkm2_db: "/databases/checkm2"

# Functional Annotation
metacerberus_db: "/databases/metacerberus"
skip_metacerberus: false

# SNV Options
min_coverage: 5
min_var_freq: 0.05

# Visualization
shiny_timeout: 3600

Run with:

nextflow run daasabogalro/PopMAG \
    -profile docker \
    -params-file params.yml

Check Process Resources

You can add the following options to your run command to see resource usage:

nextflow run daasabogalro/PopMAG \
    -profile docker \
    -with-report report.html \
    -with-trace trace.txt \
    -with-timeline timeline.html \
    ...

These reports are automatically generated in results/pipeline_info/ by default.
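
Once a trace file exists, a short awk script can pull out the columns you care about. The sketch below is a hedged illustration: the helper name trace_columns is hypothetical, it assumes a tab-separated file with a header row containing name and peak_rss columns (both are among Nextflow's default trace fields), and the sample rows, including the CHECKM2 process name, are made up.

```shell
# Hypothetical helper: print two named columns from a tab-separated trace
# file, locating them by header name rather than by position.
trace_columns() {
    awk -F '\t' -v a="$2" -v b="$3" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == a) ca = i
                if ($i == b) cb = i
            }
            next
        }
        ca && cb { print $ca "\t" $cb }
    ' "$1"
}

# Made-up trace excerpt for demonstration only.
demo_trace="$(mktemp)"
printf 'task_id\tname\tstatus\tpeak_rss\n'              >  "$demo_trace"
printf '1\tCHECKM2 (mag1)\tCOMPLETED\t3.1 GB\n'         >> "$demo_trace"
printf '2\tINSTRAIN_PROFILE (s1)\tCOMPLETED\t41.7 GB\n' >> "$demo_trace"

trace_columns "$demo_trace" name peak_rss
```

Comparing peak_rss against the memory assigned to each label helps you decide which values to override in a custom config.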