Modules
Available modules
Currently available modules can be listed using:
$ raft.py available-modules
This command will provide a list like the following:
abra2 germline mixcr sequenza yolotrim
accucopy gffread mm2-fast seurat
alignment gtfparse multiqc singlecell
antigen.garnish gtftogenepred neos sniffles
arcashla haplotypetools neosplice snpeff
arriba hervquant netctlpan somalier
bbmap hifiadapterfilt netmhciipan somatic
bcftools hificnv netmhcpan spladder
bedtools hiphase netmhcstabpan splice
blast hisat2 ngmlr squid
bowtie2 hlaprofiler onco star
bwa hlathena optitype starfusion
bwa_mem2 htslib pbsv starseqr
cellranger igv-snapshot-automator phlat strelka2
clens igv_reports phylowgs sv
cnvkit immuno picard2 svim
cutesv jacquard preproc telescope
deepconsensus kallisto pyclone-vi template
deephlapan kraken2 qc tidyestimate
deepvariant lens ref_utils trim_galore
delly lenstools rna_antigens trimmomatic
facets lilac rna_quant tximport
fastp lima rseqc utilities
fastqc lohhla salmon varscan2
flexbar longgf samblaster vdjer
fusion mapquik samtools verifybamid
fusioncatcher mhcflurry seq2hla viral
gatk4 mhcnuggets seq_variation virdetect
generic minimap2 seqtk whatshap
Module design strategy
Tool-level modules
Bioinformatics tools (bwa, star, salmon, etc.) in RAFT each have
their own modules, which are housed in public GitLab repositories. For example,
bwa’s module can be found here:
https://gitlab.com/landscape-of-effective-neoantigens-software/nextflow/modules/tools/bwa
Each tool-level module contains processes which define the supported tool behavior.
Generally, each process should define one “command” from the tool. For example,
the bwa module includes a bwa index process and bwa mem process as
these two commands serve very different purposes with respect to workflows. We
strongly discourage defining multiple commands within a single process; a workflow
definition should be used in these cases (see Neosplice's workflow:
https://gitlab.com/landscape-of-effective-neoantigens-software/nextflow/modules/tools/neosplice/-/blob/prod/neosplice.nf?ref_type=heads#L277).
Exceptions are, however, allowed. For example, bwa mem produces a SAM file.
SAM files are uncompressed and generally quite large. Many workflows will
convert a SAM file into its binary equivalent (BAM). To save both time and
storage, the bwa module includes a process
(https://gitlab.com/landscape-of-effective-neoantigens-software/nextflow/modules/tools/bwa/-/blob/prod/bwa.nf?ref_type=heads#L85)
which combines both bwa mem and samtools sort into a single command.
Thematic modules
Workflows are defined and organized within thematic modules. These modules cover common tasks performed within sequencing-based bioinformatics such as alignment, RNA quantification, and somatic variant calling, among others.
RAFT workflows have two crucial aspects:
RAFT workflows are generic
RAFT workflows are designed to be as generic as possible. In other words,
workflows define an objective (e.g. “align reads from a FASTQ file to a
reference”) rather than a method (e.g. “align paired-end short reads to a
reference using STAR”). A quick glance at the workflow names in the
alignment.nf module shows how workflows are named as a result:
workflow manifest_to_transcript_counts
workflow raw_fqs_to_transcript_counts
workflow procd_fqs_to_transcript_counts
These workflows allow FASTQ trimming to be enabled or skipped, and they allow the
use of different quantification tools if desired (e.g. salmon or
kallisto). Keeping workflows as generic as possible (while still
maintaining their objective) allows for future tool and capability expansion
with minimal burden.
RAFT workflows are nested
Astute readers may have also noticed that the workflows in the previous section have the same end point (transcript_counts), but different starting points (manifest, raw_fqs, and procd_fqs). This hints towards the next crucial aspect – workflows are nested.
Specifically, one can imagine workflows as nested dolls, like the following:
+-------------------------------------------+
| manifest_to_transcript_counts             |
|  \                                        |
|    +-------------------------------------+|
|    | raw_fqs_to_transcript_counts        ||
|    |  \                                  ||
|    |    +-------------------------------+||
|    |    | procd_fqs_to_transcript_counts|||
|    |    +-------------------------------+||
|    +-------------------------------------+|
+-------------------------------------------+
In other words, if a user calls manifest_to_transcript_counts, then
manifest_to_transcript_counts runs (which parses the user-provided
manifest) and then calls raw_fqs_to_transcript_counts (which loads FASTQs
for further processing and optionally runs FASTQ trimming), and so on.
Workflow nesting provides several benefits:
Workflow changes are more easily propagated: If a crucial bug fix is made to
procd_fqs_to_transcript_counts, then only a single change is required for the fix to propagate, instead of potentially up to three changes with non-nested workflows.
Multiple entry points (e.g. manifest, raw_fqs, or procd_fqs) allow for easier deployment: One can imagine a scenario in which only transcript_counts are desired. In that case, users can directly call manifest_to_transcript_counts:
MANIFEST
|
V
manifest_to_transcript_counts
|
V
TRANSCRIPT COUNTS
However, one can also imagine a scenario in which transcript_counts and gene_fusions are desired. In this case, users may call manifest_to_transcript_counts and manifest_to_gene_fusions separately; however, this will be quite inefficient as the input FASTQs must be processed twice. Alternatively, they can create the following, more efficient workflow which requires only a single instance of FASTQ trimming.
                MANIFEST
                   |
                   V
         manifest_to_procd_fqs
               /       \
             V           \
procd_fqs_to_gene_fusions  \
            |                V
            |  procd_fqs_to_transcript_counts
      GENE FUSIONS           |
                     TRANSCRIPT COUNTS
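The efficiency argument above can be illustrated with a toy model. The Python sketch below is not RAFT code: each workflow is represented by a plain function named after the workflows in the text, the function bodies are stand-ins, and the trim_calls list is a hypothetical counter that records how often the trimming step runs.

```python
# Toy model of nested workflows. Function names mirror the workflow names in
# the text; the bodies are placeholders, not the real RAFT implementations.
trim_calls = []  # hypothetical counter: one entry per FASTQ trimming pass


def manifest_to_procd_fqs(manifest):
    trim_calls.append(manifest)  # stands in for parsing + trimming
    return f"procd_fqs({manifest})"


def procd_fqs_to_transcript_counts(procd_fqs):
    return f"transcript_counts({procd_fqs})"


def procd_fqs_to_gene_fusions(procd_fqs):
    return f"gene_fusions({procd_fqs})"


def manifest_to_transcript_counts(manifest):
    # The outer workflow only delegates to the workflows nested inside it.
    return procd_fqs_to_transcript_counts(manifest_to_procd_fqs(manifest))


def manifest_to_gene_fusions(manifest):
    return procd_fqs_to_gene_fusions(manifest_to_procd_fqs(manifest))


# Calling the two manifest-level workflows separately trims twice.
trim_calls.clear()
manifest_to_transcript_counts("manifest.tsv")
manifest_to_gene_fusions("manifest.tsv")
separate_trims = len(trim_calls)

# Entering at manifest_to_procd_fqs trims once and feeds both branches.
trim_calls.clear()
procd = manifest_to_procd_fqs("manifest.tsv")
counts = procd_fqs_to_transcript_counts(procd)
fusions = procd_fqs_to_gene_fusions(procd)
shared_trims = len(trim_calls)

print(separate_trims, shared_trims)  # 2 1
```

The model collapses the raw_fqs layer for brevity; the point is only that the shared entry point does the upstream work once.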
Generic workflows with specific tools
Generic workflows, as described above, are intended to accomplish general bioinformatics tasks. Users control how these workflows run by specifying which tool(s) and parameters to use. User modification of tools and parameters is described here.
The strategy for supporting a variety of tools within a workflow can be broken down into string parsing and conditional execution.
Tool and parameter string parsing
Each optional tool will have a <TASK>_tool parameter and a corresponding
<TASK>_tool_parameters parameter. The <TASK>_tool parameter takes a
string containing a tool name or a comma-separated list of tools to run. The
<TASK>_tool_parameters parameter accepts a string containing a
hash/dictionary of key-value pairs where the key is a tool name and the value
is the parameters to pass to that tool. For example,
aln_tool = 'star'
aln_tool_parameters = "['star': '--quantMode TranscriptomeSAM --outSAMtype BAM SortedByCoordinate --twopassMode Basic --outSAMunmapped Within']"
will run star with parameters --quantMode TranscriptomeSAM --outSAMtype BAM SortedByCoordinate --twopassMode Basic --outSAMunmapped Within.
Internally, parameter parsing is performed using the following logic:
aln_tool_parameters = Eval.me(aln_tool_parameters)
if( aln_tool =~ /my_favorite_tool/ ) {
  my_favorite_tool_parameters = aln_tool_parameters['my_favorite_tool'] ? aln_tool_parameters['my_favorite_tool'] : ''
  my_favorite_tool(
    procd_fqs,
    idx_files,
    my_favorite_tool_parameters)
  my_favorite_tool.out.alns
    .set{ alns }
}
Below is a line-by-line description of the code block (scroll right):
alns = Channel.empty() // Initialize the generic output channel
aln_tool_parameters = Eval.me(aln_tool_parameters) // Convert aln_tool_parameters string to hash.
if( aln_tool =~ /my_favorite_tool/ ) { // If aln_tool contains the string "my_favorite_tool"
  my_favorite_tool_parameters = aln_tool_parameters['my_favorite_tool'] ? aln_tool_parameters['my_favorite_tool'] : '' // Then load my_favorite_tool's parameters. Default to empty string if not specified
  my_favorite_tool( // Run an instance of my_favorite_tool
    procd_fqs, // Input FASTQs
    idx_files, // Input index files
    my_favorite_tool_parameters) // User-specified parameters from aln_tool_parameters parsing
  my_favorite_tool.out.alns // Set output alns to generic alns variable.
    .set{ alns }
}
If a tool requires multiple steps such as reference indexing and read alignment, then the following strategy is used:
if( aln_tool =~ /bbmap/ ) {
  bbmap_index_parameters = aln_tool_parameters['bbmap_index'] ? aln_tool_parameters['bbmap_index'] : ''
  bbmap_parameters = aln_tool_parameters['bbmap'] ? aln_tool_parameters['bbmap'] : ''
  bbmap_index(
    aln_ref,
    bbmap_index_parameters)
  bbmap_samtools_sort(
    procd_fqs,
    bbmap_index.out.idx_files,
    bbmap_parameters)
  bbmap_samtools_sort.out.alns
    .set{ alns }
}
In this case, each mode of the tool being called (bbmap_index for reference
indexing and bbmap for aligning) will have separate entries in the
aln_tool_parameters hash.
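Since the parameter string is only evaluated once the workflow runs, it can be useful to sanity-check it beforehand. The following is a rough Python model of the parsing step, not RAFT code: parse_tool_parameters and parameters_for are hypothetical helper names, and the parser assumes a flat Groovy-style map of quoted strings like the star example above.

```python
import ast


def parse_tool_parameters(param_string):
    """Parse a flat Groovy-style map literal, e.g. "['star': '--x']", into a dict.

    Assumption: keys and values are quoted strings. "[:]" is Groovy's empty map.
    """
    text = param_string.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a map literal wrapped in [ ]")
    inner = text[1:-1].strip()
    if inner == ":":
        return {}
    # Re-wrap the entries in braces so Python can read them as a dict literal.
    parsed = ast.literal_eval("{" + inner + "}")
    for key, value in parsed.items():
        if not (isinstance(key, str) and isinstance(value, str)):
            raise ValueError("keys and values must be quoted strings")
    return parsed


def parameters_for(tool_key, parsed):
    # Mirrors the ternary in the workflow: fall back to '' when unset or empty.
    return parsed.get(tool_key) or ""


params = parse_tool_parameters("['star': '--quantMode TranscriptomeSAM']")
print(parameters_for("star", params))              # --quantMode TranscriptomeSAM
print(repr(parameters_for("star_index", params)))  # ''
```

Keys such as star_index that are absent from the string simply resolve to an empty parameter string, matching the default described above.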
Conditional execution
Users can run either a single tool or multiple tools within each workflow. While the tool selection logic is hinted at in the previous section, the explicit conditional logic is:
alns = Channel.empty()
star_alns = Channel.empty()
bbmap_alns = Channel.empty()
bwa_alns = Channel.empty()
multitools = '' // Initialized so the checks below work when only one tool is given
if( aln_tool =~ /,/ ) {
  println "Running multiple aligners - use tool-specific output channels."
  multitools = 'True'
}
if( aln_tool =~ /star/ ) {
  ...
  star_alns = star.out.alns
  if( multitools != 'True' ) {
    star.out.alns
      .set{ alns }
  }
}
if( aln_tool =~ /bbmap/ ) {
  ...
  bbmap_alns = bbmap.out.alns
  if( multitools != 'True' ) {
    bbmap.out.alns
      .set{ alns }
  }
}
if( aln_tool =~ /bwa/ ) {
  ...
  bwa_alns = bwa.out.alns
  if( multitools != 'True' ) {
    bwa.out.alns
      .set{ alns }
  }
}
emit:
  alns
  star_alns
  bbmap_alns
  bwa_alns
There are several things worth noting here:
A generic output channel, alns, is initialized at the beginning of the workflow.
Each tool has an output channel initialized at the beginning of the workflow (e.g. star_alns = Channel.empty()).
If a comma (,) is detected in the aln_tool string, then the multitools variable is set to 'True'.
For each tool’s conditional block, if that tool’s name is detected in the aln_tool string, then it is executed.
Within each tool’s block, the tool’s process output channel is assigned to the workflow-level tool-specific output channel (e.g. star_alns = star.out.alns).
Within each tool’s block, if multitools is not set to 'True', then that tool’s process output channel is also assigned to the generic output channel (e.g. alns = star.out.alns). This allows users to change the tools used without needing to change input/output channels manually.
Using multiple tools within the workflow deactivates the generic output channel. In other words, specifying multiple tools results in alns remaining an empty channel.
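The routing rules above can be summarized with a small Python model. This is only a sketch under stated assumptions: route_outputs is a hypothetical function, None stands in for an empty channel, and re.search is used as the closest analogue of the Groovy =~ check in an if condition.

```python
import re


def route_outputs(aln_tool, supported=("star", "bbmap", "bwa")):
    """Return which output channels would be populated for a given aln_tool."""
    multitools = "," in aln_tool  # comma means several tools were requested
    channels = {"alns": None}     # None models Channel.empty()
    channels.update({f"{tool}_alns": None for tool in supported})
    for tool in supported:
        if re.search(tool, aln_tool):  # unanchored match, like aln_tool =~ /tool/
            channels[f"{tool}_alns"] = f"{tool}.out.alns"
            if not multitools:
                # Single-tool runs also feed the generic channel.
                channels["alns"] = f"{tool}.out.alns"
    return channels


print(route_outputs("star")["alns"])      # star.out.alns
print(route_outputs("star,bwa")["alns"])  # None
```

Because the match is unanchored, any value that merely contains a tool's name triggers that tool's block; in this model route_outputs("bwa_mem2") populates bwa_alns. Anchored patterns such as /bwa$/ may be preferable where tool names overlap.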
Adding new tools to a generic workflow
Users can add currently unsupported tools to a generic workflow using the following procedure:
First, users should find the generic workflow that best suits their tool. For
example, if a user wants to add an additional aligner tool, then the
alignment.nf module’s *_to_alns workflows would be ideal. If, on the
other hand, a user wants to add a new somatic variant caller, then
somatic.nf’s *_to_som_vars workflows would be appropriate. If there are
no appropriate generic workflows for the new tool, then users should create a
new generic workflow (see next section).
Next, users should create their tool-specific module. We encourage users to use
currently available tool-specific modules as templates for creating new
tool-specific modules. For example, one may use the bwa.nf module as a
starting point for a new short-read aligner tool.
Once a generic workflow is picked and the tool-specific module is finished, users should integrate their new tool into the correct workflow. The nested workflow design of RAFT modules means users will have to determine the workflow most appropriate for adding their tool. Generally, this will be the workflow that has the conditional logic for tool selection.
Users then integrate their tool into that workflow. The example below adds the star tool to alignment.nf’s procd_fqs_to_alns workflow.
Integration involves 3 steps:
Initialize an empty channel for the tool’s outputs. Note that this may be multiple channels for multiple outputs!
Add a conditional block for running the tool.
Add emission channels.
Below is the initial workflow code followed by modifications made with explanations:
workflow procd_fqs_to_alns {
// require:
//   PROCD_FQS
//   params.alignment$procd_fqs_to_alns$aln_tool
//   params.alignment$procd_fqs_to_alns$aln_tool_parameters
//   params.alignment$procd_fqs_to_alns$aln_ref
//   params.alignment$procd_fqs_to_alns$gtf
//   params.alignment$procd_fqs_to_alns$alt_ref
  take:
    procd_fqs
    aln_tool
    aln_tool_parameters
    aln_ref
    gtf
    alt_ref
  main:
    alns = Channel.empty()
    bwa_alns = Channel.empty()
    multitools = ''
    if( aln_tool =~ /,/ ) {
      println "Running multiple aligners -- use tool-specific output channels."
      multitools = 'True'
    }
    aln_tool_parameters = Eval.me(aln_tool_parameters)
    if( aln_tool =~ /bwa$/ ) {
      bwa_index_parameters = aln_tool_parameters['bwa_index'] ? aln_tool_parameters['bwa_index'] : ''
      bwa_parameters = aln_tool_parameters['bwa'] ? aln_tool_parameters['bwa'] : ''
      bwa_index(
        aln_ref,
        bwa_index_parameters)
      bwa_mem_samtools_sort(
        procd_fqs,
        bwa_index.out.idx_files,
        bwa_parameters)
      bwa_mem_samtools_sort.out.bams
        .set{ bwa_alns }
      if( multitools != 'True' ) {
        bwa_mem_samtools_sort.out.bams
          .set{ alns }
      }
    }
  emit:
    alns
    bwa_alns
}
Lines that have been added to the code below are denoted by a * at the
beginning of the line.
workflow procd_fqs_to_alns {
// require:
//   PROCD_FQS
//   params.alignment$procd_fqs_to_alns$aln_tool
//   params.alignment$procd_fqs_to_alns$aln_tool_parameters
//   params.alignment$procd_fqs_to_alns$aln_ref
//   params.alignment$procd_fqs_to_alns$gtf
//   params.alignment$procd_fqs_to_alns$alt_ref
  take:
    procd_fqs
    aln_tool
    aln_tool_parameters
    aln_ref
    gtf
    alt_ref
  main:
    alns = Channel.empty()
    bwa_alns = Channel.empty()
*   star_alns = Channel.empty() // Adding line for star alignment outputs
*   star_alt_alns = Channel.empty() // Adding line for star alternate (transcriptome) alignment outputs
*   star_junctions = Channel.empty() // Adding line for star junction outputs
    multitools = ''
    if( aln_tool =~ /,/ ) {
      println "Running multiple aligners -- use tool-specific output channels."
      multitools = 'True'
    }
    aln_tool_parameters = Eval.me(aln_tool_parameters)
    if( aln_tool =~ /bwa$/ ) {
      bwa_index_parameters = aln_tool_parameters['bwa_index'] ? aln_tool_parameters['bwa_index'] : ''
      bwa_parameters = aln_tool_parameters['bwa'] ? aln_tool_parameters['bwa'] : ''
      bwa_index(
        aln_ref,
        bwa_index_parameters)
      bwa_mem_samtools_sort(
        procd_fqs,
        bwa_index.out.idx_files,
        bwa_parameters)
      bwa_mem_samtools_sort.out.bams
        .set{ bwa_alns }
      if( multitools != 'True' ) {
        bwa_mem_samtools_sort.out.bams
          .set{ alns }
      }
    }
*   if( aln_tool =~ /star/ ) { // Adding line for enabling STAR selection
*     star_index_parameters = aln_tool_parameters['star_index'] ? aln_tool_parameters['star_index'] : '' // Parsing aln_tool_parameters for star_index parameters
*     star_map_parameters = aln_tool_parameters['star'] ? aln_tool_parameters['star'] : '' // Parsing aln_tool_parameters for star parameters
*     star_index( // Run STAR index
*       aln_ref,
*       star_index_parameters)
*     star_map( // Run STAR map
*       procd_fqs,
*       star_index.out.idx_files,
*       star_map_parameters,
*       gtf)
*     star_map.out.alns // Set empty channels to star process outputs
*       .set{ star_alns }
*     star_map.out.alt_alns
*       .set{ star_alt_alns }
*     star_map.out.standard_junctions
*       .set{ star_junctions }
*     if( multitools != 'True' ) { // If there are not multiple tools called, then
*       star_map.out.alns // set the STAR alignments to the generic alignments
*         .set{ alns } // channel.
*     }
*   }
  emit:
    alns
    bwa_alns
*   star_alns // Add star_alns emission channel
*   star_alt_alns // Add star_alt_alns emission channel
*   star_junctions // Add star_junctions emission channel
}
Creating new generic workflows
Users may also want to create their own workflow if current offerings do not
meet their needs. We strongly encourage users to build off of currently
available workflows if possible. For example, if a workflow for analyzing
mutational signatures is desired and the expected input files are somatic VCFs,
then the *_to_som_vars workflows from the somatic.nf module should be
used. This allows for a consistent (though modifiable) method for generating
somatic VCFs.
In this example, we’ll be providing support for the fictional tool mfmst
(My Favorite Mutational Signature Tool). We’ll start by determining which
thematic module best matches mutational signature analysis. The onco.nf module is
the general oncology module, so we opt to include the mutational signature
workflows within that module.
Next, we’ll generate the empty workflows for the
*_to_mut_sigs family.
workflow manifest_to_mut_sigs {
}
workflow raw_fqs_to_mut_sigs {
}
workflow procd_fqs_to_mut_sigs {
}
workflow alns_to_mut_sigs {
}
workflow vcfs_to_mut_sigs {
}
Workflow syntax and relationships among nested workflows
Each workflow will require four separate components: a require: block, a
take: block, a main: block, and an emit: block. Information about the
take:, main: and emit: blocks can be found within the Nextflow
documentation. The require: block is specific to LENS and is used for
populating the main.nf file when a workflow is added using the raft.py
add-step command.
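The require: entries shown throughout this page follow a visible naming pattern, params.<module>$<workflow>$<parameter>. The Python sketch below only illustrates how that pattern decomposes; parse_require_line is a hypothetical helper, and nothing here describes how raft.py itself reads these comments.

```python
import re

# Pattern inferred from the require: comments on this page (an assumption):
#   // params.<module>$<workflow>$<parameter>
REQUIRE_PARAM = re.compile(r"^//\s*params\.(\w+)\$(\w+)\$(\w+)\s*$")


def parse_require_line(line):
    """Split a require: comment line into module, workflow, and parameter."""
    match = REQUIRE_PARAM.match(line.strip())
    if match is None:
        # Not a parameter entry, e.g. '// require:' or a channel such as '// MANIFEST'.
        return None
    module, workflow, parameter = match.groups()
    return {"module": module, "workflow": workflow, "parameter": parameter}


entry = parse_require_line("// params.alignment$procd_fqs_to_alns$aln_tool")
print(entry["module"], entry["workflow"], entry["parameter"])
# alignment procd_fqs_to_alns aln_tool
```

Lines that name an input channel rather than a parameter (for example // MANIFEST) return None in this model.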
RAFT workflows are nested (see section X), so the require: and take: blocks of an upstream workflow include the parameters of the workflows nested within it. For example,
manifest_to_x require: and take: blocks ⊇ raw_fqs_to_x require: and take: blocks ⊇ procd_fqs_to_x require: and take: blocks, etc.
This relationship exists because upstream workflows require parameters for running their own processes as well as parameters for running downstream workflows.
For example, the manifest_to_som_vars workflow require: block is shown below,
followed by a table describing which nested workflow uses each parameter.
workflow manifest_to_som_vars {
// require:
// MANIFEST
// params.somatic$manifest_to_som_vars$fq_trim_tool
// params.somatic$manifest_to_som_vars$fq_trim_tool_parameters
// params.somatic$manifest_to_som_vars$aln_tool
// params.somatic$manifest_to_som_vars$aln_tool_parameters
// params.somatic$manifest_to_som_vars$som_var_caller
// params.somatic$manifest_to_som_vars$som_var_caller_parameters
// params.somatic$manifest_to_som_vars$som_var_caller_suffix
// params.somatic$manifest_to_som_vars$aln_ref
// params.somatic$manifest_to_som_vars$bed
// params.somatic$manifest_to_som_vars$som_var_pon_vcf
// params.somatic$manifest_to_som_vars$som_var_af_vcf
// params.somatic$manifest_to_som_vars$known_sites_ref
// params.somatic$manifest_to_som_vars$species
| Parameter | Workflow usage |
|---|---|
We will use the *_to_som_vars workflows for running our mutational
signature analysis. Each *_to_mut_sigs workflow must include the parameters needed to run the
corresponding *_to_som_vars workflow and to run som_vars_to_mut_sigs.
For example, manifest_to_mut_sigs may look something like this:
workflow manifest_to_mut_sigs {
// require:
//   MANIFEST
//   params.onco$manifest_to_mut_sigs$fq_trim_tool
//   params.onco$manifest_to_mut_sigs$fq_trim_tool_parameters
//   params.onco$manifest_to_mut_sigs$aln_tool
//   params.onco$manifest_to_mut_sigs$aln_tool_parameters
//   params.onco$manifest_to_mut_sigs$som_var_caller
//   params.onco$manifest_to_mut_sigs$som_var_caller_parameters
//   params.onco$manifest_to_mut_sigs$som_var_caller_suffix
//   params.onco$manifest_to_mut_sigs$aln_ref
//   params.onco$manifest_to_mut_sigs$bed
//   params.onco$manifest_to_mut_sigs$som_var_pon_vcf
//   params.onco$manifest_to_mut_sigs$som_var_af_vcf
//   params.onco$manifest_to_mut_sigs$known_sites_ref
//   params.onco$manifest_to_mut_sigs$species
//   params.onco$manifest_to_mut_sigs$mut_sig_tool
//   params.onco$manifest_to_mut_sigs$mut_sig_tool_parameters
  take:
    manifest
    fq_trim_tool
    fq_trim_tool_parameters
    aln_tool
    aln_tool_parameters
    som_var_caller
    som_var_caller_parameters
    som_var_caller_suffix
    aln_ref
    bed
    som_var_pon_vcf
    som_var_af_vcf
    known_sites_ref
    species
    mut_sig_tool
    mut_sig_tool_parameters
  main:
    // Generate somatic variants with the existing nested workflow.
    manifest_to_som_vars(
      manifest,
      fq_trim_tool,
      fq_trim_tool_parameters,
      aln_tool,
      aln_tool_parameters,
      som_var_caller,
      som_var_caller_parameters,
      som_var_caller_suffix,
      aln_ref,
      bed,
      som_var_pon_vcf,
      som_var_af_vcf,
      known_sites_ref,
      species)
    // Pass the somatic variants to the mutational signature workflow.
    som_vars_to_mut_sigs(
      manifest_to_som_vars.out.som_vars,
      mut_sig_tool,
      mut_sig_tool_parameters)
  emit:
    mut_sigs = som_vars_to_mut_sigs.out.mut_sigs
}
The alns_to_mut_sigs workflow, however, would look like the following. Note that it
excludes parameters that are not needed. At this point, alignment files are
being provided, so information on FASTQ trimming (fq_trim_tool and
fq_trim_tool_parameters) is no longer needed. Likewise, information on
read alignment (aln_tool and aln_tool_parameters) is not needed.
workflow alns_to_mut_sigs {
// require:
//   MANIFEST
//   params.onco$alns_to_mut_sigs$som_var_caller
//   params.onco$alns_to_mut_sigs$som_var_caller_parameters
//   params.onco$alns_to_mut_sigs$som_var_caller_suffix
//   params.onco$alns_to_mut_sigs$aln_ref
//   params.onco$alns_to_mut_sigs$bed
//   params.onco$alns_to_mut_sigs$som_var_pon_vcf
//   params.onco$alns_to_mut_sigs$som_var_af_vcf
//   params.onco$alns_to_mut_sigs$known_sites_ref
//   params.onco$alns_to_mut_sigs$species
//   params.onco$alns_to_mut_sigs$mut_sig_tool
//   params.onco$alns_to_mut_sigs$mut_sig_tool_parameters
  take:
    manifest
    som_var_caller
    som_var_caller_parameters
    som_var_caller_suffix
    aln_ref
    bed
    som_var_pon_vcf
    som_var_af_vcf
    known_sites_ref
    species
    mut_sig_tool
    mut_sig_tool_parameters
  main:
    alns_to_som_vars(
      som_var_caller,
      som_var_caller_parameters,
      som_var_caller_suffix,
      aln_ref,
      bed,
      som_var_pon_vcf,
      som_var_af_vcf,
      known_sites_ref,
      species,
      manifest)
    // Use the output of the workflow called above (alns_to_som_vars).
    som_vars_to_mut_sigs(
      alns_to_som_vars.out.som_vars,
      mut_sig_tool,
      mut_sig_tool_parameters)
  emit:
    mut_sigs = som_vars_to_mut_sigs.out.mut_sigs
}
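The containment relationship between nested workflows can be checked directly from the two take: blocks above. The Python snippet below is merely a sanity check built from those lists; the variable names are illustrative.

```python
# Inputs copied from the take: blocks of the two example workflows.
manifest_to_mut_sigs_inputs = {
    "manifest", "fq_trim_tool", "fq_trim_tool_parameters",
    "aln_tool", "aln_tool_parameters",
    "som_var_caller", "som_var_caller_parameters", "som_var_caller_suffix",
    "aln_ref", "bed", "som_var_pon_vcf", "som_var_af_vcf",
    "known_sites_ref", "species", "mut_sig_tool", "mut_sig_tool_parameters",
}
alns_to_mut_sigs_inputs = {
    "manifest",
    "som_var_caller", "som_var_caller_parameters", "som_var_caller_suffix",
    "aln_ref", "bed", "som_var_pon_vcf", "som_var_af_vcf",
    "known_sites_ref", "species", "mut_sig_tool", "mut_sig_tool_parameters",
}

# The downstream workflow's inputs are contained in the upstream workflow's.
is_nested = alns_to_mut_sigs_inputs <= manifest_to_mut_sigs_inputs

# What the upstream workflow needs in addition: trimming and alignment settings.
upstream_only = sorted(manifest_to_mut_sigs_inputs - alns_to_mut_sigs_inputs)

print(is_nested)      # True
print(upstream_only)
# ['aln_tool', 'aln_tool_parameters', 'fq_trim_tool', 'fq_trim_tool_parameters']
```

The four inputs that drop out are exactly the FASTQ trimming and read alignment settings discussed above.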
Finally, users should create the som_vars_to_mut_sigs workflow. This
workflow should follow the parameter string parsing and conditional execution
described in Section X.
Module interdependence
Subworkflows can be heavily dependent upon other subworkflows within their
module as well as subworkflows within other modules. For example, nearly every
thematic module involving read alignment will utilize a subworkflow from the
alignment.nf module. This interdependence allows for the minimization of
redundant code and allows for easier propagation of code fixes. We highly
encourage users to see if a currently existing subworkflow performs a task they
need before introducing a new subworkflow.