Off-the-shelf workflows
=======================

This page is a technical reference for RAFT's off-the-shelf workflow configuration layout and supporting metadata.

Repository organization
-----------------------

Public off-the-shelf workflow configuration repositories are hosted under the `workflow-configs group `_. RAFT inspects these repositories to list available workflows (for example, ``raft available-workflows``) and to initialize workflow projects (for example, ``raft run`` and ``raft run-ots``).

The general repository structure follows:

.. code-block:: console

    workflows (subgroup)
    \
     \___
         \
          \___-
              \
               \___--
               |
               |__--
               |
               |__--

An example is shown below:

.. image:: org_example.png
   :width: 600

Configuring off-the-shelf workflows
-----------------------------------

Each off-the-shelf workflow typically includes at least two files:

- A configuration file containing workflow parameter defaults
- A JSON file describing the workflow

Off-the-shelf configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration file defines default values for workflow parameters. Each line follows the pattern `` = ``. The configuration filename should follow the ``...config`` convention. For example:

.. code-block:: console

    params.utilities$parse_manifest$separator = '\t'
    params.alignment$manifest_to_alns$fq_trim_tool = 'fastp'
    params.alignment$manifest_to_alns$fq_trim_tool_parameters = "[]"
    params.alignment$manifest_to_alns$aln_tool = 'bwa-mem2'
    params.alignment$manifest_to_alns$aln_tool_parameters = "[]"
    params.alignment$manifest_to_alns$aln_ref = "${params.ref_dir}/Homo_sapiens_assembly38.fasta"
    params.alignment$manifest_to_alns$gtf = ''
    params.alignment$manifest_to_alns$alt_ref = ''
    params.alignment$alns_to_procd_alns$aln_ref = "${params.ref_dir}/Homo_sapiens_assembly38.fasta"
    params.alignment$alns_to_procd_alns$bed = "${params.ref_dir}/hg38_exome.bed"
    params.alignment$alns_to_procd_alns$gtf = ''
    params.alignment$alns_to_procd_alns$dup_marker_tool = 'picard'
    params.alignment$alns_to_procd_alns$dup_marker_tool_parameters = "[]"
    params.alignment$alns_to_procd_alns$base_recalibrator_tool = 'gatk4'
    params.alignment$alns_to_procd_alns$base_recalibrator_tool_parameters = "[]"
    params.alignment$alns_to_procd_alns$indel_realign_tool = ''
    params.alignment$alns_to_procd_alns$indel_realign_tool_parameters = "[]"
    params.alignment$alns_to_procd_alns$known_sites_ref = "${params.ref_dir}/Homo_sapiens_assembly38.dbsnp138.vcf.gz"
    params.germline$manifest_to_germ_vars$germ_var_caller = 'deepvariant'
    params.germline$manifest_to_germ_vars$germ_var_caller_parameters = "['deepvariant': '--model_type WES']"
    params.germline$manifest_to_germ_vars$germ_var_caller_suffix = "['deepvariant': '.deepv']"
    params.germline$manifest_to_germ_vars$aln_ref = "${params.ref_dir}/Homo_sapiens_assembly38.fasta"
    params.germline$manifest_to_germ_vars$bed = "${params.ref_dir}/hg38_exome.bed"

JSON workflow description file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The JSON workflow description file describes the workflow's building blocks and required references. An example follows:

.. code-block:: console

    {
      "modules": ["germline"],
      "steps": {
        "parse_manifest": "utilities",
        "manifest_to_alns": "alignment",
        "alns_to_procd_alns": "alignment",
        "alns_to_germ_vars": "germline"
      },
      "references": [
        "https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta",
        "https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
        "hg38_exome.bed"
      ],
      "references_postproc": {
        "Homo_sapiens_assembly38.dbsnp138.vcf": "bgzip"
      },
      "find_and_replace": {
        "manifest_to_alns(\n MANIFEST": "manifest_to_alns(\n parse_manifest.out.manifest",
        "alns_to_procd_alns(\n ALNS": "alns_to_procd_alns(\n manifest_to_alns.out.alns",
        "alns_to_germ_vars(\n ALNS": "alns_to_germ_vars(\n alns_to_procd_alns.out.procd_bams",
        "MANIFEST)": "parse_manifest.out.manifest)",
        "VCFS,": "'',",
        "JUNCTIONS,": "'',"
      }
    }

The JSON workflow description file contains five sections: ``modules``, ``steps``, ``references``, ``references_postproc``, and ``find_and_replace``. A description of each section follows.

modules
+++++++

The ``modules`` section lists the RAFT modules that must be loaded for the workflow to function. RAFT resolves module dependencies, so the list does not have to be exhaustive. In the example above, the ``germline`` module loads every other module needed to support germline variant calling.

steps
+++++

The ``steps`` section lists every step required for the workflow, each mapped to the module that provides it. Steps are loaded into the workflow in the order presented. Generally, these steps should be combined within the workflow to create an end-to-end workflow. In the example above:

1. ``parse_manifest`` parses the user-provided manifest.
2. ``manifest_to_alns`` loads the FASTQs, performs trimming, and aligns the FASTQs to the reference.
3. ``alns_to_procd_alns`` performs any alignment sanitization.
4. ``alns_to_germ_vars`` performs germline variant calling.
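As an illustration only, the sketch below parses a trimmed copy of the example description with Python's standard ``json`` module, checks that the five sections are present, and prints the steps in their listed order. The helper name ``summarize_description`` is hypothetical and is not part of RAFT; RAFT's own parsing may differ.

.. code-block:: python

    import json

    # Trimmed copy of the example description above (illustration only).
    EXAMPLE = '''
    {
      "modules": ["germline"],
      "steps": {
        "parse_manifest": "utilities",
        "manifest_to_alns": "alignment",
        "alns_to_procd_alns": "alignment",
        "alns_to_germ_vars": "germline"
      },
      "references": ["hg38_exome.bed"],
      "references_postproc": {},
      "find_and_replace": {"MANIFEST)": "parse_manifest.out.manifest)"}
    }
    '''

    EXPECTED_SECTIONS = (
        "modules",
        "steps",
        "references",
        "references_postproc",
        "find_and_replace",
    )


    def summarize_description(text):
        """Hypothetical helper: return (missing_sections, ordered_steps)."""
        desc = json.loads(text)
        missing = [name for name in EXPECTED_SECTIONS if name not in desc]
        # json.loads keeps object keys in file order, so step order is preserved.
        steps = list(desc.get("steps", {}).items())
        return missing, steps


    missing, steps = summarize_description(EXAMPLE)
    print("missing sections:", missing)
    for position, (step, module) in enumerate(steps, start=1):
        print(position, module, step)

A check like this can catch a misspelled section name before a workflow project is initialized from the description.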
references
++++++++++

The ``references`` section lists all references required to run the workflow. URLs can be provided if available. References without a URL (``hg38_exome.bed`` in the example above) are not available from external sources and are assumed to be present in the RAFT global ``references/`` directory (``raft/references/``).

references_postproc
+++++++++++++++++++

The ``references_postproc`` section lists any postprocessing steps that should be performed on references after they are downloaded. In the example above, ``Homo_sapiens_assembly38.dbsnp138.vcf`` is compressed with ``bgzip`` after downloading. This functionality can also be used to run ``bash`` on scripts that download or generate references (see the ``lens`` workflow for an example).

find_and_replace
++++++++++++++++

The ``find_and_replace`` section provides instructions for finding strings within the ``main.nf`` files and replacing them with the provided text. RAFT provides placeholder variables in CAPITAL LETTERS when creating a workflow. These variables are expected to be changed by the user prior to running the workflow. The ``find_and_replace`` section automates this variable replacement, removing the burden from the end user. In the example above, the line

.. code-block:: console

    manifest_to_alns(
     MANIFEST

will be replaced with

.. code-block:: console

    manifest_to_alns(
     parse_manifest.out.manifest

Unfortunately, getting the find-and-replace strings correct may take some trial and error.
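One way to shorten the trial and error is to test the pairs against a copy of the text before using them. The sketch below is a minimal illustration that assumes literal, in-order string replacement; this is an assumption, not a description of RAFT's implementation. The helper ``apply_find_and_replace`` and the ``main_nf`` fragment (including the ``OTHER`` placeholder) are hypothetical.

.. code-block:: python

    def apply_find_and_replace(text, replacements):
        """Hypothetical helper: apply each literal find/replace pair in order."""
        for find, replace in replacements.items():
            text = text.replace(find, replace)
        return text


    # Two pairs from the example description; "\n " is a newline plus one space.
    replacements = {
        "manifest_to_alns(\n MANIFEST": "manifest_to_alns(\n parse_manifest.out.manifest",
        "VCFS,": "'',",
    }

    # Hypothetical main.nf fragment containing RAFT-style placeholders.
    main_nf = "manifest_to_alns(\n MANIFEST,\n VCFS,\n OTHER)"
    print(apply_find_and_replace(main_nf, replacements))

    # Under literal matching, a different indent (two spaces here) does not match,
    # so the text comes back unchanged.
    unmatched = "manifest_to_alns(\n  MANIFEST"
    print(apply_find_and_replace(unmatched, replacements) == unmatched)

Under this assumption, the whitespace after each ``\n`` in a search string has to match the indentation in ``main.nf`` exactly, which is a common source of failed replacements.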