Preparing your samples
======================
.. The dataset status table below relies on raw HTML/JavaScript to fetch and
   render live status from the GitLab API. That section will be empty or broken
   in non-HTML output formats (PDF, LaTeX, man pages) and requires internet
   access to display.

RAFT supports both user-provided samples and off-the-shelf immuno-oncology
datasets.
Making your files available to RAFT
-----------------------------------
RAFT starts each workflow from sample-associated data, such as FASTQs, BAMs,
VCFs, or RNA count data. Whatever the input type, RAFT must be able to discover
the files. To make input files discoverable, copy or symlink them into the
appropriate RAFT ``inputs/`` directory, for example
``/path/to/raft/inputs/fastqs``. Grouping input files by dataset, such as
``/path/to/raft/inputs/fastqs/my_favorite_dataset/samp1_1.fq.gz``, is
recommended but not required.
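The symlinking approach described above can be sketched as a small shell
helper. This is only an illustration: the function name ``link_fastqs``, its
arguments, and the ``*.fq.gz`` file pattern are assumptions made for the
example and are not part of RAFT.

```shell
# link_fastqs SRC_DIR RAFT_DIR DATASET   (hypothetical helper, not part of RAFT)
# Symlink every *.fq.gz file in SRC_DIR into RAFT_DIR/inputs/fastqs/DATASET.
link_fastqs() {
    local src_dir="$1" raft_dir="$2" dataset="$3"
    local dest="${raft_dir}/inputs/fastqs/${dataset}"
    local fq abs_dir

    # Create the per-dataset directory if it does not exist yet.
    mkdir -p "${dest}"

    for fq in "${src_dir}"/*.fq.gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "${fq}" ] || continue
        # Resolve the source directory to an absolute path so the link
        # stays valid regardless of the current working directory.
        abs_dir="$(cd "$(dirname "${fq}")" && pwd)"
        ln -sf "${abs_dir}/$(basename "${fq}")" "${dest}/"
    done
}
```

For example, ``link_fastqs /data/my_favorite_dataset /path/to/raft my_favorite_dataset``
would link the FASTQs under ``/path/to/raft/inputs/fastqs/my_favorite_dataset/``.
Both paths are placeholders.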
.. note::

   Input data for demonstration workflows and off-the-shelf datasets are
   downloaded automatically.
The RAFT manifest
-----------------
A manifest describes the samples in a dataset and tells RAFT:

- how the samples should be named
- how the samples are related
- what input files are associated with each sample

For user-provided datasets, you supply this manifest directly. For supported
off-the-shelf datasets, RAFT can generate the run manifest automatically as
part of dataset preparation.
Using an off-the-shelf dataset
------------------------------
The table below lists the currently supported off-the-shelf immuno-oncology
datasets. Click a dataset name to view its publication, abstract, and README.
.. raw:: html

   Loading datasets from GitLab...

.. toctree::
   :hidden:

   dataset-detail
Specifying an off-the-shelf dataset with RAFT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To create a new project from an off-the-shelf dataset, pass ``--dataset`` and
the dataset's identifier to RAFT:

.. code-block:: console

   $ raft run \
       --project-id my-project \
       --workflow lens \
       --dataset <dataset-identifier> \
       --version v1.9-dev

RAFT will clone the dataset-prep module, download the required FASTQs from
ENA/EBI, and stage them before execution. No user-provided manifest is required;
the run manifest is generated automatically.
Using your own samples
----------------------
Running your own samples through RAFT requires creating a manifest for your
dataset. RAFT provides a web-based interface to help generate that manifest.
This interface becomes available when you execute ``raft run`` on a new project
for the first time.

.. figure:: _static/manifest-generator.png
   :alt: RAFT manifest generator interface

   The RAFT manifest generator interface.

If you prefer to create the manifest separately and then provide it with
``raft run ... --manifest <manifest-filename>``, you can use the hosted
manifest generator.

More information about the manifest format is available in :doc:`manifest`.
Specifying your own samples with RAFT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you already have a manifest, create a new project by passing ``--manifest``
and the manifest filename.

.. note::

   Manifest TSVs must be in your RAFT installation's ``inputs/metadata``
   directory to be discovered by RAFT.

.. code-block:: console

   $ raft run \
       --project-id my-project \
       --workflow lens \
       --version v1.9-dev \
       --manifest <manifest-filename>
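Because the manifest must sit in ``inputs/metadata`` before RAFT can find it,
a small staging step can be useful. The sketch below is only an illustration:
``stage_manifest`` is a hypothetical helper, not a RAFT command, and it assumes
that RAFT expects the bare file name rather than a full path.

```shell
# stage_manifest MANIFEST RAFT_DIR   (hypothetical helper, not part of RAFT)
# Copy a manifest TSV into RAFT_DIR/inputs/metadata and print its file name.
stage_manifest() {
    local manifest="$1" raft_dir="$2"
    local dest="${raft_dir}/inputs/metadata"

    # Fail early if the manifest does not exist.
    if [ ! -f "${manifest}" ]; then
        echo "manifest not found: ${manifest}" >&2
        return 1
    fi

    mkdir -p "${dest}"
    cp "${manifest}" "${dest}/"

    # Print the bare file name for use with --manifest.
    basename "${manifest}"
}
```

For example, ``stage_manifest ~/my_manifest.tsv /path/to/raft`` copies the file
and prints ``my_manifest.tsv``. Both paths are placeholders.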
Alternatively, to create your manifest as part of running RAFT, run RAFT
without ``--manifest``:

.. code-block:: console

   $ raft run \
       --project-id my-project \
       --workflow lens \
       --version v1.9-dev