Manifest
========

Manifest purpose
________________

The RAFT manifest defines the relationships among samples, patients, and datasets. The number and types of samples required vary by workflow. For example, an RNA quantification workflow may use one or more RNA-sequencing samples per patient. Other workflows, such as somatic variant calling, require both a normal sample and a tumor sample for each patient.

RAFT manifests may contain one or more patients. On many compute clusters, multiple patients can be run in parallel to reduce run time.

Sample, patient, and dataset hierarchy
______________________________________

The general hierarchy of organization within RAFT follows

.. code-block:: console

   Sample ∈ Patient ∈ Dataset

In other words, samples belong to patients (patients can have multiple samples) and patients belong to datasets (datasets can have multiple patients).

Manifest contents
_________________

A RAFT manifest must have at least the columns defined in the table below. Columns can be in any order, and other columns containing non-RAFT metadata are also allowed.

.. list-table:: RAFT Columns
   :widths: 25 25 25
   :header-rows: 1

   * - Column
     - Description
     - Allowed values
   * - ``Dataset``
     - Name for collection of patients
     - Free text
   * - ``Patient_Name``
     - Name for collection of samples
     - Free text
   * - ``Run_Name``
     - Name for the specific sample
     - Free text (see note below)
   * - ``File_Prefix``
     - Base name (or full path) of input files
     - Free text
   * - ``Sequencing_Method``
     - Sequencing protocol for sample
     - ``RNA-Seq``, ``WES``, ``WXS``, or ``WGS``
   * - ``Normal``
     - Is the sample normal or abnormal (tumor)?
     - ``TRUE`` or ``FALSE``

.. note::

   A sample's ``Run_Name`` is instrumental in guiding samples through some RAFT workflows. It should consist of a two-letter prefix that describes the type of sample, followed by an arbitrary unique identifier. The first letter of the prefix is either ``a`` (for abnormal) or ``n`` (for normal). The second letter is either ``r`` (for RNA) or ``d`` (for DNA). For example, a sample with an ``ar-`` prefix is an abnormal (tumor) RNA sample, while a sample with an ``nd-`` prefix is a normal DNA sample.

Each line in the manifest after the header corresponds to one sample and provides the data needed to run a RAFT workflow. In some cases the samples in a manifest are effectively independent (that is, the workflow does not attempt to pair samples from a patient). In other cases, users must take care that samples are properly labeled. For example, somatic variant calling generally requires a normal DNA sample and a tumor DNA sample. For RAFT to pair these samples, they must have the correct sample prefixes (``ad-`` for the tumor DNA sample and ``nd-`` for the normal DNA sample) and share the same patient (``Patient_Name`` field) and dataset (``Dataset`` field). Consider the following example:

.. code-block:: console

   Patient_Name  Run_Name     Dataset  File_Prefix  Sequencing_Method  Normal
   Pt01          ad-Pt01-03A  AML      9f7f7        WES                FALSE
   Pt01          nd-Pt01-11A  AML      8e74a        WES                TRUE
   Pt01          ar-Pt01-03A  AML      cdb288       RNA-Seq            FALSE
   CTRL          nr-CTRL      AML      CD34-U       RNA-Seq            TRUE

.. note::

   Both the tumor DNA sample (``ad-Pt01-03A``) and the normal DNA sample (``nd-Pt01-11A``) belong to the same patient (``Pt01``) and the same dataset (``AML``).

User-provided manifests can be sanity-checked by using:

.. code-block:: console

   raft.py check-manifest -m

Users can then pass the manifest to RAFT's ``run-ots`` command to run the workflow.
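
The ``Run_Name`` prefix convention and the tumor/normal pairing rule described above can be illustrated with a short standalone script. This is only a sketch and is not how ``check-manifest`` is implemented: the function names (``read_manifest``, ``check_run_names``, ``find_unpaired_dna``) are hypothetical, the manifest is assumed to be tab-separated, and the consistency checks against the ``Normal`` and ``Sequencing_Method`` columns are assumptions drawn from the column descriptions rather than documented RAFT behavior.

.. code-block:: python

   import csv
   import io
   from collections import defaultdict

   # Required columns, taken from the table in this section.
   REQUIRED_COLUMNS = {
       "Dataset", "Patient_Name", "Run_Name",
       "File_Prefix", "Sequencing_Method", "Normal",
   }


   def read_manifest(text, delimiter="\t"):
       """Parse manifest text into row dicts (tab-separated is an assumption)."""
       reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
       missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
       if missing:
           raise ValueError(f"missing columns: {sorted(missing)}")
       return list(reader)


   def check_run_names(rows):
       """Return messages for Run_Name prefixes that look wrong or inconsistent."""
       problems = []
       for row in rows:
           name = row["Run_Name"]
           prefix = name[:2]
           if len(prefix) < 2 or prefix[0] not in "an" or prefix[1] not in "rd":
               problems.append(f"{name}: prefix must be one of ar, ad, nr, nd")
               continue
           # Assumed check: 'n' prefix should agree with Normal == TRUE.
           is_normal = row["Normal"].strip().upper() == "TRUE"
           if (prefix[0] == "n") != is_normal:
               problems.append(f"{name}: prefix disagrees with Normal")
           # Assumed check: 'r' prefix should agree with RNA-Seq.
           is_rna = row["Sequencing_Method"] == "RNA-Seq"
           if (prefix[1] == "r") != is_rna:
               problems.append(f"{name}: prefix disagrees with Sequencing_Method")
       return problems


   def find_unpaired_dna(rows):
       """Return (dataset, patient) keys having only one of the ad/nd DNA samples."""
       seen = defaultdict(set)
       for row in rows:
           prefix = row["Run_Name"][:2]
           if prefix in ("ad", "nd"):
               seen[(row["Dataset"], row["Patient_Name"])].add(prefix)
       return sorted(key for key, kinds in seen.items() if kinds != {"ad", "nd"})


   # The example manifest from this section, written as tab-separated text.
   EXAMPLE = "\n".join("\t".join(fields) for fields in [
       ("Patient_Name", "Run_Name", "Dataset", "File_Prefix", "Sequencing_Method", "Normal"),
       ("Pt01", "ad-Pt01-03A", "AML", "9f7f7", "WES", "FALSE"),
       ("Pt01", "nd-Pt01-11A", "AML", "8e74a", "WES", "TRUE"),
       ("Pt01", "ar-Pt01-03A", "AML", "cdb288", "RNA-Seq", "FALSE"),
       ("CTRL", "nr-CTRL", "AML", "CD34-U", "RNA-Seq", "TRUE"),
   ])

   if __name__ == "__main__":
       rows = read_manifest(EXAMPLE)
       print(check_run_names(rows))    # naming problems, if any
       print(find_unpaired_dna(rows))  # patients with an unmatched DNA sample

Patients with no DNA samples at all (such as ``CTRL`` in the example) are not reported by ``find_unpaired_dna``, since workflows that do not pair samples do not need them.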