|
| 1 | +# DeepClone pipelines |
| 2 | + |
| 3 | +## deepUMIcaller |
| 4 | + |
| 5 | +## Metrics |
| 6 | + |
| 7 | +## deepCSA |
| 8 | + |
| 9 | +<!-- |
| 10 | +TODO: Brief introduction on what is intogen - its website and its purpose, use webs and repo as reference. |
| 11 | +--> |
| 12 | +[](https://github.com/bbglab/intogen-plus-dsl2/)<!-- markdownlint-disable MD013 --> |
| 13 | + |
| 14 | +IntOGen is a framework for automatic and comprehensive knowledge extraction based on mutational data from |
| 15 | +sequenced tumor samples from patients. |
| 16 | + |
| 17 | +## Run IntOGen DSL2 |
| 18 | + |
| 18 | +IntOGen was migrated from Nextflow DSL1 to Nextflow DSL2. Thanks to this migration, the pipeline can now be |
| 19 | +run from our Seqera Platform dashboard. |
| 21 | + |
| 22 | +From the bbglabirb/ALP_pipelines workspace [launchpad](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad), |
| 23 | +you can access the pipelines available in our workspace. |
| 24 | + |
| 25 | +!!! question "I can't see the workspace, what should I do?" |
| 26 | + Please contact Miguel or Federica to get access. |
| 27 | + |
| 28 | +By clicking on [intOGen-plus-dsl2](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad/217132460501467?sourceWorkspaceId=97012242959019) |
| 29 | +you'll be able to launch the pipeline. |
| 30 | + |
| 31 | + |
| 32 | + |
| 33 | +Before launching the pipeline, some parameters need to be configured. The most useful ones are |
| 34 | +explained below. |
| 35 | + |
| 36 | +!!! warning "We highly recommend keeping the defaults for parameters not discussed on this page." |
| 37 | + |
| 38 | +=== "General config section" |
| 39 | + |
| 40 | + #### **Revision number**<!-- markdownlint-disable MD046 --> |
| 41 | + |
| 42 | + { height="300" style="display: block; margin: 0 auto" } |
| 43 | + |
| 44 | + By default, the **revision number** is linked to the stable tag of the pipeline, currently `2024.11-dsl2`. |
| 45 | + It can be changed, if needed, when a run is resumed or relaunched from the run section. |
| 46 | + |
| 47 | + !!! note "Please be aware that changing this section may affect the `resume` option" |
| 48 | + |
| 49 | + #### **Config profile** |
| 50 | + |
| 51 | + { height="300" style="display: block; margin: 0 auto" } |
| 52 | + |
| 53 | + - `test` --> uses the [CBIOP cohort](https://github.com/bbglab/intogen-plus-dsl2/blob/dev/DSL2/tests/data/pipeline/input/cbioportal_prad_broad/data_mutations_extended.txt) in the repo [optional]<!-- markdownlint-disable MD013 --> |
| 54 | + - `test_full` --> uses the full IntOGen datasets [optional] |
| 55 | + - `singularity` --> enables Singularity for running the containers |
| 56 | + - `irb` --> allocates the right resources and queue for the Slurm executor in the IRBCluster |
| 57 | + |
| 58 | + #### **Workflow run name** |
| 59 | + |
| 60 | + { height="300" style="display: block; margin: 0 auto" } |
| 61 | + |
| 62 | + It's **mandatory** to write a meaningful name. Here are some examples: |
| 63 | + |
| 64 | + - If I am running a new combination optimization I would call the run: `optimization_combination` |
| 65 | + - If I am running a FULL run with a new final version of intogen I would call it: `v3.0_ALL` |
| 66 | + - If I am reproducing the v2024 run I would call it: `v2024_ALL` |
| 67 | + - If I am running a specific cohort from an external collaborator I would call it: `v2024_EXT_COLLAB` |
| 68 | + |
| 69 | + #### **Work directory** |
| 70 | + |
| 71 | + { height="300" style="display: block; margin: 0 auto" } |
| 72 | + |
| 73 | + By default, the work directory is `/data/bbg/nobackup2/work/IntOGenDSL2/v2024/`. |
| 74 | + For faster execution you can use the scratch partition in the cluster: `/scratch/bbg/work/IntOGenDSL2/v2025/<your-subfolder>`. |
| 75 | + Replace `<your-subfolder>` with a meaningful name, such as the `Outdir` value from the next section, to avoid conflicts. |
| 76 | + |
| 77 | + !!! warning "Delete the work folder once the intogen run finishes successfully." |
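Deleting the wrong folder is costly, so a small guard can help. The sketch below uses a hypothetical helper, `clean_work_dir` (not part of IntOGen), that only prints the delete command by default and refuses any path outside the scratch work area mentioned above. Adapt `WORK_ROOT` if you keep the default work directory.

```sh
# Hypothetical helper: removes a finished run's work folder, with a safety check.
# WORK_ROOT is the scratch location mentioned above; change it if you use another one.
WORK_ROOT="/scratch/bbg/work/IntOGenDSL2/v2025"

clean_work_dir() {
    target=$1
    mode=${2:-dry-run}   # pass "delete" as the second argument to really delete
    case "$target" in
        "$WORK_ROOT"/?*) ;;   # a subfolder of WORK_ROOT: allowed
        *) echo "refusing: $target is outside $WORK_ROOT" >&2; return 1 ;;
    esac
    case "$target" in
        */..|*/../*) echo "refusing: $target contains '..'" >&2; return 1 ;;
    esac
    if [ "$mode" = "delete" ]; then
        rm -rf -- "$target"
    else
        echo "would run: rm -rf -- $target"
    fi
}

# Dry run: only prints the command that would be executed.
clean_work_dir "$WORK_ROOT/Lung_external_collab"
```

Run it once in dry-run mode, check the printed path, and only then repeat the call with `delete` as the second argument.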
| 78 | + |
| 79 | + |
| 80 | +=== "Run parameters section" |
| 81 | + |
| 82 | + #### **Input**<!-- markdownlint-disable MD046 --> |
| 83 | + |
| 84 | + This parameter is read as a single string. It should contain the absolute paths of the folders that openvariant |
| 85 | + will iterate over, separated by spaces. For example: |
| 86 | + |
| 87 | + ```sh |
| 88 | + /path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2 /path/to/datasets/for/intogen/input3 |
| 89 | + ``` |
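Because the whole value is one space-separated string, a relative path or a typo is easy to miss. As a minimal sketch, the hypothetical helper `check_input_paths` (not part of IntOGen or openvariant) splits the string on spaces and reports any entry that is not an absolute path to an existing directory. It assumes that no path contains spaces or wildcard characters.

```sh
# Hypothetical helper, not part of IntOGen: checks an Input string before launch.
# Assumes no path contains spaces (the separator) or glob characters.
check_input_paths() {
    rc=0
    # Unquoted on purpose: word splitting separates the individual paths.
    for p in $1; do
        case "$p" in
            /*) ;;   # absolute path, keep checking
            *)  echo "not absolute: $p" >&2; rc=1; continue ;;
        esac
        if [ ! -d "$p" ]; then
            echo "not a directory: $p" >&2
            rc=1
        fi
    done
    return "$rc"
}

check_input_paths "/path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2" \
    || echo "fix the Input value before launching"
```

The function returns a non-zero status as soon as one entry fails either check, so it can also be used in a wrapper script.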
| 90 | + |
| 91 | + !!! question "How do I prepare the input for IntOGen?" |
| 92 | + Great question! Everything is explained in the documentation: |
| 93 | + [intogen-plus.readthedocs](https://intogen-plus.readthedocs.io/en/v2024/usage.html#input) |
| 94 | + |
| 95 | + #### **Outdir** |
| 96 | + |
| 97 | + This parameter sets where the IntOGen output is stored. By default, we store |
| 98 | + intermediate runs that might fail here: |
| 99 | + |
| 100 | + ```sh |
| 101 | + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/<MeaningfulName> |
| 102 | + ``` |
| 103 | + |
| 104 | + !!! note "It's important to add a meaningful name as a final directory output" |
| 105 | + By default, IntOGen creates a folder named after the launch date, where all the results are stored. |
| 106 | + The top-level folder therefore needs a more specific name. |
| 107 | + |
| 108 | + For example, if I am running an external collaboration on LUNG data, I would set the `outdir` parameter to: |
| 109 | + ```sh |
| 110 | + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab |
| 111 | + ``` |
| 112 | + |
| 113 | + The IntOGen pipeline will by default create a subdirectory with the date of the |
| 114 | + launch where it will store all the files: |
| 115 | + ```sh |
| 116 | + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab/20250423/ |
| 117 | + ``` |
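To preview where results will land, the path can be composed by hand. This is only a sketch: it assumes the dated subdirectory follows the `YYYYMMDD` pattern seen in the example above, and the variable names are illustrative.

```sh
# Illustrative only: predicts the results folder for a run launched today.
# Assumes the date subdirectory follows the YYYYMMDD pattern shown above.
base="/data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024"
name="Lung_external_collab"   # the meaningful name you chose

outdir="${base}/${name}"                   # value to enter as `outdir`
results_dir="${outdir}/$(date +%Y%m%d)/"   # where the pipeline is expected to write

echo "outdir:  ${outdir}"
echo "results: ${results_dir}"
```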
| 118 | + |
| 119 | + |
| 120 | + Stable runs and releases are officially stored in a safer partition: |
| 121 | + ```sh |
| 122 | + /data/bbg/datasets/intogen/output/runs |
| 123 | + ``` |
| 124 | + |
| 125 | +Once both sections are completed, the pipeline is ready to launch. |
| 126 | + |
| 127 | +### FAQs |
| 128 | + |
| 129 | +!!! question "The pipeline failed. How do I resume?" |
| 130 | + In the [run tab](https://cloud.seqera.io/orgs/bbglabirb/workspaces/bbglab/watch) click on the three |
| 131 | + dots on the right of your run and click `Resume`. |
| 132 | + |
| 133 | +- TBC |
| 134 | + |
| 135 | +## References |
| 136 | + |
| 137 | +- Federica Brando |
| 138 | +- Miguel Grau |