Commit 6b95f2e

Merge pull request nf-core#86 from khersameesh24/dev
Added pipeline run modes (--mode) - image & coordinate, fixed schema, config, updated readme
2 parents 223834c + 894f19c commit 6b95f2e

30 files changed: +464 -319 lines changed


.nf-core.yml

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
+lint:
+  actions_ci: false
+  files_exist:
+    - .github/workflows/awsfulltest.yml
+    - .github/workflows/awstest.yml
+  files_unchanged:
+    - .gitignore
+    - assets/nf-core-spatialxe_logo_light.png
+    - docs/images/nf-core-spatialxe_logo_dark.png
+    - docs/images/nf-core-spatialxe_logo_light.png
 nf_core_version: 3.2.1
 repository_type: pipeline
 template:

README.md

Lines changed: 19 additions & 4 deletions
@@ -40,20 +40,35 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 ```csv
 sample,bundle,image
-test_sample,/path/to/xenium-bundle/,/path/to/morphology.ome.tif
+test_sample,/path/to/xenium-bundle,/path/to/morphology.ome.tif
 ```
 
 Now, you can run the pipeline using:
 
 <!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
 
+## Run image-based segmentation mode <br>
+
+`CELLPOSE -> BAYSOR -> XR-IMPORT_SEGMENTATION -> SPATIALDATA -> QC`
+
+```bash
+nextflow run nf-core/spatialxe \
+   -profile <docker/singularity/.../institute> \
+   --input samplesheet.csv \
+   --outdir <OUTDIR> \
+   --mode image
+```
+
+## Run coordinate-based segmentation mode <br>
+
+`PROSEG -> BAYSOR -> XR-IMPORT_SEGMENTATION -> SPATIALDATA -> QC`
+
 ```bash
 nextflow run nf-core/spatialxe \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --outdir <OUTDIR> \
-   --imgage_based \
-   --segmentation cellpose
+   --mode coordinate
 ```
 
 > [!WARNING]
@@ -69,7 +84,7 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/spatialxe was originally written by [Sameesh Kher](https://github.com/khersameesh24) and [Florian Heyl](https://github.com/heylf).
+nf-core/spatialxe was originally written by [Sameesh Kher](https://github.com/khersameesh24) and [Florian Heyl](https://github.com/heylf).
 
 We thank the following people for their extensive assistance in the development of this pipeline:

assets/config/xenium.toml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+[data]
+x = "x_location"
+y = "y_location"
+z = "z_location"
+gene = "feature_name"
+min_molecules_per_gene = 10
+exclude_genes = "NegControl*,BLANK_*,antisense_*"
+min_molecules_per_cell = 50
+
+[segmentation]
+unassigned_prior_label = "UNASSIGNED"
+prior_segmentation_confidence = 0.5
+
+[plotting]
+min_pixels_per_cell = 10

assets/example_samplesheet.csv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+sample,bundle,image
+xenium_prime_mouse_ileum,/home/user/raw_data/xenium/Xenium_Prime_Mouse_Ileum_tiny_outs,/home/user/raw_data/xenium/Xenium_Prime_Mouse_Ileum_tiny_outs/morphology.ome.tif

assets/samplesheet.csv

Lines changed: 2 additions & 1 deletion
@@ -1 +1,2 @@
-test_run,https://raw.githubusercontent.com/nf-core/test-datasets/spatialxe/Xenium_Prime_Mouse_Ileum_tiny_outs.tar.gz,morphology.ome.tif
+sample,bundle,image
+test_run,https://raw.githubusercontent.com/nf-core/test-datasets/spatialxe/Xenium_Prime_Mouse_Ileum_tiny_outs.tar.gz,
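
The test samplesheet now carries a header row and leaves the `image` column empty, which matches the usage docs in this commit (`sample` and `bundle` required, `image` optional). As a rough illustration of those rules, here is a small sketch using only the Python standard library; `read_samplesheet` is a hypothetical helper and is not how the pipeline itself validates input.

```python
import csv
import io

EXPECTED_COLUMNS = ["sample", "bundle", "image"]


def read_samplesheet(text: str) -> list[dict]:
    """Hypothetical helper: parse a samplesheet and apply the documented column rules."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"expected header {EXPECTED_COLUMNS}, got {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        bundle = (row["bundle"] or "").strip()
        image = (row["image"] or "").strip()
        if not sample or not bundle:
            raise ValueError(f"line {line_no}: 'sample' and 'bundle' are required")
        if " " in sample:
            raise ValueError(f"line {line_no}: avoid spaces in the sample name")
        # An empty image column means the morphology image from the bundle is used.
        rows.append({"sample": sample, "bundle": bundle, "image": image or None})
    return rows


# The two lines added to assets/samplesheet.csv in this commit.
SAMPLESHEET = (
    "sample,bundle,image\n"
    "test_run,https://raw.githubusercontent.com/nf-core/test-datasets/"
    "spatialxe/Xenium_Prime_Mouse_Ileum_tiny_outs.tar.gz,\n"
)

rows = read_samplesheet(SAMPLESHEET)
print(rows[0]["sample"], rows[0]["image"])
```

Under these rules the pre-commit version of the file, which had no header row, would be rejected at the header check.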

conf/modules.config

Lines changed: 2 additions & 51 deletions
@@ -46,73 +46,27 @@ process {
             path: { "${params.outdir}/baysor/run" },
             mode: params.publish_dir_mode,
         ]
-        version = "0.7.1"
-        baysor_xenium_config =
-        """
-        [data]
-        x = \\"x_location\\"
-        y = \\"y_location\\"
-        z = \\"z_location\\"
-        gene = \\"feature_name\\"
-        min_molecules_per_gene = 10
-        exclude_genes = \\"NegControl*,BLANK_*,antisense_*\\"
-        min_molecules_per_cell = 50
-
-        [segmentation]
-        unassigned_prior_label = \\"UNASSIGNED\\"
-        prior_segmentation_confidence = 0.5
-
-        [plotting]
-        min_pixels_per_cell = 10
-        """
     }
 
     withName: BAYSOR_SEGFREE {
         publishDir = [
             path: { "${params.outdir}/baysor/segfree" },
             mode: params.publish_dir_mode,
         ]
-        version = "0.7.1"
-        baysor_xenium_config =
-        """
-        [data]
-        x = \\"x_location\\"
-        y = \\"y_location\\"
-        z = \\"z_location\\"
-        gene = \\"feature_name\\"
-        min_molecules_per_cell = 50
-
-        [plotting]
-        min_pixels_per_cell = 10
-        """
     }
 
     withName: BAYSOR_CREATE_DATASET {
         publishDir = [
             path: { "${params.outdir}/baysor/create_dataset" },
             mode: params.publish_dir_mode,
         ]
-        version = "0.7.1"
     }
 
     withName: BAYSOR_PREVIEW {
         publishDir = [
             path: { "${params.outdir}/baysor/preview" },
             mode: params.publish_dir_mode,
         ]
-        version = "0.7.1"
-        baysor_xenium_config =
-        """
-        [data]
-        x = \\"x_location\\"
-        y = \\"y_location\\"
-        z = \\"z_location\\"
-        gene = \\"feature_name\\"
-        min_molecules_per_cell = 50
-
-        [plotting]
-        min_pixels_per_cell = 10
-        """
     }
 
     withName: SEGGER_CREATE_DATASET {
@@ -122,7 +76,6 @@ process {
         ]
         tile_width = "120"
         tile_height = "120"
-        version = "0.1.0"
     }
 
     withName: SEGGER_TRAIN {
@@ -133,8 +86,7 @@ process {
         batch_size = 4 // larger batch size can speed up training, but requires more memory
         devices = 4 // Use multiple GPUs by increasing the devices parameter to further accelerate training
         max_epochs = 200 // increasing #epochs can improve model performance with more learning cycles, but extends training time
-        ext.args = "--init_emb 8 --hidden_channels 32 --num_tx_tokens 500 --out_channels 8 --heads 2 --num_mid_layers 2 --strategy auto --precision 16-mixed"
-        version = "0.1.0"
+        ext.args = { "--init_emb 8 --hidden_channels 32 --num_tx_tokens 500 --out_channels 8 --heads 2 --num_mid_layers 2 --strategy auto --precision 16-mixed" }
     }
 
     withName: SEGGER_PREDICT {
@@ -144,7 +96,6 @@ process {
         ]
         batch_size = 1 // larger batch size can speed up training, but requires more memory
         cc_analysis = "false" // to control connected component analysis
-        version = "0.1.0"
     }
 
     withName: PARQUET_TO_CSV {
@@ -180,7 +131,7 @@ process {
             path: { "${params.outdir}/cellpose" },
             mode: params.publish_dir_mode,
         ]
-        ext.args = "--diameter 9 --channel_axis 0 --save_flows"
+        ext.args = { "--pretrained_model nuclei --diameter 9 --channel_axis 0 --save_flows" }
     }
 
 }

conf/test.config

Lines changed: 15 additions & 5 deletions
@@ -11,19 +11,29 @@
 */
 
 process {
-    resourceLimits = [
+
+    withLabel: process_high {
+        resourceLimits = [
         cpus: 8,
-        memory: '16.GB',
+        memory: '8.GB',
         time: '1.h'
-    ]
+        ]
+    }
+
+    withName: CELLPOSE {
+        resourceLimits = [
+            cpus: 4,
+            memory: '8.GB'
+        ]
+    }
 }
 
 params {
     config_profile_name = 'Test profile'
     config_profile_description = 'Minimal test dataset to check pipeline function'
 
-    // Input data
+    // Input data
     input = "${projectDir}/assets/samplesheet.csv"
     outdir = 'results'
-
+    mode = 'image'
 }

conf/test_full.config

Lines changed: 1 addition & 0 deletions
@@ -17,4 +17,5 @@ params {
     // Input data
     input = "${projectDir}/assets/samplesheet.csv"
     outdir = 'results'
+    mode = 'image'
 }

docs/usage.md

Lines changed: 67 additions & 33 deletions
@@ -6,58 +6,93 @@
 
 ## Introduction
 
-<!-- TODO nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. -->
-
 ## Samplesheet input
 
-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
+You will need to create a samplesheet with information about the sample you would like to analyse before running the pipeline. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
+
+```csv title="samplesheet.csv"
+sample,bundle,image
+breast_cancer,/path/to/xenium/bundle,/path/to/morphology.ome.tif
+```
+
+| Column | Description |
+| -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sample` | `Required`. Custom sample name. It is recommended to follow the same name from the output of the Xenium Onboard Analysis (XOA). Avoid using spaces in the sample name. |
+| `bundle` | `Required`. Full path to the Xenium bundle, output of the Xenium Onboard Analysis. |
+| `image` | `Optional`. Full path to morphology.ome.tif. If not provided, the morphology.ome.tif from the bundle is considered. |
+
+An [example samplesheet](../assets/example_samplesheet.csv) has been provided with the pipeline.
+
+#### Using the samplesheet
 
 ```bash
 --input '[path to samplesheet file]'
 ```
 
-### Multiple runs of the same sample
+## Running the pipeline
+
+The typical command for running the pipeline is as follows:
 
-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
+#### Image-based segmentation mode
 
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
-```
+This runs the default image mode:<br>
+`CELLPOSE ➔ BAYSOR ➔ XR-IMPORT-SEGMENTATION ➔ SPATIALDATA ➔ QC`
 
-### Full samplesheet
+```bash
+nextflow run nf-core/spatialxe \
+   --input ./samplesheet.csv \
+   --outdir ./results \
+   --mode image \
+   -profile <docker/singularity/...>
+```
 
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.
+#### Coordinate-based (transcripts-based) segmentation mode
 
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
+This runs the default coordinate mode:<br>
+`PROSEG ➔ PROSEG2BAYSOR ➔ XR-IMPORT-SEGMENTATION ➔ SPATIALDATA ➔ QC`
 
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
-CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
-TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
-TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
+```bash
+nextflow run nf-core/spatialxe \
+   --input ./samplesheet.csv \
+   --outdir ./results \
+   --mode coordinate \
+   -profile <docker/singularity/...>
 ```
 
-| Column | Description |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
+### Image-based Segmentation mode (--mode image): <br>
 
-An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
+- cellpose
+- baysor
+- xeniumranger
 
-## Running the pipeline
+### Coordinate-based (transcripts-based) Segmentation methods (--mode coordinate): <br>
 
-The typical command for running the pipeline is as follows:
+- proseg
+- baysor
+- segger
+
+#### Run Segmentation with the methods methods mentioned above : <br>
+
+eg: To run proseg segmentation use the `coordinate` mode and the `proseg` segmentation method
+
+```bash
+nextflow run nf-core/spatialxe \
+   --input ./samplesheet.csv \
+   --outdir ./results \
+   --mode coordinate \
+   --segmentation proseg \
+   -profile <docker/singularity/...>
+```
+
+eg: To run cellpose segmentation use the `image` mode and the `cellpose` segmentation method
 
 ```bash
-nextflow run nf-core/spatialxe --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
+nextflow run nf-core/spatialxe \
+   --input ./samplesheet.csv \
+   --outdir ./results \
+   --mode image \
+   --segmentation cellpose \
+   -profile <docker/singularity/...>
 ```
 
 This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
@@ -89,7 +124,6 @@ with:
 ```yaml title="params.yaml"
 input: './samplesheet.csv'
 outdir: './results/'
-genome: 'GRCh37'
 <...>
 ```

modules/local/baysor/create_dataset/main.nf

Lines changed: 1 addition & 3 deletions
@@ -20,7 +20,6 @@ process BAYSOR_CREATE_DATASET {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         error "BAYSOR_CREATE_DATASET module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    def VERSION = "${task.version}"
 
     template 'create_dataset.py'
 
@@ -29,14 +28,13 @@ process BAYSOR_CREATE_DATASET {
     if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
         error "BAYSOR_CREATE_DATASET module does not support Conda. Please use Docker / Singularity / Podman instead."
     }
-    def VERSION = "${task.version}"
 
     """
     touch sampled_transcripts.csv
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
-        Baysor-Preview Create Dataset: $VERSION
+        baysor: 0.7.1
     END_VERSIONS
     """
 }
