@@ -21,42 +21,32 @@ nextflow run seqeralabs/nf-proteindesign \
 
 ## :material-file-table: Samplesheet Format
 
-The samplesheet determines which mode the pipeline runs in.
-
-### Mode Auto-Detection
-
-The pipeline automatically detects the mode based on column headers:
-
-| Column Present | Mode | Description |
-|----------------|------|-------------|
-| `design_yaml` | Design | Use pre-made YAML files |
-| `target_structure` | Target/Binder | Generate design variants |
-### Required Columns by Mode
-
-=== "Design Mode"
-    | Column | Required | Description |
-    |--------|----------|-------------|
-    | `sample` | ✅ | Unique sample identifier |
-    | `design_yaml` | ✅ | Path to design YAML file |
-
-=== "Target Mode"
-    | Column | Required | Description |
-    |--------|----------|-------------|
-    | `sample` | ✅ | Unique sample identifier |
-    | `target_structure` | ✅ | Path to target structure (PDB/CIF) |
-    | `target_residues` | Optional | Binding site residues (comma-separated) |
-    | `chain_type` | Optional | Type: `protein`, `peptide`, `nanobody` |
-    | `min_length` | Optional | Minimum binder length |
-    | `max_length` | Optional | Maximum binder length |
-
-=== "Binder Mode"
-    | Column | Required | Description |
-    |--------|----------|-------------|
-    | `sample` | ✅ | Unique sample identifier |
-    | `target_structure` | ✅ | Path to target structure (PDB/CIF) |
-    | `chain_type` | Optional | Type: `protein`, `peptide`, `nanobody` |
-    | `min_length` | Optional | Minimum binder length |
-    | `max_length` | Optional | Maximum binder length |
+The pipeline uses a CSV samplesheet to specify design jobs. Each row represents a separate design run.
+
+### Required Columns
+
+| Column | Required | Description |
+|--------|----------|-------------|
+| `sample` | ✅ | Unique sample identifier |
+| `design_yaml` | ✅ | Path to design YAML file (see below) |
+
+### Optional Columns
+
+Additional columns can override default parameters per sample:
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `num_designs` | Integer | Number of designs to generate (overrides `--num_designs`) |
+| `budget` | Integer | Number of final designs to keep (overrides `--budget`) |
+
+### Example Samplesheet
+
+```csv
+sample,design_yaml,num_designs,budget
+protein_binder,designs/egfr_binder.yaml,10000,50
+nanobody_design,designs/spike_nanobody.yaml,5000,20
+peptide_binder,designs/il6_peptide.yaml,3000,10
+```
 
 ## :material-file-document: Design YAML Format
 
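The new samplesheet layout in the hunk above (required `sample` and `design_yaml`, optional integer `num_designs` and `budget`) can be checked before a run. The following is a minimal sketch, not the pipeline's own validation: the function name `validate_samplesheet` and the specific checks (non-empty values, unique `sample`, non-negative integers) are assumptions derived only from the column tables in the diff.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "design_yaml")
INTEGER_COLUMNS = ("num_designs", "budget")  # optional per-sample overrides


def validate_samplesheet(handle):
    """Return a list of human-readable problems found in a samplesheet.

    Hypothetical helper: the real pipeline may validate differently.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample = (row["sample"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        else:
            seen.add(sample)
        if not (row["design_yaml"] or "").strip():
            problems.append(f"line {line_no}: empty design_yaml")
        for col in INTEGER_COLUMNS:
            value = (row.get(col) or "").strip()
            # An absent or blank optional column is fine; a non-numeric one is not.
            if value and not (value.isascii() and value.isdigit()):
                problems.append(
                    f"line {line_no}: {col} must be a non-negative integer, got '{value}'"
                )
    return problems


# The example samplesheet from the diff.
example = """sample,design_yaml,num_designs,budget
protein_binder,designs/egfr_binder.yaml,10000,50
nanobody_design,designs/spike_nanobody.yaml,5000,20
peptide_binder,designs/il6_peptide.yaml,3000,10
"""
print(validate_samplesheet(io.StringIO(example)) or "samplesheet looks valid")
```

The sketch does not check that the `design_yaml` paths exist, since the pipeline may resolve them relative to a different directory.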
@@ -149,14 +139,14 @@ results/
 
 ## :material-play-circle: Example Workflows
 
-### Example 1: Simple Design Mode
+### Example 1: Basic Protein Design
 
 ```bash
 # 1. Create design YAML
-cat > my_design.yaml << EOF
-name: antibody_target
+cat > protein_design.yaml << EOF
+name: egfr_binder
 target:
-  structure: data/target.pdb
+  structure: data/egfr.pdb
   residues: [10, 11, 12, 45, 46]
 designed:
   chain_type: protein
@@ -168,7 +158,7 @@
 # 2. Create samplesheet
 cat > samples.csv << EOF
 sample,design_yaml
-design1,my_design.yaml
+egfr_binder,protein_design.yaml
 EOF
 
 # 3. Run pipeline
@@ -178,43 +168,54 @@ nextflow run seqeralabs/nf-proteindesign \
     --outdir results
 ```
 
-### Example 2: Target Mode with Analysis
+### Example 2: Multiple Designs with Analysis
 
 ```bash
-# 1. Create samplesheet
-cat > targets.csv << EOF
-sample,target_structure,target_residues,chain_type,min_length,max_length
-egfr,data/egfr.pdb,"10,11,12,45,46",protein,60,120
-spike,data/spike.cif,"417,484,501",nanobody,110,130
+# 1. Create design YAMLs for different targets
+cat > egfr_design.yaml << EOF
+name: egfr_binder
+target:
+  structure: data/egfr.pdb
+  residues: [10, 11, 12, 45, 46]
+designed:
+  chain_type: protein
+  length: [60, 120]
 EOF
 
-# 2. Run with affinity prediction
+cat > spike_design.yaml << EOF
+name: spike_nanobody
+target:
+  structure: data/spike.cif
+  residues: [417, 484, 501]
+designed:
+  chain_type: nanobody
+  length: [110, 130]
+EOF
+
+# 2. Create samplesheet
+cat > samples.csv << EOF
+sample,design_yaml,num_designs,budget
+egfr_binder,egfr_design.yaml,10000,50
+spike_nanobody,spike_design.yaml,5000,20
+EOF
+
+# 3. Run with analysis modules
 nextflow run seqeralabs/nf-proteindesign \
     -profile docker \
-    --mode target \
-    --input targets.csv \
+    --input samples.csv \
     --outdir results \
-    --n_samples 30 \
-    --run_prodigy
+    --run_proteinmpnn \
+    --run_protenix_refold \
+    --run_prodigy \
+    --run_consolidation
 ```
 
-### Example 3: Binder Mode (No Binding Site)
+### Example 3: Test Run
 
 ```bash
-# 1. Create samplesheet
-cat > binders.csv << EOF
-sample,target_structure,chain_type,min_length,max_length
-binder1,data/target1.pdb,protein,50,100
-binder2,data/target2.pdb,nanobody,110,130
-EOF
-
-# 2. Run pipeline
+# Use built-in test profile
 nextflow run seqeralabs/nf-proteindesign \
-    -profile docker \
-    --mode binder \
-    --input binders.csv \
-    --outdir results \
-    --n_samples 20
+    -profile test_design_protein,docker
 ```
 
 ## :material-refresh: Resume Failed Runs
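Example 2 in the hunk above writes `samples.csv` by hand with a heredoc. With many design YAMLs, the same file could be generated from a job list. This is a hedged sketch using Python's standard `csv` module; the `jobs` list and the `write_samplesheet` helper are illustrative names, not part of the pipeline, and the values mirror the rows added in Example 2.

```python
import csv
import io

FIELDNAMES = ["sample", "design_yaml", "num_designs", "budget"]

# Hypothetical job list; sample names and YAML paths mirror Example 2.
jobs = [
    {"sample": "egfr_binder", "design_yaml": "egfr_design.yaml",
     "num_designs": 10000, "budget": 50},
    {"sample": "spike_nanobody", "design_yaml": "spike_design.yaml",
     "num_designs": 5000, "budget": 20},
]


def write_samplesheet(jobs, handle):
    """Write one samplesheet row per design job to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(jobs)


# Write to an in-memory buffer here; pass open("samples.csv", "w", newline="")
# instead to produce the file used by --input.
buffer = io.StringIO()
write_samplesheet(jobs, buffer)
print(buffer.getvalue(), end="")
```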
@@ -348,9 +349,9 @@ nextflow run ... --n_samples 10 # Reduce batch size
 
 ## :material-arrow-right: Next Steps
 
-- Learn about [Pipeline Modes](../modes/overview.md) in detail
 - Check the [Quick Reference](quick-reference.md) for common commands
 - Explore [Analysis Tools](../analysis/prodigy.md) integration
+- Review [Pipeline Parameters](../reference/parameters.md) for advanced configuration
 
 ---
 