poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"
```

## Batch Job Submission

The framework supports automatic batch job submission to maximize parallelism in SLURM cluster environments. Instead of submitting jobs one by one, it groups test cases into batches of a configurable size and submits each batch as a whole when the first test in that batch runs.

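The grouping step can be sketched as follows. This is a minimal illustration, not the framework's own BatchManager code: `split_into_batches` is a hypothetical helper that assumes batches are consecutive slices of the collected test list, with a batch size of 0 meaning "everything in one batch".

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Hypothetical helper: group items into consecutive batches of at most batch_size.

    batch_size == 0 means "unlimited": all items go into a single batch.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be >= 0 (0 means unlimited)")
    if not items:
        return []
    if batch_size == 0:
        return [list(items)]
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


tests = [f"test_{i}" for i in range(100)]
batches = split_into_batches(tests, 5)
print(len(batches), batches[0])
```

With 100 tests and a batch size of 5 this produces 20 batches; the last batch is smaller whenever the test count is not a multiple of the batch size.
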
### Quick Start

**Default batch size (5 jobs per batch):**
```bash
# Run all tests with default batching
poetry run pytest --disagg test_disagg.py -s -vv

# Run with a test list
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/all.txt
```

**Custom batch size:**
```bash
# Set batch size via command line
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=10

# Set batch size via environment variable
export DISAGG_BATCH_SIZE=20
poetry run pytest --disagg test_disagg.py -s -vv

# Submit all jobs at once (unlimited batch)
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=0
```

### How Batch Submission Works

```
Pytest Collection Phase:
  - Collects all test cases (e.g., 100 tests)
  - BatchManager splits them into batches (e.g., 20 batches of 5)

Pytest Execution Phase:
  Test 0 runs:
    -> Triggers submission of Batch 0 (jobs 0-4)
    -> Waits for job 0 to complete

  Tests 1-4 run:
    -> Batch 0 is already submitted; each test waits for its own job to complete

  Test 5 runs:
    -> Triggers submission of Batch 1 (jobs 5-9)
    -> Waits for job 5 to complete

  ... and so on
```

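The lazy, batch-at-a-time flow above can be sketched in Python. `LazyBatchSubmitter`, `job_for`, and `fake_submit` are hypothetical names; the `submit` callback stands in for whatever actually hands a job to SLURM (for example via `sbatch`), and the real BatchManager may be organized differently.

```python
from typing import Callable, Dict, List, Sequence, Set


class LazyBatchSubmitter:
    """Hypothetical sketch: a batch is submitted the first time any of its tests asks for a job."""

    def __init__(self, test_ids: Sequence[str], batch_size: int,
                 submit: Callable[[str], str]) -> None:
        if batch_size < 0:
            raise ValueError("batch_size must be >= 0 (0 means unlimited)")
        self.test_ids: List[str] = list(test_ids)
        # 0 means "unlimited": treat the whole test list as a single batch
        self.batch_size = batch_size or max(len(self.test_ids), 1)
        self.submit = submit
        self.job_ids: Dict[str, str] = {}
        self.submitted_batches: Set[int] = set()

    def job_for(self, test_index: int) -> str:
        """Return the job id for a test, submitting its whole batch first if needed."""
        batch = test_index // self.batch_size
        if batch not in self.submitted_batches:
            start = batch * self.batch_size
            for test_id in self.test_ids[start:start + self.batch_size]:
                self.job_ids[test_id] = self.submit(test_id)
            self.submitted_batches.add(batch)
        return self.job_ids[self.test_ids[test_index]]


# Usage with a fake submit function instead of a real SLURM call
submitted: List[str] = []


def fake_submit(test_id: str) -> str:
    submitted.append(test_id)
    return f"job-{test_id}"


manager = LazyBatchSubmitter([f"test_{i}" for i in range(12)], 5, fake_submit)
for i in range(12):
    job_id = manager.job_for(i)
    # A real test would block here until job_id finishes on the cluster.
    print(f"test {i}: {job_id}, jobs submitted so far: {len(submitted)}")
```

Running the loop shows the submitted count jumping only at tests 0, 5, and 10, which mirrors the execution phase described above.
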
### Key Benefits

- **Parallel Execution**: All jobs in a batch are submitted together, so they can run concurrently on the SLURM cluster
- **Reduced Wait Time**: Within a batch, wall-clock time ≈ MAX(job times) instead of SUM(job times); with `--disagg-batch-size=0` this holds for the whole run
- **Automatic Management**: No need to manually split test lists
- **Lazy Loading**: A batch is submitted only when its first test runs

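To make the MAX-versus-SUM claim concrete, here is a rough estimator. It assumes every job in a batch starts as soon as the batch is submitted (no queueing delay) and that the next batch starts only after the current one finishes, as in the execution flow above. `estimate_wall_time` is a hypothetical helper and the job durations are made-up values.

```python
from typing import Sequence


def estimate_wall_time(durations: Sequence[float], batch_size: int) -> float:
    """Hypothetical estimate: sum over batches of the longest job in each batch.

    Assumes no queueing delay and strictly sequential batches.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be >= 0 (0 means unlimited)")
    if not durations:
        return 0.0
    size = batch_size or len(durations)  # 0 = all jobs in one batch
    return sum(max(durations[i:i + size]) for i in range(0, len(durations), size))


durations = [2.0, 3.0, 1.0, 4.0, 2.5, 1.5]  # made-up job runtimes in hours
print(sum(durations))                     # one by one: 14.0
print(estimate_wall_time(durations, 3))   # max(2, 3, 1) + max(4, 2.5, 1.5) = 7.0
print(estimate_wall_time(durations, 0))   # single batch: 4.0
```

In practice cluster capacity and queue time add to these numbers, so treat them as a lower bound rather than a prediction.
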
### Configuration Options

**Priority**: command-line option (`--disagg-batch-size`) > environment variable (`DISAGG_BATCH_SIZE`) > default (5)

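The precedence rule can be written as a small resolver. `resolve_batch_size` is a hypothetical helper; `cli_value` stands for the parsed `--disagg-batch-size` value (`None` when the flag is not given), and rejecting negative values is an assumption, since only 0 ("unlimited") and positive sizes are documented.

```python
import os
from typing import Mapping, Optional

DEFAULT_BATCH_SIZE = 5  # documented default


def resolve_batch_size(cli_value: Optional[int],
                       env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical resolver: CLI option > DISAGG_BATCH_SIZE > default (5).

    Returns 0 for "unlimited" (submit every job at once).
    """
    if cli_value is not None:
        size = cli_value
    elif env.get("DISAGG_BATCH_SIZE"):
        # Environment values are strings, so convert explicitly
        size = int(env["DISAGG_BATCH_SIZE"])
    else:
        size = DEFAULT_BATCH_SIZE
    if size < 0:
        # Assumption: negative sizes are treated as a configuration error
        raise ValueError("batch size must be >= 0 (0 means unlimited)")
    return size


print(resolve_batch_size(10, {"DISAGG_BATCH_SIZE": "20"}))    # 10: CLI wins
print(resolve_batch_size(None, {"DISAGG_BATCH_SIZE": "20"}))  # 20: env var
print(resolve_batch_size(None, {}))                           # 5: default
```
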
**Examples:**

```bash
# Small batch for quick testing
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=3 \
  --disagg-test-list=./testlist/debug.txt

# Large batch for production
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=50 \
  --disagg-test-list=./testlist/all.txt

# Submit all at once
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=0
```

### Timeout Configuration

The default timeout for waiting for job completion is **10 hours (36000 seconds)**, which accounts for: