Commit 5528f3b

fix: update planner + docs now that profiling results are stored in /data (#4098)
Signed-off-by: Hannah Zhang <[email protected]>
1 parent 83a3fe4 commit 5528f3b

7 files changed: +49 -16 lines changed

deploy/utils/README.md

Lines changed: 18 additions & 5 deletions
@@ -92,25 +92,38 @@ python3 -m deploy.utils.inject_manifest \
   --dest /data/configs/disagg.yaml
 ```

-**Download benchmark/profiling results:**
+**Download benchmark results:**

 ```bash
-# After benchmarking or profiling completes, download results
+# After benchmarking completes, download results
 python3 -m deploy.utils.download_pvc_results \
   --namespace $NAMESPACE \
-  --output-dir ./pvc_files \
+  --output-dir ./benchmarks/results \
   --folder /data/results \
   --no-config # optional: skip *.yaml/*.yml in the download
 ```

+**Download profiling results (optional, for local inspection):**
+
+```bash
+# Optional: Download profiling data for local analysis
+# The planner reads directly from the PVC, so this is only needed for inspection
+python3 -m deploy.utils.download_pvc_results \
+  --namespace $NAMESPACE \
+  --output-dir ./profiling_data \
+  --folder /data
+```
+
+> **Note on Profiling Results**: When using DGDR (DynamoGraphDeploymentRequest) for SLA-driven profiling, profiling data is stored in `/data/` on the PVC. The planner component reads this data directly from the PVC, so downloading is **optional** - only needed if you want to inspect the profiling results locally (e.g., view performance plots, check configurations).
+
 #### Path Requirements

 **Important**: The PVC is mounted at `/data` in the access pod for security reasons. All destination paths must start with `/data/`.

 **Common path patterns:**
 - `/data/configs/` - Configuration files (DGD manifests)
-- `/data/results/` - Benchmark results
-- `/data/profiling_results/` - Profiling data
+- `/data/results/` - Benchmark results (for download after benchmarking jobs)
+- `/data/` - Profiling data (used directly by planner, typically not downloaded)
 - `/data/benchmarking/` - Benchmarking artifacts

 **User-friendly error messages**: If you forget the `/data/` prefix, the script will show a helpful error message with the correct path and example commands.

deploy/utils/download_pvc_results.py

Lines changed: 1 addition & 5 deletions
@@ -182,7 +182,7 @@ def main():
     parser.add_argument(
         "--folder",
         required=True,
-        help="Absolute folder path in the PVC to download, must start with /data/, e.g. /data/profiling_results or /data/benchmarking_results",
+        help="Absolute folder path in the PVC to download, must start with /data",
     )

     args = parser.parse_args()
@@ -192,10 +192,6 @@ def main():
         print("❌ Error: Folder path must start with '/data/'")
         print(f" Provided: {args.folder}")
         print(" Quick Fix: Add '/data/' prefix to your path")
-        print(" Examples:")
-        print(" /profiling_results → /data/profiling_results")
-        print(" /benchmarking_results → /data/benchmarking_results")
-        print(" /configs → /data/configs")
         sys.exit(1)

     print("📥 PVC Results Download")
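
The new help text says the folder must start with `/data`, the retained error message still quotes `'/data/'`, and the updated docs pass `--folder /data` with no trailing slash. The diff does not show the condition itself, so the following is only a sketch of a check that would accept both the PVC root and any path below it. `is_valid_pvc_folder` and `PVC_MOUNT` are hypothetical names for illustration, not identifiers from `download_pvc_results.py`.

```python
import posixpath

# Mount point of the PVC in the access pod, as stated in the README.
PVC_MOUNT = "/data"


def is_valid_pvc_folder(folder: str) -> bool:
    """Return True if folder is the PVC root or a path below it.

    Hypothetical helper: the script's real check is not visible in this diff.
    """
    # Normalizing collapses a trailing slash ("/data/" -> "/data") and
    # resolves ".." so that "/data/../etc" cannot escape the mount.
    normalized = posixpath.normpath(folder)
    return normalized == PVC_MOUNT or normalized.startswith(PVC_MOUNT + "/")
```

Under this sketch `/data`, `/data/` and `/data/results` are accepted, while `/database` and `/results` are rejected. Whether the real script should normalize paths this way is a design choice left to the maintainers.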

deploy/utils/inject_manifest.py

Lines changed: 0 additions & 1 deletion
@@ -134,7 +134,6 @@ def main():
         print("🔍 Common patterns:")
         print(" /configs/file.yaml → /data/configs/file.yaml")
         print(" /results/data.yaml → /data/results/data.yaml")
-        print(" /profiling_results/... → /data/profiling_results/...")
         print("=" * 60)
         sys.exit(1)

docs/planner/sla_planner_quickstart.md

Lines changed: 27 additions & 2 deletions
@@ -345,14 +345,18 @@ DGDRs are **immutable** - if you need to update SLAs or configuration:

 ### Manual Deployment Control

-Disable auto-deployment to review configurations before deploying:
+There are two ways to manually control deployment after profiling:
+
+#### Option 1: Use DGDR-Generated Configuration (Recommended)
+
+Disable auto-deployment to review the generated DGD before applying:

 ```yaml
 spec:
   autoApply: false
 ```

-Then manually apply the generated DGD:
+Then manually extract and apply the generated DGD:

 ```bash
 # Extract generated config
@@ -365,6 +369,27 @@ vi my-dgd.yaml
 kubectl apply -f my-dgd.yaml -n $NAMESPACE
 ```

+The generated DGD includes optimized configurations and the SLA planner component.
+
+#### Option 2: Use Standalone Planner Templates (Advanced)
+
+For advanced use cases, you can manually deploy using the standalone planner templates in `examples/backends/*/deploy/disagg_planner.yaml`:
+
+```bash
+# After profiling completes, profiling data is stored on the PVC at /data
+
+# Optional: Download profiling results for local inspection
+python3 -m deploy.utils.download_pvc_results \
+  --namespace $NAMESPACE \
+  --output-dir ./profiling_data \
+  --folder /data
+
+# Update backend planner manifest as needed, then deploy
+kubectl apply -f examples/backends/<backend>/deploy/disagg_planner.yaml -n $NAMESPACE
+```
+
+> **Note**: The standalone templates are provided as examples and may need customization for your model and requirements. The DGDR-generated configuration (Option 1) is recommended as it's automatically tuned to your profiling results and SLA targets.
+
 ### Relationship to DynamoGraphDeployment (DGD)

 - **DGDR**: High-level "intent" - what you want deployed

examples/backends/sglang/deploy/disagg_planner.yaml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ spec:
 - --environment=kubernetes
 - --backend=sglang
 - --adjustment-interval=60
-- --profile-results-dir=/data/profiling_results
+- --profile-results-dir=/data
 decode:
 dynamoNamespace: dynamo
 envFromSecret: hf-token-secret

examples/backends/trtllm/deploy/disagg_planner.yaml

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ spec:
 - --environment=kubernetes
 - --backend=trtllm
 - --adjustment-interval=60
-- --profile-results-dir=/data/profiling_results
+- --profile-results-dir=/data
 - --prometheus-port=9085
 TRTLLMDecodeWorker:
 dynamoNamespace: trtllm-disagg-planner

examples/backends/vllm/deploy/disagg_planner.yaml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ spec:
 - --environment=kubernetes
 - --backend=vllm
 - --adjustment-interval=60
-- --profile-results-dir=/data/profiling_results
+- --profile-results-dir=/data
 VllmDecodeWorker:
 dynamoNamespace: vllm-disagg-planner
 envFromSecret: hf-token-secret
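
The three planner manifests above all receive the same one-line change, from `/data/profiling_results` to `/data`. Manifests customized outside this repository may still carry the old value. The sketch below shows one way to scan manifest text for it. `find_stale_references` and `STALE_PATH` are hypothetical names introduced here and are not part of this commit.

```python
import re

# Old profiling location that this commit replaces with /data.
STALE_PATH = "/data/profiling_results"


def find_stale_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that mention the old path."""
    # The lookahead avoids flagging longer names such as
    # /data/profiling_results_v2, which are different directories.
    pattern = re.compile(re.escape(STALE_PATH) + r"(?![\w-])")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]
```

Reading each `disagg_planner.yaml` and passing its contents to this function would list any remaining `--profile-results-dir=/data/profiling_results` arguments by line number.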
