Commit 73b05a5

Merge remote-tracking branch 'ds/master' into refactor-for-cloud-providers
2 parents 63f30ce + b914214

70 files changed: +4709 -2392 lines changed

.agent/GPU_TEE_DEPLOYMENT.md

Lines changed: 235 additions & 0 deletions
# GPU TEE Deployment Guide

Learnings from deploying GPU workloads to Phala Cloud TEE infrastructure.

## Instance Types

Query available instance types:

```bash
curl -s "https://cloud-api.phala.network/api/v1/instance-types" | jq
```

### CPU-only (Intel TDX)
- `tdx.small` through `tdx.8xlarge`

### GPU (H200 + TDX)
- `h200.small` — Single H200 GPU, suitable for inference
- `h200.16xlarge` — Multi-GPU for larger workloads
- `h200.8xlarge` — High-memory configuration

## Deployment Commands

### GPU Deployment
```bash
phala deploy -n my-app -c docker-compose.yaml \
  --instance-type h200.small \
  --region US-EAST-1 \
  --image dstack-nvidia-dev-0.5.4.1
```

Key flags:
- `--instance-type h200.small` — Required for GPU access
- `--image dstack-nvidia-dev-0.5.4.1` — NVIDIA development image with GPU drivers
- `--region US-EAST-1` — Region with GPU nodes (gpu-use2)

### Debugging
```bash
# Check CVM status
phala cvms list

# View serial logs (boot + container output)
phala cvms serial-logs <app_id> --tail 100

# Delete CVM
phala cvms delete <name-or-id> --force
```

## Docker Compose GPU Configuration

GPU devices must be explicitly reserved in docker-compose.yaml:

```yaml
services:
  my-gpu-app:
    image: my-image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Without the `deploy.resources.reservations.devices` section, the container will fail with:
```
libcuda.so.1: cannot open shared object file: No such file or directory
```

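A quick way to confirm the reservation block survives before deploying (assuming Docker Compose v2 is available locally):

```bash
# Renders the resolved compose file; no output means the device reservation is missing.
docker compose -f docker-compose.yaml config | grep -A6 "reservations:"
```
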
## vLLM Example

Working docker-compose.yaml for vLLM inference:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    volumes:
      - /var/run/dstack.sock:/var/run/dstack.sock
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - HF_TOKEN=${HF_TOKEN:-}
    ports:
      - "8000:8000"
    command: >
      --model Qwen/Qwen2.5-1.5B-Instruct
      --host 0.0.0.0
      --port 8000
      --max-model-len 4096
      --gpu-memory-utilization 0.8
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

## Endpoint URLs

After deployment, the app is accessible at:
```
https://<app_id>-<port>.dstack-pha-<region>.phala.network
```

Example for vLLM on port 8000:
```bash
# List models
curl https://<app_id>-8000.dstack-pha-use2.phala.network/v1/models

# Chat completion
curl -X POST https://<app_id>-8000.dstack-pha-use2.phala.network/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

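Model loading takes a minute or two after the container starts, so the endpoint may refuse connections at first. A minimal readiness loop (fill in your own `<app_id>`):

```bash
ENDPOINT="https://<app_id>-8000.dstack-pha-use2.phala.network"
# Poll /v1/models until vLLM has loaded the model and starts answering.
until curl -sf "$ENDPOINT/v1/models" > /dev/null; do
  echo "waiting for vLLM..."
  sleep 10
done
echo "endpoint is ready"
```
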
## vllm-proxy (Response Signing)

vllm-proxy provides response signing and attestation for vLLM inference. It sits between clients and vLLM, signing responses with TEE-derived keys.

### Configuration

**IMPORTANT**: The authentication environment variable is `TOKEN`, not `AUTH_TOKEN`.

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: >
      --model Qwen/Qwen2.5-1.5B-Instruct
      --host 0.0.0.0
      --port 8000
      --max-model-len 4096
      --gpu-memory-utilization 0.8
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  proxy:
    image: phalanetwork/vllm-proxy:v0.2.18
    volumes:
      - /var/run/dstack.sock:/var/run/dstack.sock # Required for TEE key derivation
    environment:
      - VLLM_BASE_URL=http://vllm:8000
      - MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
      - TOKEN=your-secret-token # NOT AUTH_TOKEN
    ports:
      - "8000:8000"
    depends_on:
      - vllm
```

### API Endpoints

```bash
# List models (no auth required)
curl https://<endpoint>/v1/models

# Chat completion (requires auth)
curl -X POST https://<endpoint>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-token" \
  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'

# Get response signature
curl https://<endpoint>/v1/signature/<chat_id> \
  -H "Authorization: Bearer your-secret-token"

# Attestation report
curl https://<endpoint>/v1/attestation/report \
  -H "Authorization: Bearer your-secret-token"
```

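The completion and signature calls chain naturally. A sketch, assuming the completion response's `id` field is the `<chat_id>` the signature endpoint expects (verify against your deployment):

```bash
ENDPOINT="https://<app_id>-8000.dstack-pha-use2.phala.network"
TOKEN="your-secret-token"

# Request a completion and capture its id.
CHAT_ID=$(curl -s -X POST "$ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq -r '.id')

# Fetch the signature for that response.
curl -s "$ENDPOINT/v1/signature/$CHAT_ID" -H "Authorization: Bearer $TOKEN" | jq
```
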
180+
### Tested Configuration
181+
182+
- Image: `phalanetwork/vllm-proxy:v0.2.18`
183+
- Instance: `h200.small`
184+
- Region: `US-EAST-1`
185+
- Model: `Qwen/Qwen2.5-1.5B-Instruct`
186+
187+
### vllm-proxy Issues

**"Invalid token" error**:
- Check that you're using the `TOKEN` environment variable, not `AUTH_TOKEN`
- Verify the token value matches your request header

**"All connection attempts failed" from proxy**:
- vLLM is still loading the model (takes 1-2 minutes after the container starts)
- Wait for vLLM to show "Uvicorn running on" in serial logs (one-liner check below)

**NVML error on attestation**:
- GPU confidential computing attestation may not be fully available
- This doesn't affect inference or response signing

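A one-liner for that readiness check, using only the serial-logs command documented above:

```bash
# Succeeds once vLLM's startup banner appears in the boot/container logs.
phala cvms serial-logs <app_id> --tail 200 | grep "Uvicorn running on"
```
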
## Common Issues

### "No available resources match your requirements"
- GPU nodes are limited. Wait for other CVMs to finish or try a different region.
- Ensure you're using the correct instance type (`h200.small`).

### Container crashes with GPU errors
- Add the `deploy.resources.reservations.devices` section to docker-compose.yaml.
- Verify you're using an NVIDIA development image (`dstack-nvidia-dev-*`).

### Image pull takes too long
- Large images (5GB+ for vLLM) take 3-5 minutes to download and extract.
- Check serial logs for progress.

## Testing Workflow

1. Deploy: `phala deploy -n test -c docker-compose.yaml --instance-type h200.small --region US-EAST-1 --image dstack-nvidia-dev-0.5.4.1`
2. Wait for status: `phala cvms list` (wait for "running")
3. Check logs: `phala cvms serial-logs <app_id> --tail 100`
4. Test API: `curl https://<app_id>-<port>.dstack-pha-use2.phala.network/...`
5. Cleanup: `phala cvms delete <name> --force`

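The same loop scripts cleanly. A sketch built only from the commands above, assuming `phala cvms list` prints each CVM's name and status:

```bash
#!/bin/bash
set -euo pipefail
NAME=test

phala deploy -n "$NAME" -c docker-compose.yaml \
  --instance-type h200.small --region US-EAST-1 \
  --image dstack-nvidia-dev-0.5.4.1

# Poll until the CVM reports "running".
until phala cvms list | grep "$NAME" | grep -q running; do
  sleep 30
done

# Check logs and test the API here (serial-logs takes the app_id
# printed by `phala deploy`), then clean up:
phala cvms delete "$NAME" --force
```
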
## GPU Wrapper Script

For repeated GPU deployments, use a wrapper script:

```bash
#!/bin/bash
# phala-gpu.sh
source "$(dirname "$0")/.env"
export PHALA_CLOUD_API_KEY=$PHALA_CLOUD_API_GPU
phala "$@"
```

This allows maintaining separate API keys for CPU and GPU workspaces.
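
Usage mirrors the plain CLI; the key value is a placeholder:

```bash
echo 'PHALA_CLOUD_API_GPU=<your-gpu-workspace-api-key>' > .env
chmod +x phala-gpu.sh
./phala-gpu.sh cvms list
```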

.agent/WRITING_GUIDE.md

Lines changed: 137 additions & 0 deletions
# Documentation Writing Guide

Guidelines for writing dstack documentation, README, and marketing content.

## Writing Style

- **Don't over-explain** why a framework is needed — assert the solution, hint at alternatives being insufficient
- **Avoid analogies as taglines** (e.g., "X for Y") — if it's a new category, don't frame it as a better version of something else
- **Problem → Solution flow** without explicit labels like "The problem:" or "The solution:"
- **Demonstrate features through actions**, not parenthetical annotations
  - Bad: "Generates quotes (enabling *workload identity*)"
  - Good: "Generates TDX attestation quotes so users can verify exactly what's running"

## Procedural Documentation (Guides & Tutorials)

### Test Before You Document
- **Run every command** before documenting it — reading code is not enough
- Commands may prompt for confirmation, require undocumented env vars, or fail silently
- Create a test environment and execute the full flow end-to-end

### Show What Success Looks Like
- **Add sample outputs** after commands so users can verify they're on track
- For deployment commands, show the key values users need to note (addresses, IDs)
- For validation commands, show both success and failure outputs

### Environment Variables
- **List all required env vars explicitly** — don't assume users will discover them
- If multiple tools use similar-but-different var names, clarify which is which
- Show the export pattern once, then reference it in subsequent commands (example below)
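
For instance (names hypothetical, in the style of this repo's guides):

```bash
# Define once at the top of the guide...
export APP_ID=<your-app-id>

# ...then reference it in every later command.
curl "https://$APP_ID-8000.dstack-pha-use2.phala.network/v1/models"
```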

### Avoid Expert Blind Spots
- If you say "add the hash", explain how to compute the hash (one-liner below)
- If you reference a file, explain where to find it
- If a value comes from a previous step, remind users which step
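
A concrete illustration of that guideline: if a doc says "add the compose file's hash", also show the command that computes it (file name hypothetical):

```bash
sha256sum docker-compose.yaml
```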

### Cross-Reference Related Docs
- Link to prerequisite guides (don't repeat content)
- Link to detailed guides for optional deep-dives
- Use anchor links for specific sections when possible

## Security Documentation

### Trust Model Framing

**Distinguish trust from verification:**
- "Trust" = cannot be verified, must assume correct (e.g., hardware)
- "Verify" = can be cryptographically proven (e.g., measured software)

**Correct framing:**
- Bad: "You must trust the OS" (when it's verifiable)
- Good: "The OS is measured during boot and recorded in the attestation quote. You verify it by..."

### Limitations: Be Honest, Not Alarmist

State limitations plainly, without false mitigations:
- Bad: "X is a single point of failure. Mitigate by running your own X."
- Good: "X is protected by [mechanism]. Like all [category] systems, [inherent limitation]. We are developing [actual solution] to address this."

Don't suggest mitigations that don't actually help. If something is an inherent limitation of the technology, say so.

## Documentation Quality Checklist

From doc-requirements.md:

1. **No bullet point walls** — Max 3-5 bullets before breaking with prose
2. **No redundancy** — Don't present the same info from opposite perspectives
3. **Conversational language** — Write like explaining to a peer
4. **Short paragraphs** — Max 4 sentences per paragraph
5. **Lead with key takeaway** — First sentence tells the reader why this matters
6. **Active voice** — "TEE encrypts memory" not "Memory is encrypted by TEE"
7. **Minimal em-dashes** — Max 1-2 per page; replace with "because", "so", or separate sentences

### Redundancy Patterns to Avoid

These often say the same thing:
- "What we protect against" + "What you don't need to trust"
- "Security guarantees" + "What attestation proves"

Combine them into single sections. One detailed explanation, brief references elsewhere.

## README Structure

### Order Matters
- **Quick Start before Prerequisites** — Lead with what it does, not setup
- **How It Works after Quick Start** — Users want to run it first, understand later
- Cleanup at the end, Further Reading last

### Don't Duplicate
- Link to conceptual docs instead of repeating content
- If an overview README duplicates an example README, cut the overview
- One detailed explanation, brief references elsewhere

### Remove Unrealistic Sections
- If most users can't actually do something (e.g., run locally without special hardware), don't include it
- Don't document workflows that require resources users don't have

### Match the Workflow to the User
- Use tools your audience already knows (e.g., Jupyter for ML practitioners)
- Prefer official/existing images when they exist — don't reinvent
- Make the correct path the default, mention alternatives briefly

## Code Examples

### Question Every Snippet
- Does this code actually demonstrate something meaningful?
- Would a reader understand what it does without the prose?
- `do_thing(b"magic-string")` means nothing — show real use or remove it

### Diagrams
- Mermaid over ASCII art — GitHub renders it nicely
- Keep diagrams simple — 3-5 nodes max
- Label edges with actions, not just arrows

## Conciseness

### Less is More
- 30 lines beats 150 if it says the same thing
- Cut sections that don't help users accomplish their goal
- Tables for reference, prose for explanation — don't over-table

### Performance and Benchmarks
- One memorable number + link to full report
- Don't overwhelm with data the reader didn't ask for

### Reader-First Writing
- Ask "what does the reader want to know?" not "what do I want to say?"
- If a section answers a question nobody asked, cut it

## Maintenance

### Consistency Checks
- After terminology changes, grep for related terms across all files (one-liner below)
- Use correct industry/vendor terminology (e.g., "Confidential Computing" not "Encrypted Computing")
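
A minimal sketch of that check (old term and paths are placeholders):

```bash
# Find stragglers after renaming "Encrypted Computing" to "Confidential Computing".
grep -rni "encrypted computing" docs/ README.md
```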

### Clean Up Old Files
- When the approach changes, delete orphaned files (old scripts, Dockerfiles)
- Don't leave artifacts from previous implementations
