Commit ee02eeb

Deployment Groups Feature README (#92)
* Deployment Groups README
* fix typo
* fix typo
* fixes due to Vishnu's comments
* fix title
1 parent e64a23e commit ee02eeb

2 files changed

+581
-0
lines changed

Lines changed: 363 additions & 0 deletions
# Deployment Groups

#### Connected multi-container deployments in a single blueprint

Deployment Groups let you spin up several deployments, each derived from its own blueprint, in a single `POST /deployment` request and treat them as one cohesive application. OCI AI Blueprints automatically sequences the member deployments according to the `depends_on` relationships you declare, publishes each deployment's outputs (such as service URLs or internal DNS names) for easy discovery, and then injects those outputs wherever you reference the placeholder `${deployment_name.export_key}` inside downstream blueprints. What once required a series of separate API calls stitched together with hard-coded endpoints can now be expressed declaratively in one step, with OCI AI Blueprints resolving every cross-service connection at runtime.

## Pre-Filled Samples

| Feature Showcase | Title | Description | Blueprint File |
| ---------------- | ----- | ----------- | -------------- |
| Create multiple deployments with deployment groups, using Llama Stack as an example | Deployment Groups Showcase: Llama Stack | Deploys Postgres, ChromaDB, vLLM, and Jaeger as separate deployments at once, waits until all four have deployed successfully, and then deploys the Llama Stack deployment that depends on them. The Llama Stack deployment also consumes export variables from the Postgres, ChromaDB, vLLM, and Jaeger deployments. | [llama_stack_basic.json](llama_stack_basic.json) |

---

# In-Depth Feature Overview

## What Are Deployment Groups?

A deployment group lets you deploy multiple interconnected containers as a single logical application. Instead of managing individual deployments separately, you define the whole multi-service application once and get automatic dependency management, service discovery, and dynamic value injection between components.

## Key Features

### 1. **Dependency Management**

- Define dependencies between sub-deployments using `depends_on`
- Automatic topological sorting ensures correct deployment order
- Circular dependency detection prevents invalid configurations
- Dependent services wait for their dependencies to be ready

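The ordering and cycle detection described in this list can be pictured with a minimal Python sketch built on the standard library's `graphlib` module (Python 3.9+). This is not the OCI AI Blueprints implementation; the helper name `deployment_order` and the sample group are hypothetical and only illustrate the idea.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(deployments):
    """Return sub-deployment names in an order that respects depends_on.

    `deployments` is a list of dicts shaped like the entries of
    deployment_group.deployments. Hypothetical helper, for illustration only.
    """
    # Map each sub-deployment to the set of names it must wait for.
    graph = {d["name"]: set(d.get("depends_on", [])) for d in deployments}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"Circular dependency detected: {exc.args[1]}") from exc


group = [
    {"name": "postgres"},
    {"name": "vllm"},
    {"name": "llamastack", "depends_on": ["postgres", "vllm"]},
]
order = deployment_order(group)
print(order)  # llamastack always comes after postgres and vllm
```

A group in which `a` depends on `b` and `b` depends on `a` makes `deployment_order` raise `ValueError`, which mirrors the "invalid configuration" case above.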
### 2. **Service Discovery & Value Injection**

- Sub-deployments can export values (service URLs, internal DNS names, etc.)
- Reference values from other deployments dynamically using the `${deployment_name.export_var_name}` syntax (e.g., `${database.internal_dns_name}` is replaced with the actual internal DNS name of the `database` sub-deployment when deployed)
- Dynamic value resolution at deployment time
- Works in any blueprint field: image URIs, ports, environment variables, command args, etc.

### 3. **Unified Management**

- Deploy multiple services (blueprints) with a single `POST /deployment` API call
- Group-level operations for deployment, monitoring, and cleanup
- Consistent lifecycle management across all components

### 4. **Backward Compatibility**

- Existing single deployments continue to work unchanged
- Same API endpoints (`/deployment`) for both single deployments and deployment groups
- No breaking changes to current workflows

## API Endpoints

### Main Endpoints

| Endpoint                  | Method | Description                                           |
| ------------------------- | ------ | ----------------------------------------------------- |
| `/deployment`             | POST   | Create single deployment OR deployment group          |
| `/deployment_groups`      | GET    | List all deployment groups                            |
| `/deployment_groups/{id}` | GET    | Get specific deployment group details                 |
| `/undeploy`               | POST   | Undeploy single deployment OR entire deployment group |

### Deployment Groups API Response

**List Deployment Groups** (`GET /deployment_groups`)

```json
{
  "deployment_groups": [
    {
      "deployment_group_id": "abc123",
      "deployment_group_name": "llama-stack",
      "creation_date": "2025-01-15 14:30:00 UTC",
      "deployments": [
        {
          "mode": "service",
          "recipe_id": "postgres",
          "deployment_uuid": "def456",
          "deployment_name": "postgres-llama-stack",
          "sub_deployment_name": "postgres",
          "deployment_status": "monitoring",
          "creation_date": "2025-01-15 14:30:00 UTC"
        },
        {
          "mode": "service",
          "recipe_id": "vllm",
          "deployment_uuid": "ghi789",
          "deployment_name": "vllm-llama-stack",
          "sub_deployment_name": "vllm",
          "deployment_status": "monitoring",
          "creation_date": "2025-01-15 14:31:00 UTC"
        }
      ]
    }
  ]
}
```

**Get Specific Deployment Group** (`GET /deployment_groups/{id}`)

```json
{
  "deployment_group_id": "abc123",
  "deployment_group_name": "llama-stack",
  "creation_date": "2025-01-15 14:30:00 UTC",
  "deployments": [
    {
      "mode": "service",
      "recipe_id": "postgres",
      "deployment_uuid": "def456",
      "deployment_name": "postgres-llama-stack",
      "sub_deployment_name": "postgres",
      "deployment_status": "monitoring",
      "creation_date": "2025-01-15 14:30:00 UTC"
    }
  ]
}
```

## Schema Structure

### Deployment Group Schema

```json
{
  "deployment_group": {
    "name": "string",
    "deployments": [
      {
        "name": "string",
        "recipe": {
          // Standard recipe configuration
          "deployment_name": "string",
          "recipe_mode": "service|job|update|shared_node_pool|team"
          // ... all other recipe fields
        },
        "depends_on": ["string"], // Optional: names of dependencies
        "exports": ["string"] // Optional: values to export
      }
    ]
  }
}
```

### Available Export Types

- `service_url` - Public endpoint (load balancer URL)
- `internal_dns_name` - Cluster-internal DNS record for service-to-service communication

## How It Works

### 1. Deployment Creation

When you submit a deployment group, the system:

1. **Validates** the configuration and checks for circular dependencies
2. **Parses** sub-deployments and resolves dependency order
3. **Creates** individual deployments with unique names
4. **Schedules** deployments based on dependency relationships

### 2. Dependency Resolution

- Sub-deployments without dependencies start immediately
- Dependent sub-deployments wait in the `scheduled` state
- As their dependencies reach the `monitoring` state, dependent deployments activate
- Automatic retry logic handles temporary export collection failures

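The waiting rule above can be sketched as a small Python function. This is only an illustration under stated assumptions: the helper name `ready_to_activate` is hypothetical, and the status strings reuse the lowercase values that appear in the API responses (`scheduled`, `active`, `monitoring`).

```python
def ready_to_activate(deployments, statuses):
    """Return scheduled sub-deployments whose dependencies are all monitoring.

    `statuses` maps sub-deployment name -> status string.
    Hypothetical helper, not part of the OCI AI Blueprints API.
    """
    ready = []
    for dep in deployments:
        name = dep["name"]
        if statuses.get(name) != "scheduled":
            continue  # already started, finished, or unknown
        if all(statuses.get(d) == "monitoring" for d in dep.get("depends_on", [])):
            ready.append(name)
    return ready


deployments = [
    {"name": "postgres"},
    {"name": "vllm"},
    {"name": "llamastack", "depends_on": ["postgres", "vllm"]},
]

# vllm has not reached monitoring yet, so llamastack keeps waiting.
waiting = ready_to_activate(
    deployments,
    {"postgres": "monitoring", "vllm": "active", "llamastack": "scheduled"},
)
# Once both dependencies are monitoring, llamastack can activate.
unblocked = ready_to_activate(
    deployments,
    {"postgres": "monitoring", "vllm": "monitoring", "llamastack": "scheduled"},
)
print(waiting, unblocked)
```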
### 3. Value Injection

- When a sub-deployment becomes ready, its exports are collected
- Placeholders like `${postgres.internal_dns_name}` are resolved
- Recipe configurations are updated with actual values
- Dependent deployments deploy with resolved configurations

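To make the substitution step concrete, here is a minimal Python sketch of placeholder resolution. It is an assumption-laden illustration rather than the actual resolver: the function name `resolve`, the shape of the `exports` mapping, and the characters the regular expression accepts in names are all hypothetical.

```python
import re

# Matches ${deployment_name.export_key}. The allowed characters are an assumption.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\}")


def resolve(value, exports):
    """Recursively replace placeholders in strings, lists, and dicts.

    `exports` maps sub-deployment name -> {export_key: value}.
    A placeholder with no matching export raises KeyError.
    """
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(exports[m.group(1)][m.group(2)]), value)
    if isinstance(value, list):
        return [resolve(item, exports) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item, exports) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


exports = {"postgres": {"internal_dns_name": "postgres-llama-stack.default.svc.cluster.local"}}
recipe = {
    "recipe_container_env": [
        {"key": "POSTGRES_HOST", "value": "${postgres.internal_dns_name}"}
    ],
    "recipe_container_command_args": ["--db", "postgres://${postgres.internal_dns_name}:5432"],
}
resolved = resolve(recipe, exports)
print(resolved["recipe_container_env"][0]["value"])
```

Walking the whole recipe recursively is what lets a placeholder appear in any field, including inside a longer string such as a connection URL.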
## Real-World Example: Llama Stack Application

An example can be found here: [llama-stack_blueprint](llama_stack_basic.json)

### What Happens During Llama Stack Blueprint Deployment:

1. **Postgres deploys first** (no dependencies)

   - Exports: `internal_dns_name: "postgres-llama-stack.default.svc.cluster.local"`

2. **vLLM deploys second** (no dependencies)

   - Exports: `internal_dns_name: "vllm-llama-stack.default.svc.cluster.local"`

3. **ChromaDB deploys third** (no dependencies)

   - Exports: `internal_dns_name: "chroma-llama-stack.default.svc.cluster.local"`

4. **Jaeger deploys fourth** (no dependencies)

   - Exports: `internal_dns_name: "jaeger-llama-stack.default.svc.cluster.local"`

5. **Llama Stack App deploys last** (depends on the postgres, vllm, chromadb, and jaeger deployments)
   - Environment variables get resolved:
     - `VLLM_URL: "http://vllm-llama-stack.default.svc.cluster.local/v1"`
     - `POSTGRES_HOST: "postgres-llama-stack.default.svc.cluster.local"`
     - `CHROMADB_URL: "http://chroma-llama-stack.default.svc.cluster.local:8000"`
     - `OTEL_TRACE_ENDPOINT: "http://jaeger-llama-stack.default.svc.cluster.local/jaeger/v1/traces"`

## Advanced Features

### 1. **Flexible Value Injection**

The `${sub_deployment.export}` syntax works in any blueprint field:

```json
{
  "name": "api-service",
  "depends_on": ["database", "config"],
  "recipe": {
    "recipe_image_uri": "registry.com/${config.app_version}/api",
    "recipe_container_port": "${config.port}",
    "recipe_container_command_args": [
      "--database-url",
      "${database.internal_dns_name}",
      "--config-endpoint",
      "${config.service_url}"
    ],
    "recipe_container_env": [
      { "key": "DB_HOST", "value": "${database.internal_dns_name}" },
      { "key": "CONFIG_URL", "value": "${config.service_url}" }
    ]
  }
}
```

### 2. **Group-Level Operations**

**Undeploy Entire Group:**

```bash
POST /undeploy
{
  "deployment_group_id": "abc123"
}
```

- Undeploys all sub-deployments in reverse dependency order
- Ensures dependents are removed before their dependencies

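Reverse dependency order is simply the deployment order read backwards. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+); the helper name `undeploy_order` and the sample group are hypothetical, and the real service may break ties differently.

```python
from graphlib import TopologicalSorter


def undeploy_order(deployments):
    """Return names so that every dependent precedes the things it depends on."""
    graph = {d["name"]: set(d.get("depends_on", [])) for d in deployments}
    deploy_order = list(TopologicalSorter(graph).static_order())
    # Tearing down in the reverse of the deploy order removes dependents first.
    return deploy_order[::-1]


group = [
    {"name": "postgres"},
    {"name": "vllm"},
    {"name": "llamastack", "depends_on": ["postgres", "vllm"]},
]
teardown = undeploy_order(group)
print(teardown)  # llamastack is removed before postgres and vllm
```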
**Monitor Group Status:**

```bash
GET /deployment_groups/abc123
```

- Shows status of all sub-deployments
- Tracks deployment progress and any issues

### 3. **Error Handling & Resilience**

The system handles the following failure cases:

- **Export Collection Failures**: If a service isn't ready to export values, dependent deployments wait for the next processing cycle
- **Dependency Validation**: Circular dependencies and missing references are caught during validation
- **Graceful Degradation**: Recipe properties are still available even if export collection fails

### 4. **Validation Features**

Built-in validation ensures:

- No circular dependencies
- All dependencies exist in the group
- Required exports are declared
- Proper schema compliance
- Unique sub-deployment names

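You can run similar checks locally before submitting a blueprint. The following Python sketch covers three of the rules above (unique names, dependencies that exist, declared exports); cycle detection and full schema validation are left out. It is a client-side approximation, not the service's validator: the function name `validate_group`, the message wording, and the placeholder pattern are assumptions.

```python
import json
import re

SUPPORTED_EXPORTS = {"service_url", "internal_dns_name"}
# Matches ${deployment_name.export_key}. The allowed characters are an assumption.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)\}")


def validate_group(blueprint):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    deployments = blueprint["deployment_group"]["deployments"]
    names = [d["name"] for d in deployments]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate sub-deployment name: {name}")
    declared = {d["name"]: set(d.get("exports", [])) for d in deployments}
    for dep in deployments:
        for target in dep.get("depends_on", []):
            if target not in declared:
                problems.append(f"{dep['name']}: depends on unknown sub-deployment {target}")
        for export in dep.get("exports", []):
            if export not in SUPPORTED_EXPORTS:
                problems.append(f"{dep['name']}: unsupported export type {export}")
        # Scan the serialized recipe for ${name.key} references to export types.
        recipe_text = json.dumps(dep.get("recipe", {}))
        for source, key in PLACEHOLDER.findall(recipe_text):
            if key in SUPPORTED_EXPORTS and key not in declared.get(source, set()):
                problems.append(f"{dep['name']}: {source} does not declare export {key}")
    return problems


blueprint = {
    "deployment_group": {
        "name": "my-app",
        "deployments": [
            {"name": "database", "recipe": {"deployment_name": "my-db"}},
            {
                "name": "api",
                "depends_on": ["database", "cache"],
                "recipe": {
                    "recipe_container_env": [
                        {"key": "DB_URL", "value": "${database.internal_dns_name}"}
                    ]
                },
            },
        ],
    }
}
problems = validate_group(blueprint)
for problem in problems:
    print(problem)
```

The sample is deliberately broken in two ways: `cache` is not a sub-deployment in the group, and `database` never lists `internal_dns_name` in an `exports` array.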
## Migration Guide

### From Single Recipes to Deployment Groups

**Before (Multiple API calls):**

```bash
# Deploy database
POST /deployment
{"recipe_mode": "service", "deployment_name": "my-db", ...}

# Deploy API (manually specify database URL)
POST /deployment
{"recipe_mode": "service", "deployment_name": "my-api",
 "recipe_container_env": [{"key": "DB_URL", "value": "hardcoded-url"}]}
```

**After (Single API call):**

```bash
POST /deployment
{
  "deployment_group": {
    "name": "my-app",
    "deployments": [
      {
        "name": "database",
        "recipe": {"deployment_name": "my-db", ...},
        "exports": ["internal_dns_name"]
      },
      {
        "name": "api",
        "depends_on": ["database"],
        "recipe": {
          "deployment_name": "my-api",
          "recipe_container_env": [
            {"key": "DB_URL", "value": "${database.internal_dns_name}"}
          ]
        }
      }
    ]
  }
}
```

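If you script the single call from Python, the request could be assembled roughly as follows. Treat this as a sketch under assumptions: `API_BASE_URL` is a placeholder host, the recipes carry only the fields shown in the example above (a real recipe needs its remaining fields), and authentication is not shown because it depends on your installation. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: replace with the base URL of your own OCI AI Blueprints API.
API_BASE_URL = "https://blueprints.example.com"

payload = {
    "deployment_group": {
        "name": "my-app",
        "deployments": [
            {
                "name": "database",
                "recipe": {"recipe_mode": "service", "deployment_name": "my-db"},
                "exports": ["internal_dns_name"],
            },
            {
                "name": "api",
                "depends_on": ["database"],
                "recipe": {
                    "recipe_mode": "service",
                    "deployment_name": "my-api",
                    "recipe_container_env": [
                        {"key": "DB_URL", "value": "${database.internal_dns_name}"}
                    ],
                },
            },
        ],
    }
}

request = urllib.request.Request(
    url=f"{API_BASE_URL}/deployment",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch has no side effects.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
```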
## Best Practices

1. **Use Meaningful Names**: Choose descriptive names for sub-deployments
2. **Minimize Dependencies**: Keep coupling between services as loose as possible
3. **Export Only What's Needed**: Only export values that other services actually use
4. **Plan Deployment Order**: Consider the logical deployment sequence when designing dependencies
5. **Handle Failures Gracefully**: Design services to handle temporary unavailability of dependencies
6. **Use Internal DNS**: Prefer `internal_dns_name` over `service_url` for service-to-service communication
7. **Group Related Services**: Keep logically related services together in the same deployment group

## Troubleshooting

### Common Issues:

1. **"Circular dependency detected"**

   - Check your `depends_on` relationships for cycles
   - Ensure no sub-deployment depends on itself directly or indirectly

2. **"Missing dependency"**

   - Verify all names in `depends_on` match actual sub-deployment names
   - Check for typos in dependency names

3. **"Export not available"**

   - Ensure the exporting service declares the export in its `exports` array
   - Check that the service is in the `monitoring` state
   - Verify the export type is supported (`service_url`, `internal_dns_name`)

4. **"Dependencies not satisfied"**
   - Check the status of dependency services
   - Look for any failures in the dependency chain
   - Review sub-deployment logs for export collection errors

### Monitoring Deployment Groups:

Use the deployment groups API to monitor progress:

```bash
GET /deployment_groups/{id}
```

Look for:

- Sub-deployment statuses (`scheduled`, `active`, `monitoring`, `failed`)
- Creation timestamps to track deployment progress
- Any sub-deployments stuck in `scheduled` state (indicates dependency issues)

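When a group has many members, it can help to bucket the response by status. The Python sketch below does that for a payload shaped like the `GET /deployment_groups/{id}` response shown earlier; the helper name `summarize` is hypothetical and the sample statuses are made up for illustration.

```python
from collections import defaultdict


def summarize(group_response):
    """Group sub-deployment names by their deployment_status."""
    by_status = defaultdict(list)
    for dep in group_response.get("deployments", []):
        by_status[dep["deployment_status"]].append(dep["sub_deployment_name"])
    return dict(by_status)


# Illustrative payload; only the fields the helper reads are included.
response = {
    "deployment_group_id": "abc123",
    "deployment_group_name": "llama-stack",
    "deployments": [
        {"sub_deployment_name": "postgres", "deployment_status": "monitoring"},
        {"sub_deployment_name": "vllm", "deployment_status": "active"},
        {"sub_deployment_name": "llamastack", "deployment_status": "scheduled"},
    ],
}
summary = summarize(response)
print(summary)
print("still scheduled:", summary.get("scheduled", []))
```

A name that stays in the `scheduled` bucket across repeated polls is the signal to inspect the dependencies it is waiting on.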
The Deployment Groups feature transforms complex multi-service applications from a manual orchestration challenge into a declarative, automated deployment experience while maintaining full backward compatibility with existing workflows.
