Commit e64a23e
Llama Stack with Single Deployment Group (#90)

* initial Llama Stack with deployment groups changes
* Update llama_stack_basic.json to replace service_url with internal_dns_name in exports and environment variable configuration
* Update blueprint description

1 parent 1043d10 commit e64a23e

File tree

7 files changed: +223 additions, -208 deletions

docs/sample_blueprints/llama-stack/README.md

Lines changed: 5 additions & 15 deletions

@@ -2,25 +2,15 @@
 #### Pre-packaged GenAI runtime — vLLM + ChromaDB + Postgres (optional Jaeger) ready for one-click deployment

-Deploy Llama Stack on OCI via OCI AI Blueprints. In order to get the full Llama Stack Application up and running, you will need to deploy the following pre-filled samples in a specific order. Before deploying the pre-filled samples, make sure to have two object storage buckets created in the same compartment that OCI AI Blueprints is deployed into named `chromadb` and `llamastack`.
+Deploy Llama Stack on OCI via OCI AI Blueprints. For more information on Llama Stack, see https://github.com/meta-llama/llama-stack.

-Order of Pre-Filled Sample Deployments:
-
-1. vLLM Inference Engine
-2. Postgres DB
-3. Chroma DB
-4. Jaegar
-5. Llama Stack Main App
+We use Postgres for the backend store, ChromaDB for the vector database, Jaeger for tracing, and vLLM for inference serving.

 ## Pre-Filled Samples

-| Feature Showcase | Title | Description | Blueprint File |
-| ---------------- | ----- | ----------- | -------------- |
-| vLLM inference engine for large language model serving | vLLM Inference with Llama 3.1 8B Instruct | Deploys a vLLM inference service running the NousResearch/Meta-Llama-3.1-8B-Instruct model with GPU acceleration on VM.GPU.A10.2 nodes. | [vllm_llama_stack.json](vllm_llama_stack.json) |
-| PostgreSQL database backend for Llama Stack data persistence | PostgreSQL Database for Llama Stack | Deploys a PostgreSQL database instance that serves as the primary data store for Llama Stack application state and metadata. | [postgres_db.json](postgres_db.json) |
-| ChromaDB vector database for retrieval-augmented generation (RAG) capabilities | ChromaDB Vector Database | Deploys ChromaDB vector database with persistent storage for embedding storage and similarity search in RAG workflows. | [chroma_db.json](chroma_db.json) |
-| Jaeger distributed tracing for observability and telemetry | Jaeger Tracing Service | Deploys Jaeger for distributed tracing and telemetry collection to monitor and debug Llama Stack operations. | [jaegar.json](jaegar.json) |
-| Main Llama Stack application that orchestrates all components | Llama Stack Main Application | Deploys the main Llama Stack application that connects to vLLM, PostgreSQL, ChromaDB, and Jaeger to provide a unified API for inference, RAG, and telemetry. | [llamastack.json](llamastack.json) |
+| Feature Showcase | Title | Description | Blueprint File |
+| ---------------- | ----- | ----------- | -------------- |
+| Full Llama Stack Basic Configuration | Llama 3.1 8B Model with vLLM | Deploys Llama Stack on OCI AI Blueprints with Postgres, ChromaDB, vLLM, and Jaeger. Uses the Llama 3.1 8B model on one A10 VM to showcase Llama Stack on OCI. | [llama_stack_basic.json](llama_stack_basic.json) |

 ---
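The vLLM deployment in this stack serves an OpenAI-compatible HTTP API that the Llama Stack app (and any other client) can call. As a minimal sketch, assuming a hypothetical in-cluster hostname (the real value is the `vllm` deployment's exported `internal_dns_name`), a chat-completion request payload could be built like this:

```python
import json

# Hypothetical base URL; in the blueprint this comes from the vllm
# deployment's exported internal_dns_name, not a hard-coded name.
VLLM_BASE_URL = "http://vllm.example.internal/v1"

def build_chat_request(prompt: str,
                       model: str = "/models/NousResearch/Meta-Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-compatible chat-completion payload for the vLLM service."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

# The JSON body would be POSTed to f"{VLLM_BASE_URL}/chat/completions".
payload = build_chat_request("What is Llama Stack?")
body = json.dumps(payload)
```

The model string matches the `Model_Path` the blueprint passes to vLLM via `--model`, which is the name vLLM registers for the served model.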

docs/sample_blueprints/llama-stack/chroma_db.json

Lines changed: 0 additions & 32 deletions
This file was deleted.

docs/sample_blueprints/llama-stack/jaegar.json

Lines changed: 0 additions & 21 deletions
This file was deleted.
docs/sample_blueprints/llama-stack/llama_stack_basic.json

Lines changed: 218 additions & 0 deletions

@@ -0,0 +1,218 @@
{
  "deployment_group": {
    "name": "group",
    "deployments": [
      {
        "name": "postgres",
        "recipe": {
          "recipe_id": "postgres",
          "deployment_name": "postgres",
          "recipe_mode": "service",
          "recipe_node_pool_size": 1,
          "recipe_node_shape": "VM.Standard.E4.Flex",
          "recipe_flex_shape_ocpu_count": 2,
          "recipe_flex_shape_memory_size_in_gbs": 16,
          "recipe_node_boot_volume_size_in_gbs": 200,
          "recipe_ephemeral_storage_size": 100,
          "recipe_image_uri": "docker.io/library/postgres:latest",
          "recipe_container_port": "5432",
          "recipe_host_port": "5432",
          "recipe_container_env": [
            {
              "key": "POSTGRES_USER",
              "value": "llamastack"
            },
            {
              "key": "POSTGRES_PASSWORD",
              "value": "llamastack"
            },
            {
              "key": "POSTGRES_DB",
              "value": "llamastack"
            }
          ],
          "recipe_replica_count": 1
        },
        "exports": ["internal_dns_name"]
      },
      {
        "name": "chroma",
        "recipe": {
          "recipe_id": "chromadb",
          "deployment_name": "chroma",
          "recipe_mode": "service",
          "recipe_node_pool_size": 1,
          "recipe_node_shape": "VM.Standard.E4.Flex",
          "recipe_flex_shape_ocpu_count": 2,
          "recipe_flex_shape_memory_size_in_gbs": 16,
          "recipe_node_boot_volume_size_in_gbs": 200,
          "recipe_ephemeral_storage_size": 100,
          "recipe_image_uri": "docker.io/chromadb/chroma:latest",
          "recipe_container_port": "8000",
          "recipe_host_port": "8000",
          "recipe_container_env": [
            {
              "key": "IS_PERSISTENT",
              "value": "TRUE"
            },
            {
              "key": "ANONYMIZED_TELEMETRY",
              "value": "FALSE"
            }
          ],
          "recipe_replica_count": 1,
          "output_object_storage": [
            {
              "bucket_name": "chromadb",
              "mount_location": "/chroma/chroma",
              "volume_size_in_gbs": 500
            }
          ]
        },
        "exports": ["internal_dns_name"]
      },
      {
        "name": "vllm",
        "recipe": {
          "recipe_id": "llm_inference_nvidia",
          "deployment_name": "vllm",
          "recipe_mode": "service",
          "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
          "recipe_node_shape": "VM.GPU.A10.2",
          "input_object_storage": [
            {
              "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/IFknABDAjiiF5LATogUbRCcVQ9KL6aFUC1j-P5NSeUcaB2lntXLaR935rxa-E-u1/n/iduyx1qnmway/b/corrino_hf_oss_models/o/",
              "mount_location": "/models",
              "volume_size_in_gbs": 500,
              "include": ["NousResearch/Meta-Llama-3.1-8B-Instruct"]
            }
          ],
          "recipe_container_env": [
            {
              "key": "tensor_parallel_size",
              "value": "2"
            },
            {
              "key": "model_name",
              "value": "NousResearch/Meta-Llama-3.1-8B-Instruct"
            },
            {
              "key": "Model_Path",
              "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
            }
          ],
          "recipe_replica_count": 1,
          "recipe_container_port": "8000",
          "recipe_nvidia_gpu_count": 2,
          "recipe_node_pool_size": 1,
          "recipe_node_boot_volume_size_in_gbs": 200,
          "recipe_container_command_args": [
            "--model",
            "$(Model_Path)",
            "--tensor-parallel-size",
            "$(tensor_parallel_size)"
          ],
          "recipe_ephemeral_storage_size": 100,
          "recipe_shared_memory_volume_size_limit_in_mb": 200
        },
        "exports": ["internal_dns_name"]
      },
      {
        "name": "jaeger",
        "recipe": {
          "recipe_id": "jaeger",
          "deployment_name": "jaeger",
          "recipe_mode": "service",
          "recipe_node_pool_size": 1,
          "recipe_node_shape": "VM.Standard.E4.Flex",
          "recipe_flex_shape_ocpu_count": 2,
          "recipe_flex_shape_memory_size_in_gbs": 16,
          "recipe_node_boot_volume_size_in_gbs": 200,
          "recipe_ephemeral_storage_size": 100,
          "recipe_image_uri": "docker.io/jaegertracing/jaeger:latest",
          "recipe_container_port": "16686",
          "recipe_additional_ingress_ports": [
            {
              "name": "jaeger",
              "port": 4318,
              "path": "/jaeger"
            }
          ],
          "recipe_replica_count": 1
        },
        "exports": ["internal_dns_name"]
      },
      {
        "name": "llamastack_app",
        "recipe": {
          "recipe_id": "llamastack_app",
          "deployment_name": "llamastack_app",
          "recipe_mode": "service",
          "recipe_node_pool_size": 1,
          "recipe_node_shape": "VM.Standard.E4.Flex",
          "recipe_flex_shape_ocpu_count": 2,
          "recipe_flex_shape_memory_size_in_gbs": 16,
          "recipe_node_boot_volume_size_in_gbs": 200,
          "recipe_ephemeral_storage_size": 100,
          "recipe_image_uri": "docker.io/llamastack/distribution-postgres-demo:latest",
          "recipe_container_port": "8321",
          "recipe_container_env": [
            {
              "key": "INFERENCE_MODEL",
              "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
            },
            {
              "key": "VLLM_URL",
              "value": "http://${vllm.internal_dns_name}/v1"
            },
            {
              "key": "ENABLE_CHROMADB",
              "value": "1"
            },
            {
              "key": "CHROMADB_URL",
              "value": "http://${chroma.internal_dns_name}:8000"
            },
            {
              "key": "POSTGRES_HOST",
              "value": "${postgres.internal_dns_name}"
            },
            {
              "key": "POSTGRES_PORT",
              "value": "5432"
            },
            {
              "key": "POSTGRES_DB",
              "value": "llamastack"
            },
            {
              "key": "POSTGRES_USER",
              "value": "llamastack"
            },
            {
              "key": "POSTGRES_PASSWORD",
              "value": "llamastack"
            },
            {
              "key": "TELEMETRY_SINKS",
              "value": "console,otel_trace"
            },
            {
              "key": "OTEL_TRACE_ENDPOINT",
              "value": "http://${jaeger.internal_dns_name}/jaeger/v1/traces"
            }
          ],
          "output_object_storage": [
            {
              "bucket_name": "llamastack",
              "mount_location": "/root/.llama",
              "volume_size_in_gbs": 100
            }
          ],
          "recipe_replica_count": 1
        },
        "depends_on": ["postgres", "chroma", "vllm", "jaeger"]
      }
    ]
  }
}
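The blueprint wires the deployments together through `exports` and `${<deployment>.<export>}` placeholders: each service exports its `internal_dns_name`, and the `llamastack_app` environment variables reference those exports. A minimal sketch of how such a substitution could work (a hypothetical resolver, not the actual OCI AI Blueprints implementation):

```python
import re

def resolve_placeholders(value: str, exports: dict) -> str:
    """Replace ${name.export} tokens using a {name: {export: value}} mapping."""
    def repl(match: re.Match) -> str:
        name, export = match.group(1), match.group(2)
        return exports[name][export]
    return re.sub(r"\$\{(\w+)\.(\w+)\}", repl, value)

# Example export values (hypothetical in-cluster DNS names).
exports = {
    "vllm": {"internal_dns_name": "vllm.svc.cluster.local"},
    "chroma": {"internal_dns_name": "chroma.svc.cluster.local"},
}

url = resolve_placeholders("http://${vllm.internal_dns_name}/v1", exports)
# url == "http://vllm.svc.cluster.local/v1"
```

This is why the commit replaces `service_url` with `internal_dns_name` consistently in both the `exports` lists and the environment variables: the placeholder name must match the exported key exactly.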

docs/sample_blueprints/llama-stack/llamastack.json

Lines changed: 0 additions & 67 deletions
This file was deleted.
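The `depends_on` list on `llamastack_app` replaces the old README's manual "deploy in this order" instructions: the main app starts only after Postgres, ChromaDB, vLLM, and Jaeger are up. The implied start order can be sketched with a standard topological sort (an ordering illustration, not the actual scheduler):

```python
from graphlib import TopologicalSorter

# Each deployment maps to the deployments it depends on,
# mirroring the depends_on field in the deployment group.
depends_on = {
    "postgres": [],
    "chroma": [],
    "vllm": [],
    "jaeger": [],
    "llamastack_app": ["postgres", "chroma", "vllm", "jaeger"],
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
```

Any start order that places `llamastack_app` last satisfies the group; the four services themselves have no ordering constraints among each other.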
