Commit 6a74908

Updated llama-stack blueprint and docs. (#108)
* Updated llama-stack blueprint and docs.
* Updated CPU inference PAR to 2027 exp.
* Update llamastack docs for fallback testing.
* Anchored llamastack version.
* Versioned all llama-stack containers.
1 parent d7179de · commit 6a74908

File tree: 5 files changed (+52 / -32 lines changed)


docs/sample_blueprints/model_serving/cpu-inference/cpu-inference-gemma.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
   "recipe_node_shape": "BM.Standard.E5.192",
   "input_object_storage": [
     {
-      "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/3qaZRZ0A38V-k0A0eYPqx8XPB06V2WLTj6zOYXKYK97k--yNzEqcV3qsa0MdUcr3/n/iduyx1qnmway/b/ollama-models/o/",
+      "par": "https://iduyx1qnmway.objectstorage.us-ashburn-1.oci.customer-oci.com/p/ActTC68_vMHU92rTYGp-XUiGQrE_P6Jl22b5OPIlcTHMzMjSS99_TAgSVsk_8zmQ/n/iduyx1qnmway/b/ollama-models/o/",
       "mount_location": "/models",
       "volume_size_in_gbs": 20
     }
```

docs/sample_blueprints/model_serving/cpu-inference/cpu-inference-mistral-bm.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -6,7 +6,7 @@
   "recipe_node_shape": "BM.Standard.E4.128",
   "input_object_storage": [
     {
-      "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/3qaZRZ0A38V-k0A0eYPqx8XPB06V2WLTj6zOYXKYK97k--yNzEqcV3qsa0MdUcr3/n/iduyx1qnmway/b/ollama-models/o/",
+      "par": "https://iduyx1qnmway.objectstorage.us-ashburn-1.oci.customer-oci.com/p/ActTC68_vMHU92rTYGp-XUiGQrE_P6Jl22b5OPIlcTHMzMjSS99_TAgSVsk_8zmQ/n/iduyx1qnmway/b/ollama-models/o/",
       "mount_location": "/models",
       "volume_size_in_gbs": 20
     }
```

docs/sample_blueprints/model_serving/cpu-inference/cpu-inference-mistral-vm.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@
   "recipe_flex_shape_memory_size_in_gbs": 64,
   "input_object_storage": [
     {
-      "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/3qaZRZ0A38V-k0A0eYPqx8XPB06V2WLTj6zOYXKYK97k--yNzEqcV3qsa0MdUcr3/n/iduyx1qnmway/b/ollama-models/o/",
+      "par": "https://iduyx1qnmway.objectstorage.us-ashburn-1.oci.customer-oci.com/p/ActTC68_vMHU92rTYGp-XUiGQrE_P6Jl22b5OPIlcTHMzMjSS99_TAgSVsk_8zmQ/n/iduyx1qnmway/b/ollama-models/o/",
       "mount_location": "/models",
       "volume_size_in_gbs": 20
     }
```
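All three CPU-inference blueprints swap the legacy `objectstorage.us-ashburn-1.oraclecloud.com` PAR for a new one on the dedicated `oci.customer-oci.com` endpoint (per the commit message, refreshed to a 2027 expiration). Before deploying, the new PAR can be smoke-tested; a minimal sketch, assuming the bucket-level PAR permits object listing:

```bash
# PAR URL copied from the diffs above; a bucket-level PAR that allows
# listing returns a JSON object index on GET. -f fails on HTTP errors.
PAR="https://iduyx1qnmway.objectstorage.us-ashburn-1.oci.customer-oci.com/p/ActTC68_vMHU92rTYGp-XUiGQrE_P6Jl22b5OPIlcTHMzMjSS99_TAgSVsk_8zmQ/n/iduyx1qnmway/b/ollama-models/o/"
curl -sf "$PAR" | head -c 400   # expect the start of a JSON listing of ollama-models objects
```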

docs/sample_blueprints/partner_blueprints/llama-stack/README.md

Lines changed: 23 additions & 4 deletions
````diff
@@ -50,20 +50,39 @@ To test your llama stack implementation please follow the steps below.
 
 2. Install uv command line interface tool via the steps [here](https://docs.astral.sh/uv/getting-started/installation/)
 
-3. Clone the following repo: [https://github.com/meta-llama/llama-stack-evals](https://github.com/meta-llama/llama-stack-evals)
+3. Clone the following repo: [https://github.com/meta-llama/llama-verifications](https://github.com/meta-llama/llama-verifications)
 
 4. Go to your llama-stack deployment and grab the `Public Endpoint` (ex: `llamastack-app7.129-213-194-241.nip.io`)
 
 5. Run the following curl command to test the model list feature: `curl http://<llama_stack_deployment_endpoint>/v1/openai/v1/models`
 
 6. You can use llama-stack-evals repo (which you previously cloned) to run verifications / benchmark evaluations against this llama stack deployments’s OpenAI endpoint. Note: If you are using the blueprint unmodified (aka using the NousResearch/Meta-Llama-3.1-8B-Instruct model, some of the tests will fail on purpose since this tests multi-modal inputs which this model does not support)
 
+**Note**: It is possible for this test to fail if the self-signed certificate hasn't finished generating yet. The errors indicate this like:
 ```
-cd llama-stack-evals # make sure you are in the llama-stack-evals repo
+E httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1010)
+```
+If you see this message, it does not mean that llama-stack isn't working, just that these tests won't succeed until certs are generated.
+
+```
+cd llama-verifications # make sure you are in the llama-verifications repo
 
-uvx llama-stack-evals run-tests --openai-compat-endpoint http://<llama_stack_deployment_endpoint>/v1/openai/v1 --model "<MODEL_YOU_USED_IN_VLLM_DEPLOYMENT>"
+export OPENAI_API_KEY="t" # dummy key
+uvx llama-verifications run-tests --openai-compat-endpoint http://<llama_stack_deployment_endpoint>/v1/openai/v1 --model "<MODEL_YOU_USED_IN_VLLM_DEPLOYMENT>"
 
-# ex: uvx llama-stack-evals run-tests --openai-compat-endpoint http://llamastack-app7.129-213-194-241.nip.io/v1/openai/v1 --model "NousResearch/Meta-Llama-3.1-8B-Instruct"
+# ex: uvx llama-verifications run-tests --openai-compat-endpoint http://llamastack-app7.129-213-194-241.nip.io/v1/openai/v1 --model "Meta-Llama-3.1-8B-Instruct"
+```
+An additional way to test with `curl` if the certs have not finished (-k allows insecure):
+```bash
+curl -Lk -X POST http://<llama_stack_deployment_endpoint>/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d '{
+  "model": "Meta-Llama-3.1-8B-Instruct",
+  "messages": [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Hello! Can you tell me a fun fact about GPUs?"}
+  ],
+  "max_tokens": 100,
+  "temperature": 0.7
+}'
 ```
 
 ## How to Use It
````
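Because the verification suite fails with `CERTIFICATE_VERIFY_FAILED` until certificates are generated, it can help to poll the endpoint before running the `uvx` suite. A minimal sketch, assuming the README's example hostname stands in for your `Public Endpoint` (`-Lk` mirrors the fallback curl above):

```bash
ENDPOINT="llamastack-app7.129-213-194-241.nip.io"   # replace with your Public Endpoint

# Retry the models route until the deployment answers; -L follows redirects,
# -k tolerates the self-signed certificate while it is still being issued.
until curl -sfLk "http://${ENDPOINT}/v1/openai/v1/models" >/dev/null; do
  echo "llama-stack not answering yet; retrying in 10s..."
  sleep 10
done
echo "Endpoint is up; run the uvx llama-verifications suite next."
```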

docs/sample_blueprints/partner_blueprints/llama-stack/llama_stack_basic.json

Lines changed: 26 additions & 25 deletions
Original file line numberDiff line numberDiff line change
```diff
@@ -14,7 +14,7 @@
   "recipe_flex_shape_memory_size_in_gbs": 16,
   "recipe_node_boot_volume_size_in_gbs": 200,
   "recipe_ephemeral_storage_size": 100,
-  "recipe_image_uri": "docker.io/library/postgres:latest",
+  "recipe_image_uri": "docker.io/library/postgres:13",
   "recipe_container_port": "5432",
   "recipe_host_port": "5432",
   "recipe_container_env": [
@@ -47,7 +47,7 @@
   "recipe_flex_shape_memory_size_in_gbs": 16,
   "recipe_node_boot_volume_size_in_gbs": 200,
   "recipe_ephemeral_storage_size": 100,
-  "recipe_image_uri": "docker.io/chromadb/chroma:latest",
+  "recipe_image_uri": "docker.io/chromadb/chroma:1.0.20",
   "recipe_container_port": "8000",
   "recipe_host_port": "8000",
   "recipe_container_env": [
@@ -77,7 +77,7 @@
   "recipe_id": "llm_inference_nvidia",
   "deployment_name": "vllm",
   "recipe_mode": "service",
-  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:vllmv0.6.6.pos1",
+  "recipe_image_uri": "docker.io/vllm/vllm-openai:v0.9.1",
   "recipe_node_shape": "VM.GPU.A10.2",
   "input_object_storage": [
     {
```
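These hunks replace floating `:latest` tags with pinned versions ("Versioned all llama-stack containers", per the commit message), so redeployments pull a known image. A quick pre-deploy check that the pinned tags resolve; a minimal sketch, assuming local Docker CLI access to docker.io:

```bash
# Tags taken from the diff; docker pull fails fast if a tag does not resolve.
for image in docker.io/library/postgres:13 \
             docker.io/chromadb/chroma:1.0.20 \
             docker.io/vllm/vllm-openai:v0.9.1; do
  docker pull --quiet "$image" || echo "unresolvable tag: $image"
done
```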
```diff
@@ -87,33 +87,30 @@
       "include": ["NousResearch/Meta-Llama-3.1-8B-Instruct"]
     }
   ],
-  "recipe_container_env": [
-    {
-      "key": "tensor_parallel_size",
-      "value": "2"
-    },
-    {
-      "key": "model_name",
-      "value": "NousResearch/Meta-Llama-3.1-8B-Instruct"
-    },
-    {
-      "key": "Model_Path",
-      "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
-    }
-  ],
   "recipe_replica_count": 1,
   "recipe_container_port": "8000",
   "recipe_nvidia_gpu_count": 2,
   "recipe_node_pool_size": 1,
   "recipe_node_boot_volume_size_in_gbs": 200,
   "recipe_container_command_args": [
     "--model",
-    "$(Model_Path)",
+    "/models/NousResearch/Meta-Llama-3.1-8B-Instruct",
     "--tensor-parallel-size",
-    "$(tensor_parallel_size)"
+    "2",
+    "--served-model-name",
+    "Meta-Llama-3.1-8B-Instruct"
   ],
   "recipe_ephemeral_storage_size": 100,
-  "recipe_shared_memory_volume_size_limit_in_mb": 200
+  "recipe_shared_memory_volume_size_limit_in_mb": 200,
+  "recipe_readiness_probe_params": {
+    "endpoint_path": "/health",
+    "port": 8000,
+    "scheme": "HTTP",
+    "initial_delay_seconds": 20,
+    "period_seconds": 30,
+    "success_threshold": 1,
+    "timeout_seconds": 10
+  }
 },
 "exports": ["internal_dns_name"]
 },
```
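This hunk drops the `$(...)` env-var indirection in favor of literal vLLM arguments, adds `--served-model-name` so clients address the model as `Meta-Llama-3.1-8B-Instruct` rather than its `/models/...` path, and gates traffic behind a `/health` readiness probe. Both changes can be verified once the pod is ready; a minimal sketch, assuming a hypothetical host standing in for the exported `internal_dns_name`:

```bash
# Hypothetical host; substitute the vllm deployment's exported internal_dns_name.
VLLM="http://vllm.internal.example:8000"

curl -sf "$VLLM/health"      # the readiness probe's endpoint; returns 200 once vLLM is up
curl -sf "$VLLM/v1/models"   # should list "Meta-Llama-3.1-8B-Instruct", not the filesystem path
```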
```diff
@@ -129,7 +126,7 @@
   "recipe_flex_shape_memory_size_in_gbs": 16,
   "recipe_node_boot_volume_size_in_gbs": 200,
   "recipe_ephemeral_storage_size": 100,
-  "recipe_image_uri": "docker.io/jaegertracing/jaeger:latest",
+  "recipe_image_uri": "docker.io/jaegertracing/jaeger:2.9.0",
   "recipe_container_port": "16686",
   "recipe_additional_ingress_ports": [
     {
@@ -154,12 +151,12 @@
   "recipe_flex_shape_memory_size_in_gbs": 16,
   "recipe_node_boot_volume_size_in_gbs": 200,
   "recipe_ephemeral_storage_size": 100,
-  "recipe_image_uri": "docker.io/llamastack/distribution-postgres-demo:latest",
+  "recipe_image_uri": "docker.io/llamastack/distribution-postgres-demo:0.2.18",
   "recipe_container_port": "8321",
   "recipe_container_env": [
     {
       "key": "INFERENCE_MODEL",
-      "value": "/models/NousResearch/Meta-Llama-3.1-8B-Instruct"
+      "value": "Meta-Llama-3.1-8B-Instruct"
     },
     {
       "key": "VLLM_URL",
@@ -173,6 +170,10 @@
       "key": "CHROMADB_URL",
       "value": "http://${chroma.internal_dns_name}:8000"
     },
+    {
+      "key": "ENABLE_POSTGRES",
+      "value": "1"
+    },
     {
       "key": "POSTGRES_HOST",
       "value": "${postgres.internal_dns_name}"
@@ -198,8 +199,8 @@
       "value": "console,otel_trace"
     },
     {
-      "key": "OTEL_TRACE_ENDPOINT",
-      "value": "http://${jaeger.internal_dns_name}/jaeger/v1/traces"
+      "key": "OTEL_EXPORTER_OTLP_ENDPOINT",
+      "value": "http://${jaeger.internal_dns_name}/jaeger/"
     }
   ],
   "output_object_storage": [
```
