# CPU Inference with Ollama

This blueprint explains how to run large language models on CPU using Ollama. It covers two main deployment strategies:

- Serving pre-saved models directly from Object Storage
- Pulling models from Ollama and saving them to Object Storage

---

## Why CPU Inference?

CPU inference is ideal for:

- Low-throughput or cost-sensitive deployments
- Offline testing and validation
- Prototyping without GPU dependency

---

## Supported Models

Ollama supports several high-quality open-source LLMs. Below is a small set of commonly used models:

| Model Name | Description                    |
| ---------- | ------------------------------ |
| gemma      | Lightweight open LLM by Google |
| llama2     | Meta’s large language model    |
| mistral    | Open-weight performant LLM     |
| phi3       | Microsoft’s compact LLM        |
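
If you want to sanity-check one of these models before wiring it into a recipe, and you have the Ollama CLI installed locally, a quick smoke test looks roughly like this (`gemma` is just an example taken from the table above):

```bash
# Pull the model weights from the Ollama library (one-time download).
ollama pull gemma

# Run a single prompt to confirm the model loads and answers.
ollama run gemma "What is the capital of France?"
```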

---

## Deploying with OCI AI Blueprint

### Running Ollama Models from Object Storage

If you've already pushed your model to **Object Storage**, use the following service-mode recipe to run it. Ensure your model files are in the **blob + manifest** format used by Ollama.
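
As a rough sketch, a bucket in the expected blob + manifest layout mirrors Ollama's local model store (typically `~/.ollama/models`); the digests below are placeholders:

```
ollama-models/                      # bucket mounted at /models
├── manifests/
│   └── registry.ollama.ai/
│       └── library/
│           └── gemma/
│               └── latest          # manifest referencing the blobs below
└── blobs/
    ├── sha256-<digest>             # model weights layer
    └── sha256-<digest>             # template / parameter layers
```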

#### Recipe Configuration

| Field | Description |
| ----- | ----------- |
| recipe_id | `cpu_inference` – Identifier for the recipe |
| recipe_mode | `service` – Used for long-running inference |
| deployment_name | Custom name for the deployment |
| recipe_image_uri | URI for the container image in OCIR |
| recipe_node_shape | OCI shape, e.g., `BM.Standard.E4.128` |
| input_object_storage | Object Storage bucket mounted as input |
| recipe_container_env | List of environment variables |
| recipe_replica_count | Number of replicas |
| recipe_container_port | Port to expose on the container |
| recipe_node_pool_size | Number of nodes in the pool |
| recipe_node_boot_volume_size_in_gbs | Boot volume size in GB |
| recipe_container_command_args | Arguments for the container command |
| recipe_ephemeral_storage_size | Temporary scratch storage |

#### Sample Recipe (Service Mode)

```json
{
  "recipe_id": "cpu_inference",
  "recipe_mode": "service",
  "deployment_name": "gemma and BME4 service",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:cpu_inference_service_v0.2",
  "recipe_node_shape": "BM.Standard.E4.128",
  "input_object_storage": [
    {
      "bucket_name": "ollama-models",
      "mount_location": "/models",
      "volume_size_in_gbs": 20
    }
  ],
  "recipe_container_env": [
    { "key": "MODEL_NAME", "value": "gemma" },
    { "key": "PROMPT", "value": "What is the capital of France?" }
  ],
  "recipe_replica_count": 1,
  "recipe_container_port": "11434",
  "recipe_node_pool_size": 1,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_container_command_args": [
    "--input_directory", "/models", "--model_name", "gemma"
  ],
  "recipe_ephemeral_storage_size": 100
}
```
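
The recipe JSON is submitted through your OCI AI Blueprints portal or API. The exact endpoint depends on your installation; the URL and `/deployment` path below are illustrative assumptions only, with the recipe saved locally as `recipe.json`:

```bash
# Hypothetical example: POST the recipe JSON to your Blueprints API endpoint.
# Replace the URL (and add any auth headers your portal requires).
curl -X POST https://<YOUR_BLUEPRINT_API>/deployment \
  -H "Content-Type: application/json" \
  -d @recipe.json
```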

---

### Accessing the API

Once deployed, send inference requests to the model via the exposed port:

```bash
curl http://<PUBLIC_IP>:11434/api/generate -d '{
  "model": "gemma",
  "prompt": "What is the capital of France?",
  "stream": false
}'
```
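
Because the request sets `"stream": false`, the server returns a single JSON object whose `response` field holds the generated text; with `jq` installed you can extract it directly:

```bash
# Same request as above, keeping only the generated text.
curl -s http://<PUBLIC_IP>:11434/api/generate -d '{
  "model": "gemma",
  "prompt": "What is the capital of France?",
  "stream": false
}' | jq -r '.response'
```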

### Example Public Inference Calls

Without `"stream": false`, the API streams newline-delimited JSON chunks; the `jq`/`paste` pipeline below reassembles the generated text into a single line:

```bash
curl -L -X POST https://cpu-inference-mismistral.130-162-199-33.nip.io/api/generate \
  -d '{ "model": "mistral", "prompt": "What is the capital of Germany?" }' \
  | jq -r 'select(.response) | .response' | paste -sd " "

curl -L -k -X POST https://cpu-inference-mistral-flexe4.130-162-199-33.nip.io/api/generate \
  -d '{ "model": "mistral", "prompt": "What is the capital of Germany?" }' \
  | jq -r 'select(.response) | .response' | paste -sd " "
```

---

### Pulling from Ollama and Saving to Object Storage

To download a model from Ollama and store it in Object Storage, use the job-mode recipe below.

#### Recipe Configuration

| Field | Description |
| ----- | ----------- |
| recipe_id | `cpu_inference` – Same recipe base |
| recipe_mode | `job` – One-time job to save a model |
| deployment_name | Custom name for the saving job |
| recipe_image_uri | OCIR URI of the saver image |
| recipe_node_shape | Compute shape used for the job |
| output_object_storage | Where to store pulled models |
| recipe_container_env | Environment variables including model name |
| recipe_replica_count | Set to 1 |
| recipe_container_port | Typically `11434` for Ollama |
| recipe_node_pool_size | Set to 1 |
| recipe_node_boot_volume_size_in_gbs | Size in GB |
| recipe_container_command_args | Set output directory and model name |
| recipe_ephemeral_storage_size | Temporary storage |

#### Sample Recipe (Job Mode)

```json
{
  "recipe_id": "cpu_inference",
  "recipe_mode": "job",
  "deployment_name": "gemma and BME4 saver",
  "recipe_image_uri": "iad.ocir.io/iduyx1qnmway/corrino-devops-repository:cpu_inference_saver_v0.2",
  "recipe_node_shape": "BM.Standard.E4.128",
  "output_object_storage": [
    {
      "bucket_name": "ollama-models",
      "mount_location": "/models",
      "volume_size_in_gbs": 20
    }
  ],
  "recipe_container_env": [
    { "key": "MODEL_NAME", "value": "gemma" },
    { "key": "PROMPT", "value": "What is the capital of France?" }
  ],
  "recipe_replica_count": 1,
  "recipe_container_port": "11434",
  "recipe_node_pool_size": 1,
  "recipe_node_boot_volume_size_in_gbs": 200,
  "recipe_container_command_args": [
    "--output_directory", "/models", "--model_name", "gemma"
  ],
  "recipe_ephemeral_storage_size": 100
}
```
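
Once the job completes, you can confirm the model files landed in the bucket, for example with the OCI CLI (bucket name taken from the recipe above; the object paths you see will vary):

```bash
# List the objects the saver job wrote to the bucket.
oci os object list --bucket-name ollama-models --all
```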

---

## Final Notes

- Ensure all OCI IAM permissions are set to allow Object Storage access (a sample policy sketch follows this list).
- Confirm that the bucket region and deployment region match.
- Use the job-mode recipe once to save a model, and the service-mode recipe repeatedly to serve it.
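
For the first note above, the exact IAM setup depends on how your cluster authenticates to Object Storage; as a sketch, a policy for a dynamic group of worker nodes (group and compartment names are placeholders) might look like:

```
Allow dynamic-group <your-node-dynamic-group> to manage objects in compartment <your-compartment> where target.bucket.name = 'ollama-models'
Allow dynamic-group <your-node-dynamic-group> to read buckets in compartment <your-compartment>
```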
