Commit 3880d69

adding the readme file
1 parent 1dd4c8d commit 3880d69

1 file changed: +26 -1 lines changed
docs/healthcheck/readme.md

Lines changed: 26 additions & 1 deletion
@@ -16,7 +16,8 @@ By following this blueprint, you can identify and localize issues such as therma
 
 Below is a simplified overview:
 
-<img width="888" alt="Screenshot 2025-03-13 101052" src="https://github.com/user-attachments/assets/723a8861-388c-4585-b53f-778c2d5c73d6" />
+<img width="893" alt="image" src="https://github.com/user-attachments/assets/e44f7ffe-19cf-48be-a026-e27fddfbed3c" />
+
 
 ### Key Points
 
@@ -172,6 +173,30 @@ This is an example of json file which be used to deploy into OCI AI Blueprints:
 ```
 ---
 
+## Explanation of Healthcheck Recipe Fields
+
+| Field | Type | Example Value | Description |
+|---------------------------------------|-------------|-------------------------------------------------------------------------------|-------------|
+| `recipe_id` | string | `"healthcheck"` | Identifier for the recipe |
+| `recipe_mode` | string | `"job"` | Whether the recipe runs as a one-time job or a service |
+| `deployment_name` | string | `"healthcheck"` | Name of the deployment/job |
+| `recipe_image_uri` | string | `"iad.ocir.io/.../healthcheck_v0.3"` | URI of the container image stored in OCI Container Registry |
+| `recipe_node_shape` | string | `"VM.GPU.A10.2"` | Compute shape to use for this job |
+| `output_object_storage.bucket_name` | string | `"healthcheck2"` | Name of the Object Storage bucket to write results |
+| `output_object_storage.mount_location`| string | `"/healthcheck_results"` | Directory inside the container where the bucket will be mounted |
+| `output_object_storage.volume_size_in_gbs` | integer | `20` | Storage volume size (GB) for the mounted bucket |
+| `recipe_container_command_args` | list | `[--dtype, float16, --output_dir, /healthcheck_results, --expected_gpus, A10:2,A100:0,H100:0]` | Arguments passed to the container |
+| `--dtype` | string | `"float16"` | Precision type for computations (e.g. float16, float32) |
+| `--output_dir` | string | `"/healthcheck_results"` | Directory for writing output (maps to mounted bucket) |
+| `--expected_gpus` | string | `"A10:2,A100:0,H100:0"` | Expected GPU types and counts |
+| `recipe_replica_count` | integer | `1` | Number of replicas (containers) to run |
+| `recipe_nvidia_gpu_count` | integer | `2` | Number of GPUs to allocate |
+| `recipe_node_pool_size` | integer | `1` | Number of nodes to provision |
+| `recipe_node_boot_volume_size_in_gbs`| integer | `200` | Size of the boot volume (GB) |
+| `recipe_ephemeral_storage_size` | integer | `100` | Ephemeral scratch storage size (GB) |
+| `recipe_shared_memory_volume_size_limit_in_mb` | integer | `1000` | Size of shared memory volume (`/dev/shm`) in MB |
+| `recipe_use_shared_node_pool` | boolean | `true` | Whether to run on a shared node pool |
+
 ## 8. Contact
 
 For questions or additional information, open an issue in this blueprint or contact the maintainers directly.
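
Reviewer note: the field table added in this commit documents types and example values, and a small pre-deployment check can make those concrete. The sketch below is illustrative only and is not part of the healthcheck image or of OCI AI Blueprints. The helper names `parse_expected_gpus` and `check_field_types` are hypothetical, the `EXPECTED_TYPES` mapping is transcribed from the table's top-level fields, and the `TYPE:COUNT` comma-separated format of `--expected_gpus` is inferred from the single example value `A10:2,A100:0,H100:0`. The nested `output_object_storage` fields are left out because the table does not show their exact JSON shape.

```python
# Types for the top-level fields, transcribed from the field table in this commit.
EXPECTED_TYPES = {
    "recipe_id": str,
    "recipe_mode": str,
    "deployment_name": str,
    "recipe_image_uri": str,
    "recipe_node_shape": str,
    "recipe_container_command_args": list,
    "recipe_replica_count": int,
    "recipe_nvidia_gpu_count": int,
    "recipe_node_pool_size": int,
    "recipe_node_boot_volume_size_in_gbs": int,
    "recipe_ephemeral_storage_size": int,
    "recipe_shared_memory_volume_size_limit_in_mb": int,
    "recipe_use_shared_node_pool": bool,
}


def parse_expected_gpus(spec: str) -> dict[str, int]:
    """Parse a string such as "A10:2,A100:0,H100:0" into {"A10": 2, ...}.

    The format is an assumption inferred from the example value in the table.
    """
    counts = {}
    for item in spec.split(","):
        name, sep, count = item.strip().partition(":")
        if not sep or not name:
            raise ValueError(f"malformed entry: {item!r}")
        value = int(count)  # raises ValueError on non-numeric counts
        if value < 0:
            raise ValueError(f"negative GPU count: {item!r}")
        counts[name] = value
    return counts


def check_field_types(recipe: dict) -> list[str]:
    """Return one message per present field whose value has an unexpected type."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in recipe:
            continue  # this sketch checks types only, not required fields
        value = recipe[field]
        # bool is a subclass of int in Python, so reject booleans for integer fields.
        wrong = (expected is int and isinstance(value, bool)) or not isinstance(value, expected)
        if wrong:
            problems.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return problems


# Example values copied from the table; the image URI is elided in the source and kept as-is.
recipe = {
    "recipe_id": "healthcheck",
    "recipe_mode": "job",
    "deployment_name": "healthcheck",
    "recipe_image_uri": "iad.ocir.io/.../healthcheck_v0.3",
    "recipe_node_shape": "VM.GPU.A10.2",
    "recipe_container_command_args": [
        "--dtype", "float16",
        "--output_dir", "/healthcheck_results",
        "--expected_gpus", "A10:2,A100:0,H100:0",
    ],
    "recipe_replica_count": 1,
    "recipe_nvidia_gpu_count": 2,
    "recipe_node_pool_size": 1,
    "recipe_node_boot_volume_size_in_gbs": 200,
    "recipe_ephemeral_storage_size": 100,
    "recipe_shared_memory_volume_size_limit_in_mb": 1000,
    "recipe_use_shared_node_pool": True,
}

args = recipe["recipe_container_command_args"]
expected_gpus = parse_expected_gpus(args[args.index("--expected_gpus") + 1])
type_problems = check_field_types(recipe)
```

With the table's example values, `type_problems` is empty and `expected_gpus` maps `A10` to 2 and the other two GPU types to 0. Whether the healthcheck tool itself requires the `--expected_gpus` total to match `recipe_nvidia_gpu_count` is not stated in this commit, so the sketch does not enforce it.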
