Document new options in databricks_model_serving resource (#4789)
## Changes
Document the new `min_provisioned_concurrency` and `max_provisioned_concurrency` arguments for served entities in the `databricks_model_serving` resource, and note that they conflict with `workload_size`.
## Tests
- [x] relevant change in `docs/` folder
---------
Co-authored-by: Vuong <[email protected]>
`docs/resources/model_serving.md` (5 additions, 3 deletions):
@@ -193,9 +193,11 @@ The following arguments are supported:
 * `palm_api_key_plaintext` - The PaLM API key provided as a plaintext string.
 * `entity_name` - The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type `FEATURE_SPEC` in the UC. If it is a UC object, the full name of the object should be given in the form of `catalog_name.schema_name.model_name`.
 * `entity_version` - The version of the model in Databricks Model Registry to be served or empty if the entity is a `FEATURE_SPEC`.
-* `min_provisioned_throughput` - The minimum tokens per second that the endpoint can scale down to.
-* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
-* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0.
+* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Conflicts with `min_provisioned_concurrency` and `max_provisioned_concurrency`.
+* `min_provisioned_concurrency` - The minimum provisioned concurrency that the endpoint can scale down to. Conflicts with `workload_size`.
+* `max_provisioned_concurrency` - The maximum provisioned concurrency that the endpoint can scale up to. Conflicts with `workload_size`.
+* `min_provisioned_throughput` - The minimum tokens per second that the endpoint can scale down to.
+* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
 * `workload_type` - The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is `CPU`. For deep learning workloads, GPU acceleration is available by selecting workload types like `GPU_SMALL` and others. See the available [GPU types](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types).
 * `scale_to_zero_enabled` - Whether the compute resources for the served entity should scale down to zero.
 * `environment_vars` - An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and is subject to change. Example entity environment variables that refer to Databricks secrets: ```{"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}```
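
To make the new arguments concrete, here is a minimal sketch of a served entity sized with the newly documented explicit concurrency bounds instead of `workload_size`. The endpoint name, entity name, and version are placeholders, and the chosen concurrency values are illustrative only; the surrounding `config`/`served_entities` structure follows the existing examples in this resource's documentation.

```hcl
resource "databricks_model_serving" "this" {
  # Placeholder endpoint name.
  name = "ads-serving-endpoint"

  config {
    served_entities {
      # Placeholder Unity Catalog model and version.
      entity_name    = "catalog.schema.ads_model"
      entity_version = "2"

      # Size the entity with explicit provisioned-concurrency bounds.
      # These arguments conflict with `workload_size`, so it is omitted here.
      min_provisioned_concurrency = 4
      max_provisioned_concurrency = 16
    }
  }
}
```

Per the conflict notes above, a served entity is sized either with `workload_size` or with the `min_provisioned_concurrency`/`max_provisioned_concurrency` pair, not both.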