Commit d14ae30
alexott and nkvuong authored
Document new options in databricks_model_serving resource (#4789)
## Changes
<!-- Summary of your changes that are easy to understand -->

## Tests
<!-- How is this tested? Please see the checklist below and also describe any other relevant tests -->

- [x] relevant change in `docs/` folder

---------

Co-authored-by: Vuong <[email protected]>
1 parent 8fa03d6 commit d14ae30

2 files changed: +6 -3 lines changed

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 * Document `tags` attribute in `databricks_pipeline` resource ([#4783](https://github.com/databricks/terraform-provider-databricks/pull/4783)).
 
 * Recommend OAuth instead of PAT in guides ([#4787](https://github.com/databricks/terraform-provider-databricks/pull/4787))
+* Document new options in `databricks_model_serving` resource ([#4789](https://github.com/databricks/terraform-provider-databricks/pull/4789))
 
 ### Exporter
 

docs/resources/model_serving.md

Lines changed: 5 additions & 3 deletions
@@ -193,9 +193,11 @@ The following arguments are supported:
 * `palm_api_key_plaintext` - The PaLM API key provided as a plaintext string.
 * `entity_name` - The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type `FEATURE_SPEC` in the UC. If it is a UC object, the full name of the object should be given in the form of `catalog_name.schema_name.model_name`.
 * `entity_version` - The version of the model in Databricks Model Registry to be served or empty if the entity is a `FEATURE_SPEC`.
-* `min_provisioned_throughput`- The minimum tokens per second that the endpoint can scale down to.
-* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
-* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0.
+* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Conflicts with `min_provisioned_concurrency` and `max_provisioned_concurrency`.
+* `min_provisioned_concurrency` - The minimum provisioned concurrency that the endpoint can scale down to. Conflicts with `workload_size`.
+* `max_provisioned_concurrency` - The maximum provisioned concurrency that the endpoint can scale up to. Conflicts with `workload_size`.
+* `min_provisioned_throughput` - The minimum tokens per second that the endpoint can scale down to.
+* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
 * `workload_type` - The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is `CPU`. For deep learning workloads, GPU acceleration is available by selecting workload types like `GPU_SMALL` and others. See the available [GPU types](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types).
 * `scale_to_zero_enabled` - Whether the compute resources for the served entity should scale down to zero.
 * `environment_vars` - An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and is subject to change. Example entity environment variables that refer to Databricks secrets: ```{"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}```
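
For illustration only (this is not part of the commit): a served entity might set explicit concurrency bounds in place of `workload_size`, since the documentation above marks the two as conflicting. The sketch below assumes the `config` / `served_entities` block nesting used elsewhere in the resource documentation; the resource label, endpoint name, entity name, entity version, and the concurrency values are all hypothetical placeholders, and the provider or API may restrict which values are accepted.

```hcl
# Hypothetical example; all names and numeric values are placeholders.
resource "databricks_model_serving" "example" {
  name = "example-endpoint"

  config {
    served_entities {
      name           = "example_entity"
      entity_name    = "main.default.example_model" # UC form: catalog_name.schema_name.model_name
      entity_version = "1"

      # Explicit concurrency bounds. workload_size is intentionally omitted
      # because the docs state it conflicts with these two attributes.
      min_provisioned_concurrency = 4
      max_provisioned_concurrency = 8

      workload_type         = "CPU"
      scale_to_zero_enabled = false
    }
  }
}
```

Running `terraform validate` or `terraform plan` against a configuration like this is a reasonable way to check whether the chosen values and attribute combination are accepted by the provider version in use.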
