Document new options in databricks_model_serving resource (#4789)
## Changes
Document the new `min_provisioned_concurrency` and `max_provisioned_concurrency` arguments for served entities in the `databricks_model_serving` resource, and note that they conflict with `workload_size`.
## Tests
- [x] relevant change in `docs/` folder
---------
Co-authored-by: Vuong <[email protected]>
`docs/resources/model_serving.md` (5 additions, 3 deletions):
@@ -193,9 +193,11 @@ The following arguments are supported:
 * `palm_api_key_plaintext` - The PaLM API key provided as a plaintext string.
 * `entity_name` - The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type `FEATURE_SPEC` in the UC. If it is a UC object, the full name of the object should be given in the form of `catalog_name.schema_name.model_name`.
 * `entity_version` - The version of the model in Databricks Model Registry to be served or empty if the entity is a `FEATURE_SPEC`.
-* `min_provisioned_throughput` - The minimum tokens per second that the endpoint can scale down to.
-* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
-* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0.
+* `workload_size` - The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are `Small` (4 - 4 provisioned concurrency), `Medium` (8 - 16 provisioned concurrency), and `Large` (16 - 64 provisioned concurrency). If `scale-to-zero` is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Conflicts with `min_provisioned_concurrency` and `max_provisioned_concurrency`.
+* `min_provisioned_concurrency` - The minimum provisioned concurrency that the endpoint can scale down to. Conflicts with `workload_size`.
+* `max_provisioned_concurrency` - The maximum provisioned concurrency that the endpoint can scale up to. Conflicts with `workload_size`.
+* `min_provisioned_throughput` - The minimum tokens per second that the endpoint can scale down to.
+* `max_provisioned_throughput` - The maximum tokens per second that the endpoint can scale up to.
 * `workload_type` - The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is `CPU`. For deep learning workloads, GPU acceleration is available by selecting workload types like `GPU_SMALL` and others. See the available [GPU types](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types).
 * `scale_to_zero_enabled` - Whether the compute resources for the served entity should scale down to zero.
 * `environment_vars` - An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and is subject to change. Example entity environment variables that refer to Databricks secrets: ```{"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}```
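
To make the new arguments concrete, here is a minimal sketch of a served entity sized with the newly documented explicit concurrency bounds instead of `workload_size`. The endpoint name, entity name, and version are placeholders, and the chosen concurrency values are illustrative only; the surrounding `config`/`served_entities` structure follows the existing examples in this resource's documentation.

```hcl
resource "databricks_model_serving" "this" {
  # Placeholder endpoint name.
  name = "ads-serving-endpoint"

  config {
    served_entities {
      # Placeholder Unity Catalog model and version.
      entity_name    = "catalog.schema.ads_model"
      entity_version = "2"

      # Size the entity with explicit provisioned-concurrency bounds.
      # These arguments conflict with `workload_size`, so it is omitted here.
      min_provisioned_concurrency = 4
      max_provisioned_concurrency = 16
    }
  }
}
```

Per the conflict notes above, a served entity is sized either with `workload_size` or with the `min_provisioned_concurrency`/`max_provisioned_concurrency` pair, not both.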