Commit ec852ca

[Doc] Add more model serving examples (#4658)
## Changes
Add more model serving examples

## Tests
- [x] relevant change in `docs/` folder
1 parent 5dc517b commit ec852ca

2 files changed: 79 additions, 5 deletions

NEXT_CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@

 ### Documentation

+* Add more examples for `databricks_model_serving` ([#4658](https://github.com/databricks/terraform-provider-databricks/pull/4658))
+
 ### Exporter

 * Correctly handle account-level identities when generating the code ([#4650](https://github.com/databricks/terraform-provider-databricks/pull/4650))

docs/resources/model_serving.md

Lines changed: 77 additions & 5 deletions
@@ -9,6 +9,8 @@ This resource allows you to manage [Model Serving](https://docs.databricks.com/m

 ## Example Usage

+Creating a CPU serving endpoint
+
 ```hcl
 resource "databricks_model_serving" "this" {
   name = "ads-serving-endpoint"
@@ -41,6 +43,78 @@ resource "databricks_model_serving" "this" {
 }
 ```

+Creating a Foundation Model endpoint
+
+```hcl
+resource "databricks_model_serving" "llama" {
+  name = "llama_3_2_3b_instruct"
+  ai_gateway {
+    usage_tracking_config {
+      enabled = true
+    }
+  }
+  config {
+    served_entities {
+      name = "meta_llama_v3_2_3b_instruct-3"
+      entity_name = "system.ai.llama_v3_2_3b_instruct"
+      entity_version = "2"
+      scale_to_zero_enabled = true
+      max_provisioned_throughput = 44000
+    }
+  }
+}
+```
+
+Creating an External Model endpoint
+
+```hcl
+resource "databricks_model_serving" "gpt_4o" {
+  name = "gpt-4o-mini"
+  ai_gateway {
+    usage_tracking_config {
+      enabled = true
+    }
+    rate_limits {
+      calls = 10
+      key = "endpoint"
+      renewal_period = "minute"
+    }
+    inference_table_config {
+      enabled = true
+      table_name_prefix = "gpt-4o-mini"
+      catalog_name = "ml"
+      schema_name = "ai_gateway"
+    }
+    guardrails {
+      input {
+        invalid_keywords = ["SuperSecretProject"]
+        pii {
+          behavior = "BLOCK"
+        }
+      }
+      output {
+        pii {
+          behavior = "BLOCK"
+        }
+      }
+    }
+  }
+  config {
+    served_entities {
+      name = "gpt-4o-mini"
+      external_model {
+        name = "gpt-4o-mini"
+        provider = "openai"
+        task = "llm/v1/chat"
+        openai_config {
+          openai_api_key = "{{secrets/llm_scope/openai_api_key}}"
+        }
+      }
+    }
+  }
+}
+```
+
 ## Argument Reference

 The following arguments are supported:
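
The External Model example above reads its key from `{{secrets/llm_scope/openai_api_key}}`, which presupposes that a secret scope named `llm_scope` containing a key `openai_api_key` already exists in the workspace. A minimal sketch of creating both with the provider's `databricks_secret_scope` and `databricks_secret` resources might look like the following. The `openai_api_key` input variable and the resource labels are hypothetical and are not part of this commit.

```hcl
# Hypothetical input variable holding the OpenAI API key (not defined by this commit).
variable "openai_api_key" {
  type      = string
  sensitive = true
}

# Scope name matches the reference used in the External Model example.
resource "databricks_secret_scope" "llm" {
  name = "llm_scope"
}

# Key name matches the reference used in the External Model example.
resource "databricks_secret" "openai_api_key" {
  scope        = databricks_secret_scope.llm.name
  key          = "openai_api_key"
  string_value = var.openai_api_key
}
```

Storing the key as a secret keeps it out of the endpoint definition, which is the alternative to the `_plaintext` parameters described in the argument reference.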
@@ -67,7 +141,7 @@ The following arguments are supported:
 * `config` - The config for the external model, which must match the provider. *Note that API keys could be provided either as a reference to the Databricks Secret (parameters without `_plaintext` suffix) or in plain text (parameters with `_plaintext` suffix)!*
 * `ai21labs_config` - AI21Labs Config
 * `ai21labs_api_key` - The Databricks secret key reference for an AI21Labs API key.
-* `ai21labs_api_key_plaintext` - An AI21 Labs API key provided as a plaintext string.
+* `ai21labs_api_key_plaintext` - An AI21 Labs API key provided as a plaintext string.
 * `anthropic_config` - Anthropic Config
 * `anthropic_api_key` - The Databricks secret key reference for an Anthropic API key.
 * `anthropic_api_key_plaintext` - The Anthropic API key provided as a plaintext string.
@@ -108,7 +182,7 @@ The following arguments are supported:
 * `microsoft_entra_client_secret` - The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication.
 * `microsoft_entra_client_secret_plaintext` - The client secret used for Microsoft Entra ID authentication provided as a plaintext string.
 * `microsoft_entra_tenant_id` - This field is only required for Azure AD OpenAI and is the Microsoft Entra Tenant ID.
-* `openai_api_base` - This is the base URL for the OpenAI API (default: "https://api.openai.com/v1"). For Azure OpenAI, this field is required and is the base URL for the Azure OpenAI API service provided by Azure.
+* `openai_api_base` - This is the base URL for the OpenAI API (default: "<https://api.openai.com/v1>"). For Azure OpenAI, this field is required and is the base URL for the Azure OpenAI API service provided by Azure.
 * `openai_api_version` - This is an optional field to specify the OpenAI API version. For Azure OpenAI, this field is required and is the version of the Azure OpenAI service to utilize, specified by a date.
 * `openai_organization` - This is an optional field to specify the organization in OpenAI or Azure OpenAI.
 * `openai_deployment_name` - This field is only required for Azure OpenAI and is the name of the deployment resource for the Azure OpenAI service.
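
The `openai_api_base`, `openai_api_version`, and `openai_deployment_name` descriptions above say these fields are required for Azure OpenAI. As a rough sketch of how they could fit together in an `openai_config` block, the following assumes an `openai_api_type` argument set to `azure`, which is not shown in this diff and should be checked against the full argument reference. The endpoint name, base URL, API version, deployment name, and secret reference are all placeholders.

```hcl
resource "databricks_model_serving" "azure_gpt" {
  name = "azure-gpt-4o-mini" # hypothetical endpoint name
  config {
    served_entities {
      name = "azure-gpt-4o-mini"
      external_model {
        name     = "gpt-4o-mini"
        provider = "openai"
        task     = "llm/v1/chat"
        openai_config {
          # Assumption: argument not shown in this diff; verify in the argument reference.
          openai_api_type = "azure"
          # Hypothetical secret reference, same format as the OpenAI example.
          openai_api_key = "{{secrets/llm_scope/azure_openai_api_key}}"
          # Required for Azure OpenAI per the descriptions above; placeholder values.
          openai_api_base        = "https://example-resource.openai.azure.com/"
          openai_api_version     = "2024-06-01"
          openai_deployment_name = "gpt-4o-mini"
        }
      }
    }
  }
}
```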
@@ -172,7 +246,7 @@ The following arguments are supported:
 * `valid_topics` - The list of allowed topics. Given a chat request, this guardrail flags the request if its topic is not in the allowed topics.
 * `safety` - the boolean flag that indicates whether the safety filter is enabled.
 * `pii` - Block with configuration for guardrail PII filter:
-* `behavior` - a string that describes the behavior for PII filter. Currently only `BLOCK` value is supported.
+* `behavior` - a string that describes the behavior for PII filter. Currently only `BLOCK` value is supported.
 * `output` - A block with configuration for output guardrail filters. Has the same structure as `input` block.
 * `rate_limits` - (Optional) Block describing rate limits for AI gateway. For details see the description of `rate_limits` block above.
 * `usage_tracking_config` - (Optional) Block with configuration for payload logging using inference tables. For details see the description of `auto_capture_config` block above.
@@ -219,5 +293,3 @@ The following resources are often used in the same context:
 * [databricks_notebook](notebook.md) to manage [Databricks Notebooks](https://docs.databricks.com/notebooks/index.html).
 * [databricks_notebook](../data-sources/notebook.md) data to export a notebook from Databricks Workspace.
 * [databricks_repo](repo.md) to manage [Databricks Repos](https://docs.databricks.com/repos.html).
-
-