You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: openapi.yaml
+18-12Lines changed: 18 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -7206,17 +7206,6 @@ components:
7206
7206
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same `seed` and parameters should return the same result.
7207
7207
7208
7208
Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.
7209
-
service_level:
7210
-
description: |
7211
-
Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
7212
-
- If set to 'auto', the system will utilize scale tier credits until they are exhausted.
7213
-
- If set to 'default', the request will be processed in the shared cluster.
7214
-
7215
-
When this parameter is set, the response body will include the `service_tier` utilized.
7216
-
type: string
7217
-
enum: ["auto", "default"]
7218
-
nullable: true
7219
-
default: null
7220
7209
stop:
7221
7210
description: &completions_stop_description >
7222
7211
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
@@ -7936,6 +7925,17 @@ components:
7936
7925
Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.
7937
7926
x-oaiMeta:
7938
7927
beta: true
7928
+
service_tier:
7929
+
description: |
7930
+
Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
7931
+
- If set to 'auto', the system will utilize scale tier credits until they are exhausted.
7932
+
- If set to 'default', the request will be processed in the shared cluster.
7933
+
7934
+
When this parameter is set, the response body will include the `service_tier` utilized.
7935
+
type: string
7936
+
enum: ["auto", "default"]
7937
+
nullable: true
7938
+
default: null
7939
7939
stop:
7940
7940
description: |
7941
7941
Up to 4 sequences where the API will stop generating further tokens.
@@ -8077,7 +8077,7 @@ components:
8077
8077
model:
8078
8078
type: string
8079
8079
description: The model used for the chat completion.
8080
-
scale_tier:
8080
+
service_tier:
8081
8081
description: The service tier used for processing the request. This field is only included if the `service_tier` parameter is specified in the request.
8082
8082
type: string
8083
8083
enum: ["scale", "default"]
@@ -8259,6 +8259,12 @@ components:
8259
8259
model:
8260
8260
type: string
8261
8261
description: The model to generate the completion.
8262
+
service_tier:
8263
+
description: The service tier used for processing the request. This field is only included if the `service_tier` parameter is specified in the request.
0 commit comments