
Commit 8e40390

Merge branch 'main' into sg-next-apr30
2 parents 6c322fb + cf3a634 commit 8e40390

File tree: 2 files changed, +167 −78 lines


docs/cody/enterprise/model-config-examples.mdx

Lines changed: 162 additions & 76 deletions
@@ -133,22 +133,47 @@ Below are configuration examples for setting up various LLM providers using BYOK
   ],
   "modelOverrides": [
     {
-      "modelRef": "anthropic::2024-10-22::claude-3.5-sonnet",
-      "displayName": "Claude 3.5 Sonnet",
-      "modelName": "claude-3-5-sonnet-latest",
+      "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-latest",
+      "displayName": "Claude 3.7 Sonnet",
+      "modelName": "claude-3-7-sonnet-latest",
       "capabilities": ["chat"],
       "category": "accuracy",
       "status": "stable",
       "contextWindow": {
-        "maxInputTokens": 45000,
-        "maxOutputTokens": 4000
-      }
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
     },
+    {
+      "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-extended-thinking",
+      "displayName": "Claude 3.7 Sonnet Extended Thinking",
+      "modelName": "claude-3-7-sonnet-latest",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 93000,
+        "maxOutputTokens": 64000
+      },
+      "reasoningEffort": "low"
+    },
+    {
+      "modelRef": "anthropic::2024-10-22::claude-3-5-haiku-latest",
+      "displayName": "Claude 3.5 Haiku",
+      "modelName": "claude-3-5-haiku-latest",
+      "capabilities": ["autocomplete", "edit", "chat"],
+      "category": "speed",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
+    }
   ],
   "defaultModels": {
-    "chat": "anthropic::2024-10-22::claude-3.5-sonnet",
-    "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "chat": "anthropic::2024-10-22::claude-3-7-sonnet-latest",
+    "fastChat": "anthropic::2024-10-22::claude-3-5-haiku-latest",
+    "codeCompletion": "anthropic::2024-10-22::claude-3-5-haiku-latest"
   }
 }
 ```
@@ -157,8 +182,9 @@ In the configuration above,
 
 - Set up a provider override for Anthropic, routing requests for this provider directly to the specified Anthropic endpoint (bypassing Cody Gateway)
 - Add three Anthropic models:
-  - Two models with chat capabilities (`"anthropic::2024-10-22::claude-3.5-sonnet"` and `"anthropic::2023-06-01::claude-3-haiku"`), providing options for chat users
-  - One model with autocomplete capability (`"fireworks::v1::deepseek-coder-v2-lite-base"`)
+  - `"anthropic::2024-10-22::claude-3-7-sonnet-latest"` with chat capability
+  - `"anthropic::2024-10-22::claude-3-7-sonnet-extended-thinking"` with chat and reasoning capabilities (note: to enable [Claude's extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking), the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"anthropic::2024-10-22::claude-3-5-haiku-latest"` with autocomplete, edit, and chat capabilities
 - Set the configured models as default models for Cody features in the `"defaultModels"` field
 
 </Accordion>
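The rule in the extended-thinking note above (a model override with the `reasoning` capability must also define `reasoningEffort`) can be checked mechanically. The helper below is a hypothetical sketch for illustration, not part of Sourcegraph:

```python
def check_model_override(override: dict) -> list[str]:
    """Return a list of problems found in a modelOverrides entry (hypothetical helper)."""
    problems = []
    for field in ("modelRef", "modelName", "capabilities", "contextWindow"):
        if field not in override:
            problems.append(f"missing required field: {field}")
    # Per the docs: enabling extended thinking requires both the
    # "reasoning" capability and an explicit "reasoningEffort".
    if "reasoning" in override.get("capabilities", []) and "reasoningEffort" not in override:
        problems.append('"reasoning" capability requires "reasoningEffort"')
    if override.get("reasoningEffort") not in (None, "low", "medium", "high"):
        problems.append('"reasoningEffort" must be "low", "medium", or "high"')
    return problems
```

Running it against the extended-thinking override with `reasoningEffort` omitted would flag exactly that problem.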
@@ -239,45 +265,61 @@ In the configuration above,
     }
   ],
   "modelOverrides": [
-    {
-      "modelRef": "openai::2024-02-01::gpt-4o",
-      "displayName": "GPT-4o",
-      "modelName": "gpt-4o",
-      "capabilities": ["chat"],
-      "category": "accuracy",
-      "status": "stable",
-      "contextWindow": {
+    {
+      "modelRef": "openai::unknown::gpt-4o",
+      "displayName": "GPT-4o",
+      "modelName": "gpt-4o",
+      "capabilities": ["chat"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
         "maxInputTokens": 45000,
         "maxOutputTokens": 4000
+      }
+    },
+    {
+      "modelRef": "openai::unknown::gpt-4.1-nano",
+      "displayName": "GPT-4.1-nano",
+      "modelName": "gpt-4.1-nano",
+      "capabilities": ["edit", "chat", "autocomplete"],
+      "category": "speed",
+      "status": "stable",
+      "tier": "free",
+      "contextWindow": {
+        "maxInputTokens": 77000,
+        "maxOutputTokens": 16000
+      }
+    },
+    {
+      "modelRef": "openai::unknown::o3",
+      "displayName": "o3",
+      "modelName": "o3",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "tier": "pro",
+      "contextWindow": {
+        "maxInputTokens": 68000,
+        "maxOutputTokens": 100000
+      },
+      "reasoningEffort": "medium"
     }
-    },
-    {
-      "modelRef": "openai::unknown::gpt-3.5-turbo-instruct",
-      "displayName": "GPT-3.5 Turbo Instruct",
-      "modelName": "gpt-3.5-turbo-instruct",
-      "capabilities": ["autocomplete"],
-      "category": "speed",
-      "status": "stable",
-      "contextWindow": {
-        "maxInputTokens": 7000,
-        "maxOutputTokens": 4000
-      }
+  ],
+  "defaultModels": {
+    "chat": "openai::unknown::gpt-4o",
+    "fastChat": "openai::unknown::gpt-4.1-nano",
+    "codeCompletion": "openai::unknown::gpt-4.1-nano"
   }
-  ],
-  "defaultModels": {
-    "chat": "openai::2024-02-01::gpt-4o",
-    "fastChat": "openai::2024-02-01::gpt-4o",
-    "codeCompletion": "openai::unknown::gpt-3.5-turbo-instruct"
-  }
 }
 ```
 
 In the configuration above,
 
 - Set up a provider override for OpenAI, routing requests for this provider directly to the specified OpenAI endpoint (bypassing Cody Gateway)
-- Add two OpenAI models:
-  - `"openai::2024-02-01::gpt-4o"` with "chat" capabilities - used for "chat" and "fastChat"
-  - `"openai::unknown::gpt-3.5-turbo-instruct"` with "autocomplete" capability - used for "autocomplete"
+- Add three OpenAI models:
+  - `"openai::unknown::gpt-4o"` with chat capability - used as the default model for chat
+  - `"openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
+  - `"openai::unknown::o3"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")
 
 </Accordion>
 
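As the o3 note above says, for OpenAI reasoning models the `reasoningEffort` value maps directly onto the `reasoning_effort` field of the chat completions request body. A rough sketch of that mapping, with a hypothetical helper name (the payload fields are the standard OpenAI ones):

```python
def build_openai_request_body(override: dict, messages: list[dict]) -> dict:
    """Translate a model override into an OpenAI chat completions body (hypothetical sketch)."""
    body = {
        "model": override["modelName"],
        "messages": messages,
        "max_completion_tokens": override["contextWindow"]["maxOutputTokens"],
    }
    # For reasoning-capable models, reasoningEffort passes straight
    # through as the request's reasoning_effort field.
    if "reasoning" in override.get("capabilities", []):
        body["reasoning_effort"] = override.get("reasoningEffort", "medium")
    return body
```

With the `o3` override above, this would produce a body carrying `"reasoning_effort": "medium"`; non-reasoning models like `gpt-4o` would omit the field.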
@@ -313,6 +355,33 @@ In the configuration above,
         "maxOutputTokens": 4000
       }
     },
+    {
+      "modelRef": "azure-openai::unknown::gpt-4.1-nano",
+      "displayName": "GPT-4.1-nano",
+      "modelName": "gpt-4.1-nano",
+      "capabilities": ["edit", "chat", "autocomplete"],
+      "category": "speed",
+      "status": "stable",
+      "tier": "free",
+      "contextWindow": {
+        "maxInputTokens": 77000,
+        "maxOutputTokens": 16000
+      }
+    },
+    {
+      "modelRef": "azure-openai::unknown::o3-mini",
+      "displayName": "o3-mini",
+      "modelName": "o3-mini",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "tier": "pro",
+      "contextWindow": {
+        "maxInputTokens": 68000,
+        "maxOutputTokens": 100000
+      },
+      "reasoningEffort": "medium"
+    },
     {
       "modelRef": "azure-openai::unknown::gpt-35-turbo-instruct-test",
       "displayName": "GPT-3.5 Turbo Instruct",
@@ -328,8 +397,8 @@ In the configuration above,
   ],
   "defaultModels": {
     "chat": "azure-openai::unknown::gpt-4o",
-    "fastChat": "azure-openai::unknown::gpt-4o",
-    "codeCompletion": "azure-openai::unknown::gpt-35-turbo-instruct-test"
+    "fastChat": "azure-openai::unknown::gpt-4.1-nano",
+    "codeCompletion": "azure-openai::unknown::gpt-4.1-nano"
   }
 }
 ```
@@ -338,9 +407,11 @@ In the configuration above,
 
 - Set up a provider override for Azure OpenAI, routing requests for this provider directly to the specified Azure OpenAI endpoint (bypassing Cody Gateway).
   **Note:** For Azure OpenAI, ensure that the `modelName` matches the name defined in your Azure portal configuration for the model.
-- Add two OpenAI models:
-  - `"azure-openai::unknown::gpt-4o"` with "chat" capability - used for "chat" and "fastChat"
-  - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - used for "autocomplete"
+- Add four OpenAI models:
+  - `"azure-openai::unknown::gpt-4o"` with chat capability - used as the default model for chat
+  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
+  - `"azure-openai::unknown::o3-mini"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - included as an alternative model
 - Since `"azure-openai::unknown::gpt-35-turbo-instruct-test"` is not supported on the newer OpenAI `"v1/chat/completions"` endpoint, we set `"useDeprecatedCompletionsAPI"` to `true` to route requests to the legacy `"v1/completions"` endpoint. This setting is unnecessary if you are using a model supported on the `"v1/chat/completions"` endpoint.
 
 </Accordion>
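The `useDeprecatedCompletionsAPI` behavior described in the last bullet amounts to an endpoint switch, sketched below with a hypothetical helper (the two endpoint paths are the standard OpenAI ones named in the text):

```python
def completions_path(use_deprecated_completions_api: bool) -> str:
    # Models that are unsupported on the newer chat completions endpoint
    # (such as gpt-35-turbo-instruct) must be routed to the legacy endpoint.
    if use_deprecated_completions_api:
        return "/v1/completions"
    return "/v1/chat/completions"
```

For the `gpt-35-turbo-instruct-test` model the flag would be `true`, so requests would go to `/v1/completions`.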
@@ -499,48 +570,63 @@ In the configuration above,
   ],
   "modelOverrides": [
     {
-      "modelRef": "google::unknown::claude-3-5-sonnet",
-      "displayName": "Claude 3.5 Sonnet (via Google/Vertex)",
-      "modelName": "claude-3-5-sonnet@20240620",
-      "contextWindow": {
-        "maxInputTokens": 45000,
-        "maxOutputTokens": 4000
-      },
-      "capabilities": ["chat"],
-      "category": "accuracy",
-      "status": "stable"
+      "modelRef": "google::20250219::claude-3-7-sonnet",
+      "displayName": "Claude 3.7 Sonnet",
+      "modelName": "claude-3-7-sonnet@20250219",
+      "capabilities": ["chat", "vision", "tools"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
     },
     {
-      "modelRef": "google::unknown::claude-3-haiku",
-      "displayName": "Claude 3 Haiku",
-      "modelName": "claude-3-haiku@20240307",
-      "capabilities": ["autocomplete", "chat"],
-      "category": "speed",
-      "status": "stable",
-      "contextWindow": {
-        "maxInputTokens": 7000,
-        "maxOutputTokens": 4000
-      }
+      "modelRef": "google::20250219::claude-3-7-sonnet-extended-thinking",
+      "displayName": "Claude 3.7 Sonnet Extended Thinking",
+      "modelName": "claude-3-7-sonnet@20250219",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "reasoningEffort": "medium",
+      "contextWindow": {
+        "maxInputTokens": 93000,
+        "maxOutputTokens": 64000
+      }
     },
-  ],
-  "defaultModels": {
-    "chat": "google::unknown::claude-3-5-sonnet",
-    "fastChat": "google::unknown::claude-3-5-sonnet",
-    "codeCompletion": "google::unknown::claude-3-haiku"
-  }
+    {
+      "modelRef": "google::20250219::claude-3-5-haiku",
+      "displayName": "Claude 3.5 Haiku",
+      "modelName": "claude-3-5-haiku@20241022",
+      "capabilities": ["autocomplete", "edit", "chat", "tools"],
+      "category": "speed",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
+    }
+  ],
+  "defaultModels": {
+    "chat": "google::20250219::claude-3-7-sonnet",
+    "fastChat": "google::20250219::claude-3-5-haiku",
+    "codeCompletion": "google::20250219::claude-3-5-haiku"
+  }
 }
 ```
 
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models:
-  - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat"
-  - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add three Anthropic models:
+  - `"google::20250219::claude-3-7-sonnet"` with chat, vision, and tools capabilities
+  - `"google::20250219::claude-3-7-sonnet-extended-thinking"` with chat and reasoning capabilities (note: to enable [Claude's extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking), the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"google::20250219::claude-3-5-haiku"` with autocomplete, edit, chat, and tools capabilities
+- Set the configured models as default models for Cody features in the `"defaultModels"` field
 
 </Accordion>
 
-<Accordion title="Google Vertex (public)">
+<Accordion title="Google Vertex (Gemini)">
 
 ```json
 "modelConfiguration": {
@@ -559,7 +645,7 @@ In the configuration above,
   "modelOverrides": [
     {
       "modelRef": "google::unknown::claude-3-5-sonnet",
-      "displayName": "Claude 3.5 Sonnet (via Google/Vertex)",
+      "displayName": "Claude 3.5 Sonnet (via Google Vertex)",
       "modelName": "claude-3-5-sonnet@20240620",
       "contextWindow": {
         "maxInputTokens": 45000,

docs/cody/enterprise/model-configuration.mdx

Lines changed: 5 additions & 2 deletions
@@ -215,7 +215,7 @@ This field is an array of items, each with the following fields:
   - `${apiVersionId}` specifies the API version, which helps detect compatibility issues between models and Sourcegraph instances. For example, `"2023-06-01"` can indicate that the model uses that version of the Anthropic API. If unsure, you may set this to `"unknown"` when defining custom models
 - `displayName`: An optional, user-friendly name for the model. If not set, clients should display the `ModelID` part of the `modelRef` instead (not the `modelName`)
 - `modelName`: A unique identifier the API provider uses to specify which model is being invoked. This is the identifier that the LLM provider recognizes to determine the model you are calling
-- `capabilities`: A list of capabilities that the model supports. Supported values: **autocomplete** and **chat**
+- `capabilities`: A list of capabilities that the model supports. Supported values: `autocomplete`, `chat`, `vision`, `reasoning`, `edit`, and `tools`
 - `category`: Specifies the model's category with the following options:
   - `"balanced"`: Typically the best default choice for most users. This category is suited for models like Sonnet 3.5 (as of October 2024)
   - `"speed"`: Ideal for low-parameter models that may not suit general-purpose chat but are beneficial for specialized tasks, such as query rewriting
@@ -225,6 +225,9 @@ This field is an array of items, each with the following fields:
 - `contextWindow`: An object that defines the **number of tokens** (units of text) that can be sent to the LLM. This setting influences response time and request cost and may vary according to the limits set by each LLM model or provider. It includes two fields:
   - `maxInputTokens`: Specifies the maximum number of tokens for the contextual data in the prompt (e.g., question, relevant snippets)
   - `maxOutputTokens`: Specifies the maximum number of tokens allowed in the response
+- `reasoningEffort`: Specifies the reasoning effort for reasoning models (those with the `reasoning` capability). Supported values: `high`, `medium`, `low`. How this value is treated depends on the specific provider.
+  For example, for Anthropic models supporting thinking, `low` effort means the minimum [`thinking.budget_tokens`](https://docs.anthropic.com/en/api/messages#body-thinking) value (1024) is used. For other `reasoningEffort` values, `contextWindow.maxOutputTokens / 2` is used.
+  For OpenAI reasoning models, the `reasoningEffort` field value corresponds to the [`reasoning_effort`](https://platform.openai.com/docs/api-reference/chat/create#chat-create-reasoning_effort) request body value.
 - `serverSideConfig`: Additional configuration for the model. It can be one of the following:
 
   - `awsBedrockProvisionedThroughput`: Specifies provisioned throughput settings for AWS Bedrock models with the following fields:
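The Anthropic budgeting rule added in the hunk above can be written out directly. This is a hypothetical sketch of the documented rule, where 1024 is the minimum `thinking.budget_tokens` value:

```python
def anthropic_thinking_budget(reasoning_effort: str, max_output_tokens: int) -> int:
    """Sketch of the documented thinking-budget rule for Anthropic models."""
    # "low" pins the budget to Anthropic's minimum thinking.budget_tokens (1024);
    # "medium" and "high" use contextWindow.maxOutputTokens / 2.
    if reasoning_effort == "low":
        return 1024
    return max_output_tokens // 2
```

So the extended-thinking example with `"maxOutputTokens": 64000` would get a 1024-token budget at `low` effort, and a 32000-token budget at `medium` or `high`.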
@@ -326,7 +329,7 @@ In this modelOverrides config example:
 - The model is configured to use the `"chat"` and `"reasoning"` capabilities
 - The `reasoningEffort` can be set to 3 different options in the Model Config. These options are `high`, `medium` and `low`
 - The default `reasoningEffort` is set to `low`
-- When the reasoning effort is `low`, 1024 tokens is used as the thinking budget. With `medium` and `high` the thinking budget is set via `max_tokens_to_sample/2`
+- For Anthropic models supporting thinking, when the reasoning effort is `low`, 1024 tokens are used as the thinking budget. With `medium` and `high`, the thinking budget is set to half of the `maxOutputTokens` value
 
 Refer to the [examples page](/cody/enterprise/model-config-examples) for additional examples.
 