
Commit 3b06e29

Update model configuration documentation

1 parent 3394998 commit 3b06e29

File tree

2 files changed: +161 −75 lines


docs/cody/enterprise/model-config-examples.mdx

Lines changed: 159 additions & 74 deletions
@@ -133,22 +133,47 @@ Below are configuration examples for setting up various LLM providers using BYOK
   ],
   "modelOverrides": [
     {
-      "modelRef": "anthropic::2024-10-22::claude-3.5-sonnet",
-      "displayName": "Claude 3.5 Sonnet",
-      "modelName": "claude-3-5-sonnet-latest",
+      "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-latest",
+      "displayName": "Claude 3.7 Sonnet",
+      "modelName": "claude-3-7-sonnet-latest",
       "capabilities": ["chat"],
       "category": "accuracy",
       "status": "stable",
       "contextWindow": {
-        "maxInputTokens": 45000,
-        "maxOutputTokens": 4000
-      }
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
     },
+    {
+      "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-extended-thinking",
+      "displayName": "Claude 3.7 Sonnet Extended Thinking",
+      "modelName": "claude-3-7-sonnet-latest",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 93000,
+        "maxOutputTokens": 64000
+      },
+      "reasoningEffort": "low"
+    },
+    {
+      "modelRef": "anthropic::2024-10-22::claude-3-5-haiku-latest",
+      "displayName": "Claude 3.5 Haiku",
+      "modelName": "claude-3-5-haiku-latest",
+      "capabilities": ["autocomplete", "edit", "chat"],
+      "category": "speed",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
+    }
   ],
   "defaultModels": {
-    "chat": "anthropic::2024-10-22::claude-3.5-sonnet",
-    "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "chat": "anthropic::2024-10-22::claude-3-7-sonnet-latest",
+    "fastChat": "anthropic::2024-10-22::claude-3-5-haiku-latest",
+    "codeCompletion": "anthropic::2024-10-22::claude-3-5-haiku-latest"
   }
 }
 ```
@@ -157,8 +182,9 @@ In the configuration above,
 
 - Set up a provider override for Anthropic, routing requests for this provider directly to the specified Anthropic endpoint (bypassing Cody Gateway)
 - Add three Anthropic models:
-  - Two models with chat capabilities (`"anthropic::2024-10-22::claude-3.5-sonnet"` and `"anthropic::2023-06-01::claude-3-haiku"`), providing options for chat users
-  - One model with autocomplete capability (`"fireworks::v1::deepseek-coder-v2-lite-base"`)
+  - `"anthropic::2024-10-22::claude-3-7-sonnet-latest"` with chat capability
+  - `"anthropic::2024-10-22::claude-3-7-sonnet-extended-thinking"` with chat and reasoning capabilities (note: to enable [Claude's extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking), the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"anthropic::2024-10-22::claude-3-5-haiku-latest"` with autocomplete, edit, and chat capabilities
 - Set the configured models as default models for Cody features in the `"defaultModels"` field
 
 </Accordion>
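The example above pairs `modelOverrides` with `defaultModels` entries that must reference them. A small Python sketch of that cross-check follows; the feature-to-capability mapping is an illustrative assumption, not a documented Sourcegraph API:

```python
# Sketch: cross-check "defaultModels" against "modelOverrides".
# The feature -> required-capability mapping is an illustrative assumption.
REQUIRED_CAPABILITY = {"chat": "chat", "fastChat": "chat", "codeCompletion": "autocomplete"}

def check_default_models(config: dict) -> list[str]:
    """Return a list of problems found in a model configuration dict."""
    models = {m["modelRef"]: m for m in config.get("modelOverrides", [])}
    problems = []
    for feature, model_ref in config.get("defaultModels", {}).items():
        model = models.get(model_ref)
        if model is None:
            problems.append(f"{feature}: {model_ref} is not defined in modelOverrides")
            continue
        needed = REQUIRED_CAPABILITY.get(feature)
        if needed and needed not in model.get("capabilities", []):
            problems.append(f"{feature}: {model_ref} lacks the '{needed}' capability")
    return problems

config = {
    "modelOverrides": [
        {"modelRef": "anthropic::2024-10-22::claude-3-5-haiku-latest",
         "capabilities": ["autocomplete", "edit", "chat"]},
    ],
    "defaultModels": {
        "chat": "anthropic::2024-10-22::claude-3-7-sonnet-latest",  # not defined above
        "codeCompletion": "anthropic::2024-10-22::claude-3-5-haiku-latest",
    },
}
print(check_default_models(config))
```

A check like this catches the most common misconfiguration: a `defaultModels` entry pointing at a `modelRef` that was renamed or removed.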
@@ -239,45 +265,61 @@ In the configuration above,
     }
   ],
   "modelOverrides": [
-    {
-      "modelRef": "openai::2024-02-01::gpt-4o",
-      "displayName": "GPT-4o",
-      "modelName": "gpt-4o",
-      "capabilities": ["chat"],
-      "category": "accuracy",
-      "status": "stable",
-      "contextWindow": {
+    {
+      "modelRef": "openai::unknown::gpt-4o",
+      "displayName": "GPT-4o",
+      "modelName": "gpt-4o",
+      "capabilities": ["chat"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
         "maxInputTokens": 45000,
         "maxOutputTokens": 4000
+      }
+    },
+    {
+      "modelRef": "openai::unknown::gpt-4.1-nano",
+      "displayName": "GPT-4.1-nano",
+      "modelName": "gpt-4.1-nano",
+      "capabilities": ["edit", "chat", "autocomplete"],
+      "category": "speed",
+      "status": "stable",
+      "tier": "free",
+      "contextWindow": {
+        "maxInputTokens": 77000,
+        "maxOutputTokens": 16000
+      }
+    },
+    {
+      "modelRef": "openai::unknown::o3",
+      "displayName": "o3",
+      "modelName": "o3",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "tier": "pro",
+      "contextWindow": {
+        "maxInputTokens": 68000,
+        "maxOutputTokens": 100000
+      },
+      "reasoningEffort": "medium"
     }
-    },
-    {
-      "modelRef": "openai::unknown::gpt-3.5-turbo-instruct",
-      "displayName": "GPT-3.5 Turbo Instruct",
-      "modelName": "gpt-3.5-turbo-instruct",
-      "capabilities": ["autocomplete"],
-      "category": "speed",
-      "status": "stable",
-      "contextWindow": {
-        "maxInputTokens": 7000,
-        "maxOutputTokens": 4000
-      }
+  ],
+  "defaultModels": {
+    "chat": "openai::unknown::gpt-4o",
+    "fastChat": "openai::unknown::gpt-4.1-nano",
+    "codeCompletion": "openai::unknown::gpt-4.1-nano"
   }
-  ],
-  "defaultModels": {
-    "chat": "openai::2024-02-01::gpt-4o",
-    "fastChat": "openai::2024-02-01::gpt-4o",
-    "codeCompletion": "openai::unknown::gpt-3.5-turbo-instruct"
-  }
 }
 ```
 
 In the configuration above,
 
 - Set up a provider override for OpenAI, routing requests for this provider directly to the specified OpenAI endpoint (bypassing Cody Gateway)
-- Add two OpenAI models:
-  - `"openai::2024-02-01::gpt-4o"` with "chat" capabilities - used for "chat" and "fastChat"
-  - `"openai::unknown::gpt-3.5-turbo-instruct"` with "autocomplete" capability - used for "autocomplete"
+- Add three OpenAI models:
+  - `"openai::unknown::gpt-4o"` with chat capability - used as the default model for chat
+  - `"openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
+  - `"openai::unknown::o3"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")
 
 </Accordion>
 
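The `contextWindow` limits above cap how much context fits into a single request. A rough sketch of how a client might trim context snippets to a model's `maxInputTokens` budget; the 4-characters-per-token heuristic is an illustrative assumption (real clients use a proper tokenizer):

```python
# Sketch: keep only the context snippets that fit a model's input-token budget.
# Assumes ~4 characters per token -- a rough heuristic, for illustration only.
def fit_snippets(question: str, snippets: list[str], max_input_tokens: int) -> list[str]:
    budget = max_input_tokens - len(question) // 4  # reserve room for the question
    kept = []
    for snippet in snippets:
        cost = len(snippet) // 4
        if cost > budget:
            break  # stop once the next snippet would overflow the window
        kept.append(snippet)
        budget -= cost
    return kept

snippets = ["x" * 4000, "y" * 4000, "z" * 4000]  # three ~1,000-token snippets
# With a 1,500-token window, only the first snippet fits.
print(len(fit_snippets("why?", snippets, 1500)))
```

This is why raising `maxInputTokens` (as the commit does for several models) lets Cody include more repository context per request, at higher per-request cost.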
@@ -313,6 +355,33 @@ In the configuration above,
         "maxOutputTokens": 4000
       }
     },
+    {
+      "modelRef": "azure-openai::unknown::gpt-4.1-nano",
+      "displayName": "GPT-4.1-nano",
+      "modelName": "gpt-4.1-nano",
+      "capabilities": ["edit", "chat", "autocomplete"],
+      "category": "speed",
+      "status": "stable",
+      "tier": "free",
+      "contextWindow": {
+        "maxInputTokens": 77000,
+        "maxOutputTokens": 16000
+      }
+    },
+    {
+      "modelRef": "azure-openai::unknown::o3-mini",
+      "displayName": "o3-mini",
+      "modelName": "o3-mini",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "tier": "pro",
+      "contextWindow": {
+        "maxInputTokens": 68000,
+        "maxOutputTokens": 100000
+      },
+      "reasoningEffort": "medium"
+    },
     {
       "modelRef": "azure-openai::unknown::gpt-35-turbo-instruct-test",
       "displayName": "GPT-3.5 Turbo Instruct",
@@ -328,8 +397,8 @@ In the configuration above,
   ],
   "defaultModels": {
     "chat": "azure-openai::unknown::gpt-4o",
-    "fastChat": "azure-openai::unknown::gpt-4o",
-    "codeCompletion": "azure-openai::unknown::gpt-35-turbo-instruct-test"
+    "fastChat": "azure-openai::unknown::gpt-4.1-nano",
+    "codeCompletion": "azure-openai::unknown::gpt-4.1-nano"
   }
 }
 ```
@@ -338,9 +407,11 @@ In the configuration above,
 
 - Set up a provider override for Azure OpenAI, routing requests for this provider directly to the specified Azure OpenAI endpoint (bypassing Cody Gateway).
   **Note:** For Azure OpenAI, ensure that the `modelName` matches the name defined in your Azure portal configuration for the model.
-- Add two OpenAI models:
-  - `"azure-openai::unknown::gpt-4o"` with "chat" capability - used for "chat" and "fastChat"
-  - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - used for "autocomplete"
+- Add four OpenAI models:
+  - `"azure-openai::unknown::gpt-4o"` with chat capability - used as the default model for chat
+  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
+  - `"azure-openai::unknown::o3-mini"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - included as an alternative autocomplete model
 - Since `"azure-openai::unknown::gpt-35-turbo-instruct-test"` is not supported on the newer OpenAI `"v1/chat/completions"` endpoint, we set `"useDeprecatedCompletionsAPI"` to `true` to route requests to the legacy `"v1/completions"` endpoint. This setting is unnecessary if you are using a model supported on the `"v1/chat/completions"` endpoint.
 
 </Accordion>
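The `useDeprecatedCompletionsAPI` flag discussed above effectively selects which OpenAI-style endpoint a request is sent to. A hypothetical sketch of that routing decision; the function name and URL shapes are illustrative, not Sourcegraph's actual implementation:

```python
# Sketch: pick the completions endpoint based on a model's server-side config.
# Illustrative only -- not Sourcegraph's actual routing code.
def completions_url(base: str, server_side_config: dict) -> str:
    if server_side_config.get("useDeprecatedCompletionsAPI"):
        return f"{base}/v1/completions"   # legacy endpoint (e.g. gpt-35-turbo-instruct)
    return f"{base}/v1/chat/completions"  # modern chat endpoint

base = "https://example.openai.azure.com"  # hypothetical Azure OpenAI endpoint
print(completions_url(base, {"useDeprecatedCompletionsAPI": True}))
print(completions_url(base, {}))
```

The design point: the flag lives per-model, so one configuration can mix legacy instruct-style models with chat-endpoint models behind the same provider override.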
@@ -499,44 +570,59 @@ In the configuration above,
   ],
   "modelOverrides": [
     {
-      "modelRef": "google::unknown::claude-3-5-sonnet",
-      "displayName": "Claude 3.5 Sonnet (via Google/Vertex)",
-      "modelName": "claude-3-5-sonnet@20240620",
-      "contextWindow": {
-        "maxInputTokens": 45000,
-        "maxOutputTokens": 4000
-      },
-      "capabilities": ["chat"],
-      "category": "accuracy",
-      "status": "stable"
+      "modelRef": "google::unknown::claude-3-7-sonnet",
+      "displayName": "Claude 3.7 Sonnet",
+      "modelName": "claude-3-7-sonnet",
+      "capabilities": ["chat", "vision", "tools"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
     },
     {
-      "modelRef": "google::unknown::claude-3-haiku",
-      "displayName": "Claude 3 Haiku",
-      "modelName": "claude-3-haiku@20240307",
-      "capabilities": ["autocomplete", "chat"],
-      "category": "speed",
-      "status": "stable",
-      "contextWindow": {
-        "maxInputTokens": 7000,
-        "maxOutputTokens": 4000
-      }
+      "modelRef": "google::unknown::claude-3-7-sonnet-extended-thinking",
+      "displayName": "Claude 3.7 Sonnet Extended Thinking",
+      "modelName": "claude-3-7-sonnet",
+      "capabilities": ["chat", "reasoning"],
+      "category": "accuracy",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 93000,
+        "maxOutputTokens": 64000
+      },
+      "reasoningEffort": "low"
     },
-  ],
-  "defaultModels": {
-    "chat": "google::unknown::claude-3-5-sonnet",
-    "fastChat": "google::unknown::claude-3-5-sonnet",
-    "codeCompletion": "google::unknown::claude-3-haiku"
-  }
+    {
+      "modelRef": "google::unknown::claude-3-5-haiku",
+      "displayName": "Claude 3.5 Haiku",
+      "modelName": "claude-3-5-haiku-latest",
+      "capabilities": ["autocomplete", "edit", "chat", "tools"],
+      "category": "speed",
+      "status": "stable",
+      "contextWindow": {
+        "maxInputTokens": 132000,
+        "maxOutputTokens": 8192
+      }
+    }
+  ],
+  "defaultModels": {
+    "chat": "google::unknown::claude-3-7-sonnet",
+    "fastChat": "google::unknown::claude-3-5-haiku",
+    "codeCompletion": "google::unknown::claude-3-5-haiku"
+  }
 }
 ```
 
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models:
-  - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat"
-  - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add three Anthropic models:
+  - `"google::unknown::claude-3-7-sonnet"` with chat, vision, and tools capabilities
+  - `"google::unknown::claude-3-7-sonnet-extended-thinking"` with chat and reasoning capabilities (note: to enable [Claude's extended thinking](https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking), the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"google::unknown::claude-3-5-haiku"` with autocomplete, edit, chat, and tools capabilities
+- Set the configured models as default models for Cody features in the `"defaultModels"` field
 
 </Accordion>
 
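All of the examples above use model references of the form `${providerId}::${apiVersionId}::${modelId}`. A small sketch of splitting such a reference into its parts (the helper and type names are my own):

```python
from typing import NamedTuple

class ModelRef(NamedTuple):
    provider_id: str
    api_version_id: str
    model_id: str

def parse_model_ref(ref: str) -> ModelRef:
    """Split a "providerId::apiVersionId::modelId" reference into its parts."""
    parts = ref.split("::")
    if len(parts) != 3:
        raise ValueError(f"malformed modelRef: {ref!r}")
    return ModelRef(*parts)

ref = parse_model_ref("google::unknown::claude-3-7-sonnet")
print(ref.provider_id, ref.api_version_id, ref.model_id)
```

Note the distinction this makes concrete: the `modelId` part of the `modelRef` is an internal identifier, while the separate `modelName` field is what the upstream provider's API receives.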
docs/cody/enterprise/model-configuration.mdx

Lines changed: 2 additions & 1 deletion
@@ -215,7 +215,7 @@ This field is an array of items, each with the following fields:
   - `${apiVersionId}` specifies the API version, which helps detect compatibility issues between models and Sourcegraph instances. For example, `"2023-06-01"` can indicate that the model uses that version of the Anthropic API. If unsure, you may set this to `"unknown"` when defining custom models
 - `displayName`: An optional, user-friendly name for the model. If not set, clients should display the `ModelID` part of the `modelRef` instead (not the `modelName`)
 - `modelName`: A unique identifier the API provider uses to specify which model is being invoked. This is the identifier that the LLM provider recognizes to determine the model you are calling
-- `capabilities`: A list of capabilities that the model supports. Supported values: **autocomplete** and **chat**
+- `capabilities`: A list of capabilities that the model supports. Supported values: `autocomplete`, `chat`, `vision`, `reasoning`, `edit`, and `tools`
 - `category`: Specifies the model's category with the following options:
   - `"balanced"`: Typically the best default choice for most users. This category is suited for models like Sonnet 3.5 (as of October 2024)
   - `"speed"`: Ideal for low-parameter models that may not suit general-purpose chat but are beneficial for specialized tasks, such as query rewriting
@@ -225,6 +225,7 @@ This field is an array of items, each with the following fields:
 - `contextWindow`: An object that defines the **number of tokens** (units of text) that can be sent to the LLM. This setting influences response time and request cost and may vary according to the limits set by each LLM model or provider. It includes two fields:
   - `maxInputTokens`: Specifies the maximum number of tokens for the contextual data in the prompt (e.g., question, relevant snippets)
   - `maxOutputTokens`: Specifies the maximum number of tokens allowed in the response
+- `reasoningEffort`: Specifies the reasoning effort for models with the `reasoning` capability. Supported values: `high`, `medium`, and `low`
 - `serverSideConfig`: Additional configuration for the model. It can be one of the following:
 
   - `awsBedrockProvisionedThroughput`: Specifies provisioned throughput settings for AWS Bedrock models with the following fields:

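The documented value sets for `capabilities` and `reasoningEffort` lend themselves to a simple validation pass over each `modelOverrides` entry. A sketch under that reading of the docs (the function name and error messages are my own):

```python
# Allowed values as documented for modelOverrides entries.
ALLOWED_CAPABILITIES = {"autocomplete", "chat", "vision", "reasoning", "edit", "tools"}
ALLOWED_REASONING_EFFORT = {"high", "medium", "low"}

def validate_override(model: dict) -> list[str]:
    """Check one modelOverrides entry against the documented value sets."""
    errors = []
    for cap in model.get("capabilities", []):
        if cap not in ALLOWED_CAPABILITIES:
            errors.append(f"unknown capability: {cap}")
    effort = model.get("reasoningEffort")
    if effort is not None:
        if effort not in ALLOWED_REASONING_EFFORT:
            errors.append(f"unknown reasoningEffort: {effort}")
        if "reasoning" not in model.get("capabilities", []):
            errors.append("reasoningEffort set without the 'reasoning' capability")
    return errors

print(validate_override({"capabilities": ["chat", "reasoning"], "reasoningEffort": "low"}))
```

Running this over every entry catches the pairing rule the examples rely on: `reasoningEffort` only makes sense alongside the `reasoning` capability.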