
Commit 0b99271

modelconfig: document new OpenAI-compatible configurations (#1148)
Documents the new `openaicompatible` model configurations supported as part of https://linear.app/sourcegraph/issue/CORE-1019/enablement-ga-self-hosted-models-additional-openai-compatible

Signed-off-by: Emi <[email protected]>
Co-authored-by: Maedah Batool <[email protected]>
1 parent 987f092 commit 0b99271

File tree: 2 files changed (+55, −4 lines)


docs/cody/enterprise/model-config-examples.mdx

Lines changed: 45 additions & 3 deletions
````diff
@@ -318,8 +318,8 @@ In the configuration above,
 - Set up a provider override for OpenAI, routing requests for this provider directly to the specified OpenAI endpoint (bypassing Cody Gateway)
 - Add three OpenAI models:
   - `"openai::2024-02-01::gpt-4o"` with chat capability - used as the default model for chat
-  - `"openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
-  - `"openai::unknown::o3"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")
+  - `"openai::unknown::gpt-4.1-nano"` with chat, edit, and autocomplete capabilities - used as the default model for fast chat and autocomplete
+  - `"openai::unknown::o3"` with chat and reasoning capabilities - an o-series model that supports thinking and can be used for chat (note: to enable thinking, the model override must include the "reasoning" capability and define "reasoningEffort")

 </Accordion>

````
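The note about enabling thinking on o-series models can be made concrete with a sketch of a `modelOverrides` entry. The field values below are illustrative assumptions, not a verbatim configuration from this commit:

```json
{
  "modelRef": "openai::unknown::o3",
  "displayName": "o3",
  "modelName": "o3",
  "capabilities": ["chat", "reasoning"],
  "reasoningEffort": "medium"
}
```

Without both the `"reasoning"` capability and a `reasoningEffort` value, the model is still usable for chat but thinking is not enabled.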

````diff
@@ -505,6 +505,48 @@ In the configuration above,
 - Set `clientSideConfig.openaicompatible` to `{}` to indicate to Cody clients that these models are OpenAI-compatible, ensuring the appropriate code paths are utilized
 - Designate these models as the default choices for chat and autocomplete, respectively

+## Disabling legacy completions
+
+<Callout type="info">Available in Sourcegraph v6.4+ and v6.3.2692</Callout>
+
+By default, Cody sends autocomplete requests to the legacy OpenAI `/completions` endpoint (i.e., pure-inference requests). If your OpenAI-compatible API endpoint supports only `/chat/completions`, you can disable the legacy completions endpoint by adding `"useLegacyCompletions": false` to your `serverSideConfig`, above the endpoints list:
+
+```json
+"serverSideConfig": {
+  "type": "openaicompatible",
+  "useLegacyCompletions": false,
+  // ^ add this to disable /completions and make Cody use only /chat/completions
+  "endpoints": [
+    {
+      "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/",
+      "accessToken": "token"
+    }
+  ]
+}
+```
+
+## Sending custom HTTP headers
+
+<Callout type="info">Available in Sourcegraph v6.4+ and v6.3.2692</Callout>
+
+By default, Cody sends only an `Authorization: Bearer <accessToken>` header to OpenAI-compatible endpoints. If needed, you can configure custom HTTP headers per endpoint, alongside the endpoint's URL:
+
+```json
+"serverSideConfig": {
+  "type": "openaicompatible",
+  "endpoints": [
+    {
+      "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/",
+      "headers": { "X-api-key": "foo", "My-Custom-Http-Header": "bar" }
+      // ^ add this to configure custom headers
+    }
+  ]
+}
+```
+
+<Callout type="note">When using custom headers, both `accessToken` and `accessTokenQuery` configuration settings are ignored.</Callout>
+
 </Accordion>

 <Accordion title="Google Gemini">
````
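The precedence rule in the added sections — custom `headers` replace the default `Authorization` header built from `accessToken` — can be sketched in Python. This is an illustrative model of the documented behavior, not Sourcegraph's implementation:

```python
def outgoing_headers(access_token=None, headers=None):
    """Model the documented header behavior for OpenAI-compatible endpoints.

    If custom headers are configured they are sent as-is and the access
    token is ignored; otherwise a single Authorization bearer header is sent.
    """
    if headers:
        return dict(headers)
    return {"Authorization": f"Bearer {access_token}"}


# Default: only the bearer token header is sent.
print(outgoing_headers(access_token="token"))
# Custom headers win; the access token is ignored entirely.
print(outgoing_headers(access_token="token", headers={"X-api-key": "foo"}))
```

This mirrors the callout above: configuring `headers` means `accessToken` (and `accessTokenQuery`) no longer affect the request.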
````diff
@@ -745,7 +787,7 @@ Provider override `serverSideConfig` fields:
 Provisioned throughput for Amazon Bedrock models can be configured using the `"awsBedrockProvisionedThroughput"` server-side configuration type. Refer to the [Model Overrides](/cody/enterprise/model-configuration#model-overrides) section for more details.

 <Callout type="note">
-If using [IAM roles for EC2 / instance role binding](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html),
+If using [IAM roles for EC2 / instance role binding](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html),
 you may need to increase the [HttpPutResponseHopLimit](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceMetadataOptionsRequest.html#:~:text=HttpPutResponseHopLimit) instance metadata option to a higher value (e.g., 2) to ensure that the metadata service can be accessed from the frontend container running in the EC2 instance. See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-IMDS-existing-instances.html) for instructions.
 </Callout>
````
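As a concrete example of the hop-limit change described in that callout, the instance metadata option can be raised with the AWS CLI (the instance ID below is a placeholder; see the linked AWS docs for console instructions):

```shell
# Raise the IMDS hop limit to 2 so containers running on the instance can
# reach the metadata service. "i-0123456789abcdef0" is a placeholder ID.
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled
```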

docs/cody/enterprise/model-configuration.mdx

Lines changed: 10 additions & 1 deletion
````diff
@@ -249,13 +249,22 @@ For OpenAI reasoning models, the `reasoningEffort` field value corresponds to th
       "displayName": "huggingface",
       "serverSideConfig": {
         "type": "openaicompatible",
+        // optional: disable the use of /completions for autocomplete requests,
+        // instead using only /chat/completions (available in Sourcegraph 6.4+ and 6.3.2692)
+        //
+        // "useLegacyCompletions": false,
         "endpoints": [
           {
             "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/",
             "accessToken": "token"
+
+            // optional: send custom headers (in which case accessToken above is not used)
+            // (available in Sourcegraph 6.4+ and 6.3.2692)
+            //
+            // "headers": { "X-api-key": "foo", "My-Custom-Http-Header": "bar" },
           }
         ]
-      }
+      }
     }
   ],
   "modelOverrides": [
````
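Uncommenting both optional settings in the block above yields a provider configuration like the following sketch (the URL and header values are the placeholder examples used throughout this commit):

```json
"serverSideConfig": {
  "type": "openaicompatible",
  "useLegacyCompletions": false,
  "endpoints": [
    {
      "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/",
      "headers": { "X-api-key": "foo", "My-Custom-Http-Header": "bar" }
    }
  ]
}
```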
