From 2400c325c542840c7fd82a9e47e1dff226b14144 Mon Sep 17 00:00:00 2001
From: Emi
Date: Wed, 28 May 2025 12:30:52 -0700
Subject: [PATCH] AWS Bedrock / Vertex Claude reasoning docs; fix issue with
 autocomplete vs. codeCompletion

Signed-off-by: Emi
---
 docs/cody/capabilities/supported-models.mdx   | 14 +++---
 .../cody/enterprise/model-config-examples.mdx |  6 +--
 docs/cody/enterprise/model-configuration.mdx  | 49 +++++++++++++++++--
 public/llms.txt                               | 14 +++---
 4 files changed, 63 insertions(+), 20 deletions(-)

diff --git a/docs/cody/capabilities/supported-models.mdx b/docs/cody/capabilities/supported-models.mdx
index b543111e5..44334a6e6 100644
--- a/docs/cody/capabilities/supported-models.mdx
+++ b/docs/cody/capabilities/supported-models.mdx
@@ -29,17 +29,19 @@ Cody supports a variety of cutting-edge large language models for use in chat an
 To use Claude 3 Sonnet models with Cody Enterprise, make sure you've upgraded your Sourcegraph instance to the latest version.
 
-### Claude 3.7 Sonnet
+### Claude 3.7 and 4 Sonnet
 
-Claude 3.7 has two variants — Claude 3.7 Sonnet and Claude 3.7 Extended Thinking — to support deep reasoning and fast, responsive edit workflows. This means you can use Claude 3.7 in different contexts depending on whether long-form reasoning is required or for tasks where speed and performance are a priority.
+Claude 3.7 and 4 Sonnet each have two variants: the base version, which supports fast, responsive edit workflows, and an 'extended thinking' version, which supports deep reasoning. Cody enables both and lets users pick between them in the model dropdown selector, so they can choose whether to use extended thinking depending on the task at hand.
 
-Claude 3.7 Extended Thinking is the recommended default chat model for Cloud customers. Self-hosted customers are encouraged to follow this recommendation, as Claude 3.7 outperforms 3.5 in most scenarios.
+<Callout type="info">
+ Claude 4 support is available starting in Sourcegraph v6.4+ and v6.3.4167.
+</Callout>
 
-#### Claude 3.7 for GCP
+#### Claude 3.7 and 4 via Google Vertex and AWS Bedrock
 
-In addition, Sourcegraph Enterprise customers using GCP Vertex (Google Cloud Platform) for Claude models can use both these variants of Claude 3.7 to optimize extended reasoning and deeper understanding. Customers using AWS Bedrock do not have the Claude 3.7 Extended Thinking variant.
+Starting in Sourcegraph v6.4+ and v6.3.4167, Claude 3.7 Extended Thinking, as well as the Claude 4 base and extended thinking variants, are available when using Claude through either Google Vertex or AWS Bedrock.
 
-Claude 3.7 Sonnet with thinking is not supported for BYOK deployments.
+See [Model Configuration: Reasoning models](/cody/enterprise/model-configuration#reasoning-models) for more information.
 
 ## Autocomplete

diff --git a/docs/cody/enterprise/model-config-examples.mdx b/docs/cody/enterprise/model-config-examples.mdx
index 06c7f30a4..99671fc94 100644
--- a/docs/cody/enterprise/model-config-examples.mdx
+++ b/docs/cody/enterprise/model-config-examples.mdx
@@ -104,7 +104,7 @@ In the configuration above, we:
 - Define a new provider with the ID `"anthropic-byok"` and configure it to use the Anthropic API
 - Since this provider is unknown to Sourcegraph, no Sourcegraph-supplied models are available. Therefore, we add a custom model in the `"modelOverrides"` section
 - Use the custom model configured in the previous step (`"anthropic-byok::2024-10-22::claude-3.5-sonnet"`) for `"chat"`. Requests are sent directly to the Anthropic API as set in the provider override
-- For `"fastChat"` and `"autocomplete"`, we use Sourcegraph-provided models via Cody Gateway
+- For `"fastChat"` and `"codeCompletion"`, we use Sourcegraph-provided models via Cody Gateway
 
 ## Config examples for various LLM providers
 
@@ -244,7 +244,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capabiity - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"
 
@@ -721,7 +721,7 @@ In the configuration above,
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capabiity - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"

diff --git a/docs/cody/enterprise/model-configuration.mdx b/docs/cody/enterprise/model-configuration.mdx
index f37dfbd2a..929a74026 100644
--- a/docs/cody/enterprise/model-configuration.mdx
+++ b/docs/cody/enterprise/model-configuration.mdx
@@ -89,7 +89,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 
 ## Default models
 
-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
 
 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
 
@@ -168,7 +168,7 @@ Example configuration:
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
   }
 }
 ```
@@ -291,7 +291,7 @@ For OpenAI reasoning models, the `reasoningEffort` field value corresponds to th
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "huggingface-codellama::v1::CodeLlama-7b-hf"
+    "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf"
   }
 }
 ```
@@ -303,7 +303,7 @@ In the example above:
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)
 
 #### Example configuration with Claude 3.7 Sonnet
 
@@ -478,3 +478,44 @@ The response includes:
 "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
 }
 ```
+
+## Reasoning models
+
+<Callout type="info">
+ Claude 3.7 and 4 support is available starting in Sourcegraph v6.4+ and v6.3.4167 out-of-the-box when using Cody Gateway.
+
+ This section is primarily relevant to Sourcegraph Enterprise customers using AWS Bedrock or Google Vertex.
+</Callout>
+
+Reasoning models can be added via `modelOverrides` in the site configuration by adding the `reasoning` capability to the `capabilities` list and setting the `reasoningEffort` field on the model. Both must be set for the model's reasoning functionality to be used (otherwise the base model, without reasoning/extended thinking, is used).
+
+For example, this `modelOverride` would create a `Claude Sonnet 4 with Thinking` option in the Cody model selector menu; when a user chats with that model selected, Cody uses Claude Sonnet 4's extended thinking support with a `low` reasoning effort for the user's chat:
+
+```json
+{
+  "modelRef": "bedrock::2024-10-22::claude-sonnet-4-thinking-latest",
+  "displayName": "Claude Sonnet 4 with Thinking",
+  "modelName": "claude-sonnet-4-20250514",
+  "contextWindow": {
+    "maxInputTokens": 93000,
+    "maxOutputTokens": 64000,
+    "maxUserInputTokens": 18000
+  },
+  "capabilities": [
+    "chat",
+    "reasoning"
+  ],
+  "reasoningEffort": "low",
+  "category": "accuracy",
+  "status": "stable"
+}
+```
+
+The `reasoningEffort` field is only used by reasoning models (those with `reasoning` in their `capabilities` list). Supported values are `high`, `medium`, and `low`. How this value is treated depends on the specific provider:
+
+* The `anthropic` provider treats `low` effort to mean the minimum [`thinking.budget_tokens`](https://docs.anthropic.com/en/api/messages#body-thinking) value (1024) is used. For other `reasoningEffort` values, `contextWindow.maxOutputTokens / 2` is used (for example, with `"maxOutputTokens": 64000`, `medium` or `high` yields a thinking budget of 32,000 tokens).
+* The `openai` provider maps the `reasoningEffort` field value to the [OpenAI `reasoning_effort`](https://platform.openai.com/docs/api-reference/chat/create#chat-create-reasoning_effort) request body value.
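+
+For illustration, a similar override for an OpenAI o-series reasoning model might look like the following (the `modelRef`, `modelName`, and context window values here are illustrative rather than a shipped configuration; adapt them to the models your provider exposes):
+
+```json
+{
+  "modelRef": "azure-openai::unknown::o3-mini",
+  "displayName": "o3-mini",
+  "modelName": "o3-mini",
+  "contextWindow": {
+    "maxInputTokens": 100000,
+    "maxOutputTokens": 32000
+  },
+  "capabilities": [
+    "chat",
+    "reasoning"
+  ],
+  "reasoningEffort": "medium",
+  "category": "accuracy",
+  "status": "stable"
+}
+```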
+
+
diff --git a/public/llms.txt b/public/llms.txt
index 23ba28f45..30645c425 100644
--- a/public/llms.txt
+++ b/public/llms.txt
@@ -14532,7 +14532,7 @@ To disable all Sourcegraph-provided models and use only the models explicitly de
 
 ## Default models
 
-The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"autocomplete"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
+The `"modelConfiguration"` setting includes a `"defaultModels"` field, which allows you to specify the LLM model used for each Cody feature (`"chat"`, `"fastChat"`, and `"codeCompletion"`). The values for each feature should be `modelRef`s of either Sourcegraph-provided models or models configured in the `modelOverrides` section.
 
 If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
 
@@ -14611,7 +14611,7 @@ Example configuration:
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "fireworks::v1::deepseek-coder-v2-lite-base"
+    "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base"
   }
 }
 ```
@@ -14725,7 +14725,7 @@ For OpenAI reasoning models, the `reasoningEffort` field value corresponds to th
   "defaultModels": {
     "chat": "google::v1::gemini-1.5-pro",
     "fastChat": "anthropic::2023-06-01::claude-3-haiku",
-    "autocomplete": "huggingface-codellama::v1::CodeLlama-7b-hf"
+    "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf"
   }
 }
 ```
@@ -14737,7 +14737,7 @@ In the example above:
 - A custom model, `"CodeLlama-7b-hf"`, is added using the `"huggingface-codellama"` provider
 - Default models are set up as follows:
   - Sourcegraph-provided models are used for `"chat"` and `"fastChat"` (accessed via Cody Gateway)
-  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"autocomplete"` (connecting directly to Hugging Face’s OpenAI-compatible API)
+  - The newly configured model, `"huggingface-codellama::v1::CodeLlama-7b-hf"`, is used for `"codeCompletion"` (connecting directly to Hugging Face’s OpenAI-compatible API)
 
 #### Example configuration with Claude 3.7 Sonnet
 
@@ -15162,7 +15162,7 @@ In the configuration above,
 - Set up a provider override for Fireworks, routing requests for this provider directly to the specified Fireworks endpoint (bypassing Cody Gateway)
 - Add two Fireworks models:
   - `"fireworks::v1::mixtral-8x7b-instruct"` with "chat" capabiity - used for "chat" and "fastChat"
-  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "autocomplete"
+  - `"fireworks::v1::starcoder-16b"` with "autocomplete" capability - used for "codeCompletion"
 
@@ -15327,7 +15327,7 @@ In the configuration above,
 
 **Note:** For Azure OpenAI, ensure that the `modelName` matches the name defined in your Azure portal configuration for the model.
 
 - Add four OpenAI models:
   - `"azure-openai::unknown::gpt-4o"` with chat capability - used as a default model for chat
-  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and autocomplete
+  - `"azure-openai::unknown::gpt-4.1-nano"` with chat, edit and autocomplete capabilities - used as a default model for fast chat and codeCompletion
   - `"azure-openai::unknown::o3-mini"` with chat and reasoning capabilities - o-series model that supports thinking, can be used for chat (note: to enable thinking, model override should include "reasoning" capability and have "reasoningEffort" defined)
   - `"azure-openai::unknown::gpt-35-turbo-instruct-test"` with "autocomplete" capability - included as an alternative model
 - Since `"azure-openai::unknown::gpt-35-turbo-instruct-test"` is not supported on the newer OpenAI `"v1/chat/completions"` endpoint, we set `"useDeprecatedCompletionsAPI"` to `true` to route requests to the legacy `"v1/completions"` endpoint. This setting is unnecessary if you are using a model supported on the `"v1/chat/completions"` endpoint.
 
@@ -15597,7 +15597,7 @@ In the configuration above,
 In the configuration above,
 
 - Set up a provider override for Google Anthropic, routing requests for this provider directly to the specified endpoint (bypassing Cody Gateway)
-- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capabiity - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "autocomplete"
+- Add two Anthropic models: - `"google::unknown::claude-3-5-sonnet"` with "chat" capability - used for "chat" and "fastChat" - `"google::unknown::claude-3-haiku"` with "autocomplete" capability - used for "codeCompletion"