
🔬(websearch) Use Brave LLM context API with snippets #330

Merged
camilleAND merged 1 commit into main from camand/feat_websearch_snippets
Mar 26, 2026

Conversation

Contributor

@camilleAND camilleAND commented Mar 13, 2026

Summary

  • Registers a single model-facing tool named web_search, implemented by a Python path from the LLM model config (LLModel.web_search), not via get_pydantic_tools_by_name / legacy web_search_* tool names.
  • Extends Brave integration with an explicit /res/v1/llm/context path, grounding.generic normalization, and settings-driven limits for that endpoint.
  • Keeps classic /res/v1/web/search and the Brave + RAG implementation; which one runs is chosen by web_search on the model.
  • Test / CI: Test Django config gets safe AI_* placeholders so load_llm_configuration succeeds; HTTP tests mock the correct Brave URLs.
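The dotted-path registration described in the first bullet can be sketched roughly as below. import_string is re-implemented here as a minimal stand-in for Django's django.utils.module_loading.import_string, and resolve_web_search_tool is a hypothetical name (the PR's actual function is _setup_web_search_tool):

```python
from importlib import import_module


def import_string(dotted_path: str):
    """Minimal stand-in for django.utils.module_loading.import_string."""
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attr)


def resolve_web_search_tool(configuration):
    """Hypothetical helper: return the callable named by LLModel.web_search,
    or None when the model should not expose web search."""
    dotted_path = getattr(configuration, "web_search", None)
    if not dotted_path:
        return None
    return import_string(dotted_path)
```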

Changes

Area: What changed

  • web_search_brave.py: Shared _query_brave_api_with_endpoint_async; _query_brave_llm_context_api_async vs _query_brave_web_search_api_async; _normalize_llm_context_results; format_tool_return uses snippets or extra_snippets; new web_search_brave_llm_context entry point.
  • pydantic_ai.py: _setup_web_search_tool() registers web_search with import_string(configuration.web_search); prepare is gated on web_search_enabled; the forced-search prompt always mentions web_search.
  • conversation.py: get_web_search_tool_name() replaced by is_web_search_configured() (checks configuration.web_search).
  • llm_configuration.py: Optional web_search on LLModel with settings.* / environ.* resolution.
  • chat/tools/__init__.py: Removes the Brave / Tavily / Albert web-search entries from get_pydantic_tools_by_name (web search is no longer resolved there).
  • brave_settings.py: BRAVE_MAX_TOKENS, BRAVE_MAX_SNIPPETS, BRAVE_MAX_SNIPPETS_PER_URL for LLM context requests.
  • settings.py (Test): Non-None AI_MODEL, AI_BASE_URL, AI_API_KEY placeholders so the configuration loads during tests.
  • Tests: test_web_search_brave gets separate mock URLs for web search vs LLM context; test_smart_web_search and test_build_conversation_agent updated for the new registration and is_web_search_configured.

How to point the model at an implementation

Set web_search on the relevant LLModel in LLM JSON (e.g. chat.tools.web_search_brave.web_search_brave, web_search_brave_llm_context, or web_search_brave_with_document_backend). Omit it when the model should not expose web search.
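For illustration, a model entry might look like this. Only the web_search key and the dotted paths are taken from the PR; the surrounding structure and the name field of the LLM JSON are assumptions:

```json
{
  "models": [
    {
      "name": "my-model",
      "web_search": "chat.tools.web_search_brave.web_search_brave_llm_context"
    }
  ]
}
```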

Testing

pytest chat/tests/tools/test_web_search_brave.py \
       chat/tests/clients/pydantic_ai/test_smart_web_search.py \
       chat/tests/agents/test_build_conversation_agent.py

Summary by CodeRabbit

  • New Features

    • Web search now uses Brave’s LLM-context endpoint for richer context-aware results.
    • New configurable limits for search responses: max tokens, max snippets, and snippets per URL.
  • Improvements

    • Standardized web search tool identifier for compatibility with legacy names.
    • Improved snippet extraction, formatting, and more detailed debug logging for search results.
  • Tests

    • Test updates to reflect the new tool identifier and endpoint.


coderabbitai bot commented Mar 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Renames the public web-search tool key to web_search, accepts legacy web_search_* names, switches Brave requests to the LLM context endpoint with added payload limits, normalizes grounding.generic responses into unified snippets, adds Brave LLM-related settings, and updates tests/mocks.

Changes

Cohort / File(s): Summary

  • Tool Registration (src/backend/chat/tools/__init__.py): Replaces the public tool key web_search_brave_with_document_backend with web_search and sets Tool(..., name="web_search").
  • Tool Detection (src/backend/chat/agents/conversation.py): Expands detection to accept tool.name == "web_search" or legacy tool.name.startswith("web_search_").
  • Brave API & Snippet Handling (src/backend/chat/tools/web_search_brave.py): Adds _get_snippets_from_result, normalizes grounding.generic results into url/title/snippets, extends the Brave request payload with max urls/tokens/snippets/per-url limits, updates snippet extraction/merging and logging, and filters results by snippet presence.
  • Configuration (src/backend/conversations/brave_settings.py): Adds BRAVE_MAX_TOKENS, BRAVE_MAX_SNIPPETS, BRAVE_MAX_SNIPPETS_PER_URL settings with defaults and env bindings.
  • Tests & Mocks (src/backend/chat/tests/clients/pydantic_ai/test_smart_web_search.py, src/backend/chat/tests/tools/test_web_search_brave.py): Updates tests to use the web_search tool name, changes the Brave mock URL to /res/v1/llm/context, and adjusts mocks/assertions to the new response shape and payload expectations.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Agent
  participant Tool as WebSearch Tool
  participant Brave as Brave API
  participant Config as BraveSettings

  User->>Agent: Query requiring web search
  Agent->>Tool: Invoke (name "web_search" or legacy)
  Tool->>Config: Read limits (tokens/snippets/urls)
  Tool->>Brave: POST /res/v1/llm/context (payload with limits)
  Brave-->>Tool: Respond (grounding.generic or legacy web.results)
  Tool->>Tool: Normalize results, merge/extract snippets
  Tool-->>Agent: Formatted results with snippets and sources
  Agent-->>User: Final response
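The POST step in the diagram can be sketched as follows. The endpoint's limit parameter names come from the PR and Brave's LLM context API; the default values and helper name are illustrative:

```python
# Sketch of the request body for Brave's /res/v1/llm/context endpoint.
# Limit names follow the PR's brave_settings.py; the values below are
# illustrative placeholders, not the project's actual defaults.
BRAVE_MAX_TOKENS = 4096
BRAVE_MAX_SNIPPETS = 50
BRAVE_MAX_SNIPPETS_PER_URL = 5


def build_llm_context_payload(query: str, count: int = 10) -> dict:
    """Build only documented LLM-context parameters; web-search-only keys
    (safesearch, spellcheck, result_filter, extra_snippets) are excluded."""
    payload = {
        "q": query,
        "count": count,
        "maximum_number_of_tokens": BRAVE_MAX_TOKENS,
        "maximum_number_of_snippets": BRAVE_MAX_SNIPPETS,
        "maximum_number_of_snippets_per_url": BRAVE_MAX_SNIPPETS_PER_URL,
    }
    # Drop unset (None) values before sending, mirroring the PR's filter step.
    return {k: v for k, v in payload.items() if v is not None}
```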

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement, backend

Suggested reviewers

  • providenz
  • elvoisin
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly describes the main change, switching to Brave's LLM context API with snippet extraction support, which aligns with the primary objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



@camilleAND camilleAND requested a review from providenz March 13, 2026 15:22

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/backend/chat/tools/web_search_brave.py (2)

247-248: Consider using DEBUG level for raw search results logging.

Logging raw search results at INFO level could produce significant log volume in production. Consider downgrading to DEBUG, or logging only a summary (e.g., count of results).

💡 Suggested change
-    logger.info("Raw search results: %s", raw_search_results)
-    logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+    logger.debug("Raw search results: %s", raw_search_results)
+    logger.debug("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 247 - 248, Change
the verbose INFO logs that print full raw_search_results to a lower verbosity:
replace logger.info("Raw search results: %s", raw_search_results) with
logger.debug(...) so raw payloads only appear when DEBUG is enabled, and either
change the unduplicated sources line logger.info("Unduplicated sources: %s",
{result["url"] for result in raw_search_results}) to logger.debug(...) or
replace it with a concise summary using logger.info("Search results count: %d,
unique_sources: %d", len(raw_search_results), len({r["url"] for r in
raw_search_results})); update the logging calls in the same function where
raw_search_results is produced so production logs aren’t flooded.

251-262: Extract the snippet-presence check to reduce duplication.

The condition result.get("snippets", []) or result.get("extra_snippets", []) is repeated three times (lines 255, 258, 262). Extracting it into a helper would improve maintainability and prevent drift.

♻️ Suggested refactor
 def format_tool_return(raw_search_results: List[dict]) -> ToolReturn:
     """Format the raw search results into a ToolReturn object."""
     logger.info("Raw search results: %s", raw_search_results)
     logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+
+    def get_snippets(result: dict) -> list:
+        return result.get("snippets", []) or result.get("extra_snippets", [])
+
     return ToolReturn(
         # Format return value "mistral-like": https://docs.mistral.ai/capabilities/citations/
         return_value={
             str(idx): {
                 "url": result["url"],
                 "title": result["title"],
-                "snippets": result.get("snippets", []) or result.get("extra_snippets", []),
+                "snippets": get_snippets(result),
             }
             for idx, result in enumerate(raw_search_results)
-            if result.get("snippets", []) or result.get("extra_snippets", [])
+            if get_snippets(result)
         },
         metadata={
-            "sources": {
-                result["url"] for result in raw_search_results if result.get("snippets", []) or result.get("extra_snippets", [])
-            }
+            "sources": {result["url"] for result in raw_search_results if get_snippets(result)}
         },
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 251 - 262, Extract
the repeated snippet-presence check into a single helper to avoid duplication:
create a small predicate (e.g., has_snippets(result) or a local variable
snippet_present) and replace the three instances of result.get("snippets", [])
or result.get("extra_snippets", []) used inside the return_value dict
comprehension and the metadata "sources" set comprehension; update the dict
comprehension for return_value (the str(idx) -> {...} block), the inner
"snippets" value selection, and the metadata set comprehension to call the new
helper/variable so the condition and its semantics are defined in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/chat/tests/tools/test_web_search_brave.py`:
- Around line 30-31: Tests currently mock the old web.results shape while the
production code in _query_brave_api_async expects the grounding.generic
structure for the llm/context endpoint; add a new async test (e.g.,
test_query_brave_api_llm_context_response) that uses respx.mock to intercept GET
to BRAVE_URL and return a JSON payload with "grounding": {"generic": [...]}
containing entries with and without "title", then call
_query_brave_api_async("...") and assert the returned list length, that each
item maps "url" to url, "title" uses title or falls back to the url, and
"snippets" preserves the snippets to ensure the grounding.generic normalization
path is exercised.

---

Nitpick comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 247-248: Change the verbose INFO logs that print full
raw_search_results to a lower verbosity: replace logger.info("Raw search
results: %s", raw_search_results) with logger.debug(...) so raw payloads only
appear when DEBUG is enabled, and either change the unduplicated sources line
logger.info("Unduplicated sources: %s", {result["url"] for result in
raw_search_results}) to logger.debug(...) or replace it with a concise summary
using logger.info("Search results count: %d, unique_sources: %d",
len(raw_search_results), len({r["url"] for r in raw_search_results})); update
the logging calls in the same function where raw_search_results is produced so
production logs aren’t flooded.
- Around line 251-262: Extract the repeated snippet-presence check into a single
helper to avoid duplication: create a small predicate (e.g.,
has_snippets(result) or a local variable snippet_present) and replace the three
instances of result.get("snippets", []) or result.get("extra_snippets", []) used
inside the return_value dict comprehension and the metadata "sources" set
comprehension; update the dict comprehension for return_value (the str(idx) ->
{...} block), the inner "snippets" value selection, and the metadata set
comprehension to call the new helper/variable so the condition and its semantics
are defined in one place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a116a415-4a18-4585-80cd-f5ab011bd4c5

📥 Commits

Reviewing files that changed from the base of the PR and between 2f74543 and 1078807.

📒 Files selected for processing (6)
  • src/backend/chat/agents/conversation.py
  • src/backend/chat/tests/clients/pydantic_ai/test_smart_web_search.py
  • src/backend/chat/tests/tools/test_web_search_brave.py
  • src/backend/chat/tools/__init__.py
  • src/backend/chat/tools/web_search_brave.py
  • src/backend/conversations/brave_settings.py


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)

147-169: ⚠️ Potential issue | 🔴 Critical

Remove undocumented parameters that are incompatible with the LLM Context endpoint.

The code sends four parameters (safesearch, spellcheck, result_filter, extra_snippets) to the LLM Context endpoint, but these are not documented in the official Brave LLM Context API. These parameters belong to the standard web search endpoint only.

The parameter filter at line 170 only removes None values; it doesn't prevent sending invalid parameters to the API. This could cause unexpected behavior or failures.

Invalid parameters to remove:
  • safesearch (line 161)
  • spellcheck (line 162)
  • result_filter (line 163)
  • extra_snippets (line 164)

Keep only the documented LLM Context parameters: q, country, search_lang, count, maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets, and maximum_number_of_snippets_per_url.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 147 - 169, The
request body `data` for the Brave LLM Context endpoint is sending undocumented
parameters (safesearch, spellcheck, result_filter, extra_snippets) which belong
to the standard web search API; update the construction of `data` in
web_search_brave.py to only include the documented LLM Context fields — q,
country, search_lang, count, maximum_number_of_urls, maximum_number_of_tokens,
maximum_number_of_snippets, and maximum_number_of_snippets_per_url — and remove
the four invalid keys before the None-filtering step so the LLM Context endpoint
never receives them; locate the `data` dict in the function that builds the
request (the block where url, headers and data are defined) and trim it
accordingly.
🧹 Nitpick comments (1)
src/backend/chat/tools/web_search_brave.py (1)

247-248: Consider using DEBUG level for raw search results logging.

Logging the entire raw_search_results at INFO level could produce verbose logs in production (up to 50 snippets per the default BRAVE_MAX_SNIPPETS). Consider using logger.debug for the raw results while keeping the deduplicated URLs at INFO level for operational visibility.

Suggested change
-    logger.info("Raw search results: %s", raw_search_results)
+    logger.debug("Raw search results: %s", raw_search_results)
     logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 247 - 248, The
current INFO logs in web_search_brave.py are too verbose: change the log of the
full raw_search_results to logger.debug while keeping the deduplicated URL set
logged at INFO; locate the logging calls that reference raw_search_results and
the set comprehension {result["url"] for result in raw_search_results} and
replace the first logger.info with logger.debug("Raw search results: %s",
raw_search_results) so only the deduplicated sources remain at INFO for
operational visibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 147-169: The request body `data` for the Brave LLM Context
endpoint is sending undocumented parameters (safesearch, spellcheck,
result_filter, extra_snippets) which belong to the standard web search API;
update the construction of `data` in web_search_brave.py to only include the
documented LLM Context fields — q, country, search_lang, count,
maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets,
and maximum_number_of_snippets_per_url — and remove the four invalid keys before
the None-filtering step so the LLM Context endpoint never receives them; locate
the `data` dict in the function that builds the request (the block where url,
headers and data are defined) and trim it accordingly.

---

Nitpick comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 247-248: The current INFO logs in web_search_brave.py are too
verbose: change the log of the full raw_search_results to logger.debug while
keeping the deduplicated URL set logged at INFO; locate the logging calls that
reference raw_search_results and the set comprehension {result["url"] for result
in raw_search_results} and replace the first logger.info with logger.debug("Raw
search results: %s", raw_search_results) so only the deduplicated sources remain
at INFO for operational visibility.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: deb983d5-b0c7-48e4-bddb-cb9b9b97b96d

📥 Commits

Reviewing files that changed from the base of the PR and between 1078807 and 2685555.

📒 Files selected for processing (1)
  • src/backend/chat/tools/web_search_brave.py

Collaborator

@providenz providenz left a comment


Nice. Works like a charm. A few issues in comments.

@eliott07
Collaborator

#331


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)

168-181: ⚠️ Potential issue | 🟠 Major

Remove parameters invalid for the LLM context endpoint.

The code targets the LLM context endpoint but includes parameters from the standard web search endpoint: safesearch, spellcheck, result_filter, and extra_snippets. These parameters are not valid for the LLM context API and should be removed. Only these are valid for LLM context: q, country, search_lang, count, maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets, maximum_number_of_snippets_per_url, and optionally freshness, context_threshold_mode, enable_local, and goggles.

Invalid parameters to remove
"safesearch": settings.BRAVE_SEARCH_SAFE_SEARCH,
"spellcheck": settings.BRAVE_SEARCH_SPELLCHECK,
"result_filter": "web,faq,query",
"extra_snippets": settings.BRAVE_SEARCH_EXTRA_SNIPPETS,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 168 - 181, The data
payload built in src/backend/chat/tools/web_search_brave.py (the data dict)
contains parameters invalid for the LLM context endpoint; remove "safesearch",
"spellcheck", "result_filter", and "extra_snippets" from the data dict and keep
only the valid keys (q, country, search_lang, count, maximum_number_of_urls,
maximum_number_of_tokens, maximum_number_of_snippets,
maximum_number_of_snippets_per_url), and if needed add only the documented
optional LLM-context keys (freshness, context_threshold_mode, enable_local,
goggles); update the construction of the data variable accordingly.
🧹 Nitpick comments (1)
src/backend/chat/tools/web_search_brave.py (1)

259-275: _get_snippets_from_result is called multiple times per result.

The helper function is invoked up to 3 times per result in this function (lines 265, 268, 272). Consider caching the result for efficiency.

♻️ Proposed refactor to avoid repeated calls
 def format_tool_return(raw_search_results: List[dict]) -> ToolReturn:
     """Format the raw search results into a ToolReturn object."""
     logger.debug("Raw search results: %s", raw_search_results)
     logger.debug("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+
+    # Pre-compute snippets for each result
+    results_with_snippets = [
+        (result, _get_snippets_from_result(result))
+        for result in raw_search_results
+    ]
+
     return ToolReturn(
         # Format return value "mistral-like": https://docs.mistral.ai/capabilities/citations/
         return_value={
             str(idx): {
                 "url": result["url"],
                 "title": result["title"],
-                "snippets": _get_snippets_from_result(result),
+                "snippets": snippets,
             }
-            for idx, result in enumerate(raw_search_results)
-            if _get_snippets_from_result(result)
+            for idx, (result, snippets) in enumerate(results_with_snippets)
+            if snippets
         },
         metadata={
-            "sources": {
-                result["url"] for result in raw_search_results if _get_snippets_from_result(result)
-            }
+            "sources": {result["url"] for result, snippets in results_with_snippets if snippets}
         },
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 259 - 275, The code
calls _get_snippets_from_result(raw_result) multiple times for each result when
building the ToolReturn (in the return_value dict comprehension and metadata
set); compute snippets once per result and reuse it: iterate over
raw_search_results, call snippets = _get_snippets_from_result(result) a single
time, then use that cached snippets value when populating the return_value entry
(using str(idx), result["url"], result["title"], snippets) and when adding URLs
to the metadata "sources" set to avoid redundant work and improve efficiency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 257-258: The INFO-level logging of raw_search_results risks
excessive verbosity and PII leakage; change the two logger.info calls that
reference raw_search_results and the undeduplicated URL set to logger.debug
(i.e., replace logger.info("Raw search results: %s", raw_search_results) and
logger.info("Unduplicated sources: %s", {result["url"] for result in
raw_search_results}) with logger.debug) so these details are only emitted at
debug level while keeping the same message content and variables (logger and
raw_search_results) intact.
- Around line 104-114: The function _get_snippets_from_result currently claims
to "Return merged snippets/extra_snippets" but never combines snippets and
extra_snippets; if both are non-empty it effectively returns only snippets. Fix
by updating _get_snippets_from_result so when both result.get("snippets") and
result.get("extra_snippets") are non-empty it returns a merged list (e.g.,
snippets + extra_snippets) while preserving the existing fallback behavior, or
alternatively change the docstring to explicitly state that snippets are
prioritized over extra_snippets; locate the logic around the variables snippets
and extra_snippets inside _get_snippets_from_result and implement the chosen
change.

---

Outside diff comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 168-181: The data payload built in
src/backend/chat/tools/web_search_brave.py (the data dict) contains parameters
invalid for the LLM context endpoint; remove "safesearch", "spellcheck",
"result_filter", and "extra_snippets" from the data dict and keep only the valid
keys (q, country, search_lang, count, maximum_number_of_urls,
maximum_number_of_tokens, maximum_number_of_snippets,
maximum_number_of_snippets_per_url), and if needed add only the documented
optional LLM-context keys (freshness, context_threshold_mode, enable_local,
goggles); update the construction of the data variable accordingly.

---

Nitpick comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 259-275: The code calls _get_snippets_from_result(raw_result)
multiple times for each result when building the ToolReturn (in the return_value
dict comprehension and metadata set); compute snippets once per result and reuse
it: iterate over raw_search_results, call snippets =
_get_snippets_from_result(result) a single time, then use that cached snippets
value when populating the return_value entry (using str(idx), result["url"],
result["title"], snippets) and when adding URLs to the metadata "sources" set to
avoid redundant work and improve efficiency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cb4c7183-1a29-45b3-a40a-35aae865ec1a

📥 Commits

Reviewing files that changed from the base of the PR and between 2685555 and c8997e5.

📒 Files selected for processing (1)
  • src/backend/chat/tools/web_search_brave.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)

168-181: ⚠️ Potential issue | 🔴 Critical

Remove unsupported parameters from LLM Context API request.

The endpoint uses /res/v1/llm/context, but several parameters (safesearch, spellcheck, result_filter, extra_snippets) are from the web search API and not supported by the LLM Context API. According to Brave's API documentation, the LLM Context endpoint does not accept these parameters. Remove them from the request data dictionary to prevent silent ignoring or API errors.

Supported parameters for LLM Context include: q, country, search_lang, count, freshness, maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets, maximum_number_of_snippets_per_url, context_threshold_mode, enable_local, and goggles.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 168 - 181, The
request payload `data` sent to the LLM Context endpoint `/res/v1/llm/context`
includes unsupported web-search parameters; remove the keys "safesearch",
"spellcheck", "result_filter", and "extra_snippets" from the `data` dict in
src/backend/chat/tools/web_search_brave.py (where `data` is constructed) and
ensure only LLM Context-supported parameters remain (e.g., keep "q", "country",
"search_lang", "count", "maximum_number_of_urls", "maximum_number_of_tokens",
"maximum_number_of_snippets", "maximum_number_of_snippets_per_url" and add any
other supported flags like "freshness", "context_threshold_mode",
"enable_local", or "goggles" if needed).
♻️ Duplicate comments (1)
src/backend/chat/tools/web_search_brave.py (1)

104-114: ⚠️ Potential issue | 🟠 Major

Logic bug: snippets are never actually merged.

The docstring claims "Return merged snippets/extra_snippets" but when both lists are non-empty, line 113 returns only snippets (since snippets or extra_snippets short-circuits on a truthy snippets). The conditional branches on lines 109-112 are also redundant since line 113 handles all cases identically.

If merging is intended:

-    if snippets and not extra_snippets:
-        return snippets
-    if extra_snippets and not snippets:
-        return extra_snippets
-    return snippets or extra_snippets
+    return snippets + extra_snippets

If prioritization is intended, update the docstring to say "Return snippets, falling back to extra_snippets".


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 104 - 114, The
function _get_snippets_from_result incorrectly never merges snippets and
extra_snippets: when both are non-empty it returns only snippets due to the
final `snippets or extra_snippets` logic; update _get_snippets_from_result to
actually merge the two lists (e.g., return snippets + extra_snippets, optionally
deduplicating while preserving order) and simplify/remove the redundant
conditionals, or if the intended behavior is to prefer snippets over
extra_snippets, change the docstring to "Return snippets, falling back to
extra_snippets" and keep the current early-return logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 343-344: Remove the trailing whitespace after the `try:` token on
the line immediately following the `logger.debug("Starting web search with RAG
backend for query: %s", query)` call; locate the `try:` in
src/backend/chat/tools/web_search_brave.py (near the `logger.debug` call) and
ensure there are no extra spaces or tabs after the colon so the line ends
exactly with `try:`.

---

Outside diff comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 168-181: The request payload `data` sent to the LLM Context
endpoint `/res/v1/llm/context` includes unsupported web-search parameters;
remove the keys "safesearch", "spellcheck", "result_filter", and
"extra_snippets" from the `data` dict in
src/backend/chat/tools/web_search_brave.py (where `data` is constructed) and
ensure only LLM Context-supported parameters remain (e.g., keep "q", "country",
"search_lang", "count", "maximum_number_of_urls", "maximum_number_of_tokens",
"maximum_number_of_snippets", "maximum_number_of_snippets_per_url" and add any
other supported flags like "freshness", "context_threshold_mode",
"enable_local", or "goggles" if needed).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fc31fa52-a849-4d36-89ba-965b95a0331b

📥 Commits

Reviewing files that changed from the base of the PR and between c8997e5 and 96ef231.

📒 Files selected for processing (1)
  • src/backend/chat/tools/web_search_brave.py

@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch 2 times, most recently from 96ef231 to 19d7588 Compare March 18, 2026 09:25
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
src/backend/chat/tools/web_search_brave.py (1)

104-114: ⚠️ Potential issue | 🟠 Major

Merge both snippet sources; current logic drops extra_snippets when both exist.

On line 113, `return snippets or extra_snippets` returns only snippets when both lists are populated, which contradicts the function contract and discards additional extracted context.

🐛 Proposed fix
 def _get_snippets_from_result(result: dict) -> List[str]:
     """Return merged snippets/extra_snippets as a list, guarding against None."""
     snippets = result.get("snippets") or []
     extra_snippets = result.get("extra_snippets") or []
-    # Both are expected to be lists of strings; fall back to one or the other if needed.
-    if snippets and not extra_snippets:
-        return snippets
-    if extra_snippets and not snippets:
-        return extra_snippets
-    return snippets or extra_snippets
+    return snippets + extra_snippets
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/web_search_brave.py` around lines 104 - 114, The
_get_snippets_from_result function currently drops extra_snippets when both
snippets and extra_snippets are present; change the logic so when both lists
exist you return a merged list (e.g., concatenate snippets and extra_snippets,
optionally deduplicating) instead of returning only snippets; update the return
behavior in _get_snippets_from_result to combine result.get("snippets") and
result.get("extra_snippets") (still treating missing values as empty lists) so
both sources are preserved.
src/backend/chat/tools/__init__.py (1)

26-32: ⚠️ Potential issue | 🟠 Major

Keep a backward-compatible alias for the legacy tool key.

With Line 47’s strict dictionary lookup, configurations still using "web_search_brave_with_document_backend" will raise KeyError at runtime.

🔁 Proposed compatibility patch
 def get_pydantic_tools_by_name(name: str) -> Tool:
     """Get a tool by its name."""
+    web_search_tool = Tool(
+        web_search_brave_with_document_backend,
+        name="web_search",
+        takes_ctx=True,
+        prepare=only_if_web_search_enabled,
+        max_retries=2,
+    )
     tool_dict = {
         "get_current_weather": Tool(get_current_weather, takes_ctx=False),
         "web_search_brave": Tool(
             web_search_brave,
             takes_ctx=True,
             prepare=only_if_web_search_enabled,
             max_retries=2,
         ),
-        "web_search": Tool(
-            web_search_brave_with_document_backend,
-            name="web_search",
-            takes_ctx=True,
-            prepare=only_if_web_search_enabled,
-            max_retries=2,
-        ),
+        "web_search": web_search_tool,
+        "web_search_brave_with_document_backend": web_search_tool,
         "web_search_tavily": Tool(
             web_search_tavily,
             takes_ctx=False,
             prepare=only_if_web_search_enabled,
             max_retries=2,
         ),

Also applies to: 47-47

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/chat/tools/__init__.py` around lines 26-32: the tools registry
lacks a backward-compatible alias for the legacy key
"web_search_brave_with_document_backend", so configs using that exact key will
raise a KeyError. Update the tools dict to also expose the same Tool instance
under the legacy key (e.g., assign the Tool built from the
web_search_brave_with_document_backend handler with name="web_search",
prepare=only_if_web_search_enabled, max_retries=2 to a variable and reuse it
for both keys) so the strict dict lookup succeeds for either key.
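The aliasing pattern the patch proposes reduces to building the registry value once and exposing it under both keys. The Tool class below is a simplified stand-in for pydantic-ai's Tool, used only to illustrate the shared-instance idea:

```python
class Tool:
    """Simplified stand-in for pydantic_ai's Tool class."""

    def __init__(self, func, name=None):
        self.func = func
        self.name = name or func.__name__


def web_search_brave_with_document_backend():
    return "results"


# Build the Tool once and register it under both the new and legacy keys.
web_search_tool = Tool(web_search_brave_with_document_backend, name="web_search")
tool_dict = {
    "web_search": web_search_tool,
    # Legacy alias: same instance, so old configurations keep resolving.
    "web_search_brave_with_document_backend": web_search_tool,
}

assert tool_dict["web_search"] is tool_dict["web_search_brave_with_document_backend"]
```

Because both keys point at the same instance, the model-facing tool name stays "web_search" regardless of which key a configuration uses.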

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c43b8fe1-672c-4d90-8cda-cd2b90958531

📥 Commits

Reviewing files that changed from the base of the PR and between 96ef231 and 19d7588.

📒 Files selected for processing (6)
  • src/backend/chat/agents/conversation.py
  • src/backend/chat/tests/clients/pydantic_ai/test_smart_web_search.py
  • src/backend/chat/tests/tools/test_web_search_brave.py
  • src/backend/chat/tools/__init__.py
  • src/backend/chat/tools/web_search_brave.py
  • src/backend/conversations/brave_settings.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/backend/chat/tests/tools/test_web_search_brave.py
  • src/backend/chat/agents/conversation.py
  • src/backend/conversations/brave_settings.py

@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch 8 times, most recently from a2a5fea to 0c37183 Compare March 18, 2026 13:00
@qbey qbey left a comment

Nice, yet I think we should allow "old tool" to still be used (and we can select one or another in configuration).

@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch 2 times, most recently from 2cd06de to e335f1c Compare March 20, 2026 11:00
@camilleAND camilleAND requested a review from qbey March 20, 2026 11:01
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch from e335f1c to f2080a2 Compare March 20, 2026 16:08
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch 3 times, most recently from 82f13a2 to 752fe92 Compare March 24, 2026 16:56
@camilleAND camilleAND requested a review from qbey March 24, 2026 17:04
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch from 752fe92 to ccdcc2f Compare March 24, 2026 17:16
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch from ccdcc2f to 2a1d4ff Compare March 25, 2026 10:00
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch from 2a1d4ff to ec91ec1 Compare March 25, 2026 10:55
Use llm/context endpoint with snippets, change tool name for web_search

Signed-off-by: camilleAND <camille.andre@modernisation.gouv.fr>
@camilleAND camilleAND force-pushed the camand/feat_websearch_snippets branch from ec91ec1 to cb2bfd5 Compare March 25, 2026 11:01
@camilleAND camilleAND merged commit cb2bfd5 into main Mar 26, 2026
28 of 29 checks passed
@camilleAND camilleAND deleted the camand/feat_websearch_snippets branch March 26, 2026 10:07