🔬(websearch) Use Brave LLM context API with snippets#330
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: Renames the public web-search tool key to `web_search`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant Tool as WebSearch Tool
    participant Brave as Brave API
    participant Config as BraveSettings
    User->>Agent: Query requiring web search
    Agent->>Tool: Invoke (name "web_search" or legacy)
    Tool->>Config: Read limits (tokens/snippets/urls)
    Tool->>Brave: POST /res/v1/llm/context (payload with limits)
    Brave-->>Tool: Respond (grounding.generic or legacy web.results)
    Tool->>Tool: Normalize results, merge/extract snippets
    Tool-->>Agent: Formatted results with snippets and sources
    Agent-->>User: Final response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/backend/chat/tools/web_search_brave.py (2)
247-248: Consider using DEBUG level for raw search results logging.
Logging raw search results at INFO level could produce significant log volume in production. Consider downgrading to DEBUG, or logging only a summary (e.g., count of results).
💡 Suggested change

```diff
-    logger.info("Raw search results: %s", raw_search_results)
-    logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+    logger.debug("Raw search results: %s", raw_search_results)
+    logger.debug("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 247 - 248, Change the verbose INFO logs that print full raw_search_results to a lower verbosity: replace logger.info("Raw search results: %s", raw_search_results) with logger.debug(...) so raw payloads only appear when DEBUG is enabled, and either change the unduplicated sources line logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results}) to logger.debug(...) or replace it with a concise summary using logger.info("Search results count: %d, unique_sources: %d", len(raw_search_results), len({r["url"] for r in raw_search_results})); update the logging calls in the same function where raw_search_results is produced so production logs aren’t flooded.
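As a sketch of the summary-style alternative the prompt mentions, the concise INFO line plus DEBUG-only raw payload could look like this (the function name `log_search_summary` and the exact message wording are illustrative, not code from this PR):

```python
import logging

logger = logging.getLogger("web_search")


def log_search_summary(raw_search_results: list[dict]) -> tuple[int, int]:
    """Log a concise summary at INFO; emit the full payload only at DEBUG."""
    unique_sources = {result["url"] for result in raw_search_results}
    # Full raw payload appears only when DEBUG logging is enabled.
    logger.debug("Raw search results: %s", raw_search_results)
    # Concise operational summary for production logs.
    logger.info(
        "Search results count: %d, unique_sources: %d",
        len(raw_search_results),
        len(unique_sources),
    )
    return len(raw_search_results), len(unique_sources)
```

The returned counts make the helper easy to assert on in tests without capturing log output.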
251-262: Extract the snippet-presence check to reduce duplication.
The condition `result.get("snippets", []) or result.get("extra_snippets", [])` is repeated three times (lines 255, 258, 262). Extracting it into a helper would improve maintainability and prevent drift.
♻️ Suggested refactor
```diff
 def format_tool_return(raw_search_results: List[dict]) -> ToolReturn:
     """Format the raw search results into a ToolReturn object."""
     logger.info("Raw search results: %s", raw_search_results)
     logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+
+    def get_snippets(result: dict) -> list:
+        return result.get("snippets", []) or result.get("extra_snippets", [])
+
     return ToolReturn(
         # Format return value "mistral-like": https://docs.mistral.ai/capabilities/citations/
         return_value={
             str(idx): {
                 "url": result["url"],
                 "title": result["title"],
-                "snippets": result.get("snippets", []) or result.get("extra_snippets", []),
+                "snippets": get_snippets(result),
             }
             for idx, result in enumerate(raw_search_results)
-            if result.get("snippets", []) or result.get("extra_snippets", [])
+            if get_snippets(result)
         },
         metadata={
-            "sources": {
-                result["url"] for result in raw_search_results if result.get("snippets", []) or result.get("extra_snippets", [])
-            }
+            "sources": {result["url"] for result in raw_search_results if get_snippets(result)}
         },
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 251 - 262, Extract the repeated snippet-presence check into a single helper to avoid duplication: create a small predicate (e.g., has_snippets(result) or a local variable snippet_present) and replace the three instances of result.get("snippets", []) or result.get("extra_snippets", []) used inside the return_value dict comprehension and the metadata "sources" set comprehension; update the dict comprehension for return_value (the str(idx) -> {...} block), the inner "snippets" value selection, and the metadata set comprehension to call the new helper/variable so the condition and its semantics are defined in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tests/tools/test_web_search_brave.py`:
- Around line 30-31: Tests currently mock the old web.results shape while the
production code in _query_brave_api_async expects the grounding.generic
structure for the llm/context endpoint; add a new async test (e.g.,
test_query_brave_api_llm_context_response) that uses respx.mock to intercept GET
to BRAVE_URL and return a JSON payload with "grounding": {"generic": [...]}
containing entries with and without "title", then call
_query_brave_api_async("...") and assert the returned list length, that each
item maps "url" to url, "title" uses title or falls back to the url, and
"snippets" preserves the snippets to ensure the grounding.generic normalization
path is exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a116a415-4a18-4585-80cd-f5ab011bd4c5
📒 Files selected for processing (6)
- src/backend/chat/agents/conversation.py
- src/backend/chat/tests/clients/pydantic_ai/test_smart_web_search.py
- src/backend/chat/tests/tools/test_web_search_brave.py
- src/backend/chat/tools/__init__.py
- src/backend/chat/tools/web_search_brave.py
- src/backend/conversations/brave_settings.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)
147-169: ⚠️ Potential issue | 🔴 Critical — Remove undocumented parameters that are incompatible with the LLM Context endpoint.
The code sends four parameters (`safesearch`, `spellcheck`, `result_filter`, `extra_snippets`) to the LLM Context endpoint, but these are not documented in the official Brave LLM Context API; they belong to the standard web search endpoint only. The parameter filter at line 170 only removes `None` values; it doesn't prevent sending invalid parameters to the API. This could cause unexpected behavior or failures.
Invalid parameters to remove: `safesearch` (line 161), `spellcheck` (line 162), `result_filter` (line 163), `extra_snippets` (line 164).
Keep only the documented LLM Context parameters: `q`, `country`, `search_lang`, `count`, `maximum_number_of_urls`, `maximum_number_of_tokens`, `maximum_number_of_snippets`, and `maximum_number_of_snippets_per_url`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 147 - 169, The request body `data` for the Brave LLM Context endpoint is sending undocumented parameters (safesearch, spellcheck, result_filter, extra_snippets) which belong to the standard web search API; update the construction of `data` in web_search_brave.py to only include the documented LLM Context fields — q, country, search_lang, count, maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets, and maximum_number_of_snippets_per_url — and remove the four invalid keys before the None-filtering step so the LLM Context endpoint never receives them; locate the `data` dict in the function that builds the request (the block where url, headers and data are defined) and trim it accordingly.
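The trim-then-filter logic the prompt describes can be sketched as a key whitelist applied before the existing `None` filtering; the constant and helper names below are hypothetical, not identifiers from this PR:

```python
# Documented Brave LLM Context parameters, per the review comment above.
LLM_CONTEXT_PARAMS = frozenset(
    {
        "q",
        "country",
        "search_lang",
        "count",
        "maximum_number_of_urls",
        "maximum_number_of_tokens",
        "maximum_number_of_snippets",
        "maximum_number_of_snippets_per_url",
    }
)


def build_llm_context_payload(raw: dict) -> dict:
    """Keep only documented LLM Context keys, then drop None values."""
    return {
        key: value
        for key, value in raw.items()
        if key in LLM_CONTEXT_PARAMS and value is not None
    }
```

With this shape, undocumented keys such as `safesearch` never reach the endpoint, regardless of their value.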
🧹 Nitpick comments (1)
src/backend/chat/tools/web_search_brave.py (1)
247-248: Consider using DEBUG level for raw search results logging.
Logging the entire `raw_search_results` at INFO level could produce verbose logs in production (up to 50 snippets per the default `BRAVE_MAX_SNIPPETS`). Consider using `logger.debug` for the raw results while keeping the deduplicated URLs at INFO level for operational visibility.
Suggested change

```diff
-    logger.info("Raw search results: %s", raw_search_results)
+    logger.debug("Raw search results: %s", raw_search_results)
     logger.info("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 247 - 248, The current INFO logs in web_search_brave.py are too verbose: change the log of the full raw_search_results to logger.debug while keeping the deduplicated URL set logged at INFO; locate the logging calls that reference raw_search_results and the set comprehension {result["url"] for result in raw_search_results} and replace the first logger.info with logger.debug("Raw search results: %s", raw_search_results) so only the deduplicated sources remain at INFO for operational visibility.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: deb983d5-b0c7-48e4-bddb-cb9b9b97b96d
📒 Files selected for processing (1)
src/backend/chat/tools/web_search_brave.py
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)
168-181: ⚠️ Potential issue | 🟠 Major — Remove parameters invalid for the LLM context endpoint.
The code targets the LLM context endpoint but includes parameters from the standard web search endpoint: `safesearch`, `spellcheck`, `result_filter`, and `extra_snippets`. These parameters are not valid for the LLM context API and should be removed. Only these are valid for LLM context: `q`, `country`, `search_lang`, `count`, `maximum_number_of_urls`, `maximum_number_of_tokens`, `maximum_number_of_snippets`, `maximum_number_of_snippets_per_url`, and optionally `freshness`, `context_threshold_mode`, `enable_local`, and `goggles`.
Invalid parameters to remove:

```python
"safesearch": settings.BRAVE_SEARCH_SAFE_SEARCH,
"spellcheck": settings.BRAVE_SEARCH_SPELLCHECK,
"result_filter": "web,faq,query",
"extra_snippets": settings.BRAVE_SEARCH_EXTRA_SNIPPETS,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 168 - 181, The data payload built in src/backend/chat/tools/web_search_brave.py (the data dict) contains parameters invalid for the LLM context endpoint; remove "safesearch", "spellcheck", "result_filter", and "extra_snippets" from the data dict and keep only the valid keys (q, country, search_lang, count, maximum_number_of_urls, maximum_number_of_tokens, maximum_number_of_snippets, maximum_number_of_snippets_per_url), and if needed add only the documented optional LLM-context keys (freshness, context_threshold_mode, enable_local, goggles); update the construction of the data variable accordingly.
🧹 Nitpick comments (1)
src/backend/chat/tools/web_search_brave.py (1)
259-275:_get_snippets_from_resultis called multiple times per result.The helper function is invoked up to 3 times per result in this function (lines 265, 268, 272). Consider caching the result for efficiency.
♻️ Proposed refactor to avoid repeated calls
```diff
 def format_tool_return(raw_search_results: List[dict]) -> ToolReturn:
     """Format the raw search results into a ToolReturn object."""
     logger.debug("Raw search results: %s", raw_search_results)
     logger.debug("Unduplicated sources: %s", {result["url"] for result in raw_search_results})
+
+    # Pre-compute snippets for each result
+    results_with_snippets = [
+        (result, _get_snippets_from_result(result))
+        for result in raw_search_results
+    ]
+
     return ToolReturn(
         # Format return value "mistral-like": https://docs.mistral.ai/capabilities/citations/
         return_value={
             str(idx): {
                 "url": result["url"],
                 "title": result["title"],
-                "snippets": _get_snippets_from_result(result),
+                "snippets": snippets,
             }
-            for idx, result in enumerate(raw_search_results)
-            if _get_snippets_from_result(result)
+            for idx, (result, snippets) in enumerate(results_with_snippets)
+            if snippets
         },
         metadata={
-            "sources": {
-                result["url"] for result in raw_search_results if _get_snippets_from_result(result)
-            }
+            "sources": {result["url"] for result, snippets in results_with_snippets if snippets}
         },
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 259 - 275, The code calls _get_snippets_from_result(raw_result) multiple times for each result when building the ToolReturn (in the return_value dict comprehension and metadata set); compute snippets once per result and reuse it: iterate over raw_search_results, call snippets = _get_snippets_from_result(result) a single time, then use that cached snippets value when populating the return_value entry (using str(idx), result["url"], result["title"], snippets) and when adding URLs to the metadata "sources" set to avoid redundant work and improve efficiency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 257-258: The INFO-level logging of raw_search_results risks
excessive verbosity and PII leakage; change the two logger.info calls that
reference raw_search_results and the undeduplicated URL set to logger.debug
(i.e., replace logger.info("Raw search results: %s", raw_search_results) and
logger.info("Unduplicated sources: %s", {result["url"] for result in
raw_search_results}) with logger.debug) so these details are only emitted at
debug level while keeping the same message content and variables (logger and
raw_search_results) intact.
- Around line 104-114: The function _get_snippets_from_result currently claims
to "Return merged snippets/extra_snippets" but never combines snippets and
extra_snippets; if both are non-empty it effectively returns only snippets. Fix
by updating _get_snippets_from_result so when both result.get("snippets") and
result.get("extra_snippets") are non-empty it returns a merged list (e.g.,
snippets + extra_snippets) while preserving the existing fallback behavior, or
alternatively change the docstring to explicitly state that snippets are
prioritized over extra_snippets; locate the logic around the variables snippets
and extra_snippets inside _get_snippets_from_result and implement the chosen
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cb4c7183-1a29-45b3-a40a-35aae865ec1a
📒 Files selected for processing (1)
src/backend/chat/tools/web_search_brave.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/tools/web_search_brave.py (1)
168-181: ⚠️ Potential issue | 🔴 Critical — Remove unsupported parameters from LLM Context API request.
The endpoint uses `/res/v1/llm/context`, but several parameters (`safesearch`, `spellcheck`, `result_filter`, `extra_snippets`) are from the web search API and not supported by the LLM Context API. According to Brave's API documentation, the LLM Context endpoint does not accept these parameters. Remove them from the request data dictionary to prevent silent ignoring or API errors.
Supported parameters for LLM Context include: `q`, `country`, `search_lang`, `count`, `freshness`, `maximum_number_of_urls`, `maximum_number_of_tokens`, `maximum_number_of_snippets`, `maximum_number_of_snippets_per_url`, `context_threshold_mode`, `enable_local`, and `goggles`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 168 - 181, The request payload `data` sent to the LLM Context endpoint `/res/v1/llm/context` includes unsupported web-search parameters; remove the keys "safesearch", "spellcheck", "result_filter", and "extra_snippets" from the `data` dict in src/backend/chat/tools/web_search_brave.py (where `data` is constructed) and ensure only LLM Context-supported parameters remain (e.g., keep "q", "country", "search_lang", "count", "maximum_number_of_urls", "maximum_number_of_tokens", "maximum_number_of_snippets", "maximum_number_of_snippets_per_url" and add any other supported flags like "freshness", "context_threshold_mode", "enable_local", or "goggles" if needed).
♻️ Duplicate comments (1)
src/backend/chat/tools/web_search_brave.py (1)
104-114: ⚠️ Potential issue | 🟠 Major — Logic bug: snippets are never actually merged.
The docstring claims "Return merged snippets/extra_snippets" but when both lists are non-empty, line 113 returns only `snippets` (since `snippets or extra_snippets` short-circuits on a truthy `snippets`). The conditional branches on lines 109-112 are also redundant since line 113 handles all cases identically.
If merging is intended:

```diff
-    if snippets and not extra_snippets:
-        return snippets
-    if extra_snippets and not snippets:
-        return extra_snippets
-    return snippets or extra_snippets
+    return snippets + extra_snippets
```

If prioritization is intended, update the docstring to say "Return snippets, falling back to extra_snippets".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 104 - 114, The function _get_snippets_from_result incorrectly never merges snippets and extra_snippets: when both are non-empty it returns only snippets due to the final `snippets or extra_snippets` logic; update _get_snippets_from_result to actually merge the two lists (e.g., return snippets + extra_snippets, optionally deduplicating while preserving order) and simplify/remove the redundant conditionals, or if the intended behavior is to prefer snippets over extra_snippets, change the docstring to "Return snippets, falling back to extra_snippets" and keep the current early-return logic.
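If the merge-with-deduplication option from the prompt is chosen, a minimal sketch could look like this (the name `merge_snippets` is illustrative; the PR's actual helper is `_get_snippets_from_result`):

```python
def merge_snippets(result: dict) -> list[str]:
    """Merge snippets and extra_snippets, deduplicating while preserving order.

    Missing or None values are treated as empty lists.
    """
    snippets = result.get("snippets") or []
    extra_snippets = result.get("extra_snippets") or []
    seen: set[str] = set()
    merged: list[str] = []
    for snippet in snippets + extra_snippets:
        if snippet not in seen:
            seen.add(snippet)
            merged.append(snippet)
    return merged
```

Unlike `snippets or extra_snippets`, this keeps content from both lists when both are populated, which matches the "merged" contract in the docstring.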
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/backend/chat/tools/web_search_brave.py`:
- Around line 343-344: Remove the trailing whitespace after the `try:` token on
the line immediately following the `logger.debug("Starting web search with RAG
backend for query: %s", query)` call; locate the `try:` in
src/backend/chat/tools/web_search_brave.py (near the `logger.debug` call) and
ensure there are no extra spaces or tabs after the colon so the line ends
exactly with `try:`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fc31fa52-a849-4d36-89ba-965b95a0331b
📒 Files selected for processing (1)
src/backend/chat/tools/web_search_brave.py
Force-pushed 96ef231 to 19d7588 (compare)
♻️ Duplicate comments (2)
src/backend/chat/tools/web_search_brave.py (1)
104-114: ⚠️ Potential issue | 🟠 Major — Merge both snippet sources; current logic drops `extra_snippets` when both exist.
On line 113, `return snippets or extra_snippets` returns only `snippets` when both lists are populated, which contradicts the function contract and discards additional extracted context.
🐛 Proposed fix

```diff
 def _get_snippets_from_result(result: dict) -> List[str]:
     """Return merged snippets/extra_snippets as a list, guarding against None."""
     snippets = result.get("snippets") or []
     extra_snippets = result.get("extra_snippets") or []
-    # Both are expected to be lists of strings; fall back to one or the other if needed.
-    if snippets and not extra_snippets:
-        return snippets
-    if extra_snippets and not snippets:
-        return extra_snippets
-    return snippets or extra_snippets
+    return snippets + extra_snippets
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/web_search_brave.py` around lines 104 - 114, The _get_snippets_from_result function currently drops extra_snippets when both snippets and extra_snippets are present; change the logic so when both lists exist you return a merged list (e.g., concatenate snippets and extra_snippets, optionally deduplicating) instead of returning only snippets; update the return behavior in _get_snippets_from_result to combine result.get("snippets") and result.get("extra_snippets") (still treating missing values as empty lists) so both sources are preserved.

src/backend/chat/tools/__init__.py (1)
26-32: ⚠️ Potential issue | 🟠 Major — Keep a backward-compatible alias for the legacy tool key.
With line 47's strict dictionary lookup, configurations still using `"web_search_brave_with_document_backend"` will raise `KeyError` at runtime.
🔁 Proposed compatibility patch

```diff
 def get_pydantic_tools_by_name(name: str) -> Tool:
     """Get a tool by its name."""
+    web_search_tool = Tool(
+        web_search_brave_with_document_backend,
+        name="web_search",
+        takes_ctx=True,
+        prepare=only_if_web_search_enabled,
+        max_retries=2,
+    )
     tool_dict = {
         "get_current_weather": Tool(get_current_weather, takes_ctx=False),
         "web_search_brave": Tool(
             web_search_brave,
             takes_ctx=True,
             prepare=only_if_web_search_enabled,
             max_retries=2,
         ),
-        "web_search": Tool(
-            web_search_brave_with_document_backend,
-            name="web_search",
-            takes_ctx=True,
-            prepare=only_if_web_search_enabled,
-            max_retries=2,
-        ),
+        "web_search": web_search_tool,
+        "web_search_brave_with_document_backend": web_search_tool,
         "web_search_tavily": Tool(
             web_search_tavily,
             takes_ctx=False,
             prepare=only_if_web_search_enabled,
             max_retries=2,
         ),
```

Also applies to: 47-47
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/chat/tools/__init__.py` around lines 26 - 32, The tools registry lacks a backward-compatible alias for the legacy key "web_search_brave_with_document_backend", so configs using that exact key will KeyError; update the tools dict to also expose the same Tool instance under the legacy key (e.g., add "web_search_brave_with_document_backend": web_search or assign the created Tool to a variable and reuse it) so the existing entry using web_search_brave_with_document_backend maps to the same Tool created with web_search_brave_with_document_backend handler (web_search_brave_with_document_backend), name="web_search", prepare=only_if_web_search_enabled, max_retries=2, ensuring lookups (the strict dict lookup referenced) succeed for both keys.
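Stripped of the framework specifics, the alias pattern is just registering one tool object under both keys; a standalone sketch (the registry builder is hypothetical, and a plain `object()` stands in for the real `Tool` instance):

```python
def build_tool_registry(web_search_tool: object) -> dict[str, object]:
    """Expose the same tool under the new key and the legacy key."""
    return {
        "web_search": web_search_tool,
        # Legacy configurations keep resolving without a KeyError:
        "web_search_brave_with_document_backend": web_search_tool,
    }
```

Because both keys map to the same instance, there is no duplicated configuration to drift out of sync.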
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c43b8fe1-672c-4d90-8cda-cd2b90958531
📒 Files selected for processing (6)
- src/backend/chat/agents/conversation.py
- src/backend/chat/tests/clients/pydantic_ai/test_smart_web_search.py
- src/backend/chat/tests/tools/test_web_search_brave.py
- src/backend/chat/tools/__init__.py
- src/backend/chat/tools/web_search_brave.py
- src/backend/conversations/brave_settings.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/backend/chat/tests/tools/test_web_search_brave.py
- src/backend/chat/agents/conversation.py
- src/backend/conversations/brave_settings.py
Force-pushed a2a5fea to 0c37183 (compare)
qbey left a comment:
Nice, yet I think we should allow the "old tool" to still be used (and we can select one or another in configuration).
Force-pushed 2cd06de to e335f1c (compare)
Force-pushed e335f1c to f2080a2 (compare)
Force-pushed 82f13a2 to 752fe92 (compare)
Force-pushed 752fe92 to ccdcc2f (compare)
Force-pushed ccdcc2f to 2a1d4ff (compare)
Commit: Use llm/context endpoint with snippets, change tool name for web_search
Signed-off-by: camilleAND <camille.andre@modernisation.gouv.fr>
Force-pushed ec91ec1 to cb2bfd5 (compare)
Summary
- `web_search`, implemented by a Python path from the LLM model config (`LLModel.web_search`), not via `get_pydantic_tools_by_name` / legacy `web_search_*` tool names.
- `/res/v1/llm/context` path, `grounding.generic` normalization, and settings-driven limits for that endpoint.
- `/res/v1/web/search` and the Brave + RAG implementation; which one runs is chosen by `web_search` on the model.
- Test Django config gets safe `AI_*` placeholders so `load_llm_configuration` succeeds; HTTP tests mock the correct Brave URLs.

Changes
- `web_search_brave.py`: `_query_brave_api_with_endpoint_async`; `_query_brave_llm_context_api_async` vs `_query_brave_web_search_api_async`; `_normalize_llm_context_results`; `format_tool_return` uses `snippets` or `extra_snippets`; new `web_search_brave_llm_context` entry point.
- `pydantic_ai.py`: `_setup_web_search_tool()`: register `web_search` with `import_string(configuration.web_search)`, `prepare` gated on `web_search_enabled`; forced-search prompt always mentions `web_search`.
- `conversation.py`: `get_web_search_tool_name()` → `is_web_search_configured()` (checks `configuration.web_search`).
- `llm_configuration.py`: `web_search` on `LLModel` with `settings.*` / `environ.*` resolution.
- `chat/tools/__init__.py`: `get_pydantic_tools_by_name` (web search is no longer resolved there).
- `brave_settings.py`: `BRAVE_MAX_TOKENS`, `BRAVE_MAX_SNIPPETS`, `BRAVE_MAX_SNIPPETS_PER_URL` for LLM context requests.
- `settings.py` (Test): `AI_MODEL`, `AI_BASE_URL`, `AI_API_KEY` placeholders for config load during tests.
- Tests: `test_web_search_brave`: separate mock URLs for web search vs LLM context; `test_smart_web_search`, `test_build_conversation_agent` updated for the new registration and `is_web_search_configured`.

How to point the model at an implementation
Set `web_search` on the relevant `LLModel` in LLM JSON (e.g. `chat.tools.web_search_brave.web_search_brave`, `web_search_brave_llm_context`, or `web_search_brave_with_document_backend`). Omit it when the model should not expose web search.

Testing
```shell
pytest chat/tests/tools/test_web_search_brave.py \
  chat/tests/clients/pydantic_ai/test_smart_web_search.py \
  chat/tests/agents/test_build_conversation_agent.py
```
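Circling back to the configuration step above, an LLM JSON entry could look roughly like this — a hedged sketch only: the `model` key is a placeholder and the full dotted path is inferred from the function names listed, so adjust both to the repository's actual schema:

```json
{
  "model": "example-model",
  "web_search": "chat.tools.web_search_brave.web_search_brave_llm_context"
}
```

Omitting the `web_search` key entirely leaves the model without a web-search tool, per the description above.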
New Features
Improvements
Tests