-
Notifications
You must be signed in to change notification settings - Fork 43
Dynamic FAQ Handler #126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Dynamic FAQ Handler #126
Conversation
WalkthroughReplaces the static FAQ handler with a dynamic, site-restricted web-search + LLM pipeline and updates callers to pass Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DevRelAgent
participant ReactSupervisor
participant FAQHandler
participant SearchTool
participant LLM
User->>DevRelAgent: Ask org-level question
DevRelAgent->>ReactSupervisor: react_supervisor_node(state)
ReactSupervisor-->>DevRelAgent: Decision: faq_handler
DevRelAgent->>FAQHandler: handle_faq_node(state, search_tool, llm)
FAQHandler->>LLM: Refine query with org/site filters
LLM-->>FAQHandler: Refined query
FAQHandler->>SearchTool: Search with site_filters
SearchTool-->>FAQHandler: Results
FAQHandler->>LLM: Synthesize answer + sources
LLM-->>FAQHandler: Draft answer
FAQHandler-->>DevRelAgent: Formatted response + top sources
DevRelAgent-->>User: Answer with sources
sequenceDiagram
participant DevRelAgent
participant ReactSupervisor
participant LLM
DevRelAgent->>ReactSupervisor: react_supervisor_node(state)
ReactSupervisor->>ReactSupervisor: Validate state & enforce caps
ReactSupervisor->>LLM: Prompt with context
LLM-->>ReactSupervisor: THINK / ACT / REASON
ReactSupervisor->>ReactSupervisor: Parse & validate action
ReactSupervisor-->>DevRelAgent: Decision, iteration_count
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (9)
env.example (1)
29-29
: Trailing whitespace on MAX_RETRIESRemove the trailing space to avoid lint churn.
-MAX_RETRIES=3 +MAX_RETRIES=3backend/app/agents/devrel/nodes/handlers/faq.py (2)
114-181
: Prompts: lower log verbosity for user content and queriesSensitive user content and generated prompts are logged at INFO. Consider downgrading to DEBUG to reduce PII exposure in logs.
- logger.info(f"[FAQ_HANDLER] Refined query: {refined_query}") + logger.debug(f"[FAQ_HANDLER] Refined query: {refined_query}") ... - logger.info("[FAQ_HANDLER] Step 3-4: Synthesizing search results into FAQ response") + logger.debug("[FAQ_HANDLER] Step 3-4: Synthesizing search results into FAQ response") ... - logger.info(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...") + logger.debug(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...")
214-238
: Inline test harness in production moduleThe main test harness is useful locally but should live under tests/ to avoid accidental execution and to keep modules import-safe.
-# Example usage for testing -if __name__ == "__main__": - ... - asyncio.run(test_faq_handler()) +# Consider moving the inline test harness to tests/ with an async test case.backend/app/agents/devrel/nodes/react_supervisor.py (6)
167-171
: Use timezone-aware timestamps in tool resultsNaive datetime.now() is ambiguous; prefer UTC-aware timestamps for consistent logs and auditing.
Apply this diff:
-from datetime import datetime +from datetime import datetime, timezone @@ - "timestamp": datetime.now().isoformat() + "timestamp": datetime.now(timezone.utc).isoformat()Also applies to: 3-3
179-187
: Deduplicate tools_used and handle None robustlyAvoid unbounded duplicates in tools_used and guard against None.
Apply this diff:
- tools_used = getattr(state, 'tools_used', []) + [tool_name] + existing_tools = getattr(state, 'tools_used', []) or [] + tools_used = list(dict.fromkeys([*existing_tools, tool_name]))
85-114
: Make THINK/ACT/REASON parsing case-insensitive and tolerant to format variationsCurrent parsing requires exact uppercase keywords with no space before colon. LLMs often vary casing/spacing (e.g., "Act:", "REASON :"). Normalize case and split on the first colon.
Apply this diff:
- for line in response.strip().split("\n"): - line = line.strip() + for line in response.strip().split("\n"): + line = line.strip() + line_upper = line.upper() if not line: continue @@ - if line.startswith("THINK:"): + if line_upper.startswith("THINK"): if current_section and content_buffer: decision[current_section] = " ".join(content_buffer) current_section = "thinking" - content_buffer = [line.replace("THINK:", "").strip()] + content_buffer = [line.split(":", 1)[1].strip()] if ":" in line else [] @@ - elif line.startswith("ACT:"): + elif line_upper.startswith("ACT"): if current_section and content_buffer: decision[current_section] = " ".join(content_buffer) - action = line.replace("ACT:", "").strip().lower() + action = line.split(":", 1)[1].strip().lower() if ":" in line else "" if action in VALID_ACTIONS: decision["action"] = action else: logger.warning(f"Invalid action '{action}', defaulting to 'complete'") decision["action"] = "complete" current_section = None content_buffer = [] @@ - elif line.startswith("REASON:"): + elif line_upper.startswith("REASON"): if current_section and content_buffer: decision[current_section] = " ".join(content_buffer) current_section = "reasoning" - content_buffer = [line.replace("REASON:", "").strip()] + content_buffer = [line.split(":", 1)[1].strip()] if ":" in line else [] @@ - if decision["action"] not in VALID_ACTIONS: + if decision["action"] not in VALID_ACTIONS: logger.warning(f"Final validation failed for action '{decision['action']}', defaulting to 'complete'") decision["action"] = "complete"Also applies to: 120-123
227-238
: Strengthen state validation (types) to fail fast on malformed stateMinimal checks are good, but validating types prevents subtle downstream errors.
Apply this diff:
try: if not state: return False if not getattr(state, 'session_id', None): logger.error("Invalid state: missing session_id") return False - if not hasattr(state, 'context'): + if not hasattr(state, 'context'): logger.error("Invalid state: missing context") return False + if not isinstance(state.context, dict): + logger.error("Invalid state: context must be a dict") + return False + if hasattr(state, 'messages') and state.messages is not None and not isinstance(state.messages, list): + logger.error("Invalid state: messages must be a list") + return False return True
57-66
: Consider not storing raw supervisor_thinking unless in debug modeStoring full LLM internal thoughts can leak sensitive content and inflate state size. Keep decision, but gate raw "supervisor_thinking" behind a debug flag or truncate.
If you want to keep it, consider:
- Truncating to N chars (e.g., 1k).
- Only storing when an env var like DEVREL_DEBUG is true.
Also applies to: 244-273
73-128
: Add focused unit tests for decision parsing, iteration bounds, and tool result mergingThese flows are critical to routing correctness and robustness.
Proposed test coverage:
- _parse_supervisor_decision handles: empty response, mixed-case tags, extra spaces, invalid action fallback, multi-line THINK/REASON accumulation.
- react_supervisor_node iteration cap triggers completion when iteration_count >= MAX_ITERATIONS.
- supervisor_decision_router respects the same bound (>=) and defaults to "complete" on invalid actions.
- add_tool_result trims history to 20, uses UTC timestamps, and deduplicates tools_used.
I can draft pytest tests if helpful. Would you like me to open a new issue or add a test file in this PR?
Also applies to: 130-151, 153-189
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/app/agents/devrel/agent.py
(1 hunks)backend/app/agents/devrel/nodes/handlers/faq.py
(1 hunks)backend/app/agents/devrel/nodes/react_supervisor.py
(1 hunks)backend/app/agents/devrel/tool_wrappers.py
(1 hunks)backend/app/agents/devrel/tools/faq_tool.py
(1 hunks)env.example
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
backend/app/agents/devrel/agent.py (2)
backend/app/agents/devrel/tool_wrappers.py (1)
faq_handler_tool_node
(18-24)backend/app/services/embedding_service/service.py (1)
llm
(55-68)
backend/app/agents/devrel/nodes/react_supervisor.py (2)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/agents/devrel/nodes/generate_response.py (1)
_get_latest_message
(97-101)
backend/app/agents/devrel/nodes/handlers/faq.py (4)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/devrel/tools/faq_tool.py (1)
search
(35-37)backend/app/agents/base_agent.py (1)
run
(22-54)
backend/app/agents/devrel/tool_wrappers.py (5)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/devrel/nodes/handlers/faq.py (1)
handle_faq_node
(27-59)backend/app/agents/devrel/nodes/react_supervisor.py (1)
add_tool_result
(153-192)backend/app/agents/devrel/nodes/handlers/onboarding.py (1)
handle_onboarding_node
(6-17)
🔇 Additional comments (4)
backend/app/agents/devrel/tools/faq_tool.py (1)
43-43
: Fuzzy-match simplification LGTMThe inlined intersection check is concise and equivalent.
backend/app/agents/devrel/tool_wrappers.py (1)
37-41
: Minor readability improvement LGTMThe latest_message one-liner is concise and clear.
backend/app/agents/devrel/nodes/handlers/faq.py (1)
87-106
: Ignore string‐shape normalization; search always returns a list of dicts
The currentDuckDuckGoSearchTool.search
implementation (and any other active tool) always returns aList[Dict[str, Any]]
(or an empty list on error). There is no code path that yields a raw string or single dict, so the added normalization isn’t needed.Likely an incorrect or invalid review comment.
backend/app/agents/devrel/nodes/react_supervisor.py (1)
130-147
: Action Name Consistency ConfirmedAll routing action strings are used uniformly across the codebase, so there are no dead routes:
- VALID_ACTIONS in backend/app/agents/devrel/nodes/react_supervisor.py
- Workflow mapping in backend/app/agents/devrel/agent.py (lines 66–69)
- Tool wrappers in backend/app/agents/devrel/tool_wrappers.py
- Prompt definitions in backend/app/agents/devrel/prompts/react_prompt.py
No changes required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (3)
env.example (1)
31-35
: Broken interpolation: ORG_ vars are not defined; OFFICIAL_HANDLE_ will expand to empty/literal strings**Define ORG_NAME and ORG_WEBSITE/ORG_GITHUB/ORG_TWITTER before referencing them. Otherwise, the FAQ handler may generate queries like site:${ORG_WEBSITE} or no site filters at all.
Apply this diff:
# Agent Configuration (optional) DEVREL_AGENT_MODEL=gemini-2.5-flash GITHUB_AGENT_MODEL=gemini-2.5-flash CLASSIFICATION_AGENT_MODEL=gemini-2.0-flash AGENT_TIMEOUT=30 -MAX_RETRIES=3 +MAX_RETRIES=3 -# Explicit handles for faq.py +# Organization identity and official sources (consumed by Settings and FAQ handler) +ORG_NAME=AOSSIE +ORG_WEBSITE=https://aossie.org +ORG_GITHUB=https://github.com/AOSSIE-Org +ORG_TWITTER=https://twitter.com/aossie_org + +# Explicit handles for faq.py (map to ORG_* above) OFFICIAL_HANDLE_1=${ORG_WEBSITE} OFFICIAL_HANDLE_2=${ORG_GITHUB} OFFICIAL_HANDLE_3=${ORG_TWITTER}backend/app/agents/devrel/nodes/handlers/faq.py (1)
13-16
: Good: removed import-time logging.basicConfigAvoiding global logging configuration in a library module is the right call. Let the application configure logging.
backend/app/agents/devrel/nodes/react_supervisor.py (1)
61-66
: Good: guard len() against None latest_messageUsing len(latest_message or "") avoids TypeError.
🧹 Nitpick comments (5)
env.example (1)
29-29
: Nit: trailing space after MAX_RETRIESTrim the trailing space to avoid accidental parsing issues.
-MAX_RETRIES=3 +MAX_RETRIES=3backend/app/agents/devrel/nodes/handlers/faq.py (3)
53-69
: Site filters: robust domain extraction; consider native include_domains when availableDomain parsing with urlparse is solid. Two improvement suggestions:
- Prefer passing include_domains (or equivalent) to the search tool if supported, falling back to query-time site: tokens only when necessary.
- For broad engine compatibility, concatenating site filters with whitespace is usually sufficient; the literal AND inside the query can be brittle depending on the backend parser.
If you want, I can update the wrapper/tool to optionally accept include_domains and branch accordingly.
108-112
: Query composition: avoid brittle "AND" in the free-text stringSome providers do not honor boolean operators in the query string. Safer composition:
- if site_filters: - refined_query = f"({refined_query}) AND ({site_filters})" + if site_filters: + refined_query = f"{refined_query} {site_filters}"
241-264
: Keep test harness out of production modulesThe main AsyncMock harness is handy, but consider moving this to a proper test under tests/ with pytest-asyncio to keep modules clean.
backend/app/agents/devrel/nodes/react_supervisor.py (1)
186-191
: Prefer timezone-aware timestampsUse UTC ISO timestamps to avoid ambiguity.
-from datetime import datetime +from datetime import datetime, timezone @@ - "timestamp": datetime.now().isoformat() + "timestamp": datetime.now(timezone.utc).isoformat()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/agents/devrel/agent.py
(1 hunks)backend/app/agents/devrel/nodes/handlers/faq.py
(1 hunks)backend/app/agents/devrel/nodes/react_supervisor.py
(1 hunks)backend/app/agents/devrel/tool_wrappers.py
(1 hunks)backend/app/agents/devrel/tools/faq_tool.py
(1 hunks)backend/app/core/config/settings.py
(1 hunks)env.example
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/agents/devrel/tools/faq_tool.py
- backend/app/agents/devrel/tool_wrappers.py
- backend/app/agents/devrel/agent.py
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/app/agents/devrel/nodes/react_supervisor.py
180-180: Local variable tool_results_str
is assigned to but never used
Remove assignment to unused variable tool_results_str
(F841)
🔇 Additional comments (5)
backend/app/agents/devrel/nodes/handlers/faq.py (2)
41-52
: Nice early-exit guard for empty messageGood defensive check and graceful fallback without invoking the LLM or search.
216-221
: Sanitize titles/URLs downstream to prevent UI XSSSince titles/urls come from the web, ensure the renderer escapes markdown/html. If you control the UI, sanitize before rendering.
backend/app/agents/devrel/nodes/react_supervisor.py (3)
33-37
: Iteration cap guard in node looks goodNode prevents generating decisions once the cap is hit. Keep as-is.
38-47
: Good: safely serializing tool_results for prompt constructionUsing default=str and fallback prevents crashes from non-serializable entries.
Also applies to: 54-55
199-208
: Actionable Directive: Verify State Merger’s Handling oftools_used
To confirm whether top‐level
tools_used
is merged intoAgentState.tools_used
or ignored, please locate and inspect the code that applies a node’s return-dict to theAgentState
. For example, search for whereoutput = node.run(...)
is consumed and how it updatesstate.context
andstate.tools_used
.Suggested search commands:
# Find where node outputs are merged into AgentState rg -n "state.context" -n backend/app/agents rg -n "tools_used" -n backend/app/agents rg -n "AgentState" -n backend/app/agentsOnce you’ve identified whether
tools_used
must be returned at top level or embedded incontext
, updatereact_supervisor.py
accordingly:If the merger only looks at
context
:- tools_used = getattr(state, 'tools_used', []) + [tool_name] - - return { - "context": { - **state.context, - "tool_results": tool_results - }, - "tools_used": tools_used, - "current_task": f"completed_{tool_name}" - } + tools_used = state.context.get("tools_used", []) + [tool_name] + + return { + "context": { + **state.context, + "tool_results": tool_results, + "tools_used": tools_used + }, + "current_task": f"completed_{tool_name}" + }If the merger respects top-level
tools_used
, no change is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/agents/devrel/nodes/handlers/faq.py (3)
5-5
: Centralized org settings usage looks goodSwitching to app_settings and dropping ad-hoc env parsing aligns config to a single source of truth. Good cleanup.
Also applies to: 12-15
8-9
: Good: no global logging configuration at import timeUsing a module logger without basicConfig avoids clobbering app-level logging config.
30-40
: Early exit on empty user messageSolid guardrail to avoid unnecessary LLM/search calls and return a helpful fallback.
🧹 Nitpick comments (6)
backend/app/agents/devrel/nodes/handlers/faq.py (6)
42-57
: Tighten site scoping for GitHub/Twitter using inurl to reduce noiseCurrent site filters restrict by domain only (e.g., github.com), which can be very broad. Optionally add inurl constraints for handles where a subpath is known (GitHub org, Twitter handle) to bias results toward the org space while still staying provider-agnostic.
Apply this diff:
- from urllib.parse import urlparse - domains = [] + from urllib.parse import urlparse + filters = set() for u in OFFICIAL_HANDLES: try: parsed = urlparse(u) - domain = parsed.netloc or parsed.path # handles bare domains - if domain: - domains.append(domain) + domain = parsed.netloc or parsed.path # handles bare domains + if not domain: + continue + filters.add(f"site:{domain}") + path = (parsed.path or "").strip("/") + # Add path scoping for common hubs to reduce noise + if path and (domain.endswith("github.com") or domain.endswith("twitter.com")): + filters.add(f"site:{domain} inurl:{path}") except Exception: continue - site_filters = " OR ".join([f"site:{d}" for d in domains]) + site_filters = " OR ".join(filters)
192-195
: Reduce verbosity and potential leakage in logsLog synthesized content preview at debug, not info.
Apply this diff:
- logger.info(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...") + logger.debug(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...")
67-74
: Consider returning structured sources alongside the formatted answerFor richer UIs, expose sources as structured metadata (e.g., array of {title, url}) in task_result rather than only embedding Markdown links in the string. This enables downstream rendering and analytics without parsing.
Follow-up: have _dynamic_faq_process return both text and the top-N sources, or compute sources in handle_faq_node before returning task_result.
17-21
: Optional: add lightweight Protocols for search_tool and llmHelps static checking and refactoring safety without tying to specific implementations.
Add near the imports:
from typing import Protocol, runtime_checkable, Sequence from langchain_core.messages import BaseMessage @runtime_checkable class SupportsSearch(Protocol): async def search(self, query: str) -> List[Dict[str, Any]]: ... @runtime_checkable class SupportsChatInvoke(Protocol): async def ainvoke(self, messages: Sequence[BaseMessage]) -> Any: ...Then annotate:
async def handle_faq_node(state: AgentState, search_tool: SupportsSearch, llm: SupportsChatInvoke) -> dict: ... async def _dynamic_faq_process(..., search_tool: SupportsSearch, llm: SupportsChatInvoke, ...): ...
229-253
: Move module-level test harness to proper testsKeep modules import-safe. Relocate main harness to tests or a sample notebook/script.
Apply this diff to remove it here:
-# Example usage for testing -if __name__ == "__main__": - import asyncio - from unittest.mock import AsyncMock - - class MockState: - session_id = "test_session" - messages = [{"content": "What projects does your organization have?"}] - context = {} - - async def test_faq_handler(): - mock_state = MockState() - mock_search_tool = AsyncMock() - mock_search_tool.search.return_value = [ - {"title": "Project A", "content": "Details about Project A.", "url": "https://aossie.org/projects/a"}, - {"title": "Project B", "content": "Details about Project B.", "url": "https://aossie.org/projects/b"}, - ] - mock_llm = AsyncMock() - mock_llm.ainvoke.return_value = AsyncMock(content="We have Project A and Project B focusing on AI and Web.") - - response = await handle_faq_node(mock_state, mock_search_tool, mock_llm) - print("FAQ Handler response:") - print(response) - - asyncio.run(test_faq_handler())
97-101
: Make query composition provider-agnostic by using whitespace instead of explicit ANDUpdate the FAQ handler’s query concatenation to avoid hard-coding
AND
, relying on implicit AND (whitespace) which is supported more broadly:--- a/backend/app/agents/devrel/nodes/handlers/faq.py +++ b/backend/app/agents/devrel/nodes/handlers/faq.py @@ -98,7 +98,7 @@ async def _dynamic_faq_process( - if site_filters: - refined_query = f"({refined_query}) AND ({site_filters})" + if site_filters: + refined_query = f"{refined_query} {site_filters}" logger.info(f"[FAQ_HANDLER] Refined and filtered query: {refined_query}")This change removes the explicit
AND
, allowing providers that treat whitespace as an implicit AND to handle the query correctly. Please verify that each search tool implementation you use (e.g. DuckDuckGoSearchTool, TavilySearchTool) supports boolean operators and grouping as expected.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/agents/devrel/nodes/handlers/faq.py
(1 hunks)backend/app/agents/devrel/nodes/react_supervisor.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/agents/devrel/nodes/react_supervisor.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/agents/devrel/nodes/handlers/faq.py (3)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/base_agent.py (1)
run
(22-54)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
backend/app/core/config/settings.py (1)
55-58
: Pydantic v2: class Config is ignored; use SettingsConfigDictIn pydantic v2/pydantic-settings v2, the nested Config class is no-op. Switch to model_config so env_file/extra/case sensitivity actually take effect.
Apply this diff:
+from pydantic_settings import SettingsConfigDict @@ - class Config: - env_file = ".env" - extra = "ignore" # to prevent errors from extra env variables + # pydantic-settings v2 configuration + model_config = SettingsConfigDict( + env_file=".env", + extra="ignore", + case_sensitive=False, # allow ORG_NAME or org_name + )
🧹 Nitpick comments (1)
backend/app/core/config/settings.py (1)
42-42
: Nit: remove emoji from code comments for consistencyAvoid emojis in source comments to keep style consistent with the rest of the file.
Apply this diff:
- # ✅ New org info fields (pulled from .env automatically) + # Organization info fields (populated from environment)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/core/config/settings.py
(1 hunks)
🔇 Additional comments (1)
backend/app/core/config/settings.py (1)
42-47
: LGTM: centralizing org identity in SettingsAdding org_* fields here aligns with the goal to make branding/handles a single source of truth for the dynamic FAQ flow. This addresses prior feedback about avoiding separate env parsing in faq.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (1)
backend/app/agents/devrel/nodes/handlers/faq.py (1)
42-58
: Normalize handles to bare domains for reliable site: filteringAs written, Line 49 may capture path segments (e.g., "github.com/AOSSIE-Org") when OFFICIAL_HANDLES contain bare values without a scheme, which would lead to invalid site: filters. Strip path segments and "www." to ensure domains only.
Apply this diff within the loop:
- domain = parsed.netloc or parsed.path # handles bare domains - if domain: - domains.append(domain) + domain = parsed.netloc or parsed.path # handles bare domains + if domain: + # Keep only host, drop any path and "www." prefix + domain = domain.split("/")[0] + if domain.startswith("www."): + domain = domain[4:] + if domain: + domains.append(domain)Optionally dedupe before building filters:
- site_filters = " OR ".join([f"site:{d}" for d in domains]) + domains = list(dict.fromkeys(domains)) # preserve order, remove dups + site_filters = " OR ".join([f"site:{d}" for d in domains])
🧹 Nitpick comments (4)
backend/app/agents/devrel/nodes/handlers/faq.py (4)
24-28
: Make sure you’re using the latest human/user message, not potentially the last assistant messageIf state.messages ends with an assistant reply, Line 26 may pick up the assistant’s content. Prefer selecting the most recent human/user message.
If your messages include a role/type, consider:
- if state.messages: - latest_message = state.messages[-1].get("content", "") + if state.messages: + for msg in reversed(state.messages): + role = (msg.get("role") or msg.get("type") or "user").lower() + if role in ("user", "human"): + latest_message = msg.get("content", "") + breakPlease confirm whether your message dicts include role/type so we can finalize the patch accordingly.
200-213
: Channel-aware formatting for sourcesThe Markdown/emoji formatting on Lines 207-212 is nice but may render poorly on some channels. Consider a settings toggle (e.g., faq_markdown_enabled) or falling back to plain text when markdown is unsupported.
For a plain-text fallback:
- formatted_response += "\n\n**📚 Sources:**" + formatted_response += "\n\nSources:" - formatted_response += f"\n{i+1}. [{title}]({url})" + formatted_response += f"\n{i+1}. {title} - {url}"
231-255
: Move the main test harness to a proper test to keep production modules cleanKeeping executable harnesses in production modules is unconventional and easy to forget. Move this to tests/ with AsyncMock-based unit tests.
I can port this into pytest/asyncio tests that cover:
- Empty-message fallback
- Successful dynamic flow (query refinement, search, synthesis)
- Search error path
- No-results path
- Source formatting cap (top 3)
Want me to push a test module?
17-21
: Stronger typing via Protocols for search_tool and llmUsing Any everywhere weakens static guarantees. Introduce minimal Protocols to document the expected async interface.
Add (outside the selected range):
from typing import Protocol, runtime_checkable @runtime_checkable class AsyncSearchTool(Protocol): async def search(self, query: str) -> List[Dict[str, Any]]: ... @runtime_checkable class AsyncLLM(Protocol): async def ainvoke(self, messages: List[HumanMessage]) -> Any: ...Then:
-async def handle_faq_node(state: AgentState, search_tool: Any, llm: Any) -> dict: +async def handle_faq_node(state: AgentState, search_tool: AsyncSearchTool, llm: AsyncLLM) -> dict:This keeps flexibility while improving correctness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/agents/devrel/nodes/handlers/faq.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/agents/devrel/nodes/handlers/faq.py (3)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/base_agent.py (1)
run
(22-54)
🔇 Additional comments (4)
backend/app/agents/devrel/nodes/handlers/faq.py (4)
8-10
: Good: Logger configured without global basicConfigModule uses a local logger and avoids configuring global logging at import-time. This aligns with best practices and prior review feedback.
12-15
: Good: Centralized org config via app settingsUsing app_settings for ORG_NAME and OFFICIAL_HANDLES prevents env drift and consolidates configuration.
30-41
: Good: Early exit on empty message with graceful fallbackShort-circuiting avoids unnecessary tool/LLM invocations and returns a structured fallback.
97-101
: Verify search_tool’s Boolean operator supportPlease confirm whether the
search_tool.search
implementation reliably handles explicit Boolean operators (e.g., “AND”). Many web-search back ends ignore or misinterpret these operators, which can lead to unexpected results. If it does not:
- In
backend/app/agents/devrel/nodes/handlers/faq.py
, inside_dynamic_faq_process
(lines ~97–101), replacewith a more search-friendly concatenation, for example:if site_filters: refined_query = f"({refined_query}) AND ({site_filters})"if site_filters: refined_query = f"{refined_query} {site_filters}"- Better yet, if your search tool exposes a structured API (e.g.,
search_tool.search(query, site_filters=domains)
), switch to that instead of in-query filtering.
logger.info(f"[FAQ_HANDLER] Step 1: Refining FAQ query for org '{org_name}'") | ||
refined_query = await _refine_faq_query(message, llm, org_name) | ||
|
||
# Append site filters for restricting to official handles | ||
if site_filters: | ||
refined_query = f"({refined_query}) AND ({site_filters})" | ||
logger.info(f"[FAQ_HANDLER] Refined and filtered query: {refined_query}") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduce user-content logging to DEBUG and truncate to lower PII exposure and noise
Refined queries and synthesized responses may contain user-provided content. Logging them at INFO can leak PII and increase noise.
Apply:
- logger.info(f"[FAQ_HANDLER] Step 1: Refining FAQ query for org '{org_name}'")
+ logger.debug(f"[FAQ_HANDLER] Step 1: Refining FAQ query for org '{org_name}'")
@@
- logger.info(f"[FAQ_HANDLER] Refined and filtered query: {refined_query}")
+ logger.debug(f"[FAQ_HANDLER] Refined and filtered query: {refined_query[:200]}{'...' if len(refined_query) > 200 else ''}")
@@
- logger.info(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...")
+ logger.debug(f"[FAQ_HANDLER] Synthesized FAQ response: {synthesized_answer[:100]}...")
If you need INFO-level observability, consider redacting with a sanitizer (emails, phone numbers) before logging.
Also applies to: 196-196
logger.info(f"[FAQ_HANDLER] Step 2: Searching for: {refined_query}") | ||
try: | ||
search_results = await search_tool.search(refined_query) | ||
except Exception as search_err: | ||
logger.error(f"[FAQ_HANDLER] Search tool error: {search_err}") | ||
return _generate_fallback_response(message, org_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add timeouts around external search calls to prevent hangs
Wrap the search call with asyncio.wait_for using configurable timeouts; otherwise a network stall can block the node.
Apply:
- try:
- search_results = await search_tool.search(refined_query)
+ try:
+ search_results = await asyncio.wait_for(
+ search_tool.search(refined_query),
+ timeout=FAQ_SEARCH_TIMEOUT
+ )
Add these near the top of the module (outside the selected range):
import asyncio # at the imports
# Timeouts with safe defaults; override via settings if available
FAQ_SEARCH_TIMEOUT = getattr(app_settings, "faq_search_timeout", 10.0)
FAQ_LLM_TIMEOUT = getattr(app_settings, "faq_llm_timeout", 15.0)
🤖 Prompt for AI Agents
In backend/app/agents/devrel/nodes/handlers/faq.py around lines 103 to 108, the
external search call can hang; add an asyncio timeout: import asyncio at the top
and define FAQ_SEARCH_TIMEOUT = getattr(app_settings, "faq_search_timeout",
10.0) (and FAQ_LLM_TIMEOUT similarly if present), then replace the direct await
search_tool.search(refined_query) with await
asyncio.wait_for(search_tool.search(refined_query), FAQ_SEARCH_TIMEOUT); catch
asyncio.TimeoutError (and still catch general Exception) and log a
timeout-specific error before returning _generate_fallback_response(message,
org_name).
fe0cfe8
to
de1246e
Compare
… empty message fallback
…r) with env var aliases
de1246e
to
b05f020
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/agents/devrel/nodes/react_supervisor.py (2)
34-37
: Fix iteration cap check to allow exactly MAX_ITERATIONSThe current check prevents routing the action decided on the MAX-th iteration. Since the node increments
iteration_count
after decision-making, you should use>
instead of>=
to allow exactly MAX_ITERATIONS to proceed.- if iteration_count >= MAX_ITERATIONS: + if iteration_count > MAX_ITERATIONS: logger.warning(f"Max iterations ({MAX_ITERATIONS}) reached for session {state.session_id}") return _create_completion_response(state, "Maximum iterations reached")
147-149
: Align iteration boundary check with node logicThis router check should match the node's iteration check. Since you're incrementing before routing, use
>
to allow the MAX-th iteration's action to be routed.- if iteration_count > MAX_ITERATIONS: + if iteration_count >= MAX_ITERATIONS: logger.warning(f"Max iterations reached for session {state.session_id}") return "complete"Wait, I notice the current code already uses
>
which is actually correct. The node should be changed to match this, not the other way around.backend/app/agents/devrel/nodes/handlers/faq.py (1)
103-108
: Add timeout for search operationsExternal search calls could hang indefinitely. Consider adding a timeout to prevent blocking.
Add at the top of the file:
import asyncio # Configurable timeouts FAQ_SEARCH_TIMEOUT = getattr(app_settings, "faq_search_timeout", 10.0) FAQ_LLM_TIMEOUT = getattr(app_settings, "faq_llm_timeout", 15.0)Then wrap the search call:
try: - search_results = await search_tool.search(refined_query) + search_results = await asyncio.wait_for( + search_tool.search(refined_query), + timeout=FAQ_SEARCH_TIMEOUT + ) + except asyncio.TimeoutError: + logger.error(f"[FAQ_HANDLER] Search timeout after {FAQ_SEARCH_TIMEOUT}s") + return _generate_fallback_response(message, org_name) except Exception as search_err:
🧹 Nitpick comments (2)
backend/app/agents/devrel/nodes/handlers/faq.py (2)
94-100
: Consider adding DEBUG level for detailed query loggingThe refined query may contain user-provided content. Consider using DEBUG level and truncating for privacy.
- logger.info(f"[FAQ_HANDLER] Step 1: Refining FAQ query for org '{org_name}'") + logger.debug(f"[FAQ_HANDLER] Step 1: Refining FAQ query for org '{org_name}'") # ... - logger.info(f"[FAQ_HANDLER] Refined and filtered query: {refined_query}") + logger.debug(f"[FAQ_HANDLER] Refined and filtered query: {refined_query[:200]}{'...' if len(refined_query) > 200 else ''}")
232-255
: Consider moving test code to a separate test fileThe test harness in
__main__
is useful for development but should ideally be in a separate test file for better organization.Consider moving this to a file like
tests/test_faq_handler.py
to keep the production code clean and maintain separation of concerns.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/agents/devrel/agent.py
(1 hunks)backend/app/agents/devrel/nodes/handlers/faq.py
(1 hunks)backend/app/agents/devrel/nodes/react_supervisor.py
(1 hunks)backend/app/agents/devrel/tool_wrappers.py
(1 hunks)backend/app/agents/devrel/tools/faq_tool.py
(1 hunks)backend/app/core/config/settings.py
(2 hunks)env.example
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- backend/app/agents/devrel/agent.py
- backend/app/agents/devrel/tools/faq_tool.py
- env.example
- backend/app/core/config/settings.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-08T13:31:11.572Z
Learnt from: smokeyScraper
PR: AOSSIE-Org/Devr.AI#72
File: backend/app/agents/devrel/nodes/handle_web_search_node.py:31-42
Timestamp: 2025-06-08T13:31:11.572Z
Learning: In backend/app/agents/devrel/tools/search_tool.py, the TavilySearchTool.search() method has partial error handling for missing API key, AttributeError, ConnectionError, and TimeoutError, but lacks a comprehensive Exception catch-all block, so calling functions may still need additional error handling for other potential exceptions.
Applied to files:
backend/app/agents/devrel/nodes/handlers/faq.py
🧬 Code Graph Analysis (3)
backend/app/agents/devrel/nodes/handlers/faq.py (3)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/base_agent.py (1)
run
(22-54)
backend/app/agents/devrel/nodes/react_supervisor.py (2)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/agents/devrel/nodes/generate_response.py (1)
_get_latest_message
(97-101)
backend/app/agents/devrel/tool_wrappers.py (5)
backend/app/agents/state.py (1)
AgentState
(18-73)backend/app/services/embedding_service/service.py (1)
llm
(55-68)backend/app/agents/devrel/nodes/handlers/faq.py (1)
handle_faq_node
(17-74)backend/app/agents/devrel/nodes/react_supervisor.py (1)
add_tool_result
(163-202)backend/app/agents/devrel/nodes/handlers/onboarding.py (1)
handle_onboarding_node
(6-17)
🔇 Additional comments (12)
backend/app/agents/devrel/nodes/react_supervisor.py (4)
62-62
: Good defensive programming for None handlingThe use of
len(latest_message or '')
properly guards againstNone
values that could cause a TypeError.
205-212
: Good error handling wrapper for utility functionThe wrapper properly catches exceptions from the imported utility function and provides a safe fallback, preventing crashes in the supervisor node.
237-252
: Comprehensive state validationThe validation function properly checks for essential state attributes with detailed error logging, which will help with debugging issues in production.
38-46
: Remove unused variabletool_results_str
The variable
tool_results_str
is computed but never used in the prompt construction. The prompt uses the same serialization logic directly on line 54.- # Safely serialize tool_results for prompt usage - try: - if tool_results: - tool_results_str = json.dumps(tool_results, indent=2, default=str) - else: - tool_results_str = "No previous tool results" - except Exception as e: - logger.warning(f"Failed to serialize tool_results: {e}") - tool_results_str = str(tool_results) if tool_results else "No previous tool results"Then update line 54 to safely serialize:
- tool_results=tool_results_str + tool_results=json.dumps(tool_results, indent=2, default=str) if tool_results else "No previous tool results"Likely an incorrect or invalid review comment.
backend/app/agents/devrel/tool_wrappers.py (3)
18-24
: LGTM! Wrapper signature properly aligned with FAQ handlerThe function signature correctly accepts
search_tool
andllm
parameters and forwards them to the FAQ handler, which aligns with the dynamic FAQ implementation requirements.
37-41
: Good refactoring for message extractionThe conditional expression cleanly handles both scenarios (messages present vs context fallback) in a readable way.
43-53
: Excellent error handling for GitHub toolkitThe try-catch block properly captures toolkit errors and returns a structured error response that can be processed downstream, preventing crashes while maintaining observability.
backend/app/agents/devrel/nodes/handlers/faq.py (5)
13-14
: Good use of centralized settingsUsing
app_settings
for organization configuration ensures consistency across the application and avoids duplicate environment variable parsing.
30-41
: Smart early exit for empty messagesThe early return prevents unnecessary LLM and search API calls when there's no user input to process, improving efficiency.
43-57
: Well-implemented domain extraction for site filtersThe URL parsing logic properly extracts domains from URLs and handles both full URLs and bare domains, with good error handling for malformed URLs.
169-171
: Good content truncation for token managementLimiting content to 500 characters prevents token explosion in LLM prompts while maintaining useful context.
216-228
: Excellent user-friendly fallback messageThe fallback response is informative, explains potential reasons for failure, and provides actionable next steps for users.
response = await llm.ainvoke([HumanMessage(content=synthesis_prompt)]) | ||
synthesized_answer = response.content.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Apply timeout to LLM synthesis calls
Similar to search operations, LLM calls should have timeouts to prevent hanging.
- response = await llm.ainvoke([HumanMessage(content=synthesis_prompt)])
+ response = await asyncio.wait_for(
+ llm.ainvoke([HumanMessage(content=synthesis_prompt)]),
+ timeout=FAQ_LLM_TIMEOUT
+ )
Add exception handling:
except asyncio.TimeoutError:
logger.error(f"[FAQ_HANDLER] LLM timeout after {FAQ_LLM_TIMEOUT}s")
return _generate_fallback_response(message, org_name)
🤖 Prompt for AI Agents
In backend/app/agents/devrel/nodes/handlers/faq.py around lines 194-195, wrap
the LLM synthesis call in asyncio.wait_for using the FAQ_LLM_TIMEOUT constant
and add exception handling: put the await llm.ainvoke(...) inside a try block
with response = await asyncio.wait_for(llm.ainvoke(...), FAQ_LLM_TIMEOUT), catch
asyncio.TimeoutError, log the timeout with logger.error(f"[FAQ_HANDLER] LLM
timeout after {FAQ_LLM_TIMEOUT}s") and return
_generate_fallback_response(message, org_name); ensure asyncio is imported if
not already.
🌟 Feature Implementation: Dynamic FAQ Handler
Problem Solved
Key Features Implemented
✅ Smart Query Routing: ReAct Supervisor intelligently identifies organizational queries
✅ Dynamic Web Search: Real-time information retrieval from organization's public presence
✅ Automated Synthesis: Converts search results into comprehensive, user-friendly responses
✅ Source Attribution: Provides links for users to explore topics further
Testing Results
Successfully tested with query: "How can i contribute to AOSSIE Projects?"
Technical Implementation
Ready for review and production deployment.
Closes #97
bandicam.2025-08-10.19-47-44-389.mp4
Summary by CodeRabbit
New Features
Enhancements
Configuration