Conversation

Collaborator

@T-rav T-rav commented Oct 10, 2025

Summary

  • Increase MAX_CONTENT_LENGTH from 8,000 to 50,000 characters (default)
  • Make MAX_CONTENT_LENGTH configurable via .env file
  • Remove unused dependencies (beautifulsoup4, readability-lxml, html2text)
  • Trafilatura now handles all content extraction with built-in markdown output
  • Update READMEs with MAX_CONTENT_LENGTH documentation
  • Clean up mypy overrides to reflect current dependencies

Rationale

The previous 8,000-character limit was too aggressive and cut off article content prematurely. With the new 50,000-character default:

  • AI agents receive fuller article content
  • Agents can decide whether to compress/summarize based on their needs
  • Users can adjust the limit via environment variable for their specific use cases

Changes

Code Changes

  • docker/constants.py: Now reads MAX_CONTENT_LENGTH from environment with 50,000 default
  • pyproject.toml: Removed unused dependencies (BeautifulSoup4, readability-lxml, html2text, lxml_html_clean)
  • pyproject.toml: Updated mypy overrides to reflect current dependencies
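The env-driven constant described above can be sketched as follows. This is a hedged illustration, not the repo's actual `docker/constants.py`; the walkthrough later in this thread mentions a ValueError fallback, which is what the helper below assumes:

```python
import os

DEFAULT_MAX_CONTENT_LENGTH = 50_000

def read_max_content_length() -> int:
    # Parse MAX_CONTENT_LENGTH from the environment, falling back to the
    # default when the variable is unset or not a valid integer.
    raw = os.environ.get("MAX_CONTENT_LENGTH", str(DEFAULT_MAX_CONTENT_LENGTH))
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_MAX_CONTENT_LENGTH

os.environ["MAX_CONTENT_LENGTH"] = "123456"
print(read_max_content_length())
```

Setting `MAX_CONTENT_LENGTH=123456` in the environment yields 123456; an unset or malformed value falls back to 50,000.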

Configuration

  • docker/.env.example: Added MAX_CONTENT_LENGTH=50000 with documentation
  • README.md: Added MAX_CONTENT_LENGTH to environment variables table
  • docker/README.md: Added MAX_CONTENT_LENGTH to environment variables section

Testing

Tested with a teslarati.com article that was previously truncated:

  • Before: ~8,000 characters (truncated)
  • After: Full ~16,000 character article content extracted
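The before/after numbers come down to a simple cutoff. A hypothetical sketch of the truncation behavior (the helper name is illustrative, not the repo's actual code):

```python
def truncate_content(text: str, limit: int) -> str:
    # Return the text untouched when it fits, otherwise cut it at the limit.
    return text if len(text) <= limit else text[:limit]

article = "x" * 16_000  # stand-in for the ~16,000-character article
old_result = truncate_content(article, 8_000)   # old default: truncated
new_result = truncate_content(article, 50_000)  # new default: intact
```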

All code formatted and linted successfully:

  • ✅ Black formatting
  • ✅ isort import sorting
  • ✅ flake8 linting
  • ✅ Pre-commit hooks

Test plan

  • Run make format - All files formatted correctly
  • Run make lint - No linting errors
  • Test article extraction with long articles
  • Run full test suite: make test
  • Verify Docker build works with new dependencies
  • Test with custom MAX_CONTENT_LENGTH values via .env

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a configurable deep research tool that runs multi-step, multi-source research with selectable effort levels and returns synthesized reports with source links.
  • Configuration

    • Introduced PERPLEXITY_API_KEY (optional) and MAX_CONTENT_LENGTH (default 1,000,000) environment variables; consolidated API keys guidance.
  • Documentation

    • Updated README and Docker docs for Perplexity setup, endpoints, and simplified setup notes.
  • Tests

    • Added unit tests, test utilities, mock factories, and runnable test/debug scripts for the deep research workflow.
  • Chores

    • Updated project dependencies and type-checker config to support Perplexity integration.

…rable

- Increase MAX_CONTENT_LENGTH from 8,000 to 50,000 characters (default)
- Make MAX_CONTENT_LENGTH configurable via .env file
- Remove unused dependencies (beautifulsoup4, readability-lxml, html2text)
- Trafilatura now handles all content extraction with built-in markdown output
- Update READMEs with MAX_CONTENT_LENGTH documentation
- Clean up mypy overrides to reflect current dependencies

This allows AI agents to receive fuller article content and decide
whether to compress it, rather than pre-truncating at 8K chars.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
coderabbitai bot commented Oct 10, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds Perplexity integration: new PERPLEXITY_API_KEY and MAX_CONTENT_LENGTH env vars, Perplexity SDK dependency, a Perplexity client, a deep_research MCP tool and registration, test/debug scripts and unit tests with test factories, docs/.env updates, and constants driving content truncation.

Changes

Cohort / File(s) Summary
Documentation & READMEs
README.md, docker/README.md
Documented PERPLEXITY_API_KEY and MAX_CONTENT_LENGTH; unified "Get API Keys" section for Serper and Perplexity; added deep_research tool description and reordered endpoints; updated setup notes.
Environment example
docker/.env.example
Added PERPLEXITY_API_KEY (optional) and MAX_CONTENT_LENGTH=1000000 with explanatory comments.
Runtime constants
docker/constants.py
Replaced fixed constant with environment-driven parse: MAX_CONTENT_LENGTH from env (default 1000000) with ValueError fallback; added os import.
Project config / deps
pyproject.toml
Added perplexityai>=0.15.0; removed HTML-parsing deps (beautifulsoup4, readability-lxml, lxml_html_clean, html2text); updated mypy overrides and removed several optional dev/test/doc deps.
Perplexity client
docker/clients/perplexity_client.py
New fetch_perplexity_deep_research(query, api_key, max_results=5, research_effort="high") using Perplexity SDK (model sonar-deep-research), returns (research_report: str, citations: List[str]) with logging and error handling.
Deep research MCP tool
docker/tools/deep_research_tool.py
New deep_research_tool(query, research_effort="high", max_results=5) that validates API key, calls the client, formats report + Sources into a SearchResponse dict and returns model_dump().
MCP server registration
docker/mcp_server.py
Loads PERPLEXITY_API_KEY, logs its status, imports and registers deep_research MCP tool via the mcp_server.tool decorator.
Local test & debug scripts
docker/test_perplexity.py, docker/test_perplexity_debug.py, docker/test_perplexity_minimal.py, docker/test_deep_research_mcp.py
Added scripts to exercise and debug Perplexity API and the MCP deep_research tool; include dotenv loading, API key guards, HTTP/SSE parsing, and diagnostic output.
Test factories & mocks
docker/tests/factories/mock_perplexity_response.py, docker/tests/factories/perplexity_response_factory.py
Added typed mock response classes and PerplexityResponseFactory helpers for successful, empty, many-citation, no-citation, and error scenarios plus mock client builder.
Unit tests
docker/tests/unit/clients/test_perplexity_client.py, docker/tests/unit/tools/test_deep_research_tool.py, docker/tests/unit/services/content_scraper/test_edge_cases.py
Added unit tests for the Perplexity client and deep_research tool covering success, empty, API error, citation limits, API-key absence, content formatting, snippet truncation; updated truncation test to use MAX_CONTENT_LENGTH.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User as User / Client
  participant MCP as mcp_server
  participant Tool as deep_research_tool
  participant Client as perplexity_client
  participant API as Perplexity API

  Note over MCP,Tool: New tool registration & invocation
  User->>MCP: tools/call deep_research(query, effort, max_results)
  MCP->>Tool: invoke deep_research_tool(query, effort, max_results)
  alt API key missing
    Tool-->>MCP: SearchResponse { error message }
    MCP-->>User: Error response
  else API key present
    Tool->>Client: fetch_perplexity_deep_research(query, api_key, max_results, effort)
    Client->>API: chat.completions.create(model: sonar-deep-research, payload)
    API-->>Client: response (choices.content, citations, usage)
    Client-->>Tool: (research_report, citation_urls)
    Tool->>MCP: formatted SearchResponse (report + Sources)
    MCP-->>User: SSE/HTTP response with SearchResponse
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hop through docs and whisper keys,
I fetch deep lore from research seas,
I stitch the sources, tidy and bright,
I paste the excerpts, snug and light,
I bring you knowledge — carrot delight. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and succinctly describes the primary change—raising the MAX_CONTENT_LENGTH limit and making it configurable—which aligns with the pull request’s main objectives without extraneous detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6103b4 and bfdf872.

📒 Files selected for processing (1)
  • docker/tests/unit/services/content_scraper/test_edge_cases.py (1 hunks)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docker/constants.py (1)

27-27: Consider adding validation for the environment variable.

The current implementation will raise a ValueError if MAX_CONTENT_LENGTH is set to a non-numeric string in the environment, causing the application to fail at startup with a potentially unclear error message.

Consider one of these approaches for better robustness:

Option 1: Add a try-except block with a fallback

-MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+try:
+    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+except ValueError:
+    MAX_CONTENT_LENGTH = 50000
+    # Optionally log a warning here

Option 2: Add explicit validation with a clear error message

-MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+_content_length_str = os.environ.get("MAX_CONTENT_LENGTH", "50000")
+try:
+    MAX_CONTENT_LENGTH = int(_content_length_str)
+except ValueError:
+    raise ValueError(
+        f"MAX_CONTENT_LENGTH must be a valid integer, got: {_content_length_str}"
+    )

Alternatively, if you prefer to keep it simple, document in the .env.example file that the value must be numeric.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccc585e and 0713d57.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • docker/.env.example (1 hunks)
  • docker/README.md (1 hunks)
  • docker/constants.py (2 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
docker/.env.example

📄 CodeRabbit inference engine (CLAUDE.md)

Update docker/.env.example when introducing new environment variables (e.g., API keys)

Files:

  • docker/.env.example
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods

Files:

  • docker/constants.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All Python files in docker/ are included in coverage; increase tests instead of excluding files

Files:

  • docker/constants.py
{pyproject.toml,docker/Dockerfile}

📄 CodeRabbit inference engine (CLAUDE.md)

Use Python 3.11 exactly for the project runtime and tooling

Files:

  • pyproject.toml
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Define tool configurations in pyproject.toml for Black, isort, mypy, pytest, and coverage

Files:

  • pyproject.toml
docker/README.md

📄 CodeRabbit inference engine (CLAUDE.md)

Document new MCP tools and changes in docker/README.md

Files:

  • docker/README.md
🧠 Learnings (1)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/.env.example : Update docker/.env.example when introducing new environment variables (e.g., API keys)

Applied to files:

  • docker/.env.example
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Security Scan
🔇 Additional comments (5)
docker/.env.example (1)

15-17: LGTM!

The new MAX_CONTENT_LENGTH environment variable is properly configured with a clear section header and a sensible default value of 50,000 characters. This aligns with the changes in docker/constants.py and the documentation updates in README.md and docker/README.md.

Based on coding guidelines

README.md (1)

112-112: LGTM!

The documentation for MAX_CONTENT_LENGTH is clear, accurate, and consistent with the implementation in docker/constants.py. The default value of 50,000 characters and the description appropriately explain the purpose of this configuration option.

pyproject.toml (1)

168-168: LGTM!

The addition of trafilatura.* to the mypy overrides is correct and necessary. This properly handles the lack of type stubs for the Trafilatura library, which is now the primary content extraction tool after removing beautifulsoup4, readability-lxml, html2text, and lxml_html_clean dependencies.

docker/constants.py (1)

8-8: LGTM!

The import of the os module is correctly placed and necessary for reading the MAX_CONTENT_LENGTH environment variable.

docker/README.md (1)

56-56: LGTM!

The documentation for MAX_CONTENT_LENGTH in the Docker README is accurate, clear, and consistent with the implementation in docker/constants.py and the main README.md. The default value of 50,000 characters is correctly stated.

T-rav and others added 2 commits October 10, 2025 13:35
Remove unused development dependencies that were slowing down CI:
- autoflake, autopep8 (redundant with Black/isort)
- pylint (using flake8 instead)
- pytest-xdist (not used for parallel tests)
- testcontainers (not used in tests)
- ipdb, ipython, jupyter (debug/REPL tools, not needed in CI)
- click (not imported anywhere)
- pip-tools, tox (not used)
- line-profiler, memory-profiler (profiling tools, not used)
- types-PyYAML (no YAML usage in codebase)

This reduces CI installation time from ~5-10 minutes to ~1-2 minutes
by eliminating heavy dependency trees (jupyter, testcontainers, etc.).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add comprehensive deep research capabilities using Perplexity AI's
sonar-deep-research model as a new MCP tool.

New Features:
- New `deep_research` MCP tool for comprehensive research
- Uses Perplexity's sonar-deep-research model
- Performs dozens of searches, reads hundreds of sources
- Synthesizes findings into comprehensive reports
- Configurable research effort (low, medium, high)
- Takes 2-4 minutes vs many hours for humans

Components Added:
- clients/perplexity_client.py - Perplexity API integration
- tools/deep_research_tool.py - Deep research MCP tool
- PERPLEXITY_API_KEY environment variable configuration
- Updated documentation in README.md and docker/README.md

This provides a separate, dedicated tool for in-depth research alongside
the existing fast search tool. Users can choose between quick search
(Serper/DuckDuckGo) and comprehensive deep research (Perplexity).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
docker/mcp_server.py (1)

56-58: Drop the unused # noqa: E402 directives

Ruff flags these directives as unused (RUF100). Since importing after load_dotenv() no longer violates E402, please remove the # noqa: E402 comments so lint continues to pass.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db20af3 and c00a18f.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • docker/.env.example (2 hunks)
  • docker/README.md (2 hunks)
  • docker/clients/perplexity_client.py (1 hunks)
  • docker/mcp_server.py (2 hunks)
  • docker/tools/deep_research_tool.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker/.env.example
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods

Files:

  • docker/clients/perplexity_client.py
  • docker/tools/deep_research_tool.py
  • docker/mcp_server.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All Python files in docker/ are included in coverage; increase tests instead of excluding files

Files:

  • docker/clients/perplexity_client.py
  • docker/tools/deep_research_tool.py
  • docker/mcp_server.py
docker/mcp_server.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define new MCP tools in docker/mcp_server.py using the @mcp_server.tool() decorator

Files:

  • docker/mcp_server.py
docker/README.md

📄 CodeRabbit inference engine (CLAUDE.md)

Document new MCP tools and changes in docker/README.md

Files:

  • docker/README.md
🧠 Learnings (3)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/mcp_server.py : Define new MCP tools in docker/mcp_server.py using the mcp_server.tool() decorator

Applied to files:

  • docker/mcp_server.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py

Applied to files:

  • docker/mcp_server.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/README.md : Document new MCP tools and changes in docker/README.md

Applied to files:

  • docker/README.md
🧬 Code graph analysis (3)
docker/clients/perplexity_client.py (1)
docker/models/domain/api_search_result.py (1)
  • APISearchResult (11-22)
docker/tools/deep_research_tool.py (3)
docker/clients/perplexity_client.py (1)
  • fetch_perplexity_deep_research (19-127)
docker/models/responses/search_response.py (1)
  • SearchResponse (15-21)
docker/services/search_processor.py (1)
  • process_search_results (15-39)
docker/mcp_server.py (3)
docker/tools/deep_research_tool.py (1)
  • deep_research_tool (22-97)
docker/tools/health_check_tool.py (1)
  • health_check_tool (11-23)
docker/tools/search_tool.py (1)
  • search_tool (23-63)
🪛 markdownlint-cli2 (0.18.1)
docker/README.md

53-53: Bare URL used

(MD034, no-bare-urls)

README.md

108-108: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py

72-72: Probable use of requests call without timeout

(S113)


116-116: Consider moving this statement to an else block

(TRY300)


119-119: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


121-121: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


123-123: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


125-125: Do not catch blind exception: Exception

(BLE001)


126-126: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


126-126: Use explicit conversion flag

Replace with conversion flag

(RUF010)

docker/mcp_server.py

56-56: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


57-57: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


58-58: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


import requests

from models.api_search_result import APISearchResult

⚠️ Potential issue | 🔴 Critical

Fix the APISearchResult import path

APISearchResult lives under models.domain.api_search_result; importing models.api_search_result will raise ModuleNotFoundError at runtime. Update the import to target the correct module.

-from models.api_search_result import APISearchResult
+from models.domain.api_search_result import APISearchResult
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from models.api_search_result import APISearchResult
from models.domain.api_search_result import APISearchResult
🤖 Prompt for AI Agents
In docker/clients/perplexity_client.py around line 14, the import incorrectly
references models.api_search_result which will raise ModuleNotFoundError; change
the import to point to the correct module path models.domain.api_search_result
and import APISearchResult from there (i.e., replace the current import with one
that imports APISearchResult from models.domain.api_search_result).

Comment on lines 65 to 88
api_results = fetch_perplexity_deep_research(
query=query,
api_key=PERPLEXITY_API_KEY,
max_results=max_results,
research_effort=research_effort,
)

# Check if we got any results
if not api_results:
logger.warning(f"No deep research results found for query: {query}")
response = SearchResponse(
query=query,
search_source="Perplexity Deep Research",
results=[],
error=(
"Deep research failed. This could be due to API limits, "
"invalid query, or temporary service issues."
),
)
return response.model_dump()

# Process the results (scrape and convert to markdown)
processed_results = process_search_results(api_results)


⚠️ Potential issue | 🔴 Critical

Avoid blocking the event loop with synchronous I/O

deep_research_tool is async, but it directly calls fetch_perplexity_deep_research, which uses synchronous requests. Because that call can take 2–4 minutes, it will block the event loop and starve every other MCP request. Please move the call into an executor (e.g., await asyncio.to_thread(...)) or provide an async HTTP client so the loop stays responsive.
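The reviewer's suggested fix can be illustrated with a minimal, self-contained sketch. Here `slow_research` stands in for the synchronous Perplexity client; none of these names are the repo's actual code:

```python
import asyncio
import time

def slow_research(query: str) -> str:
    # Stand-in for the blocking Perplexity call, which can take minutes.
    time.sleep(0.05)
    return f"report for {query}"

async def deep_research(query: str) -> str:
    # Offload the synchronous client to a worker thread so the event loop
    # stays free to serve other MCP requests while the research runs.
    return await asyncio.to_thread(slow_research, query)

print(asyncio.run(deep_research("fusion power")))
```

`asyncio.to_thread` hands the blocking call to the default thread-pool executor and awaits its result, which is exactly the pattern the later diff in this thread applies.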

Change deep_research tool to return the comprehensive synthesized research
report from Perplexity directly, instead of treating it like search results
that need scraping.

Changes:
- Perplexity client now returns tuple of (research_report, citation_urls)
- Deep research tool creates single SearchResult with full report
- Citations are preserved and listed at end of report
- No unnecessary scraping of citation URLs (they're already synthesized)

Result format:
```
# Deep Research: {query}
*Research Effort: High*
---
[Full synthesized research content from Perplexity]
---
## Sources
1. {url1}
2. {url2}
...
```

This provides the comprehensive research report that Perplexity generates
(2-4 minutes of deep research) with all citations intact, rather than
just returning scraped versions of the source URLs.
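Assembling that layout is straightforward string work; a hedged sketch, with an illustrative function name rather than the repo's actual helper:

```python
def format_deep_research(query: str, effort: str, report: str, sources: list[str]) -> str:
    # Mirror the layout shown above: header, effort line, report, then sources.
    lines = [
        f"# Deep Research: {query}",
        f"*Research Effort: {effort.title()}*",
        "---",
        report,
        "---",
        "## Sources",
    ]
    lines += [f"{i}. {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(lines)

output = format_deep_research(
    "example topic", "high", "Synthesized findings...", ["https://example.com/a"]
)
```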

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
docker/clients/perplexity_client.py (1)

72-73: Add a timeout to avoid indefinite blocking.

This requests.post call lacks a timeout, which can block indefinitely and tie up the event loop when called from the async tool layer. Given that deep-research runs can take 2-4 minutes, add an explicit timeout to prevent resource exhaustion.

Apply this diff to add a timeout:

-        response = requests.post(url, headers=headers, data=json.dumps(payload))
+        response = requests.post(
+            url,
+            headers=headers,
+            data=json.dumps(payload),
+            timeout=300,
+        )

Adjust the timeout value based on expected research duration.

docker/tools/deep_research_tool.py (1)

64-70: Avoid blocking the event loop with synchronous I/O.

The deep_research_tool is async, but it directly calls the synchronous fetch_perplexity_deep_research, which uses blocking requests.post. Since deep research can take 2-4 minutes, this will freeze the event loop and prevent other MCP requests from being processed.

Apply this diff to run the synchronous call in a thread pool:

+    import asyncio
+
     # Fetch deep research from Perplexity
-    research_report, citation_urls = fetch_perplexity_deep_research(
+    research_report, citation_urls = await asyncio.to_thread(
+        fetch_perplexity_deep_research,
         query=query,
         api_key=PERPLEXITY_API_KEY,
         max_results=max_results,
         research_effort=research_effort,
     )
🧹 Nitpick comments (2)
docker/clients/perplexity_client.py (2)

97-97: Consider moving the return to an else block.

For clarity, move the successful return statement into an else block after the if "choices" in data check to make the flow more explicit.

Apply this diff:

             logger.info(
                 f"Perplexity deep research completed: {len(research_report)} chars, "
                 f"{len(citation_urls)} citations, {token_count} tokens"
             )
-
-        return research_report, citation_urls
+        else:
+            logger.warning("No choices returned in Perplexity API response")
+
+        return research_report, citation_urls

99-108: Improve error logging with exception details.

Replace logger.error with logger.exception to automatically include stack traces, which aids debugging. Additionally, catching bare Exception is overly broad; consider handling specific exceptions like requests.exceptions.RequestException or at minimum log full exception details.

Apply this diff:

     except requests.exceptions.HTTPError as e:
-        logger.error(f"Perplexity API HTTP error: {e.response.status_code} - {e}")
+        logger.exception(f"Perplexity API HTTP error: {e.response.status_code}")
         if e.response.status_code == 401:
-            logger.error("Invalid Perplexity API key")
+            logger.exception("Invalid Perplexity API key")
         elif e.response.status_code == 429:
-            logger.error("Perplexity API rate limit exceeded")
+            logger.exception("Perplexity API rate limit exceeded")
         return "", []
-    except Exception as e:
-        logger.error(f"Error fetching Perplexity deep research: {str(e)}")
+    except requests.exceptions.RequestException as e:
+        logger.exception("Error fetching Perplexity deep research")
         return "", []
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c00a18f and 919d425.

📒 Files selected for processing (2)
  • docker/clients/perplexity_client.py (1 hunks)
  • docker/tools/deep_research_tool.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods

Files:

  • docker/clients/perplexity_client.py
  • docker/tools/deep_research_tool.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All Python files in docker/ are included in coverage; increase tests instead of excluding files

Files:

  • docker/clients/perplexity_client.py
  • docker/tools/deep_research_tool.py
🧬 Code graph analysis (1)
docker/tools/deep_research_tool.py (2)
docker/clients/perplexity_client.py (1)
  • fetch_perplexity_deep_research (17-108)
docker/models/responses/search_response.py (1)
  • SearchResponse (15-21)
🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py

72-72: Probable use of requests call without timeout

(S113)


97-97: Consider moving this statement to an else block

(TRY300)


100-100: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


102-102: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


104-104: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


106-106: Do not catch blind exception: Exception

(BLE001)


107-107: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


107-107: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (4)
docker/clients/perplexity_client.py (1)

17-41: LGTM!

The function signature and docstring are well-structured. Type hints are complete, and the documentation clearly explains the parameters, behavior, and return values.

docker/tools/deep_research_tool.py (3)

22-48: LGTM!

The function signature, type hints, and docstring are well-structured. The documentation clearly explains the research effort levels and expected duration.


50-62: LGTM!

The API key validation is well-implemented with a clear error message that guides users to configure the PERPLEXITY_API_KEY environment variable.


111-122: LGTM!

The response construction and logging are well-implemented. The search source label includes the effort level, and the completion metrics provide good observability.

- Install official perplexityai SDK (v0.15.0) for reliable API integration
- Add new deep_research MCP tool for comprehensive research queries
- Increase timeout to 10 minutes (600s) for long-running research
- Add PERPLEXITY_API_KEY environment variable configuration
- Update README with Perplexity setup instructions and tool documentation

## New Files
- clients/perplexity_client.py: Official SDK client integration
- tools/deep_research_tool.py: MCP tool for deep research
- tests/unit/clients/test_perplexity_client.py: Client unit tests (5 tests)
- tests/unit/tools/test_deep_research_tool.py: Tool unit tests (7 tests)
- tests/factories/mock_perplexity_response.py: Typed mock classes
- tests/factories/perplexity_response_factory.py: Factory pattern for tests
- test_perplexity.py: Integration test script

## Features
- Three research effort levels: low/medium/high (configurable by caller)
- Returns comprehensive report with citations intact
- Follows factory/builder pattern (no raw mocks in tests)
- Full unit test coverage (69 tests total, all passing)

## Tool Parameters
- query (required): Research question or topic
- research_effort (optional): "low"/"medium"/"high", default "high"
- max_results (optional): Max citations to return, default 5
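The parameter contract above can be sketched as a small validation helper. This is a hypothetical illustration based only on the documented parameters (`query`, `research_effort`, `max_results`) and their defaults, not the actual tool implementation; the function name `build_research_args` is invented for the example.

```python
# Hypothetical sketch of the documented deep_research parameter contract.
# Names and validation rules here are assumptions for illustration only.
from typing import Any, Dict

VALID_EFFORT = {"low", "medium", "high"}


def build_research_args(
    query: str,
    research_effort: str = "high",
    max_results: int = 5,
) -> Dict[str, Any]:
    """Validate and normalize the documented tool parameters."""
    if not query.strip():
        raise ValueError("query is required")
    if research_effort not in VALID_EFFORT:
        raise ValueError(f"research_effort must be one of {sorted(VALID_EFFORT)}")
    return {
        "query": query,
        "research_effort": research_effort,
        "max_results": max_results,
    }


print(build_research_args("What is Rust?"))
```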

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
docker/clients/perplexity_client.py (1)

91-93: Use logging.exception for better error diagnostics.

Replace logging.error with logging.exception to automatically include the exception traceback, which aids in debugging API failures.

Apply this diff:

-    except Exception as e:
-        logger.error(f"Error fetching Perplexity deep research: {str(e)}")
+    except Exception as e:
+        logger.exception(f"Error fetching Perplexity deep research: {str(e)}")
         return "", []
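The difference the suggestion relies on can be seen in a minimal, self-contained demo: inside an `except` block, `logger.exception` records the traceback automatically, while `logger.error` does not (unless `exc_info=True` is passed).

```python
# Minimal demo: logging.exception vs logging.error inside an except block.
import io
import logging

stream = io.StringIO()
logging.basicConfig(stream=stream, level=logging.ERROR, force=True)
logger = logging.getLogger("demo")

try:
    raise RuntimeError("API failure")
except RuntimeError:
    logger.error("via error: no traceback attached")
    logger.exception("via exception: traceback attached")

output = stream.getvalue()
# Only the .exception call emits the "Traceback (most recent call last):" block.
print("Traceback" in output)
```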
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 919d425 and fc6232a.

📒 Files selected for processing (11)
  • README.md (1 hunks)
  • docker/clients/perplexity_client.py (1 hunks)
  • docker/test_deep_research_mcp.py (1 hunks)
  • docker/test_perplexity.py (1 hunks)
  • docker/test_perplexity_debug.py (1 hunks)
  • docker/test_perplexity_minimal.py (1 hunks)
  • docker/tests/factories/mock_perplexity_response.py (1 hunks)
  • docker/tests/factories/perplexity_response_factory.py (1 hunks)
  • docker/tests/unit/clients/test_perplexity_client.py (1 hunks)
  • docker/tests/unit/tools/test_deep_research_tool.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods

Files:

  • docker/tests/factories/mock_perplexity_response.py
  • docker/test_perplexity.py
  • docker/tests/unit/tools/test_deep_research_tool.py
  • docker/tests/factories/perplexity_response_factory.py
  • docker/test_perplexity_minimal.py
  • docker/tests/unit/clients/test_perplexity_client.py
  • docker/test_deep_research_mcp.py
  • docker/test_perplexity_debug.py
  • docker/clients/perplexity_client.py
docker/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

docker/tests/**/*.py: Mark tests with pytest markers: use @pytest.mark.unit for unit tests, @pytest.mark.integration for integration tests, and @pytest.mark.api for external API tests
Do not depend on external services in tests unless marked as integration (@pytest.mark.integration)
Patch external calls at the module under test level (e.g., @patch('mcp_server.requests.get')) rather than patching library modules directly
Avoid using raw MagicMock with property assignment; prefer typed mock classes, factories, or builders for test doubles

Files:

  • docker/tests/factories/mock_perplexity_response.py
  • docker/tests/unit/tools/test_deep_research_tool.py
  • docker/tests/factories/perplexity_response_factory.py
  • docker/tests/unit/clients/test_perplexity_client.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All Python files in docker/ are included in coverage; increase tests instead of excluding files

Files:

  • docker/tests/factories/mock_perplexity_response.py
  • docker/test_perplexity.py
  • docker/tests/unit/tools/test_deep_research_tool.py
  • docker/tests/factories/perplexity_response_factory.py
  • docker/test_perplexity_minimal.py
  • docker/tests/unit/clients/test_perplexity_client.py
  • docker/test_deep_research_mcp.py
  • docker/test_perplexity_debug.py
  • docker/clients/perplexity_client.py
{pyproject.toml,docker/Dockerfile}

📄 CodeRabbit inference engine (CLAUDE.md)

Use Python 3.11 exactly for the project runtime and tooling

Files:

  • pyproject.toml
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Define tool configurations in pyproject.toml for Black, isort, mypy, pytest, and coverage

Files:

  • pyproject.toml
🧠 Learnings (2)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py

Applied to files:

  • docker/test_deep_research_mcp.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/README.md : Document new MCP tools and changes in docker/README.md

Applied to files:

  • docker/test_deep_research_mcp.py
🧬 Code graph analysis (7)
docker/test_perplexity.py (1)
docker/tools/deep_research_tool.py (1)
  • deep_research_tool (22-122)
docker/tests/unit/tools/test_deep_research_tool.py (1)
docker/tools/deep_research_tool.py (1)
  • deep_research_tool (22-122)
docker/tests/factories/perplexity_response_factory.py (1)
docker/tests/factories/mock_perplexity_response.py (2)
  • MockPerplexityClient (68-87)
  • MockPerplexityResponse (47-65)
docker/tests/unit/clients/test_perplexity_client.py (3)
docker/clients/perplexity_client.py (1)
  • fetch_perplexity_deep_research (16-93)
docker/tests/factories/perplexity_response_factory.py (7)
  • PerplexityResponseFactory (16-91)
  • successful_research (20-35)
  • client_with_response (82-91)
  • empty_response (38-44)
  • api_error (73-79)
  • with_many_citations (47-59)
  • without_citations (62-70)
docker/tests/factories/mock_perplexity_response.py (1)
  • create (81-87)
docker/test_deep_research_mcp.py (1)
docker/tests/factories/http_response_factory.py (1)
  • timeout (80-86)
docker/test_perplexity_debug.py (1)
docker/tests/factories/http_response_factory.py (1)
  • timeout (80-86)
docker/clients/perplexity_client.py (2)
docker/tests/factories/http_response_factory.py (1)
  • timeout (80-86)
docker/tests/factories/mock_perplexity_response.py (1)
  • create (81-87)
🪛 GitHub Check: CodeQL
docker/test_perplexity.py

[failure] 34-34: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

docker/test_perplexity_minimal.py

[failure] 25-25: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

docker/test_perplexity_debug.py

[failure] 24-24: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

🪛 markdownlint-cli2 (0.18.1)
README.md

108-108: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.13.3)
docker/tests/factories/mock_perplexity_response.py

81-81: Unused method argument: kwargs

(ARG002)

docker/test_perplexity.py

18-18: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


22-22: Unused noqa directive (non-enabled: E402)

Remove unused noqa directive

(RUF100)


65-65: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


72-72: Do not catch blind exception: Exception

(BLE001)

docker/tests/factories/perplexity_response_factory.py

22-22: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

docker/test_perplexity_minimal.py

63-63: Do not catch blind exception: Exception

(BLE001)

docker/tests/unit/clients/test_perplexity_client.py

72-72: Unused lambda argument: kwargs

(ARG005)


98-98: Unpacked variable report is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

docker/clients/perplexity_client.py

89-89: Consider moving this statement to an else block

(TRY300)


91-91: Do not catch blind exception: Exception

(BLE001)


92-92: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


92-92: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🔇 Additional comments (11)
docker/tests/unit/tools/test_deep_research_tool.py (7)

15-47: LGTM!

Comprehensive test coverage for the success path. The assertions validate all critical aspects: response structure, content formatting (title, effort marker, research report), and the Sources section with numbered citations.


49-62: LGTM!

Properly validates error handling when the API key is not configured. The assertions confirm the error message guides users to set the PERPLEXITY_API_KEY environment variable.


64-80: LGTM!

Correctly validates error handling when the API returns an empty response. The test ensures appropriate error messaging for API failures.


82-104: LGTM!

Effectively verifies that default parameters (max_results=5, research_effort="high") are correctly passed to the underlying fetch function when not explicitly provided.


106-125: LGTM!

Validates that all three effort levels (low, medium, high) are correctly reflected in the formatted content output.


127-143: LGTM!

Correctly validates that the Sources section is omitted when no citations are returned, preventing empty or misleading sections in the output.


145-161: LGTM!

Validates snippet truncation for long reports, ensuring the snippet is limited to 500 characters plus an ellipsis (503 total) to prevent overwhelming the response structure.

docker/clients/perplexity_client.py (2)

16-47: LGTM!

Function signature, docstring, and client initialization are well-structured. The 10-minute timeout (600 seconds) is appropriate for deep research operations that can take 2-4 minutes to complete.


49-89: LGTM!

The API call structure and response extraction logic are correct. The citation handling (lines 74-79) appropriately handles both string citations and dict citations with a url key, ensuring robust parsing.

docker/tests/factories/perplexity_response_factory.py (1)

37-91: LGTM!

The remaining factory methods are well-structured and provide comprehensive test double creation for various Perplexity API scenarios, including empty responses, many citations, no citations, errors, and custom client responses.

docker/tests/factories/mock_perplexity_response.py (1)

11-87: LGTM!

All mock classes are well-structured and serve their purpose as test doubles. The **kwargs parameter in MockPerplexityClient.create() (line 81) is intentionally unused—it accepts any arguments but returns the preconfigured response, which is the correct behavior for a test mock.

print("❌ PERPLEXITY_API_KEY not set")
sys.exit(1)

print(f"✅ API Key: {PERPLEXITY_API_KEY[:8]}...\n")

⚠️ Potential issue | 🟠 Major

Do not echo API key material.

This line prints the leading characters of PERPLEXITY_API_KEY, which still counts as secret disclosure. Remove the print or replace it with a non-revealing message so the key never appears in logs.

Suggested fix:

-print(f"✅ API Key: {PERPLEXITY_API_KEY[:8]}...\n")
+print("✅ PERPLEXITY_API_KEY detected.\n")
🧰 Tools
🪛 GitHub Check: CodeQL

[failure] 24-24: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

🤖 Prompt for AI Agents
In docker/test_perplexity_debug.py around line 24, the code prints the first 8
characters of PERPLEXITY_API_KEY which leaks secret material; remove that print
or replace it with a non-sensitive message. Update the line to either delete the
print entirely or log a constant, non-revealing confirmation such as "API key
configured" or check presence with a boolean debug message (e.g.,
"PERPLEXITY_API_KEY set") without including any substring of the key.
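The pattern the three CodeQL findings converge on (here and in the two comments below) can be sketched as a log-safe presence check that never interpolates any portion of the secret. The helper name is hypothetical; only the "confirm presence without echoing key material" idea comes from the review.

```python
# Hedged sketch: confirm an API key is configured without leaking any of it.
import os


def report_key_presence(name: str = "PERPLEXITY_API_KEY") -> str:
    """Return a log-safe message that contains no key material."""
    if os.environ.get(name):
        return f"✅ {name} detected."
    return f"❌ {name} not set"


os.environ["PERPLEXITY_API_KEY"] = "dummy-value-for-demo"  # demo only
print(report_key_presence())  # ✅ PERPLEXITY_API_KEY detected.
```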

print("❌ PERPLEXITY_API_KEY not set")
sys.exit(1)

print(f"✅ API Key found: {PERPLEXITY_API_KEY[:8]}...")

⚠️ Potential issue | 🟠 Major

Stop logging API key fragments.

Printing even the first few characters of PERPLEXITY_API_KEY leaks secret material into logs/terminals. Treat API keys as fully sensitive—drop the print or replace it with a generic “key detected” message without exposing any portion of the value.

Apply this diff to eliminate the leak:

-print(f"✅ API Key found: {PERPLEXITY_API_KEY[:8]}...")
+print("✅ PERPLEXITY_API_KEY detected.")
🧰 Tools
🪛 GitHub Check: CodeQL

[failure] 25-25: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

🤖 Prompt for AI Agents
In docker/test_perplexity_minimal.py around line 25, the test currently prints
the first 8 characters of PERPLEXITY_API_KEY which leaks secret material; remove
the substring exposure and replace the print with a non-sensitive message (e.g.,
"API key detected" or no log at all). Update the code to not interpolate or
output any part of the key, ensuring logs only indicate presence/absence of the
key without revealing characters.

print("\nGet your API key at: https://www.perplexity.ai/settings/api")
sys.exit(1)

print(f"✅ PERPLEXITY_API_KEY found: {PERPLEXITY_API_KEY[:8]}...")

⚠️ Potential issue | 🟠 Major

Avoid printing API key fragments.

Revealing the first characters of PERPLEXITY_API_KEY is still a secret leak. Replace this log with a neutral message confirming the key’s presence without exposing any portion of it.

Proposed change:

-print(f"✅ PERPLEXITY_API_KEY found: {PERPLEXITY_API_KEY[:8]}...")
+print("✅ PERPLEXITY_API_KEY detected.")
🧰 Tools
🪛 GitHub Check: CodeQL

[failure] 34-34: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.

🤖 Prompt for AI Agents
In docker/test_perplexity.py around line 34, the code prints the first 8
characters of PERPLEXITY_API_KEY which leaks secret information; change the
print to a neutral confirmation like "PERPLEXITY_API_KEY found" (no key fragment
or any part of the key), or remove the print entirely and rely on a boolean
check or secure logger that does not expose secrets.

Comment on lines +19 to +35
@staticmethod
def successful_research(
content: str = "Python is a high-level programming language.",
citations: List[str] = None,
) -> MockPerplexityResponse:
"""Create successful deep research response.
Args:
content: Research report content
citations: Citation URLs
Returns:
MockPerplexityResponse with research content
"""
if citations is None:
citations = ["https://python.org", "https://docs.python.org"]
return MockPerplexityResponse(content=content, citations=citations)

🛠️ Refactor suggestion | 🟠 Major

Use explicit Optional type annotation.

Line 22 uses implicit Optional with citations: List[str] = None. Modern Python type checking requires explicit Optional[List[str]] or List[str] | None.

Apply this diff:

+from typing import List, Optional
+
 from tests.factories.mock_perplexity_response import (
     MockPerplexityClient,
     MockPerplexityResponse,
 )


 class PerplexityResponseFactory:
     """Factory for creating pre-configured Perplexity response mocks."""

     @staticmethod
     def successful_research(
         content: str = "Python is a high-level programming language.",
-        citations: List[str] = None,
+        citations: Optional[List[str]] = None,
     ) -> MockPerplexityResponse:
🧰 Tools
🪛 Ruff (0.13.3)

22-22: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
In docker/tests/factories/perplexity_response_factory.py around lines 19 to 35,
the function signature uses a nullable default with citations: List[str] = None
which should be explicitly typed; change the parameter annotation to citations:
Optional[List[str]] = None (or citations: List[str] | None = None for Python
3.10+), and add the corresponding Optional import from typing (or update the
typing import line to include Optional / use PEP 604 union syntax and adjust
imports accordingly). Ensure no other runtime behavior changes—only update the
type annotation and imports.

- Disable Azure Functions workflow by renaming to .yml.disabled
- Increase MAX_CONTENT_LENGTH default from 50,000 to 1,000,000 chars
- Add validation for MAX_CONTENT_LENGTH environment variable
- Keep # noqa: E402 directives (required for imports after load_dotenv)
- Improve error logging in perplexity_client.py (use logger.exception)
- Update all documentation to reflect new default value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
docker/clients/perplexity_client.py (3)

56-59: Hardcoded max_results in system prompt doesn't reflect actual parameter.

The system prompt mentions "Focus on returning {max_results} most relevant sources" but this is just guidance to the model. If max_results=10 is passed but the API returns 20 citations, line 79's slice [:max_results] correctly limits it—however, the prompt's mention might confuse reviewers into thinking the model controls this limit when it's actually enforced post-response.

Consider clarifying the prompt or removing the mention:

                     "content": (
                         "You are a comprehensive research assistant. Provide detailed, "
-                        "well-researched answers with clear structure and citations. "
-                        f"Focus on returning {max_results} most relevant sources."
+                        "well-researched answers with clear structure and citations."
                     ),

74-79: Fragile citation type handling could raise AttributeError.

The code attempts to handle citations as either strings or dicts with .get("url", ""), but if citation is a dict-like object without a get method (e.g., a custom object), this will raise AttributeError. The isinstance(citation, str) check only guards the first branch.

Apply this diff for more robust handling:

             if hasattr(response, "citations") and response.citations:
                 citation_urls = [
-                    citation if isinstance(citation, str) else citation.get("url", "")
+                    (
+                        citation
+                        if isinstance(citation, str)
+                        else getattr(citation, "url", citation.get("url", ""))
+                        if hasattr(citation, "get")
+                        else str(citation)
+                    )
                     for citation in response.citations
                     if citation
                 ][:max_results]

Or simplify by wrapping in try-except:

             if hasattr(response, "citations") and response.citations:
-                citation_urls = [
-                    citation if isinstance(citation, str) else citation.get("url", "")
-                    for citation in response.citations
-                    if citation
-                ][:max_results]
+                citation_urls = []
+                for citation in response.citations[:max_results]:
+                    if not citation:
+                        continue
+                    try:
+                        url = citation if isinstance(citation, str) else citation.get("url", "")
+                        if url:
+                            citation_urls.append(url)
+                    except (AttributeError, TypeError):
+                        logger.warning(f"Skipping invalid citation format: {citation}")
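
The try-except variant above can be stated as a standalone normalizer. This is a sketch under the review's assumptions about citation shapes (plain URL strings or dicts carrying a `url` key); anything else is logged and skipped rather than raising.

```python
# Hedged sketch of defensive citation normalization; citation shapes are
# assumptions for illustration, not the Perplexity SDK's documented types.
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def normalize_citations(citations: List[Any], max_results: int) -> List[str]:
    """Extract up to max_results citation URLs, skipping malformed entries."""
    urls: List[str] = []
    for citation in citations[:max_results]:
        if not citation:
            continue
        if isinstance(citation, str):
            urls.append(citation)
        elif isinstance(citation, dict):
            url = citation.get("url", "")
            if url:
                urls.append(url)
        else:
            logger.warning("Skipping invalid citation format: %r", citation)
    return urls


print(normalize_citations(["https://a.example", {"url": "https://b.example"}, 42], 5))
# ['https://a.example', 'https://b.example']
```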

89-93: Improve error handling based on static analysis hints.

Ruff flags two issues:

  1. Line 89: return statement could move to an else block for clarity (the success path returns inside the if, so the final return is only reached on empty response)
  2. Line 92: logging.exception() automatically includes the exception object, so str(e) is redundant

Apply this diff:

             logger.info(
                 f"Perplexity deep research completed: {len(research_report)} chars, "
                 f"{len(citation_urls)} citations, {token_count} tokens"
             )
+        else:
+            logger.warning("Perplexity returned empty response")

         return research_report, citation_urls

     except Exception as e:
-        logger.exception(f"Error fetching Perplexity deep research: {str(e)}")
+        logger.exception("Error fetching Perplexity deep research")
         return "", []

Based on coding guidelines

README.md (1)

108-108: Fix bare URL flagged by markdownlint.

Markdown linters prefer URLs in link syntax rather than bare URLs for consistency and accessibility.

Apply this diff:

-| `PERPLEXITY_API_KEY` | *(none)* | Perplexity API key for deep research tool (optional, get at https://www.perplexity.ai/settings/api) |
+| `PERPLEXITY_API_KEY` | *(none)* | Perplexity API key for deep research tool (optional, get at [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api)) |
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc6232a and b6103b4.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • docker/.env.example (2 hunks)
  • docker/README.md (2 hunks)
  • docker/clients/perplexity_client.py (1 hunks)
  • docker/constants.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docker/.env.example
  • docker/README.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods

Files:

  • docker/constants.py
  • docker/clients/perplexity_client.py
docker/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All Python files in docker/ are included in coverage; increase tests instead of excluding files

Files:

  • docker/constants.py
  • docker/clients/perplexity_client.py
🪛 markdownlint-cli2 (0.18.1)
README.md

108-108: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py

89-89: Consider moving this statement to an else block

(TRY300)


92-92: Use explicit conversion flag

Replace with conversion flag

(RUF010)


92-92: Redundant exception object included in logging.exception call

(TRY401)

🔇 Additional comments (2)
docker/clients/perplexity_client.py (1)

1-93: Verify PR scope: Is Perplexity integration part of PR #46?

This entire file implements Perplexity deep research functionality, but PR #46's title and objectives focus solely on making MAX_CONTENT_LENGTH configurable (increasing from 8,000 to 50,000 chars). The Perplexity integration appears to be an unrelated feature.

If this is intentional, please update the PR description to document this additional scope. If not, consider moving Perplexity changes to a separate PR for clearer change tracking and review.

README.md (1)

108-108: Verify PR scope: Is Perplexity documentation part of PR #46?

Lines 108 and 123-128 document the Perplexity API integration, but PR #46's title and objectives focus solely on making MAX_CONTENT_LENGTH configurable. This appears to be unrelated scope creep.

If Perplexity integration is intentional for this PR, please update the PR description. Otherwise, consider moving these documentation changes to a separate PR that includes the corresponding code changes.

Also applies to: 123-128

Comment on lines +27 to +30
try:
MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "1000000"))
except ValueError:
MAX_CONTENT_LENGTH = 1000000
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Default value contradicts PR objectives.

The PR title and objectives state increasing MAX_CONTENT_LENGTH from 8,000 to 50,000 characters, but this implementation uses 1,000,000 (1 million) as the default—20× higher than documented. This contradicts the PR summary, README.md, and docker/.env.example if they also claim 50,000.

Additionally, the current implementation has no validation:

  • Negative or zero values are accepted silently
  • Invalid values fall back without logging a warning
  • No upper bound check to prevent memory issues

Apply this diff to align with PR objectives and add validation:

-try:
-    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "1000000"))
-except ValueError:
-    MAX_CONTENT_LENGTH = 1000000
+try:
+    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+    if MAX_CONTENT_LENGTH <= 0:
+        logger.warning(
+            f"Invalid MAX_CONTENT_LENGTH={MAX_CONTENT_LENGTH}, using default 50000"
+        )
+        MAX_CONTENT_LENGTH = 50000
+except ValueError as e:
+    logger.warning(
+        f"Invalid MAX_CONTENT_LENGTH format: {e}, using default 50000"
+    )
+    MAX_CONTENT_LENGTH = 50000

Note: You'll need to add import logging and logger = logging.getLogger(__name__) at the top of the file for the warning logs.

Committable suggestion skipped: line range outside the PR's diff.
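
The validated parsing pattern the comment proposes can be shown self-contained. This is a sketch, not the repo's `docker/constants.py`: the 50,000 default follows the PR objectives, and injecting the environment as a dict is an illustration choice to keep it testable.

```python
# Sketch of env-var parsing with fallback and validation for
# MAX_CONTENT_LENGTH, per the review's recommendation.
import logging

logger = logging.getLogger(__name__)

DEFAULT_MAX_CONTENT_LENGTH = 50000  # default per the PR objectives


def read_max_content_length(env: dict) -> int:
    """Parse MAX_CONTENT_LENGTH, warning and falling back on bad input."""
    raw = env.get("MAX_CONTENT_LENGTH", str(DEFAULT_MAX_CONTENT_LENGTH))
    try:
        value = int(raw)
    except ValueError:
        logger.warning("Invalid MAX_CONTENT_LENGTH %r, using default", raw)
        return DEFAULT_MAX_CONTENT_LENGTH
    if value <= 0:
        logger.warning("Non-positive MAX_CONTENT_LENGTH %d, using default", value)
        return DEFAULT_MAX_CONTENT_LENGTH
    return value


print(read_max_content_length({}))                              # 50000
print(read_max_content_length({"MAX_CONTENT_LENGTH": "-5"}))    # 50000
print(read_max_content_length({"MAX_CONTENT_LENGTH": "80000"})) # 80000
```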

| `PORT` | `8000` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_DIR` | `/tmp` | Log file directory |
| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Documentation contradicts PR objectives.

The documented default of 1,000,000 contradicts the PR title and objectives, which state increasing MAX_CONTENT_LENGTH from 8,000 to 50,000 characters. This same inconsistency appears in docker/constants.py.

Apply this diff:

-| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
+| `MAX_CONTENT_LENGTH` | `50000` | Maximum characters to return per scraped article |
🤖 Prompt for AI Agents
In README.md around line 113 the documented default MAX_CONTENT_LENGTH is
1,000,000 which contradicts the PR objective to increase it from 8,000 to
50,000; update the README entry to show `50000` as the default and also update
docker/constants.py to set MAX_CONTENT_LENGTH = 50000 (and search the repo for
any other places documenting or hardcoding the old 1,000,000/8000 values to make
them consistent).

- Adjust test_truncates_content_exceeding_max_length to use content larger than 1M chars
- Previous 100K char test no longer triggers truncation with new 1M default

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
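
The test adjustment above can be sketched as follows. The plain `content[:limit]` cap is an assumption for illustration (the real truncation logic may differ, e.g. by appending a marker); the point is that 100K-char input no longer trips the 1M default, while >1M input does.

```python
# Sketch of the truncation behavior the updated test exercises.
MAX_CONTENT_LENGTH = 1_000_000  # new default under test


def truncate(content: str, limit: int = MAX_CONTENT_LENGTH) -> str:
    """Cap content at `limit` characters (assumed truncation rule)."""
    return content if len(content) <= limit else content[:limit]


small = "x" * 100_000    # old test size: passes through untouched now
large = "y" * 1_500_000  # new test size: exceeds the 1M default

print(len(truncate(small)))  # 100000
print(len(truncate(large)))  # 1000000
```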
@T-rav T-rav merged commit 8be0fc4 into main Oct 10, 2025
11 checks passed
@T-rav T-rav deleted the feat/configurable-max-content-length branch October 10, 2025 21:01