feat: Increase MAX_CONTENT_LENGTH to 50,000 chars and make it configurable #46
Conversation
…rable

- Increase MAX_CONTENT_LENGTH from 8,000 to 50,000 characters (default)
- Make MAX_CONTENT_LENGTH configurable via .env file
- Remove unused dependencies (beautifulsoup4, readability-lxml, html2text)
- Trafilatura now handles all content extraction with built-in markdown output
- Update READMEs with MAX_CONTENT_LENGTH documentation
- Clean up mypy overrides to reflect current dependencies

This allows AI agents to receive fuller article content and decide whether to compress it, rather than pre-truncating at 8K chars.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
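As a rough sketch of the behavior this commit describes, the snippet below combines Trafilatura's markdown output with the environment-driven limit; the helper name `scrape_to_markdown` and the exact call pattern are illustrative assumptions, not the PR's actual code.

```python
# Illustrative sketch only (assumed helper, not the PR's implementation):
# extract an article as markdown with Trafilatura and truncate it to the
# configurable MAX_CONTENT_LENGTH instead of a hard-coded 8,000 characters.
import os

import trafilatura

MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))


def scrape_to_markdown(url: str) -> str:
    """Fetch a page and return markdown, truncated to MAX_CONTENT_LENGTH."""
    downloaded = trafilatura.fetch_url(url)
    if not downloaded:
        return ""
    content = trafilatura.extract(downloaded, output_format="markdown") or ""
    return content[:MAX_CONTENT_LENGTH]
```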
Caution: Review failed. The pull request is closed.

Walkthrough

Adds Perplexity integration: new PERPLEXITY_API_KEY and MAX_CONTENT_LENGTH env vars, Perplexity SDK dependency, a Perplexity client, a deep_research MCP tool and registration, test/debug scripts and unit tests with test factories, docs/.env updates, and constants driving content truncation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant User as User / Client
participant MCP as mcp_server
participant Tool as deep_research_tool
participant Client as perplexity_client
participant API as Perplexity API
Note over MCP,Tool: New tool registration & invocation
User->>MCP: tools/call deep_research(query, effort, max_results)
MCP->>Tool: invoke deep_research_tool(query, effort, max_results)
alt API key missing
Tool-->>MCP: SearchResponse { error message }
MCP-->>User: Error response
else API key present
Tool->>Client: fetch_perplexity_deep_research(query, api_key, max_results, effort)
Client->>API: chat.completions.create(model: sonar-deep-research, payload)
API-->>Client: response (choices.content, citations, usage)
Client-->>Tool: (research_report, citation_urls)
Tool->>MCP: formatted SearchResponse (report + Sources)
MCP-->>User: SSE/HTTP response with SearchResponse
end
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
Actionable comments posted: 0
🧹 Nitpick comments (1)
docker/constants.py (1)
27-27: Consider adding validation for the environment variable. The current implementation will raise a `ValueError` if `MAX_CONTENT_LENGTH` is set to a non-numeric string in the environment, causing the application to fail at startup with a potentially unclear error message. Consider one of these approaches for better robustness:

Option 1: Add a try-except block with a fallback

```diff
-MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+try:
+    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+except ValueError:
+    MAX_CONTENT_LENGTH = 50000
+    # Optionally log a warning here
```

Option 2: Add explicit validation with a clear error message

```diff
-MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+_content_length_str = os.environ.get("MAX_CONTENT_LENGTH", "50000")
+try:
+    MAX_CONTENT_LENGTH = int(_content_length_str)
+except ValueError:
+    raise ValueError(
+        f"MAX_CONTENT_LENGTH must be a valid integer, got: {_content_length_str}"
+    )
```

Alternatively, if you prefer to keep it simple, document in the `.env.example` file that the value must be numeric.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- README.md (1 hunks)
- docker/.env.example (1 hunks)
- docker/README.md (1 hunks)
- docker/constants.py (2 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
docker/.env.example
📄 CodeRabbit inference engine (CLAUDE.md)
Update docker/.env.example when introducing new environment variables (e.g., API keys)
Files:
docker/.env.example
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
docker/constants.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
docker/constants.py
{pyproject.toml,docker/Dockerfile}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Python 3.11 exactly for the project runtime and tooling
Files:
pyproject.toml
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
Define tool configurations in pyproject.toml for Black, isort, mypy, pytest, and coverage
Files:
pyproject.toml
docker/README.md
📄 CodeRabbit inference engine (CLAUDE.md)
Document new MCP tools and changes in docker/README.md
Files:
docker/README.md
🧠 Learnings (1)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/.env.example : Update docker/.env.example when introducing new environment variables (e.g., API keys)
Applied to files:
docker/.env.example
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Security Scan
🔇 Additional comments (5)
docker/.env.example (1)
15-17: LGTM! The new `MAX_CONTENT_LENGTH` environment variable is properly configured with a clear section header and a sensible default value of 50,000 characters. This aligns with the changes in `docker/constants.py` and the documentation updates in README.md and docker/README.md.

Based on coding guidelines

README.md (1)

112-112: LGTM! The documentation for `MAX_CONTENT_LENGTH` is clear, accurate, and consistent with the implementation in `docker/constants.py`. The default value of 50,000 characters and the description appropriately explain the purpose of this configuration option.

pyproject.toml (1)

168-168: LGTM! The addition of `trafilatura.*` to the mypy overrides is correct and necessary. This properly handles the lack of type stubs for the Trafilatura library, which is now the primary content extraction tool after removing beautifulsoup4, readability-lxml, html2text, and lxml_html_clean dependencies.

docker/constants.py (1)

8-8: LGTM! The import of the `os` module is correctly placed and necessary for reading the `MAX_CONTENT_LENGTH` environment variable.

docker/README.md (1)

56-56: LGTM! The documentation for `MAX_CONTENT_LENGTH` in the Docker README is accurate, clear, and consistent with the implementation in `docker/constants.py` and the main README.md. The default value of 50,000 characters is correctly stated.
Remove unused development dependencies that were slowing down CI:

- autoflake, autopep8 (redundant with Black/isort)
- pylint (using flake8 instead)
- pytest-xdist (not used for parallel tests)
- testcontainers (not used in tests)
- ipdb, ipython, jupyter (debug/REPL tools, not needed in CI)
- click (not imported anywhere)
- pip-tools, tox (not used)
- line-profiler, memory-profiler (profiling tools, not used)
- types-PyYAML (no YAML usage in codebase)

This reduces CI installation time from ~5-10 minutes to ~1-2 minutes by eliminating heavy dependency trees (jupyter, testcontainers, etc.).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add comprehensive deep research capabilities using Perplexity AI's sonar-deep-research model as a new MCP tool.

New Features:
- New `deep_research` MCP tool for comprehensive research
- Uses Perplexity's sonar-deep-research model
- Performs dozens of searches, reads hundreds of sources
- Synthesizes findings into comprehensive reports
- Configurable research effort (low, medium, high)
- Takes 2-4 minutes vs many hours for humans

Components Added:
- clients/perplexity_client.py - Perplexity API integration
- tools/deep_research_tool.py - Deep research MCP tool
- PERPLEXITY_API_KEY environment variable configuration
- Updated documentation in README.md and docker/README.md

This provides a separate, dedicated tool for in-depth research alongside the existing fast search tool. Users can choose between quick search (Serper/DuckDuckGo) and comprehensive deep research (Perplexity).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
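For orientation, here is a minimal sketch of how such a tool might be registered, assuming a FastMCP-style server object as referenced in the repo guidelines; the server name, import paths, and the exact `deep_research_tool` signature are assumptions, not verified details of this PR.

```python
# Hypothetical registration sketch; not the PR's mcp_server.py.
# Assumptions: the server is a FastMCP instance (official MCP Python SDK),
# and deep_research_tool accepts these keyword arguments.
from typing import Any, Dict

from mcp.server.fastmcp import FastMCP

from tools.deep_research_tool import deep_research_tool

mcp_server = FastMCP("webcat")  # server name is an assumption


@mcp_server.tool()
async def deep_research(
    query: str, research_effort: str = "high", max_results: int = 5
) -> Dict[str, Any]:
    """Run a Perplexity deep-research query and return the synthesized report."""
    return await deep_research_tool(
        query=query, research_effort=research_effort, max_results=max_results
    )
```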
Actionable comments posted: 3
🧹 Nitpick comments (1)
docker/mcp_server.py (1)
56-58: Drop the unused `# noqa: E402` directives. Ruff flags these directives as unused (RUF100). Since importing after `load_dotenv()` no longer violates E402, please remove the `# noqa: E402` comments so lint continues to pass.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- README.md (1 hunks)
- docker/.env.example (2 hunks)
- docker/README.md (2 hunks)
- docker/clients/perplexity_client.py (1 hunks)
- docker/mcp_server.py (2 hunks)
- docker/tools/deep_research_tool.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/.env.example
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
- docker/clients/perplexity_client.py
- docker/tools/deep_research_tool.py
- docker/mcp_server.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
- docker/clients/perplexity_client.py
- docker/tools/deep_research_tool.py
- docker/mcp_server.py
docker/mcp_server.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define new MCP tools in docker/mcp_server.py using the @mcp_server.tool() decorator
Files:
docker/mcp_server.py
docker/README.md
📄 CodeRabbit inference engine (CLAUDE.md)
Document new MCP tools and changes in docker/README.md
Files:
docker/README.md
🧠 Learnings (3)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/mcp_server.py : Define new MCP tools in docker/mcp_server.py using the mcp_server.tool() decorator
Applied to files:
docker/mcp_server.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py
Applied to files:
docker/mcp_server.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/README.md : Document new MCP tools and changes in docker/README.md
Applied to files:
docker/README.md
🧬 Code graph analysis (3)
docker/clients/perplexity_client.py (1)
docker/models/domain/api_search_result.py (1)
APISearchResult(11-22)
docker/tools/deep_research_tool.py (3)
docker/clients/perplexity_client.py (1)
- fetch_perplexity_deep_research (19-127)

docker/models/responses/search_response.py (1)
- SearchResponse (15-21)

docker/services/search_processor.py (1)
- process_search_results (15-39)
docker/mcp_server.py (3)
docker/tools/deep_research_tool.py (1)
- deep_research_tool (22-97)

docker/tools/health_check_tool.py (1)
- health_check_tool (11-23)

docker/tools/search_tool.py (1)
- search_tool (23-63)
🪛 markdownlint-cli2 (0.18.1)
docker/README.md
53-53: Bare URL used
(MD034, no-bare-urls)
README.md
108-108: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py
72-72: Probable use of requests call without timeout
(S113)
116-116: Consider moving this statement to an else block
(TRY300)
119-119: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
121-121: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
123-123: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
125-125: Do not catch blind exception: Exception
(BLE001)
126-126: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
126-126: Use explicit conversion flag
Replace with conversion flag
(RUF010)
docker/mcp_server.py
56-56: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
57-57: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
58-58: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
docker/clients/perplexity_client.py
Outdated
```python
import requests

from models.api_search_result import APISearchResult
```
Fix the APISearchResult import path
APISearchResult lives under models.domain.api_search_result; importing models.api_search_result will raise ModuleNotFoundError at runtime. Update the import to target the correct module.
```diff
-from models.api_search_result import APISearchResult
+from models.domain.api_search_result import APISearchResult
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
from models.domain.api_search_result import APISearchResult
```
🤖 Prompt for AI Agents
In docker/clients/perplexity_client.py around line 14, the import incorrectly
references models.api_search_result which will raise ModuleNotFoundError; change
the import to point to the correct module path models.domain.api_search_result
and import APISearchResult from there (i.e., replace the current import with one
that imports APISearchResult from models.domain.api_search_result).
docker/tools/deep_research_tool.py
Outdated
```python
api_results = fetch_perplexity_deep_research(
    query=query,
    api_key=PERPLEXITY_API_KEY,
    max_results=max_results,
    research_effort=research_effort,
)

# Check if we got any results
if not api_results:
    logger.warning(f"No deep research results found for query: {query}")
    response = SearchResponse(
        query=query,
        search_source="Perplexity Deep Research",
        results=[],
        error=(
            "Deep research failed. This could be due to API limits, "
            "invalid query, or temporary service issues."
        ),
    )
    return response.model_dump()

# Process the results (scrape and convert to markdown)
processed_results = process_search_results(api_results)
```
Avoid blocking the event loop with synchronous I/O
deep_research_tool is async, but it directly calls fetch_perplexity_deep_research, which uses synchronous requests. Because that call can take 2–4 minutes, it will block the event loop and starve every other MCP request. Please move the call into an executor (e.g., await asyncio.to_thread(...)) or provide an async HTTP client so the loop stays responsive.
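A minimal sketch of the executor approach suggested here, assuming `fetch_perplexity_deep_research` keeps its synchronous signature; the wrapper name is illustrative only.

```python
# Sketch: offload the blocking Perplexity call to a worker thread so the
# asyncio event loop stays free to serve other MCP requests.
import asyncio
from typing import List, Tuple

from clients.perplexity_client import fetch_perplexity_deep_research


async def run_deep_research(
    query: str, api_key: str, max_results: int, research_effort: str
) -> Tuple[str, List[str]]:
    """Run the synchronous client call in a thread pool via asyncio.to_thread."""
    return await asyncio.to_thread(
        fetch_perplexity_deep_research,
        query=query,
        api_key=api_key,
        max_results=max_results,
        research_effort=research_effort,
    )
```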
Change deep_research tool to return the comprehensive synthesized research
report from Perplexity directly, instead of treating it like search results
that need scraping.
Changes:
- Perplexity client now returns tuple of (research_report, citation_urls)
- Deep research tool creates single SearchResult with full report
- Citations are preserved and listed at end of report
- No unnecessary scraping of citation URLs (they're already synthesized)
Result format:
```
# Deep Research: {query}
*Research Effort: High*
---
[Full synthesized research content from Perplexity]
---
## Sources
1. {url1}
2. {url2}
...
```
This provides the comprehensive research report that Perplexity generates
(2-4 minutes of deep research) with all citations intact, rather than
just returning scraped versions of the source URLs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
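The sketch below shows one way the format above could be assembled from the `(research_report, citation_urls)` tuple; the helper name and exact whitespace are assumptions rather than the PR's implementation.

```python
# Illustrative helper (assumed, not the PR's code): render the report with the
# effort marker and a numbered Sources section, matching the format shown above.
from typing import List


def format_deep_research(
    query: str, effort: str, report: str, citations: List[str]
) -> str:
    """Build the markdown document returned to the MCP client."""
    parts = [
        f"# Deep Research: {query}",
        f"*Research Effort: {effort.title()}*",
        "---",
        report,
    ]
    if citations:
        parts += ["---", "## Sources"]
        parts += [f"{i}. {url}" for i, url in enumerate(citations, start=1)]
    return "\n\n".join(parts)
```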
Actionable comments posted: 0
♻️ Duplicate comments (2)
docker/clients/perplexity_client.py (1)
72-73: Add a timeout to avoid indefinite blocking. This `requests.post` call lacks a timeout, which can block indefinitely and tie up the event loop when called from the async tool layer. Given that deep-research runs can take 2-4 minutes, add an explicit timeout to prevent resource exhaustion. Apply this diff to add a timeout:

```diff
-response = requests.post(url, headers=headers, data=json.dumps(payload))
+response = requests.post(
+    url,
+    headers=headers,
+    data=json.dumps(payload),
+    timeout=300,
+)
```

Adjust the timeout value based on expected research duration.
docker/tools/deep_research_tool.py (1)
64-70: Avoid blocking the event loop with synchronous I/O. The `deep_research_tool` is async, but it directly calls the synchronous `fetch_perplexity_deep_research`, which uses blocking `requests.post`. Since deep research can take 2-4 minutes, this will freeze the event loop and prevent other MCP requests from being processed. Apply this diff to run the synchronous call in a thread pool:

```diff
+import asyncio
+
 # Fetch deep research from Perplexity
-research_report, citation_urls = fetch_perplexity_deep_research(
+research_report, citation_urls = await asyncio.to_thread(
+    fetch_perplexity_deep_research,
     query=query,
     api_key=PERPLEXITY_API_KEY,
     max_results=max_results,
     research_effort=research_effort,
 )
```
🧹 Nitpick comments (2)
docker/clients/perplexity_client.py (2)
97-97: Consider moving the return to an else block. For clarity, move the successful return statement into an `else` block after the `if "choices" in data` check to make the flow more explicit. Apply this diff:

```diff
     logger.info(
         f"Perplexity deep research completed: {len(research_report)} chars, "
         f"{len(citation_urls)} citations, {token_count} tokens"
     )
-
-    return research_report, citation_urls
+else:
+    logger.warning("No choices returned in Perplexity API response")
+
+return research_report, citation_urls
```
99-108: Improve error logging with exception details. Replace `logger.error` with `logger.exception` to automatically include stack traces, which aids debugging. Additionally, catching bare `Exception` is overly broad; consider handling specific exceptions like `requests.exceptions.RequestException` or at minimum log full exception details. Apply this diff:

```diff
 except requests.exceptions.HTTPError as e:
-    logger.error(f"Perplexity API HTTP error: {e.response.status_code} - {e}")
+    logger.exception(f"Perplexity API HTTP error: {e.response.status_code}")
     if e.response.status_code == 401:
-        logger.error("Invalid Perplexity API key")
+        logger.exception("Invalid Perplexity API key")
     elif e.response.status_code == 429:
-        logger.error("Perplexity API rate limit exceeded")
+        logger.exception("Perplexity API rate limit exceeded")
     return "", []
-except Exception as e:
-    logger.error(f"Error fetching Perplexity deep research: {str(e)}")
+except requests.exceptions.RequestException as e:
+    logger.exception("Error fetching Perplexity deep research")
     return "", []
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docker/clients/perplexity_client.py (1 hunks)
- docker/tools/deep_research_tool.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
- docker/clients/perplexity_client.py
- docker/tools/deep_research_tool.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
- docker/clients/perplexity_client.py
- docker/tools/deep_research_tool.py
🧬 Code graph analysis (1)
docker/tools/deep_research_tool.py (2)
docker/clients/perplexity_client.py (1)
- fetch_perplexity_deep_research (17-108)

docker/models/responses/search_response.py (1)
- SearchResponse (15-21)
🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py
72-72: Probable use of requests call without timeout
(S113)
97-97: Consider moving this statement to an else block
(TRY300)
100-100: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
102-102: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
104-104: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
106-106: Do not catch blind exception: Exception
(BLE001)
107-107: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
107-107: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (4)
docker/clients/perplexity_client.py (1)
17-41: LGTM! The function signature and docstring are well-structured. Type hints are complete, and the documentation clearly explains the parameters, behavior, and return values.
docker/tools/deep_research_tool.py (3)
22-48: LGTM! The function signature, type hints, and docstring are well-structured. The documentation clearly explains the research effort levels and expected duration.

50-62: LGTM! The API key validation is well-implemented with a clear error message that guides users to configure the `PERPLEXITY_API_KEY` environment variable.

111-122: LGTM! The response construction and logging are well-implemented. The search source label includes the effort level, and the completion metrics provide good observability.
- Install official perplexityai SDK (v0.15.0) for reliable API integration
- Add new deep_research MCP tool for comprehensive research queries
- Increase timeout to 10 minutes (600s) for long-running research
- Add PERPLEXITY_API_KEY environment variable configuration
- Update README with Perplexity setup instructions and tool documentation

## New Files

- clients/perplexity_client.py: Official SDK client integration
- tools/deep_research_tool.py: MCP tool for deep research
- tests/unit/clients/test_perplexity_client.py: Client unit tests (5 tests)
- tests/unit/tools/test_deep_research_tool.py: Tool unit tests (7 tests)
- tests/factories/mock_perplexity_response.py: Typed mock classes
- tests/factories/perplexity_response_factory.py: Factory pattern for tests
- test_perplexity.py: Integration test script

## Features

- Three research effort levels: low/medium/high (configurable by caller)
- Returns comprehensive report with citations intact
- Follows factory/builder pattern (no raw mocks in tests)
- Full unit test coverage (69 tests total, all passing)

## Tool Parameters

- query (required): Research question or topic
- research_effort (optional): "low"/"medium"/"high", default "high"
- max_results (optional): Max citations to return, default 5

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
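A hypothetical unit-test sketch in the spirit of the tests listed above, patching at the module under test per the repo guideline; the response key names, the `PERPLEXITY_API_KEY` patch target, and the pytest-asyncio marker are assumptions, not verified details of the PR's test suite.

```python
# Hypothetical test sketch; mirrors the described tests but is not the PR's code.
from unittest.mock import patch

import pytest

from tools.deep_research_tool import deep_research_tool


@pytest.mark.unit
@pytest.mark.asyncio  # assumes pytest-asyncio is configured
async def test_deep_research_tool_formats_report() -> None:
    report = "Python is a high-level programming language."
    citations = ["https://python.org"]
    with patch("tools.deep_research_tool.PERPLEXITY_API_KEY", "test-key"), patch(
        "tools.deep_research_tool.fetch_perplexity_deep_research",
        return_value=(report, citations),
    ):
        result = await deep_research_tool(query="What is Python?", research_effort="high")

    content = result["results"][0]["content"]  # assumed response shape
    assert "# Deep Research: What is Python?" in content
    assert "## Sources" in content
```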
Actionable comments posted: 4
🧹 Nitpick comments (1)
docker/clients/perplexity_client.py (1)
91-93: Use `logging.exception` for better error diagnostics. Replace `logging.error` with `logging.exception` to automatically include the exception traceback, which aids in debugging API failures. Apply this diff:

```diff
-    except Exception as e:
-        logger.error(f"Error fetching Perplexity deep research: {str(e)}")
+    except Exception as e:
+        logger.exception(f"Error fetching Perplexity deep research: {str(e)}")
         return "", []
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- README.md (1 hunks)
- docker/clients/perplexity_client.py (1 hunks)
- docker/test_deep_research_mcp.py (1 hunks)
- docker/test_perplexity.py (1 hunks)
- docker/test_perplexity_debug.py (1 hunks)
- docker/test_perplexity_minimal.py (1 hunks)
- docker/tests/factories/mock_perplexity_response.py (1 hunks)
- docker/tests/factories/perplexity_response_factory.py (1 hunks)
- docker/tests/unit/clients/test_perplexity_client.py (1 hunks)
- docker/tests/unit/tools/test_deep_research_tool.py (1 hunks)
- pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
- docker/tests/factories/mock_perplexity_response.py
- docker/test_perplexity.py
- docker/tests/unit/tools/test_deep_research_tool.py
- docker/tests/factories/perplexity_response_factory.py
- docker/test_perplexity_minimal.py
- docker/tests/unit/clients/test_perplexity_client.py
- docker/test_deep_research_mcp.py
- docker/test_perplexity_debug.py
- docker/clients/perplexity_client.py
docker/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
docker/tests/**/*.py: Mark tests with pytest markers: use @pytest.mark.unit for unit tests, @pytest.mark.integration for integration tests, and @pytest.mark.api for external API tests
Do not depend on external services in tests unless marked as integration (@pytest.mark.integration)
Patch external calls at the module under test level (e.g., @patch('mcp_server.requests.get')) rather than patching library modules directly
Avoid using raw MagicMock with property assignment; prefer typed mock classes, factories, or builders for test doubles
Files:
- docker/tests/factories/mock_perplexity_response.py
- docker/tests/unit/tools/test_deep_research_tool.py
- docker/tests/factories/perplexity_response_factory.py
- docker/tests/unit/clients/test_perplexity_client.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
- docker/tests/factories/mock_perplexity_response.py
- docker/test_perplexity.py
- docker/tests/unit/tools/test_deep_research_tool.py
- docker/tests/factories/perplexity_response_factory.py
- docker/test_perplexity_minimal.py
- docker/tests/unit/clients/test_perplexity_client.py
- docker/test_deep_research_mcp.py
- docker/test_perplexity_debug.py
- docker/clients/perplexity_client.py
{pyproject.toml,docker/Dockerfile}
📄 CodeRabbit inference engine (CLAUDE.md)
Use Python 3.11 exactly for the project runtime and tooling
Files:
pyproject.toml
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
Define tool configurations in pyproject.toml for Black, isort, mypy, pytest, and coverage
Files:
pyproject.toml
🧠 Learnings (2)
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/tests/test_mcp_server.py : Add unit tests for new MCP tools in docker/tests/test_mcp_server.py
Applied to files:
docker/test_deep_research_mcp.py
📚 Learning: 2025-10-10T01:00:46.506Z
Learnt from: CR
PR: Kode-Rex/webcat#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-10T01:00:46.506Z
Learning: Applies to docker/README.md : Document new MCP tools and changes in docker/README.md
Applied to files:
docker/test_deep_research_mcp.py
🧬 Code graph analysis (7)
docker/test_perplexity.py (1)
docker/tools/deep_research_tool.py (1)
deep_research_tool(22-122)
docker/tests/unit/tools/test_deep_research_tool.py (1)
docker/tools/deep_research_tool.py (1)
deep_research_tool(22-122)
docker/tests/factories/perplexity_response_factory.py (1)
docker/tests/factories/mock_perplexity_response.py (2)
- MockPerplexityClient (68-87)
- MockPerplexityResponse (47-65)
docker/tests/unit/clients/test_perplexity_client.py (3)
docker/clients/perplexity_client.py (1)
- fetch_perplexity_deep_research (16-93)

docker/tests/factories/perplexity_response_factory.py (7)
- PerplexityResponseFactory (16-91)
- successful_research (20-35)
- client_with_response (82-91)
- empty_response (38-44)
- api_error (73-79)
- with_many_citations (47-59)
- without_citations (62-70)

docker/tests/factories/mock_perplexity_response.py (1)
- create (81-87)
docker/test_deep_research_mcp.py (1)
docker/tests/factories/http_response_factory.py (1)
timeout(80-86)
docker/test_perplexity_debug.py (1)
docker/tests/factories/http_response_factory.py (1)
timeout(80-86)
docker/clients/perplexity_client.py (2)
docker/tests/factories/http_response_factory.py (1)
timeout(80-86)docker/tests/factories/mock_perplexity_response.py (1)
create(81-87)
🪛 GitHub Check: CodeQL
docker/test_perplexity.py
[failure] 34-34: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
docker/test_perplexity_minimal.py
[failure] 25-25: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
docker/test_perplexity_debug.py
[failure] 24-24: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
🪛 markdownlint-cli2 (0.18.1)
README.md
108-108: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.13.3)
docker/tests/factories/mock_perplexity_response.py
81-81: Unused method argument: kwargs
(ARG002)
docker/test_perplexity.py
18-18: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
22-22: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
65-65: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
72-72: Do not catch blind exception: Exception
(BLE001)
docker/tests/factories/perplexity_response_factory.py
22-22: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
docker/test_perplexity_minimal.py
63-63: Do not catch blind exception: Exception
(BLE001)
docker/tests/unit/clients/test_perplexity_client.py
72-72: Unused lambda argument: kwargs
(ARG005)
98-98: Unpacked variable report is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
docker/clients/perplexity_client.py
89-89: Consider moving this statement to an else block
(TRY300)
91-91: Do not catch blind exception: Exception
(BLE001)
92-92: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
92-92: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (11)
docker/tests/unit/tools/test_deep_research_tool.py (7)
15-47: LGTM! Comprehensive test coverage for the success path. The assertions validate all critical aspects: response structure, content formatting (title, effort marker, research report), and the Sources section with numbered citations.

49-62: LGTM! Properly validates error handling when the API key is not configured. The assertions confirm the error message guides users to set the `PERPLEXITY_API_KEY` environment variable.

64-80: LGTM! Correctly validates error handling when the API returns an empty response. The test ensures appropriate error messaging for API failures.

82-104: LGTM! Effectively verifies that default parameters (`max_results=5`, `research_effort="high"`) are correctly passed to the underlying fetch function when not explicitly provided.

106-125: LGTM! Validates that all three effort levels (`low`, `medium`, `high`) are correctly reflected in the formatted content output.

127-143: LGTM! Correctly validates that the Sources section is omitted when no citations are returned, preventing empty or misleading sections in the output.

145-161: LGTM! Validates snippet truncation for long reports, ensuring the snippet is limited to 500 characters plus an ellipsis (503 total) to prevent overwhelming the response structure.

docker/clients/perplexity_client.py (2)

16-47: LGTM! Function signature, docstring, and client initialization are well-structured. The 10-minute timeout (600 seconds) is appropriate for deep research operations that can take 2-4 minutes to complete.

49-89: LGTM! The API call structure and response extraction logic are correct. The citation handling (lines 74-79) appropriately handles both string citations and dict citations with a `url` key, ensuring robust parsing.

docker/tests/factories/perplexity_response_factory.py (1)

37-91: LGTM! The remaining factory methods are well-structured and provide comprehensive test double creation for various Perplexity API scenarios, including empty responses, many citations, no citations, errors, and custom client responses.

docker/tests/factories/mock_perplexity_response.py (1)

11-87: LGTM! All mock classes are well-structured and serve their purpose as test doubles. The `**kwargs` parameter in `MockPerplexityClient.create()` (line 81) is intentionally unused; it accepts any arguments but returns the preconfigured response, which is the correct behavior for a test mock.
| print("❌ PERPLEXITY_API_KEY not set") | ||
| sys.exit(1) | ||
|
|
||
| print(f"✅ API Key: {PERPLEXITY_API_KEY[:8]}...\n") |
Do not echo API key material.
This line prints the leading characters of PERPLEXITY_API_KEY, which still counts as secret disclosure. Remove the print or replace it with a non-revealing message so the key never appears in logs.
Suggested fix:
-print(f"✅ API Key: {PERPLEXITY_API_KEY[:8]}...\n")
+print("✅ PERPLEXITY_API_KEY detected.\n")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ API Key: {PERPLEXITY_API_KEY[:8]}...\n") | |
| print("✅ PERPLEXITY_API_KEY detected.\n") |
🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 24-24: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
🤖 Prompt for AI Agents
In docker/test_perplexity_debug.py around line 24, the code prints the first 8
characters of PERPLEXITY_API_KEY which leaks secret material; remove that print
or replace it with a non-sensitive message. Update the line to either delete the
print entirely or log a constant, non-revealing confirmation such as "API key
configured" or check presence with a boolean debug message (e.g.,
"PERPLEXITY_API_KEY set") without including any substring of the key.
| print("❌ PERPLEXITY_API_KEY not set") | ||
| sys.exit(1) | ||
|
|
||
| print(f"✅ API Key found: {PERPLEXITY_API_KEY[:8]}...") |
Stop logging API key fragments.
Printing even the first few characters of PERPLEXITY_API_KEY leaks secret material into logs/terminals. Treat API keys as fully sensitive—drop the print or replace it with a generic “key detected” message without exposing any portion of the value.
Apply this diff to eliminate the leak:
-print(f"✅ API Key found: {PERPLEXITY_API_KEY[:8]}...")
+print("✅ PERPLEXITY_API_KEY detected.")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ API Key found: {PERPLEXITY_API_KEY[:8]}...") | |
| print("✅ PERPLEXITY_API_KEY detected.") |
🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 25-25: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
🤖 Prompt for AI Agents
In docker/test_perplexity_minimal.py around line 25, the test currently prints
the first 8 characters of PERPLEXITY_API_KEY which leaks secret material; remove
the substring exposure and replace the print with a non-sensitive message (e.g.,
"API key detected" or no log at all). Update the code to not interpolate or
output any part of the key, ensuring logs only indicate presence/absence of the
key without revealing characters.
| print("\nGet your API key at: https://www.perplexity.ai/settings/api") | ||
| sys.exit(1) | ||
|
|
||
| print(f"✅ PERPLEXITY_API_KEY found: {PERPLEXITY_API_KEY[:8]}...") |
Avoid printing API key fragments.
Revealing the first characters of PERPLEXITY_API_KEY is still a secret leak. Replace this log with a neutral message confirming the key’s presence without exposing any portion of it.
Proposed change:
-print(f"✅ PERPLEXITY_API_KEY found: {PERPLEXITY_API_KEY[:8]}...")
+print("✅ PERPLEXITY_API_KEY detected.")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"✅ PERPLEXITY_API_KEY found: {PERPLEXITY_API_KEY[:8]}...") | |
| print("✅ PERPLEXITY_API_KEY detected.") |
🧰 Tools
🪛 GitHub Check: CodeQL
[failure] 34-34: Clear-text logging of sensitive information
This expression logs sensitive data (password) as clear text.
🤖 Prompt for AI Agents
In docker/test_perplexity.py around line 34, the code prints the first 8
characters of PERPLEXITY_API_KEY which leaks secret information; change the
print to a neutral confirmation like "PERPLEXITY_API_KEY found" (no key fragment
or any part of the key), or remove the print entirely and rely on a boolean
check or secure logger that does not expose secrets.
```python
@staticmethod
def successful_research(
    content: str = "Python is a high-level programming language.",
    citations: List[str] = None,
) -> MockPerplexityResponse:
    """Create successful deep research response.

    Args:
        content: Research report content
        citations: Citation URLs

    Returns:
        MockPerplexityResponse with research content
    """
    if citations is None:
        citations = ["https://python.org", "https://docs.python.org"]
    return MockPerplexityResponse(content=content, citations=citations)
```
🛠️ Refactor suggestion | 🟠 Major
Use explicit Optional type annotation.
Line 22 uses implicit Optional with citations: List[str] = None. Modern Python type checking requires explicit Optional[List[str]] or List[str] | None.
Apply this diff:
```diff
+from typing import List, Optional
+
 from tests.factories.mock_perplexity_response import (
     MockPerplexityClient,
     MockPerplexityResponse,
 )


 class PerplexityResponseFactory:
     """Factory for creating pre-configured Perplexity response mocks."""

     @staticmethod
     def successful_research(
         content: str = "Python is a high-level programming language.",
-        citations: List[str] = None,
+        citations: Optional[List[str]] = None,
     ) -> MockPerplexityResponse:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested change:

```python
from typing import List, Optional

from tests.factories.mock_perplexity_response import (
    MockPerplexityClient,
    MockPerplexityResponse,
)


class PerplexityResponseFactory:
    """Factory for creating pre-configured Perplexity response mocks."""

    @staticmethod
    def successful_research(
        content: str = "Python is a high-level programming language.",
        citations: Optional[List[str]] = None,
    ) -> MockPerplexityResponse:
        """Create successful deep research response.

        Args:
            content: Research report content
            citations: Citation URLs

        Returns:
            MockPerplexityResponse with research content
        """
        if citations is None:
            citations = ["https://python.org", "https://docs.python.org"]
        return MockPerplexityResponse(content=content, citations=citations)
```
🧰 Tools
🪛 Ruff (0.13.3)
22-22: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
In docker/tests/factories/perplexity_response_factory.py around lines 19 to 35,
the function signature uses a nullable default with citations: List[str] = None
which should be explicitly typed; change the parameter annotation to citations:
Optional[List[str]] = None (or citations: List[str] | None = None for Python
3.10+), and add the corresponding Optional import from typing (or update the
typing import line to include Optional / use PEP 604 union syntax and adjust
imports accordingly). Ensure no other runtime behavior changes—only update the
type annotation and imports.
- Disable Azure Functions workflow by renaming to .yml.disabled
- Increase MAX_CONTENT_LENGTH default from 50,000 to 1,000,000 chars
- Add validation for MAX_CONTENT_LENGTH environment variable
- Keep # noqa: E402 directives (required for imports after load_dotenv)
- Improve error logging in perplexity_client.py (use logger.exception)
- Update all documentation to reflect new default value

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (4)
docker/clients/perplexity_client.py (3)
56-59: Hardcoded max_results in system prompt doesn't reflect actual parameter. The system prompt mentions "Focus on returning {max_results} most relevant sources", but this is just guidance to the model. If `max_results=10` is passed and the API returns 20 citations, line 79's slice `[:max_results]` correctly limits it; however, the prompt's mention might confuse reviewers into thinking the model controls this limit when it's actually enforced post-response. Consider clarifying the prompt or removing the mention:

```diff
 "content": (
     "You are a comprehensive research assistant. Provide detailed, "
-    "well-researched answers with clear structure and citations. "
-    f"Focus on returning {max_results} most relevant sources."
+    "well-researched answers with clear structure and citations."
 ),
```
74-79: Fragile citation type handling could raise AttributeError. The code attempts to handle citations as either strings or dicts with `.get("url", "")`, but if `citation` is a dict-like object without a `get` method (e.g., a custom object), this will raise `AttributeError`. The `isinstance(citation, str)` check only guards the first branch. Apply this diff for more robust handling:

```diff
 if hasattr(response, "citations") and response.citations:
     citation_urls = [
-        citation if isinstance(citation, str) else citation.get("url", "")
+        (
+            citation
+            if isinstance(citation, str)
+            else getattr(citation, "url", citation.get("url", ""))
+            if hasattr(citation, "get")
+            else str(citation)
+        )
         for citation in response.citations
         if citation
     ][:max_results]
```

Or simplify by wrapping in try-except:

```diff
 if hasattr(response, "citations") and response.citations:
-    citation_urls = [
-        citation if isinstance(citation, str) else citation.get("url", "")
-        for citation in response.citations
-        if citation
-    ][:max_results]
+    citation_urls = []
+    for citation in response.citations[:max_results]:
+        if not citation:
+            continue
+        try:
+            url = citation if isinstance(citation, str) else citation.get("url", "")
+            if url:
+                citation_urls.append(url)
+        except (AttributeError, TypeError):
+            logger.warning(f"Skipping invalid citation format: {citation}")
```
89-93: Improve error handling based on static analysis hints. Ruff flags two issues:

- Line 89: the `return` statement could move to an `else` block for clarity (the success path returns inside the `if`, so the final return is only reached on empty response)
- Line 92: `logging.exception()` automatically includes the exception object, so `str(e)` is redundant

Apply this diff:

```diff
     logger.info(
         f"Perplexity deep research completed: {len(research_report)} chars, "
         f"{len(citation_urls)} citations, {token_count} tokens"
     )
+else:
+    logger.warning("Perplexity returned empty response")

 return research_report, citation_urls
 except Exception as e:
-    logger.exception(f"Error fetching Perplexity deep research: {str(e)}")
+    logger.exception("Error fetching Perplexity deep research")
     return "", []
```

Based on coding guidelines
README.md (1)
108-108: Fix bare URL flagged by markdownlint. Markdown linters prefer URLs in link syntax rather than bare URLs for consistency and accessibility. Apply this diff:

```diff
-| `PERPLEXITY_API_KEY` | *(none)* | Perplexity API key for deep research tool (optional, get at https://www.perplexity.ai/settings/api) |
+| `PERPLEXITY_API_KEY` | *(none)* | Perplexity API key for deep research tool (optional, get at [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api)) |
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- README.md (1 hunks)
- docker/.env.example (2 hunks)
- docker/README.md (2 hunks)
- docker/clients/perplexity_client.py (1 hunks)
- docker/constants.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- docker/.env.example
- docker/README.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
- docker/constants.py
- docker/clients/perplexity_client.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
- docker/constants.py
- docker/clients/perplexity_client.py
🪛 markdownlint-cli2 (0.18.1)
README.md
108-108: Bare URL used
(MD034, no-bare-urls)
🪛 Ruff (0.13.3)
docker/clients/perplexity_client.py
89-89: Consider moving this statement to an else block
(TRY300)
92-92: Use explicit conversion flag
Replace with conversion flag
(RUF010)
92-92: Redundant exception object included in logging.exception call
(TRY401)
🔇 Additional comments (2)
docker/clients/perplexity_client.py (1)
1-93: Verify PR scope: Is Perplexity integration part of PR #46? This entire file implements Perplexity deep research functionality, but PR #46's title and objectives focus solely on making `MAX_CONTENT_LENGTH` configurable (increasing from 8,000 to 50,000 chars). The Perplexity integration appears to be an unrelated feature. If this is intentional, please update the PR description to document this additional scope. If not, consider moving Perplexity changes to a separate PR for clearer change tracking and review.

README.md (1)

108-108: Verify PR scope: Is Perplexity documentation part of PR #46? Lines 108 and 123-128 document the Perplexity API integration, but PR #46's title and objectives focus solely on making `MAX_CONTENT_LENGTH` configurable. This appears to be unrelated scope creep. If Perplexity integration is intentional for this PR, please update the PR description. Otherwise, consider moving these documentation changes to a separate PR that includes the corresponding code changes.
Also applies to: 123-128
```python
try:
    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "1000000"))
except ValueError:
    MAX_CONTENT_LENGTH = 1000000
```
Critical: Default value contradicts PR objectives.
The PR title and objectives state increasing MAX_CONTENT_LENGTH from 8,000 to 50,000 characters, but this implementation uses 1,000,000 (1 million) as the default—20× higher than documented. This contradicts the PR summary, README.md, and docker/.env.example if they also claim 50,000.
Additionally, the current implementation has no validation:
- Negative or zero values are accepted silently
- Invalid values fall back without logging a warning
- No upper bound check to prevent memory issues
Apply this diff to align with PR objectives and add validation:
```diff
-try:
-    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "1000000"))
-except ValueError:
-    MAX_CONTENT_LENGTH = 1000000
+try:
+    MAX_CONTENT_LENGTH = int(os.environ.get("MAX_CONTENT_LENGTH", "50000"))
+    if MAX_CONTENT_LENGTH <= 0:
+        logger.warning(
+            f"Invalid MAX_CONTENT_LENGTH={MAX_CONTENT_LENGTH}, using default 50000"
+        )
+        MAX_CONTENT_LENGTH = 50000
+except ValueError as e:
+    logger.warning(
+        f"Invalid MAX_CONTENT_LENGTH format: {e}, using default 50000"
+    )
+    MAX_CONTENT_LENGTH = 50000
```

Note: You'll need to add `import logging` and `logger = logging.getLogger(__name__)` at the top of the file for the warning logs.
Committable suggestion skipped: line range outside the PR's diff.
| `PORT` | `8000` | Server port |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `LOG_DIR` | `/tmp` | Log file directory |
| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
Documentation contradicts PR objectives.
The documented default of 1,000,000 contradicts the PR title and objectives, which state increasing MAX_CONTENT_LENGTH from 8,000 to 50,000 characters. This same inconsistency appears in docker/constants.py.
Apply this diff:
```diff
-| `MAX_CONTENT_LENGTH` | `1000000` | Maximum characters to return per scraped article |
+| `MAX_CONTENT_LENGTH` | `50000` | Maximum characters to return per scraped article |
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| `MAX_CONTENT_LENGTH` | `50000` | Maximum characters to return per scraped article |
🤖 Prompt for AI Agents
In README.md around line 113 the documented default MAX_CONTENT_LENGTH is
1,000,000 which contradicts the PR objective to increase it from 8,000 to
50,000; update the README entry to show `50000` as the default and also update
docker/constants.py to set MAX_CONTENT_LENGTH = 50000 (and search the repo for
any other places documenting or hardcoding the old 1,000,000/8000 values to make
them consistent).
- Adjust test_truncates_content_exceeding_max_length to use content larger than 1M chars
- Previous 100K char test no longer triggers truncation with new 1M default

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Summary
- Increase `MAX_CONTENT_LENGTH` from 8,000 to 50,000 characters (default)
- Make `MAX_CONTENT_LENGTH` configurable via `.env` file
- Update READMEs with `MAX_CONTENT_LENGTH` documentation

Rationale
The previous 8,000 character limit was too aggressive and cut off article content prematurely. With the new 50,000 character default, AI agents receive fuller article content and can decide for themselves whether to compress it, rather than having it pre-truncated at 8K characters.
Changes
Code Changes
- docker/constants.py: Now reads `MAX_CONTENT_LENGTH` from environment with 50,000 default
- pyproject.toml: Removed unused dependencies (BeautifulSoup4, readability-lxml, html2text, lxml_html_clean)
- pyproject.toml: Updated mypy overrides to reflect current dependencies

Configuration
- docker/.env.example: Added `MAX_CONTENT_LENGTH=50000` with documentation
- README.md: Added `MAX_CONTENT_LENGTH` to environment variables table
- docker/README.md: Added `MAX_CONTENT_LENGTH` to environment variables section

Testing
Tested with teslarati.com article that was previously truncated:
All code formatted and linted successfully:
Test plan
- `make format` - All files formatted correctly
- `make lint` - No linting errors
- `make test`
- Different `MAX_CONTENT_LENGTH` values via `.env`

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Configuration
Documentation
Tests
Chores