-
Notifications
You must be signed in to change notification settings - Fork 1
test: Add integration test for Serper scrape functionality #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Verifies: - Serper scrape API client working correctly - Content scraper using Serper when API key available - Automatic fallback to Trafilatura when Serper unavailable All tests passing ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
WalkthroughAdds a standalone integration test script that exercises Serper-based scraping, content generation from a SearchResult, and a forced fallback to Trafilatura. Tests are gated by SERPER_API_KEY presence, print minimal previews, and aggregate pass/fail statuses via a main() entry point with process exit codes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User as CLI
participant Test as test_serper_scrape_integration.py
participant Env as Environment
participant Serper as Serper API
participant Trafilatura as Trafilatura
rect rgba(230,245,255,0.6)
note over User,Test: Orchestration via main()
User->>Test: python test_serper_scrape_integration.py
Test->>Test: run test_serper_scrape_client()
Test->>Env: read SERPER_API_KEY
alt Key present
Test->>Serper: scrape_webpage(url, key)
Serper-->>Test: content or error
Test->>Test: assert content length, preview
else Key missing
Test->>Test: skip test
end
end
rect rgba(235,255,235,0.6)
note over Test,Serper: Content from SearchResult
Test->>Test: run test_content_scraper_with_serper()
Test->>Env: read SERPER_API_KEY
alt Key present
Test->>Serper: scrape_search_result(result, key)
Serper-->>Test: formatted content
Test->>Test: assert non-empty, title/source checks
else Key missing
Test->>Test: skip test
end
end
rect rgba(255,240,230,0.6)
note over Test,Trafilatura: Forced fallback path
Test->>Test: run test_fallback_to_trafilatura()
Test->>Env: temporarily unset SERPER_API_KEY
Test->>Trafilatura: scrape_search_result(result) [fallback]
Trafilatura-->>Test: content
Test->>Env: restore SERPER_API_KEY
Test->>Test: assert non-empty, no "Error:"
end
Test-->>User: summary + exit code
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (2)
docker/test_serper_scrape_integration.py (2)
15-17: Remove unusednoqadirectives.Ruff reports these
noqa: E402directives are unused because E402 is not enabled in your configuration. You can safely remove them.Apply this diff:
-from clients.serper_client import scrape_webpage # noqa: E402 -from models.domain.search_result import SearchResult # noqa: E402 -from services.content_scraper import scrape_search_result # noqa: E402 +from clients.serper_client import scrape_webpage +from models.domain.search_result import SearchResult +from services.content_scraper import scrape_search_result
20-176: Consider using pytest for better test infrastructure.While the custom test runner works, pytest would provide:
- Built-in test discovery and execution
- Proper skip/xfail markers (
@pytest.mark.skipif)- Better assertion messages and failure reporting
- Fixtures for environment variable management
- Parallel test execution support
- Integration with CI/CD tools
This would eliminate the semantic confusion of returning
Truefor skipped tests and provide a more maintainable testing infrastructure.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker/test_serper_scrape_integration.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Format Python code with Black (line-length=88, target Python 3.11)
Sort imports with isort using profile="black"
Prefer absolute imports (e.g., from docker.mcp_server import ...) over relative imports
Require type hints for all function signatures (mypy-enforced)
Use Google-style docstrings for public functions
Keep maximum nesting depth to 3 levels or less; refactor into helper methods when deeper
One class per file (applies to production code and test infrastructure)
Extract helper methods for each logical concept; prefer multiple small, single-responsibility methods
Files:
docker/test_serper_scrape_integration.py
docker/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
All Python files in docker/ are included in coverage; increase tests instead of excluding files
Files:
docker/test_serper_scrape_integration.py
🧬 Code graph analysis (1)
docker/test_serper_scrape_integration.py (2)
docker/clients/serper_client.py (1)
scrape_webpage(71-117)docker/services/content_scraper.py (1)
scrape_search_result(86-178)
🪛 Ruff (0.13.3)
docker/test_serper_scrape_integration.py
15-15: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
16-16: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
17-17: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
48-48: Consider moving this statement to an else block
(TRY300)
50-50: Do not catch blind exception: Exception
(BLE001)
51-51: Use explicit conversion flag
Replace with conversion flag
(RUF010)
94-94: Consider moving this statement to an else block
(TRY300)
96-96: Do not catch blind exception: Exception
(BLE001)
97-97: Use explicit conversion flag
Replace with conversion flag
(RUF010)
131-131: Consider moving this statement to an else block
(TRY300)
133-133: Do not catch blind exception: Exception
(BLE001)
134-134: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (1)
docker/test_serper_scrape_integration.py (1)
178-179: LGTM!The main entry point follows Python best practices correctly.
| def test_serper_scrape_client(): | ||
| """Test the Serper scrape client directly.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing return type hint.
The coding guidelines require type hints for all function signatures. Add -> bool to the function signature.
As per coding guidelines.
Apply this diff:
-def test_serper_scrape_client():
+def test_serper_scrape_client() -> bool:
"""Test the Serper scrape client directly."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_serper_scrape_client(): | |
| """Test the Serper scrape client directly.""" | |
| def test_serper_scrape_client() -> bool: | |
| """Test the Serper scrape client directly.""" |
🤖 Prompt for AI Agents
In docker/test_serper_scrape_integration.py around lines 20 to 21, the test
function signature lacks a return type hint; update the function definition to
include the return annotation -> bool (i.e., def test_serper_scrape_client() ->
bool:) so it complies with the coding guidelines requiring type hints for all
function signatures.
| def test_content_scraper_with_serper(): | ||
| """Test the content scraper service with Serper integration.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing return type hint.
The coding guidelines require type hints for all function signatures. Add -> bool to the function signature.
As per coding guidelines.
Apply this diff:
-def test_content_scraper_with_serper():
+def test_content_scraper_with_serper() -> bool:
"""Test the content scraper service with Serper integration."""Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docker/test_serper_scrape_integration.py around lines 55 to 56, the test
function definition lacks the required return type hint; update the function
signature for test_content_scraper_with_serper to include the return type ->
bool (e.g. def test_content_scraper_with_serper() -> bool:), keeping the body
unchanged and ensuring any test framework expectations still work.
| import traceback | ||
|
|
||
| traceback.print_exc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Move traceback import to module level.
The coding guidelines prefer absolute imports at the module level. Move the traceback import to the top of the file with other imports.
As per coding guidelines.
Add this import at the module level (after line 10):
import os
import sys
+import tracebackThen remove the import from inside the function:
except Exception as e:
print(f"❌ Content scraper failed: {str(e)}")
- import traceback
-
traceback.print_exc()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import traceback | |
| traceback.print_exc() | |
| # ── At the top of docker/test_serper_scrape_integration.py, after the existing imports ── | |
| import os | |
| import sys | |
| import traceback |
| import traceback | |
| traceback.print_exc() | |
| # ── In the exception handler (replacing lines 98–100) ── | |
| except Exception as e: | |
| print(f"❌ Content scraper failed: {str(e)}") | |
| traceback.print_exc() |
🤖 Prompt for AI Agents
In docker/test_serper_scrape_integration.py around lines 98 to 100, move the
local "import traceback" up to the module-level imports (place it after line 10
with the other imports) and remove the inline import inside the function so that
traceback is imported once at the top of the file and traceback.print_exc()
continues to work without an in-function import.
| def test_fallback_to_trafilatura(): | ||
| """Test that Trafilatura fallback works when Serper is not available.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing return type hint.
The coding guidelines require type hints for all function signatures. Add -> bool to the function signature.
As per coding guidelines.
Apply this diff:
-def test_fallback_to_trafilatura():
+def test_fallback_to_trafilatura() -> bool:
"""Test that Trafilatura fallback works when Serper is not available."""Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In docker/test_serper_scrape_integration.py around lines 104 to 105, the
test_fallback_to_trafilatura function signature is missing a return type hint;
update the definition to include the return type -> bool so it conforms to the
coding guidelines requiring type hints for all functions.
| def main(): | ||
| """Run all integration tests.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing return type hint.
The coding guidelines require type hints for all function signatures. Add -> int to the function signature.
As per coding guidelines.
Apply this diff:
-def main():
+def main() -> int:
"""Run all integration tests."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def main(): | |
| """Run all integration tests.""" | |
| def main() -> int: | |
| """Run all integration tests.""" |
🤖 Prompt for AI Agents
In docker/test_serper_scrape_integration.py around lines 142-143, the main()
function is missing a return type hint; update the signature to include "-> int"
(i.e., change "def main():" to "def main() -> int:") and ensure the function
returns an integer status code (e.g., return 0) at the end so the annotated type
is satisfied.
Verifies:
All tests passing ✅
🤖 Generated with Claude Code
Summary by CodeRabbit