Summary
When using smart_crawl_url with a sitemap.xml URL, the crawling process successfully fetches and processes all pages but fails to store any content in the Supabase database. Single page crawling works correctly.
Environment
- Docker Image: mcp/crawl4ai-rag (latest)
- Database: Supabase
- MCP Server: Latest version
Steps to Reproduce
- Start the Docker container with a working Supabase configuration
- Call `smart_crawl_url` with a sitemap.xml URL
- Observe that the crawling logs show successful page processing
- Check the Supabase database: no new content is stored
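As a sanity check that is independent of the crawler, the sitemap-parsing step can be reproduced with the Python standard library. This is a minimal sketch, not code from crawl4ai-rag; the `parse_sitemap` helper and the sample XML are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative two-entry sitemap in the standard sitemaps.org format.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page1</loc></url>
  <url><loc>https://example.com/page2</loc></url>
</urlset>"""

def parse_sitemap(xml_text: str) -> list[str]:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    # Sitemap elements carry the sitemaps.org namespace, so namespaced
    # tags look like "{http://...}loc"; match on the local name suffix.
    return [el.text.strip() for el in root.iter()
            if el.tag.endswith("}loc") and el.text]

urls = parse_sitemap(SITEMAP_XML)
print(urls)
```

If this yields the expected URL list for the sitemap in question, the parsing stage is fine and the failure lies downstream, consistent with the logs below.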
Expected Behavior
- Sitemap crawling should store all crawled pages in the Supabase database
- Should generate embeddings and store them in the `crawled_pages` table
- Should update the `sources` table with crawl metadata
Actual Behavior
- ✅ Sitemap parsing works correctly
- ✅ Page fetching and scraping works (shows [COMPLETE] status)
- ❌ No database storage operations occur
- ❌ No embedding generation API calls
- ❌ No content appears in Supabase
Evidence
Single Page Crawling (WORKS):

```
[COMPLETE] ● https://example.com/page
POST /api.openai.com/v1/embeddings
POST /supabase.co/crawled_pages
```

Sitemap Crawling (BROKEN):

```
[COMPLETE] ● https://example.com/page1
[COMPLETE] ● https://example.com/page2
[COMPLETE] ● https://example.com/page3
```
Test Case
- Working: `crawl_single_page("https://python.langchain.com/docs/concepts/output_parsers/")`
- Broken: `smart_crawl_url("https://python.langchain.com/sitemap.xml")`
Additional Context
The bug appears to be in the storage logic for sitemap crawling specifically. All other functionality works correctly, and the crawler can process hundreds of pages but fails to persist any of them.
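For reference, a purely hypothetical sketch (not the actual crawl4ai-rag code; all names are made up) of the kind of logic error that produces exactly these symptoms: the single-page path calls the storage step on its result, while the batch (sitemap) path collects crawl results but never hands them to storage.

```python
stored: list[str] = []  # stand-in for the Supabase crawled_pages table

def store(content: str) -> None:
    # Stand-in for embedding generation + Supabase insert.
    stored.append(content)

def crawl_single_page(url: str) -> None:
    content = f"content of {url}"  # pretend-crawl
    store(content)                 # single-page path persists its result

def smart_crawl_sitemap(urls: list[str]) -> list[str]:
    # Every page "crawls" successfully (matching the [COMPLETE] logs)...
    results = [f"content of {u}" for u in urls]
    return results  # ...but nothing in this path ever calls store()

crawl_single_page("https://example.com/page")
smart_crawl_sitemap(["https://example.com/a", "https://example.com/b"])
print(len(stored))  # only the single-page result was persisted
```

Whether the real cause is a missing storage call, an early return, or a filter that drops batch results, the observable behavior is the same: successful crawl logs with zero rows written.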