Sitemap Crawling Doesn't Store Content in Database #58

@Haqbani

Summary

When smart_crawl_url is called with a sitemap.xml URL, the crawl successfully fetches and processes every page but stores none of the content in the Supabase database. Single-page crawling works correctly.

Environment

  • Docker Image: mcp/crawl4ai-rag (latest)
  • Database: Supabase
  • MCP Server: Latest version

Steps to Reproduce

  1. Start Docker container with proper Supabase configuration
  2. Call smart_crawl_url with a sitemap.xml URL
  3. Observe crawling logs show successful page processing
  4. Check Supabase database - no new content stored
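Step 4 can be checked without the dashboard by asking Supabase's REST endpoint for an exact row count before and after the crawl. The following is a minimal sketch, not part of the project. It assumes the table is reachable at /rest/v1/crawled_pages and that the key you pass is allowed to read it; the helper names are hypothetical.

```python
import urllib.request


def build_count_request(base_url: str, api_key: str, table: str) -> urllib.request.Request:
    """Build a HEAD request that asks the REST API for an exact row count."""
    url = f"{base_url.rstrip('/')}/rest/v1/{table}?select=*"
    headers = {
        "apikey": api_key,
        "Authorization": f"Bearer {api_key}",
        "Prefer": "count=exact",  # total is returned in the Content-Range header
    }
    return urllib.request.Request(url, headers=headers, method="HEAD")


def parse_total(content_range: str) -> int:
    """Extract the total from a Content-Range value such as '0-24/3573' or '*/0'."""
    total = content_range.rsplit("/", 1)[-1]
    if not total.isdigit():
        raise ValueError(f"no total in Content-Range: {content_range!r}")
    return int(total)


def count_rows(base_url: str, api_key: str, table: str = "crawled_pages") -> int:
    """Return the number of rows in `table` (performs a network request)."""
    request = build_count_request(base_url, api_key, table)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_total(response.headers.get("Content-Range", ""))
```

Call count_rows(...) once before the sitemap crawl and once after it; with the behavior described in this issue, the two numbers would be identical.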

Expected Behavior

  • Sitemap crawling should store all crawled pages in the Supabase database
  • It should generate embeddings and store them in the crawled_pages table
  • It should update the sources table with crawl metadata

Actual Behavior

  • ✅ Sitemap parsing works correctly
  • ✅ Page fetching and scraping works (shows [COMPLETE] status)
  • ❌ No database storage operations occur
  • ❌ No embedding generation API calls
  • ❌ No content appears in Supabase

Evidence

Single Page Crawling (WORKS):

[COMPLETE] ● https://example.com/page
POST /api.openai.com/v1/embeddings
POST /supabase.co/crawled_pages

Sitemap Crawling (BROKEN):

[COMPLETE] ● https://example.com/page1
[COMPLETE] ● https://example.com/page2
[COMPLETE] ● https://example.com/page3
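The difference between the two excerpts above can be made explicit by counting line types in a captured log. This is a rough sketch that assumes log lines look like the ones quoted in this issue; real container output may be formatted differently, and the function name is hypothetical.

```python
import re

# Patterns are based only on the log lines quoted in this issue (an assumption).
COMPLETE = re.compile(r"^\[COMPLETE\]")
EMBEDDING = re.compile(r"^POST\s+\S*/v1/embeddings")
STORAGE = re.compile(r"^POST\s+\S*/crawled_pages")


def summarize_log(text: str) -> dict:
    """Count completed pages, embedding requests, and storage requests in a log."""
    counts = {"completed": 0, "embedding_calls": 0, "storage_calls": 0}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if COMPLETE.match(line):
            counts["completed"] += 1
        elif EMBEDDING.match(line):
            counts["embedding_calls"] += 1
        elif STORAGE.match(line):
            counts["storage_calls"] += 1
    return counts
```

A log with a non-zero completed count and zero embedding_calls and storage_calls matches the broken sitemap case reported here.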

Test Case

  • Working: crawl_single_page("https://python.langchain.com/docs/concepts/output_parsers/")
  • Broken: smart_crawl_url("https://python.langchain.com/sitemap.xml")
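To narrow the repro, it may help to pull a handful of URLs out of the sitemap and crawl them one at a time with crawl_single_page for comparison. The sketch below only parses sitemap XML using the standard sitemap namespace; it is not taken from the project, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str, limit: Optional[int] = None) -> List[str]:
    """Return the <loc> values from sitemap XML, optionally truncated to `limit`.

    For a sitemap index file, the returned values are the child sitemap URLs
    rather than page URLs.
    """
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return urls if limit is None else urls[:limit]
```

If the first few URLs store correctly through crawl_single_page but not through smart_crawl_url, that points at the sitemap code path rather than at the pages themselves.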

Additional Context

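The bug appears to be specific to the storage logic for sitemap crawling. Because neither embedding requests nor database writes show up in the logs, the storage step seems never to be reached on that path. All other functionality works correctly: the crawler can process hundreds of pages but fails to persist any of them.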
