
Can't start a basic crawl #25

@paulraben

Description


Bug Details

What operation were you trying to use?

  • Search
  • Map URLs
  • Scrape URL
  • Crawl Website
  • Get Crawl Status
  • Extract Data
  • Get Extract Status
  • Something else

What happened?

Two issues:

  1. Initial API validation error: the first 2-3 crawl attempts fail with a scrapeOptions.formats validation error (expected array, received object), even though the config only includes default empty headers. After a few retries, the crawl starts.

  2. Crawl never completes: once started, a crawl with limit: 5 remains in the "running" state indefinitely. The status never reaches "completed", no pages are returned, and the job must be stopped manually in the Firecrawl dashboard.

What did you expect to happen?

  • Crawl should complete within a reasonable time (limit: 5 pages)
  • Should return markdown content for each crawled page
  • Should respect the prompt-generated paths and excludePaths configuration
  • Status should change from "running" to "completed"

Error Message (if any)

{
  "success": false,
  "code": "BAD_REQUEST",
  "error": "Bad Request",
  "details": [
    {
      "code": "invalid_type",
      "expected": "array",
      "received": "object",
      "path": ["scrapeOptions", "formats"],
      "message": "Expected array, received object"
    },
    {
      "code": "unrecognized_keys",
      "keys": ["formats"],
      "path": [],
      "message": "Unrecognized key in body -- please review the v2 API documentation for request body changes"
    }
  ]
}
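The two Zod-style errors point the same way: the v2 API validates scrapeOptions.formats as an array of strings (e.g. ["markdown"]), while the node appears to send an object keyed by format name (the v1 shape). As a minimal sketch of the mismatch, here is a hypothetical normalizer (the helper name and the v1 object shape are assumptions for illustration, not part of the node's code):

```python
def normalize_formats(scrape_options: dict) -> dict:
    """Coerce a v1-style formats object into the v2 array form.

    Hypothetical helper: a v1-style client might send
    {"formats": {"markdown": True}}, while the v2 validator
    expects an array of strings such as ["markdown", "html"].
    """
    formats = scrape_options.get("formats")
    if isinstance(formats, dict):
        scrape_options = dict(scrape_options)  # don't mutate the caller's dict
        scrape_options["formats"] = [
            name for name, enabled in formats.items() if enabled
        ]
    return scrape_options

# v1-style object form is rewritten to the v2 array form
print(normalize_formats({"formats": {"markdown": True, "html": False}}))
# → {'formats': ['markdown']}
```

If the node is building the object form internally, a fix along these lines at the request-building layer would explain why retries sometimes succeed (e.g. when formats is omitted entirely).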

Environment

n8n Version

Node Version

@mendable/n8n-nodes-firecrawl v1

Configuration Used

{
  "operation": "crawl",
  "url": "={{ $json.companyWebsite }}",
  "prompt": "Only extract content related to recent developments at the company that can be used as a point of relevance in an outreach message. Do not extract generic information that is not of current concern to the company. It can be things mentioned on the home page, blog, news, press, about, why etc.",
  "limit": 5,
  "delay": 1000,
  "maxConcurrency": null,
  "excludePaths": {
    "items": [
      {
        "path": "data/*"
      }
    ]
  },
  "crawlOptions": {
    "allowSubdomains": true
  },
  "scrapeOptions": {
    "options": {
      "headers": {}
    }
  },
  "requestOptions": {
    "batching": {
      "batch": {
        "batchSize": 1,
        "batchInterval": 3000
      }
    }
  }
}
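Based on the validator's complaints, a v2-shaped crawl body would look roughly like the following. This is a sketch reconstructed from the error details, not taken from the v2 docs: the top-level placement of excludePaths and allowSubdomains, and "markdown" as the desired format, are assumptions; the URL is a placeholder for the expression above.

```json
{
  "url": "https://example.com",
  "limit": 5,
  "excludePaths": ["data/*"],
  "allowSubdomains": true,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```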

Note: The scrapeOptions only contains default empty headers, yet the error suggests formats is being sent incorrectly by the node.

Additional Context

  • The error suggests the node may be adding formats to scrapeOptions automatically, or there's a mismatch between what the node sends and what the v2 API expects
  • The crawl job is created successfully (returns job ID), but status polling shows it stays in "running" state indefinitely
  • The behavior is consistent: it happens every time, not intermittently
  • No workaround found: the crawl must be manually stopped in the Firecrawl dashboard
  • Node has retryOnFail: true and waitBetweenTries: 5000 configured

Main blocker: The crawl starts but never completes, even for small limits (5 pages), making the node unusable for crawl operations.
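Until the node is fixed, the stuck-in-"running" case can at least be bounded on the caller's side. This is a generic polling sketch, not the node's implementation: get_status stands in for whatever fetches the job status (e.g. a wrapper around the Firecrawl status endpoint), and the terminal status names are assumptions to check against the v2 docs.

```python
import time


def poll_crawl(get_status, timeout_s=120, interval_s=5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll a crawl job until it reaches a terminal status, with a hard timeout.

    get_status: callable returning the job's status string.
    Raises TimeoutError instead of spinning forever on a job
    that never leaves "running".
    """
    # Assumed terminal statuses; verify against the Firecrawl v2 docs.
    terminal = ("completed", "failed", "cancelled")
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = get_status()
        if status in terminal:
            return status
        sleep(interval_s)
    raise TimeoutError(f"crawl did not finish within {timeout_s}s")
```

On timeout the caller can then cancel the job explicitly (via the dashboard, or a cancel endpoint if one exists) rather than leaving it running indefinitely.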


Metadata

Labels: bug (Something isn't working)