
feat(v2): migrate Go SDK from Firecrawl API v1 to v2 #1

Open
ArmandoHerra wants to merge 33 commits into main from feat/sdk-v2-api-migration

ArmandoHerra commented Mar 15, 2026

Summary

Complete migration of the firecrawl-go SDK from Firecrawl API v1 to v2, reaching feature parity with the Python SDK for all core endpoints (browser sessions, watcher monitoring, and the agent endpoint are deferred to Phase 6). This PR delivers 31 commits across 5 implementation phases: all existing endpoints migrated to v2, 3 new endpoint groups (Search, Batch Scrape, Extract), a typed error system, security hardening, 167 unit tests, 32 E2E tests, and comprehensive documentation.

Migration Phase — Complete (11/11 specs)

Phase 1: Foundation

  • MIG-01 — Fix 6 foundation bugs (retry counter, defer leak, body reuse, error ordering)
  • MIG-02 — Split monolithic 838-line firecrawl.go into 14 modular files
  • MIG-03 — CI/CD pipeline: Makefile, golangci-lint v2, GitHub Actions (Go 1.23/1.24/1.25), Dependabot
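The defer-leak and body-reuse fixes in MIG-01 imply a particular retry-loop shape, sketched below. This is a minimal illustration, not the SDK's `makeRequest`: the helper `postWithRetry`, the retry-on-502 policy, and the fixed backoff are assumptions made for the example. It shows the attempt counter starting at 0, a fresh reader over the same pre-marshaled body on every attempt, and each discarded response body closed inside the loop instead of deferred.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// postWithRetry is a hypothetical helper, not the SDK's makeRequest.
func postWithRetry(ctx context.Context, client *http.Client, url string, body []byte, maxRetries int, backoff time.Duration) (*http.Response, error) {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		// New reader per attempt: reusing one reader would send an empty body on retries.
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")

		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		// Assumed policy for this sketch: only 502 responses are retried.
		if resp.StatusCode != http.StatusBadGateway || attempt == maxRetries {
			return resp, nil // the caller closes the final body
		}
		// Drain and close now; a defer here would hold every connection until return.
		_, _ = io.Copy(io.Discard, resp.Body)
		_ = resp.Body.Close()

		// Wait for the backoff, or stop early if the context is cancelled.
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
	}
	return nil, errors.New("retry loop exited unexpectedly")
}

// demo runs postWithRetry against a local server that fails twice with 502
// and returns every request body the server saw plus the final status code.
func demo() ([]string, int, error) {
	var mu sync.Mutex
	var bodies []string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		mu.Lock()
		bodies = append(bodies, string(b))
		n := len(bodies)
		mu.Unlock()
		if n < 3 {
			w.WriteHeader(http.StatusBadGateway)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := postWithRetry(context.Background(), srv.Client(), srv.URL,
		[]byte(`{"url":"https://example.com"}`), 3, 10*time.Millisecond)
	if err != nil {
		return nil, 0, err
	}
	_ = resp.Body.Close()

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), bodies...), resp.StatusCode, nil
}

func main() {
	bodies, status, err := demo()
	fmt.Println(len(bodies), status, err) // prints: 3 200 <nil>
}
```

Closing inside the loop (rather than `defer` in the loop body) is what keeps connections from piling up across retries.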

Phase 2: Core Migration (v1 to v2)

  • MIG-04 — 31+ v2 type definitions (LocationConfig, WebhookConfig, ActionConfig, MapLink, SearchParams, BatchScrapeParams, ExtractParams, PaginationConfig, etc.)
  • MIG-05 — context.Context on all public methods with context-aware polling
  • MIG-06 — ScrapeURL to /v2/scrape with typed struct marshaling
  • MIG-07 — CrawlURL/AsyncCrawlURL to /v2/crawl with shared buildCrawlRequest helper
  • MIG-08 — monitorJobStatus updated for v2 statuses (scraping/completed/failed)
  • MIG-09 — MapURL to /v2/map with MapLink response objects
  • MIG-10/11 — Verification checkpoints: all paths on /v2, zero /v1 references, all request bodies marshaled from typed structs

Improvement Phase — Complete (13/16 specs, 3 P2 deferred)

Phase 3: Testing and Security

  • IMP-04 — Typed error system: APIError struct, 8 sentinel errors, errors.Is/errors.As support
  • IMP-05 — Security hardening: SSRF-preventing URL validation, UUID job ID sanitization, HTTPS warning, APIKey unexported with accessor
  • IMP-06 — Unit test foundation: newMockServer, respondJSON, ptr helpers
  • IMP-07 — 97 unit tests for all existing methods (scrape, crawl, map, helpers, errors, types, security)
  • IMP-15 — HTTP client improvements: SDKVersion, User-Agent header, NewFirecrawlAppWithOptions, WithTimeout/WithTransport/WithUserAgent

Phase 4: New v2 Endpoints

  • IMP-01 — Search endpoint (POST /v2/search) with typed SearchResponse
  • IMP-02 — Batch Scrape endpoints: BatchScrapeURLs (sync), AsyncBatchScrapeURLs, CheckBatchScrapeStatus with pagination
  • IMP-03 — Extract endpoints: Extract (sync), AsyncExtract, CheckExtractStatus with "processing" status polling
  • IMP-08 — Unit tests for all new endpoints (155+ total)
  • IMP-10 — PaginationConfig wired into CheckCrawlStatus/CheckBatchScrapeStatus, GetCrawlStatusPage/GetBatchScrapeStatusPage public methods

Phase 5: Documentation and E2E

  • IMP-09 — Integration tests modernized for v2, 9 new E2E tests (32 total)
  • IMP-11 — Comprehensive README rewrite, CONTRIBUTING.md
  • IMP-12 — CHANGELOG.md (Keep a Changelog format), .env.example update

Phase 6: Advanced Features (P2 — deferred)

  • IMP-13 — Browser session management
  • IMP-14 — Watcher/WebSocket monitoring
  • IMP-16 — Agent endpoint

Breaking Changes

  • All public methods require context.Context as first parameter
  • CrawlParams: MaxDepth to MaxDiscoveryDepth, AllowBackwardLinks to CrawlEntireDomain, IgnoreSitemap to Sitemap enum
  • CrawlParams.Webhook: *string to *WebhookConfig
  • MapResponse.Links: []string to []MapLink
  • ScrapeParams.ParsePDF removed (use Parsers []ParserConfig)
  • FirecrawlApp.APIKey unexported (use APIKey() accessor)
  • Search signature: (query, *any) to (ctx, query, *SearchParams) returning *SearchResponse
  • Minimum Go version: 1.22 to 1.23

Public API (14 methods)

| Method | Endpoint | Status |
|---|---|---|
| ScrapeURL | POST /v2/scrape | Migrated |
| CrawlURL | POST /v2/crawl (sync) | Migrated |
| AsyncCrawlURL | POST /v2/crawl (async) | Migrated |
| CheckCrawlStatus | GET /v2/crawl/{id} | Migrated + PaginationConfig |
| CancelCrawlJob | DELETE /v2/crawl/{id} | Migrated |
| MapURL | POST /v2/map | Migrated |
| Search | POST /v2/search | NEW |
| BatchScrapeURLs | POST /v2/batch/scrape (sync) | NEW |
| AsyncBatchScrapeURLs | POST /v2/batch/scrape (async) | NEW |
| CheckBatchScrapeStatus | GET /v2/batch/scrape/{id} | NEW + PaginationConfig |
| Extract | POST /v2/extract (sync) | NEW |
| AsyncExtract | POST /v2/extract (async) | NEW |
| CheckExtractStatus | GET /v2/extract/{id} | NEW |
| GetCrawlStatusPage / GetBatchScrapeStatusPage | GET (next URL) | NEW |

Test Coverage

| Category | Count |
|---|---|
| Unit tests | 167 |
| E2E tests | 32 |
| Total test functions | 199 |
| Lint issues | 0 |
| Go versions tested | 1.23, 1.24, 1.25 |

File Structure (27 Go files)

firecrawl-go/
├── client.go / client_options.go  # App struct, constructors, options
├── types.go                       # 31+ v2 type definitions
├── scrape.go / crawl.go / map.go  # Migrated endpoints
├── search.go / batch.go / extract.go  # New v2 endpoints
├── errors.go                      # APIError + 8 sentinel errors
├── helpers.go / options.go        # HTTP transport, retry, polling
├── security.go                    # URL validation, ID sanitization
├── *_test.go (13 files)           # 167 unit + 32 E2E tests
├── Makefile / .golangci.yml       # Dev tooling
├── .github/workflows/ci.yml       # CI pipeline
└── README.md / CONTRIBUTING.md / CHANGELOG.md / LICENSE

- Fix monitorJobStatus retry counter starting at threshold (3→0)
- Fix defer resp.Body.Close() connection leak in retry loop
- Fix request body being consumed on the first attempt, which made retries send an empty body
- Fix ScrapeURL checking Success before handling the unmarshal error
- Fix ScrapeOptions gate that only checked the Formats field
- Remove dead commented-out v0 extractor code
- Split 838-line firecrawl.go into 9 domain-specific files
- client.go: struct, constructor, headers
- types.go: all request/response type definitions
- scrape.go, crawl.go, map.go, search.go: endpoint methods
- errors.go, helpers.go, options.go: internal utilities
- Zero logic changes — pure structural refactor
- Add Makefile with build, test, lint, fmt, vet, coverage, and check targets
- Add .golangci.yml with errcheck, govet, bodyclose, noctx, gosec linters
- Add GitHub Actions CI workflow (lint + test matrix Go 1.22/1.23 + integration)
- Add Dependabot config for gomod and github-actions ecosystems
- Add .editorconfig for consistent editor settings
- Delete legacy firecrawl_test.go_V0
ArmandoHerra self-assigned this Mar 15, 2026
ArmandoHerra added the enhancement (New feature or request) label Mar 15, 2026
- Add //go:build integration tag to gate E2E tests behind -tags=integration
- Replace init() with TestMain for graceful skip when .env is missing
- Fix gofumpt formatting in crawl.go
- Use http.NewRequestWithContext to satisfy noctx linter
- Use errors.New for dynamic format string to satisfy staticcheck SA1006
- Disable fieldalignment in govet config (structs rewritten in MIG-04)
- Skip 80% coverage gate when coverage is 0.0% (no test files ran)
- Threshold activates automatically once unit tests are added
…-1.25

- Bump actions/checkout v4 → v5, actions/setup-go v5 → v6
- Bump golangci-lint-action v6 → v7
- Add Go 1.24 and 1.25 to test matrix
- Use Go 1.25 for lint and integration jobs
…rom matrix

- Add version: "2" to .golangci.yml for golangci-lint v2.x compatibility
- Move linters-settings under linters.settings per v2 schema
- Drop Go 1.22 from test matrix (EOL, keep 1.23-1.25)
- golangci-lint v2 treats gofumpt as a formatter, not a linter
- Move from linters.enable to formatters.enable per v2 schema
- Check resp.Body.Close() return values to satisfy errcheck
- Refactor monitorJobStatus status chain to switch statement (QF1003)
…names

- Rewrite types.go with 31 v2 type definitions (ScrapeParams, CrawlParams,
  MapParams, SearchParams, BatchScrapeParams, ExtractParams, WebhookConfig,
  LocationConfig, ActionConfig, ParserConfig, MapLink, PaginationConfig, etc.)
- Rename CrawlParams fields: MaxDepth→MaxDiscoveryDepth,
  AllowBackwardLinks→CrawlEntireDomain, IgnoreSitemap→Sitemap enum,
  Webhook *string→*WebhookConfig
- Change MapResponse.Links from []string to []MapLink
- Remove ParsePDF from ScrapeParams, replace with Parsers []ParserConfig
- Add v2 scrape options: Mobile, Location, Actions, Proxy, BlockAds, etc.
- Bump go.mod minimum Go version to 1.23

BREAKING CHANGE: CrawlParams, MapParams, ScrapeParams, and MapResponse have
renamed/removed/added fields per Firecrawl API v2.
…lpers

- Add ctx context.Context as first parameter to all 7 public methods
- Add ctx to makeRequest and monitorJobStatus internal helpers
- Use http.NewRequestWithContext(ctx, ...) for request creation
- Replace time.Sleep with context-aware select in polling loop
- Check ctx.Err() at loop boundaries for fast cancellation
- Update integration tests with context.Background()

BREAKING CHANGE: All public methods now require context.Context as first
parameter. Callers must pass context.Background() or a derived context.
- Change ScrapeURL endpoint from /v1/scrape to /v2/scrape
- Replace map[string]any body with typed scrapeRequest struct
- Refactor makeRequest to accept pre-marshaled []byte instead of map[string]any
- Update all makeRequest callers (crawl, map) to marshal at call site
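Replacing the `map[string]any` body with a typed request struct and passing pre-marshaled `[]byte` to `makeRequest` can be illustrated with a trimmed-down struct. This is a sketch: the field set, the JSON names, and the helpers `marshalBody` and `ptr` are assumptions, and `Formats` is simplified to plain strings. Pointer fields with `omitempty` keep unset options out of the JSON while still sending an explicit `false`, and a marshal error surfaces before any network call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scrapeRequest is a hypothetical, trimmed-down typed request body.
type scrapeRequest struct {
	URL             string   `json:"url"`
	Formats         []string `json:"formats,omitempty"`
	OnlyMainContent *bool    `json:"onlyMainContent,omitempty"`
	Mobile          *bool    `json:"mobile,omitempty"`
	Timeout         *int     `json:"timeout,omitempty"`
}

// ptr returns a pointer to v, for optional fields.
func ptr[T any](v T) *T { return &v }

// marshalBody produces the []byte a makeRequest-style helper would accept.
func marshalBody(req scrapeRequest) ([]byte, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("marshal scrape request: %w", err)
	}
	return b, nil
}

func main() {
	// Unset options are omitted entirely.
	b, _ := marshalBody(scrapeRequest{URL: "https://example.com"})
	fmt.Println(string(b)) // prints: {"url":"https://example.com"}

	// A non-nil pointer to false is still sent.
	b, _ = marshalBody(scrapeRequest{
		URL:             "https://example.com",
		Formats:         []string{"markdown"},
		OnlyMainContent: ptr(false),
	})
	fmt.Println(string(b))
	// prints: {"url":"https://example.com","formats":["markdown"],"onlyMainContent":false}
}
```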
…aling

- Change CrawlURL, AsyncCrawlURL, CheckCrawlStatus, CancelCrawlJob to /v2/crawl
- Update monitorJobStatus polling path to /v2/crawl/{id}
- Replace map[string]any body with typed crawlRequest struct
- Extract shared buildCrawlRequest helper to eliminate duplication
- Replace v1 polling statuses (active, paused, pending, queued, waiting)
  with single v2 "scraping" status
- Add explicit "failed" case for v2 failure handling
- Change the default case to report "unknown crawl status" for unexpected values
- Change MapURL endpoint from /v1/map to /v2/map
- Replace map[string]any body with typed mapRequest struct
- Response uses MapLink objects per v2 API (from MIG-04)
…t structure

- Replace v1 method signatures with v2 (context.Context, renamed fields)
- Add project structure, Makefile targets, CI pipeline docs
- Add configuration, development setup, and testing sections
- Update usage examples for ScrapeURL, CrawlURL, MapURL with v2 params
- Define 8 sentinel errors for programmatic error handling
- Add APIError struct with StatusCode, Message, Action fields
- Implement Unwrap() for errors.Is/errors.As support
- Update handleError to return *APIError wrapping sentinels
- Use ErrNoAPIKey in NewFirecrawlApp constructor
…IKey

- Add security.go with SSRF-preventing pagination URL validation
- Add UUID-format job ID validation to prevent path injection
- Unexport APIKey field to apiKey, add APIKey() accessor method
- Add String() method with key redaction for safe logging
- Add HTTPS enforcement warning for non-localhost HTTP URLs
- Wire validations into CheckCrawlStatus, CancelCrawlJob, monitorJobStatus
- Add 14 unit tests for all security functions

BREAKING CHANGE: FirecrawlApp.APIKey is now unexported. Use app.APIKey() instead.
- Create testhelpers_test.go with newMockServer, respondJSON, ptr helpers
- Add client_test.go with constructor and env fallback tests
- Add errors_test.go with handleError status code and APIError tests
- Add scrape_test.go with ScrapeURL success, params, and error tests
…tests)

- Add crawl_test.go with 11 tests for CrawlURL, AsyncCrawlURL, Check/Cancel
- Add map_test.go with 5 tests for MapURL success, params, and errors
- Add helpers_test.go with 4 tests for makeRequest retry and context
- Add types_test.go with 8 tests for StringOrStringSlice unmarshaling
- Add search_test.go with stub verification test
- Extend scrape_test.go with 6 tests (all params, errors, context cancel)
- Extend client_test.go with 4 tests (env fallback, timeout config)
…sion

- Add SDKVersion constant and User-Agent header on all requests
- Add ClientOption functional options (WithTimeout, WithTransport, etc.)
- Add NewFirecrawlAppWithOptions constructor with configurable transport
- Clone DefaultTransport for connection pool tuning
- Add 13 unit tests for options and User-Agent behavior
- Replace stub with full POST /v2/search implementation
- Define searchRequest struct with all v2 search params
- Return typed *SearchResponse with web/images/news results
- Add 6 unit tests covering success, params, and error cases
- Add BatchScrapeURLs (sync with polling), AsyncBatchScrapeURLs, CheckBatchScrapeStatus
- Add monitorBatchScrapeStatus internal poller with context-aware polling
- Include validateJobID and validatePaginationURL security checks
- Add 21 unit tests covering all batch scrape operations
- Add AsyncExtract, Extract (sync with polling), CheckExtractStatus
- Add monitorExtractStatus with "processing" status polling
- Include validateJobID security check on status endpoints
- Add 17 unit tests covering all extract operations
- Add TestSearch_RateLimited for 429 sentinel error handling
- Add TestSearch_ContextCancelled for pre-cancelled context
- IMP-01/02/03 already shipped 44 tests, exceeding the 34-test target
- Wire PaginationConfig into CheckCrawlStatus and CheckBatchScrapeStatus
- Add GetCrawlStatusPage and GetBatchScrapeStatusPage public methods
- Implement auto-pagination with MaxPages, MaxResults, MaxWaitTime limits
- Validate pagination URLs against API host (SSRF prevention)
- Add 12 unit tests for pagination behavior
…E tests

- Fix v1 field names (MaxDepth, AllowBackwardLinks, IgnoreSitemap) in E2E tests
- Update Map tests to use MapLink response objects
- Add 9 new E2E tests for Search, BatchScrape, Extract, and PaginationConfig
- Total: 32 E2E tests (was 23), all async-only for fast CI
- Document all 14 public methods across 6 endpoint groups
- Add Search, Batch Scrape, Extract usage examples
- Add Error Handling section (APIError, sentinel errors, errors.Is/errors.As)
- Add Client Options section (NewFirecrawlAppWithOptions, WithTimeout, etc.)
- Add PaginationConfig and Security sections
- Create CONTRIBUTING.md with setup, workflow, and style guide
…files

- Create CHANGELOG.md with proper Added/Changed/Fixed/Removed sections
- Delete informal changelog.md (replaced by CHANGELOG.md)
- Update README project structure reference
- Update .env.example with runtime and test variable documentation