fix(firecrawl): align self-host template with v2.8.0 #769

Open
Manan-Santoki wants to merge 7 commits into Dokploy:canary from Manan-Santoki:fix/firecrawl-template-v2-8-0

@Manan-Santoki Manan-Santoki commented Apr 3, 2026

Summary

  • align the Firecrawl Dokploy template with the current upstream self-hosted architecture
  • pin Firecrawl source builds to v2.8.0 and switch startup to the upstream harness entrypoint
  • update the template env model for PostgreSQL and RabbitMQ, and refresh the meta.json entry

What changed

  • replaced the stale split-process Firecrawl deployment with the current harness-driven api service
  • added the missing rabbitmq dependency and kept redis, playwright-service, and nuq-postgres
  • changed Firecrawl-owned services from latest images to remote source builds pinned to v2.8.0
  • updated Dokploy template variables to use POSTGRES_*, NUQ_RABBITMQ_URL, and current optional Firecrawl envs
  • pinned the Firecrawl metadata version to v2.8.0
  • running the required meta processing also removed a pre-existing duplicate strapi entry from meta.json

Root cause

The existing Firecrawl template still launched the old runtime layout using node --import ./dist/src/otel.js .... Upstream Firecrawl now starts through node dist/src/harness.js --start-docker and expects RabbitMQ plus harness-managed workers, so the old Dokploy template crashed with ERR_MODULE_NOT_FOUND for /app/dist/src/otel.js.
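
As a rough illustration of the startup change described above, the harness-driven api service might be declared along these lines. The command string is taken from the description; the Git build context URL, the apps/api subdirectory, and the depends_on list are assumptions made for this sketch, not the PR's actual compose file.

```yaml
# Illustrative sketch only; not copied from the PR's docker-compose.yml.
services:
  api:
    # Assumption: remote Git build context pinned to the v2.8.0 tag.
    build:
      context: https://github.com/firecrawl/firecrawl.git#v2.8.0:apps/api
    # Entrypoint named in the description above; it replaces the legacy
    # otel.js-based startup that fails with ERR_MODULE_NOT_FOUND.
    command: node dist/src/harness.js --start-docker
    depends_on:
      - redis
      - rabbitmq
      - nuq-postgres
      - playwright-service
```

Pinning the build context to a tag rather than following latest means the entrypoint and the source tree cannot drift apart between deployments.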

Impact

Dokploy users should get a Firecrawl deployment shape that matches current upstream self-hosting instead of the broken legacy image startup path. The template is also now version-pinned instead of following latest.

Validation

  • node dedupe-and-sort-meta.js
  • docker compose -f blueprints/firecrawl/docker-compose.yml config
  • npm run validate-docker-compose -- --file ../blueprints/firecrawl/docker-compose.yml (from build-scripts/)
  • npm run validate-template -- --dir ../blueprints/firecrawl (from build-scripts/)
  • git diff --check

Greptile Summary

This PR aligns the Firecrawl blueprint with the v2.8.0 upstream self-hosted architecture, replacing the broken legacy split-process startup with the harness-driven layout, adding RabbitMQ, and pinning all source builds to the v2.8.0 git tag.

  • P1: The default NUQ_RABBITMQ_URL (amqp://rabbitmq:5672) omits credentials. RabbitMQ's built-in guest account is restricted to loopback connections, so the api container will receive ACCESS_REFUSED and fail to start. The rabbitmq service needs a configured user and the URL needs matching credentials.

Confidence Score: 3/5

Not safe to merge — the RabbitMQ URL missing credentials will prevent the api service from connecting to RabbitMQ on startup.

One P1 defect: the default AMQP URL has no credentials and RabbitMQ's guest account is blocked from non-localhost connections, meaning the api container will fail to start in a default Dokploy deployment. The rest of the changes are structurally correct and well-aligned with the upstream v2.8.0 layout.

blueprints/firecrawl/docker-compose.yml (rabbitmq service and NUQ_RABBITMQ_URL default) and blueprints/firecrawl/template.toml (NUQ_RABBITMQ_URL default on line 67)

Reviews (1): Last reviewed commit: "fix(firecrawl): default optional env var..."

Greptile also left 2 inline comments on this PR.

github-actions bot commented Apr 3, 2026

built with Refined Cloudflare Pages Action

⚡ Cloudflare Pages Deployment

| Name | Status | Preview | Last Commit |
| --- | --- | --- | --- |
| templates | ✅ Ready (View Log) | Visit Preview | d486003 |

@Manan-Santoki Manan-Santoki marked this pull request as ready for review April 4, 2026 00:21
Copilot AI review requested due to automatic review settings April 4, 2026 00:21
@dosubot dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Apr 4, 2026
@Manan-Santoki

[Three screenshots attached]

Tested on my Dokploy instance and the updated template works as expected.

Verified:

  • optional env vars are imported as empty values instead of literal placeholders
  • the template imports cleanly
  • deployment completes successfully
  • the app starts and responds normally

Attached screenshots from my test instance for reference. Let me know if you want me to verify any other edge cases.

Copilot AI left a comment

Pull request overview

Updates the firecrawl Dokploy blueprint to match upstream Firecrawl v2.8.0’s current self-hosted architecture (harness-driven API + required dependencies), and refreshes the associated metadata entry.

Changes:

  • Replaces the legacy split-process Firecrawl services with a single harness-driven api service pinned to v2.8.0.
  • Adds RabbitMQ as a required dependency and updates env wiring to POSTGRES_* and NUQ_RABBITMQ_URL.
  • Updates meta.json Firecrawl version/description and removes a pre-existing duplicate strapi entry during meta processing.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| meta.json | Pins Firecrawl metadata to v2.8.0, refreshes description, and removes a duplicate strapi entry. |
| blueprints/firecrawl/template.toml | Updates Dokploy variables/env model for the new upstream service layout (Postgres, RabbitMQ, optional provider/tuning envs). |
| blueprints/firecrawl/docker-compose.yml | Switches to upstream v2.8.0 build contexts and harness entrypoint; adds RabbitMQ + volumes and updates env defaults. |

POSTGRES_USER: ${POSTGRES_USER:-postgres}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-firecrawl}
POSTGRES_DB: ${POSTGRES_DB:-postgres}
NUQ_RABBITMQ_URL: ${NUQ_RABBITMQ_URL:-amqp://rabbitmq:5672}

Copilot AI Apr 4, 2026

NUQ_RABBITMQ_URL defaults to amqp://rabbitmq:5672 (no credentials). The RabbitMQ official image’s default guest/guest user is restricted to localhost by default, so the API container typically won’t be able to authenticate over the Docker network and Firecrawl/NuQ will fail to connect. Define a non-guest RabbitMQ user/password and include them in NUQ_RABBITMQ_URL (or build the URL from RABBITMQ_DEFAULT_USER/RABBITMQ_DEFAULT_PASS).

Suggested change
- NUQ_RABBITMQ_URL: ${NUQ_RABBITMQ_URL:-amqp://rabbitmq:5672}
+ RABBITMQ_DEFAULT_USER: ${RABBITMQ_DEFAULT_USER:-firecrawl}
+ RABBITMQ_DEFAULT_PASS: ${RABBITMQ_DEFAULT_PASS:-firecrawl}
+ NUQ_RABBITMQ_URL: ${NUQ_RABBITMQ_URL:-amqp://${RABBITMQ_DEFAULT_USER:-firecrawl}:${RABBITMQ_DEFAULT_PASS:-firecrawl}@rabbitmq:5672}

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +108
rabbitmq:
  image: rabbitmq:3.13-management
  restart: unless-stopped
  command: rabbitmq-server
  volumes:
    - rabbitmq_data:/var/lib/rabbitmq

Copilot AI Apr 4, 2026

The rabbitmq service doesn’t configure RABBITMQ_DEFAULT_USER / RABBITMQ_DEFAULT_PASS, which means it will fall back to the default guest account (localhost-only by default in the official image). Add explicit credentials (and optionally a vhost) so other containers can connect reliably, and align them with NUQ_RABBITMQ_URL.

Comment on lines +60 to +68
"POSTGRES_HOST=nuq-postgres",
"POSTGRES_PORT=5432",
"POSTGRES_USER=${postgres_user}",
"POSTGRES_PASSWORD=${postgres_password}",
"POSTGRES_DB=${postgres_db}",
"REDIS_URL=redis://redis:6379",
"REDIS_RATE_LIMIT_URL=redis://redis:6379",
"NUQ_RABBITMQ_URL=amqp://rabbitmq:5672",
"PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape",

Copilot AI Apr 4, 2026

template.toml hardcodes NUQ_RABBITMQ_URL=amqp://rabbitmq:5672 without credentials. With the default RabbitMQ image settings, this will usually attempt guest/guest and fail from non-localhost clients. Add template variables for RabbitMQ user/password (or generate a secret), set RABBITMQ_DEFAULT_USER/RABBITMQ_DEFAULT_PASS, and update NUQ_RABBITMQ_URL to include them.

Comment on lines +103 to +113
rabbitmq:
  image: rabbitmq:3.13-management
  restart: unless-stopped
  command: rabbitmq-server
  volumes:
    - rabbitmq_data:/var/lib/rabbitmq
  healthcheck:
    test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
    interval: 5s
    timeout: 5s
    retries: 3

P1 RabbitMQ guest user blocked from non-localhost connections

The default NUQ_RABBITMQ_URL (amqp://rabbitmq:5672) carries no credentials, so the AMQP client falls back to guest/guest. RabbitMQ 3.x blocks the built-in guest account from any non-loopback address by default — a container-to-container connection is never loopback, so the api service will receive ACCESS_REFUSED on startup.

Either expose a dedicated user via the rabbitmq service and include credentials in the URL:

rabbitmq:
  image: rabbitmq:3.13-management
  restart: unless-stopped
  environment:
    RABBITMQ_DEFAULT_USER: ${RABBITMQ_DEFAULT_USER:-firecrawl}
    RABBITMQ_DEFAULT_PASS: ${RABBITMQ_DEFAULT_PASS:-firecrawl}
  ...

And update the default URL:

NUQ_RABBITMQ_URL: ${NUQ_RABBITMQ_URL:-amqp://firecrawl:firecrawl@rabbitmq:5672}

The same change is needed in template.toml line 67.
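
One way to keep the broker credentials and the connection URL from drifting apart is to derive both from the same variables. The following is a sketch under that assumption: the RABBITMQ_USER and RABBITMQ_PASS variable names are hypothetical, the depends_on health gate is an addition not proposed in the review, and the api block omits everything unrelated to RabbitMQ.

```yaml
# Sketch only: the variable names RABBITMQ_USER / RABBITMQ_PASS are hypothetical.
services:
  rabbitmq:
    image: rabbitmq:3.13-management
    restart: unless-stopped
    environment:
      # The official image applies these only when the data volume is still empty.
      RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER:-firecrawl}
      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASS:-firecrawl}
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 3

  api:
    environment:
      # Built from the same variables, so the URL matches the broker user.
      NUQ_RABBITMQ_URL: amqp://${RABBITMQ_USER:-firecrawl}:${RABBITMQ_PASS:-firecrawl}@rabbitmq:5672
    depends_on:
      rabbitmq:
        # Start the api container only after the healthcheck above passes.
        condition: service_healthy
```

If the password can contain characters such as @, : or /, it would need to be percent-encoded before being placed in the URL, which is one argument for generating an alphanumeric secret in template.toml.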

BLOCK_MEDIA: ${BLOCK_MEDIA:-}
NO_PROXY: ${NO_PROXY:-localhost,127.0.0.1,redis,nuq-postgres,playwright-service,host.docker.internal}

ALLOW_LOCAL_WEBHOOKS: ${ALLOW_LOCAL_WEBHOOKS:-false}

P2 ALLOW_LOCAL_WEBHOOKS does not belong in playwright-service

ALLOW_LOCAL_WEBHOOKS is a Firecrawl API-level flag that guards outbound webhook delivery. The playwright browser service has no webhook logic and does not read this variable. Its presence here is harmless but misleading — it creates the impression that removing it from the api service env block would still be covered, which it would not be.

Suggested change
- ALLOW_LOCAL_WEBHOOKS: ${ALLOW_LOCAL_WEBHOOKS:-false}
+ BLOCK_MEDIA: ${BLOCK_MEDIA:-false}
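
To make the intended placement concrete, the split described in this comment could look roughly like the fragment below. It is a sketch that shows only the variables under discussion and assumes the service names api and playwright-service used elsewhere in this PR.

```yaml
# Sketch only: each block lists just the variables discussed in this comment.
services:
  api:
    environment:
      # API-level flag guarding outbound webhook delivery, per the review comment.
      ALLOW_LOCAL_WEBHOOKS: ${ALLOW_LOCAL_WEBHOOKS:-false}

  playwright-service:
    environment:
      # Browser-side setting; the webhook flag is intentionally absent here.
      BLOCK_MEDIA: ${BLOCK_MEDIA:-}
```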

2 participants