
Commit f5ee579

dependabot[bot], github-actions[bot], and ntindle authored
chore(backend/deps): Bump firecrawl-py from 2.16.3 to 4.3.1 in /autogpt_platform/backend (#10809)
Bumps [firecrawl-py](https://github.com/firecrawl/firecrawl) from 2.16.3 to 4.3.1. See the full diff in the [compare view](https://github.com/firecrawl/firecrawl/commits).

[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.

Dependabot commands and options (trigger them by commenting on this PR):

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it; you can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

---

> [!NOTE]
> Upgrade firecrawl-py to v4.3.6 and refactor the firecrawl blocks to the new v4 API: formats handling, method names, and response fields.
>
> - **Dependencies**
>   - Bump `firecrawl-py` from `2.16.3` to `4.3.6` (adds `httpx`, updates `pydantic>=2`).
> - **Firecrawl API migration**
>   - Centralize `ScrapeFormat` in `backend/blocks/firecrawl/_api.py`.
>   - Add `_format_utils.convert_to_format_options` to map `ScrapeFormat` (incl. `screenshot@fullPage`) to v4 `FormatOption`/`ScreenshotFormat`.
>   - Switch to v4 types (`firecrawl.v2.types.ScrapeOptions`); adopt snake_case fields (`only_main_content`, `max_age`, `wait_for`).
>   - Rename methods: `crawl_url` → `crawl`, `scrape_url` → `scrape`, `map_url` → `map`.
>   - Normalize response attributes: `rawHtml` → `raw_html`, `changeTracking` → `change_tracking`.
> - **Blocks**
>   - `crawl.py`, `scrape.py`, `search.py`: use the new formats conversion and updated options/fields; adjust iteration over results (`search`: iterate `web` when present).
>   - `map.py`: return both `links` and detailed `results` (url/title/description) and update the output schema accordingly.
> - **Project files**
>   - Update `pyproject.toml` and `poetry.lock` for the new dependency versions.
>
> Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit d872f2e. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).

> **Note**
> Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

---

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <[email protected]>
Co-authored-by: Nicholas Tindle <[email protected]>
1 parent 57a06f7 commit f5ee579

File tree

9 files changed: +93 −73 lines

autogpt_platform/backend/backend/blocks/firecrawl/_api.py

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+from enum import Enum
+
+
+class ScrapeFormat(Enum):
+    MARKDOWN = "markdown"
+    HTML = "html"
+    RAW_HTML = "rawHtml"
+    LINKS = "links"
+    SCREENSHOT = "screenshot"
+    SCREENSHOT_FULL_PAGE = "screenshot@fullPage"
+    JSON = "json"
+    CHANGE_TRACKING = "changeTracking"
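
As a quick illustration of the enum just centralized here (a sketch only, assuming the AutoGPT backend package is importable), the members round-trip Firecrawl's wire-format strings:

```python
# Hypothetical usage sketch; assumes the repo's backend package is on the path.
from backend.blocks.firecrawl._api import ScrapeFormat

# Look up an enum member from Firecrawl's wire-format string ...
fmt = ScrapeFormat("rawHtml")
assert fmt is ScrapeFormat.RAW_HTML

# ... and read the wire-format string back out for an API call.
print(fmt.value)                                 # "rawHtml"
print(ScrapeFormat.SCREENSHOT_FULL_PAGE.value)   # "screenshot@fullPage"
```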
autogpt_platform/backend/backend/blocks/firecrawl/_format_utils.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+"""Utility functions for converting between our ScrapeFormat enum and firecrawl FormatOption types."""
+
+from typing import List
+
+from firecrawl.v2.types import FormatOption, ScreenshotFormat
+
+from backend.blocks.firecrawl._api import ScrapeFormat
+
+
+def convert_to_format_options(
+    formats: List[ScrapeFormat],
+) -> List[FormatOption]:
+    """Convert our ScrapeFormat enum values to firecrawl FormatOption types.
+
+    Handles special cases like screenshot@fullPage which needs to be converted
+    to a ScreenshotFormat object.
+    """
+    result: List[FormatOption] = []
+
+    for format_enum in formats:
+        if format_enum.value == "screenshot@fullPage":
+            # Special case: convert to ScreenshotFormat with full_page=True
+            result.append(ScreenshotFormat(type="screenshot", full_page=True))
+        else:
+            # Regular string literals
+            result.append(format_enum.value)
+
+    return result
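
Below is a small usage sketch, assuming firecrawl-py v4 and the backend package are importable; the expected result follows directly from the function above (the exact `ScreenshotFormat` repr may differ):

```python
# Hypothetical usage sketch; assumes firecrawl-py v4 and the backend package are installed.
from backend.blocks.firecrawl._api import ScrapeFormat
from backend.blocks.firecrawl._format_utils import convert_to_format_options

options = convert_to_format_options(
    [ScrapeFormat.MARKDOWN, ScrapeFormat.SCREENSHOT_FULL_PAGE, ScrapeFormat.CHANGE_TRACKING]
)

# Plain formats stay as strings; the full-page screenshot becomes a typed object,
# roughly: ["markdown", ScreenshotFormat(type="screenshot", full_page=True), "changeTracking"]
print(options)
```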

autogpt_platform/backend/backend/blocks/firecrawl/crawl.py

Lines changed: 11 additions & 22 deletions
@@ -1,8 +1,9 @@
-from enum import Enum
 from typing import Any

-from firecrawl import FirecrawlApp, ScrapeOptions
+from firecrawl import FirecrawlApp
+from firecrawl.v2.types import ScrapeOptions

+from backend.blocks.firecrawl._api import ScrapeFormat
 from backend.sdk import (
     APIKeyCredentials,
     Block,
@@ -14,21 +15,10 @@
 )

 from ._config import firecrawl
-
-
-class ScrapeFormat(Enum):
-    MARKDOWN = "markdown"
-    HTML = "html"
-    RAW_HTML = "rawHtml"
-    LINKS = "links"
-    SCREENSHOT = "screenshot"
-    SCREENSHOT_FULL_PAGE = "screenshot@fullPage"
-    JSON = "json"
-    CHANGE_TRACKING = "changeTracking"
+from ._format_utils import convert_to_format_options


 class FirecrawlCrawlBlock(Block):
-
     class Input(BlockSchema):
         credentials: CredentialsMetaInput = firecrawl.credentials_field()
         url: str = SchemaField(description="The URL to crawl")
@@ -78,18 +68,17 @@ def __init__(self):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-
         app = FirecrawlApp(api_key=credentials.api_key.get_secret_value())

         # Sync call
-        crawl_result = app.crawl_url(
+        crawl_result = app.crawl(
             input_data.url,
             limit=input_data.limit,
             scrape_options=ScrapeOptions(
-                formats=[format.value for format in input_data.formats],
-                onlyMainContent=input_data.only_main_content,
-                maxAge=input_data.max_age,
-                waitFor=input_data.wait_for,
+                formats=convert_to_format_options(input_data.formats),
+                only_main_content=input_data.only_main_content,
+                max_age=input_data.max_age,
+                wait_for=input_data.wait_for,
             ),
         )
         yield "data", crawl_result.data
@@ -101,14 +90,14 @@ async def run(
             elif f == ScrapeFormat.HTML:
                 yield "html", data.html
             elif f == ScrapeFormat.RAW_HTML:
-                yield "raw_html", data.rawHtml
+                yield "raw_html", data.raw_html
             elif f == ScrapeFormat.LINKS:
                 yield "links", data.links
             elif f == ScrapeFormat.SCREENSHOT:
                 yield "screenshot", data.screenshot
             elif f == ScrapeFormat.SCREENSHOT_FULL_PAGE:
                 yield "screenshot_full_page", data.screenshot
             elif f == ScrapeFormat.CHANGE_TRACKING:
-                yield "change_tracking", data.changeTracking
+                yield "change_tracking", data.change_tracking
             elif f == ScrapeFormat.JSON:
                 yield "json", data.json

autogpt_platform/backend/backend/blocks/firecrawl/extract.py

Lines changed: 0 additions & 2 deletions
@@ -20,7 +20,6 @@

 @cost(BlockCost(2, BlockCostType.RUN))
 class FirecrawlExtractBlock(Block):
-
     class Input(BlockSchema):
         credentials: CredentialsMetaInput = firecrawl.credentials_field()
         urls: list[str] = SchemaField(
@@ -53,7 +52,6 @@ def __init__(self):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-
         app = FirecrawlApp(api_key=credentials.api_key.get_secret_value())

         extract_result = app.extract(

autogpt_platform/backend/backend/blocks/firecrawl/map.py

Lines changed: 19 additions & 5 deletions
@@ -1,3 +1,5 @@
+from typing import Any
+
 from firecrawl import FirecrawlApp

 from backend.sdk import (
@@ -14,14 +16,16 @@


 class FirecrawlMapWebsiteBlock(Block):
-
     class Input(BlockSchema):
         credentials: CredentialsMetaInput = firecrawl.credentials_field()

         url: str = SchemaField(description="The website url to map")

     class Output(BlockSchema):
-        links: list[str] = SchemaField(description="The links of the website")
+        links: list[str] = SchemaField(description="List of URLs found on the website")
+        results: list[dict[str, Any]] = SchemaField(
+            description="List of search results with url, title, and description"
+        )

     def __init__(self):
         super().__init__(
@@ -35,12 +39,22 @@ def __init__(self):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-
         app = FirecrawlApp(api_key=credentials.api_key.get_secret_value())

         # Sync call
-        map_result = app.map_url(
+        map_result = app.map(
             url=input_data.url,
         )

-        yield "links", map_result.links
+        # Convert SearchResult objects to dicts
+        results_data = [
+            {
+                "url": link.url,
+                "title": link.title,
+                "description": link.description,
+            }
+            for link in map_result.links
+        ]
+
+        yield "links", [link.url for link in map_result.links]
+        yield "results", results_data

autogpt_platform/backend/backend/blocks/firecrawl/scrape.py

Lines changed: 6 additions & 18 deletions
@@ -1,8 +1,8 @@
-from enum import Enum
 from typing import Any

 from firecrawl import FirecrawlApp

+from backend.blocks.firecrawl._api import ScrapeFormat
 from backend.sdk import (
     APIKeyCredentials,
     Block,
@@ -14,21 +14,10 @@
 )

 from ._config import firecrawl
-
-
-class ScrapeFormat(Enum):
-    MARKDOWN = "markdown"
-    HTML = "html"
-    RAW_HTML = "rawHtml"
-    LINKS = "links"
-    SCREENSHOT = "screenshot"
-    SCREENSHOT_FULL_PAGE = "screenshot@fullPage"
-    JSON = "json"
-    CHANGE_TRACKING = "changeTracking"
+from ._format_utils import convert_to_format_options


 class FirecrawlScrapeBlock(Block):
-
     class Input(BlockSchema):
         credentials: CredentialsMetaInput = firecrawl.credentials_field()
         url: str = SchemaField(description="The URL to crawl")
@@ -78,12 +67,11 @@ def __init__(self):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-
         app = FirecrawlApp(api_key=credentials.api_key.get_secret_value())

-        scrape_result = app.scrape_url(
+        scrape_result = app.scrape(
             input_data.url,
-            formats=[format.value for format in input_data.formats],
+            formats=convert_to_format_options(input_data.formats),
             only_main_content=input_data.only_main_content,
             max_age=input_data.max_age,
             wait_for=input_data.wait_for,
@@ -96,14 +84,14 @@ async def run(
             elif f == ScrapeFormat.HTML:
                 yield "html", scrape_result.html
             elif f == ScrapeFormat.RAW_HTML:
-                yield "raw_html", scrape_result.rawHtml
+                yield "raw_html", scrape_result.raw_html
             elif f == ScrapeFormat.LINKS:
                 yield "links", scrape_result.links
             elif f == ScrapeFormat.SCREENSHOT:
                 yield "screenshot", scrape_result.screenshot
             elif f == ScrapeFormat.SCREENSHOT_FULL_PAGE:
                 yield "screenshot_full_page", scrape_result.screenshot
             elif f == ScrapeFormat.CHANGE_TRACKING:
-                yield "change_tracking", scrape_result.changeTracking
+                yield "change_tracking", scrape_result.change_tracking
             elif f == ScrapeFormat.JSON:
                 yield "json", scrape_result.json

autogpt_platform/backend/backend/blocks/firecrawl/search.py

Lines changed: 10 additions & 20 deletions
@@ -1,8 +1,9 @@
-from enum import Enum
 from typing import Any

-from firecrawl import FirecrawlApp, ScrapeOptions
+from firecrawl import FirecrawlApp
+from firecrawl.v2.types import ScrapeOptions

+from backend.blocks.firecrawl._api import ScrapeFormat
 from backend.sdk import (
     APIKeyCredentials,
     Block,
@@ -14,21 +15,10 @@
 )

 from ._config import firecrawl
-
-
-class ScrapeFormat(Enum):
-    MARKDOWN = "markdown"
-    HTML = "html"
-    RAW_HTML = "rawHtml"
-    LINKS = "links"
-    SCREENSHOT = "screenshot"
-    SCREENSHOT_FULL_PAGE = "screenshot@fullPage"
-    JSON = "json"
-    CHANGE_TRACKING = "changeTracking"
+from ._format_utils import convert_to_format_options


 class FirecrawlSearchBlock(Block):
-
     class Input(BlockSchema):
         credentials: CredentialsMetaInput = firecrawl.credentials_field()
         query: str = SchemaField(description="The query to search for")
@@ -61,19 +51,19 @@ def __init__(self):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-
         app = FirecrawlApp(api_key=credentials.api_key.get_secret_value())

         # Sync call
         scrape_result = app.search(
             input_data.query,
             limit=input_data.limit,
             scrape_options=ScrapeOptions(
-                formats=[format.value for format in input_data.formats],
-                maxAge=input_data.max_age,
-                waitFor=input_data.wait_for,
+                formats=convert_to_format_options(input_data.formats) or None,
+                max_age=input_data.max_age,
+                wait_for=input_data.wait_for,
             ),
         )
         yield "data", scrape_result
-        for site in scrape_result.data:
-            yield "site", site
+        if hasattr(scrape_result, "web") and scrape_result.web:
+            for site in scrape_result.web:
+                yield "site", site

autogpt_platform/backend/poetry.lock

Lines changed: 6 additions & 5 deletions
Some generated files are not rendered by default.

autogpt_platform/backend/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ aioclamd = "^1.0.0"
 setuptools = "^80.9.0"
 gcloud-aio-storage = "^9.5.0"
 pandas = "^2.3.1"
-firecrawl-py = "^2.16.3"
+firecrawl-py = "^4.3.6"
 exa-py = "^1.14.20"
 croniter = "^6.0.0"
 stagehand = "^0.5.1"
