Commit eba3eff

Merge remote-tracking branch 'origin/master' into only-apply-timeout-to-request-handler
2 parents fb85108 + 1ae351e commit eba3eff

115 files changed, +6689 -2712 lines changed

.github/workflows/build_and_deploy_docs.yaml

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ on:

 env:
   NODE_VERSION: 20
-  PYTHON_VERSION: 3.13
+  PYTHON_VERSION: 3.14

 jobs:
   build_and_deploy_docs:
@@ -24,13 +24,13 @@ jobs:

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6
         with:
           token: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}
           ref: ${{ github.event_name == 'workflow_call' && inputs.ref || github.ref }}

       - name: Set up Node
-        uses: actions/setup-node@v5
+        uses: actions/setup-node@v6
         with:
           node-version: ${{ env.NODE_VERSION }}

.github/workflows/release.yaml

Lines changed: 3 additions & 3 deletions
@@ -47,21 +47,21 @@ jobs:
     name: Lint check
     uses: apify/workflows/.github/workflows/python_lint_check.yaml@main
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   type_check:
     name: Type check
     uses: apify/workflows/.github/workflows/python_type_check.yaml@main
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   unit_tests:
     name: Unit tests
     uses: apify/workflows/.github/workflows/python_unit_tests.yaml@main
     secrets:
       httpbin_url: ${{ secrets.APIFY_HTTPBIN_TOKEN && format('https://httpbin.apify.actor?token={0}', secrets.APIFY_HTTPBIN_TOKEN) || 'https://httpbin.org'}}
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   update_changelog:
     name: Update changelog

.github/workflows/run_code_checks.yaml

Lines changed: 4 additions & 3 deletions
@@ -21,22 +21,23 @@ jobs:
     name: Lint check
     uses: apify/workflows/.github/workflows/python_lint_check.yaml@main
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   type_check:
     name: Type check
     uses: apify/workflows/.github/workflows/python_type_check.yaml@main
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   unit_tests:
     name: Unit tests
     uses: apify/workflows/.github/workflows/python_unit_tests.yaml@main
     secrets:
       httpbin_url: ${{ secrets.APIFY_HTTPBIN_TOKEN && format('https://httpbin.apify.actor?token={0}', secrets.APIFY_HTTPBIN_TOKEN) || 'https://httpbin.org'}}
     with:
-      python-versions: '["3.10", "3.11", "3.12", "3.13"]'
+      python-versions: '["3.10", "3.11", "3.12", "3.13", "3.14"]'

   docs_check:
     name: Docs check
     uses: apify/workflows/.github/workflows/python_docs_check.yaml@main
+    secrets: inherit

.github/workflows/templates_e2e_tests.yaml

Lines changed: 3 additions & 3 deletions
@@ -7,7 +7,7 @@ on:

 env:
   NODE_VERSION: 22
-  PYTHON_VERSION: 3.13
+  PYTHON_VERSION: 3.14

 jobs:
   end_to_end_tests:
@@ -24,10 +24,10 @@ jobs:

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v5
+        uses: actions/checkout@v6

       - name: Setup node
-        uses: actions/setup-node@v5
+        uses: actions/setup-node@v6
         with:
           node-version: ${{ env.NODE_VERSION }}

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ htmlcov
 # IDE, editors
 .vscode
 .idea
+*~
 .DS_Store
 .nvim.lua
 Session.vim

CHANGELOG.md

Lines changed: 44 additions & 3 deletions
@@ -3,14 +3,55 @@
 All notable changes to this project will be documented in this file.

 <!-- git-cliff-unreleased-start -->
-## 1.0.3 - **not yet released**
+## 1.1.1 - **not yet released**

 ### 🐛 Bug Fixes

-- Add support for Pydantic v2.12 ([#1471](https://github.com/apify/crawlee-python/pull/1471)) ([35c1108](https://github.com/apify/crawlee-python/commit/35c110878c2f445a2866be2522ea8703e9b371dd)) by [@Mantisus](https://github.com/Mantisus), closes [#1464](https://github.com/apify/crawlee-python/issues/1464)
+- Unify separators in `unique_key` construction ([#1569](https://github.com/apify/crawlee-python/pull/1569)) ([af46a37](https://github.com/apify/crawlee-python/commit/af46a3733b059a8052489296e172f005def953f7)) by [@vdusek](https://github.com/vdusek), closes [#1512](https://github.com/apify/crawlee-python/issues/1512)
+- Fix `same-domain` strategy ignoring public suffix ([#1572](https://github.com/apify/crawlee-python/pull/1572)) ([3d018b2](https://github.com/apify/crawlee-python/commit/3d018b21a28a4bee493829783057188d6106a69b)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1571](https://github.com/apify/crawlee-python/issues/1571)
+- Make context helpers work in `FailedRequestHandler` and `ErrorHandler` ([#1570](https://github.com/apify/crawlee-python/pull/1570)) ([b830019](https://github.com/apify/crawlee-python/commit/b830019350830ac33075316061659e2854f7f4a5)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1532](https://github.com/apify/crawlee-python/issues/1532)
+- Fix non-ASCII character corruption in `FileSystemStorageClient` on systems without UTF-8 default encoding ([#1580](https://github.com/apify/crawlee-python/pull/1580)) ([f179f86](https://github.com/apify/crawlee-python/commit/f179f8671b0b6af9264450e4fef7e49d1cecd2bd)) by [@Mantisus](https://github.com/Mantisus), closes [#1579](https://github.com/apify/crawlee-python/issues/1579)


 <!-- git-cliff-unreleased-end -->
+## [1.1.0](https://github.com/apify/crawlee-python/releases/tag/v1.1.0) (2025-11-18)
+
+### 🚀 Features
+
+- Add `chrome` `BrowserType` for `PlaywrightCrawler` to use the Chrome browser ([#1487](https://github.com/apify/crawlee-python/pull/1487)) ([b06937b](https://github.com/apify/crawlee-python/commit/b06937bbc3afe3c936b554bfc503365c1b2c526b)) by [@Mantisus](https://github.com/Mantisus), closes [#1071](https://github.com/apify/crawlee-python/issues/1071)
+- Add `RedisStorageClient` based on Redis v8.0+ ([#1406](https://github.com/apify/crawlee-python/pull/1406)) ([d08d13d](https://github.com/apify/crawlee-python/commit/d08d13d39203c24ab61fe254b0956d6744db3b5f)) by [@Mantisus](https://github.com/Mantisus)
+- Add support for Python 3.14 ([#1553](https://github.com/apify/crawlee-python/pull/1553)) ([89e9130](https://github.com/apify/crawlee-python/commit/89e9130cabee0fbc974b29c26483b7fa0edf627c)) by [@Mantisus](https://github.com/Mantisus)
+- Add `transform_request_function` parameter for `SitemapRequestLoader` ([#1525](https://github.com/apify/crawlee-python/pull/1525)) ([dc90127](https://github.com/apify/crawlee-python/commit/dc901271849b239ba2a947e8ebff8e1815e8c4fb)) by [@Mantisus](https://github.com/Mantisus)
+
+### 🐛 Bug Fixes
+
+- Improve indexing of the `request_queue_records` table for `SqlRequestQueueClient` ([#1527](https://github.com/apify/crawlee-python/pull/1527)) ([6509534](https://github.com/apify/crawlee-python/commit/65095346a9d8b703b10c91e0510154c3c48a4176)) by [@Mantisus](https://github.com/Mantisus), closes [#1526](https://github.com/apify/crawlee-python/issues/1526)
+- Improve error handling for `RobotsTxtFile.load` ([#1524](https://github.com/apify/crawlee-python/pull/1524)) ([596a311](https://github.com/apify/crawlee-python/commit/596a31184914a254b3e7a81fd2f48ea8eda7db49)) by [@Mantisus](https://github.com/Mantisus)
+- Fix `crawler_runtime` not being updated during run and only in the end ([#1540](https://github.com/apify/crawlee-python/pull/1540)) ([0d6c3f6](https://github.com/apify/crawlee-python/commit/0d6c3f6d3337ddb6cab4873747c28cf95605d550)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1541](https://github.com/apify/crawlee-python/issues/1541)
+- Ensure persist state event emission when exiting `EventManager` context ([#1562](https://github.com/apify/crawlee-python/pull/1562)) ([6a44f17](https://github.com/apify/crawlee-python/commit/6a44f172600cbcacebab899082d6efc9105c4e03)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1560](https://github.com/apify/crawlee-python/issues/1560)
+
+
+## [1.0.4](https://github.com/apify/crawlee-python/releases/tag/v1.0.4) (2025-10-24)
+
+### 🐛 Bug Fixes
+
+- Respect `enqueue_strategy` in `enqueue_links` ([#1505](https://github.com/apify/crawlee-python/pull/1505)) ([6ee04bc](https://github.com/apify/crawlee-python/commit/6ee04bc08c50a70f2e956a79d4ce5072a726c3a8)) by [@Mantisus](https://github.com/Mantisus), closes [#1504](https://github.com/apify/crawlee-python/issues/1504)
+- Exclude incorrect links before checking `robots.txt` ([#1502](https://github.com/apify/crawlee-python/pull/1502)) ([3273da5](https://github.com/apify/crawlee-python/commit/3273da5fee62ec9254666b376f382474c3532a56)) by [@Mantisus](https://github.com/Mantisus), closes [#1499](https://github.com/apify/crawlee-python/issues/1499)
+- Resolve compatibility issue between `SqlStorageClient` and `AdaptivePlaywrightCrawler` ([#1496](https://github.com/apify/crawlee-python/pull/1496)) ([ce172c4](https://github.com/apify/crawlee-python/commit/ce172c425a8643a1d4c919db4f5e5a6e47e91deb)) by [@Mantisus](https://github.com/Mantisus), closes [#1495](https://github.com/apify/crawlee-python/issues/1495)
+- Fix `BasicCrawler` statistics persistence ([#1490](https://github.com/apify/crawlee-python/pull/1490)) ([1eb1c19](https://github.com/apify/crawlee-python/commit/1eb1c19aa6f9dda4a0e3f7eda23f77a554f95076)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1501](https://github.com/apify/crawlee-python/issues/1501)
+- Save context state in result for `AdaptivePlaywrightCrawler` after isolated processing in `SubCrawler` ([#1488](https://github.com/apify/crawlee-python/pull/1488)) ([62b7c70](https://github.com/apify/crawlee-python/commit/62b7c70b54085fc65a660062028014f4502beba9)) by [@Mantisus](https://github.com/Mantisus), closes [#1483](https://github.com/apify/crawlee-python/issues/1483)
+
+
+## [1.0.3](https://github.com/apify/crawlee-python/releases/tag/v1.0.3) (2025-10-17)
+
+### 🐛 Bug Fixes
+
+- Add support for Pydantic v2.12 ([#1471](https://github.com/apify/crawlee-python/pull/1471)) ([35c1108](https://github.com/apify/crawlee-python/commit/35c110878c2f445a2866be2522ea8703e9b371dd)) by [@Mantisus](https://github.com/Mantisus), closes [#1464](https://github.com/apify/crawlee-python/issues/1464)
+- Fix database version warning message ([#1485](https://github.com/apify/crawlee-python/pull/1485)) ([18a545e](https://github.com/apify/crawlee-python/commit/18a545ee8add92e844acd0068f9cb8580a82e1c9)) by [@Mantisus](https://github.com/Mantisus)
+- Fix `reclaim_request` in `SqlRequestQueueClient` to correctly update the request state ([#1486](https://github.com/apify/crawlee-python/pull/1486)) ([1502469](https://github.com/apify/crawlee-python/commit/150246957f8f7f1ceb77bb77e3a02a903c50cae1)) by [@Mantisus](https://github.com/Mantisus), closes [#1484](https://github.com/apify/crawlee-python/issues/1484)
+- Fix `KeyValueStore.auto_saved_value` failing in some scenarios ([#1438](https://github.com/apify/crawlee-python/pull/1438)) ([b35dee7](https://github.com/apify/crawlee-python/commit/b35dee78180e57161b826641d45a61b8d8f6ef51)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1354](https://github.com/apify/crawlee-python/issues/1354)
+
+
 ## [1.0.2](https://github.com/apify/crawlee-python/releases/tag/v1.0.2) (2025-10-08)

 ### 🐛 Bug Fixes
@@ -256,7 +297,7 @@ All notable changes to this project will be documented in this file.

 ### 🐛 Bug Fixes

-- Fix session managment with retire ([#947](https://github.com/apify/crawlee-python/pull/947)) ([caee03f](https://github.com/apify/crawlee-python/commit/caee03fe3a43cc1d7a8d3f9e19b42df1bdb1c0aa)) by [@Mantisus](https://github.com/Mantisus)
+- Fix session management with retire ([#947](https://github.com/apify/crawlee-python/pull/947)) ([caee03f](https://github.com/apify/crawlee-python/commit/caee03fe3a43cc1d7a8d3f9e19b42df1bdb1c0aa)) by [@Mantisus](https://github.com/Mantisus)
 - Fix templates - poetry-plugin-export version and camoufox template name ([#952](https://github.com/apify/crawlee-python/pull/952)) ([7addea6](https://github.com/apify/crawlee-python/commit/7addea6605359cceba208e16ec9131724bdb3e9b)) by [@Pijukatel](https://github.com/Pijukatel), closes [#951](https://github.com/apify/crawlee-python/issues/951)
 - Fix convert relative link to absolute in `enqueue_links` for response with redirect ([#956](https://github.com/apify/crawlee-python/pull/956)) ([694102e](https://github.com/apify/crawlee-python/commit/694102e163bb9021a4830d2545d153f6f8f3de90)) by [@Mantisus](https://github.com/Mantisus), closes [#955](https://github.com/apify/crawlee-python/issues/955)
 - Fix `CurlImpersonateHttpClient` cookies handler ([#946](https://github.com/apify/crawlee-python/pull/946)) ([ed415c4](https://github.com/apify/crawlee-python/commit/ed415c433da2a40b0ee62534f0730d0737e991b8)) by [@Mantisus](https://github.com/Mantisus)

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ make run-docs
 Publishing new versions to [PyPI](https://pypi.org/project/crawlee) is automated through GitHub Actions.

 - **Beta releases**: On each commit to the master branch, a new beta release is automatically published. The version number is determined based on the latest release and conventional commits. The beta version suffix is incremented by 1 from the last beta release on PyPI.
-- **Stable releases**: A stable version release may be created by triggering the `release` GitHub Actions workflow. The version number is determined based on the latest release and conventional commits (`auto` release type), or it may be overriden using the `custom` release type.
+- **Stable releases**: A stable version release may be created by triggering the `release` GitHub Actions workflow. The version number is determined based on the latest release and conventional commits (`auto` release type), or it may be overridden using the `custom` release type.

 ### Publishing to PyPI manually

Makefile

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ unit-tests-cov:
 	uv run pytest --numprocesses=auto -vv --cov=src/crawlee --cov-append --cov-report=html tests/unit -m "not run_alone"

 e2e-templates-tests $(args):
-	uv run pytest --numprocesses=$(E2E_TESTS_CONCURRENCY) -vv tests/e2e/project_template "$(args)"
+	uv run pytest --numprocesses=$(E2E_TESTS_CONCURRENCY) -vv tests/e2e/project_template "$(args)" --timeout=600

 format:
 	uv run ruff check --fix

docs/deployment/apify_platform.mdx

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ apify run
 For running Crawlee code as an Actor on [Apify platform](https://apify.com/actors) you need to wrap the body of the main function of your crawler with `async with Actor`.

 :::info NOTE
-Adding `async with Actor` is the only important thing needed to run it on Apify platform as an Actor. It is needed to initialize your Actor (e.g. to set the correct storage implementation) and to correctly handle exitting the process.
+Adding `async with Actor` is the only important thing needed to run it on Apify platform as an Actor. It is needed to initialize your Actor (e.g. to set the correct storage implementation) and to correctly handle exiting the process.
 :::

 Let's look at the `BeautifulSoupCrawler` example from the [Quick start](../quick-start) guide:

docs/examples/code_examples/using_browser_profiles_chrome.py

Lines changed: 2 additions & 4 deletions
@@ -27,15 +27,13 @@ async def main() -> None:

     crawler = PlaywrightCrawler(
         headless=False,
-        # Use chromium for Chrome compatibility
-        browser_type='chromium',
+        # Use the installed Chrome browser
+        browser_type='chrome',
         # Disable fingerprints to preserve profile identity
         fingerprint_generator=None,
         # Set user data directory to temp folder
         user_data_dir=tmp_profile_dir,
         browser_launch_options={
-            # Use installed Chrome browser
-            'channel': 'chrome',
             # Slow down actions to mimic human behavior
             'slow_mo': 200,
             'args': [
