Commit 32e16a5

resolve

Merge commit, 2 parents: 22a81ba + 60e39a3

45 files changed: +1122 −609 lines (some diffs in this large commit are hidden by default)

.github/workflows/build_and_deploy_docs.yaml

Lines changed: 5 additions & 1 deletion
@@ -67,6 +67,10 @@ jobs:
         uses: actions/deploy-pages@v4

       - name: Invalidate CloudFront cache
-        run: gh workflow run invalidate.yaml --repo apify/apify-docs-private
+        run: |
+          gh workflow run invalidate-cloudfront.yml \
+            --repo apify/apify-docs-private \
+            --field deployment=crawlee-web
+          echo "✅ CloudFront cache invalidation workflow triggered successfully"
         env:
           GITHUB_TOKEN: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}

CHANGELOG.md

Lines changed: 16 additions & 2 deletions
@@ -3,17 +3,31 @@
 All notable changes to this project will be documented in this file.

 <!-- git-cliff-unreleased-start -->
-## 1.1.1 - **not yet released**
+## 1.1.2 - **not yet released**
+
+### 🚀 Features
+
+- Add additional kwargs to Crawler's export_data ([#1597](https://github.com/apify/crawlee-python/pull/1597)) ([5977f37](https://github.com/apify/crawlee-python/commit/5977f376b93a7c0d4dd53f0d331a4b04fedba2c6)) by [@vdusek](https://github.com/vdusek), closes [#526](https://github.com/apify/crawlee-python/issues/526)
+- Add `goto_options` for `PlaywrightCrawler` ([#1599](https://github.com/apify/crawlee-python/pull/1599)) ([0b82f3b](https://github.com/apify/crawlee-python/commit/0b82f3b6fb175223ea2aa5b348afcd5fdb767972)) by [@Mantisus](https://github.com/Mantisus), closes [#1576](https://github.com/apify/crawlee-python/issues/1576)
+
+### 🐛 Bug Fixes
+
+- Only apply requestHandlerTimeout to request handler ([#1474](https://github.com/apify/crawlee-python/pull/1474)) ([0dfb6c2](https://github.com/apify/crawlee-python/commit/0dfb6c2a13b6650736245fa39b3fbff397644df7)) by [@janbuchar](https://github.com/janbuchar)
+- Handle the case when `error_handler` returns `Request` ([#1595](https://github.com/apify/crawlee-python/pull/1595)) ([8a961a2](https://github.com/apify/crawlee-python/commit/8a961a2b07d0d33a7302dbb13c17f3d90999d390)) by [@Mantisus](https://github.com/Mantisus)
+
+
+<!-- git-cliff-unreleased-end -->
+## [1.1.1](https://github.com/apify/crawlee-python/releases/tag/v1.1.1) (2025-12-02)

 ### 🐛 Bug Fixes

 - Unify separators in `unique_key` construction ([#1569](https://github.com/apify/crawlee-python/pull/1569)) ([af46a37](https://github.com/apify/crawlee-python/commit/af46a3733b059a8052489296e172f005def953f7)) by [@vdusek](https://github.com/vdusek), closes [#1512](https://github.com/apify/crawlee-python/issues/1512)
 - Fix `same-domain` strategy ignoring public suffix ([#1572](https://github.com/apify/crawlee-python/pull/1572)) ([3d018b2](https://github.com/apify/crawlee-python/commit/3d018b21a28a4bee493829783057188d6106a69b)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1571](https://github.com/apify/crawlee-python/issues/1571)
 - Make context helpers work in `FailedRequestHandler` and `ErrorHandler` ([#1570](https://github.com/apify/crawlee-python/pull/1570)) ([b830019](https://github.com/apify/crawlee-python/commit/b830019350830ac33075316061659e2854f7f4a5)) by [@Pijukatel](https://github.com/Pijukatel), closes [#1532](https://github.com/apify/crawlee-python/issues/1532)
 - Fix non-ASCII character corruption in `FileSystemStorageClient` on systems without UTF-8 default encoding ([#1580](https://github.com/apify/crawlee-python/pull/1580)) ([f179f86](https://github.com/apify/crawlee-python/commit/f179f8671b0b6af9264450e4fef7e49d1cecd2bd)) by [@Mantisus](https://github.com/Mantisus), closes [#1579](https://github.com/apify/crawlee-python/issues/1579)
+- Respect `<base>` when enqueuing ([#1590](https://github.com/apify/crawlee-python/pull/1590)) ([de517a1](https://github.com/apify/crawlee-python/commit/de517a1629cc29b20568143eb64018f216d4ba33)) by [@Mantisus](https://github.com/Mantisus), closes [#1589](https://github.com/apify/crawlee-python/issues/1589)


-<!-- git-cliff-unreleased-end -->
 ## [1.1.0](https://github.com/apify/crawlee-python/releases/tag/v1.1.0) (2025-11-18)

 ### 🚀 Features
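
The `goto_options` feature is only named in the changelog; this commit does not show its signature. A purely hypothetical sketch, assuming the option is a crawler-level dict forwarded to Playwright's `page.goto` (the parameter placement and accepted keys are assumptions, not confirmed by this diff):

# Hypothetical sketch: the `goto_options` placement and shape are assumed,
# not shown anywhere in this commit.
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(
        # Assumed keyword; the values mirror Playwright's page.goto options.
        goto_options={'wait_until': 'networkidle', 'timeout': 30_000},
    )

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Loaded {context.request.url}')

    await crawler.run(['https://crawlee.dev'])


if __name__ == '__main__':
    asyncio.run(main())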

docs/deployment/code_examples/google/cloud_run_example.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 from crawlee.storage_clients import MemoryStorageClient


-@get('/')
+@get('/')  # type: ignore[untyped-decorator]
 async def main() -> str:
     """The crawler entry point that will be called when the HTTP endpoint is accessed."""
     # highlight-start

docs/deployment/code_examples/google/google_example.py

Lines changed: 2 additions & 5 deletions
@@ -6,10 +6,7 @@
 import functions_framework
 from flask import Request, Response

-from crawlee.crawlers import (
-    BeautifulSoupCrawler,
-    BeautifulSoupCrawlingContext,
-)
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 from crawlee.storage_clients import MemoryStorageClient


@@ -51,7 +48,7 @@ async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
     # highlight-end


-@functions_framework.http
+@functions_framework.http  # type: ignore[untyped-decorator]
 def crawlee_run(request: Request) -> Response:
     # You can pass data to your crawler using `request`
     function_id = request.headers['Function-Execution-Id']

docs/examples/code_examples/export_entire_dataset_to_file_csv.py

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
     await crawler.run(['https://crawlee.dev'])

     # Export the entire dataset to a CSV file.
-    await crawler.export_data(path='results.csv')
+    # Use semicolon as delimiter and always quote strings.
+    await crawler.export_data(path='results.csv', delimiter=';', quoting='all')


 if __name__ == '__main__':

docs/examples/code_examples/export_entire_dataset_to_file_json.py

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
     await crawler.run(['https://crawlee.dev'])

     # Export the entire dataset to a JSON file.
-    await crawler.export_data(path='results.json')
+    # Set ensure_ascii=False to allow Unicode characters in the output.
+    await crawler.export_data(path='results.json', ensure_ascii=False)


 if __name__ == '__main__':

docs/examples/export_entire_dataset_to_file.mdx

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
 import JsonExample from '!!raw-loader!roa-loader!./code_examples/export_entire_dataset_to_file_json.py';
 import CsvExample from '!!raw-loader!roa-loader!./code_examples/export_entire_dataset_to_file_csv.py';

-This example demonstrates how to use the <ApiLink to="class/BasicCrawler#export_data">`BasicCrawler.export_data`</ApiLink> method of the crawler to export the entire default dataset to a single file. This method supports exporting data in either CSV or JSON format.
+This example demonstrates how to use the <ApiLink to="class/BasicCrawler#export_data">`BasicCrawler.export_data`</ApiLink> method of the crawler to export the entire default dataset to a single file. This method supports exporting data in either CSV or JSON format and also accepts additional keyword arguments so you can fine-tune the underlying `json.dump` or `csv.writer` behavior.

 :::note
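Combining the two code-example diffs above, the new keyword passthrough looks like this in one place (a minimal sketch; the `delimiter`, `quoting`, and `ensure_ascii` arguments are exactly the ones the updated examples use):

import asyncio

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handler(context: BeautifulSoupCrawlingContext) -> None:
        # Store the page title so there is something to export.
        title = context.soup.title.string if context.soup.title else None
        await context.push_data({'url': context.request.url, 'title': title})

    await crawler.run(['https://crawlee.dev'])

    # Extra keyword arguments are forwarded to the underlying CSV writer...
    await crawler.export_data(path='results.csv', delimiter=';', quoting='all')
    # ...and to json.dump for JSON output.
    await crawler.export_data(path='results.json', ensure_ascii=False)


if __name__ == '__main__':
    asyncio.run(main())
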
docs/guides/code_examples/running_in_web_server/server.py

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@
 app = FastAPI(lifespan=lifespan, title='Crawler app')


-@app.get('/', response_class=HTMLResponse)
+@app.get('/', response_class=HTMLResponse)  # type: ignore[untyped-decorator]
 def index() -> str:
     return """
 <!DOCTYPE html>
@@ -32,7 +32,7 @@ def index() -> str:
 """


-@app.get('/scrape')
+@app.get('/scrape')  # type: ignore[untyped-decorator]
 async def scrape_url(request: Request, url: str | None = None) -> dict:
     if not url:
         return {'url': 'missing', 'scrape result': 'no results'}
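
All the `# type: ignore[untyped-decorator]` additions in this commit (the `@get` route decorator, `@functions_framework.http`, and FastAPI's `@app.get`) address the same mypy complaint: with `disallow_untyped_decorators` enabled, decorating an annotated function with a decorator mypy sees as untyped makes the result untyped. A minimal self-contained sketch of the situation, using an illustrative decorator that is not from the diff:

# Illustrative decorator: it has no annotations, so under
# `disallow_untyped_decorators = true` mypy reports an untyped-decorator
# error at the usage site below unless it is suppressed.
def route(path):
    def wrap(fn):
        return fn

    return wrap


@route('/')  # type: ignore[untyped-decorator]  # Same suppression as in the diffs above.
async def index() -> str:
    return '<h1>ok</h1>'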

pyproject.toml

Lines changed: 11 additions & 3 deletions
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "crawlee"
-version = "1.1.1"
+version = "1.1.2"
 description = "Crawlee for Python"
 authors = [{ name = "Apify Technologies s.r.o.", email = "[email protected]" }]
 license = { file = "LICENSE" }
@@ -34,6 +34,7 @@ keywords = [
     "scraping",
 ]
 dependencies = [
+    "async-timeout>=5.0.1",
     "cachetools>=5.5.0",
     "colorama>=0.4.0",
     "impit>=0.8.0",
@@ -74,7 +75,7 @@ otel = [
 ]
 sql_postgres = [
     "sqlalchemy[asyncio]>=2.0.0,<3.0.0",
-    "asyncpg>=0.24.0; python_version < '3.14'" # TODO: https://github.com/apify/crawlee-python/issues/1555
+    "asyncpg>=0.24.0"
 ]
 sql_sqlite = [
     "sqlalchemy[asyncio]>=2.0.0,<3.0.0",
@@ -101,7 +102,7 @@ dev = [
     "build<2.0.0", # For e2e tests.
     "dycw-pytest-only<3.0.0",
     "fakeredis[probabilistic,json,lua]<3.0.0",
-    "mypy~=1.18.0",
+    "mypy~=1.19.0",
     "pre-commit<5.0.0",
     "proxy-py<3.0.0",
     "pydoc-markdown<5.0.0",
@@ -221,6 +222,13 @@ timeout = 300
 markers = [
     "run_alone: marks tests that must run in isolation",
 ]
+# Ignore DeprecationWarnings coming from Uvicorn's internal imports. Uvicorn relies on deprecated
+# modules from `websockets`, which triggers warnings during tests. These are safe to ignore until
+# Uvicorn updates its internals.
+filterwarnings = [
+    "ignore:websockets.legacy is deprecated:DeprecationWarning",
+    "ignore:websockets.server.WebSocketServerProtocol is deprecated:DeprecationWarning",
+]

 [tool.mypy]
 python_version = "3.10"
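
Two notes on the dependency changes: the `asyncpg` environment marker could be dropped presumably because asyncpg gained Python 3.14 support (the removed TODO links issue #1555), and `async-timeout` enters as a runtime dependency. Since the project still supports Python 3.10, where `asyncio.timeout` is unavailable (it was added in 3.11), `async-timeout` provides the same context-manager API. A minimal usage sketch; where crawlee actually uses it is not shown in this commit:

import asyncio

import async_timeout


async def fetch_with_deadline() -> None:
    # async_timeout.timeout() mirrors asyncio.timeout() from Python 3.11+:
    # the block raises TimeoutError if it takes longer than the given seconds.
    async with async_timeout.timeout(5):
        await asyncio.sleep(1)  # Stand-in for real I/O work.


asyncio.run(fetch_with_deadline())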

src/crawlee/_utils/context.py

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 from __future__ import annotations

-import asyncio
+import inspect
 from collections.abc import Callable
 from functools import wraps
 from typing import Any, TypeVar
@@ -44,4 +44,4 @@ async def async_wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:

         return await method(self, *args, **kwargs)

-    return async_wrapper if asyncio.iscoroutinefunction(method) else sync_wrapper  # type: ignore[return-value]
+    return async_wrapper if inspect.iscoroutinefunction(method) else sync_wrapper  # type: ignore[return-value]
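
The swap from `asyncio.iscoroutinefunction` to `inspect.iscoroutinefunction` is likely motivated by the former being deprecated in Python 3.14 in favor of the `inspect` version, which also sees through `functools.partial` wrappers. A simplified sketch of the dual sync/async decorator pattern this file implements, using the new check (illustrative; the real decorator does more than logging):

import asyncio
import inspect
from collections.abc import Callable
from functools import wraps
from typing import Any


def logged(method: Callable) -> Callable:
    @wraps(method)
    def sync_wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        print(f'calling {method.__name__}')
        return method(self, *args, **kwargs)

    @wraps(method)
    async def async_wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        print(f'calling {method.__name__}')
        return await method(self, *args, **kwargs)

    # Dispatch on whether the wrapped method is a coroutine function.
    return async_wrapper if inspect.iscoroutinefunction(method) else sync_wrapper


class Demo:
    @logged
    def sync_method(self) -> int:
        return 1

    @logged
    async def async_method(self) -> int:
        return 2


print(Demo().sync_method())
print(asyncio.run(Demo().async_method()))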
