Skip to content

Commit 31ca22e

Browse files
committed
Merge remote-tracking branch 'origin/master' into unique-key
2 parents 2b2cc30 + 991b40a commit 31ca22e

File tree

9 files changed

+1436
-1040
lines changed

9 files changed

+1436
-1040
lines changed

CHANGELOG.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,19 @@ All notable changes to this project will be documented in this file.
55
<!-- git-cliff-unreleased-start -->
66
## 2.7.1 - **not yet released**
77

8+
### 🚀 Features
9+
10+
- Add deduplication to `add_batch_of_requests` ([#534](https://github.com/apify/apify-sdk-python/pull/534)) ([dd03c4d](https://github.com/apify/apify-sdk-python/commit/dd03c4d446f611492adf35f1b5738648ee5a66f7)) by [@Pijukatel](https://github.com/Pijukatel), closes [#514](https://github.com/apify/apify-sdk-python/issues/514)
11+
812
### 🐛 Bug Fixes
913

1014
- Restrict apify-shared and apify-client versions ([#523](https://github.com/apify/apify-sdk-python/pull/523)) ([b3ae5a9](https://github.com/apify/apify-sdk-python/commit/b3ae5a972a65454a4998eda59c9fcc3f6b7e8579)) by [@vdusek](https://github.com/vdusek)
1115
- Expose `APIFY_USER_IS_PAYING` env var to the configuration ([#507](https://github.com/apify/apify-sdk-python/pull/507)) ([0801e54](https://github.com/apify/apify-sdk-python/commit/0801e54887317c1280cc6828ecd3f2cc53287e76)) by [@stepskop](https://github.com/stepskop)
16+
- Resolve DeprecationWarning in ApifyEventManager ([#555](https://github.com/apify/apify-sdk-python/pull/555)) ([0c5111d](https://github.com/apify/apify-sdk-python/commit/0c5111dafe19796ec1fb9652a44c031bed9758df)) by [@vdusek](https://github.com/vdusek), closes [#343](https://github.com/apify/apify-sdk-python/issues/343)
17+
18+
### Chore
19+
20+
- [**breaking**] Update apify-client and apify-shared to v2.0 ([#548](https://github.com/apify/apify-sdk-python/pull/548)) ([8ba084d](https://github.com/apify/apify-sdk-python/commit/8ba084ded6cd018111343f2219260b481c8d4e35)) by [@vdusek](https://github.com/vdusek)
1221

1322
### Refactor
1423

@@ -64,6 +73,54 @@ All notable changes to this project will be documented in this file.
6473
- Tagline overlap ([#501](https://github.com/apify/apify-sdk-python/pull/501)) ([bae8340](https://github.com/apify/apify-sdk-python/commit/bae8340c46fea756ea35ea4d591da84c09d478e2)) by [@katzino](https://github.com/katzino)
6574

6675

76+
## [2.7.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.7.0) (2025-07-14)
77+
78+
### 🚀 Features
79+
80+
- **crypto:** Decrypt secret objects ([#482](https://github.com/apify/apify-sdk-python/pull/482)) ([ce9daf7](https://github.com/apify/apify-sdk-python/commit/ce9daf7381212b8dc194e8a643e5ca0dedbc0078)) by [@MFori](https://github.com/MFori)
81+
82+
### 🐛 Bug Fixes
83+
84+
- Sync `@docusaurus` theme version [internal] ([#500](https://github.com/apify/apify-sdk-python/pull/500)) ([a7485e7](https://github.com/apify/apify-sdk-python/commit/a7485e7d2276fde464ce862573d5b95e7d4d836a)) by [@katzino](https://github.com/katzino)
85+
- Tagline overlap ([#501](https://github.com/apify/apify-sdk-python/pull/501)) ([bae8340](https://github.com/apify/apify-sdk-python/commit/bae8340c46fea756ea35ea4d591da84c09d478e2)) by [@katzino](https://github.com/katzino)
86+
87+
88+
## [2.7.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.7.0) (2025-07-14)
89+
90+
### 🚀 Features
91+
92+
- **crypto:** Decrypt secret objects ([#482](https://github.com/apify/apify-sdk-python/pull/482)) ([ce9daf7](https://github.com/apify/apify-sdk-python/commit/ce9daf7381212b8dc194e8a643e5ca0dedbc0078)) by [@MFori](https://github.com/MFori)
93+
94+
### 🐛 Bug Fixes
95+
96+
- Sync `@docusaurus` theme version [internal] ([#500](https://github.com/apify/apify-sdk-python/pull/500)) ([a7485e7](https://github.com/apify/apify-sdk-python/commit/a7485e7d2276fde464ce862573d5b95e7d4d836a)) by [@katzino](https://github.com/katzino)
97+
- Tagline overlap ([#501](https://github.com/apify/apify-sdk-python/pull/501)) ([bae8340](https://github.com/apify/apify-sdk-python/commit/bae8340c46fea756ea35ea4d591da84c09d478e2)) by [@katzino](https://github.com/katzino)
98+
99+
100+
## [2.7.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.7.0) (2025-07-14)
101+
102+
### 🚀 Features
103+
104+
- **crypto:** Decrypt secret objects ([#482](https://github.com/apify/apify-sdk-python/pull/482)) ([ce9daf7](https://github.com/apify/apify-sdk-python/commit/ce9daf7381212b8dc194e8a643e5ca0dedbc0078)) by [@MFori](https://github.com/MFori)
105+
106+
### 🐛 Bug Fixes
107+
108+
- Sync `@docusaurus` theme version [internal] ([#500](https://github.com/apify/apify-sdk-python/pull/500)) ([a7485e7](https://github.com/apify/apify-sdk-python/commit/a7485e7d2276fde464ce862573d5b95e7d4d836a)) by [@katzino](https://github.com/katzino)
109+
- Tagline overlap ([#501](https://github.com/apify/apify-sdk-python/pull/501)) ([bae8340](https://github.com/apify/apify-sdk-python/commit/bae8340c46fea756ea35ea4d591da84c09d478e2)) by [@katzino](https://github.com/katzino)
110+
111+
112+
## [2.7.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.7.0) (2025-07-14)
113+
114+
### 🚀 Features
115+
116+
- **crypto:** Decrypt secret objects ([#482](https://github.com/apify/apify-sdk-python/pull/482)) ([ce9daf7](https://github.com/apify/apify-sdk-python/commit/ce9daf7381212b8dc194e8a643e5ca0dedbc0078)) by [@MFori](https://github.com/MFori)
117+
118+
### 🐛 Bug Fixes
119+
120+
- Sync `@docusaurus` theme version [internal] ([#500](https://github.com/apify/apify-sdk-python/pull/500)) ([a7485e7](https://github.com/apify/apify-sdk-python/commit/a7485e7d2276fde464ce862573d5b95e7d4d836a)) by [@katzino](https://github.com/katzino)
121+
- Tagline overlap ([#501](https://github.com/apify/apify-sdk-python/pull/501)) ([bae8340](https://github.com/apify/apify-sdk-python/commit/bae8340c46fea756ea35ea4d591da84c09d478e2)) by [@katzino](https://github.com/katzino)
122+
123+
67124
## [2.7.3](https://github.com/apify/apify-sdk-python/releases/tag/v2.7.3) (2025-08-11)
68125

69126
### 🐛 Bug Fixes

pyproject.toml

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -65,22 +65,22 @@ scrapy = ["scrapy>=2.11.0"]
6565
dev = [
6666
"build~=1.3.0",
6767
"crawlee[parsel]",
68-
"dycw-pytest-only>=2.1.1",
69-
"griffe~=1.12.1",
68+
"dycw-pytest-only~=2.1.0",
69+
"griffe~=1.12.0",
7070
"mypy~=1.17.0",
7171
"pre-commit~=4.3.0",
7272
"pydoc-markdown~=4.8.0",
7373
"pytest-asyncio~=1.1.0",
7474
"pytest-cov~=6.2.0",
75-
"pytest-httpserver>=1.1.3",
76-
"pytest-timeout>=2.4.0",
75+
"pytest-httpserver~=1.1.0",
76+
"pytest-timeout~=2.4.0",
7777
"pytest-xdist~=3.8.0",
7878
"pytest~=8.4.0",
7979
"ruff~=0.12.0",
8080
"setuptools", # setuptools are used by pytest but not explicitly required
81-
"types-cachetools>=6.0.0.20250525",
81+
"types-cachetools~=6.0.0.20250525",
8282
"uvicorn[standard]",
83-
"werkzeug~=3.1.3", # Werkzeug is used by httpserver
83+
"werkzeug~=3.1.0", # Werkzeug is used by httpserver
8484
"yarl~=1.20.0", # yarl is used by crawlee
8585
]
8686

src/apify/_actor.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,7 @@ def __init__(
130130
# Set the event manager based on whether the Actor is running on the platform or locally.
131131
self._event_manager = (
132132
ApifyEventManager(
133-
config=self._configuration,
133+
configuration=self._configuration,
134134
persist_state_interval=self._configuration.persist_state_interval,
135135
)
136136
if self.is_at_home()

src/apify/events/_apify_event_manager.py

Lines changed: 39 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
from __future__ import annotations
22

33
import asyncio
4+
import contextlib
45
from typing import TYPE_CHECKING, Annotated
56

67
import websockets.asyncio.client
@@ -29,39 +30,51 @@
2930

3031
@docs_group('Event managers')
3132
class ApifyEventManager(EventManager):
32-
"""A class for managing Actor events.
33+
"""Event manager for the Apify platform.
3334
34-
You shouldn't use this class directly,
35-
but instead use it via the `Actor.on()` and `Actor.off()` methods.
36-
"""
35+
This class extends Crawlee's `EventManager` to provide Apify-specific functionality, including websocket
36+
connectivity to the Apify platform for receiving platform events.
37+
38+
The event manager handles:
39+
- Registration and emission of events and their listeners.
40+
- Websocket connection to Apify platform events.
41+
- Processing and validation of platform messages.
42+
- Automatic event forwarding from the platform to local event listeners.
3743
38-
_platform_events_websocket: websockets.asyncio.client.ClientConnection | None = None
39-
_process_platform_messages_task: asyncio.Task | None = None
40-
_send_system_info_interval_task: asyncio.Task | None = None
41-
_connected_to_platform_websocket: asyncio.Future = asyncio.Future()
44+
This class should not be used directly. Use the `Actor.on` and `Actor.off` methods to interact
45+
with the event system.
46+
"""
4247

43-
def __init__(self, config: Configuration, **kwargs: Unpack[EventManagerOptions]) -> None:
44-
"""Create an instance of the EventManager.
48+
def __init__(self, configuration: Configuration, **kwargs: Unpack[EventManagerOptions]) -> None:
49+
"""Initialize a new instance.
4550
4651
Args:
47-
config: The Actor configuration to be used in this event manager.
48-
kwargs: Event manager options - forwarded to the base class
52+
configuration: The Actor configuration for the event manager.
53+
**kwargs: Additional event manager options passed to the parent class.
4954
"""
5055
super().__init__(**kwargs)
5156

52-
self._config = config
53-
self._listener_tasks = set()
54-
self._connected_to_platform_websocket = asyncio.Future[bool]()
57+
self._configuration = configuration
58+
"""The Actor configuration for the event manager."""
59+
60+
self._platform_events_websocket: websockets.asyncio.client.ClientConnection | None = None
61+
"""WebSocket connection to the platform events."""
62+
63+
self._process_platform_messages_task: asyncio.Task | None = None
64+
"""Task for processing messages from the platform websocket."""
65+
66+
self._connected_to_platform_websocket: asyncio.Future[bool] | None = None
67+
"""Future that resolves when the connection to the platform websocket is established."""
5568

5669
@override
5770
async def __aenter__(self) -> Self:
5871
await super().__aenter__()
5972
self._connected_to_platform_websocket = asyncio.Future()
6073

6174
# Run tasks but don't await them
62-
if self._config.actor_events_ws_url:
75+
if self._configuration.actor_events_ws_url:
6376
self._process_platform_messages_task = asyncio.create_task(
64-
self._process_platform_messages(self._config.actor_events_ws_url)
77+
self._process_platform_messages(self._configuration.actor_events_ws_url)
6578
)
6679
is_connected = await self._connected_to_platform_websocket
6780
if not is_connected:
@@ -81,16 +94,19 @@ async def __aexit__(
8194
if self._platform_events_websocket:
8295
await self._platform_events_websocket.close()
8396

84-
if self._process_platform_messages_task:
85-
await self._process_platform_messages_task
97+
if self._process_platform_messages_task and not self._process_platform_messages_task.done():
98+
self._process_platform_messages_task.cancel()
99+
with contextlib.suppress(asyncio.CancelledError):
100+
await self._process_platform_messages_task
86101

87102
await super().__aexit__(exc_type, exc_value, exc_traceback)
88103

89104
async def _process_platform_messages(self, ws_url: str) -> None:
90105
try:
91106
async with websockets.asyncio.client.connect(ws_url) as websocket:
92107
self._platform_events_websocket = websocket
93-
self._connected_to_platform_websocket.set_result(True)
108+
if self._connected_to_platform_websocket is not None:
109+
self._connected_to_platform_websocket.set_result(True)
94110

95111
async for message in websocket:
96112
try:
@@ -110,7 +126,7 @@ async def _process_platform_messages(self, ws_url: str) -> None:
110126
event=parsed_message.name,
111127
event_data=parsed_message.data
112128
if not isinstance(parsed_message.data, SystemInfoEventData)
113-
else parsed_message.data.to_crawlee_format(self._config.dedicated_cpus or 1),
129+
else parsed_message.data.to_crawlee_format(self._configuration.dedicated_cpus or 1),
114130
)
115131

116132
if parsed_message.name == Event.MIGRATING:
@@ -120,4 +136,5 @@ async def _process_platform_messages(self, ws_url: str) -> None:
120136
logger.exception('Cannot parse Actor event', extra={'message': message})
121137
except Exception:
122138
logger.exception('Error in websocket connection')
123-
self._connected_to_platform_websocket.set_result(False)
139+
if self._connected_to_platform_websocket is not None:
140+
self._connected_to_platform_websocket.set_result(False)

src/apify/storage_clients/_apify/_request_queue_client.py

Lines changed: 67 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -228,20 +228,77 @@ async def add_batch_of_requests(
228228
Returns:
229229
Response containing information about the added requests.
230230
"""
231-
# Prepare requests for API by converting to dictionaries.
232-
requests_dict = [
233-
request.model_dump(
234-
by_alias=True,
235-
exclude={'id'}, # Exclude ID fields from requests since the API doesn't accept them.
231+
# Do not try to add previously added requests to avoid pointless expensive calls to API
232+
233+
new_requests: list[Request] = []
234+
already_present_requests: list[ProcessedRequest] = []
235+
236+
for request in requests:
237+
if self._requests_cache.get(request.id):
238+
# We are not sure if it was already handled at this point, and it is not worth calling API for it.
239+
# It could have been handled by another client in the meantime, so cached information about
240+
# `request.was_already_handled` is not reliable.
241+
already_present_requests.append(
242+
ProcessedRequest.model_validate(
243+
{
244+
'id': request.id,
245+
'uniqueKey': request.unique_key,
246+
'wasAlreadyPresent': True,
247+
'wasAlreadyHandled': request.was_already_handled,
248+
}
249+
)
250+
)
251+
252+
else:
253+
# Add new request to the cache.
254+
processed_request = ProcessedRequest.model_validate(
255+
{
256+
'id': request.id,
257+
'uniqueKey': request.unique_key,
258+
'wasAlreadyPresent': True,
259+
'wasAlreadyHandled': request.was_already_handled,
260+
}
261+
)
262+
self._cache_request(
263+
unique_key_to_request_id(request.unique_key),
264+
processed_request,
265+
)
266+
new_requests.append(request)
267+
268+
if new_requests:
269+
# Prepare requests for API by converting to dictionaries.
270+
requests_dict = [
271+
request.model_dump(
272+
by_alias=True,
273+
exclude={'id'}, # Exclude ID fields from requests since the API doesn't accept them.
274+
)
275+
for request in new_requests
276+
]
277+
278+
# Send requests to API.
279+
api_response = AddRequestsResponse.model_validate(
280+
await self._api_client.batch_add_requests(requests=requests_dict, forefront=forefront)
236281
)
237-
for request in requests
238-
]
239282

240-
# Send requests to API.
241-
response = await self._api_client.batch_add_requests(requests=requests_dict, forefront=forefront)
283+
# Add the locally known already present processed requests based on the local cache.
284+
api_response.processed_requests.extend(already_present_requests)
285+
286+
# Remove unprocessed requests from the cache
287+
for unprocessed_request in api_response.unprocessed_requests:
288+
self._requests_cache.pop(unique_key_to_request_id(unprocessed_request.unique_key), None)
289+
290+
else:
291+
api_response = AddRequestsResponse.model_validate(
292+
{'unprocessedRequests': [], 'processedRequests': already_present_requests}
293+
)
294+
295+
logger.debug(
296+
f'Tried to add new requests: {len(new_requests)}, '
297+
f'succeeded to add new requests: {len(api_response.processed_requests) - len(already_present_requests)}, '
298+
f'skipped already present requests: {len(already_present_requests)}'
299+
)
242300

243301
# Update assumed total count for newly added requests.
244-
api_response = AddRequestsResponse.model_validate(response)
245302
new_request_count = 0
246303
for processed_request in api_response.processed_requests:
247304
if not processed_request.was_already_present and not processed_request.was_already_handled:

0 commit comments

Comments
 (0)