Commit b395a71

Merge remote-tracking branch 'origin/master' into better-event-listener-typing
2 parents: 3f6b716 + 9ff724b

27 files changed: +5354 / -6650 lines

.github/workflows/pre_release.yaml

Lines changed: 4 additions & 16 deletions
@@ -8,6 +8,9 @@ on:
   tags-ignore:
     - "**" # Ignore all tags to prevent duplicate builds when tags are pushed.
 
+  # Or it can be triggered manually.
+  workflow_dispatch:
+
 jobs:
   release_metadata:
     if: "!startsWith(github.event.head_commit.message, 'docs') && !startsWith(github.event.head_commit.message, 'ci') && startsWith(github.repository, 'apify/')"
@@ -65,26 +68,11 @@ jobs:
     steps:
       - name: Prepare distribution
         uses: apify/workflows/prepare-pypi-distribution@main
-        with:
+        with:
           package_name: apify
           is_prerelease: "yes"
           version_number: ${{ needs.release_metadata.outputs.version_number }}
           ref: ${{ needs.update_changelog.changelog_commitish }}
       # Publishes the package to PyPI using PyPA official GitHub action with OIDC authentication.
       - name: Publish package to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
-
-  trigger_docker_build:
-    name: Trigger Docker image build
-    needs: [release_metadata, update_changelog]
-    runs-on: ubuntu-latest
-    steps:
-      - # Trigger building the Python Docker images in apify/apify-actor-docker repo
-        name: Trigger Docker image build
-        run: |
-          gh api -X POST "/repos/apify/apify-actor-docker/dispatches" \
-            -F event_type=build-python-images \
-            -F client_payload[release_tag]=beta \
-            -F client_payload[apify_version]=${{ needs.release_metadata.outputs.version_number }}
-        env:
-          GH_TOKEN: ${{ secrets.APIFY_SERVICE_ACCOUNT_GITHUB_TOKEN }}

CHANGELOG.md

Lines changed: 13 additions & 0 deletions
@@ -2,6 +2,19 @@
 
 All notable changes to this project will be documented in this file.
 
+## [2.2.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.2.0) (2025-01-10)
+
+### 🚀 Features
+
+- Add new config variables to `Actor.config` ([#351](https://github.com/apify/apify-sdk-python/pull/351)) ([7b6478c](https://github.com/apify/apify-sdk-python/commit/7b6478c3fc239b454f733fbd98348dab7b3a1766)) by [@fnesveda](https://github.com/fnesveda)
+- Upgrade to Crawlee v0.5 ([#355](https://github.com/apify/apify-sdk-python/pull/355)) ([826f4db](https://github.com/apify/apify-sdk-python/commit/826f4dbcc8cfd693d97e40c17faf91d225d7ffaf)) by [@vdusek](https://github.com/vdusek)
+
+### 🐛 Bug Fixes
+
+- Better error message when attempting to use force_cloud without an Apify token ([#356](https://github.com/apify/apify-sdk-python/pull/356)) ([33245ce](https://github.com/apify/apify-sdk-python/commit/33245ceddb1fa0ed39548181fb57fb3e6b98f954)) by [@janbuchar](https://github.com/janbuchar)
+- Allow calling `Actor.reboot()` from migrating handler, align reboot behavior with JS SDK ([#361](https://github.com/apify/apify-sdk-python/pull/361)) ([7ba0221](https://github.com/apify/apify-sdk-python/commit/7ba022121fe7b65470fec901295f74cebce72610)) by [@fnesveda](https://github.com/fnesveda)
+
+
 ## [2.1.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.1.0) (2024-12-03)
 
 ### 🚀 Features

docs/03-concepts/04-actor-events.mdx

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,8 @@ During its runtime, the Actor receives Actor events sent by the Apify platform o
             {' '}to another worker server soon.</p>
             You can use it to persist the state of the Actor so that once it is executed again on the new server,
             it doesn't have to start over from the beginning.
+            Once you have persisted the state of your Actor, you can call <a href="../../reference/class/Actor#reboot"><code>Actor.reboot()</code></a>
+            to reboot the Actor and trigger the migration immediately, to speed up the process.
         </td>
     </tr>
     <tr>
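
To illustrate the documented flow, here is a minimal sketch (not taken from this commit) of a handler that persists state on the MIGRATING event and then calls Actor.reboot(); the state dictionary and the 'STATE' key are hypothetical.

import asyncio

from apify import Actor, Event


async def main() -> None:
    async with Actor:
        # Hypothetical in-progress state of this run.
        state = {'processed_items': 0}

        async def on_migrating(event_data: object) -> None:
            # Persist the state so the next run can resume, then reboot right away
            # instead of waiting for the platform to migrate the Actor.
            store = await Actor.open_key_value_store()
            await store.set_value('STATE', state)
            await Actor.reboot()

        Actor.on(Event.MIGRATING, on_migrating)

        # The Actor's real workload would run here.
        await asyncio.sleep(60)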

poetry.lock

Lines changed: 816 additions & 515 deletions
Generated lockfile; the diff is not rendered.

pyproject.toml

Lines changed: 11 additions & 10 deletions
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry]
 name = "apify"
-version = "2.1.0"
+version = "2.2.0"
 description = "Apify SDK for Python"
 authors = ["Apify Technologies s.r.o. <[email protected]>"]
 license = "Apache-2.0"
@@ -45,32 +45,33 @@ keywords = [
 python = "^3.9"
 apify-client = ">=1.8.1"
 apify-shared = ">=1.1.2"
-crawlee = { git = "https://github.com/apify/crawlee-python.git", branch = "improve-event-types" }
+crawlee = "~0.5.1"
 cryptography = ">=42.0.0"
-# TODO: relax the upper bound once the issue is resolved:
-# https://github.com/apify/apify-sdk-python/issues/348
-httpx = "~0.27.0"
+httpx = ">=0.27.0"
 lazy-object-proxy = ">=1.10.0"
+more_itertools = ">=10.2.0"
 scrapy = { version = ">=2.11.0", optional = true }
 typing-extensions = ">=4.1.0"
+# TODO: Relax the upper bound once the issue is resolved:
+# https://github.com/apify/apify-sdk-python/issues/325
 websockets = ">=10.0 <14.0.0"
 
 [tool.poetry.group.dev.dependencies]
 build = "~1.2.0"
 filelock = "~3.16.0"
 griffe = "~1.5.0"
-mypy = "~1.13.0"
+mypy = "~1.14.0"
 pre-commit = "~4.0.0"
 pydoc-markdown = "~4.8.0"
 pytest = "~8.3.0"
-pytest-asyncio = "~0.24.0"
+pytest-asyncio = "~0.25.0"
 pytest-cov = "~6.0.0"
 pytest-only = "~2.1.0"
 pytest-timeout = "~2.3.0"
 pytest-xdist = "~3.6.0"
-respx = "~0.21.0"
-ruff = "~0.8.0"
-setuptools = "~75.6.0" # setuptools are used by pytest but not explicitly required
+respx = "~0.22.0"
+ruff = "~0.9.0"
+setuptools = "~75.8.0" # setuptools are used by pytest but not explicitly required
 
 [tool.poetry.extras]
 scrapy = ["scrapy"]

src/apify/_actor.py

Lines changed: 73 additions & 25 deletions
@@ -7,12 +7,13 @@
 from typing import TYPE_CHECKING, Any, Callable, Literal, TypeVar, cast, overload
 
 from lazy_object_proxy import Proxy
+from more_itertools import flatten
 from pydantic import AliasChoices
 
 from apify_client import ApifyClientAsync
 from apify_shared.consts import ActorEnvVars, ActorExitCodes, ApifyEnvVars
 from apify_shared.utils import ignore_docs, maybe_extract_enum_member_value
-from crawlee import service_container
+from crawlee import service_locator
 from crawlee.events import (
     Event,
     EventAbortingData,
@@ -41,6 +42,7 @@
     from typing_extensions import Self
 
     from crawlee.proxy_configuration import _NewUrlFunction
+    from crawlee.storage_clients import BaseStorageClient
 
     from apify._models import Webhook
 
@@ -56,6 +58,7 @@ class _ActorType:
     _apify_client: ApifyClientAsync
     _configuration: Configuration
     _is_exiting = False
+    _is_rebooting = False
 
     def __init__(
         self,
@@ -77,17 +80,22 @@ def __init__(
         self._configure_logging = configure_logging
         self._apify_client = self.new_client()
 
-        self._event_manager: EventManager
-        if self._configuration.is_at_home:
-            self._event_manager = PlatformEventManager(
+        # Create an instance of the cloud storage client, the local storage client is obtained
+        # from the service locator.
+        self._cloud_storage_client = ApifyStorageClient.from_config(config=self._configuration)
+
+        # Set the event manager based on whether the Actor is running on the platform or locally.
+        self._event_manager = (
+            PlatformEventManager(
                 config=self._configuration,
                 persist_state_interval=self._configuration.persist_state_interval,
             )
-        else:
-            self._event_manager = LocalEventManager(
+            if self.is_at_home()
+            else LocalEventManager(
                 system_info_interval=self._configuration.system_info_interval,
                 persist_state_interval=self._configuration.persist_state_interval,
             )
+        )
 
         self._is_initialized = False
 
@@ -100,9 +108,6 @@ async def __aenter__(self) -> Self:
         When you exit the `async with` block, the `Actor.exit()` method is called, and if any exception happens while
         executing the block code, the `Actor.fail` method is called.
         """
-        if self._configure_logging:
-            _configure_logging(self._configuration)
-
         await self.init()
         return self
 
@@ -162,10 +167,25 @@ def log(self) -> logging.Logger:
         """The logging.Logger instance the Actor uses."""
         return logger
 
+    @property
+    def _local_storage_client(self) -> BaseStorageClient:
+        """The local storage client the Actor instance uses."""
+        return service_locator.get_storage_client()
+
     def _raise_if_not_initialized(self) -> None:
         if not self._is_initialized:
             raise RuntimeError('The Actor was not initialized!')
 
+    def _raise_if_cloud_requested_but_not_configured(self, *, force_cloud: bool) -> None:
+        if not force_cloud:
+            return
+
+        if not self.is_at_home() and self.config.token is None:
+            raise RuntimeError(
+                'In order to use the Apify cloud storage from your computer, '
+                'you need to provide an Apify token using the APIFY_TOKEN environment variable.'
+            )
+
     async def init(self) -> None:
         """Initialize the Actor instance.
 
@@ -180,18 +200,19 @@ async def init(self) -> None:
         if self._is_initialized:
             raise RuntimeError('The Actor was already initialized!')
 
-        if self._configuration.token:
-            service_container.set_cloud_storage_client(ApifyStorageClient(configuration=self._configuration))
+        self._is_exiting = False
+        self._was_final_persist_state_emitted = False
 
-        if self._configuration.is_at_home:
-            service_container.set_default_storage_client_type('cloud')
-        else:
-            service_container.set_default_storage_client_type('local')
+        # If the Actor is running on the Apify platform, we set the cloud storage client.
+        if self.is_at_home():
+            service_locator.set_storage_client(self._cloud_storage_client)
 
-        service_container.set_event_manager(self._event_manager)
+        service_locator.set_event_manager(self.event_manager)
+        service_locator.set_configuration(self.configuration)
 
-        self._is_exiting = False
-        self._was_final_persist_state_emitted = False
+        # The logging configuration has to be called after all service_locator set methods.
+        if self._configure_logging:
+            _configure_logging()
 
         self.log.info('Initializing Actor...')
         self.log.info('System info', extra=get_system_info())
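
For context, a minimal usage sketch (not part of the diff) of how init() and exit() are normally driven through the context manager described in the __aenter__ docstring above; the pushed record is only an example.

from apify import Actor


async def main() -> None:
    # Entering the block calls Actor.init(), which now registers the storage client,
    # event manager and configuration with crawlee's service locator before configuring
    # logging. Leaving the block calls Actor.exit(), or Actor.fail() on an exception.
    async with Actor:
        Actor.log.info('Actor is initialized.')
        await Actor.push_data({'status': 'ok'})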
@@ -241,7 +262,6 @@ async def finalize() -> None:
             await self._event_manager.wait_for_all_listeners_to_complete(timeout=event_listeners_timeout)
 
             await self._event_manager.__aexit__(None, None, None)
-            cast(dict, service_container._services).clear()  # noqa: SLF001
 
         await asyncio.wait_for(finalize(), cleanup_timeout.total_seconds())
         self._is_initialized = False
@@ -343,12 +363,15 @@ async def open_dataset(
             An instance of the `Dataset` class for the given ID or name.
         """
         self._raise_if_not_initialized()
+        self._raise_if_cloud_requested_but_not_configured(force_cloud=force_cloud)
+
+        storage_client = self._cloud_storage_client if force_cloud else self._local_storage_client
 
         return await Dataset.open(
             id=id,
             name=name,
             configuration=self._configuration,
-            storage_client=service_container.get_storage_client(client_type='cloud' if force_cloud else None),
+            storage_client=storage_client,
         )
 
     async def open_key_value_store(
@@ -375,12 +398,14 @@ async def open_key_value_store(
             An instance of the `KeyValueStore` class for the given ID or name.
         """
         self._raise_if_not_initialized()
+        self._raise_if_cloud_requested_but_not_configured(force_cloud=force_cloud)
+        storage_client = self._cloud_storage_client if force_cloud else self._local_storage_client
 
         return await KeyValueStore.open(
             id=id,
             name=name,
             configuration=self._configuration,
-            storage_client=service_container.get_storage_client(client_type='cloud' if force_cloud else None),
+            storage_client=storage_client,
        )
 
     async def open_request_queue(
@@ -409,12 +434,15 @@ async def open_request_queue(
             An instance of the `RequestQueue` class for the given ID or name.
         """
         self._raise_if_not_initialized()
+        self._raise_if_cloud_requested_but_not_configured(force_cloud=force_cloud)
+
+        storage_client = self._cloud_storage_client if force_cloud else self._local_storage_client
 
         return await RequestQueue.open(
            id=id,
            name=name,
            configuration=self._configuration,
-           storage_client=service_container.get_storage_client(client_type='cloud' if force_cloud else None),
+           storage_client=storage_client,
        )
 
     async def push_data(self, data: dict | list[dict]) -> None:
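
All three storage helpers above now follow the same pattern: fail fast if force_cloud is requested without a token, then pick either the Apify cloud storage client or the locally registered one. A sketch from the caller's side (not from the commit; the dataset name is illustrative):

from apify import Actor


async def main() -> None:
    async with Actor:
        # Opens a dataset in the local storage (or the platform storage when running
        # on Apify), using the client registered in crawlee's service locator.
        dataset = await Actor.open_dataset()

        # Forces the dataset to be opened on the Apify platform. When running locally
        # without the APIFY_TOKEN environment variable set, this now raises a
        # RuntimeError with the improved error message instead of failing later.
        cloud_dataset = await Actor.open_dataset(name='my-results', force_cloud=True)

        await cloud_dataset.push_data({'url': 'https://example.com'})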
@@ -870,12 +898,32 @@ async def reboot(
             self.log.error('Actor.reboot() is only supported when running on the Apify platform.')
             return
 
+        if self._is_rebooting:
+            self.log.debug('Actor is already rebooting, skipping the additional reboot call.')
+            return
+
+        self._is_rebooting = True
+
         if not custom_after_sleep:
             custom_after_sleep = self._configuration.metamorph_after_sleep
 
-        self._event_manager.emit(event=Event.PERSIST_STATE, event_data=EventPersistStateData(is_migrating=True))
+        # Call all the listeners for the PERSIST_STATE and MIGRATING events, and wait for them to finish.
+        # PERSIST_STATE listeners are called to allow the Actor to persist its state before the reboot.
+        # MIGRATING listeners are called to allow the Actor to gracefully stop in-progress tasks before the reboot.
+        # Typically, crawlers are listening for the MIGRATING event to stop processing new requests.
+        # We can't just emit the events and wait for all listeners to finish,
+        # because this method might be called from an event listener itself, and we would deadlock.
+        persist_state_listeners = flatten(
+            (self._event_manager._listeners_to_wrappers[Event.PERSIST_STATE] or {}).values()  # noqa: SLF001
+        )
+        migrating_listeners = flatten(
+            (self._event_manager._listeners_to_wrappers[Event.MIGRATING] or {}).values()  # noqa: SLF001
+        )
 
-        await self._event_manager.__aexit__(None, None, None)
+        await asyncio.gather(
+            *[listener(EventPersistStateData(is_migrating=True)) for listener in persist_state_listeners],
+            *[listener(EventMigratingData()) for listener in migrating_listeners],
+        )
 
         if not self._configuration.actor_run_id:
             raise RuntimeError('actor_run_id cannot be None when running on the Apify platform.')
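
The comments in the new reboot() spell out the key constraint: reboot() may itself be running inside a PERSIST_STATE or MIGRATING listener, so emitting the events and then waiting for all listeners to complete would wait on the caller and deadlock. A stripped-down, SDK-independent sketch of the difference (all names are illustrative):

import asyncio
from collections.abc import Awaitable, Callable

listeners: list[Callable[[], Awaitable[None]]] = []
running_tasks: set[asyncio.Task] = set()


def emit() -> None:
    # An event manager typically schedules each listener as a separate task.
    for listener in listeners:
        running_tasks.add(asyncio.create_task(listener()))


async def wait_for_all_listeners() -> None:
    # The old behaviour in spirit: wait for every scheduled listener task.
    # If this is awaited from inside one of those listeners, it waits for its
    # own still-running task and never returns.
    await asyncio.gather(*running_tasks)


async def call_listeners_directly() -> None:
    # The new behaviour in spirit: invoke each registered listener as a fresh
    # coroutine and await only those calls. A guard flag (like _is_rebooting)
    # keeps the listener that triggered the reboot from recursing when it is
    # invoked again here.
    await asyncio.gather(*(listener() for listener in listeners))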
@@ -972,7 +1020,7 @@ async def create_proxy_configuration(
         password: str | None = None,
         groups: list[str] | None = None,
         country_code: str | None = None,
-        proxy_urls: list[str] | None = None,
+        proxy_urls: list[str | None] | None = None,
         new_url_function: _NewUrlFunction | None = None,
     ) -> ProxyConfiguration | None:
         """Create a ProxyConfiguration object with the passed proxy configuration.
