Commit b916f7d

Merge remote-tracking branch 'origin/master' into enhanced-missing-local-storage-error
2 parents 1a0efd9 + 1d00718 commit b916f7d

30 files changed, +1497 -1157 lines changed

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -2,6 +2,16 @@
 
 All notable changes to this project will be documented in this file.
 
+## [2.4.0](https://github.com/apify/apify-sdk-python/releases/tag/v2.4.0) (2025-03-07)
+
+### 🚀 Features
+
+- Update to Crawlee v0.6 ([#420](https://github.com/apify/apify-sdk-python/pull/420)) ([9be4336](https://github.com/apify/apify-sdk-python/commit/9be433667231cc5739861fa693d7a726860d6aca)) by [@vdusek](https://github.com/vdusek)
+- Add Actor `exit_process` option ([#424](https://github.com/apify/apify-sdk-python/pull/424)) ([994c832](https://github.com/apify/apify-sdk-python/commit/994c8323b994e009db0ccdcb624891a2fef97070)) by [@vdusek](https://github.com/vdusek), closes [#396](https://github.com/apify/apify-sdk-python/issues/396), [#401](https://github.com/apify/apify-sdk-python/issues/401)
+- Upgrade websockets to v14 to adapt to library API changes ([#425](https://github.com/apify/apify-sdk-python/pull/425)) ([5f49275](https://github.com/apify/apify-sdk-python/commit/5f49275ca1177e5ba56856ffe3860f6b97bee9ee)) by [@Mantisus](https://github.com/Mantisus), closes [#325](https://github.com/apify/apify-sdk-python/issues/325)
+- Add signing of public URL ([#407](https://github.com/apify/apify-sdk-python/pull/407)) ([a865461](https://github.com/apify/apify-sdk-python/commit/a865461c703aea01d91317f4fdf38c1bedd35f00)) by [@danpoletaev](https://github.com/danpoletaev)
+
+
 ## [2.3.1](https://github.com/apify/apify-sdk-python/releases/tag/v2.3.1) (2025-02-25)
 
 ### 🐛 Bug Fixes
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+---
+id: pay-per-event
+title: Pay-per-event monetization
+description: Monetize your Actors using the pay-per-event pricing model
+---
+
+import ActorChargeSource from '!!raw-loader!./code/actor_charge.py';
+import ConditionalActorChargeSource from '!!raw-loader!./code/conditional_actor_charge.py';
+import ApiLink from '@site/src/components/ApiLink';
+import CodeBlock from '@theme/CodeBlock';
+
+Apify provides several [pricing models](https://docs.apify.com/platform/actors/publishing/monetize) for monetizing your Actors. The most recent and most flexible one is [pay-per-event](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event), which lets you charge your users programmatically, directly from your Actor. As the name suggests, you can charge users each time a specific event occurs, for example when you call an external API or return a result.
+
+To use the pay-per-event pricing model, you first need to [set it up](https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event) for your Actor in the Apify console. After that, you're free to start charging for events.
+
+## Charging for events
+
+After monetization is set up in the Apify console, you can add <ApiLink to="class/Actor#charge">`Actor.charge`</ApiLink> calls to your code and start monetizing!
+
+<CodeBlock language="python">
+    {ActorChargeSource}
+</CodeBlock>
+
+Then you just push your code to Apify and that's it! The SDK will even keep track of the max total charge setting for you, so you will not provide more value than what the user chose to pay for.
+
+If you need finer control over charging, you can call <ApiLink to="class/Actor#get_charging_manager">`Actor.get_charging_manager()`</ApiLink> to access the <ApiLink to="class/ChargingManager">`ChargingManager`</ApiLink>, which can provide more detailed information, for example how many events of each type can be charged before reaching the configured limit.
+
+## Transitioning from a different pricing model
+
+When you plan to start using the pay-per-event pricing model for an Actor that is already monetized with a different pricing model, your source code will need to support both pricing models during the transition period enforced by the Apify platform. Arguably the most frequent case is the transition from the pay-per-result model, which uses the `ACTOR_MAX_PAID_DATASET_ITEMS` environment variable to prevent returning unpaid dataset items. The following example shows how to handle such scenarios. The key part is the <ApiLink to="class/ChargingManager#get_pricing_info">`ChargingManager.get_pricing_info()`</ApiLink> method, which returns information about the current pricing model.
+
+<CodeBlock language="python">
+    {ConditionalActorChargeSource}
+</CodeBlock>
+
+## Local development
+
+You are encouraged to test your monetization code on your machine before releasing it to the public. To tell your Actor that it should work in pay-per-event mode, pass it the `ACTOR_TEST_PAY_PER_EVENT` environment variable:
+
+```shell
+ACTOR_TEST_PAY_PER_EVENT=true python -m youractor
+```
+
+To see a log of all the events charged throughout the run, check the so-called charging dataset, where the Apify SDK records every charged event. Your charging dataset can be found under the `charging_log` name (unless you change your storage settings, this dataset is stored in `storage/datasets/charging_log/`). Please note that this log is not available when running the Actor in production on the Apify platform.
+
+Because pricing configuration is stored by the Apify platform, all events will have a default price of $1 when the Actor runs locally.
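
The docs above say the SDK tracks the maximum total charge and can report how many events of each type can still be charged. The arithmetic behind such a limit might look like the sketch below. This is only an illustration under stated assumptions, not the SDK's implementation: `remaining_event_count`, its parameters, and the budget figures are hypothetical names and values picked for the example. Only the $1 default local price comes from the text above.

```python
from decimal import Decimal


def remaining_event_count(
    max_total_charge_usd: Decimal,
    charged_so_far_usd: Decimal,
    event_price_usd: Decimal,
) -> int:
    """Return how many more events fit under the budget (hypothetical helper)."""
    if event_price_usd <= 0:
        raise ValueError('event_price_usd must be positive')
    remaining_budget = max_total_charge_usd - charged_so_far_usd
    if remaining_budget <= 0:
        return 0
    # Floor division: an event that is only partially affordable is not charged.
    return int(remaining_budget // event_price_usd)


# With the local default price of $1 per event and an assumed $10 budget,
# three events already charged leave room for seven more.
left = remaining_event_count(Decimal('10'), Decimal('3'), Decimal('1'))
```

`Decimal` is used here instead of `float` so that prices such as 0.1 do not accumulate rounding error; that is a design choice of this sketch, not something the commit states.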
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+from apify import Actor
+
+
+async def main() -> None:
+    async with Actor:
+        # highlight-start
+        # Charge for a single occurrence of an event
+        await Actor.charge(event_name='init')
+        # highlight-end
+
+        # Prepare some mock results
+        result = [
+            {'word': 'Lorem'},
+            {'word': 'Ipsum'},
+            {'word': 'Dolor'},
+            {'word': 'Sit'},
+            {'word': 'Amet'},
+        ]
+        # highlight-start
+        # Shortcut for charging for each pushed dataset item
+        await Actor.push_data(result, 'result-item')
+        # highlight-end
+
+        # highlight-start
+        # Or you can charge for a given number of events manually
+        await Actor.charge(
+            event_name='result-item',
+            count=len(result),
+        )
+        # highlight-end
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+from apify import Actor
+
+
+async def main() -> None:
+    async with Actor:
+        # Check the dataset because there might already be items
+        # if the run migrated or was restarted
+        default_dataset = await Actor.open_dataset()
+        dataset_info = await default_dataset.get_info()
+        charged_items = dataset_info.item_count if dataset_info else 0
+
+        # highlight-start
+        if Actor.get_charging_manager().get_pricing_info().is_pay_per_event:
+            # highlight-end
+            await Actor.push_data({'hello': 'world'}, 'dataset-item')
+        elif charged_items < (Actor.config.max_paid_dataset_items or 0):
+            await Actor.push_data({'hello': 'world'})
+            charged_items += 1

pyproject.toml

Lines changed: 6 additions & 8 deletions
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "apify"
-version = "2.3.1"
+version = "2.4.0"
 description = "Apify SDK for Python"
 authors = [{ name = "Apify Technologies s.r.o.", email = "[email protected]" }]
 license = { file = "LICENSE" }
@@ -35,16 +35,14 @@ keywords = [
 ]
 dependencies = [
     "apify-client>=1.9.2",
-    "apify-shared>=1.2.1",
-    "crawlee~=0.5.0",
+    "apify-shared>=1.3.0",
+    "crawlee~=0.6.0",
     "cryptography>=42.0.0",
     "httpx>=0.27.0",
     "lazy-object-proxy>=1.10.0",
     "more_itertools>=10.2.0",
     "typing-extensions>=4.1.0",
-    # TODO: Relax the upper bound once the issue is resolved:
-    # https://github.com/apify/apify-sdk-python/issues/325
-    "websockets>=10.0,<14.0.0",
+    "websockets>=14.0",
 ]
 
 [project.optional-dependencies]
@@ -62,7 +60,7 @@ scrapy = ["scrapy>=2.11.0"]
 dev = [
     "build~=1.2.0",
     "filelock~=3.17.0",
-    "griffe~=1.5.0",
+    "griffe~=1.6.0",
     "mypy~=1.15.0",
     "pre-commit~=4.1.0",
     "pydoc-markdown~=4.8.0",
@@ -74,7 +72,7 @@ dev = [
     "pytest-xdist~=3.6.0",
     "respx~=0.22.0",
     "ruff~=0.9.0",
-    "setuptools~=75.8.0", # setuptools are used by pytest but not explicitly required
+    "setuptools~=76.0.0", # setuptools are used by pytest but not explicitly required
 ]
 
 [tool.hatch.build.targets.wheel]

src/apify/_actor.py

Lines changed: 40 additions & 11 deletions
@@ -4,6 +4,7 @@
 import functools
 import os
 import sys
+from contextlib import suppress
 from datetime import timedelta
 from typing import TYPE_CHECKING, Any, Callable, Literal, TypeVar, cast, overload
 
@@ -44,7 +45,7 @@
     from typing_extensions import Self
 
     from crawlee.proxy_configuration import _NewUrlFunction
-    from crawlee.storage_clients import BaseStorageClient
+    from crawlee.storage_clients import StorageClient
 
     from apify._models import Webhook
 
@@ -83,6 +84,7 @@ def __init__(
         configuration: Configuration | None = None,
         *,
         configure_logging: bool = True,
+        exit_process: bool | None = None,
     ) -> None:
         """Create an Actor instance.
 
@@ -93,7 +95,10 @@ def __init__(
             configuration: The Actor configuration to be used. If not passed, a new Configuration instance will
                 be created.
             configure_logging: Should the default logging configuration be configured?
+            exit_process: Whether the Actor should call `sys.exit` when the context manager exits. The default is
+                True except for the IPython, Pytest and Scrapy environments.
         """
+        self._exit_process = self._get_default_exit_process() if exit_process is None else exit_process
         self._is_exiting = False
 
         self._configuration = configuration or Configuration.get_global_configuration()
@@ -160,9 +165,19 @@ def __repr__(self) -> str:
 
         return super().__repr__()
 
-    def __call__(self, configuration: Configuration | None = None, *, configure_logging: bool = True) -> Self:
+    def __call__(
+        self,
+        configuration: Configuration | None = None,
+        *,
+        configure_logging: bool = True,
+        exit_process: bool | None = None,
+    ) -> Self:
         """Make a new Actor instance with a non-default configuration."""
-        return self.__class__(configuration=configuration, configure_logging=configure_logging)
+        return self.__class__(
+            configuration=configuration,
+            configure_logging=configure_logging,
+            exit_process=exit_process,
+        )
 
     @property
     def apify_client(self) -> ApifyClientAsync:
@@ -190,7 +205,7 @@ def log(self) -> logging.Logger:
         return logger
 
     @property
-    def _local_storage_client(self) -> BaseStorageClient:
+    def _local_storage_client(self) -> StorageClient:
         """The local storage client the Actor instance uses."""
         return service_locator.get_storage_client()
 
@@ -300,13 +315,7 @@ async def finalize() -> None:
         await asyncio.wait_for(finalize(), cleanup_timeout.total_seconds())
         self._is_initialized = False
 
-        if is_running_in_ipython():
-            self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in IPython')
-        elif os.getenv('PYTEST_CURRENT_TEST', default=False):  # noqa: PLW1508
-            self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running in an unit test')
-        elif os.getenv('SCRAPY_SETTINGS_MODULE'):
-            self.log.debug(f'Not calling sys.exit({exit_code}) because Actor is running with Scrapy')
-        else:
+        if self._exit_process:
             sys.exit(exit_code)
 
     async def fail(
@@ -1150,6 +1159,26 @@ async def create_proxy_configuration(
 
         return proxy_configuration
 
+    def _get_default_exit_process(self) -> bool:
+        """Returns False for IPython, Pytest, and Scrapy environments, True otherwise."""
+        if is_running_in_ipython():
+            self.log.debug('Running in IPython, setting default `exit_process` to False.')
+            return False
+
+        # Check if running in Pytest by detecting the relevant environment variable.
+        if os.getenv('PYTEST_CURRENT_TEST'):
+            self.log.debug('Running in Pytest, setting default `exit_process` to False.')
+            return False
+
+        # Check if running in Scrapy by attempting to import it.
+        with suppress(ImportError):
+            import scrapy  # noqa: F401
+
+            self.log.debug('Running in Scrapy, setting default `exit_process` to False.')
+            return False
+
+        return True
+
 
 Actor = cast(_ActorType, Proxy(_ActorType))
 """The entry point of the SDK, through which all the Actor operations should be done."""

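The decision order for `exit_process` in the `src/apify/_actor.py` diff above can be sketched as a standalone function. This is a simplified illustration, not the SDK's code: `resolve_exit_process` and its injected `environ`, `in_ipython`, and `module_available` parameters are hypothetical, introduced only so the logic can be exercised without IPython, Pytest, or Scrapy installed.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping


def resolve_exit_process(
    exit_process: bool | None,
    *,
    environ: Mapping[str, str],
    in_ipython: bool,
    module_available: Callable[[str], bool],
) -> bool:
    """Mirror the decision order of the diff (hypothetical standalone helper)."""
    # An explicit argument always wins over environment detection.
    if exit_process is not None:
        return exit_process
    if in_ipython:
        return False
    if environ.get('PYTEST_CURRENT_TEST'):
        return False
    # The new default only asks whether Scrapy can be imported.
    if module_available('scrapy'):
        return False
    return True


# Plain script: nothing is detected, so the process would exit.
plain = resolve_exit_process(
    None, environ={}, in_ipython=False, module_available=lambda _: False
)
# Scrapy being importable is enough to flip the default to False.
with_scrapy = resolve_exit_process(
    None, environ={}, in_ipython=False, module_available=lambda name: name == 'scrapy'
)
# Passing exit_process explicitly overrides every detection step.
forced = resolve_exit_process(
    True, environ={'PYTEST_CURRENT_TEST': 'x'}, in_ipython=True, module_available=lambda _: True
)
```

Note that the added `_get_default_exit_process` checks whether `scrapy` can be imported, while the removed code checked the `SCRAPY_SETTINGS_MODULE` environment variable, so the default can differ in environments where Scrapy is installed but not in use.
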
src/apify/_crypto.py

Lines changed: 38 additions & 0 deletions
@@ -1,6 +1,9 @@
 from __future__ import annotations
 
 import base64
+import hashlib
+import hmac
+import string
 from typing import Any
 
 from cryptography.exceptions import InvalidTag as InvalidTagException
@@ -153,3 +156,38 @@ def decrypt_input_secrets(private_key: rsa.RSAPrivateKey, input_data: Any) -> An
                     )
 
     return input_data
+
+
+CHARSET = string.digits + string.ascii_letters
+
+
+def encode_base62(num: int) -> str:
+    """Encode the given number to base62."""
+    if num == 0:
+        return CHARSET[0]
+
+    res = ''
+    while num > 0:
+        num, remainder = divmod(num, 62)
+        res = CHARSET[remainder] + res
+    return res
+
+
+@ignore_docs
+def create_hmac_signature(secret_key: str, message: str) -> str:
+    """Generate an HMAC signature and encode it using Base62. Base62 encoding reduces the signature length.
+
+    The hex digest of the HMAC signature is truncated to 30 characters to make it shorter.
+
+    Args:
+        secret_key: Secret key used for signing signatures.
+        message: Message to be signed.
+
+    Returns:
+        Base62 encoded signature.
+    """
+    signature = hmac.new(secret_key.encode('utf-8'), message.encode('utf-8'), hashlib.sha256).hexdigest()[:30]
+
+    decimal_signature = int(signature, 16)
+
+    return encode_base62(decimal_signature)
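
To show how the two helpers added to `src/apify/_crypto.py` fit together, the sketch below reproduces them without the `@ignore_docs` decorator and adds a verification step. `verify_signature`, the secret key, and the message are hypothetical and appear only in this example. The commit itself shows how signatures are created, not how they are checked.

```python
import hashlib
import hmac
import string

CHARSET = string.digits + string.ascii_letters  # 0-9, a-z, A-Z


def encode_base62(num: int) -> str:
    """Encode a non-negative integer to base62 (copied from the diff)."""
    if num == 0:
        return CHARSET[0]
    res = ''
    while num > 0:
        num, remainder = divmod(num, 62)
        res = CHARSET[remainder] + res
    return res


def create_hmac_signature(secret_key: str, message: str) -> str:
    """HMAC-SHA256, hex digest truncated to 30 characters, then base62 (copied from the diff)."""
    digest = hmac.new(secret_key.encode('utf-8'), message.encode('utf-8'), hashlib.sha256)
    return encode_base62(int(digest.hexdigest()[:30], 16))


def verify_signature(secret_key: str, message: str, signature: str) -> bool:
    """Recompute the signature and compare in constant time (hypothetical helper)."""
    expected = create_hmac_signature(secret_key, message)
    return hmac.compare_digest(expected, signature)


# Made-up inputs for illustration only.
signature = create_hmac_signature('example-secret', 'example-record-key')
is_valid = verify_signature('example-secret', 'example-record-key', signature)
is_tampered_valid = verify_signature('example-secret', 'other-record-key', signature)
```

Thirty hex characters carry 120 bits, so the base62 form is at most 21 characters long, compared with 30 in hex.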

src/apify/_platform_event_manager.py

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@
 from datetime import datetime
 from typing import TYPE_CHECKING, Annotated, Any, Literal, Union
 
-import websockets.client
+import websockets.asyncio.client
 from pydantic import BaseModel, Discriminator, Field, TypeAdapter
 from typing_extensions import Self, Unpack, override
 
@@ -143,7 +143,7 @@ class PlatformEventManager(EventManager):
     but instead use it via the `Actor.on()` and `Actor.off()` methods.
     """
 
-    _platform_events_websocket: websockets.client.WebSocketClientProtocol | None = None
+    _platform_events_websocket: websockets.asyncio.client.ClientConnection | None = None
     _process_platform_messages_task: asyncio.Task | None = None
     _send_system_info_interval_task: asyncio.Task | None = None
     _connected_to_platform_websocket: asyncio.Future = asyncio.Future()
@@ -196,7 +196,7 @@ async def __aexit__(
 
     async def _process_platform_messages(self, ws_url: str) -> None:
         try:
-            async with websockets.client.connect(ws_url) as websocket:
+            async with websockets.asyncio.client.connect(ws_url) as websocket:
                 self._platform_events_websocket = websocket
                 self._connected_to_platform_websocket.set_result(True)
 

src/apify/apify_storage_client/_apify_storage_client.py

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@
 
 from apify_client import ApifyClientAsync
 from crawlee._utils.crypto import crypto_random_object_id
-from crawlee.storage_clients import BaseStorageClient
+from crawlee.storage_clients import StorageClient
 
 from apify._utils import docs_group
 from apify.apify_storage_client._dataset_client import DatasetClient
@@ -21,7 +21,7 @@
 
 
 @docs_group('Classes')
-class ApifyStorageClient(BaseStorageClient):
+class ApifyStorageClient(StorageClient):
     """A storage client implementation based on the Apify platform storage."""
 
     def __init__(self, *, configuration: Configuration) -> None:
@@ -68,5 +68,5 @@ async def purge_on_start(self) -> None:
         pass
 
     @override
-    def get_rate_limit_errors(self) -> dict[int, int]:  # type: ignore[misc]
+    def get_rate_limit_errors(self) -> dict[int, int]:
         return self._apify_client.stats.rate_limit_errors

src/apify/apify_storage_client/_dataset_client.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 from typing_extensions import override
 
-from crawlee.storage_clients._base import BaseDatasetClient
+from crawlee.storage_clients._base import DatasetClient as BaseDatasetClient
 from crawlee.storage_clients.models import DatasetItemsListPage, DatasetMetadata
 
 if TYPE_CHECKING:
