
Commit 912222a

Pijukatel and vdusek authored
refactor!: Make Actor initialization stricter and more predictable (#576)
### Description

- All relevant parts of `Actor` are initialized in `async init`, not in `__init__`.
- `Actor` is considered finalized after `Actor.init` has run. This also means that the same configuration used by the `Actor` is set in the global `service_locator`.
- There are three valid scenarios for setting up the configuration:
  - Setting the global configuration in `service_locator` before calling `Actor.init`.
  - Having no configuration set in `service_locator`, setting it through `Actor(configuration=...)`, and running `Actor.init()`.
  - Having no configuration set in `service_locator` and passing no configuration to `Actor`, which creates and sets an implicit default configuration.
- Properly set `ApifyFileSystemStorageClient` as the local client to support a pre-existing input file.
- Depends on apify/crawlee-python/pull/1386
- Enable caching of `ApifyStorageClient` based on `token` and `api_public_url` and update NDU storage handling.

### Issues

Related to: #513, #590

### Testing

- Added many new initialization tests that show possible and prohibited use cases: https://github.com/apify/apify-sdk-python/pull/576/files#diff-d64e1d346cc84a225ace3eb1d1ca826ff1e25c77064c9b1e0145552845fa7b41
- Running the benchmark Actor based on this and the related Crawlee branch.

---------

Co-authored-by: Vlada Dusek <[email protected]>
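The three configuration scenarios listed in the description can be illustrated with a small, self-contained toy model. This is only a sketch: `ToyConfiguration`, `ToyServiceLocator` and `ToyActor` are hypothetical stand-ins written for this illustration, not the SDK's classes, and the assumption that a conflicting configuration raises an error is mine rather than something the commit message states.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyConfiguration:
    """Hypothetical stand-in for the SDK configuration object."""

    token: str | None = None


class ToyServiceLocator:
    """Hypothetical global registry that accepts a configuration only once."""

    def __init__(self) -> None:
        self._configuration: ToyConfiguration | None = None

    def get_configuration(self) -> ToyConfiguration | None:
        return self._configuration

    def set_configuration(self, configuration: ToyConfiguration) -> None:
        current = self._configuration
        # Re-setting the very same object is harmless; replacing it is refused
        # (assumed behavior for this sketch).
        if current is not None and current is not configuration:
            raise RuntimeError('A different configuration is already set.')
        self._configuration = configuration


class ToyActor:
    """Hypothetical actor whose setup happens in `init`, not in `__init__`."""

    def __init__(
        self,
        locator: ToyServiceLocator,
        configuration: ToyConfiguration | None = None,
    ) -> None:
        # Nothing global is touched here; everything is deferred to init().
        self._locator = locator
        self.configuration = configuration

    async def init(self) -> None:
        if self.configuration is None:
            existing = self._locator.get_configuration()
            # Scenario 1: reuse the configuration that was set globally.
            # Scenario 3: nothing was set anywhere, so create a default.
            self.configuration = existing if existing is not None else ToyConfiguration()
        # Scenarios 2 and 3: publish the actor's configuration globally.
        # In scenario 1 this re-sets the same object, which is a no-op.
        self._locator.set_configuration(self.configuration)


async def demo() -> ToyConfiguration | None:
    # Scenario 2: the configuration is passed to the actor explicitly.
    locator = ToyServiceLocator()
    actor = ToyActor(locator, configuration=ToyConfiguration(token='secret'))
    await actor.init()
    return locator.get_configuration()


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

In this model the global registry and the actor always end up holding the same configuration object once `init` has finished, which is the invariant the description calls "finalized".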
1 parent 68a7f48 commit 912222a

30 files changed: +846 -303 lines changed

docs/02_concepts/09_running_webserver.mdx

Lines changed: 2 additions & 2 deletions
@@ -13,9 +13,9 @@ The URL is available in the following places:
 
 - In Apify Console, on the Actor run details page as the **Container URL** field.
 - In the API as the `container_url` property of the [Run object](https://docs.apify.com/api/v2#/reference/actors/run-object/get-run).
-- In the Actor as the `Actor.config.container_url` property.
+- In the Actor as the `Actor.configuration.container_url` property.
 
-The web server running inside the container must listen at the port defined by the `Actor.config.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
+The web server running inside the container must listen at the port defined by the `Actor.configuration.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
 
 ## Example

docs/02_concepts/code/07_webhook_preventing.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ async def main() -> None:
         webhook = Webhook(
             event_types=['ACTOR.RUN.FAILED'],
             request_url='https://example.com/run-failed',
-            idempotency_key=Actor.config.actor_run_id,
+            idempotency_key=Actor.configuration.actor_run_id,
         )
 
         # Add the webhook to the Actor.

docs/02_concepts/code/09_webserver.py

Lines changed: 2 additions & 2 deletions
@@ -22,9 +22,9 @@ def run_server() -> None:
     # and save a reference to the server.
     global http_server
     with ThreadingHTTPServer(
-        ('', Actor.config.web_server_port), RequestHandler
+        ('', Actor.configuration.web_server_port), RequestHandler
     ) as server:
-        Actor.log.info(f'Server running on {Actor.config.web_server_port}')
+        Actor.log.info(f'Server running on {Actor.configuration.web_server_port}')
         http_server = server
         server.serve_forever()

docs/02_concepts/code/conditional_actor_charge.py

Lines changed: 1 addition & 1 deletion
@@ -13,6 +13,6 @@ async def main() -> None:
         if Actor.get_charging_manager().get_pricing_info().is_pay_per_event:
             # highlight-end
             await Actor.push_data({'hello': 'world'}, 'dataset-item')
-        elif charged_items < (Actor.config.max_paid_dataset_items or 0):
+        elif charged_items < (Actor.configuration.max_paid_dataset_items or 0):
             await Actor.push_data({'hello': 'world'})
             charged_items += 1

docs/03_guides/code/03_playwright.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ async def main() -> None:
         async with async_playwright() as playwright:
             # Configure the browser to launch in headless mode as per Actor configuration.
             browser = await playwright.chromium.launch(
-                headless=Actor.config.headless,
+                headless=Actor.configuration.headless,
                 args=['--disable-gpu'],
             )
             context = await browser.new_context()

docs/03_guides/code/04_selenium.py

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ async def main() -> None:
         Actor.log.info('Launching Chrome WebDriver...')
         chrome_options = ChromeOptions()
 
-        if Actor.config.headless:
+        if Actor.configuration.headless:
             chrome_options.add_argument('--headless')
 
         chrome_options.add_argument('--no-sandbox')

docs/04_upgrading/upgrading_to_v3.md

Lines changed: 36 additions & 0 deletions
@@ -9,6 +9,42 @@ This page summarizes the breaking changes between Apify Python SDK v2.x and v3.0
 
 Support for Python 3.9 has been dropped. The Apify Python SDK v3.x now requires Python 3.10 or later. Make sure your environment is running a compatible version before upgrading.
 
+## Actor initialization and ServiceLocator changes
+
+`Actor` initialization and global `service_locator` services setup is more strict and predictable.
+- Services in `Actor` can't be changed after calling `Actor.init`, entering the `async with Actor` context manager or after requesting them from the `Actor`.
+- Services in `Actor` can be different from services in Crawler.
+
+
+**Now (v3.0):**
+
+```python
+from crawlee.crawlers import BasicCrawler
+from crawlee.storage_clients import MemoryStorageClient
+from crawlee.configuration import Configuration
+from crawlee.events import LocalEventManager
+from apify import Actor
+
+async def main():
+
+    async with Actor():
+        # This crawler will use same services as Actor and global service_locator
+        crawler_1 = BasicCrawler()
+
+        # This crawler will use custom services
+        custom_configuration = Configuration()
+        custom_event_manager = LocalEventManager.from_config(custom_configuration)
+        custom_storage_client = MemoryStorageClient()
+        crawler_2 = BasicCrawler(
+            configuration=custom_configuration,
+            event_manager=custom_event_manager,
+            storage_client=custom_storage_client,
+        )
+```
+
+## Removed Actor.config property
+- `Actor.config` property has been removed. Use `Actor.configuration` instead.
+
 ## Storages
 
 <!-- TODO -->
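
The `Actor.config` to `Actor.configuration` rename recorded in this hunk is mechanical, so existing code can be migrated with a simple text substitution. The snippet below is a minimal sketch of that idea; `migrate_source` is a hypothetical helper written for this illustration and is not part of the SDK or of this commit. It works on plain text, so it would also rewrite matches inside comments and strings.

```python
import re

# Match `Actor.config` only as a complete attribute name. The trailing word
# boundary keeps `Actor.configuration` untouched, and the leading one skips
# names such as `MyActor.config`.
_OLD_PROPERTY = re.compile(r'\bActor\.config\b')


def migrate_source(source: str) -> str:
    """Replace the removed `Actor.config` property with `Actor.configuration`."""
    return _OLD_PROPERTY.sub('Actor.configuration', source)


if __name__ == '__main__':
    before = 'headless=Actor.config.headless,'
    print(migrate_source(before))
```

Because already migrated code does not match the pattern, running the helper twice over the same file gives the same result as running it once.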

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ keywords = [
 dependencies = [
     "apify-client>=2.0.0,<3.0.0",
     "apify-shared>=2.0.0,<3.0.0",
-    "crawlee==0.6.13b37",
+    "crawlee==0.6.13b42",
     "cachetools>=5.5.0",
     "cryptography>=42.0.0",
     "impit>=0.6.1",
