refactor!: Make Actor initialization stricter and more predictable (#576)
### Description
- All relevant parts of `Actor` are initialized in `async init`, not in
`__init__`.
- `Actor` is considered finalized after `Actor.init` was run. This also
means that the same configuration used by the `Actor` is set in the
global `service_locator`.
- There are three valid scenarios for setting up the configuration:
  - Setting the global configuration in `service_locator` before
    `Actor.init`.
  - Having no configuration set in `service_locator`, passing one through
    `Actor(configuration=...)`, and running `Actor.init()`.
  - Having no configuration set in `service_locator` and passing none to
    `Actor`, which creates and sets an implicit default configuration.
- Properly set `ApifyFileSystemStorageClient` as local client to support
pre-existing input file.
- Depends on apify/crawlee-python/pull/1386
- Enable caching of `ApifyStorageClient` based on `token` and
`api_public_url` and update NDU storage handling.
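
The three configuration scenarios listed above can be sketched with a small stand-alone model. This is not the SDK's code: `ToyConfiguration`, `ToyLocator`, and `ToyActor` are hypothetical stand-ins that only mirror the rules described in this PR, `init` is synchronous here for brevity (the real `Actor.init` is async), and rejecting a conflicting configuration with `RuntimeError` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyConfiguration:
    """Hypothetical stand-in for a configuration object."""

    name: str = "implicit-default"


class ToyLocator:
    """Hypothetical stand-in for the global service_locator."""

    def __init__(self) -> None:
        self.configuration: Optional[ToyConfiguration] = None


class ToyActor:
    """Hypothetical stand-in for Actor; models configuration setup only."""

    def __init__(
        self,
        locator: ToyLocator,
        configuration: Optional[ToyConfiguration] = None,
    ) -> None:
        self._locator = locator
        self._requested = configuration
        self.configuration: Optional[ToyConfiguration] = None

    def init(self) -> None:
        global_config = self._locator.configuration
        if global_config is not None and self._requested not in (None, global_config):
            # Assumption: two different configurations are treated as a conflict.
            raise RuntimeError("configuration conflict")
        # Scenario 1: global configuration was set before init.
        # Scenario 2: configuration was passed to the actor.
        # Scenario 3: neither was set, so an implicit default is created.
        self.configuration = global_config or self._requested or ToyConfiguration()
        # After init, the locator holds the same configuration as the actor.
        self._locator.configuration = self.configuration
```

In all three scenarios the model ends in the same state: the actor and the locator share one configuration object once `init` has run.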
### Issues
Related to: #513, #590
### Testing
- Added many new initialization tests that show possible and prohibited
use cases
https://github.com/apify/apify-sdk-python/pull/576/files#diff-d64e1d346cc84a225ace3eb1d1ca826ff1e25c77064c9b1e0145552845fa7b41
- Running benchmark actor based on this and the related Crawlee branch
---------
Co-authored-by: Vlada Dusek <[email protected]>
docs/02_concepts/09_running_webserver.mdx (2 additions & 2 deletions)
@@ -13,9 +13,9 @@ The URL is available in the following places:
 
 - In Apify Console, on the Actor run details page as the **Container URL** field.
 - In the API as the `container_url` property of the [Run object](https://docs.apify.com/api/v2#/reference/actors/run-object/get-run).
-- In the Actor as the `Actor.config.container_url` property.
+- In the Actor as the `Actor.configuration.container_url` property.
 
-The web server running inside the container must listen at the port defined by the `Actor.config.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
+The web server running inside the container must listen at the port defined by the `Actor.configuration.container_port` property. When running Actors locally, the port defaults to `4321`, so the web server will be accessible at `http://localhost:4321`.
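
The port rule in the changed paragraph can be illustrated with a minimal standard-library sketch. It reads the port from the `ACTOR_WEB_SERVER_PORT` environment variable, which is an assumption about how the value reaches the process; inside an Actor you would read `Actor.configuration.container_port` instead. `resolve_port`, `HelloHandler`, and `make_server` are hypothetical names used only for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 4321  # local default mentioned in the documentation text


def resolve_port(env=None) -> int:
    """Return the port to listen on, falling back to the local default."""
    env = os.environ if env is None else env
    # Assumption: the port is exposed through this environment variable.
    return int(env.get("ACTOR_WEB_SERVER_PORT", DEFAULT_PORT))


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self) -> None:
        body = b"Hello from the Actor web server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int) -> HTTPServer:
    """Create a server bound to all interfaces on the given port."""
    return HTTPServer(("", port), HelloHandler)
```

Calling `make_server(resolve_port()).serve_forever()` would start the server; with no environment variable set it listens on `4321`, matching the local default described above.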
docs/04_upgrading/upgrading_to_v3.md (36 additions & 0 deletions)
@@ -9,6 +9,42 @@ This page summarizes the breaking changes between Apify Python SDK v2.x and v3.0
 
 Support for Python 3.9 has been dropped. The Apify Python SDK v3.x now requires Python 3.10 or later. Make sure your environment is running a compatible version before upgrading.
 
+## Actor initialization and ServiceLocator changes
+
+`Actor` initialization and global `service_locator` services setup is more strict and predictable.
+- Services in `Actor` can't be changed after calling `Actor.init`, entering the `async with Actor` context manager or after requesting them from the `Actor`.
+- Services in `Actor` can be different from services in Crawler.
+
+
+**Now (v3.0):**
+
+```python
+from crawlee.crawlers import BasicCrawler
+from crawlee.storage_clients import MemoryStorageClient
+from crawlee.configuration import Configuration
+from crawlee.events import LocalEventManager
+from apify import Actor
+
+async def main():
+
+    async with Actor():
+        # This crawler will use same services as Actor and global service_locator