### Description
- `ApifyRequestQueueClient` can be created in one of two access modes, `single` or `shared`:
  - `shared`: the existing implementation, which supports multiple producers and consumers and locks requests. It makes more Apify API calls and has higher API usage, so it is slower and more expensive.
  - `single`: a new, constrained client for a single consumer that may share the queue with multiple constrained producers (the detailed constraints are in the docs). It makes fewer Apify API calls and has lower API usage, so it is faster and cheaper.
- Most of the `ApifyRequestQueueClient` tests were moved out of the actor-based tests so that they can be parametrized for both variants of `ApifyRequestQueueClient` and are easier to debug locally.
#### Usage:
RequestQueue with `shared`:
`await RequestQueue.open(storage_client=ApifyStorageClient(request_queue_access="shared"))`
RequestQueue with default `single`:
`await RequestQueue.open(storage_client=ApifyStorageClient())`
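A minimal sketch of choosing between the two modes, assuming the only deciding factor is how many consumers read from the same queue. `choose_request_queue_access` and `open_request_queue` are hypothetical helper names, not part of the SDK, and the detailed `single` constraints from the docs still apply.

```python
from typing import Literal

AccessMode = Literal["single", "shared"]


def choose_request_queue_access(consumer_count: int) -> AccessMode:
    """Pick an access mode from the number of consumers of one queue (hypothetical helper)."""
    if consumer_count < 1:
        raise ValueError("consumer_count must be at least 1")
    # `single` supports only one consumer; `shared` supports multiple consumers.
    return "single" if consumer_count == 1 else "shared"


async def open_request_queue(consumer_count: int):
    """Open a RequestQueue with the chosen access mode (hypothetical helper)."""
    # Imported here so the helper above stays usable without the SDK installed.
    from apify.storage_clients import ApifyStorageClient
    from crawlee.storages import RequestQueue

    access = choose_request_queue_access(consumer_count)
    storage_client = ApifyStorageClient(request_queue_access=access)
    return await RequestQueue.open(storage_client=storage_client)
```

With one consumer this resolves to the default `single` mode, so the `shared` cost is only paid when several consumers really work on the same queue.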
#### Stats difference:
The `shared` client (the full client) makes significantly more API calls. In terms of API usage, it performs 50% more RequestQueue writes and also more RequestQueue reads.
**Example RequestQueue-related stats for a crawler started with 1000 requests:**
`shared`:
API calls: 2123
API usage: {'readCount': 1000, 'writeCount': 3000, 'deleteCount': 0, 'headItemReadCount': 0, 'storageBytes': 104035}
`single`:
API calls: 1059
API usage: {'readCount': 3, 'writeCount': 2000, 'deleteCount': 0, 'headItemReadCount': 14, 'storageBytes': 103826}
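The ratios behind these claims can be checked with a few lines of Python. This is only a sketch over the numbers quoted above; the `stats` dictionary layout and the `ratio` helper are illustrative names, not SDK output.

```python
# Figures copied from the stats above (crawler started with 1000 requests).
stats = {
    "shared": {"api_calls": 2123, "readCount": 1000, "writeCount": 3000},
    "single": {"api_calls": 1059, "readCount": 3, "writeCount": 2000},
}


def ratio(metric: str) -> float:
    """Return how many times larger the `shared` value is than the `single` value."""
    return stats["shared"][metric] / stats["single"][metric]


api_call_ratio = ratio("api_calls")  # 2123 / 1059
write_ratio = ratio("writeCount")  # 3000 / 2000
extra_writes_pct = (write_ratio - 1) * 100

print(f"API calls: shared makes {api_call_ratio:.2f}x as many as single")
print(f"Writes: shared makes {extra_writes_pct:.0f}% more than single")
```

For this run, `shared` makes roughly twice as many API calls as `single`, and the write counts give the 50% figure quoted in the description.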
### Issues
- Part of: #513
---------
Co-authored-by: Jan Buchar <[email protected]>
docs/04_upgrading/upgrading_to_v3.md
Lines changed: 49 additions & 11 deletions
@@ -69,6 +69,13 @@ Some changes in the related model classes:
 ## Removed Actor.config property
 - `Actor.config` property has been removed. Use `Actor.configuration` instead.
 
+## Default storage ids in configuration changed to None
+- `Configuration.default_key_value_store_id` changed from `'default'` to `None`.
+- `Configuration.default_dataset_id` changed from `'default'` to `None`.
+- `Configuration.default_request_queue_id` changed from `'default'` to `None`.
+
+Previously using the default storage without specifying its `id` in `Configuration` would lead to using specific storage with id `'default'`. Now it will use newly created unnamed storage with `'id'` assigned by the Apify platform, consecutive calls to get the default storage will return the same storage.
+
 ## Actor initialization and ServiceLocator changes
 
 `Actor` initialization and global `service_locator` services setup is more strict and predictable.
@@ -102,20 +109,51 @@ async def main():
 )
 ```
 
-## Removed Actor.config property
-- `Actor.config` property has been removed. Use `Actor.configuration` instead.
+### Changes in storage clients
 
-## Default storage ids in configuration changed to None
-- `Configuration.default_key_value_store_id` changed from `'default'` to `None`.
-- `Configuration.default_dataset_id` changed from `'default'` to `None`.
-- `Configuration.default_request_queue_id` changed from `'default'` to `None`.
+## Explicit control over storage clients used in Actor
+- It is now possible to have full control over which storage clients are used by the `Actor`. To make development of Actors convenient, the `Actor` has two storage clients. One that is used when running on Apify platform or when opening storages with `force_cloud=True` and the other client that is used when running outside the Apify platform. The `Actor` has reasonable defaults and for the majority of use-cases there is no need to change it. However, if you need to use a different storage client, you can set it up before entering `Actor` context through `service_locator`.
+
+**Now (v3.0):**
+
+```python
+from crawlee import service_locator
+from apify.storage_clients import ApifyStorageClient, SmartApifyStorageClient, MemoryStorageClient
Previously using the default storage without specifying its `id` in `Configuration` would lead to using specific storage with id `'default'`. Now it will use newly created unnamed storage with `'id'` assigned by the Apify platform, consecutive calls to get the default storage will return the same storage.
 
-## Storages
+## The default use of optimized ApifyRequestQueueClient
 
-<!-- TODO -->
+- The default client for working with Apify platform based `RequestQueue` is now optimized and simplified client which does significantly lower amount of API calls, but does not support multiple consumers working on the same queue. It is cheaper and faster and is suitable for the majority of the use cases.
+- The full client is still available, but it has to be explicitly requested via `request_queue_access="shared"` argument when using the `ApifyStorageClient`.
 
-## Storage clients
+**Now (v3.0):**
+
+```python
+from crawlee import service_locator
+from apify.storage_clients import ApifyStorageClient, SmartApifyStorageClient
+from apify import Actor
 
-<!-- TODO -->
+
+async def main():
+    # Full client that supports multiple consumers of the Apify Request Queue
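
The diff view ends before the body of `main()` is shown. As a hedged illustration of what the added guide text describes (setting the storage client through `service_locator` before entering the `Actor` context, with the `shared` request queue client), a sketch could look like the one below. The keyword names `cloud_storage_client` and `local_storage_client` and the call `service_locator.set_storage_client` are assumptions, since they are not visible in the diff above; this is not a reconstruction of the truncated lines.

```python
async def main() -> None:
    # Imported lazily so that this module can be loaded without the SDK installed.
    from apify import Actor
    from apify.storage_clients import (
        ApifyStorageClient,
        MemoryStorageClient,
        SmartApifyStorageClient,
    )
    from crawlee import service_locator

    # Assumption: these keyword names are not shown in the truncated diff.
    service_locator.set_storage_client(
        SmartApifyStorageClient(
            # Used on the Apify platform; `shared` allows multiple consumers.
            cloud_storage_client=ApifyStorageClient(request_queue_access="shared"),
            # Used when running outside the Apify platform.
            local_storage_client=MemoryStorageClient(),
        )
    )

    # The storage client is set up before entering the Actor context.
    async with Actor:
        request_queue = await Actor.open_request_queue()
        Actor.log.info(f"Opened request queue: {request_queue.name}")
```

The coroutine can be started with `asyncio.run(main())`. Leaving out the explicit client setup keeps the defaults, which per the text above means the `single` request queue client.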