Commit 457a779

jbartadev and vdusek authored
docs: Added new getting started page (#232)
I tried to do the same thing for Python as we did for the JS client [here](apify/apify-client-js#542). The goal is to improve onboarding (adoption) for using our API (in Python in this case). <img width="1768" alt="Screenshot 2024-05-29 at 17 17 17" src="https://github.com/apify/apify-client-python/assets/45016873/bbb4a7ae-5542-43a5-9ae9-182715cce88c"> --------- Co-authored-by: Vlada Dusek <[email protected]>
1 parent 004ec21 commit 457a779

6 files changed

+347
-217
lines changed

docs/features.md

Lines changed: 0 additions & 37 deletions
This file was deleted.

docs/index.md

Lines changed: 345 additions & 0 deletions
@@ -0,0 +1,345 @@
1+
---
2+
sidebar_label: 'Getting started'
3+
title: 'Getting started'
4+
---
5+
6+
# Apify API client for Python
7+
8+
`apify-client` is the official library to access the [Apify REST API](https://docs.apify.com/api/v2) from your Python applications. It provides useful features like automatic retries and convenience functions that improve the experience of using the Apify API. All requests and responses (including errors) are encoded in JSON format with UTF-8 encoding.
9+
10+
## Prerequisites
11+
12+
`apify-client` requires Python version 3.8 or higher. Python is available for download on the [official website](https://www.python.org/). Check your current Python version by running:
13+
14+
```bash
python -V
```
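
The same check can be made from inside a script. The following is a minimal sketch, assuming you want to fail early on an unsupported interpreter; the helper name `meets_minimum_version` is hypothetical and not part of `apify-client`.

```python
import sys

# Minimum version stated above for apify-client.
MINIMUM_PYTHON = (3, 8)


def meets_minimum_version(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the given version tuple is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not meets_minimum_version():
    raise SystemExit('apify-client requires Python 3.8 or higher')
```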
17+
18+
## Installation
19+
20+
You can install the client from its [PyPI listing](https://pypi.org/project/apify-client/).
21+
To do that, run:
22+
23+
```bash
pip install apify-client
```
26+
27+
## Authentication and initialization
28+
29+
To use the client, you need an [API token](https://docs.apify.com/platform/integrations/api#api-token). You can find your token under the [Integrations](https://console.apify.com/account/integrations) tab in Apify Console. Copy the token and initialize the client by passing it (`MY-APIFY-TOKEN` in the example) as a parameter to the `ApifyClient` constructor.
30+
31+
```python
# Import the Apify client
from apify_client import ApifyClient

# Initialize the client with your API token
apify_client = ApifyClient('MY-APIFY-TOKEN')
```
38+
39+
:::warning Secure access
40+
41+
The API token is used to authorize your requests to the Apify API. You can be charged for the usage of the underlying services, so do not share your API token with untrusted parties or expose it on the client side of your applications.
42+
43+
:::
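
One common way to keep the token out of source code is to read it from an environment variable. This is only a sketch under assumptions: the variable name `APIFY_TOKEN` and the helper `load_api_token` are chosen here for illustration and are not defined by this page.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
TOKEN_ENV_VAR = 'APIFY_TOKEN'


def load_api_token(environ=os.environ):
    """Return the API token from the environment, or raise if it is missing."""
    token = environ.get(TOKEN_ENV_VAR, '').strip()
    if not token:
        raise RuntimeError(f'Set the {TOKEN_ENV_VAR} environment variable first')
    return token


# Usage (not executed here because it needs a real token):
# apify_client = ApifyClient(load_api_token())
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.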
44+
45+
## Quick start
46+
47+
One of the most common use cases is starting [Actors](https://docs.apify.com/platform/actors) (serverless programs running in the [Apify cloud](https://docs.apify.com/platform)) and getting results from their [datasets](https://docs.apify.com/platform/storage/dataset) (storage) after they finish the job (usually scraping, automation processes or data processing).
48+
49+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an Actor and wait for it to finish
actor_call = apify_client.actor('username/actor-name').call()

# Get a client for the Actor run's default dataset
dataset_client = apify_client.dataset(actor_call['defaultDatasetId'])

# List items from the Actor's dataset
dataset_items = dataset_client.list_items().items
```
63+
64+
### Running Actors
65+
66+
To start an Actor, use the [ActorClient](/reference/class/ActorClient) (`client.actor()`) and pass the Actor's ID (e.g. `john-doe/my-cool-actor`) to define which Actor you want to run. The Actor's ID is a combination of the Actor owner's username and the Actor's name. You can run both your own Actors and [Actors from Apify Store](https://docs.apify.com/platform/actors/running/actors-in-store).
67+
68+
#### Passing input to the Actor
69+
70+
To define the Actor's input, pass a run input to the [`call()`](/reference/class/ActorClient#call) method. The input can be any JSON object that the Actor expects (that is, one that respects the Actor's [input schema](https://docs.apify.com/platform/actors/development/actor-definition/input-schema)). The input is used to pass configuration to the Actor, such as URLs to scrape, search terms, or any other data.
71+
72+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Define the input for the Actor
actor_input = {
    'some': 'input',
}

# Start an Actor and wait for it to finish
actor_call = apify_client.actor('username/actor-name').call(run_input=actor_input)
```
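
Because the input has to be a JSON object, it can help to check that it serializes before sending it. The sketch below assumes a made-up input shape: the field names `startUrls` and `maxItems` and the helper `build_run_input` are hypothetical, and a real Actor defines its own input schema.

```python
import json


def build_run_input(start_urls, max_items=100):
    """Build an input dictionary and verify that it can be encoded as JSON."""
    run_input = {
        # Hypothetical field names; a real Actor defines its own input schema.
        'startUrls': [{'url': url} for url in start_urls],
        'maxItems': max_items,
    }
    # json.dumps raises TypeError for values that JSON cannot represent.
    json.dumps(run_input)
    return run_input


actor_input = build_run_input(['https://example.com'], max_items=10)
print(actor_input)
```

The resulting dictionary could then be passed as `run_input` to `call()`, as in the example above.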
85+
86+
### Getting results from the dataset
87+
88+
To get the results from the dataset, use the [DatasetClient](/reference/class/DatasetClient) (`client.dataset()`) and its [`list_items()`](/reference/class/DatasetClient#list_items) method. You need to pass the dataset ID to define which dataset you want to access. You can get the dataset ID from the Actor's run dictionary (under the `defaultDatasetId` key).
89+
90+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Get a client for the dataset
dataset_client = apify_client.dataset('dataset-id')

# List items from the dataset
dataset_items = dataset_client.list_items().items
```
101+
102+
:::note Dataset access
103+
104+
Running an Actor might take time, depending on the Actor's complexity and the amount of data it processes. If you only want to get data and need an immediate response, access the existing dataset of a finished [Actor run](https://docs.apify.com/platform/actors/running/runs-and-builds#runs) instead.
105+
106+
:::
107+
108+
## Usage concepts
109+
110+
The `ApifyClient` interface follows a generic pattern that applies to all of its components. Calling individual methods of `ApifyClient` creates specific clients that target individual API resources. There are two types of those clients:
111+
112+
- [`ActorClient`](/reference/class/ActorClient): a client for the management of a single resource
113+
- [`ActorCollectionClient`](/reference/class/ActorCollectionClient): a client for the collection of resources
114+
115+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Collection clients do not require a parameter
actor_collection_client = apify_client.actors()

# Create an Actor with the name: my-actor
my_actor = actor_collection_client.create(name='my-actor')

# List all of your Actors
actor_list = actor_collection_client.list().items
```
129+
130+
:::note Resource identification
131+
132+
The resource ID can be either the `id` of the resource or a combination of the owner's username and the resource name, in the form `username/resource-name`.
133+
134+
:::
135+
136+
```python
# Resource clients accept an ID of the resource
actor_client = apify_client.actor('username/actor-name')

# Fetch the 'username/actor-name' object from the API
my_actor = actor_client.get()

# Start the run of 'username/actor-name' and return the Run object
my_actor_run = actor_client.start()
```
146+
147+
### Nested clients
148+
149+
Sometimes clients return other clients. That's to simplify working with nested collections, such as runs of a given Actor.
150+
151+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

actor_client = apify_client.actor('username/actor-name')
runs_client = actor_client.runs()

# List the last 10 runs of the Actor
actor_runs = runs_client.list(limit=10, desc=True).items

# Select the last run of the Actor that finished with a SUCCEEDED status
last_succeeded_run_client = actor_client.last_run(status='SUCCEEDED')

# Get a client for the run's default dataset
actor_run_dataset_client = last_succeeded_run_client.dataset()

# Fetch items from the run's dataset
dataset_items = actor_run_dataset_client.list_items().items
```
171+
172+
This quick access to the `dataset` and other storages directly from the run client also works with the [`last_run()`](/reference/class/ActorClient#last_run) method, as shown above.
173+
174+
## Features
175+
176+
Based on the endpoint, the client automatically extracts the relevant data
177+
and returns it in the expected format.
178+
Date strings are automatically converted to `datetime.datetime` objects.
179+
When a request fails, the client raises an [`ApifyApiError`](/reference/class/ApifyApiError),
180+
which wraps the plain JSON errors returned by the API and enriches them with other context for easier debugging.
181+
182+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

try:
    # Try to list items from a non-existent dataset
    dataset_client = apify_client.dataset('not-existing-dataset-id')
    dataset_items = dataset_client.list_items().items
except Exception as error:
    # The exception is expected to be an instance of ApifyApiError
    print(error)
```
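
To illustrate the date conversion mentioned above, here is a minimal sketch of how a UTC timestamp string maps to a `datetime.datetime` object. It assumes timestamps of the form `2024-05-29T15:17:17.000Z`; the helper `parse_api_timestamp` is hypothetical and is not the client's own parsing code.

```python
from datetime import datetime, timezone


def parse_api_timestamp(value):
    """Parse a UTC timestamp such as '2024-05-29T15:17:17.000Z'."""
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%S.%fZ')
    # The trailing 'Z' means UTC, so attach the timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


started_at = parse_api_timestamp('2024-05-29T15:17:17.000Z')
print(started_at.isoformat())
```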
195+
196+
### Retries with exponential backoff
197+
198+
Network communication sometimes fails.
199+
The client will automatically retry requests that failed due to a network error,
200+
an internal error of the Apify API (HTTP 500+) or rate limit error (HTTP 429).
201+
By default, it will retry up to 8 times.
202+
The first retry is attempted after ~500 ms, the second after ~1000 ms, and so on.
203+
You can configure those parameters using the `max_retries` and `min_delay_between_retries_millis` options
204+
of the [`ApifyClient`](/reference/class/ApifyClient) constructor.
205+
206+
```python
from apify_client import ApifyClient

apify_client = ApifyClient(
    token='MY-APIFY-TOKEN',
    max_retries=8,
    min_delay_between_retries_millis=500,  # 0.5s
    timeout_secs=360,  # 6 mins
)
```
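
The retry schedule can be pictured with a small standalone sketch. It assumes the delay doubles on every attempt, which matches the ~500 ms and ~1000 ms figures above but is otherwise an assumption. Unlike the client, it retries on any exception and adds no randomization. The names `backoff_delays` and `call_with_retries` are hypothetical.

```python
import time


def backoff_delays(max_retries=8, min_delay_millis=500):
    """Return the delay before each retry, doubling every time (an assumption)."""
    return [min_delay_millis * 2 ** attempt for attempt in range(max_retries)]


def call_with_retries(func, max_retries=8, min_delay_millis=500, sleep=time.sleep):
    """Call func(); on an exception, wait and try again, up to max_retries retries."""
    delays = backoff_delays(max_retries, min_delay_millis)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error.
            sleep(delays[attempt] / 1000)


print(backoff_delays(max_retries=4))
```

The `sleep` parameter is injectable so the logic can be exercised without actually waiting.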
216+
217+
### Support for asynchronous usage
218+
219+
The package offers an asynchronous version of the client,
220+
[`ApifyClientAsync`](/reference/class/ApifyClientAsync),
221+
which allows you to work with the Apify API in an asynchronous way, using the standard `async`/`await` syntax [offered by Python](https://docs.python.org/3/library/asyncio-task.html).
222+
223+
For example, to run an Actor and asynchronously stream its log while it's running, you can use this snippet:
224+
225+
```python
import asyncio

from apify_client import ApifyClientAsync

apify_client_async = ApifyClientAsync('MY-APIFY-TOKEN')


async def main():
    run = await apify_client_async.actor('my-actor').start()

    # Stream the run's log line by line while the Actor is running
    async with apify_client_async.run(run['id']).log().stream() as async_log_stream:
        if async_log_stream:
            async for line in async_log_stream.aiter_lines():
                print(line)


asyncio.run(main())
```
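
One benefit of the asynchronous style is that several requests can be in flight at once. The sketch below shows the general pattern with `asyncio.gather`. To stay runnable without a token, it uses `fake_start`, a hypothetical stand-in for an awaitable client call, and the returned dictionary shape is made up for illustration.

```python
import asyncio


async def fake_start(actor_id):
    """Hypothetical stand-in for an awaitable client call such as start()."""
    await asyncio.sleep(0.01)  # Simulate network latency.
    return {'actor': actor_id, 'status': 'RUNNING'}


async def start_many(actor_ids, start=fake_start):
    """Start all Actors concurrently and return the results in input order."""
    return await asyncio.gather(*(start(actor_id) for actor_id in actor_ids))


runs = asyncio.run(start_many(['user/actor-a', 'user/actor-b']))
print([run['actor'] for run in runs])
```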
239+
240+
### Logging
241+
242+
The library logs some useful debug information to the `apify_client` logger
243+
when sending requests to the Apify API.
244+
To have them printed out, you need to add a handler to the logger (a `logging.StreamHandler()` created without arguments writes to standard error):
245+
246+
```python
import logging

apify_client_logger = logging.getLogger('apify_client')
apify_client_logger.setLevel(logging.DEBUG)
apify_client_logger.addHandler(logging.StreamHandler())
```
252+
253+
The log records have useful properties added with the `extra` argument,
254+
like `attempt`, `status_code`, `url`, `client_method` and `resource_id`.
255+
To print those out, you'll need to use a custom log formatter.
256+
To learn more about log formatters and how to use them,
257+
please refer to the official Python [documentation on logging](https://docs.python.org/3/howto/logging.html#formatters).
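
A custom formatter for those extra properties might look like the following sketch. The class name `ExtraFieldsFormatter` and the `formatter_demo` logger are hypothetical. The demonstration logs a hand-made record to an in-memory stream instead of calling the API.

```python
import io
import logging

EXTRA_FIELDS = ('attempt', 'status_code', 'url', 'client_method', 'resource_id')


class ExtraFieldsFormatter(logging.Formatter):
    """Append the extra fields listed above to each formatted log message."""

    def format(self, record):
        message = super().format(record)
        # Fields that are missing from a record are shown as None.
        extras = ' '.join(f'{name}={getattr(record, name, None)}' for name in EXTRA_FIELDS)
        return f'{message} [{extras}]'


# Demonstration on a separate, hypothetical logger writing to an in-memory stream.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(ExtraFieldsFormatter('%(levelname)s %(message)s'))
demo_logger = logging.getLogger('formatter_demo')
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(handler)
demo_logger.propagate = False

demo_logger.debug('Sending request', extra={'attempt': 1, 'url': 'https://example.com'})
print(stream.getvalue(), end='')
```

To apply it to the library's records, the same formatter could be set on a handler attached to the `apify_client` logger.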
258+
259+
### Convenience functions and options
260+
261+
Some actions can't be performed by the API itself, such as indefinite waiting for an Actor run to finish (because of network timeouts).
262+
The client provides convenient [`call()`](/reference/class/ActorClient#call)
263+
and [`wait_for_finish()`](/reference/class/ActorClient#wait_for_finish) methods that do that.
264+
265+
[Key-value store](https://docs.apify.com/platform/storage/key-value-store) records can be retrieved as objects, buffers or streams via the respective options.
266+
Dataset items can be fetched as individual objects or serialized data, or iterated asynchronously.
267+
268+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an Actor and wait for it to finish
finished_actor_run = apify_client.actor('username/actor-name').call()

# Start an Actor and wait at most 60 seconds (1 minute) for it to finish
actor_run = apify_client.actor('username/actor-name').start(wait_for_finish=60)
```
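
The idea behind waiting for a run without one indefinitely open request can be sketched as a polling loop. This is an illustration only, not the client's implementation. The set of terminal statuses is an assumption, and `wait_until_finished` and the fake `get_run` callable are hypothetical.

```python
import time

# Assumed set of terminal run statuses; check the API reference for your version.
TERMINAL_STATUSES = {'SUCCEEDED', 'FAILED', 'ABORTED', 'TIMED-OUT'}


def wait_until_finished(get_run, poll_interval_secs=1.0, timeout_secs=60.0, sleep=time.sleep):
    """Poll get_run() until the run reaches a terminal status or the timeout expires."""
    waited = 0.0
    while True:
        run = get_run()
        if run['status'] in TERMINAL_STATUSES or waited >= timeout_secs:
            return run
        sleep(poll_interval_secs)
        waited += poll_interval_secs


# Hypothetical stand-in that reports RUNNING twice and then SUCCEEDED.
statuses = iter(['RUNNING', 'RUNNING', 'SUCCEEDED'])
final_run = wait_until_finished(lambda: {'status': next(statuses)}, sleep=lambda _: None)
print(final_run['status'])
```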
279+
280+
### Pagination
281+
282+
Most methods named `list` or `list_something` return a [`ListPage`](/reference/class/ListPage) object,
283+
containing properties `items`, `total`, `offset`, `count` and `limit`.
284+
There are some exceptions though, like `list_keys` or `list_head` which paginate differently.
285+
The results you're looking for are always stored under `items`, and you can use the `limit`
286+
parameter to get only a subset of results. Other properties may be available depending on the method.
287+
288+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Resource clients accept an ID of the resource
dataset_client = apify_client.dataset('dataset-id')

# Number of items per page
limit = 1000
# Initial offset
offset = 0
# List to store all items
all_items = []

while True:
    response = dataset_client.list_items(limit=limit, offset=offset)
    items = response.items
    total = response.total

    print(f'Fetched {len(items)} items')

    # Merge new items with other already loaded items
    all_items.extend(items)

    # If there are no more items to fetch, exit the loop
    if offset + limit >= total:
        break

    offset += limit

print(f'Overall fetched {len(all_items)} items')
```
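
The same loop can be wrapped in a reusable generator. The sketch below assumes only that the listing callable accepts `limit` and `offset` keyword arguments and returns an object with `items` and `total`, as described above. `iterate_items` and `fake_list_items` are hypothetical names, and the fake stands in for `dataset_client.list_items` so the example runs offline.

```python
from types import SimpleNamespace


def iterate_items(list_page, limit=1000):
    """Yield items page by page until `total` items have been seen."""
    offset = 0
    while True:
        page = list_page(limit=limit, offset=offset)
        yield from page.items
        offset += limit
        # Stop when past the end, or on an empty page to avoid looping forever.
        if offset >= page.total or not page.items:
            break


# Hypothetical in-memory stand-in for dataset_client.list_items.
DATA = [{'n': n} for n in range(25)]


def fake_list_items(limit, offset):
    return SimpleNamespace(items=DATA[offset:offset + limit], total=len(DATA))


all_items = list(iterate_items(fake_list_items, limit=10))
print(f'Overall fetched {len(all_items)} items')
```

A generator lets the caller process items as each page arrives instead of holding every page in memory first.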
321+
322+
### Streaming resources
323+
324+
Some resources (dataset items, key-value store records and logs)
325+
support streaming the resource from the Apify API in parts,
326+
without having to download the whole (potentially huge) resource to memory before processing it.
327+
328+
The methods to stream these resources are
329+
[`DatasetClient.stream_items()`](/reference/class/DatasetClient#stream_items),
330+
[`KeyValueStoreClient.stream_record()`](/reference/class/KeyValueStoreClient#stream_record),
331+
and [`LogClient.stream()`](/reference/class/LogClient#stream).
332+
333+
Instead of the parsed resource, they return a raw, context-managed
334+
[`httpx.Response`](https://www.python-httpx.org/quickstart/#streaming-responses) object,
335+
which has to be consumed using the `with` keyword,
336+
and automatically gets closed once you exit the `with` block, preventing memory leaks and unclosed connections.
337+
338+
For example, to consume an Actor run log in a streaming fashion, you can use this snippet:
339+
340+
```python
from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

with apify_client.run('MY-RUN-ID').log().stream() as log_stream:
    if log_stream:
        for line in log_stream.iter_lines():
            print(line)
```
