Commit 55dd033

chore: Add Python Client examples [docs] (#191)
Part of apify/apify-web/issues/3616. Added multiple examples, consulted with @vdusek. The first three are the same as the [JavaScript ones](apify/apify-client-js#548), reimplemented in Python. The last one uses the Python-specific Pandas library for data analysis.
1 parent 17d5cf7 commit 55dd033

2 files changed

+144
-0
lines changed

docs/examples.md

Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
---
sidebar_label: Examples
title: 'Code examples'
---

## Passing an input to the Actor

The fastest way to get results from an Actor is to pass the input directly to the `call` function.
We can set up the input, pass it to the `call` function, and get back a reference to the Actor run once it finishes.

```python
from apify_client import ApifyClient

# Client initialization with the API token
apify_client = ApifyClient(token='MY_APIFY_TOKEN')

actor_client = apify_client.actor('apify/instagram-hashtag-scraper')

input_data = {'hashtags': ['rainbow'], 'resultsLimit': 20}

# Run the Actor and wait for it to finish; timeout_secs caps the run at 60 seconds.
# Input is not persisted for next runs.
run_data = actor_client.call(run_input=input_data, timeout_secs=60)
```
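
As a minimal sketch of what might surround the call above, the snippet below reads the token from an environment variable instead of hard-coding it and summarizes the returned run. The `APIFY_TOKEN` variable name and the helpers `build_hashtag_input`, `describe_run`, and `run_hashtag_scraper` are assumptions made for illustration; `id`, `status`, and `defaultDatasetId` are fields of the run object returned by the Apify API.

```python
import os
from typing import Optional


def build_hashtag_input(hashtags: list[str], results_limit: int = 20) -> dict:
    """Build the input dict used in the example above (hypothetical helper)."""
    return {'hashtags': list(hashtags), 'resultsLimit': results_limit}


def describe_run(run: Optional[dict]) -> str:
    """Summarize a run object; handles the case where no run data came back."""
    if not run:
        return 'no run data returned'
    return (
        f"run {run.get('id')} finished with status {run.get('status')}, "
        f"dataset {run.get('defaultDatasetId')}"
    )


def run_hashtag_scraper(hashtags: list[str]) -> Optional[dict]:
    """Run the Actor with a token taken from the environment (assumed variable name)."""
    # Imported lazily so the pure helpers above work without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token=os.environ['APIFY_TOKEN'])
    actor_client = client.actor('apify/instagram-hashtag-scraper')
    return actor_client.call(run_input=build_hashtag_input(hashtags), timeout_secs=60)
```

With a valid token exported, `print(describe_run(run_hashtag_scraper(['rainbow'])))` would start the run and print its summary.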

## Manipulating tasks

To run the same Actor with multiple inputs, the most convenient way is to create multiple [tasks](https://docs.apify.com/platform/actors/running/tasks), each with a different input.
The task input is persisted on the Apify platform when the task is created.

```python
import asyncio

from apify_client import ApifyClientAsync
from apify_client.clients.resource_clients import TaskClientAsync

animal_hashtags = ['zebra', 'lion', 'hippo']


async def run_apify_task(client: TaskClientAsync) -> dict:
    result = await client.call()
    return result or {}


async def main() -> None:
    apify_client = ApifyClientAsync(token='MY_APIFY_TOKEN')

    # Create Apify tasks

    apify_tasks: list[dict] = []
    apify_tasks_client = apify_client.tasks()

    for hashtag in animal_hashtags:
        apify_task = await apify_tasks_client.create(
            name=f'hashtags-{hashtag}',
            actor_id='apify/instagram-hashtag-scraper',
            task_input={'hashtags': [hashtag], 'resultsLimit': 20},
            memory_mbytes=1024,
        )
        apify_tasks.append(apify_task)

    print('Tasks created:', apify_tasks)

    # Create Apify task clients

    apify_task_clients: list[TaskClientAsync] = []

    for apify_task in apify_tasks:
        task_id = apify_task['id']
        apify_task_client = apify_client.task(task_id)
        apify_task_clients.append(apify_task_client)

    print('Task clients created:', apify_task_clients)

    # Execute Apify tasks

    run_apify_tasks = [run_apify_task(client) for client in apify_task_clients]
    task_run_results = await asyncio.gather(*run_apify_tasks)

    print('Task results:', task_run_results)


if __name__ == '__main__':
    asyncio.run(main())
```
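
`asyncio.gather` in the example above starts all task runs at once. With many tasks it may be preferable to cap how many run concurrently; the following is a sketch of one way to do that with `asyncio.Semaphore`. The helper `gather_with_limit` and the stand-in coroutine `fake_task_call` are hypothetical and only illustrate the pattern; in the example above, the coroutines passed in would be the `run_apify_task(client)` calls.

```python
import asyncio
from typing import Awaitable, Iterable


async def gather_with_limit(coros: Iterable[Awaitable[dict]], limit: int) -> list[dict]:
    """Await all coroutines, letting at most `limit` of them run at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro: Awaitable[dict]) -> dict:
        # Each coroutine waits here until a slot is free.
        async with semaphore:
            return await coro

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(coro) for coro in coros))


async def fake_task_call(hashtag: str) -> dict:
    """Stand-in for `run_apify_task(client)`; performs no network access."""
    await asyncio.sleep(0)
    return {'hashtag': hashtag, 'status': 'SUCCEEDED'}


results = asyncio.run(
    gather_with_limit([fake_task_call(h) for h in ['zebra', 'lion', 'hippo']], limit=2)
)
print(results)
```

The limit of 2 is arbitrary; a suitable value depends on the account's concurrency and memory limits.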

## Getting latest data from an Actor, joining datasets

Actor output is stored in [datasets](https://docs.apify.com/platform/storage/dataset), which can be retrieved from Actor runs.
Dataset items can be listed with pagination.
Datasets can also be merged into a single dataset for further analysis, since a dataset can be exported to various formats (CSV, JSON, XLSX, XML).
[Integrations](https://docs.apify.com/platform/integrations) can do the trick as well.

```python
from apify_client import ApifyClient

# Client initialization with the API token
apify_client = ApifyClient(token='MY_APIFY_TOKEN')

actor_client = apify_client.actor('apify/instagram-hashtag-scraper')

actor_runs = actor_client.runs()

# List up to 20 runs of the Actor; see pagination to understand how to get more
recent_runs = actor_runs.list(limit=20)

merging_dataset = apify_client.datasets().get_or_create(name='merge-dataset')

for run in recent_runs.items:
    # Each run stores its results in its default dataset. Dataset items can be paginated
    dataset_items = apify_client.dataset(dataset_id=run['defaultDatasetId']).list_items(limit=1000)

    # Items can be pushed to a single dataset
    apify_client.dataset(merging_dataset['id']).push_items(dataset_items.items)

    # ...
```

## Integration with data analysis libraries (Pandas)

The Apify API client for Python can be easily integrated with data analysis libraries.
The following example demonstrates how to load items from the dataset of the last Actor run and pass them to a Pandas DataFrame for further analysis.
Pandas is a data analysis library that provides data structures and functions to efficiently manipulate large datasets.

```python
from apify_client import ApifyClient
import pandas

# Initialize the Apify client
client = ApifyClient(token='MY_APIFY_TOKEN')

# Load items from the dataset of the last Actor run
dataset_data = client.actor('apify/web-scraper').last_run().dataset().list_items()

# Pass dataset items to Pandas DataFrame
data_frame = pandas.DataFrame(dataset_data.items)

# Print a concise summary of the DataFrame (info() prints its output directly)
data_frame.info()
```
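
Dataset items are often nested dictionaries, which end up as single object-typed columns in a DataFrame. A minimal sketch of one way to handle this is to flatten each item into dotted keys before building the frame. The helper `flatten_item` and the sample record are assumptions for illustration, not part of the client; `to_data_frame` imports Pandas lazily so the flattening helper can be used on its own.

```python
from typing import Any


def flatten_item(item: dict, parent_key: str = '', sep: str = '.') -> dict:
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in item.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their flattened keys.
            flat.update(flatten_item(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_data_frame(items: list[dict]):
    """Build a DataFrame from flattened items (requires Pandas to be installed)."""
    import pandas

    return pandas.DataFrame([flatten_item(item) for item in items])


# Hypothetical record, not real scraper output.
sample = {'url': 'https://example.com', 'owner': {'name': 'alice', 'stats': {'posts': 3}}}
print(flatten_item(sample))
```

Pandas also provides `pandas.json_normalize`, which serves a similar purpose and may be preferable when no custom handling is needed.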

website/sidebars.js

Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,10 @@ module.exports = {
         type: 'doc',
         id: 'usage-concepts',
     },
+    {
+        type: 'doc',
+        id: 'examples',
+    },
     {
         type: 'doc',
         id: 'changelog',
