Commit cd8b05e

docs: Move code samples to files and other updates (#378)
### Description

- Relocated all code samples to dedicated files so that static analysis can be run on them.
- Updated the home page example and moved it to a dedicated file.
- Updated the `pyproject.toml` configuration to include only directories with source files (`website/` is added there, same as in Crawlee).
- Revised most of the documentation content.
- Updated legacy examples (new imports, types, ...).
- Added a new Crawlee guide.
- Fixed the Crawlee Playwright example in the README.
- The text in the Concepts section was left largely as it is, with only minor updates. Its code samples were moved to separate files and corrected where necessary, as noted above.
- Ensured all examples in the documentation are fully executable, meeting the prerequisites for the upcoming "Run on Apify" feature.
- Renamed the docs content files to use underscores, matching the convention in Crawlee.

### Issues

- Closes: #250

### Testing

- All code samples pass checks with Ruff and Mypy.

### Checklist

- [x] CI passed
1 parent 5d8274a commit cd8b05e

98 files changed (+2327, -1708 lines)

CHANGELOG.md

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -81,7 +81,7 @@ All notable changes to this project will be documented in this file.
8181

8282
### 🚀 Features
8383

84-
- Add actor standby port ([#220](https://github.com/apify/apify-sdk-python/pull/220)) ([6d0d87d](https://github.com/apify/apify-sdk-python/commit/6d0d87dcaedaf42d8eeb7d23c56f6b102434cbcb)) by [@jirimoravcik](https://github.com/jirimoravcik)
84+
- Add Actor standby port ([#220](https://github.com/apify/apify-sdk-python/pull/220)) ([6d0d87d](https://github.com/apify/apify-sdk-python/commit/6d0d87dcaedaf42d8eeb7d23c56f6b102434cbcb)) by [@jirimoravcik](https://github.com/jirimoravcik)
8585

8686

8787
## [1.7.1](https://github.com/apify/apify-sdk-python/releases/tag/v1.7.1) (2024-05-23)
@@ -122,12 +122,12 @@ All notable changes to this project will be documented in this file.
122122
- Add test for get_env and is_at_home ([#29](https://github.com/apify/apify-sdk-python/pull/29)) ([cc45afb](https://github.com/apify/apify-sdk-python/commit/cc45afbf848db3626054c599cb3a5a2972a48748)) by [@drobnikj](https://github.com/drobnikj)
123123
- Updating pull request toolkit config [INTERNAL] ([387143c](https://github.com/apify/apify-sdk-python/commit/387143ccf2c32a99c95e9931e5649e558d35daeb)) by [@mtrunkat](https://github.com/mtrunkat)
124124
- Add documentation for `StorageManager` and `StorageClientManager`, open_* methods in `Actor` ([#34](https://github.com/apify/apify-sdk-python/pull/34)) ([3f6b942](https://github.com/apify/apify-sdk-python/commit/3f6b9426dc03fea40d80af2e4c8f04ecf2620e8a)) by [@jirimoravcik](https://github.com/jirimoravcik)
125-
- Add tests for actor lifecycle ([#35](https://github.com/apify/apify-sdk-python/pull/35)) ([4674728](https://github.com/apify/apify-sdk-python/commit/4674728905be5076283ff3795332866e8bef6ee8)) by [@drobnikj](https://github.com/drobnikj)
125+
- Add tests for Actor lifecycle ([#35](https://github.com/apify/apify-sdk-python/pull/35)) ([4674728](https://github.com/apify/apify-sdk-python/commit/4674728905be5076283ff3795332866e8bef6ee8)) by [@drobnikj](https://github.com/drobnikj)
126126
- Add docs for `Dataset`, `KeyValueStore`, and `RequestQueue` ([#37](https://github.com/apify/apify-sdk-python/pull/37)) ([174548e](https://github.com/apify/apify-sdk-python/commit/174548e952b47ee519d1a05c0821a2c42c2fddf6)) by [@jirimoravcik](https://github.com/jirimoravcik)
127127
- Docs string for memory storage clients ([#31](https://github.com/apify/apify-sdk-python/pull/31)) ([8f55d46](https://github.com/apify/apify-sdk-python/commit/8f55d463394307b004193efc43b67b44d030f6de)) by [@drobnikj](https://github.com/drobnikj)
128-
- Add test for storage actor methods ([#39](https://github.com/apify/apify-sdk-python/pull/39)) ([b89bbcf](https://github.com/apify/apify-sdk-python/commit/b89bbcfdcae4f436a68e92f1f60628aea1036dde)) by [@drobnikj](https://github.com/drobnikj)
128+
- Add test for storage Actor methods ([#39](https://github.com/apify/apify-sdk-python/pull/39)) ([b89bbcf](https://github.com/apify/apify-sdk-python/commit/b89bbcfdcae4f436a68e92f1f60628aea1036dde)) by [@drobnikj](https://github.com/drobnikj)
129129
- Various fixes and improvements ([#41](https://github.com/apify/apify-sdk-python/pull/41)) ([5bae238](https://github.com/apify/apify-sdk-python/commit/5bae238821b3b63c73d0cbadf4b478511cb045d2)) by [@jirimoravcik](https://github.com/jirimoravcik)
130-
- Add the rest unit tests for actor ([#40](https://github.com/apify/apify-sdk-python/pull/40)) ([72d92ea](https://github.com/apify/apify-sdk-python/commit/72d92ea080670ceecc234c149058d2ebe763e3a8)) by [@drobnikj](https://github.com/drobnikj)
130+
- Add the rest unit tests for Actor ([#40](https://github.com/apify/apify-sdk-python/pull/40)) ([72d92ea](https://github.com/apify/apify-sdk-python/commit/72d92ea080670ceecc234c149058d2ebe763e3a8)) by [@drobnikj](https://github.com/drobnikj)
131131
- Decrypt input secrets if there are some ([#45](https://github.com/apify/apify-sdk-python/pull/45)) ([6eb1630](https://github.com/apify/apify-sdk-python/commit/6eb163077341218a3f9dcf566986d7464f6ab09e)) by [@drobnikj](https://github.com/drobnikj)
132132
- Add a few integration tests ([#48](https://github.com/apify/apify-sdk-python/pull/48)) ([1843f48](https://github.com/apify/apify-sdk-python/commit/1843f48845e724e1c2682b8d09a6b5c48c57d9ec)) by [@drobnikj](https://github.com/drobnikj)
133133
- Add integration tests for storages, proxy configuration ([#49](https://github.com/apify/apify-sdk-python/pull/49)) ([fd0566e](https://github.com/apify/apify-sdk-python/commit/fd0566ed3b8c85c7884f8bba3cf7394215fabed0)) by [@jirimoravcik](https://github.com/jirimoravcik)
@@ -139,4 +139,4 @@ All notable changes to this project will be documented in this file.
139139
- Key error for storage name ([#28](https://github.com/apify/apify-sdk-python/pull/28)) ([83b30a9](https://github.com/apify/apify-sdk-python/commit/83b30a90df4d3b173302f1c6006b346091fced60)) by [@drobnikj](https://github.com/drobnikj)
140140

141141

142-
<!-- generated by git-cliff -->
142+
<!-- generated by git-cliff -->

Makefile

Lines changed: 7 additions & 9 deletions
@@ -1,8 +1,6 @@
11
.PHONY: clean install-dev build publish-to-pypi lint type-check unit-tests unit-tests-cov \
22
integration-tests format check-code build-api-reference build-docs run-docs
33

4-
DIRS_WITH_CODE = src tests
5-
64
# This is default for local testing, but GitHub workflows override it to a higher value in CI
75
INTEGRATION_TESTS_CONCURRENCY = 1
86

@@ -22,11 +20,11 @@ publish-to-pypi:
2220
poetry publish --no-interaction -vv
2321

2422
lint:
25-
poetry run ruff format --check $(DIRS_WITH_CODE)
26-
poetry run ruff check $(DIRS_WITH_CODE)
23+
poetry run ruff format --check
24+
poetry run ruff check
2725

2826
type-check:
29-
poetry run mypy $(DIRS_WITH_CODE)
27+
poetry run mypy
3028

3129
unit-tests:
3230
poetry run pytest --numprocesses=auto --verbose --cov=src/apify tests/unit
@@ -38,8 +36,8 @@ integration-tests:
3836
poetry run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) --verbose tests/integration
3937

4038
format:
41-
poetry run ruff check --fix $(DIRS_WITH_CODE)
42-
poetry run ruff format $(DIRS_WITH_CODE)
39+
poetry run ruff check --fix
40+
poetry run ruff format
4341

4442
# The check-code target runs a series of checks equivalent to those performed by pre-commit hooks
4543
# and the run_checks.yaml GitHub Actions workflow.
@@ -49,7 +47,7 @@ build-api-reference:
4947
cd website && poetry run ./build_api_reference.sh
5048

5149
build-docs:
52-
cd website && npm clean-install && poetry run npm run build
50+
cd website && poetry run npm clean-install && poetry run npm run build
5351

5452
run-docs: build-api-reference
55-
cd website && npm clean-install && poetry run npm run start
53+
cd website && poetry run npm clean-install && poetry run npm run start

README.md

Lines changed: 5 additions & 3 deletions
@@ -36,10 +36,11 @@ Below are few examples demonstrating how to use the Apify SDK with some web scra
3636
This example illustrates how to integrate the Apify SDK with [HTTPX](https://www.python-httpx.org/) and [BeautifulSoup](https://pypi.org/project/beautifulsoup4/) to scrape data from web pages.
3737

3838
```python
39-
from apify import Actor
4039
from bs4 import BeautifulSoup
4140
from httpx import AsyncClient
4241

42+
from apify import Actor
43+
4344

4445
async def main() -> None:
4546
async with Actor:
@@ -84,8 +85,9 @@ async def main() -> None:
8485
This example demonstrates how to use the Apify SDK alongside `PlaywrightCrawler` from [Crawlee](https://crawlee.dev/python) to perform web scraping.
8586

8687
```python
87-
from apify import Actor, Request
88-
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
88+
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
89+
90+
from apify import Actor
8991

9092

9193
async def main() -> None:

docs/01-overview/01-introduction.mdx

Lines changed: 0 additions & 69 deletions
This file was deleted.

docs/01-overview/03-structure.mdx

Lines changed: 0 additions & 52 deletions
This file was deleted.
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
---
2+
title: Introduction
3+
sidebar_label: Introduction
4+
---
5+
6+
import CodeBlock from '@theme/CodeBlock';
7+
8+
import IntroductionExample from '!!raw-loader!./code/01_introduction.py';
9+
10+
The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) using Python.
11+
12+
<CodeBlock className="language-python">
13+
{IntroductionExample}
14+
</CodeBlock>
15+
16+
## What are Actors?
17+
18+
Actors are serverless cloud programs capable of performing tasks in a web browser, similar to what a human can do. These tasks can range from simple operations, such as filling out forms or unsubscribing from services, to complex jobs like scraping and processing large numbers of web pages.
19+
20+
Actors can be executed locally or on the [Apify platform](https://docs.apify.com/platform/), which provides features for running them at scale, monitoring, scheduling, and even publishing and monetizing them.
21+
22+
If you're new to Apify, refer to the Apify platform documentation to learn [what Apify is](https://docs.apify.com/platform/about).
23+
24+
## Quick Start
25+
26+
This section provides a quick start guide for creating and running Actors.
27+
28+
### Creating Actors
29+
30+
To create and run Actors using the Apify Console, see the [Console documentation](https://docs.apify.com/academy/getting-started/creating-actors#choose-your-template).
31+
32+
For creating and running Python Actors locally, refer to the documentation for [creating and running Python Actors locally](./running_locally).
33+
34+
### Guides
35+
36+
Integrate the Apify SDK with popular web scraping libraries by following these guides:
37+
- [Requests or HTTPX](../guides/requests_and_httpx)
38+
- [Beautiful Soup](../guides/beautiful_soup)
39+
- [Playwright](../guides/playwright)
40+
- [Selenium](../guides/selenium)
41+
- [Scrapy](../guides/scrapy)
42+
43+
### Usage Concepts
44+
45+
For a deeper understanding of the Apify SDK's features, refer to the **Usage concepts** section in the sidebar. Key topics include:
46+
- [Actor lifecycle](../concepts/actor-lifecycle)
47+
- [Working with storages](../concepts/storages)
48+
- [Handling Actor events](../concepts/actor-events)
49+
- [Using proxies](../concepts/proxy-management)
50+
51+
## Installing the Apify SDK Separately
52+
53+
When creating an Actor using the Apify CLI, the Apify SDK for Python is installed automatically. If you want to install it independently, use the following command:
54+
55+
```bash
56+
pip install apify
57+
```
58+
59+
If your goal is not to develop Apify Actors but to interact with the Apify API from Python, consider using the [Apify API client for Python](https://docs.apify.com/api/client/python) directly.

docs/01-overview/02-running-locally.mdx renamed to docs/01_overview/02_running_actors_locally.mdx

Lines changed: 9 additions & 14 deletions
@@ -1,45 +1,40 @@
11
---
2-
title: Running Python Actors locally
2+
title: Running Actor locally
33
sidebar_label: Running Actors locally
44
---
55

66
import Tabs from '@theme/Tabs';
77
import TabItem from '@theme/TabItem';
88
import CodeBlock from '@theme/CodeBlock';
99

10+
On this page, you'll learn how to create and run Apify Actors locally on your computer.
11+
1012
## Requirements
1113

12-
The Apify SDK requires Python version 3.8 or above to run Python actors locally.
14+
The Apify SDK requires Python version 3.9 or above to run Python Actors locally.
1315

1416
## Creating your first Actor
1517

16-
To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli),
17-
and select one of the [Python Actor templates](https://apify.com/templates?category=python).
18+
To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli), and select one of the [Python Actor templates](https://apify.com/templates/categories/python).
1819

19-
For example, to create an Actor from the "[beta] Python SDK" template,
20-
you can use the [`apify create` command](https://docs.apify.com/cli/docs/reference#apify-create-actorname).
20+
For example, to create an Actor from the Python SDK template, you can use the [`apify create`](https://docs.apify.com/cli/docs/reference#apify-create-actorname) command.
2121

2222
```bash
2323
apify create my-first-actor --template python-start
2424
```
2525

26-
This will create a new folder called `my-first-actor`,
27-
download and extract the "Getting started with Python" Actor template there,
28-
create a virtual environment in `my-first-actor/.venv`,
29-
and install the Actor dependencies in it.
26+
This will create a new folder called `my-first-actor`, download and extract the "Getting started with Python" Actor template there, create a virtual environment in `my-first-actor/.venv`, and install the Actor dependencies in it.
3027

3128
## Running the Actor
3229

33-
To run the Actor, you can use the [`apify run` command](https://docs.apify.com/cli/docs/reference#apify-run):
30+
To run the Actor, you can use the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command:
3431

3532
```bash
3633
cd my-first-actor
3734
apify run
3835
```
3936

40-
This will activate the virtual environment in `.venv` (if no other virtual environment is activated yet),
41-
then start the Actor, passing the right environment variables for local running,
42-
and configure it to use local storages from the `storage` folder.
37+
This will activate the virtual environment in `.venv` (if no other virtual environment is activated yet), then start the Actor, passing the right environment variables for local running, and configure it to use local storages from the `storage` folder.
4338

4439
The Actor input, for example, will be in `storage/key_value_stores/default/INPUT.json`.
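
The local storage layout described above can be prepared from a script before running `apify run`. The following is a minimal sketch using only the standard library. The helper name `write_local_input` and the `url` input field are assumptions made for illustration; only the `storage/key_value_stores/default/INPUT.json` path comes from the text above.

```python
import json
from pathlib import Path


def write_local_input(actor_dir: Path, actor_input: dict) -> Path:
    """Write the Actor input to the file a local run reads it from.

    `write_local_input` is a hypothetical helper, not part of the Apify SDK or CLI.
    """
    # Relative path taken from the documentation text above.
    input_path = actor_dir / 'storage' / 'key_value_stores' / 'default' / 'INPUT.json'
    # Create the storage folders if the Actor has not been run yet.
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(json.dumps(actor_input, indent=2), encoding='utf-8')
    return input_path
```

For instance, calling `write_local_input(Path('my-first-actor'), {'url': 'https://example.com'})` would place the input file inside the folder created by `apify create` above, assuming the Actor reads a `url` field from its input.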
4540

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
---
2+
title: Actor structure
3+
sidebar_label: Actor structure
4+
---
5+
6+
import CodeBlock from '@theme/CodeBlock';
7+
import Tabs from '@theme/Tabs';
8+
import TabItem from '@theme/TabItem';
9+
10+
import MainExample from '!!raw-loader!./code/actor_structure/main.py';
11+
import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';
12+
13+
All Python Actor templates follow the same structure.
14+
15+
The `.actor/` directory contains the [Actor configuration](https://docs.apify.com/platform/actors/development/actor-config), such as the Actor's definition and input schema, and the Dockerfile necessary to run the Actor on the Apify platform.
16+
17+
The Actor's runtime dependencies are specified in the `requirements.txt` file,
18+
which follows the [standard requirements file format](https://pip.pypa.io/en/stable/reference/requirements-file-format/).
19+
20+
The Actor's source code is in the `src/` folder. This folder contains two important files: `main.py`, which contains the main function of the Actor, and `__main__.py`, which is the entrypoint of the Actor package, setting up the Actor [logger](../concepts/logging) and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).
21+
22+
<Tabs>
23+
<TabItem value="main.py" label="main.py" default>
24+
<CodeBlock className="language-python">
25+
{MainExample}
26+
</CodeBlock>
27+
</TabItem>
28+
<TabItem value="__main__.py" label="__main__.py">
29+
<CodeBlock className="language-python">
30+
{UnderscoreMainExample}
31+
</CodeBlock>
32+
</TabItem>
33+
</Tabs>
34+
35+
If you want to modify the Actor structure, you need to make sure that your Actor is executable as a module, via `python -m src`, as that is the command started by `apify run` in the Apify CLI. We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
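
The entrypoint pattern described above (configure logging, then hand the main coroutine to `asyncio.run()`) can be sketched in a single file as follows. This is a simplified illustration rather than the template's actual code: it assumes the standard `logging` module in place of the SDK's logger setup, and the logger name `my_actor`, the `run()` helper, and the body of `main()` are placeholders. In the templates, `main()` lives in `src/main.py` and is imported by `src/__main__.py`.

```python
import asyncio
import logging

# Hypothetical logger name; the templates configure the Actor logger instead.
logger = logging.getLogger('my_actor')


async def main() -> str:
    """Placeholder for the Actor's main coroutine."""
    logger.info('Actor main function started')
    await asyncio.sleep(0)  # stands in for real asynchronous work
    return 'done'


def run() -> str:
    """Outline of the entrypoint: set up logging, then run main() to completion."""
    logging.basicConfig(level=logging.INFO)
    return asyncio.run(main())


if __name__ == '__main__':
    run()
```

Keeping this logic in `src/__main__.py` is what makes the package runnable with `python -m src`, since Python executes a package's `__main__.py` when the package is run as a module.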
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
import httpx
2+
from bs4 import BeautifulSoup
3+
4+
from apify import Actor
5+
6+
7+
async def main() -> None:
8+
async with Actor:
9+
actor_input = await Actor.get_input()
10+
async with httpx.AsyncClient() as client:
11+
response = await client.get(actor_input['url'])
12+
soup = BeautifulSoup(response.content, 'html.parser')
13+
data = {'url': actor_input['url'], 'title': soup.title.string if soup.title else None}
14+
await Actor.push_data(data)

docs/01_overview/code/actor_structure/__init__.py

Whitespace-only changes.
