10 changes: 5 additions & 5 deletions CHANGELOG.md
@@ -81,7 +81,7 @@ All notable changes to this project will be documented in this file.

### 🚀 Features

- Add actor standby port ([#220](https://github.com/apify/apify-sdk-python/pull/220)) ([6d0d87d](https://github.com/apify/apify-sdk-python/commit/6d0d87dcaedaf42d8eeb7d23c56f6b102434cbcb)) by [@jirimoravcik](https://github.com/jirimoravcik)
- Add Actor standby port ([#220](https://github.com/apify/apify-sdk-python/pull/220)) ([6d0d87d](https://github.com/apify/apify-sdk-python/commit/6d0d87dcaedaf42d8eeb7d23c56f6b102434cbcb)) by [@jirimoravcik](https://github.com/jirimoravcik)


## [1.7.1](https://github.com/apify/apify-sdk-python/releases/tag/v1.7.1) (2024-05-23)
@@ -122,12 +122,12 @@ All notable changes to this project will be documented in this file.
- Add test for get_env and is_at_home ([#29](https://github.com/apify/apify-sdk-python/pull/29)) ([cc45afb](https://github.com/apify/apify-sdk-python/commit/cc45afbf848db3626054c599cb3a5a2972a48748)) by [@drobnikj](https://github.com/drobnikj)
- Updating pull request toolkit config [INTERNAL] ([387143c](https://github.com/apify/apify-sdk-python/commit/387143ccf2c32a99c95e9931e5649e558d35daeb)) by [@mtrunkat](https://github.com/mtrunkat)
- Add documentation for `StorageManager` and `StorageClientManager`, open_* methods in `Actor` ([#34](https://github.com/apify/apify-sdk-python/pull/34)) ([3f6b942](https://github.com/apify/apify-sdk-python/commit/3f6b9426dc03fea40d80af2e4c8f04ecf2620e8a)) by [@jirimoravcik](https://github.com/jirimoravcik)
- Add tests for actor lifecycle ([#35](https://github.com/apify/apify-sdk-python/pull/35)) ([4674728](https://github.com/apify/apify-sdk-python/commit/4674728905be5076283ff3795332866e8bef6ee8)) by [@drobnikj](https://github.com/drobnikj)
- Add tests for Actor lifecycle ([#35](https://github.com/apify/apify-sdk-python/pull/35)) ([4674728](https://github.com/apify/apify-sdk-python/commit/4674728905be5076283ff3795332866e8bef6ee8)) by [@drobnikj](https://github.com/drobnikj)
- Add docs for `Dataset`, `KeyValueStore`, and `RequestQueue` ([#37](https://github.com/apify/apify-sdk-python/pull/37)) ([174548e](https://github.com/apify/apify-sdk-python/commit/174548e952b47ee519d1a05c0821a2c42c2fddf6)) by [@jirimoravcik](https://github.com/jirimoravcik)
- Docs string for memory storage clients ([#31](https://github.com/apify/apify-sdk-python/pull/31)) ([8f55d46](https://github.com/apify/apify-sdk-python/commit/8f55d463394307b004193efc43b67b44d030f6de)) by [@drobnikj](https://github.com/drobnikj)
- Add test for storage actor methods ([#39](https://github.com/apify/apify-sdk-python/pull/39)) ([b89bbcf](https://github.com/apify/apify-sdk-python/commit/b89bbcfdcae4f436a68e92f1f60628aea1036dde)) by [@drobnikj](https://github.com/drobnikj)
- Add test for storage Actor methods ([#39](https://github.com/apify/apify-sdk-python/pull/39)) ([b89bbcf](https://github.com/apify/apify-sdk-python/commit/b89bbcfdcae4f436a68e92f1f60628aea1036dde)) by [@drobnikj](https://github.com/drobnikj)
- Various fixes and improvements ([#41](https://github.com/apify/apify-sdk-python/pull/41)) ([5bae238](https://github.com/apify/apify-sdk-python/commit/5bae238821b3b63c73d0cbadf4b478511cb045d2)) by [@jirimoravcik](https://github.com/jirimoravcik)
- Add the rest unit tests for actor ([#40](https://github.com/apify/apify-sdk-python/pull/40)) ([72d92ea](https://github.com/apify/apify-sdk-python/commit/72d92ea080670ceecc234c149058d2ebe763e3a8)) by [@drobnikj](https://github.com/drobnikj)
- Add the rest unit tests for Actor ([#40](https://github.com/apify/apify-sdk-python/pull/40)) ([72d92ea](https://github.com/apify/apify-sdk-python/commit/72d92ea080670ceecc234c149058d2ebe763e3a8)) by [@drobnikj](https://github.com/drobnikj)
- Decrypt input secrets if there are some ([#45](https://github.com/apify/apify-sdk-python/pull/45)) ([6eb1630](https://github.com/apify/apify-sdk-python/commit/6eb163077341218a3f9dcf566986d7464f6ab09e)) by [@drobnikj](https://github.com/drobnikj)
- Add a few integration tests ([#48](https://github.com/apify/apify-sdk-python/pull/48)) ([1843f48](https://github.com/apify/apify-sdk-python/commit/1843f48845e724e1c2682b8d09a6b5c48c57d9ec)) by [@drobnikj](https://github.com/drobnikj)
- Add integration tests for storages, proxy configuration ([#49](https://github.com/apify/apify-sdk-python/pull/49)) ([fd0566e](https://github.com/apify/apify-sdk-python/commit/fd0566ed3b8c85c7884f8bba3cf7394215fabed0)) by [@jirimoravcik](https://github.com/jirimoravcik)
@@ -139,4 +139,4 @@ All notable changes to this project will be documented in this file.
- Key error for storage name ([#28](https://github.com/apify/apify-sdk-python/pull/28)) ([83b30a9](https://github.com/apify/apify-sdk-python/commit/83b30a90df4d3b173302f1c6006b346091fced60)) by [@drobnikj](https://github.com/drobnikj)


<!-- generated by git-cliff -->
<!-- generated by git-cliff -->
16 changes: 7 additions & 9 deletions Makefile
@@ -1,8 +1,6 @@
.PHONY: clean install-dev build publish-to-pypi lint type-check unit-tests unit-tests-cov \
	integration-tests format check-code build-api-reference build-docs run-docs

DIRS_WITH_CODE = src tests

# This is default for local testing, but GitHub workflows override it to a higher value in CI
INTEGRATION_TESTS_CONCURRENCY = 1

@@ -22,11 +20,11 @@ publish-to-pypi:
	poetry publish --no-interaction -vv

lint:
	poetry run ruff format --check $(DIRS_WITH_CODE)
	poetry run ruff check $(DIRS_WITH_CODE)
	poetry run ruff format --check
	poetry run ruff check

type-check:
	poetry run mypy $(DIRS_WITH_CODE)
	poetry run mypy

unit-tests:
	poetry run pytest --numprocesses=auto --verbose --cov=src/apify tests/unit
@@ -38,8 +36,8 @@ integration-tests:
	poetry run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) --verbose tests/integration

format:
	poetry run ruff check --fix $(DIRS_WITH_CODE)
	poetry run ruff format $(DIRS_WITH_CODE)
	poetry run ruff check --fix
	poetry run ruff format

# The check-code target runs a series of checks equivalent to those performed by pre-commit hooks
# and the run_checks.yaml GitHub Actions workflow.
@@ -49,7 +47,7 @@ build-api-reference:
	cd website && poetry run ./build_api_reference.sh

build-docs:
	cd website && npm clean-install && poetry run npm run build
	cd website && poetry run npm clean-install && poetry run npm run build

run-docs: build-api-reference
	cd website && npm clean-install && poetry run npm run start
	cd website && poetry run npm clean-install && poetry run npm run start
8 changes: 5 additions & 3 deletions README.md
@@ -36,10 +36,11 @@ Below are few examples demonstrating how to use the Apify SDK with some web scra
This example illustrates how to integrate the Apify SDK with [HTTPX](https://www.python-httpx.org/) and [BeautifulSoup](https://pypi.org/project/beautifulsoup4/) to scrape data from web pages.

```python
from apify import Actor
from bs4 import BeautifulSoup
from httpx import AsyncClient

from apify import Actor


async def main() -> None:
    async with Actor:
@@ -84,8 +85,9 @@ async def main() -> None:
This example demonstrates how to use the Apify SDK alongside `PlaywrightCrawler` from [Crawlee](https://crawlee.dev/python) to perform web scraping.

```python
from apify import Actor, Request
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

from apify import Actor


async def main() -> None:
69 changes: 0 additions & 69 deletions docs/01-overview/01-introduction.mdx

This file was deleted.

52 changes: 0 additions & 52 deletions docs/01-overview/03-structure.mdx

This file was deleted.

59 changes: 59 additions & 0 deletions docs/01_overview/01_introduction.mdx
@@ -0,0 +1,59 @@
---
title: Introduction
sidebar_label: Introduction
---

import CodeBlock from '@theme/CodeBlock';

import IntroductionExample from '!!raw-loader!./code/01_introduction.py';

The Apify SDK for Python is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) using Python.

<CodeBlock className="language-python">
{IntroductionExample}
</CodeBlock>

## What are Actors?

Actors are serverless cloud programs capable of performing tasks in a web browser, similar to what a human can do. These tasks can range from simple operations, such as filling out forms or unsubscribing from services, to complex jobs like scraping and processing large numbers of web pages.

Actors can be executed locally or on the [Apify platform](https://docs.apify.com/platform/), which provides features for running them at scale, monitoring, scheduling, and even publishing and monetizing them.

If you're new to Apify, refer to the Apify platform documentation to learn [what Apify is](https://docs.apify.com/platform/about).

## Quick Start

This section provides a quick start guide for creating and running Actors.

### Creating Actors

To create and run Actors using the Apify Console, see the [Console documentation](https://docs.apify.com/academy/getting-started/creating-actors#choose-your-template).

For creating and running Python Actors locally, refer to the documentation for [creating and running Python Actors locally](./running_locally).

### Guides

Integrate the Apify SDK with popular web scraping libraries by following these guides:
- [Requests or HTTPX](../guides/requests_and_httpx)
- [Beautiful Soup](../guides/beautiful_soup)
- [Playwright](../guides/playwright)
- [Selenium](../guides/selenium)
- [Scrapy](../guides/scrapy)

### Usage Concepts

For a deeper understanding of the Apify SDK's features, refer to the **Usage concepts** section in the sidebar. Key topics include:
- [Actor lifecycle](../concepts/actor-lifecycle)
- [Working with storages](../concepts/storages)
- [Handling Actor events](../concepts/actor-events)
- [Using proxies](../concepts/proxy-management)

## Installing the Apify SDK Separately

When creating an Actor using the Apify CLI, the Apify SDK for Python is installed automatically. If you want to install it independently, use the following command:

```bash
pip install apify
```

If your goal is not to develop Apify Actors but to interact with the Apify API from Python, consider using the [Apify API client for Python](https://docs.apify.com/api/client/python) directly.
@@ -1,45 +1,40 @@
---
title: Running Python Actors locally
title: Running Actors locally
sidebar_label: Running Actors locally
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';

On this page, you'll learn how to create and run Apify Actors locally on your computer.

## Requirements

The Apify SDK requires Python version 3.8 or above to run Python actors locally.
The Apify SDK requires Python version 3.9 or above to run Python Actors locally.
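
As a quick sanity check before running an Actor locally, the version requirement above can be verified from Python itself. The following is a minimal sketch using only the standard library; the helper name `meets_python_requirement` and the `MINIMUM_PYTHON` constant are illustrative assumptions and are not part of the Apify SDK.

```python
import sys

# Minimum version taken from the requirement above (illustrative constant, not an SDK name).
MINIMUM_PYTHON = (3, 9)


def meets_python_requirement(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True when the (major, minor) pair is at least the required minimum."""
    return tuple(version_info[:2]) >= minimum
```

Calling `meets_python_requirement()` with no arguments checks the interpreter that is currently running; passing a tuple such as `(3, 8, 18)` lets you check another version explicitly.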

## Creating your first Actor

To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli),
and select one of the [Python Actor templates](https://apify.com/templates?category=python).
To create a new Apify Actor on your computer, you can use the [Apify CLI](https://docs.apify.com/cli), and select one of the [Python Actor templates](https://apify.com/templates/categories/python).

For example, to create an Actor from the "[beta] Python SDK" template,
you can use the [`apify create` command](https://docs.apify.com/cli/docs/reference#apify-create-actorname).
For example, to create an Actor from the Python SDK template, you can use the [`apify create`](https://docs.apify.com/cli/docs/reference#apify-create-actorname) command.

```bash
apify create my-first-actor --template python-start
```

This will create a new folder called `my-first-actor`,
download and extract the "Getting started with Python" Actor template there,
create a virtual environment in `my-first-actor/.venv`,
and install the Actor dependencies in it.
This will create a new folder called `my-first-actor`, download and extract the "Getting started with Python" Actor template there, create a virtual environment in `my-first-actor/.venv`, and install the Actor dependencies in it.

## Running the Actor

To run the Actor, you can use the [`apify run` command](https://docs.apify.com/cli/docs/reference#apify-run):
To run the Actor, you can use the [`apify run`](https://docs.apify.com/cli/docs/reference#apify-run) command:

```bash
cd my-first-actor
apify run
```

This will activate the virtual environment in `.venv` (if no other virtual environment is activated yet),
then start the Actor, passing the right environment variables for local running,
and configure it to use local storages from the `storage` folder.
This will activate the virtual environment in `.venv` (if no other virtual environment is activated yet), then start the Actor, passing the right environment variables for local running, and configure it to use local storages from the `storage` folder.

The Actor input, for example, will be in `storage/key_value_stores/default/INPUT.json`.
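
If you prefer to prepare that input file from a script rather than by hand, something like the following could work. It is a minimal sketch that assumes the default local storage layout described above; the helper `write_local_input` is a hypothetical name, not an Apify CLI or SDK function, and the `url` field in the usage note is only an example of what an input might contain.

```python
import json
from pathlib import Path


def write_local_input(actor_dir, actor_input):
    """Write `actor_input` as JSON into the default key-value store of a local Actor."""
    # Path layout follows the location mentioned in the text above.
    input_path = Path(actor_dir) / 'storage' / 'key_value_stores' / 'default' / 'INPUT.json'
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(json.dumps(actor_input, indent=2), encoding='utf-8')
    return input_path
```

For instance, `write_local_input('my-first-actor', {'url': 'https://apify.com'})` would create the file inside the Actor folder, after which `apify run` should pick it up as the Actor input.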

35 changes: 35 additions & 0 deletions docs/01_overview/03_actor_structure.mdx
@@ -0,0 +1,35 @@
---
title: Actor structure
sidebar_label: Actor structure
---

import CodeBlock from '@theme/CodeBlock';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

import MainExample from '!!raw-loader!./code/actor_structure/main.py';
import UnderscoreMainExample from '!!raw-loader!./code/actor_structure/__main__.py';

All Python Actor templates follow the same structure.

The `.actor/` directory contains the [Actor configuration](https://docs.apify.com/platform/actors/development/actor-config), such as the Actor's definition and input schema, and the Dockerfile necessary to run the Actor on the Apify platform.

The Actor's runtime dependencies are specified in the `requirements.txt` file,
which follows the [standard requirements file format](https://pip.pypa.io/en/stable/reference/requirements-file-format/).

The Actor's source code is in the `src/` folder. This folder contains two important files: `main.py`, which contains the main function of the Actor, and `__main__.py`, which is the entrypoint of the Actor package, setting up the Actor [logger](../concepts/logging) and executing the Actor's main function via [`asyncio.run()`](https://docs.python.org/3/library/asyncio-runner.html#asyncio.run).

<Tabs>
<TabItem value="main.py" label="main.py" default>
<CodeBlock className="language-python">
{MainExample}
</CodeBlock>
</TabItem>
<TabItem value="__main__.py" label="__main__.py">
<CodeBlock className="language-python">
{UnderscoreMainExample}
</CodeBlock>
</TabItem>
</Tabs>

If you want to modify the Actor structure, you need to make sure that your Actor is executable as a module, via `python -m src`, as that is the command started by `apify run` in the Apify CLI. We recommend keeping the entrypoint for the Actor in the `src/__main__.py` file.
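
The entrypoint described above can be sketched as follows. This is a self-contained illustration rather than the template's actual file: it uses only the standard library, the logging setup and the `actor_sketch` logger name are assumptions (the real templates may configure the Actor logger differently), and `main()` is a placeholder for the coroutine normally imported from `src/main.py`.

```python
import asyncio
import logging

# Hypothetical logger name used only in this sketch.
LOGGER_NAME = 'actor_sketch'


async def main() -> None:
    """Placeholder for the Actor's main coroutine (normally `from .main import main`)."""
    logging.getLogger(LOGGER_NAME).info('Actor main function is running')


def configure_logging(level=logging.INFO) -> logging.Logger:
    """Attach a single stream handler to the sketch logger and return it."""
    logger = logging.getLogger(LOGGER_NAME)
    if not logger.handlers:  # Avoid adding duplicate handlers on repeated calls.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


if __name__ == '__main__':
    configure_logging()
    asyncio.run(main())
```

Placed in `src/__main__.py`, a file shaped like this keeps the package runnable with `python -m src`, which is the property the paragraph above asks you to preserve.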
14 changes: 14 additions & 0 deletions docs/01_overview/code/01_introduction.py
@@ -0,0 +1,14 @@
import httpx
from bs4 import BeautifulSoup

from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input()
        async with httpx.AsyncClient() as client:
            response = await client.get(actor_input['url'])
            soup = BeautifulSoup(response.content, 'html.parser')
            data = {'url': actor_input['url'], 'title': soup.title.string if soup.title else None}
            await Actor.push_data(data)
Empty file.
6 changes: 6 additions & 0 deletions docs/01_overview/code/actor_structure/__main__.py
@@ -0,0 +1,6 @@
import asyncio

from .main import main

if __name__ == '__main__':
    asyncio.run(main())