Commit b8afba7
docs: Update Playwright home page code example (#1548)
- Set headless mode to True by default to improve performance when running via "Run on Apify", preventing slow execution and potential timeout errors.
- Align the Python example more closely with its JavaScript counterpart to keep the documentation consistent across languages.
- Add `asyncio.run(main())` so the snippet is fully copy-and-play ready.
- Also, a minor update regarding the docs code examples (same as in apify/apify-sdk-python#673).
1 parent cb7e609

File tree

3 files changed (+16, -13 lines)

.github/workflows/run_code_checks.yaml

Lines changed: 1 addition & 0 deletions
@@ -40,3 +40,4 @@ jobs:
   docs_check:
     name: Docs check
     uses: apify/workflows/.github/workflows/python_docs_check.yaml@main
+    secrets: inherit
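Note: a reusable workflow called with `uses:` does not automatically receive the caller's secrets, so `secrets: inherit` forwards the repository's secrets to the docs check, presumably needed now that the Makefile below no longer supplies a placeholder Apify token.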

Makefile

Lines changed: 1 addition & 4 deletions
@@ -4,9 +4,6 @@
 # This is default for local testing, but GitHub workflows override it to a higher value in CI
 E2E_TESTS_CONCURRENCY = 1
 
-# Placeholder token; replace with a real one for local docs testing if needed
-APIFY_TOKEN = apify_api_token_placeholder
-
 clean:
 	rm -rf .mypy_cache .pytest_cache .ruff_cache build dist htmlcov .coverage
 
@@ -58,4 +55,4 @@ build-docs:
 	cd website && corepack enable && yarn && uv run yarn build
 
 run-docs: build-api-reference
-	export APIFY_SIGNING_TOKEN=$(APIFY_TOKEN) && cd website && corepack enable && yarn && uv run yarn start
+	cd website && corepack enable && yarn && uv run yarn start
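With the placeholder `APIFY_TOKEN` and the `APIFY_SIGNING_TOKEN` export removed, `make run-docs` now starts the website without any local token setup; in CI, a real token can instead flow in through the `secrets: inherit` change above.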
Lines changed: 14 additions & 9 deletions
@@ -1,37 +1,42 @@
+import asyncio
+
 from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 
 
 async def main() -> None:
     crawler = PlaywrightCrawler(
         max_requests_per_crawl=10,  # Limit the max requests per crawl.
-        headless=False,  # Show the browser window.
-        browser_type='firefox',  # Use the Firefox browser.
+        headless=True,  # Run in headless mode (set to False to see the browser).
+        browser_type='firefox',  # Use Firefox browser.
     )
 
     # Define the default request handler, which will be called for every request.
     @crawler.router.default_handler
     async def request_handler(context: PlaywrightCrawlingContext) -> None:
         context.log.info(f'Processing {context.request.url} ...')
 
-        # Extract and enqueue all links found on the page.
-        await context.enqueue_links()
-
         # Extract data from the page using Playwright API.
         data = {
             'url': context.request.url,
             'title': await context.page.title(),
-            'content': (await context.page.content())[:100],
         }
 
         # Push the extracted data to the default dataset.
         await context.push_data(data)
 
+        # Extract all links on the page and enqueue them.
+        await context.enqueue_links()
+
     # Run the crawler with the initial list of URLs.
     await crawler.run(['https://crawlee.dev'])
 
-    # Export the entire dataset to a JSON file.
-    await crawler.export_data('results.json')
+    # Export the entire dataset to a CSV file.
+    await crawler.export_data('results.csv')
 
-    # Or work with the data directly.
+    # Or access the data directly.
     data = await crawler.get_data()
     crawler.log.info(f'Extracted data: {data.items}')
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
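Assembled from the hunks above, the example reads as follows after this commit (the file's path is not shown in this view, so treat this standalone listing as a reconstruction of the post-commit state):

import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=10,  # Limit the max requests per crawl.
        headless=True,  # Run in headless mode (set to False to see the browser).
        browser_type='firefox',  # Use Firefox browser.
    )

    # Define the default request handler, which will be called for every request.
    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        # Extract data from the page using Playwright API.
        data = {
            'url': context.request.url,
            'title': await context.page.title(),
        }

        # Push the extracted data to the default dataset.
        await context.push_data(data)

        # Extract all links on the page and enqueue them.
        await context.enqueue_links()

    # Run the crawler with the initial list of URLs.
    await crawler.run(['https://crawlee.dev'])

    # Export the entire dataset to a CSV file.
    await crawler.export_data('results.csv')

    # Or access the data directly.
    data = await crawler.get_data()
    crawler.log.info(f'Extracted data: {data.items}')


if __name__ == '__main__':
    asyncio.run(main())

To run it locally you will typically need Crawlee with its Playwright extra installed (pip install 'crawlee[playwright]') and the Firefox browser fetched via playwright install firefox.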
