Releases: apify/crawlee-python

0.3.3 (2024-09-05)

🐛 Bug Fixes

0.3.2 (2024-09-04)

🐛 Bug Fixes

0.3.1 (2024-08-30)

🚀 Features

0.3.0 (2024-08-27)

🚀 Features

🐛 Bug Fixes

Refactor

0.2.1 (2024-08-05)

🐛 Bug Fixes

  • Do not import curl impersonate in http clients init (#396) (3bb8009)
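This fix defers the import of an optional dependency so the HTTP clients package can be imported even when the extra is not installed. A minimal stdlib sketch of the general pattern (the helper name is hypothetical, not Crawlee's code):

```python
import importlib

def load_optional(name: str):
    """Import a module if it is installed, otherwise return None.

    Deferring the import like this keeps a package's __init__ usable
    even when an optional extra (e.g. a curl-impersonate client) is
    missing from the environment.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

Callers then check for `None` (or raise a descriptive error) only when the optional client is actually requested, instead of failing at package import time.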

0.2.0 (2024-08-05)

🚀 Features

0.1.2 (2024-07-30)

🚀 Features

🐛 Bug Fixes

  • Minor log fix (#341) (0688bf1)
  • Also use error_handler for context pipeline errors (#331) (7a66445)
  • Strip whitespace from href in enqueue_links (#346) (8a3174a)
  • Warn instead of crashing when an empty dataset is being exported (#342) (22b95d1)
  • Avoid Github rate limiting in project bootstrapping test (#364) (992f07f)
  • Pass crawler configuration to storages (#375) (b2d3a52)
  • Purge request queue on repeated crawler runs (#377) (7ad3d69)
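The enqueue_links fix above (#346) strips stray whitespace from extracted hrefs before they are enqueued. A stdlib-only sketch of that kind of normalization, not Crawlee's actual implementation:

```python
from urllib.parse import urljoin

def normalize_href(base_url: str, href: str) -> str:
    """Strip surrounding whitespace from an extracted href and resolve
    it against the page URL before it is added to the request queue."""
    return urljoin(base_url, href.strip())
```

Without the `strip()`, an href such as `" /b \n"` would produce a malformed absolute URL and a spurious duplicate request.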

0.1.1 (2024-07-19)

Features

  • Support for proxy configuration in PlaywrightCrawler.
  • Blocking detection in PlaywrightCrawler.
  • Expose crawler.log to public.

Bug fixes

  • Fix Pylance reportPrivateImportUsage errors by defining __all__ in modules' __init__.py files.
  • Set HTTPX logging level to WARNING by default.
  • Fix CLI behavior with existing project folders.
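The HTTPX logging change corresponds to the standard library's logger-configuration pattern; a sketch of the idea, not the exact Crawlee code:

```python
import logging

# Silence HTTPX's per-request INFO logs; only warnings and above get through.
logging.getLogger("httpx").setLevel(logging.WARNING)
```

This keeps crawler output readable, since HTTPX otherwise emits one INFO line per request.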

0.1.0 (2024-07-09)

Features

Why is Crawlee the preferred choice for web scraping and crawling?

Why use Crawlee instead of just a random HTTP library with an HTML parser?

  • Unified interface for HTTP & headless browser crawling.
  • Automatic parallel crawling based on available system resources.
  • Written in Python with type hints - enhances DX (IDE autocompletion) and reduces bugs (static type checking).
  • Automatic retries on errors or when you’re getting blocked.
  • Integrated proxy rotation and session management.
  • Configurable request routing - direct URLs to the appropriate handlers.
  • Persistent queue for URLs to crawl.
  • Pluggable storage of both tabular data and files.
  • Robust error handling.
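The request-routing point above means URLs can be dispatched to different handlers by pattern. A stdlib sketch of the idea (the class and names are hypothetical, not Crawlee's router API):

```python
import fnmatch

class Router:
    """Map URL glob patterns to handler functions; first match wins."""

    def __init__(self):
        self._routes = []  # list of (pattern, handler) pairs

    def route(self, pattern):
        """Decorator that registers a handler for a URL pattern."""
        def register(handler):
            self._routes.append((pattern, handler))
            return handler
        return register

    def dispatch(self, url):
        for pattern, handler in self._routes:
            if fnmatch.fnmatch(url, pattern):
                return handler(url)
        raise LookupError(f"no handler for {url}")

router = Router()

@router.route("*/product/*")
def product(url):
    return ("product", url)

@router.route("*")
def default(url):
    return ("default", url)
```

Registration order matters here: the catch-all `"*"` route is declared last so specific patterns are tried first.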

Why use Crawlee rather than Scrapy?

  • Crawlee has out-of-the-box support for headless browser crawling (Playwright).
  • Crawlee has a minimalistic & elegant interface - set up your scraper with fewer than 10 lines of code.
  • Complete type hint coverage.
  • Based on the standard asyncio library.
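Being based on standard asyncio means a crawler is just ordinary coroutines scheduled with asyncio primitives. A stdlib-only sketch of the concurrent-worker pattern involved (no Crawlee APIs; `fetch` is a caller-supplied stub):

```python
import asyncio

async def crawl(start_urls, fetch, max_workers=3):
    """Drain a queue of URLs with a small pool of concurrent workers."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in start_urls:
        queue.put_nowait(url)
    results = []

    async def worker():
        while True:
            url = await queue.get()
            try:
                results.append(await fetch(url))
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    await queue.join()  # block until every queued URL has been processed
    for w in workers:
        w.cancel()
    return results
```

A real `fetch` would issue an HTTP request; for the pattern itself, any coroutine that accepts a URL works.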

0.0.7 (2024-06-27)

Fixes

  • Fix selector handling for RETRY_CSS_SELECTORS in _handle_blocked_request in BeautifulSoupCrawler.
  • Fix selector handling in enqueue_links in BeautifulSoupCrawler.
  • Improve AutoscaledPool state management.