
Commit 0ba04d1

refactor!: Update the crawlers & storage clients structure (#828)
## Description

Update the directory structure of crawlers and storage clients, as discussed earlier on Slack. I decided not to export anything at the second level, because of the extras and because it would also be quite large (taking into account that the models live there too).

For example, for the BeautifulSoup crawler:

```diff
- from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+ from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
```

Or for the memory storage client:

```diff
- from crawlee.memory_storage_client import MemoryStorageClient
+ from crawlee.storage_clients import MemoryStorageClient
```

This should align more closely with Crawlee's concepts. It is, of course, quite a breaking change, but it is better to make it now than later.

This will not be applied to the JS version, because there, sub-packages such as `PlaywrightCrawler` are separate packages.

## Issue

- Closes: #764

## Breaking changes

### Crawlers & CrawlingContexts

- All crawler and crawling context classes have been consolidated into a single sub-package called `crawlers`.
- The affected classes include: `AbstractHttpCrawler`, `AbstractHttpParser`, `BasicCrawler`, `BasicCrawlerOptions`, `BasicCrawlingContext`, `BeautifulSoupCrawler`, `BeautifulSoupCrawlingContext`, `BeautifulSoupParserType`, `ContextPipeline`, `HttpCrawler`, `HttpCrawlerOptions`, `HttpCrawlingContext`, `HttpCrawlingResult`, `ParsedHttpCrawlingContext`, `ParselCrawler`, `ParselCrawlingContext`, `PlaywrightCrawler`, `PlaywrightCrawlingContext`, `PlaywrightPreNavCrawlingContext`.

Example update:

```diff
- from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+ from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
```

### Storage clients

- All storage client classes have been moved into a single sub-package called `storage_clients`.
- The affected classes include: `MemoryStorageClient`, `BaseStorageClient`.

Example update:

```diff
- from crawlee.memory_storage_client import MemoryStorageClient
+ from crawlee.storage_clients import MemoryStorageClient
```

### CurlImpersonateHttpClient

- `CurlImpersonateHttpClient` has changed its import location.

Example update:

```diff
- from crawlee.http_clients.curl_impersonate import CurlImpersonateHttpClient
+ from crawlee.http_clients import CurlImpersonateHttpClient
```
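The renames listed above are mechanical enough to script when many files need updating. The following is a minimal sketch and is not part of this commit or of Crawlee. `migrate_imports` and `MODULE_MOVES` are hypothetical names, and the mapping covers only the old paths named in this message. The sketch uses a regular expression to rewrite `from <old module> import ...` statements. It leaves plain `import crawlee.xyz` statements, and imports of every other module, untouched.

```python
import re

# Old-to-new module paths, taken from the examples in this commit message.
# Hypothetical mapping: only these paths are rewritten; anything else is kept.
MODULE_MOVES = {
    "crawlee.basic_crawler": "crawlee.crawlers",
    "crawlee.beautifulsoup_crawler": "crawlee.crawlers",
    "crawlee.playwright_crawler": "crawlee.crawlers",
    "crawlee.memory_storage_client": "crawlee.storage_clients",
    "crawlee.http_clients.curl_impersonate": "crawlee.http_clients",
}

# Matches "from <dotted.module> import" at the start of a line, keeping indentation.
_FROM_IMPORT = re.compile(
    r"^(?P<head>[ \t]*from[ \t]+)(?P<module>[\w.]+)(?P<tail>[ \t]+import\b)",
    re.MULTILINE,
)


def migrate_imports(source: str) -> str:
    """Rewrite `from <old module> import ...` statements to the new module paths."""

    def _replace(match: re.Match) -> str:
        module = match.group("module")
        new_module = MODULE_MOVES.get(module, module)  # unknown modules pass through
        return f"{match.group('head')}{new_module}{match.group('tail')}"

    return _FROM_IMPORT.sub(_replace, source)


if __name__ == "__main__":
    before = "from crawlee.memory_storage_client import MemoryStorageClient\n"
    print(migrate_imports(before), end="")
    # prints: from crawlee.storage_clients import MemoryStorageClient
```

In a file that imported from both `crawlee.basic_crawler` and `crawlee.beautifulsoup_crawler`, the sketch produces two separate `from crawlee.crawlers import ...` lines. The hand-edited diffs in this commit merge them into one. Both forms are valid Python.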
1 parent c58e973 commit 0ba04d1


175 files changed: +479 / -345 lines changed


docs/deployment/code/apify/crawler_as_actor_example.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 from apify import Actor
 
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/add_data_to_dataset_bs.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/add_data_to_dataset_pw.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/beautifulsoup_crawler.py

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,8 +1,7 @@
 import asyncio
 from datetime import timedelta
 
-from crawlee.basic_crawler import BasicCrawlingContext
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BasicCrawlingContext, BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/beautifulsoup_crawler_stop.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/capture_screenshot_using_playwright.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 from crawlee.storages import KeyValueStore
 
 
```

docs/examples/code/crawl_all_links_on_website_bs.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/crawl_all_links_on_website_pw.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/crawl_multiple_urls_bs.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
+from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext
 
 
 async def main() -> None:
```

docs/examples/code/crawl_multiple_urls_pw.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 import asyncio
 
-from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext
+from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
 
 
 async def main() -> None:
```
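After a migration like the one in these files, a check based on Python's standard `ast` module can confirm that no file still imports an old path. This is a hedged sketch, not a tool shipped with Crawlee. `find_stale_imports` and `OLD_MODULE_PATHS` are hypothetical names. The sketch assumes that the old module paths from this commit are the only ones to flag.

```python
import ast

# Old import paths changed by this refactor (hypothetical list compiled from this
# commit; extend it if your code used other pre-refactor module paths).
OLD_MODULE_PATHS = frozenset(
    {
        "crawlee.basic_crawler",
        "crawlee.beautifulsoup_crawler",
        "crawlee.playwright_crawler",
        "crawlee.memory_storage_client",
        "crawlee.http_clients.curl_impersonate",
    }
)


def find_stale_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module path) pairs for imports of old paths."""
    stale: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Absolute `from X import Y`: check X itself and X.Y (for `from crawlee import basic_crawler`).
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
            for name in candidates:
                if name in OLD_MODULE_PATHS:
                    stale.append((node.lineno, name))
        # Plain `import X` statements.
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in OLD_MODULE_PATHS:
                    stale.append((node.lineno, alias.name))
    return sorted(stale)
```

Parsing with `ast` instead of searching the raw text avoids false positives from comments and string literals that merely mention an old path. Relative imports are skipped, because their module names cannot be resolved without knowing the package layout.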
