Commit 8ccb5d4

refactor!: remove Base prefix from abstract class names (#980)
Relates: #906
1 parent 39a26e2 commit 8ccb5d4

47 files changed: +228 −209 lines

docs/guides/http_clients.mdx

Lines changed: 1 addition & 1 deletion
@@ -47,4 +47,4 @@ python -m pip install 'crawlee[all]'
 
 ## How HTTP clients work
 
-We provide an abstract base class, <ApiLink to="class/BaseHttpClient">`BaseHttpClient`</ApiLink>, which defines the necessary interface for all HTTP clients. HTTP clients are responsible for sending requests and receiving responses, as well as managing cookies, headers, and proxies. They provide methods that are called from crawlers. To implement your own HTTP client, inherit from the <ApiLink to="class/BaseHttpClient">`BaseHttpClient`</ApiLink> class and implement the required methods.
+We provide an abstract base class, <ApiLink to="class/HttpClient">`HttpClient`</ApiLink>, which defines the necessary interface for all HTTP clients. HTTP clients are responsible for sending requests and receiving responses, as well as managing cookies, headers, and proxies. They provide methods that are called from crawlers. To implement your own HTTP client, inherit from the <ApiLink to="class/HttpClient">`HttpClient`</ApiLink> class and implement the required methods.
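The renamed `HttpClient` is the extension point the guide describes. A minimal skeleton of a custom client, assuming the class is exported from `crawlee.http_clients` and that `crawl` and `send_request` are its abstract methods as in the current Crawlee reference; the exact signatures below are illustrative, not taken from this diff:

```python
# Illustrative sketch only: import path and method signatures are assumptions,
# not shown in this commit. Check the HttpClient API reference before use.
from crawlee.http_clients import HttpClient


class MyHttpClient(HttpClient):
    """Hypothetical client that delegates to a real HTTP library."""

    async def crawl(self, request, *, session=None, proxy_info=None, statistics=None):
        # Called by crawlers for each request; must return a crawling result.
        raise NotImplementedError

    async def send_request(self, url, *, method='GET', headers=None, **kwargs):
        # Used for one-off requests outside the crawling loop.
        raise NotImplementedError
```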

docs/guides/storages.mdx

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ Crawlee offers multiple storage types for managing and persisting your crawling
 
 ## Storage clients
 
-Storage clients in Crawlee are subclasses of <ApiLink to="class/BaseStorageClient">`BaseStorageClient`</ApiLink>. They handle interactions with different storage backends. For instance:
+Storage clients in Crawlee are subclasses of <ApiLink to="class/StorageClient">`StorageClient`</ApiLink>. They handle interactions with different storage backends. For instance:
 
 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink>: Stores data in memory and persists it to the local file system.
 - [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient): Manages storage on the [Apify platform](https://apify.com). The Apify storage client is implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python).
@@ -52,7 +52,7 @@ where:
 - `{STORAGE_ID}`: The ID of the specific storage instance (default: `default`).
 
 :::info NOTE
-The current <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> and its interface are quite old and not great. We plan to refactor it, together with the whole <ApiLink to="class/BaseStorageClient">`BaseStorageClient`</ApiLink> interface, in the near future to make it better and easier to use. We also plan to introduce new storage clients for different storage backends - e.g. for [SQLLite](https://sqlite.org/).
+The current <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> and its interface are quite old and not great. We plan to refactor it, together with the whole <ApiLink to="class/StorageClient">`StorageClient`</ApiLink> interface, in the near future to make it better and easier to use. We also plan to introduce new storage clients for different storage backends - e.g. for [SQLite](https://sqlite.org/).
 :::
 
 You can override default storage IDs using these environment variables: `CRAWLEE_DEFAULT_DATASET_ID`, `CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID`, or `CRAWLEE_DEFAULT_REQUEST_QUEUE_ID`.
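Since the default storage IDs are plain environment variables, overriding them needs no Crawlee-specific API. A minimal sketch; the ID values are arbitrary examples:

```python
import os

# Must be set before Crawlee opens the default storages.
os.environ['CRAWLEE_DEFAULT_DATASET_ID'] = 'my-dataset'
os.environ['CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID'] = 'my-key-value-store'
os.environ['CRAWLEE_DEFAULT_REQUEST_QUEUE_ID'] = 'my-request-queue'
```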

docs/upgrading/upgrading_to_v0x.md

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@ This section summarizes the breaking changes between v0.5.x and v0.6.0.
 
 The `Configuration` fields `chrome_executable_path`, `xvfb`, and `verbose_log` have been removed. The `chrome_executable_path` and `xvfb` fields were unused, while `verbose_log` can be replaced by setting `log_level` to `DEBUG`.
 
+### Abstract base classes
+
+We decided to move away from [Hungarian notation](https://en.wikipedia.org/wiki/Hungarian_notation) and remove all the `Base` prefixes from the abstract classes. This affects the following public classes:
+- `BaseStorageClient` -> `StorageClient`
+- `BaseBrowserController` -> `BrowserController`
+- `BaseBrowserPlugin` -> `BrowserPlugin`
+
 ## Upgrading to v0.5
 
 This section summarizes the breaking changes between v0.4.x and v0.5.0.
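In code, the rename listed in the upgrading note above is a pure import and annotation change. A minimal before/after sketch for the storage client name; both import paths appear in this commit's diff of `_service_locator.py`, and the browser classes change analogously in their own modules:

```python
# Before (v0.5.x):
#   from crawlee.storage_clients import BaseStorageClient
# After (v0.6.0), same module, Base prefix dropped:
from crawlee.storage_clients import StorageClient


def describe(client: StorageClient) -> str:
    # Annotations that referenced the old name are updated the same way.
    return type(client).__name__
```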

src/crawlee/_service_locator.py

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@
 from crawlee.configuration import Configuration
 from crawlee.errors import ServiceConflictError
 from crawlee.events import EventManager
-from crawlee.storage_clients import BaseStorageClient
+from crawlee.storage_clients import StorageClient
 
 
 @docs_group('Classes')
@@ -17,7 +17,7 @@ class ServiceLocator:
     def __init__(self) -> None:
         self._configuration: Configuration | None = None
         self._event_manager: EventManager | None = None
-        self._storage_client: BaseStorageClient | None = None
+        self._storage_client: StorageClient | None = None
 
         # Flags to check if the services were already set.
         self._configuration_was_retrieved = False
@@ -74,7 +74,7 @@ def set_event_manager(self, event_manager: EventManager) -> None:
 
         self._event_manager = event_manager
 
-    def get_storage_client(self) -> BaseStorageClient:
+    def get_storage_client(self) -> StorageClient:
         """Get the storage client."""
         if self._storage_client is None:
             from crawlee.storage_clients import MemoryStorageClient
@@ -88,7 +88,7 @@ def get_storage_client(self) -> BaseStorageClient:
         self._storage_client_was_retrieved = True
         return self._storage_client
 
-    def set_storage_client(self, storage_client: BaseStorageClient) -> None:
+    def set_storage_client(self, storage_client: StorageClient) -> None:
         """Set the storage client.
 
         Args:
@@ -98,7 +98,7 @@ def set_storage_client(self, storage_client: BaseStorageClient) -> None:
             ServiceConflictError: If the storage client has already been retrieved before.
         """
         if self._storage_client_was_retrieved:
-            raise ServiceConflictError(BaseStorageClient, storage_client, self._storage_client)
+            raise ServiceConflictError(StorageClient, storage_client, self._storage_client)
 
         self._storage_client = storage_client
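A minimal usage sketch of the renamed type flowing through the service locator. `MemoryStorageClient.from_config()` is an assumption here (the diff only shows the import of the class), so check the class reference before relying on it:

```python
from crawlee import service_locator
from crawlee.storage_clients import MemoryStorageClient, StorageClient

# Register a concrete client before anything retrieves one; once
# get_storage_client() has been called, set_storage_client() raises
# ServiceConflictError, as the diff above shows.
service_locator.set_storage_client(MemoryStorageClient.from_config())

client: StorageClient = service_locator.get_storage_client()
print(type(client).__name__)  # MemoryStorageClient
```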

src/crawlee/browsers/__init__.py

Lines changed: 15 additions & 7 deletions
@@ -1,11 +1,19 @@
-try:
+from crawlee._utils.try_import import install_import_hook as _install_import_hook
+from crawlee._utils.try_import import try_import as _try_import
+
+_install_import_hook(__name__)
+
+# The following imports are wrapped in try_import to handle optional dependencies,
+# ensuring the module can still function even if these dependencies are missing.
+with _try_import(__name__, 'BrowserPool'):
     from ._browser_pool import BrowserPool
+with _try_import(__name__, 'PlaywrightBrowserController'):
     from ._playwright_browser_controller import PlaywrightBrowserController
+with _try_import(__name__, 'PlaywrightBrowserPlugin'):
     from ._playwright_browser_plugin import PlaywrightBrowserPlugin
-except ImportError as exc:
-    raise ImportError(
-        "To import this, you need to install the 'playwright' extra. "
-        "For example, if you use pip, run `pip install 'crawlee[playwright]'`.",
-    ) from exc
 
-__all__ = ['BrowserPool', 'PlaywrightBrowserController', 'PlaywrightBrowserPlugin']
+__all__ = [
+    'BrowserPool',
+    'PlaywrightBrowserController',
+    'PlaywrightBrowserPlugin',
+]
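From a caller's perspective, the new hook keeps `crawlee.browsers` usable as a module even when the optional dependency is absent. A hedged sketch of how a consumer might guard against the missing extra; the exact point at which the hook raises `ImportError` depends on the `try_import` helper, which this diff does not show:

```python
try:
    from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin
except ImportError:
    raise SystemExit(
        "Playwright support is not installed; run: pip install 'crawlee[playwright]'"
    )
```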

src/crawlee/browsers/_base_browser_controller.py renamed to src/crawlee/browsers/_browser_controller.py

Lines changed: 2 additions & 2 deletions
@@ -15,8 +15,8 @@
     from crawlee.proxy_configuration import ProxyInfo
 
 
-class BaseBrowserController(ABC):
-    """An abstract class for managing browser instance and their pages."""
+class BrowserController(ABC):
+    """An abstract base class for managing browser instance and their pages."""
 
     AUTOMATION_LIBRARY: str | None = None
     """The name of the automation library that the controller is using."""

src/crawlee/browsers/_base_browser_plugin.py renamed to src/crawlee/browsers/_browser_plugin.py

Lines changed: 4 additions & 4 deletions
@@ -9,11 +9,11 @@
     from collections.abc import Mapping
     from types import TracebackType
 
-    from crawlee.browsers._base_browser_controller import BaseBrowserController
+    from crawlee.browsers._browser_controller import BrowserController
     from crawlee.browsers._types import BrowserType
 
 
-class BaseBrowserPlugin(ABC):
+class BrowserPlugin(ABC):
     """An abstract base class for browser plugins.
 
     Browser plugins act as wrappers around browser automation tools like Playwright,
@@ -59,7 +59,7 @@ def max_open_pages_per_browser(self) -> int:
         """Return the maximum number of pages that can be opened in a single browser."""
 
     @abstractmethod
-    async def __aenter__(self) -> BaseBrowserPlugin:
+    async def __aenter__(self) -> BrowserPlugin:
         """Enter the context manager and initialize the browser plugin.
 
         Raises:
@@ -80,7 +80,7 @@ async def __aexit__(
         """
 
     @abstractmethod
-    async def new_browser(self) -> BaseBrowserController:
+    async def new_browser(self) -> BrowserController:
         """Create a new browser instance.
 
         Returns:
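For orientation, the renamed `BrowserPlugin` is the hook for wrapping automation tools other than Playwright. A skeleton sketch covering only the members visible in this diff; the real abstract interface has further members (for example `max_open_pages_per_browser` and `__aexit__`), so this is illustrative, not complete:

```python
from __future__ import annotations

from crawlee.browsers._browser_controller import BrowserController
from crawlee.browsers._browser_plugin import BrowserPlugin


class MyBrowserPlugin(BrowserPlugin):
    """Hypothetical plugin wrapping some non-Playwright automation library."""

    async def __aenter__(self) -> MyBrowserPlugin:
        # Start the underlying automation library here.
        return self

    async def new_browser(self) -> BrowserController:
        # Launch a browser and wrap it in a matching BrowserController implementation.
        raise NotImplementedError
```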

src/crawlee/browsers/_browser_pool.py

Lines changed: 13 additions & 13 deletions
@@ -14,15 +14,15 @@
 from crawlee._utils.crypto import crypto_random_object_id
 from crawlee._utils.docs import docs_group
 from crawlee._utils.recurring_task import RecurringTask
-from crawlee.browsers._base_browser_controller import BaseBrowserController
+from crawlee.browsers._browser_controller import BrowserController
 from crawlee.browsers._playwright_browser_plugin import PlaywrightBrowserPlugin
 from crawlee.browsers._types import BrowserType, CrawleePage
 
 if TYPE_CHECKING:
     from collections.abc import Mapping, Sequence
     from types import TracebackType
 
-    from crawlee.browsers._base_browser_plugin import BaseBrowserPlugin
+    from crawlee.browsers._browser_plugin import BrowserPlugin
     from crawlee.fingerprint_suite import FingerprintGenerator
     from crawlee.proxy_configuration import ProxyInfo
 
@@ -46,7 +46,7 @@ class BrowserPool:
 
     def __init__(
         self,
-        plugins: Sequence[BaseBrowserPlugin] | None = None,
+        plugins: Sequence[BrowserPlugin] | None = None,
         *,
         operation_timeout: timedelta = timedelta(seconds=15),
         browser_inactive_threshold: timedelta = timedelta(seconds=10),
@@ -72,10 +72,10 @@ def __init__(
         self._operation_timeout = operation_timeout
         self._browser_inactive_threshold = browser_inactive_threshold
 
-        self._active_browsers = list[BaseBrowserController]()
+        self._active_browsers = list[BrowserController]()
         """A list of browsers currently active and being used to open pages."""
 
-        self._inactive_browsers = list[BaseBrowserController]()
+        self._inactive_browsers = list[BrowserController]()
         """A list of browsers currently inactive and not being used to open new pages,
         but may still contain open pages."""
 
@@ -145,17 +145,17 @@ def with_default_plugin(
         return cls(plugins=[plugin], **kwargs)
 
     @property
-    def plugins(self) -> Sequence[BaseBrowserPlugin]:
+    def plugins(self) -> Sequence[BrowserPlugin]:
         """Return the browser plugins."""
         return self._plugins
 
     @property
-    def active_browsers(self) -> Sequence[BaseBrowserController]:
+    def active_browsers(self) -> Sequence[BrowserController]:
         """Return the active browsers in the pool."""
         return self._active_browsers
 
     @property
-    def inactive_browsers(self) -> Sequence[BaseBrowserController]:
+    def inactive_browsers(self) -> Sequence[BrowserController]:
         """Return the inactive browsers in the pool."""
         return self._inactive_browsers
 
@@ -230,7 +230,7 @@ async def new_page(
         self,
         *,
         page_id: str | None = None,
-        browser_plugin: BaseBrowserPlugin | None = None,
+        browser_plugin: BrowserPlugin | None = None,
         proxy_info: ProxyInfo | None = None,
     ) -> CrawleePage:
         """Open a new page in a browser using the specified or a random browser plugin.
@@ -272,7 +272,7 @@ async def new_page_with_each_plugin(self) -> Sequence[CrawleePage]:
     async def _get_new_page(
         self,
         page_id: str,
-        plugin: BaseBrowserPlugin,
+        plugin: BrowserPlugin,
         proxy_info: ProxyInfo | None,
     ) -> CrawleePage:
         """Internal method to initialize a new page in a browser using the specified plugin."""
@@ -301,16 +301,16 @@ async def _get_new_page(
 
     def _pick_browser_with_free_capacity(
         self,
-        browser_plugin: BaseBrowserPlugin,
-    ) -> BaseBrowserController | None:
+        browser_plugin: BrowserPlugin,
+    ) -> BrowserController | None:
         """Pick a browser with free capacity that matches the specified plugin."""
         for browser in self._active_browsers:
             if browser.has_free_capacity and browser.AUTOMATION_LIBRARY == browser_plugin.AUTOMATION_LIBRARY:
                 return browser
 
         return None
 
-    async def _launch_new_browser(self, plugin: BaseBrowserPlugin) -> BaseBrowserController:
+    async def _launch_new_browser(self, plugin: BrowserPlugin) -> BrowserController:
         """Launch a new browser instance using the specified plugin."""
         browser = await plugin.new_browser()
         self._active_browsers.append(browser)
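The pool's public surface in this diff (the `plugins` argument, `new_page`, `active_browsers`) now speaks in the renamed types. A minimal usage sketch, assuming the 'playwright' extra is installed, that `BrowserPool` works as an async context manager, and that `CrawleePage` exposes the Playwright page as `.page`; those last two points are assumptions not shown in this diff:

```python
import asyncio

from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin


async def main() -> None:
    # An explicit plugin list; per the constructor above, `plugins` is optional.
    pool = BrowserPool(plugins=[PlaywrightBrowserPlugin()])

    async with pool:
        crawlee_page = await pool.new_page()
        await crawlee_page.page.goto('https://crawlee.dev')
        print(len(pool.active_browsers), 'active browser(s)')


asyncio.run(main())
```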

src/crawlee/browsers/_playwright_browser_controller.py

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
 from typing_extensions import override
 
 from crawlee._utils.docs import docs_group
-from crawlee.browsers._base_browser_controller import BaseBrowserController
+from crawlee.browsers._browser_controller import BrowserController
 from crawlee.browsers._types import BrowserType
 from crawlee.fingerprint_suite import HeaderGenerator
 
@@ -28,7 +28,7 @@
 
 
 @docs_group('Classes')
-class PlaywrightBrowserController(BaseBrowserController):
+class PlaywrightBrowserController(BrowserController):
     """Controller for managing Playwright browser instances and their pages.
 
     It provides methods to control browser instances, manage their pages, and handle context-specific

src/crawlee/browsers/_playwright_browser_plugin.py

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@
 from crawlee import service_locator
 from crawlee._utils.context import ensure_context
 from crawlee._utils.docs import docs_group
-from crawlee.browsers._base_browser_plugin import BaseBrowserPlugin
+from crawlee.browsers._browser_plugin import BrowserPlugin
 from crawlee.browsers._playwright_browser_controller import PlaywrightBrowserController
 
 if TYPE_CHECKING:
@@ -25,7 +25,7 @@
 
 
 @docs_group('Classes')
-class PlaywrightBrowserPlugin(BaseBrowserPlugin):
+class PlaywrightBrowserPlugin(BrowserPlugin):
     """A plugin for managing Playwright automation library.
 
     It is a plugin designed to manage browser instances using the Playwright automation library. It acts as a factory
