docs/guides/http_clients.mdx (7 additions & 7 deletions)
````diff
@@ -36,24 +36,24 @@ class HttpClient {
 %% Specific classes
 %% ========================

+class ImpitHttpClient
+
 class HttpxHttpClient

 class CurlImpersonateHttpClient

-class ImpitHttpClient
-
 %% ========================
 %% Inheritance arrows
 %% ========================

+HttpClient --|> ImpitHttpClient
 HttpClient --|> HttpxHttpClient
 HttpClient --|> CurlImpersonateHttpClient
-HttpClient --|> ImpitHttpClient
 ```

 ## Switching between HTTP clients

-Crawlee currently provides three main HTTP clients: <ApiLink to="class/HttpxHttpClient">`HttpxHttpClient`</ApiLink>, which uses the `httpx` library, <ApiLink to="class/CurlImpersonateHttpClient">`CurlImpersonateHttpClient`</ApiLink>, which uses the `curl-cffi` library, and <ApiLink to="class/ImpitHttpClient">`ImpitHttpClient`</ApiLink>, which uses the `impit` library. You can switch between them by setting the `http_client` parameter when initializing a crawler class. The default HTTP client is <ApiLink to="class/HttpxHttpClient">`HttpxHttpClient`</ApiLink>.
+Crawlee currently provides three main HTTP clients: <ApiLink to="class/ImpitHttpClient">`ImpitHttpClient`</ApiLink>, which uses the `impit` library, <ApiLink to="class/HttpxHttpClient">`HttpxHttpClient`</ApiLink>, which uses the `httpx` library with `browserforge` for custom HTTP headers and fingerprints, and <ApiLink to="class/CurlImpersonateHttpClient">`CurlImpersonateHttpClient`</ApiLink>, which uses the `curl-cffi` library. You can switch between them by setting the `http_client` parameter when initializing a crawler class. The default HTTP client is <ApiLink to="class/ImpitHttpClient">`ImpitHttpClient`</ApiLink>. For more details on anti-blocking features, see our [avoid getting blocked guide](./avoid-blocking).

 Below are examples of how to configure the HTTP client for the <ApiLink to="class/ParselCrawler">`ParselCrawler`</ApiLink>:

@@ -77,18 +77,18 @@ Below are examples of how to configure the HTTP client for the <ApiLink to="clas

 ## Installation requirements

-Since <ApiLink to="class/HttpxHttpClient">`HttpxHttpClient`</ApiLink> is the default HTTP client, it's included with the base Crawlee installation and requires no additional packages.
+Since <ApiLink to="class/ImpitHttpClient">`ImpitHttpClient`</ApiLink> is the default HTTP client, it's included with the base Crawlee installation and requires no additional packages.

 For <ApiLink to="class/CurlImpersonateHttpClient">`CurlImpersonateHttpClient`</ApiLink>, you need to install Crawlee with the `curl-impersonate` extra:

 ```sh
 python -m pip install 'crawlee[curl-impersonate]'
 ```

-For <ApiLink to="class/ImpitHttpClient">`ImpitHttpClient`</ApiLink>, you need to install Crawlee with the `impit` extra:
+For <ApiLink to="class/HttpxHttpClient">`HttpxHttpClient`</ApiLink>, you need to install Crawlee with the `httpx` extra:

 ```sh
-python -m pip install 'crawlee[impit]'
+python -m pip install 'crawlee[httpx]'
 ```

 Alternatively, you can install all available extras to get access to all HTTP clients and features:
````
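For context on what the documented `http_client` parameter does in practice, here is a minimal sketch of choosing the client on a `ParselCrawler`. It is not part of the diff above; the import paths and handler wiring are assumptions based on the current Crawlee Python API and may differ between versions:

```python
# Minimal sketch (not part of this diff): explicitly choosing the HTTP client.
# Import paths assume the current Crawlee Python layout; adjust for your version.
import asyncio

from crawlee.crawlers import ParselCrawler, ParselCrawlingContext
from crawlee.http_clients import ImpitHttpClient


async def main() -> None:
    # Pass the chosen client via `http_client`; leaving it out uses the
    # default client (ImpitHttpClient after this change).
    crawler = ParselCrawler(http_client=ImpitHttpClient())

    @crawler.router.default_handler
    async def handler(context: ParselCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    await crawler.run(['https://crawlee.dev/'])


if __name__ == '__main__':
    asyncio.run(main())
```

Swapping in `HttpxHttpClient` or `CurlImpersonateHttpClient` works the same way, provided the matching extra from the installation requirements section is installed.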