v1.1.1: Update documentation with comprehensive feature coverage
- Add crawl(), parse_content(), and connect_browser() examples to README
- Document all client parameters including browser automation and logging
- Update environment variables for browser credentials
- Fix browser connection example import and URL issues
## Installation
To install the package, run the following command in your terminal:

```bash
pip install brightdata-sdk
```

Create a [Bright Data](https://brightdata.com/) account and copy your API key
### Initialize the Client
```python
from brightdata import bdclient
client = bdclient(api_token="your_api_token_here") # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
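# Alternatively, rely on the environment variable noted above (a sketch;
# assumes BRIGHTDATA_API_TOKEN is set in your shell or .env file):
# client = bdclient()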
```

| Feature | Function | Description |
|---------|----------|-------------|
| **Scrape every website** | `scrape` | Scrape any website using Bright Data's scraping and anti-bot-detection capabilities |
| **Web search** | `search` | Search Google and other search engines by query (supports batch searches) |
| **Web crawling** | `crawl` | Discover and scrape multiple pages from websites with advanced filtering and depth control |
| **Content parsing** | `parse_content` | Extract text, links, images, and structured data from API responses (JSON or HTML) |
| **Browser automation** | `connect_browser` | Get a WebSocket endpoint for Playwright/Selenium integration with Bright Data's scraping browser |
| **Search ChatGPT** | `search_chatGPT` | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts |
| **Search LinkedIn** | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries and receive structured data |
| **Scrape LinkedIn** | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and receive structured data |
| **Download functions** | `download_snapshot`, `download_content` | Download content for both sync and async requests |
| **Client class** | `bdclient` | Handles authentication, automatic zone creation and management, and options for robust error handling |
| **Parallel processing** | **all functions** | All functions use concurrent processing for multiple URLs or queries and support multiple output formats |
### Try using one of the functions
#### `search()`
```python
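# A minimal sketch, assuming a single query string (the query below is
# illustrative; batch queries are also supported per the feature table):
results = client.search("best selling shoes")
print(results)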
```

#### `scrape_linkedin.posts()`

```python
post_urls = ["https://www.linkedin.com/posts/..."]  # placeholder URLs, replace with real post links
results = client.scrape_linkedin.posts(post_urls) # can also be changed to async
print(results) # will print the snapshot_id, which can be downloaded using the download_snapshot() function
```
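The printed `snapshot_id` can then be fetched with `download_snapshot`; a minimal sketch, assuming the function accepts the id directly (the exact signature is not shown in this section):

```python
# Sketch: fetch the async result once it is ready (signature assumed)
data = client.download_snapshot("<snapshot_id printed above>")
print(data)
```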
#### `crawl()`
```python
# Single URL crawl with filters
result = client.crawl(
    url="https://example.com/",
    depth=2,
    filter="/product/",  # Only crawl URLs containing "/product/"
)
```
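#### `parse_content()`

The feature table documents `parse_content` for extracting text, links, and images from API responses. A minimal sketch, treating it as taking a raw scrape result; the parameter and return shape are assumptions:

```python
# Sketch: parse a scrape response into structured data (signature assumed)
raw = client.scrape("https://example.com/")
parsed = client.parse_content(raw)
print(parsed)
```

#### `connect_browser()`

Per the feature table, `connect_browser` returns a WebSocket endpoint for Playwright/Selenium integration. A minimal sketch of wiring that endpoint into Playwright (the Playwright calls are standard; treating the return value as a CDP URL is an assumption):

```python
from playwright.sync_api import sync_playwright

# Assumption: connect_browser() returns a WebSocket (CDP) endpoint URL
ws_endpoint = client.connect_browser()

with sync_playwright() as p:
    # Attach to Bright Data's scraping browser over CDP
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com/")
    print(page.title())
    browser.close()
```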