Commit 1c05cbb

Update README.md
1 parent b3ffeda commit 1c05cbb

1 file changed

README.md

Lines changed: 81 additions & 47 deletions
@@ -4,18 +4,21 @@
```bash
pip install brightdata-sdk
```

<h3 align="center">Python SDK by Bright Data: easy-to-use, scalable methods for web search & scraping</h3>

<p></p>

## Features

| Feature | Functions | Description |
|---------|-----------|-------------|
| **Scrape any website** | `scrape` | Scrape any website using Bright Data's scraping and anti-bot capabilities |
| **Web search** | `search` | Search Google and other search engines by query (supports batch searches) |
| **Search ChatGPT** | `search_chatGPT` | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts |
| **Search LinkedIn** | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries and receive structured data |
| **Scrape LinkedIn** | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and receive structured data |
| **Download functions** | `download_snapshot`, `download_content` | Download content for both sync and async requests |
| **Client class** | `bdclient` | Handles authentication, automatic zone creation and management, and options for robust error handling |
| **Parallel processing** | **all functions** | All functions use concurrent processing for multiple URLs or queries and support multiple output formats |

## Installation

To install the package, open your terminal:
@@ -37,70 +40,89 @@ from brightdata import bdclient
client = bdclient(api_token="your_api_token_here") # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
```
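If you keep the token in a `.env` file, a minimal sketch of loading it yourself, assuming the `python-dotenv` package (which is not part of this SDK):

```python
import os

from dotenv import load_dotenv  # assumed helper: pip install python-dotenv
from brightdata import bdclient

load_dotenv()  # reads BRIGHTDATA_API_TOKEN from a local .env file
client = bdclient(api_token=os.environ["BRIGHTDATA_API_TOKEN"])
```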

### 2. Try using one of the functions

#### `search()`
```python
# Simple single-query search
result = client.search("pizza restaurants")

# Multiple queries (parallel processing) with a custom configuration
queries = ["pizza", "restaurants", "delivery"]
results = client.search(
    queries,
    search_engine="bing",
    country="gb",
    format="raw"
)
```

#### `scrape()`
```python
# Simple single-URL scrape
result = client.scrape("https://example.com")

# Multiple URLs (parallel processing) with custom options
urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
results = client.scrape(
    urls,
    format="raw",
    country="gb",
    data_format="screenshot"
)
```
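Continuing the example above, a small sketch of consuming the batch results; the one-result-per-input ordering is an assumption, not documented behavior:

```python
# Hedged sketch: pair each input URL with its scraped result.
# Assumption: results come back in input order, one entry per URL.
for url, page in zip(urls, results):
    print(f"{url}: {len(str(page))} characters")
```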
#### `search_chatGPT()`
```python
result = client.search_chatGPT(
    prompt="what day is it today?"
    # prompt=["What are the top 3 programming languages in 2024?", "Best hotels in New York", "Explain quantum computing"],
    # additional_prompt=["Can you explain why?", "Are you sure?", ""]
)

client.download_content(result) # on a timeout error, your snapshot_id is presented so you can download it later using download_snapshot()
```
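A hedged sketch of that timeout path; the exact exception type the SDK raises is an assumption, and `<your_snapshot_id>` is a placeholder copied from the SDK's output:

```python
try:
    result = client.search_chatGPT(prompt="what day is it today?")
    client.download_content(result)
except Exception as err:  # assumption: the SDK surfaces timeouts as an exception
    print(err)  # per the note above, the snapshot_id is presented on timeout
    # later, retrieve the finished snapshot:
    # client.download_snapshot("<your_snapshot_id>")
```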

#### `search_linkedin.`
Available functions:
client.**`search_linkedin.posts()`**, client.**`search_linkedin.jobs()`**, client.**`search_linkedin.profiles()`**
```python
# Search LinkedIn profiles by name
first_names = ["James", "Idan"]
last_names = ["Smith", "Vilenski"]

result = client.search_linkedin.profiles(first_names, last_names) # can also be run asynchronously
# prints the snapshot_id, which can be downloaded using the download_snapshot() function
```

#### `scrape_linkedin.`
Available functions:
client.**`scrape_linkedin.posts()`**, client.**`scrape_linkedin.jobs()`**, client.**`scrape_linkedin.profiles()`**, client.**`scrape_linkedin.companies()`**
```python
post_urls = [
    "https://www.linkedin.com/posts/orlenchner_scrapecon-activity-7180537307521769472-oSYN?trk=public_profile",
    "https://www.linkedin.com/pulse/getting-value-out-sunburst-guillaume-de-b%C3%A9naz%C3%A9?trk=public_profile_article_view"
]

results = client.scrape_linkedin.posts(post_urls) # can also be run asynchronously

print(results) # prints the snapshot_id, which can be downloaded using the download_snapshot() function
```

**`download_content`** (for sync requests)
```python
data = client.scrape("https://example.com")
client.download_content(data)
```
**`download_snapshot`** (for async requests)
```python
# Save this call in a separate file
client.download_snapshot("") # insert your snapshot_id
```
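For example, a minimal standalone downloader script along those lines; the snapshot ID is a placeholder you copy from the earlier run's output:

```python
from brightdata import bdclient

client = bdclient(api_token="your_api_token_here")
client.download_snapshot("<your_snapshot_id>")  # placeholder ID printed by the async call
```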

> [!TIP]
> Hover over `search` or any other function in the package to see all of its available parameters.

![Hover-Over1](https://github.com/user-attachments/assets/51324485-5769-48d5-8f13-0b534385142e)

## Function Parameters
<details>
@@ -185,6 +207,18 @@ zones = client.list_zones()
print(f"Found {len(zones)} zones")
```

Configure a custom zone name:

```python
client = bdclient(
    api_token="your_token",
    auto_create_zones=False, # when True, zones are created automatically
    web_unlocker_zone="custom_zone",
    serp_zone="custom_serp_zone"
)
```

</details>
<details>
<summary>👥 <strong>Client Management</strong></summary>
