
Commit 4b13b9b

עידן וילנסקי authored and committed
feat: add LinkedIn scraping and search capabilities, restructure ChatGPT API, and improve download functionality
Major Features:
- Add comprehensive LinkedIn data scraping (profiles, companies, jobs, posts)
- Add LinkedIn search capabilities with keyword and URL-based discovery
- Restructure ChatGPT API with sync/async support and rename to search_chatGPT
- Refactor download functions into a separate module for better organization
- Add new example files for LinkedIn scraping and searching

API Enhancements:
- New scrape_linkedin class with specialized methods for different data types
- New search_linkedin class for discovering LinkedIn content
- Enhanced ChatGPT API with synchronous and asynchronous processing options
- Improved NDJSON response parsing for better data handling
- Better parameter validation and error handling across all APIs

Code Organization:
- Move download functionality to dedicated api/download.py module
- Create specialized API modules for LinkedIn and ChatGPT operations
- Maintain backward compatibility for existing download methods
- Clean up code for production readiness with optimized performance

Version bump to 1.0.7
1 parent e19c6a4 commit 4b13b9b

File tree

11 files changed: +1248 −6567 lines changed

brightdata/__init__.py

Lines changed: 22 additions & 3 deletions

````diff
@@ -3,13 +3,32 @@
 
 A comprehensive SDK for Bright Data's Web Scraping and SERP APIs, providing
 easy-to-use methods for web scraping, search engine result parsing, and data management.
-### Functions:
+## Functions:
+First import the package and create a client:
+```python
+from brightdata import bdclient
+client = bdclient("your-api-key")
+```
+Then use the client to call the desired functions:
 #### scrape()
 - Scrapes a website using Bright Data Web Unblocker API with proxy support (or multiple websites sequentially)
+- syntax: `results = client.scrape(url, country, max_workers, ...)`
+#### .scrape_linkedin. class
+- Scrapes LinkedIn data including posts, jobs, companies, and profiles; receive structured data as a result
+- syntax: `results = client.scrape_linkedin.posts()/jobs()/companies()/profiles()  # insert parameters per function`
 #### search()
 - Performs web searches using Bright Data SERP API with customizable search engines (or multiple search queries sequentially)
-#### download_content()
+- syntax: `results = client.search(query, search_engine, country, ...)`
+#### .search_linkedin. class
+- Searches LinkedIn for specific posts, jobs, and profiles; receive the relevant data as a result
+- syntax: `results = client.search_linkedin.posts()/jobs()/profiles()  # insert parameters per function`
+#### search_chatGPT()
+- Interact with ChatGPT using Bright Data's ChatGPT API, sending prompts and receiving responses
+- syntax: `results = client.search_chatGPT(prompt, additional_prompt, max_workers, ...)`
+#### download_content() / download_snapshot()
 - Saves the scraped content to local files in various formats (JSON, CSV, etc.)
+- syntax: `client.download_content(results)`
+- syntax: `client.download_snapshot(results)`
 
 ### Features:
 - Web Scraping: Scrape websites using Bright Data Web Unlocker API with proxy support
@@ -32,7 +51,7 @@
     APIError
 )
 
-__version__ = "1.0.6"
+__version__ = "1.0.7"
 __author__ = "Bright Data"
 __email__ = "[email protected]"
 
````
brightdata/api/__init__.py

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,9 +1,11 @@
 from .scraper import WebScraper
 from .search import SearchAPI
 from .chatgpt import ChatGPTAPI
+from .linkedin import LinkedInAPI
 
 __all__ = [
     'WebScraper',
     'SearchAPI',
-    'ChatGPTAPI'
+    'ChatGPTAPI',
+    'LinkedInAPI'
 ]
```

brightdata/api/chatgpt.py

Lines changed: 38 additions & 15 deletions

```diff
@@ -24,6 +24,7 @@ def scrape_chatgpt(
         countries: List[str],
         additional_prompts: List[str],
         web_searches: List[bool],
+        sync: bool = True,
         timeout: int = None
     ) -> Dict[str, Any]:
         """
@@ -34,12 +35,13 @@ def scrape_chatgpt(
         - countries: List of country codes matching prompts
         - additional_prompts: List of follow-up prompts matching prompts
         - web_searches: List of web_search flags matching prompts
+        - sync: If True, uses synchronous API for immediate results
         - timeout: Request timeout in seconds
 
         Returns:
-        - Dict containing response with snapshot_id
+        - Dict containing response with snapshot_id or direct data (if sync=True)
         """
-        url = "https://api.brightdata.com/datasets/v3/trigger"
+        url = "https://api.brightdata.com/datasets/v3/scrape" if sync else "https://api.brightdata.com/datasets/v3/trigger"
         headers = {
             "Authorization": f"Bearer {self.api_token}",
             "Content-Type": "application/json"
@@ -49,38 +51,59 @@ def scrape_chatgpt(
             "include_errors": "true"
         }
 
-        data = []
-        for i in range(len(prompts)):
-            data.append({
+        data = [
+            {
                 "url": "https://chatgpt.com/",
                 "prompt": prompts[i],
                 "country": countries[i],
                 "additional_prompt": additional_prompts[i],
                 "web_search": web_searches[i]
-            })
+            }
+            for i in range(len(prompts))
+        ]
 
         try:
             response = self.session.post(
                 url,
                 headers=headers,
                 params=params,
                 json=data,
-                timeout=timeout or self.default_timeout
+                timeout=timeout or (65 if sync else self.default_timeout)
             )
 
             if response.status_code == 401:
                 raise AuthenticationError("Invalid API token or insufficient permissions")
             elif response.status_code != 200:
                 raise APIError(f"ChatGPT scraping request failed with status {response.status_code}: {response.text}")
 
-            result = response.json()
-            snapshot_id = result.get('snapshot_id')
-            if snapshot_id:
-                logger.info(f"ChatGPT scraping job initiated successfully for {len(prompts)} prompt(s)")
-                print("")
-                print("Snapshot ID:")
-                print(snapshot_id)
-                print("")
+            if sync:
+                response_text = response.text
+                if '\n{' in response_text and response_text.strip().startswith('{'):
+                    json_objects = []
+                    for line in response_text.strip().split('\n'):
+                        if line.strip():
+                            try:
+                                json_objects.append(json.loads(line))
+                            except json.JSONDecodeError:
+                                continue
+                    result = json_objects
+                else:
+                    try:
+                        result = response.json()
+                    except json.JSONDecodeError:
+                        result = response_text
+
+                logger.info(f"ChatGPT data retrieved synchronously for {len(prompts)} prompt(s)")
+                print(f"Retrieved {len(result) if isinstance(result, list) else 1} ChatGPT response(s)")
+            else:
+                result = response.json()
+                snapshot_id = result.get('snapshot_id')
+                if snapshot_id:
+                    logger.info(f"ChatGPT scraping job initiated successfully for {len(prompts)} prompt(s)")
+                    print("")
+                    print("Snapshot ID:")
+                    print(snapshot_id)
+                    print("")
 
             return result
 
```
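The `sync` branch above handles NDJSON (newline-delimited JSON): the synchronous `/scrape` endpoint can return one JSON object per line rather than a single document. Stripped of the HTTP plumbing, the fallback chain reduces to a small standalone helper; `parse_ndjson_response` is a name introduced here for illustration, not part of the SDK.

```python
import json


def parse_ndjson_response(text):
    """Parse a response body that may be NDJSON, a single JSON document,
    or plain text, mirroring the fallback chain in the sync branch:
    NDJSON -> list of objects, valid JSON -> parsed value, else raw text."""
    stripped = text.strip()
    # Same heuristic as the diff: several lines, each starting a JSON object.
    if '\n{' in text and stripped.startswith('{'):
        objects = []
        for line in stripped.split('\n'):
            if line.strip():
                try:
                    objects.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # skip malformed lines instead of failing the batch
        return objects
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON at all: hand back the raw body


ndjson = '{"answer": "a"}\n{"answer": "b"}'
print(parse_ndjson_response(ndjson))
```

Skipping malformed lines with `continue` trades strictness for robustness: a single truncated record in a multi-prompt batch does not discard the rest of the responses.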
