Commit b37ab22

Merge pull request #13 from vzucher/master
Fixed Amazon Search Dataset ID and broken links in README
2 parents 60fce88 + 72fa443

File tree: 8 files changed, +618 −80 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -261,4 +261,4 @@ Thumbs.db
 # Project specific
 *.log
 .cache/
-
+probe

README.md

Lines changed: 197 additions & 59 deletions
@@ -83,11 +83,11 @@ Modern async-first Python SDK for [Bright Data](https://brightdata.com) APIs wit

 Perfect for data scientists! Interactive tutorials with examples:

-1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/01_quickstart.ipynb)
-2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)
-3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/03_amazon_scraping.ipynb)
-4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/04_linkedin_jobs.ipynb)
-5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/05_batch_processing.ipynb)
+1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/main/notebooks/01_quickstart.ipynb)
+2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/main/notebooks/02_pandas_integration.ipynb)
+3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/main/notebooks/03_amazon_scraping.ipynb)
+4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/main/notebooks/04_linkedin_jobs.ipynb)
+5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/main/notebooks/05_batch_processing.ipynb)

 ---

@@ -149,9 +149,9 @@ client = BrightDataClient()
 result = client.scrape.generic.url("https://example.com")

 if result.success:
-print(f"Success: {result.success}")
-print(f"Data: {result.data[:200]}...")
-print(f"Time: {result.elapsed_ms():.2f}ms")
+    print(f"Success: {result.success}")
+    print(f"Data: {result.data[:200]}...")
+    print(f"Time: {result.elapsed_ms():.2f}ms")
 else:
     print(f"Error: {result.error}")
 ```
@@ -460,13 +460,14 @@ asyncio.run(scrape_multiple())
 ## 🆕 What's New in v2.0.0

 ### 🆕 **Latest Updates (December 2025)**
-- **Amazon Search API** - NEW parameter-based product discovery
+- **Amazon Search API** - NEW parameter-based product discovery with correct dataset
 - **LinkedIn Job Search Fixed** - Now builds URLs from keywords internally
 - **Trigger Interface** - Manual trigger/poll/fetch control for all platforms
+- **29 Sync Wrapper Fixes** - All sync methods work (scrapers + SERP API)
+- **Batch Operations Fixed** - Returns List[ScrapeResult] correctly
 - **Auto-Create Zones** - Now enabled by default (was opt-in)
 - **Improved Zone Names** - `sdk_unlocker`, `sdk_serp`, `sdk_browser`
-- **26 Sync Wrapper Fixes** - All platform scrapers now work without context managers
-- **Zone Manager Tests Fixed** - All 22 tests passing
+- **Full Sync/Async Examples** - README now shows both patterns for all features

 ### 🎓 **For Data Scientists**
 - **5 Jupyter Notebooks** - Complete interactive tutorials
@@ -924,29 +925,199 @@ result = client.search.linkedin.jobs(
 )
 ```

-### Sync vs Async Methods
+### Sync vs Async Examples - Full Coverage
+
+All SDK methods support **both sync and async** patterns. Choose based on your needs:
+
+#### **Amazon Products**

 ```python
-# Sync wrapper - for simple scripts (blocks until complete)
-result = client.scrape.linkedin.profiles(
-    url="https://linkedin.com/in/johndoe",
-    timeout=300 # Max wait time in seconds
-)
+# SYNC - Simple scripts
+result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")

-# Async method - for concurrent operations (requires async context)
+# ASYNC - Concurrent operations
 import asyncio

-async def scrape_profiles():
+async def scrape_amazon():
+    async with BrightDataClient() as client:
+        result = await client.scrape.amazon.products_async(url="https://amazon.com/dp/B123")
+        return result
+
+result = asyncio.run(scrape_amazon())
+```
+
+#### **Amazon Search**
+
+```python
+# SYNC - Simple keyword search
+result = client.search.amazon.products(keyword="laptop", prime_eligible=True)
+
+# ASYNC - Batch keyword searches
+async def search_amazon():
+    async with BrightDataClient() as client:
+        result = await client.search.amazon.products_async(
+            keyword="laptop",
+            min_price=50000,
+            max_price=200000,
+            prime_eligible=True
+        )
+        return result
+
+result = asyncio.run(search_amazon())
+```
+
+#### **LinkedIn Scraping**
+
+```python
+# SYNC - Single profile
+result = client.scrape.linkedin.profiles(url="https://linkedin.com/in/johndoe")
+
+# ASYNC - Multiple profiles concurrently
+async def scrape_linkedin():
+    async with BrightDataClient() as client:
+        urls = ["https://linkedin.com/in/person1", "https://linkedin.com/in/person2"]
+        results = await client.scrape.linkedin.profiles_async(url=urls)
+        return results
+
+results = asyncio.run(scrape_linkedin())
+```
+
+#### **LinkedIn Job Search**
+
+```python
+# SYNC - Simple job search
+result = client.search.linkedin.jobs(keyword="python", location="NYC", remote=True)
+
+# ASYNC - Advanced search with filters
+async def search_jobs():
     async with BrightDataClient() as client:
-        result = await client.scrape.linkedin.profiles_async(
-            url="https://linkedin.com/in/johndoe",
-            timeout=300
+        result = await client.search.linkedin.jobs_async(
+            keyword="python developer",
+            location="New York",
+            experienceLevel="mid",
+            jobType="full-time",
+            remote=True
         )
         return result

-result = asyncio.run(scrape_profiles())
+result = asyncio.run(search_jobs())
+```
+
+#### **SERP API (Google, Bing, Yandex)**
+
+```python
+# SYNC - Quick Google search
+result = client.search.google(query="python tutorial", location="United States")
+
+# ASYNC - Multiple search engines concurrently
+async def search_all_engines():
+    async with BrightDataClient() as client:
+        google = await client.search.google_async(query="python", num_results=10)
+        bing = await client.search.bing_async(query="python", num_results=10)
+        yandex = await client.search.yandex_async(query="python", num_results=10)
+        return google, bing, yandex
+
+results = asyncio.run(search_all_engines())
 ```

+#### **Facebook Scraping**
+
+```python
+# SYNC - Single profile posts
+result = client.scrape.facebook.posts_by_profile(
+    url="https://facebook.com/profile",
+    num_of_posts=10
+)
+
+# ASYNC - Multiple sources
+async def scrape_facebook():
+    async with BrightDataClient() as client:
+        profile_posts = await client.scrape.facebook.posts_by_profile_async(
+            url="https://facebook.com/zuck",
+            num_of_posts=10
+        )
+        group_posts = await client.scrape.facebook.posts_by_group_async(
+            url="https://facebook.com/groups/programming",
+            num_of_posts=10
+        )
+        return profile_posts, group_posts
+
+results = asyncio.run(scrape_facebook())
+```
+
+#### **Instagram Scraping**
+
+```python
+# SYNC - Single profile
+result = client.scrape.instagram.profiles(url="https://instagram.com/instagram")
+
+# ASYNC - Profile + posts
+async def scrape_instagram():
+    async with BrightDataClient() as client:
+        profile = await client.scrape.instagram.profiles_async(
+            url="https://instagram.com/instagram"
+        )
+        posts = await client.scrape.instagram.posts_async(
+            url="https://instagram.com/p/ABC123"
+        )
+        return profile, posts
+
+results = asyncio.run(scrape_instagram())
+```
+
+#### **ChatGPT**
+
+```python
+# SYNC - Single prompt
+result = client.scrape.chatgpt.prompt(prompt="Explain Python", web_search=True)
+
+# ASYNC - Batch prompts
+async def ask_chatgpt():
+    async with BrightDataClient() as client:
+        result = await client.scrape.chatgpt.prompts_async(
+            prompts=["What is Python?", "What is JavaScript?"],
+            web_searches=[False, True]
+        )
+        return result
+
+result = asyncio.run(ask_chatgpt())
+```
+
+#### **Generic Web Scraping**
+
+```python
+# SYNC - Single URL
+result = client.scrape.generic.url(url="https://example.com")
+
+# ASYNC - Concurrent scraping
+async def scrape_multiple():
+    async with BrightDataClient() as client:
+        results = await client.scrape.generic.url_async([
+            "https://example1.com",
+            "https://example2.com",
+            "https://example3.com"
+        ])
+        return results
+
+results = asyncio.run(scrape_multiple())
+```
+
+---
+
+### **When to Use Sync vs Async**
+
+**Use Sync When:**
+- ✅ Simple scripts or notebooks
+- ✅ Single operations at a time
+- ✅ Learning or prototyping
+- ✅ Sequential workflows
+
+**Use Async When:**
+- ✅ Scraping multiple URLs concurrently
+- ✅ Combining multiple API calls
+- ✅ Production applications
+- ✅ Performance-critical operations
+
 **Note:** Sync wrappers (e.g., `profiles()`) internally use `asyncio.run()` and cannot be called from within an existing async context. Use `*_async` methods when you're already in an async function.

 ### SSL Certificate Error Handling
@@ -1078,10 +1249,8 @@ pytest tests/ --cov=brightdata --cov-report=html
 - [All examples →](examples/)

 ### Documentation
-- [Quick Start Guide](docs/quickstart.md)
-- [Architecture Overview](docs/architecture.md)
 - [API Reference](docs/api-reference/)
-- [Contributing Guide](docs/contributing.md)
+- [Contributing Guidelines](https://github.com/brightdata/sdk-python/blob/main/CONTRIBUTING.md) (See upstream repo)

 ---

@@ -1140,7 +1309,7 @@ pip install -e .

 ## 🤝 Contributing

-Contributions are welcome! Please see [CONTRIBUTING.md](docs/contributing.md) for guidelines.
+Contributions are welcome! Check the [GitHub repository](https://github.com/brightdata/sdk-python) for contribution guidelines.

 ### Development Setup

@@ -1238,7 +1407,7 @@ if client.test_connection_sync():
 )

 if fb_posts.success:
-print(f"Scraped {len(fb_posts.data)} Facebook posts")
+    print(f"Scraped {len(fb_posts.data)} Facebook posts")

 # Scrape Instagram profile
 ig_profile = client.scrape.instagram.profiles(
@@ -1269,37 +1438,6 @@ Run the included demo to explore the SDK interactively:
 ```bash
 python demo_sdk.py
 ```
-
----
-
-## 🎯 Roadmap
-
-### ✅ Completed
-- [x] Core client with authentication
-- [x] Web Unlocker service
-- [x] Platform scrapers (Amazon, LinkedIn, ChatGPT, Facebook, Instagram)
-- [x] SERP API (Google, Bing, Yandex)
-- [x] Comprehensive test suite (502+ tests)
-- [x] .env file support via python-dotenv
-- [x] SSL error handling with helpful guidance
-- [x] Centralized constants module
-- [x] Function-level monitoring
-- [x] **Dataclass payloads with validation**
-- [x] **Jupyter notebooks for data scientists**
-- [x] **CLI tool (brightdata command)**
-- [x] **Pandas integration examples**
-- [x] **Single shared AsyncEngine (8x efficiency)**
-
-### 🚧 In Progress
-- [ ] Browser automation API
-- [ ] Web crawler API
-
-### 🔮 Future
-- [ ] Additional platforms (Reddit, Twitter/X, TikTok, YouTube)
-- [ ] Real-time data streaming
-- [ ] Advanced caching strategies
-- [ ] Prometheus metrics export
-
 ---

 ## 🙏 Acknowledgments

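The **Note** in the README additions above describes exactly the constraint the two source-file changes below enforce. A quick sketch of that failure mode and its fix, assuming the client API shown in the README (the `from brightdata import BrightDataClient` import path is an assumption, not confirmed by this diff):

```python
import asyncio
from brightdata import BrightDataClient  # import path is an assumption

async def wrong():
    client = BrightDataClient()
    # Sync wrappers call asyncio.run() internally, so invoking one from a
    # running event loop raises RuntimeError instead of silently blocking.
    return client.scrape.generic.url("https://example.com")

async def right():
    # Inside async code, use the *_async variant and await it.
    async with BrightDataClient() as client:
        return await client.scrape.generic.url_async("https://example.com")
```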
src/brightdata/api/base.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -38,11 +38,17 @@ def _execute_sync(self, *args: Any, **kwargs: Any) -> Any:
         Execute API operation synchronously.

         Wraps async method using asyncio.run() for sync compatibility.
+        Properly manages engine context.
         """
         try:
             asyncio.get_running_loop()
             raise RuntimeError(
                 "Cannot call sync method from async context. Use async method instead."
             )
         except RuntimeError:
-            return asyncio.run(self._execute_async(*args, **kwargs))
+
+            async def _run():
+                async with self.engine:
+                    return await self._execute_async(*args, **kwargs)
+
+            return asyncio.run(_run())
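For readers skimming the diff, here is a minimal, self-contained sketch of the pattern this change adopts: refuse to block when an event loop is already running, and otherwise run the coroutine inside the engine's async context so setup and teardown happen on every sync call. The `Engine` class and `execute_sync` function are toy stand-ins for illustration, not the SDK's actual classes:

```python
import asyncio

class Engine:
    """Toy stand-in for the SDK's shared engine (illustration only)."""

    async def __aenter__(self):
        # A real engine would open sessions / connection pools here.
        return self

    async def __aexit__(self, *exc):
        # ...and release them here, even when the operation failed.
        return False

def execute_sync(engine, coro_fn, *args, **kwargs):
    """Run an async operation synchronously, managing the engine context."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        pass  # No running loop: safe to start one below.
    else:
        raise RuntimeError(
            "Cannot call sync method from async context. Use async method instead."
        )

    async def _run():
        # Scope the engine to exactly this one operation, so sync
        # callers never need their own `async with`.
        async with engine:
            return await coro_fn(*args, **kwargs)

    return asyncio.run(_run())

async def fetch(url):
    return f"fetched {url}"

print(execute_sync(Engine(), fetch, "https://example.com"))
```

Before this commit, the coroutine was handed to `asyncio.run()` bare, so the engine's `__aenter__`/`__aexit__` never ran on the sync path; the inner `_run()` wrapper is what closes that gap.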

src/brightdata/api/scrape_service.py

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -214,4 +214,9 @@ async def url_async(
214214

215215
def url(self, *args, **kwargs) -> Union[ScrapeResult, List[ScrapeResult]]:
216216
"""Scrape URL(s) synchronously."""
217-
return asyncio.run(self.url_async(*args, **kwargs))
217+
218+
async def _run():
219+
async with self._client.engine:
220+
return await self.url_async(*args, **kwargs)
221+
222+
return asyncio.run(_run())
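Taken together, the base and service fixes mean both entry points shown in the README manage the engine lifecycle. A usage sketch under the same assumptions as above (import path and credential setup are assumptions):

```python
import asyncio
from brightdata import BrightDataClient  # import path is an assumption

# Sync path: url() now opens and closes the engine internally per call.
client = BrightDataClient()
result = client.scrape.generic.url("https://example.com")

# Async path: keep one engine open across concurrent calls instead.
async def main():
    async with BrightDataClient() as client:
        return await client.scrape.generic.url_async([
            "https://example.com",
            "https://example.org",
        ])

results = asyncio.run(main())
```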
