Commit 3429701

fix: Return List[ScrapeResult] for batch operations instead of single result
- Fixed batch scraping to return proper List[ScrapeResult] for multiple URLs
- Applied fix to 8 instances across all platform scrapers
- Base scraper, Amazon, LinkedIn, Facebook, Instagram all fixed
- Resolves critical API contract violation in batch operations
1 parent d9345c9 commit 3429701
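In practice, the contract change looks like the sketch below: a single URL still yields one `ScrapeResult`, while a list of URLs now yields one result per URL. This is a minimal usage sketch; the scraper class name, import path, and constructor are assumptions for illustration — only `scrape_async()` and the `ScrapeResult` fields appear in the diff itself.

```python
import asyncio

# Assumed import path and class name, mirroring the file layout in this
# commit; the real public entry point may differ.
from brightdata.scrapers.amazon.scraper import AmazonScraper


async def main() -> None:
    scraper = AmazonScraper(api_token="...")  # assumed constructor

    # Single URL: behavior unchanged - one ScrapeResult comes back,
    # with the single data item unwrapped from the list.
    single = await scraper.scrape_async("https://www.amazon.com/dp/EXAMPLE")
    print(single.url, single.success)

    # Multiple URLs: after this fix, a List[ScrapeResult] comes back,
    # one entry per input URL, instead of one merged result.
    batch = await scraper.scrape_async([
        "https://www.amazon.com/dp/EXAMPLE1",
        "https://www.amazon.com/dp/EXAMPLE2",
    ])
    for r in batch:
        print(r.url, r.success, r.cost)


asyncio.run(main())
```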

File tree

11 files changed (+232 −29 lines)

README.md

Lines changed: 53 additions & 14 deletions
@@ -1,15 +1,54 @@
 # Bright Data Python SDK 🐍
 
-[![Tests](https://img.shields.io/badge/tests-502%2B%20passing-brightgreen)](https://github.com/vzucher/brightdata-sdk-python)
+[![Tests](https://img.shields.io/badge/tests-502%2B%20passing-brightgreen)](https://github.com/brightdata/sdk-python)
 [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
 [![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
-[![Code Quality](https://img.shields.io/badge/quality-enterprise--grade-gold)](https://github.com/vzucher/brightdata-sdk-python)
+[![Code Quality](https://img.shields.io/badge/quality-enterprise--grade-gold)](https://github.com/brightdata/sdk-python)
 [![Notebooks](https://img.shields.io/badge/jupyter-5%20notebooks-orange)](notebooks/)
 
 Modern async-first Python SDK for [Bright Data](https://brightdata.com) APIs with **dataclass payloads**, **Jupyter notebooks**, comprehensive platform support, and **CLI tool** - built for data scientists and developers.
 
 ---
 
+## 📑 Table of Contents
+
+- [✨ Features](#-features)
+- [📓 Jupyter Notebooks](#-jupyter-notebooks-new)
+- [📦 Installation](#-installation)
+- [🚀 Quick Start](#-quick-start)
+  - [Authentication](#authentication)
+  - [Simple Web Scraping](#simple-web-scraping)
+  - [Using Dataclass Payloads](#using-dataclass-payloads-type-safe-)
+  - [Pandas Integration](#pandas-integration-for-data-scientists-)
+  - [Platform-Specific Scraping](#platform-specific-scraping)
+  - [Search Engine Results (SERP)](#search-engine-results-serp)
+  - [Async Usage](#async-usage)
+- [🆕 What's New in v2.0.0](#-whats-new-in-v200)
+- [🏗️ Architecture](#️-architecture)
+- [📚 API Reference](#-api-reference)
+  - [Client Initialization](#client-initialization)
+  - [Connection Testing](#connection-testing)
+  - [Zone Management](#zone-management)
+  - [Result Objects](#result-objects)
+- [🖥️ CLI Usage](#️-cli-usage)
+- [🐼 Pandas Integration](#-pandas-integration)
+- [🎨 Dataclass Payloads](#-dataclass-payloads)
+- [🔧 Advanced Usage](#-advanced-usage)
+- [🧪 Testing](#-testing)
+- [🏛️ Design Philosophy](#️-design-philosophy)
+- [📖 Documentation](#-documentation)
+- [🔧 Troubleshooting](#-troubleshooting)
+- [🤝 Contributing](#-contributing)
+- [📊 Project Stats](#-project-stats)
+- [📝 License](#-license)
+- [🔗 Links](#-links)
+- [💡 Examples](#-examples)
+- [🎯 Roadmap](#-roadmap)
+- [🙏 Acknowledgments](#-acknowledgments)
+- [🌟 Why Choose This SDK?](#-why-choose-this-sdk)
+
+---
+
 ## ✨ Features
 
 ### 🎯 **For Data Scientists**
@@ -44,11 +83,11 @@ Modern async-first Python SDK for [Bright Data](https://brightdata.com) APIs wit
 
 Perfect for data scientists! Interactive tutorials with examples:
 
-1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/01_quickstart.ipynb)
-2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)
-3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/03_amazon_scraping.ipynb)
-4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/04_linkedin_jobs.ipynb)
-5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vzucher/brightdata-sdk-python/blob/master/notebooks/05_batch_processing.ipynb)
+1. **[01_quickstart.ipynb](notebooks/01_quickstart.ipynb)** - Get started in 5 minutes [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/01_quickstart.ipynb)
+2. **[02_pandas_integration.ipynb](notebooks/02_pandas_integration.ipynb)** - Work with DataFrames [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/02_pandas_integration.ipynb)
+3. **[03_amazon_scraping.ipynb](notebooks/03_amazon_scraping.ipynb)** - Amazon deep dive [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/03_amazon_scraping.ipynb)
+4. **[04_linkedin_jobs.ipynb](notebooks/04_linkedin_jobs.ipynb)** - Job market analysis [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/04_linkedin_jobs.ipynb)
+5. **[05_batch_processing.ipynb](notebooks/05_batch_processing.ipynb)** - Scale to 1000s of URLs [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/brightdata/sdk-python/blob/master/notebooks/05_batch_processing.ipynb)
 
 ---
 
@@ -61,8 +100,8 @@ pip install brightdata-sdk
 Or install from source:
 
 ```bash
-git clone https://github.com/vzucher/brightdata-sdk-python.git
-cd brightdata-sdk-python
+git clone https://github.com/brightdata/sdk-python.git
+cd sdk-python
 pip install -e .
 ```
 
@@ -418,7 +457,7 @@ asyncio.run(scrape_multiple())
 
 ---
 
-## 🆕 What's New in v01.12.24
+## 🆕 What's New in v2.0.0
 
 ### 🆕 **Latest Updates (December 2025)**
 - **Amazon Search API** - NEW parameter-based product discovery
@@ -1106,8 +1145,8 @@ Contributions are welcome! Please see [CONTRIBUTING.md](docs/contributing.md) fo
 ### Development Setup
 
 ```bash
-git clone https://github.com/vzucher/brightdata-sdk-python.git
-cd brightdata-sdk-python
+git clone https://github.com/brightdata/sdk-python.git
+cd sdk-python
 
 # Install with dev dependencies
 pip install -e ".[dev]"
@@ -1147,8 +1186,8 @@ MIT License - see [LICENSE](LICENSE) file for details.
 
 - [Bright Data](https://brightdata.com) - Get your API token
 - [API Documentation](https://docs.brightdata.com)
-- [GitHub Repository](https://github.com/vzucher/brightdata-sdk-python)
-- [Issue Tracker](https://github.com/vzucher/brightdata-sdk-python/issues)
+- [GitHub Repository](https://github.com/brightdata/sdk-python)
+- [Issue Tracker](https://github.com/brightdata/sdk-python/issues)
 
 ---
 
docs/architecture.md

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/contributing.md

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/index.md

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/quickstart.md

Lines changed: 0 additions & 2 deletions
This file was deleted.

src/brightdata/scrapers/amazon/scraper.py

Lines changed: 45 additions & 2 deletions
@@ -254,7 +254,29 @@ async def reviews_async(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            url_list = url if isinstance(url, list) else [url]
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform=result.platform,
+                        method=result.method,
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result
 
     def reviews(
@@ -499,5 +521,26 @@ async def _scrape_urls(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform=result.platform,
+                        method=result.method,
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result

src/brightdata/scrapers/base.py

Lines changed: 23 additions & 0 deletions
@@ -154,9 +154,32 @@ async def scrape_async(
         )
 
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
+            # Single URL case - unwrap single item from list
             result.url = urls
             result.data = result.data[0]
             return result
+        elif not is_single and isinstance(result.data, list):
+            # Multiple URLs case - transform to List[ScrapeResult]
+            results = []
+            for i, (url, data_item) in enumerate(zip(url_list, result.data)):
+                individual_result = ScrapeResult(
+                    success=True,
+                    data=data_item,
+                    url=url,
+                    error=None,
+                    platform=result.platform,
+                    method=result.method,
+                    # Copy timing information from parent
+                    trigger_sent_at=result.trigger_sent_at,
+                    snapshot_id_received_at=result.snapshot_id_received_at,
+                    snapshot_polled_at=result.snapshot_polled_at,
+                    data_fetched_at=result.data_fetched_at,
+                    snapshot_id=result.snapshot_id,
+                    # Divide cost equally across results
+                    cost=result.cost / len(result.data) if result.cost else None,
+                )
+                results.append(individual_result)
+            return results
 
         return result

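The `base.py` branch above is the canonical shape of the fix that each platform scraper repeats: the single batch result is fanned out into one `ScrapeResult` per URL, with the parent's cost divided evenly across them. Below is a minimal standalone sketch of that fan-out, assuming a `ScrapeResult` reduced to the fields exercised here (the real model also carries the timing and snapshot fields copied in the diff):

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ScrapeResult:
    # Reduced stand-in for brightdata.models.ScrapeResult - only the
    # fields needed to demonstrate the fan-out are included.
    success: bool
    data: Any
    url: Optional[str] = None
    cost: Optional[float] = None


def fan_out(parent: ScrapeResult, urls: List[str]) -> List[ScrapeResult]:
    """Split one batch result into per-URL results, dividing cost evenly."""
    n = len(parent.data)
    return [
        ScrapeResult(
            success=True,
            data=item,
            url=u,
            # Same arithmetic as the diff: an even split across the
            # batch, or None if the parent reported no cost.
            cost=parent.cost / n if parent.cost else None,
        )
        for u, item in zip(urls, parent.data)
    ]


batch = ScrapeResult(success=True, data=[{"asin": "A1"}, {"asin": "A2"}], cost=0.10)
for r in fan_out(batch, ["https://example.com/1", "https://example.com/2"]):
    print(r.url, r.data, r.cost)  # each per-URL result carries cost 0.05
```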
src/brightdata/scrapers/facebook/scraper.py

Lines changed: 45 additions & 2 deletions
@@ -669,7 +669,28 @@ async def _scrape_urls(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform=result.platform,
+                        method=result.method,
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result
 
     async def _scrape_with_params(
@@ -737,5 +758,27 @@ async def _scrape_with_params(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            url_list = url if isinstance(url, list) else [url]
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform=result.platform,
+                        method=result.method,
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result

src/brightdata/scrapers/instagram/scraper.py

Lines changed: 22 additions & 1 deletion
@@ -442,5 +442,26 @@ async def _scrape_urls(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform=result.platform,
+                        method=result.method,
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result

src/brightdata/scrapers/instagram/search.py

Lines changed: 22 additions & 1 deletion
@@ -283,5 +283,26 @@ async def _discover_with_params(
         if is_single and isinstance(result.data, list) and len(result.data) == 1:
             result.url = url if isinstance(url, str) else url[0]
             result.data = result.data[0]
-
+            return result
+        elif not is_single and isinstance(result.data, list):
+            from ...models import ScrapeResult
+
+            results = []
+            url_list = url if isinstance(url, list) else [url]
+            for url_item, data_item in zip(url_list, result.data):
+                results.append(
+                    ScrapeResult(
+                        success=True,
+                        data=data_item,
+                        url=url_item,
+                        platform="instagram",
+                        trigger_sent_at=result.trigger_sent_at,
+                        snapshot_id_received_at=result.snapshot_id_received_at,
+                        snapshot_polled_at=result.snapshot_polled_at,
+                        data_fetched_at=result.data_fetched_at,
+                        snapshot_id=result.snapshot_id,
+                        cost=result.cost / len(result.data) if result.cost else None,
+                    )
+                )
+            return results
         return result
