
To install the package, open your terminal and run:

```
pip install brightdata-sdk
```

If you are using macOS, create and activate a virtual environment for your project first.

Create a Bright Data account and copy your API key.

```python
from brightdata import bdclient

client = bdclient(api_token="your_api_token_here")  # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
```

Add a SERP search call to your code:

```python
results = client.search("best selling shoes")
print(client.parse_content(results))
```

Feature | Functions | Description
---|---|---
Scrape every website | `scrape` | Scrape any website using Bright Data's scraping and anti-bot-detection capabilities
Web search | `search` | Search Google and other search engines by query (supports batch searches)
Web crawling | `crawl` | Discover and scrape multiple pages from websites with advanced filtering and depth control
AI-powered extraction | `extract` | Extract specific information from websites using natural language queries and OpenAI
Content parsing | `parse_content` | Extract text, links, images, and structured data from API responses (JSON or HTML)
Browser automation | `connect_browser` | Get a WebSocket endpoint for Playwright/Selenium integration with Bright Data's scraping browser
Search ChatGPT | `search_chatGPT` | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts
Search LinkedIn | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries and receive structured data
Scrape LinkedIn | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and receive structured data
Download functions | `download_snapshot`, `download_content` | Download content for both sync and async requests
Client class | `bdclient` | Handles authentication, automatic zone creation and management, and options for robust error handling
Parallel processing | all functions | All functions use concurrent processing for multiple URLs or queries and support multiple output formats
```python
# Simple single query search
result = client.search("pizza restaurants")

# Multiple queries (parallel processing) with custom configuration
queries = ["pizza", "restaurants", "delivery"]
results = client.search(
    queries,
    search_engine="bing",
    country="gb",
    format="raw"
)
```
```python
# Simple single URL scrape
result = client.scrape("https://example.com")

# Multiple URLs (parallel processing) with custom options
urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
results = client.scrape(
    urls,
    format="raw",
    country="gb",
    data_format="screenshot"
)
```
```python
result = client.search_chatGPT(
    prompt="what day is it today?"
    # prompt=["What are the top 3 programming languages in 2024?", "Best hotels in New York", "Explain quantum computing"],
    # additional_prompt=["Can you explain why?", "Are you sure?", ""]
)

client.download_content(result)  # On a timeout error, your snapshot_id is shown so you can download the results later with download_snapshot()
```
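If the download does time out, a minimal fallback sketch; the SDK's exact timeout exception type isn't documented here, so this catches the generic `Exception`, and the snapshot id is a placeholder:

```python
# Hedged sketch: fall back to download_snapshot() if the synchronous download fails
try:
    client.download_content(result)
except Exception as err:  # specific SDK exception classes aren't documented here
    print(f"Download timed out or failed: {err}")
    # Copy the snapshot_id shown with the error, then fetch the results later:
    # data = client.download_snapshot("your_snapshot_id")
```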
Available functions:

```python
client.search_linkedin.posts()
client.search_linkedin.jobs()
client.search_linkedin.profiles()
```

```python
# Search LinkedIn profiles by name
first_names = ["James", "Idan"]
last_names = ["Smith", "Vilenski"]
result = client.search_linkedin.profiles(first_names, last_names)  # can also be changed to async
print(result)  # prints the snapshot_id, which can be downloaded using the download_snapshot() function
```
Available functions:

```python
client.scrape_linkedin.posts()
client.scrape_linkedin.jobs()
client.scrape_linkedin.profiles()
client.scrape_linkedin.companies()
```

```python
post_urls = [
    "https://www.linkedin.com/posts/orlenchner_scrapecon-activity-7180537307521769472-oSYN?trk=public_profile",
    "https://www.linkedin.com/pulse/getting-value-out-sunburst-guillaume-de-b%C3%A9naz%C3%A9?trk=public_profile_article_view"
]
results = client.scrape_linkedin.posts(post_urls)  # can also be changed to async
print(results)  # prints the snapshot_id, which can be downloaded using the download_snapshot() function
```
```python
# Single URL crawl with filters
result = client.crawl(
    url="https://example.com/",
    depth=2,
    filter="/product/",        # Only crawl URLs containing "/product/"
    exclude_filter="/ads/",    # Exclude URLs containing "/ads/"
    custom_output_fields=["markdown", "url", "page_title"]
)
print(f"Crawl initiated. Snapshot ID: {result['snapshot_id']}")

# Download crawl results
data = client.download_snapshot(result['snapshot_id'])
```
```python
# Parse scraping results
scraped_data = client.scrape("https://example.com")
parsed = client.parse_content(
    scraped_data,
    extract_text=True,
    extract_links=True,
    extract_images=True
)
print(f"Title: {parsed['title']}")
print(f"Text length: {len(parsed['text'])}")
print(f"Found {len(parsed['links'])} links")
```
```python
# Basic extraction (URL in query)
result = client.extract("Extract news headlines from CNN.com")
print(result)

# Using URL parameter with structured output
schema = {
    "type": "object",
    "properties": {
        "headlines": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["headlines"]
}

result = client.extract(
    query="Extract main headlines",
    url="https://cnn.com",
    output_scheme=schema
)
print(result)  # Returns structured JSON matching the schema
```
```python
# For Playwright (default browser_type)
from playwright.sync_api import sync_playwright

client = bdclient(
    api_token="your_api_token",
    browser_username="username-zone-browser_zone1",
    browser_password="your_password"
)

with sync_playwright() as playwright:
    browser = playwright.chromium.connect_over_cdp(client.connect_browser())
    page = browser.new_page()
    page.goto("https://example.com")
    print(f"Title: {page.title()}")
    browser.close()
```
download_content (for sync requests)

```python
data = client.scrape("https://example.com")
client.download_content(data)
```

download_snapshot (for async requests)

```python
# Save this snippet in a separate file
client.download_snapshot("")  # Insert your snapshot_id
```
Tip
Hover over search() or any other function in the package to see all of its available parameters.
search(...)
Searches using the SERP API. Accepts the same arguments as scrape(), plus:
- `query`: Search query string or list of queries
- `search_engine`: "google", "bing", or "yandex"
- Other parameters same as scrape()
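For example, a quick sketch combining these options; the query and parameter values are illustrative, not taken from the SDK docs:

```python
# Illustrative sketch: single query against Yandex, geo-targeted to Germany, raw HTML output
results = client.search(
    "wireless headphones",
    search_engine="yandex",
    country="de",
    format="raw"
)
```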
scrape(...)
Scrapes a single URL or list of URLs using the Web Unlocker.
- `url`: Single URL string or list of URLs
- `zone`: Zone identifier (auto-configured if None)
- `format`: "json" or "raw"
- `method`: HTTP method
- `country`: Two-letter country code
- `data_format`: "markdown", "screenshot", etc.
- `async_request`: Enable async processing
- `max_workers`: Max parallel workers (default: 10)
- `timeout`: Request timeout in seconds (default: 30)
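As a sketch of the concurrency-related options above (URLs and values are illustrative):

```python
# Illustrative sketch: parallel scrape with custom worker and timeout limits
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = client.scrape(
    urls,
    format="json",
    data_format="markdown",
    max_workers=5,   # cap parallel workers (default: 10)
    timeout=60       # per-request timeout in seconds (default: 30)
)
```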
crawl(...)
Discover and scrape multiple pages from websites with advanced filtering.
- `url`: Single URL string or list of URLs to crawl (required)
- `ignore_sitemap`: Ignore sitemap when crawling (optional)
- `depth`: Maximum crawl depth relative to entered URL (optional)
- `filter`: Regex to include only certain URLs (e.g. "/product/")
- `exclude_filter`: Regex to exclude certain URLs (e.g. "/ads/")
- `custom_output_fields`: List of output fields to include (optional)
- `include_errors`: Include errors in response (default: True)
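A small sketch of the options not covered in the crawl example above (values are illustrative):

```python
# Illustrative sketch: shallow crawl that skips the sitemap and omits errors from the output
result = client.crawl(
    url="https://example.com/",
    ignore_sitemap=True,
    depth=1,
    include_errors=False
)
data = client.download_snapshot(result["snapshot_id"])
```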
parse_content(...)
Extract and parse useful information from API responses.
- `data`: Response data from scrape(), search(), or crawl() methods
- `extract_text`: Extract clean text content (default: True)
- `extract_links`: Extract all links from content (default: False)
- `extract_images`: Extract image URLs from content (default: False)
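For instance, a sketch that keeps only the links from a search response; the result keys follow the earlier parsing example:

```python
# Illustrative sketch: extract only links from a SERP response
serp = client.search("brightdata sdk documentation")
parsed = client.parse_content(serp, extract_text=False, extract_links=True)
print(parsed["links"])
```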
extract(...)
Extract specific information from websites using AI-powered natural language processing with OpenAI.
- `query`: Natural language query describing what to extract (required)
- `url`: Single URL or list of URLs to extract from (optional - if not provided, extracts URL from query)
- `output_scheme`: JSON Schema for OpenAI Structured Outputs (optional - enables reliable JSON responses)
- `llm_key`: OpenAI API key (optional - uses OPENAI_API_KEY env variable if not provided)
Returns: `ExtractResult` object (string-like, with metadata attributes)
Available attributes: `.url`, `.query`, `.source_title`, `.token_usage`, `.content_length`
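A brief sketch of reading those metadata attributes (the URL and query are illustrative):

```python
# Illustrative sketch: inspect ExtractResult metadata after a call
result = client.extract(
    query="Extract the main headlines",
    url="https://cnn.com"
)
print(result)               # string-like: the extracted content itself
print(result.source_title)  # metadata attributes listed above
print(result.token_usage)
```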
connect_browser(...)
Get WebSocket endpoint for browser automation with Bright Data's scraping browser.
Required client parameters:
- `browser_username`: Username for the Browser API (format: "username-zone-{zone_name}")
- `browser_password`: Password for Browser API authentication
- `browser_type`: "playwright", "puppeteer", or "selenium" (default: "playwright")

Returns: WebSocket endpoint URL string
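As a minimal sketch, assuming the browser credentials were passed to `bdclient` as in the Playwright example above:

```python
# Minimal sketch: fetch the WebSocket endpoint and hand it to your automation tool
endpoint = client.connect_browser()
print(endpoint)  # WebSocket URL, e.g. for playwright.chromium.connect_over_cdp(endpoint)
```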
download_content(...)
Save content to local file.
- `content`: Content to save
- `filename`: Output filename (auto-generated if None)
- `format`: File format ("json", "csv", "txt", etc.)
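For example, a sketch using the optional filename and format parameters (values are illustrative):

```python
# Illustrative sketch: save scraped content as CSV under an explicit filename
data = client.scrape("https://example.com")
client.download_content(data, filename="example_scrape.csv", format="csv")
```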
Configuration Constants
Constant | Default | Description
---|---|---
DEFAULT_MAX_WORKERS | 10 | Max parallel tasks
DEFAULT_TIMEOUT | 30 | Request timeout (in seconds)
CONNECTION_POOL_SIZE | 20 | Max concurrent HTTP connections
MAX_RETRIES | 3 | Retry attempts on failure
RETRY_BACKOFF_FACTOR | 1.5 | Exponential backoff multiplier
Environment Variables

Create a `.env` file in your project root:

```
BRIGHTDATA_API_TOKEN=your_bright_data_api_token
WEB_UNLOCKER_ZONE=your_web_unlocker_zone            # Optional
SERP_ZONE=your_serp_zone                            # Optional
BROWSER_ZONE=your_browser_zone                      # Optional
BRIGHTDATA_BROWSER_USERNAME=username-zone-name      # For browser automation
BRIGHTDATA_BROWSER_PASSWORD=your_browser_password   # For browser automation
OPENAI_API_KEY=your_openai_api_key                  # For extract() function
```
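With the token in the environment, the client can be constructed without arguments, as the quickstart comment suggests; a minimal sketch (if your runtime does not load `.env` files automatically, load the file yourself first, e.g. with `python-dotenv`, or pass `api_token` explicitly):

```python
# Minimal sketch: rely on BRIGHTDATA_API_TOKEN from the environment / .env
from brightdata import bdclient

client = bdclient()  # token (and optional zone settings) picked up from the variables above
```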
Manage Zones

List all active zones:

```python
zones = client.list_zones()
print(f"Found {len(zones)} zones")
```
Configure a custom zone name:

```python
client = bdclient(
    api_token="your_token",
    auto_create_zones=False,   # otherwise zones are created automatically
    web_unlocker_zone="custom_zone",
    serp_zone="custom_serp_zone"
)
```
Client Management

bdclient class - complete parameter list:

```python
bdclient(
    api_token: str = None,             # Your Bright Data API token (required)
    auto_create_zones: bool = True,    # Auto-create zones if they don't exist
    web_unlocker_zone: str = None,     # Custom Web Unlocker zone name
    serp_zone: str = None,             # Custom SERP zone name
    browser_zone: str = None,          # Custom browser zone name
    browser_username: str = None,      # Browser API username (format: "username-zone-{zone_name}")
    browser_password: str = None,      # Browser API password
    browser_type: str = "playwright",  # Browser automation tool: "playwright", "puppeteer", "selenium"
    log_level: str = "INFO",           # Logging level: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
    structured_logging: bool = True,   # Use structured JSON logging
    verbose: bool = None               # Enable verbose logging (overrides log_level if True)
)
```
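For instance, a sketch of a client tuned for debugging (parameter values are illustrative):

```python
# Illustrative sketch: verbose client with plain-text logs and a Selenium-oriented setup
client = bdclient(
    api_token="your_api_token",
    log_level="DEBUG",
    structured_logging=False,
    browser_type="selenium"
)
```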
Error Handling

The SDK includes built-in input validation and retry logic.

For zone-related problems, use the list_zones() function to check your active zones, and verify in your account settings that your API key has admin permissions.

For any issues, contact Bright Data support or open an issue in this repository.
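A minimal defensive pattern, as a sketch; the SDK's specific exception classes aren't listed here, so this catches the generic `Exception`:

```python
# Minimal sketch: built-in validation and retries run inside the call;
# anything that still fails surfaces as an exception you can inspect.
try:
    results = client.scrape("https://example.com")
except Exception as err:
    print(f"Request failed after retries: {err}")
    print(client.list_zones())  # verify that the expected zones exist on your account
```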