
Commit ef3d827

עידן וילנסקי authored and committed
v1.1.1: Update documentation with comprehensive feature coverage
- Add crawl(), parse_content(), and connect_browser() examples to README
- Document all client parameters including browser automation and logging
- Update environment variables for browser credentials
- Fix browser connection example import and URL issues
1 parent aefa14a commit ef3d827

File tree

6 files changed: +164 -32 lines changed

.github/workflows/test.yml

Lines changed: 4 additions & 1 deletion

@@ -5,6 +5,8 @@ on:
     branches: [ main, develop ]
   pull_request:
     branches: [ main ]
+  schedule:
+    - cron: '0 2 * * *'
 
 jobs:
   test:
@@ -43,9 +45,10 @@ jobs:
 
   test-pypi-package:
     runs-on: ubuntu-latest
+    if: github.event_name == 'schedule'
     strategy:
       matrix:
-        python-version: ['3.8', '3.11'] # Test on fewer versions for PyPI to save CI time
+        python-version: ['3.8', '3.11']
 
     steps:
       - uses: actions/checkout@v4
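Note: with the new `schedule` trigger and the `if: github.event_name == 'schedule'` guard, the `test-pypi-package` job now runs only on the nightly cron (02:00 UTC daily) rather than on every push and pull request.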

README.md

Lines changed: 137 additions & 22 deletions

@@ -7,19 +7,6 @@ pip install brightdata-sdk
 <h3 align="center">Python SDK by Bright Data, Easy-to-use scalable methods for web search & scraping</h3>
 <p></p>
 
-## Features
-
-| Feature | Functions | Description
-|--------------------------|-----------------------------|-------------------------------------
-| **Scrape every website** | `scrape` | Scrape every website using Bright's scraping and unti bot-detection capabilities
-| **Web search** | `search` | Search google and other search engines by query (supports batch searches)
-| **Search chatGPT** | `search_chatGPT` | Prompt chatGPT and scrape its answers, support multiple inputs and follow-up prompts
-| **Search linkedin** | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries, and recieve structured data
-| **Scrape linkedin** | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and recieve structured data
-| **Download functions** | `download_snapshot`, `download_content` | Download content for both sync and async requests
-| **Client class** | `bdclient` | Handles authentication, automatic zone creation and managment, and options for robust error handling
-| **Parallel processing** | **all functions** | All functions use Concurrent processing for multiple URLs or queries, and support multiple Output Formats
-
 ## Installation
 To install the package, open your terminal:
 
@@ -32,15 +19,39 @@ pip install brightdata-sdk
 
 Create a [Bright Data](https://brightdata.com/) account and copy your API key
 
-### 1. Initialize the Client
+### Initialize the Client
 
 ```python
 from brightdata import bdclient
 
 client = bdclient(api_token="your_api_token_here") # can also be defined as BRIGHTDATA_API_TOKEN in your .env file
 ```
 
-### 2. Try usig one of the functions
+### Launch your first request
+Add a SERP function call to your code:
+```python
+results = client.search("best selling shoes")
+
+print(client.parse_content(results))
+```
+
+## Features
+
+| Feature | Functions | Description
+|--------------------------|-----------------------------|-------------------------------------
+| **Scrape every website** | `scrape` | Scrape any website using Bright Data's scraping and anti-bot-detection capabilities
+| **Web search** | `search` | Search Google and other search engines by query (supports batch searches)
+| **Web crawling** | `crawl` | Discover and scrape multiple pages from websites with advanced filtering and depth control
+| **Content parsing** | `parse_content` | Extract text, links, images and structured data from API responses (JSON or HTML)
+| **Browser automation** | `connect_browser` | Get a WebSocket endpoint for Playwright/Selenium integration with Bright Data's scraping browser
+| **Search chatGPT** | `search_chatGPT` | Prompt ChatGPT and scrape its answers; supports multiple inputs and follow-up prompts
+| **Search linkedin** | `search_linkedin.posts()`, `search_linkedin.jobs()`, `search_linkedin.profiles()` | Search LinkedIn by specific queries, and receive structured data
+| **Scrape linkedin** | `scrape_linkedin.posts()`, `scrape_linkedin.jobs()`, `scrape_linkedin.profiles()`, `scrape_linkedin.companies()` | Scrape LinkedIn and receive structured data
+| **Download functions** | `download_snapshot`, `download_content` | Download content for both sync and async requests
+| **Client class** | `bdclient` | Handles authentication, automatic zone creation and management, and options for robust error handling
+| **Parallel processing** | **all functions** | All functions use concurrent processing for multiple URLs or queries, and support multiple output formats
+
+### Try using one of the functions
 
 #### `Search()`
 ```python
@@ -108,6 +119,56 @@ results = client.scrape_linkedin.posts(post_urls) # can also be changed to async
 print(results) # will print the snapshot_id, which can be downloaded using the download_snapshot() function
 ```
 
+#### `crawl()`
+```python
+# Single URL crawl with filters
+result = client.crawl(
+    url="https://example.com/",
+    depth=2,
+    filter="/product/",          # Only crawl URLs containing "/product/"
+    exclude_filter="/ads/",      # Exclude URLs containing "/ads/"
+    custom_output_fields=["markdown", "url", "page_title"]
+)
+print(f"Crawl initiated. Snapshot ID: {result['snapshot_id']}")
+
+# Download crawl results
+data = client.download_snapshot(result['snapshot_id'])
+```
+
+#### `parse_content()`
+```python
+# Parse scraping results
+scraped_data = client.scrape("https://example.com")
+parsed = client.parse_content(
+    scraped_data,
+    extract_text=True,
+    extract_links=True,
+    extract_images=True
+)
+print(f"Title: {parsed['title']}")
+print(f"Text length: {len(parsed['text'])}")
+print(f"Found {len(parsed['links'])} links")
+```
+
+#### `connect_browser()`
+```python
+# For Playwright (default browser_type)
+from playwright.sync_api import sync_playwright
+
+client = bdclient(
+    api_token="your_api_token",
+    browser_username="username-zone-browser_zone1",
+    browser_password="your_password"
+)
+
+with sync_playwright() as playwright:
+    browser = playwright.chromium.connect_over_cdp(client.connect_browser())
+    page = browser.new_page()
+    page.goto("https://example.com")
+    print(f"Title: {page.title()}")
+    browser.close()
+```
+
 **`download_content`** (for sync requests)
 ```python
 data = client.scrape("https://example.com")
@@ -154,6 +215,50 @@ Scrapes a single URL or list of URLs using the Web Unlocker.
 - `timeout`: Request timeout in seconds (default: 30)
 ```
 
+</details>
+<details>
+<summary>🕷️ <strong>crawl(...)</strong></summary>
+
+Discover and scrape multiple pages from websites with advanced filtering.
+
+```python
+- `url`: Single URL string or list of URLs to crawl (required)
+- `ignore_sitemap`: Ignore sitemap when crawling (optional)
+- `depth`: Maximum crawl depth relative to entered URL (optional)
+- `filter`: Regex to include only certain URLs (e.g. "/product/")
+- `exclude_filter`: Regex to exclude certain URLs (e.g. "/ads/")
+- `custom_output_fields`: List of output fields to include (optional)
+- `include_errors`: Include errors in response (default: True)
+```
+
+</details>
+<details>
+<summary>🔍 <strong>parse_content(...)</strong></summary>
+
+Extract and parse useful information from API responses.
+
+```python
+- `data`: Response data from scrape(), search(), or crawl() methods
+- `extract_text`: Extract clean text content (default: True)
+- `extract_links`: Extract all links from content (default: False)
+- `extract_images`: Extract image URLs from content (default: False)
+```
+
+</details>
+<details>
+<summary>🌐 <strong>connect_browser(...)</strong></summary>
+
+Get WebSocket endpoint for browser automation with Bright Data's scraping browser.
+
+```python
+# Required client parameters:
+- `browser_username`: Username for browser API (format: "username-zone-{zone_name}")
+- `browser_password`: Password for browser API authentication
+- `browser_type`: "playwright", "puppeteer", or "selenium" (default: "playwright")
+
+# Returns: WebSocket endpoint URL string
+```
+
 </details>
 <details>
 <summary>💾 <strong>Download_Content(...)</strong></summary>
@@ -191,8 +296,11 @@ Create a `.env` file in your project root:
 
 ```env
 BRIGHTDATA_API_TOKEN=your_bright_data_api_token
-WEB_UNLOCKER_ZONE=your_web_unlocker_zone # Optional
-SERP_ZONE=your_serp_zone # Optional
+WEB_UNLOCKER_ZONE=your_web_unlocker_zone           # Optional
+SERP_ZONE=your_serp_zone                           # Optional
+BROWSER_ZONE=your_browser_zone                     # Optional
+BRIGHTDATA_BROWSER_USERNAME=username-zone-name     # For browser automation
+BRIGHTDATA_BROWSER_PASSWORD=your_browser_password  # For browser automation
 ```
 
 </details>
@@ -223,14 +331,21 @@ client = bdclient(
 <details>
 <summary>👥 <strong>Client Management</strong></summary>
 
-bdclient Class
+bdclient Class - Complete parameter list
 
 ```python
 bdclient(
-    api_token: str = None,
-    auto_create_zones: bool = True,
-    web_unlocker_zone: str = None,
-    serp_zone: str = None,
+    api_token: str = None,            # Your Bright Data API token (required)
+    auto_create_zones: bool = True,   # Auto-create zones if they don't exist
+    web_unlocker_zone: str = None,    # Custom web unlocker zone name
+    serp_zone: str = None,            # Custom SERP zone name
+    browser_zone: str = None,         # Custom browser zone name
+    browser_username: str = None,     # Browser API username (format: "username-zone-{zone_name}")
+    browser_password: str = None,     # Browser API password
+    browser_type: str = "playwright", # Browser automation tool: "playwright", "puppeteer", "selenium"
+    log_level: str = "INFO",          # Logging level: "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"
+    structured_logging: bool = True,  # Use structured JSON logging
+    verbose: bool = None              # Enable verbose logging (overrides log_level if True)
 )
 ```
 
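Two supplementary sketches for the features documented above (not part of this commit). First, a Selenium counterpart to the Playwright example, assuming `browser_type="selenium"` makes `connect_browser()` return an endpoint that Selenium's `Remote` driver accepts, as the parameter table implies; the credentials are placeholders in the documented format:

```python
from selenium import webdriver

from brightdata import bdclient

client = bdclient(
    api_token="your_api_token",
    browser_username="username-zone-browser_zone1",  # placeholder, per the documented format
    browser_password="your_password",
    browser_type="selenium",
)

# connect_browser() returns the endpoint URL for Bright Data's scraping browser
driver = webdriver.Remote(
    command_executor=client.connect_browser(),
    options=webdriver.ChromeOptions(),
)
try:
    driver.get("https://example.com")
    print(f"Title: {driver.title}")
finally:
    driver.quit()
```

Second, a minimal sketch combining the documented `.env` variables with the new logging parameters; using `python-dotenv` is an assumption here, since the quick start only notes that `BRIGHTDATA_API_TOKEN` can live in a `.env` file:

```python
from dotenv import load_dotenv

from brightdata import bdclient

load_dotenv()  # loads BRIGHTDATA_API_TOKEN, zone names, and browser credentials from .env

client = bdclient(
    log_level="DEBUG",         # most verbose level from the documented list
    structured_logging=False,  # switch from the default JSON logs to plain text
)
```
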
brightdata/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -64,7 +64,7 @@
 )
 from .utils import parse_content, parse_multiple, extract_structured_data
 
-__version__ = "1.1.0"
+__version__ = "1.1.1"
 __author__ = "Bright Data"
 __email__ = "[email protected]"
 
brightdata/client.py

Lines changed: 1 addition & 1 deletion

@@ -594,7 +594,7 @@ def connect_browser(self) -> str:
     api_token="your_token",
     browser_username="username-zone-browser_zone1",
     browser_password="your_password",
-    browser_type="playwright" # or omit for default
+    browser_type="playwright" # Playwright / Puppeteer (default)
 )
 endpoint_url = client.connect_browser() # Returns: wss://[email protected]:9222
 
examples/browser_connection_example.py

Lines changed: 20 additions & 6 deletions

@@ -2,7 +2,7 @@
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
 from brightdata import bdclient
-from playwright.sync_api import sync_playwright
+from playwright.sync_api import sync_playwright, Playwright
 
 client = bdclient(
     api_token="your-api-key",
@@ -11,9 +11,23 @@
     browser_zone="your-custom-browser-zone"
 ) # Hover over the function to see browser parameters (can also be taken from .env file)
 
-with sync_playwright() as playwright:
+def scrape(playwright: Playwright, url="https://example.com"):
     browser = playwright.chromium.connect_over_cdp(client.connect_browser()) # Connect to the browser using Bright Data's endpoint
-    page = browser.new_page()
-    page.goto("https://example.com")
-    print(f"Title: {page.title()}")
-    browser.close()
+    try:
+        print(f'Connected! Navigating to {url}...')
+        page = browser.new_page()
+        page.goto(url, timeout=2*60_000)
+        print('Navigated! Scraping page content...')
+        data = page.content()
+        print(f'Scraped! Data: {data}')
+    finally:
+        browser.close()
+
+
+def main():
+    with sync_playwright() as playwright:
+        scrape(playwright)
+
+
+if __name__ == '__main__':
+    main()
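
A note on the refactored example: Playwright timeouts are given in milliseconds, so `timeout=2*60_000` allows up to two minutes for navigation, and the `try/finally` guarantees the remote browser session is released even if navigation fails. For completeness, a minimal async variant of the same flow (not part of this commit; assumes the browser credentials come from `.env`, as the example's comment suggests):

```python
import asyncio

from playwright.async_api import async_playwright

from brightdata import bdclient

client = bdclient(api_token="your-api-key")  # browser credentials picked up from .env (assumption)


async def main(url: str = "https://example.com"):
    async with async_playwright() as playwright:
        # Same Bright Data CDP endpoint as in the sync example
        browser = await playwright.chromium.connect_over_cdp(client.connect_browser())
        try:
            page = await browser.new_page()
            await page.goto(url, timeout=2 * 60_000)  # milliseconds
            print(f"Title: {await page.title()}")
        finally:
            await browser.close()


asyncio.run(main())
```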

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "brightdata-sdk"
-version = "1.1.0"
+version = "1.1.1"
 description = "Python SDK for Bright Data Web Scraping and SERP APIs"
 authors = [
     {name = "Bright Data", email = "[email protected]"}
