v1.1.0: Web Crawling, Content Parsing & Browser Automation
New Features
🕷️ Web Crawling
- crawl() function for discovering and scraping multiple pages from websites (see the sketch below)
- Advanced filtering with regex patterns for URL inclusion/exclusion
- Configurable crawl depth and sitemap handling
- Custom output schema support
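
A minimal sketch of how the new crawl() call might look, assuming a bdclient entry point; the keyword names shown (filter, exclude_filter, depth, ignore_sitemap, custom_output_fields) are illustrative assumptions rather than the documented signature, so treat examples/crawl_example.py as the authoritative reference.

```python
# Hedged sketch: client class and keyword names are assumptions, not the
# documented API; see examples/crawl_example.py for the real signature.
from brightdata import bdclient  # assumed client entry point

client = bdclient(api_token="your-api-token")

result = client.crawl(
    "https://example.com",
    filter=r"/blog/.*",                # regex: only follow matching URLs
    exclude_filter=r".*\.(pdf|zip)$",  # regex: skip binary downloads
    depth=2,                           # link hops from the start page
    ignore_sitemap=False,              # use the site's sitemap when available
    custom_output_fields=["url", "markdown"],  # custom output schema
)
print(result)
```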
🔍 Content Parsing
- parse_content() function for extracting useful data from API responses (see the sketch below)
- Support for text extraction, link discovery, and image URL collection
- Handles both JSON responses and raw HTML content
- Structured data extraction from various content formats
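
A sketch of feeding a scrape response through parse_content(); the extract_* flags and the keys of the returned structure are assumptions based on the bullets above, not confirmed API, and examples/parse_content_example.py has the real workflow.

```python
# Hedged sketch: flag names and result keys are assumptions; consult
# examples/parse_content_example.py for the actual options.
from brightdata import bdclient  # assumed client entry point

client = bdclient(api_token="your-api-token")

# Works on JSON API responses as well as raw HTML content.
response = client.scrape("https://example.com")
parsed = client.parse_content(
    response,
    extract_text=True,    # plain-text extraction
    extract_links=True,   # link discovery
    extract_images=True,  # image URL collection
)

print(parsed["text"][:200])                  # first 200 characters of text
print(len(parsed["links"]), "links found")
print(len(parsed["images"]), "image URLs found")
```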
🌐 Browser Automation
- connect_browser() function for Playwright/Selenium integration (see the sketch below)
- WebSocket endpoint generation for Scraping Browser connections
- Support for multiple browser automation tools (Playwright, Puppeteer, Selenium)
- Seamless authentication with Bright Data's browser service
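
A sketch of connecting Playwright to the endpoint returned by connect_browser(); it assumes the function returns an authenticated WebSocket URL, which should be verified against examples/browser_connection_example.py.

```python
# Hedged sketch: assumes connect_browser() returns an authenticated
# WebSocket endpoint usable with Playwright's connect_over_cdp().
from brightdata import bdclient  # assumed client entry point
from playwright.sync_api import sync_playwright

client = bdclient(api_token="your-api-token")
ws_endpoint = client.connect_browser()

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(ws_endpoint)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```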
Improvements
📡 Better Async Handling
- Enhanced download_snapshot() with improved handling of HTTP 202 (pending) responses (see the sketch below)
- Friendly status messages instead of exceptions for pending snapshots
- Better user experience for asynchronous data processing
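
In practice this means a pending snapshot no longer raises: while the data is still being prepared (HTTP 202), download_snapshot() reports the status and the call can simply be retried later. The sketch below illustrates that flow; the snapshot id and the way the pending state is surfaced are assumptions.

```python
# Hedged sketch: the shape of the pending-status return value is an
# assumption; check the SDK documentation for the exact behavior.
from brightdata import bdclient  # assumed client entry point

client = bdclient(api_token="your-api-token")
snapshot_id = "s_example123"  # placeholder id from an earlier trigger call

# A snapshot that is still being prepared (HTTP 202) used to raise an
# exception; it now comes back as a friendly status message, so the call
# can simply be retried later.
result = client.download_snapshot(snapshot_id)
print(result)  # either the snapshot data or a "still processing" message
```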
🔧 Robust Error Handling
- Fixed zone creation error handling with proper exception propagation
- Added retry logic for network failures and temporary errors (pattern sketched below)
- Improved zone management reliability
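
The retry behavior is internal to the SDK, but the general pattern it describes, retrying transient network failures with backoff while letting permanent errors propagate, looks roughly like this generic sketch (not the SDK's actual code):

```python
# Illustration of retry-with-backoff for transient failures; this is a
# generic sketch, not the SDK's internal implementation.
import time
import requests

def request_with_retries(url: str, attempts: int = 3, backoff: float = 1.0):
    """GET a URL, retrying transient network errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # out of retries: propagate the exception to the caller
            time.sleep(backoff * (2 ** attempt))
```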
🐍 Python Support Update
- Updated to support Python 3.8+ (dropped Python 3.7 support)
- Updated CI/CD pipeline for modern Python versions
- Added BeautifulSoup4 as a core dependency
Dependencies
- Added: beautifulsoup4>=4.9.0 for content parsing
- Updated: Python compatibility to >=3.8
Examples
New example files demonstrate the enhanced functionality:
- examples/crawl_example.py - Web crawling usage
- examples/browser_connection_example.py - Browser automation setup
- examples/parse_content_example.py - Content parsing workflows