Merged
61 changes: 45 additions & 16 deletions CHANGELOG.md
@@ -5,20 +5,42 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0] - 2024-12-29
## [0.7.0] - 2025-04-30

### Added
- Initial release of the WaterCrawl Python client
- Basic API client functionality with request handling
- Support for synchronous and asynchronous crawling
- Comprehensive crawling options and configurations
- Built-in request monitoring and result downloading
- Session management and request handling
- MIT License
- Basic documentation
- New search API functionality with methods:
- `create_search_request`: Create search requests with customizable options
- `monitor_search_request`: Monitor the progress of search operations
- `get_search_request`: Retrieve details of a search request
- `stop_search_request`: Cancel an ongoing search operation
- Comprehensive test suite for all search-related API methods
- Enhanced error handling and retry mechanisms for more reliable API interactions
- Better diagnostic capabilities for API errors
- Updated documentation with complete examples of all API methods

### Changed
- None (initial release)
- Improved test suite with better error handling, diagnostic information, and retry logic
- Enhanced documentation with comprehensive examples in README

### Deprecated
- None

### Removed
- None

### Fixed
- Fixed sitemap-related tests to handle actual API responses correctly

### Security
- None

## [0.6.1] - 2025-04-20

### Added
- Added support for specifying `page_number` and `page_size` parameters to results endpoints for improved pagination and control over result sets.

### Changed
- None

### Deprecated
- None
@@ -30,8 +30,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- None

### Security
- Basic API key authentication
- Secure request handling with HTTPS
- None

## [0.6.0] - 2025-04-20

@@ -55,13 +76,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### Security
- None

## [0.6.1] - 2025-04-20
## [0.1.0] - 2024-12-29

### Added
- Added support for specifying `page_number` and `page_size` parameters to results endpoints for improved pagination and control over result sets.
- Initial release of the WaterCrawl Python client
- Basic API client functionality with request handling
- Support for synchronous and asynchronous crawling
- Comprehensive crawling options and configurations
- Built-in request monitoring and result downloading
- Session management and request handling
- MIT License
- Basic documentation

### Changed
- None
- None (initial release)

### Deprecated
- None
@@ -73,4 +101,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- None

### Security
- None
- Basic API key authentication
- Secure request handling with HTTPS
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 WaterCrawl-Plugin
Copyright (c) 2025 WaterCrawl-Python

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
217 changes: 215 additions & 2 deletions README.md
@@ -10,7 +10,6 @@ A Python client library for interacting with the WaterCrawl API - a powerful web
pip install watercrawl-py
```


## Quick Start

```python
@@ -33,7 +32,220 @@ crawl_request = client.create_crawl_request(
# Monitor and download results
for result in client.monitor_crawl_request(crawl_request['uuid']):
if result['type'] == 'result':
print(result['data']) // it is a result object per page
print(result['data']) # it is a result object per page
```

## API Examples

### Client Initialization

```python
from watercrawl import WaterCrawlAPIClient

# Initialize with default base URL
client = WaterCrawlAPIClient('your-api-key')

# Or specify a custom base URL
client = WaterCrawlAPIClient('your-api-key', base_url='https://custom-app.watercrawl.dev/')
```

### Crawling Operations

#### List all crawl requests

```python
# Get the first page of requests (default page size: 10)
requests = client.get_crawl_requests_list()

# Specify page number and size
requests = client.get_crawl_requests_list(page=2, page_size=20)
```

#### Get a specific crawl request

```python
request = client.get_crawl_request('request-uuid')
```

#### Create a crawl request

```python
# Simple request with just a URL
request = client.create_crawl_request(url='https://example.com')

# Advanced request with a single URL
request = client.create_crawl_request(
url='https://example.com',
spider_options={
"max_depth": 1, # maximum depth to crawl
"page_limit": 1, # maximum number of pages to crawl
"allowed_domains": [], # allowed domains to crawl
"exclude_paths": [], # exclude paths
"include_paths": [] # include paths
},
page_options={
"exclude_tags": [], # exclude tags from the page
"include_tags": [], # include tags from the page
"wait_time": 1000, # wait time in milliseconds after page load
"include_html": False, # the result will include HTML
        "only_main_content": True, # keep only the main content, automatically removing headers, footers, etc.
"include_links": False, # if True the result will include links
"timeout": 15000, # timeout in milliseconds
"accept_cookies_selector": None, # accept cookies selector e.g. "#accept-cookies"
"locale": "en-US", # locale
"extra_headers": {}, # extra headers e.g. {"Authorization": "Bearer your_token"}
"actions": [] # actions to perform {"type": "screenshot"} or {"type": "pdf"}
},
plugin_options={}
)
```

#### Stop a crawl request

```python
client.stop_crawl_request('request-uuid')
```

#### Download a crawl request result

```python
# Download the crawl request as a ZIP file
zip_data = client.download_crawl_request('request-uuid')

# Save to a file
with open('crawl_results.zip', 'wb') as f:
f.write(zip_data)
```
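Because the download comes back as raw bytes, the archive can also be inspected in memory with the standard-library `zipfile` module, without touching disk. The archive layout below (a single markdown file per page) is a hypothetical stand-in for real crawl output; only the byte-handling pattern is the point:

```python
import io
import zipfile

# Build a small in-memory ZIP standing in for the bytes that
# client.download_crawl_request('request-uuid') would return.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('example.com/index.md', '# Example page')
zip_data = buf.getvalue()

# Inspect the archive directly from the bytes
with zipfile.ZipFile(io.BytesIO(zip_data)) as archive:
    names = archive.namelist()
    content = archive.read('example.com/index.md').decode('utf-8')

print(names)
print(content)
```

The same `io.BytesIO` wrapping works on the real `zip_data` returned by the client.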

#### Monitor a crawl request

```python
# Monitor with automatic result download (default)
for event in client.monitor_crawl_request('request-uuid'):
if event['type'] == 'state':
print(f"Crawl state: {event['data']['status']}")
elif event['type'] == 'result':
print(f"Received result for: {event['data']['url']}")

# Monitoring without downloading returns each result as a URL instead of a result object
for event in client.monitor_crawl_request('request-uuid', download=False):
print(f"Event type: {event['type']}")
```

#### Get crawl request results

```python
# Get the first page of results
results = client.get_crawl_request_results('request-uuid')

# Specify page number and size
results = client.get_crawl_request_results('request-uuid', page=2, page_size=20)
```
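One way to walk every page of results is to keep requesting the next page until an empty page comes back. The response shape assumed here (a plain list per page) is hypothetical, and `StubClient` stands in for `WaterCrawlAPIClient` so the sketch is runnable; the real client may wrap results differently:

```python
# Minimal pagination sketch, assuming get_crawl_request_results
# returns a list per page and an empty list past the last page.
class StubClient:
    def __init__(self, pages):
        self._pages = pages

    def get_crawl_request_results(self, uuid, page=1, page_size=10):
        return self._pages[page - 1] if page <= len(self._pages) else []

client = StubClient([['r1', 'r2'], ['r3']])

all_results, page = [], 1
while True:
    batch = client.get_crawl_request_results('request-uuid', page=page, page_size=2)
    if not batch:  # an empty page signals the end
        break
    all_results.extend(batch)
    page += 1

print(all_results)  # ['r1', 'r2', 'r3']
```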

#### Quick URL scraping

```python
# Synchronous scraping (default)
result = client.scrape_url('https://example.com')

# With page options
result = client.scrape_url(
'https://example.com',
page_options={}
)

# Asynchronous scraping
request = client.scrape_url('https://example.com', sync=False)
# Later check for results with get_crawl_request
```
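After an asynchronous scrape, the request can be polled with `get_crawl_request` until it leaves a running state. The `'status'` values and polling loop below are illustrative assumptions, and `StubClient` stands in for `WaterCrawlAPIClient` so the sketch is runnable:

```python
import time

# Polling sketch, assuming the request dict carries 'uuid' and 'status'
# fields as in the monitoring examples above.
class StubClient:
    def __init__(self):
        self._calls = 0

    def scrape_url(self, url, sync=False):
        return {'uuid': 'demo-uuid', 'status': 'running'}

    def get_crawl_request(self, uuid):
        self._calls += 1
        status = 'finished' if self._calls >= 2 else 'running'
        return {'uuid': uuid, 'status': status}

client = StubClient()
request = client.scrape_url('https://example.com', sync=False)

while True:
    request = client.get_crawl_request(request['uuid'])
    if request['status'] != 'running':
        break
    time.sleep(0)  # back off between polls (use a real delay in practice)

print(request['status'])
```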

### Sitemap Operations

#### Download a sitemap

```python
# Download using a crawl request object
crawl_request = client.get_crawl_request('request-uuid')
sitemap = client.download_sitemap(crawl_request)

# Or pass the crawl request UUID directly instead of the object
sitemap = client.download_sitemap('request-uuid')

# Process sitemap entries
for entry in sitemap:
print(f"URL: {entry['url']}, Title: {entry['title']}")
```
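Sitemap entries can be post-processed with standard-library tools; for example, grouping page titles by host with `urllib.parse`. The sample entries below are a hypothetical stand-in for the output of `download_sitemap`, assuming each entry carries `url` and `title` keys as shown above:

```python
from urllib.parse import urlparse

# Hypothetical sample standing in for client.download_sitemap(...) output
sitemap = [
    {'url': 'https://example.com/', 'title': 'Home'},
    {'url': 'https://example.com/docs', 'title': 'Docs'},
    {'url': 'https://blog.example.com/post', 'title': 'Post'},
]

# Group titles by hostname
by_host = {}
for entry in sitemap:
    host = urlparse(entry['url']).netloc
    by_host.setdefault(host, []).append(entry['title'])

print(by_host)
```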

#### Download sitemap as graph data

```python
# Pass either a crawl request UUID or a crawl request object
graph_data = client.download_sitemap_graph('request-uuid')
```

#### Download sitemap as markdown

```python
# Pass either a crawl request UUID or a crawl request object
markdown = client.download_sitemap_markdown('request-uuid')
```

### Search Operations

#### Create a search request

```python
# Simple search
search = client.create_search_request(query="python programming")

# Search with options and limited results
search = client.create_search_request(
query="python tutorial",
search_options={
        "language": None, # language code e.g. "en" or "fr" or "es"
        "country": None, # country code e.g. "us" or "fr" or "es"
        "time_range": "any", # time range e.g. "any", "hour", "day", "week", "month" or "year"
"search_type": "web", # search type e.g. "web" now just web is supported
"depth": "basic" # depth e.g. "basic" or "advanced" or "ultimate"
},
result_limit=5, # limit the number of results
sync=True, # wait for results
download=True # download results
)

# Asynchronous search
search = client.create_search_request(
query="machine learning",
search_options={},
result_limit=5, # limit the number of results
sync=False, # Don't wait for results
download=False # Don't download results
)
```

#### Monitor a search request

```python
# Monitor with automatic result download (currently the only event type is 'state')
for event in client.monitor_search_request('search-uuid'):
if event['type'] == 'state':
print(f"Search state: {event['status']}")

# Monitor without downloading results
for event in client.monitor_search_request('search-uuid', download=False):
print(f"Event: {event}")
```

#### Get search request details

```python
search = client.get_search_request('search-uuid', download=True)
```

#### Stop a search request

```python
client.stop_search_request('search-uuid')
```

## Features
@@ -43,6 +255,7 @@ for result in client.monitor_crawl_request(crawl_request['uuid']):
- Comprehensive crawling options and configurations
- Built-in request monitoring and result downloading
- Efficient session management and request handling
- Support for sitemaps and search operations

## Documentation
