Merged
23 changes: 12 additions & 11 deletions README.md
@@ -9,15 +9,15 @@
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official SDKs for the ScrapeGraph AI API: intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.

Get your [API key](https://scrapegraphai.com)!

## πŸš€ Quick Links

- [Python SDK Documentation](scrapegraph-py/README.md)
- [JavaScript SDK Documentation](scrapegraph-js/README.md)
- [API Documentation](https://docs.scrapegraphai.com)
- [Website](https://scrapegraphai.com)

## πŸ“¦ Installation
@@ -34,31 +34,31 @@ npm install scrapegraph-js

## 🎯 Core Features

- πŸ€– **AI-Powered Extraction & Search**: Use natural language to extract data or search the web
- πŸ“Š **Structured Output**: Get clean, structured data with optional schema validation
- πŸ”„ **Multiple Formats**: Extract data as JSON, Markdown, or custom schemas
- ⚑ **High Performance**: Concurrent processing and automatic retries
- πŸ”’ **Enterprise Ready**: Production-grade security and rate limiting
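
The automatic retries noted above follow the familiar backoff pattern; this is an illustrative sketch of that pattern, not the SDK's internal implementation, and the function names here are hypothetical:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying with exponential backoff on transient failures."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

calls = {"count": 0}

def flaky():
    # Fails twice, then succeeds -- simulates a transient network error.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky))  # prints "ok" after two retried failures
```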

## πŸ› οΈ Available Endpoints

### πŸ” SmartScraper
Extract structured data from any webpage using natural language prompts.
### πŸ€– SmartScraper
Using AI to extract structured data from any webpage or HTML content with natural language prompts.

### πŸ” SearchScraper
Perform AI-powered web searches with structured results and reference URLs.

### πŸ“ Markdownify
Convert any webpage into clean, formatted markdown.


## 🌟 Key Benefits

- πŸ“ **Natural Language Queries**: No complex selectors or XPath needed
- 🎯 **Precise Extraction**: AI understands context and structure
- πŸ”„ **Adaptive Processing**: Works with both web content and direct HTML
- πŸ“Š **Schema Validation**: Ensure data consistency with Pydantic/TypeScript
- ⚑ **Async Support**: Handle multiple requests efficiently
- πŸ” **Source Attribution**: Get reference URLs for search results

## πŸ’‘ Use Cases

@@ -67,13 +67,14 @@ Extract information from a local HTML file using AI.
- πŸ“° **Content Aggregation**: Convert articles to structured formats
- πŸ” **Data Mining**: Extract specific information from multiple sources
- πŸ“± **App Integration**: Feed clean data into your applications
- 🌐 **Web Research**: Perform AI-powered searches with structured results

## πŸ“– Documentation

For detailed documentation and examples, visit:
- [Python SDK Guide](scrapegraph-py/README.md)
- [JavaScript SDK Guide](scrapegraph-js/README.md)
- [API Documentation](https://docs.scrapegraphai.com)

## πŸ’¬ Support & Feedback

14 changes: 14 additions & 0 deletions scrapegraph-py/CHANGELOG.md
@@ -1,3 +1,4 @@
## [1.9.0-beta.7](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.6...v1.9.0-beta.7) (2025-02-03)
## [1.10.2](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.10.1...v1.10.2) (2025-01-22)


@@ -18,6 +19,19 @@

### Features

* add optional headers to request ([bb851d7](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bb851d785d121b039d5e968327fb930955a3fd92))
* merged localscraper into smartscraper ([503dbd1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/503dbd19b8cec4d2ff4575786b0eec25db2e80e6))
* modified icons ([bcb9b0b](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bcb9b0b731b057d242fdf80b43d96879ff7a2764))
* searchscraper ([2e04e5a](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/2e04e5a1bbd207a7ceeea594878bdea542a7a856))
* updated readmes ([bfdbea0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/bfdbea038918d79df2e3e9442e25d5f08bbccbbc))


### chore

* refactor examples ([8e00846](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/8e008465f7280c53e2faab7a92f02871ffc5b867))
* **tests:** updated tests ([9149ce8](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/9149ce85a78b503098f80910c20de69831030378))

## [1.9.0-beta.6](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.5...v1.9.0-beta.6) (2025-01-08)
* add integration for sql ([2543b5a](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/2543b5a9b84826de5c583d38fe89cf21aad077e6))


81 changes: 53 additions & 28 deletions scrapegraph-py/README.md
@@ -4,7 +4,7 @@
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

<p align="left">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
@@ -20,7 +20,7 @@ pip install scrapegraph-py

## πŸš€ Features

- πŸ€– AI-powered web scraping and search
- πŸ”„ Both sync and async clients
- πŸ“Š Structured output with Pydantic schemas
- πŸ” Detailed logging
@@ -40,21 +40,36 @@ client = Client(api_key="your-api-key-here")
## πŸ“š Available Endpoints

### πŸ” SmartScraper
### πŸ€– SmartScraper

Scrapes any webpage using AI to extract specific information.
Extract structured data from any webpage or HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Using a URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

# Or using HTML content
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
    </body>
</html>
"""

response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)

print(response)
```

@@ -80,46 +95,56 @@ response = client.smartscraper(

</details>

### πŸ“ Markdownify
### πŸ” SearchScraper

Converts any webpage into clean, formatted markdown.
Perform AI-powered web searches with structured results and reference URLs.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")
```

<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
```

</details>

### πŸ“ Markdownify

Converts any webpage into clean, formatted markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
```

@@ -177,7 +202,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
## πŸ”— Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://docs.scrapegraphai.com)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---
46 changes: 46 additions & 0 deletions scrapegraph-py/examples/async/async_searchscraper_example.py
@@ -0,0 +1,46 @@
"""
Example of using the async searchscraper functionality to search for information concurrently.
"""

import asyncio

from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")


async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key-here")

    # List of search queries
    queries = [
        "What is the latest version of Python and what are its main features?",
        "What are the key differences between Python 2 and Python 3?",
        "What is Python's GIL and how does it work?",
    ]

    # Create tasks for concurrent execution
    tasks = [sgai_client.searchscraper(user_prompt=query) for query in queries]

    # Execute requests concurrently
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Process results
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            print(f"\nError for query {i+1}: {response}")
        else:
            print(f"\nSearch {i+1}:")
            print(f"Query: {queries[i]}")
            print(f"Result: {response['result']}")
            print("Reference URLs:")
            for url in response["reference_urls"]:
                print(f"- {url}")

    await sgai_client.close()


if __name__ == "__main__":
    asyncio.run(main())