
Commit 80970b2

update server to support adk requirements
1 parent d7db52b commit 80970b2

File tree

2 files changed: +274, -4 lines

README.md

Lines changed: 243 additions & 0 deletions
```diff
@@ -13,6 +13,8 @@ A production-ready [Model Context Protocol](https://modelcontextprotocol.io/intr
 - [Quick Start](#quick-start)
 - [Available Tools](#available-tools)
 - [Setup Instructions](#setup-instructions)
+- [Local Usage](#local-usage)
+- [Google ADK Integration](#google-adk-integration)
 - [Example Use Cases](#example-use-cases)
 - [Error Handling](#error-handling)
 - [Common Issues](#common-issues)
```
```diff
@@ -212,6 +214,247 @@ Add the ScrapeGraphAI MCP server on the settings:

 ![Cursor MCP Integration](assets/cursor_mcp.png)
```

## Local Usage

To run the MCP server locally for development or testing, follow these steps:

### Prerequisites

- Python 3.10 or higher
- pip or uv package manager
- ScrapeGraph API key

### Installation

1. **Clone the repository** (if you haven't already):

   ```bash
   git clone https://github.com/ScrapeGraphAI/scrapegraph-mcp
   cd scrapegraph-mcp
   ```

2. **Install the package**:

   ```bash
   # Using pip
   pip install -e .

   # Or using uv (faster)
   uv pip install -e .
   ```

3. **Set your API key**:

   ```bash
   # macOS/Linux
   export SGAI_API_KEY=your-api-key-here

   # Windows (PowerShell)
   $env:SGAI_API_KEY="your-api-key-here"

   # Windows (CMD)
   set SGAI_API_KEY=your-api-key-here
   ```

### Running the Server Locally

You can run the server directly:

```bash
# Using the installed command
scrapegraph-mcp

# Or using Python module
python -m scrapegraph_mcp.server
```

The server will start and communicate via stdio (standard input/output), the standard MCP transport.

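For an end-to-end check of the stdio transport from code, you can also drive the server with the official `mcp` Python SDK. A minimal sketch (assumes the package is installed and `SGAI_API_KEY` is set as above):

```python
# check_server.py -- connect to the local server over stdio and list its tools.
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(
        command="python",
        args=["-m", "scrapegraph_mcp.server"],
        env={"SGAI_API_KEY": os.environ["SGAI_API_KEY"]},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name)


asyncio.run(main())
```
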
### Testing with MCP Inspector

Test your local server using the MCP Inspector tool:

```bash
npx @modelcontextprotocol/inspector python -m scrapegraph_mcp.server
```

This provides a web interface for exercising all available tools interactively.

### Configuring Claude Desktop for Local Server

To use your locally running server with Claude Desktop, update your configuration file:

**macOS** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": [
        "-m",
        "scrapegraph_mcp.server"
      ],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

**Windows** (`%APPDATA%\Claude\claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "scrapegraph-mcp-local": {
      "command": "python",
      "args": [
        "-m",
        "scrapegraph_mcp.server"
      ],
      "env": {
        "SGAI_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

**Note**: Make sure Python is in your PATH. You can verify by running `python --version` in your terminal.

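If Claude Desktop cannot resolve `python` from its PATH, a common workaround is to put an absolute interpreter path in the `command` field. A small sketch to discover that path (run it with the same interpreter that has the package installed):

```python
# Print the absolute path of this Python interpreter; paste the result into
# the "command" field of claude_desktop_config.json if PATH lookup fails.
import sys

print(sys.executable)
```
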
### Configuring Cursor for Local Server

In Cursor's MCP settings, add a new server with:

- **Command**: `python`
- **Args**: `["-m", "scrapegraph_mcp.server"]`
- **Environment Variables**: `{"SGAI_API_KEY": "your-api-key-here"}`

### Troubleshooting Local Setup

**Server not starting:**
- Verify Python is installed: `python --version`
- Check that the package is installed: `pip list | grep scrapegraph-mcp`
- Ensure the API key is set: `echo $SGAI_API_KEY` (macOS/Linux) or `echo %SGAI_API_KEY%` (Windows CMD)
- Or run the combined diagnostic sketch below

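These checks can be rolled into one small script. A sketch (`quick_check.py` is a hypothetical name, not part of the repo):

```python
# quick_check.py -- verify the three preconditions for the local server.
import importlib.util
import os
import sys

print("python version:", sys.version.split()[0])
print("scrapegraph_mcp installed:", importlib.util.find_spec("scrapegraph_mcp") is not None)
print("SGAI_API_KEY set:", bool(os.environ.get("SGAI_API_KEY")))
```
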
**Tools not appearing:**
- Check the Claude Desktop logs:
  - macOS: `~/Library/Logs/Claude/`
  - Windows: `%APPDATA%\Claude\Logs\`
- Verify the server starts without errors when run directly
- Check that the configuration JSON is valid (see the sketch below)

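A quick way to confirm the configuration file parses as JSON and actually registers your server is a short script like this sketch (the path shown is the macOS location; adjust for Windows):

```python
# validate_config.py -- fail loudly if the Claude Desktop config is malformed.
import json
import pathlib

path = pathlib.Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
config = json.loads(path.read_text())  # raises json.JSONDecodeError on invalid JSON
print("configured servers:", list(config.get("mcpServers", {})))
```
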
**Import errors:**
- Reinstall the package: `pip install -e . --force-reinstall`
- Verify dependencies: `pip install -r requirements.txt` (if available)

## Google ADK Integration

The ScrapeGraph MCP server can be integrated with [Google ADK (Agent Development Kit)](https://github.com/google/adk) to create AI agents with web scraping capabilities.

### Prerequisites

- Python 3.10 or higher
- Google ADK installed
- ScrapeGraph API key

### Installation

1. **Install Google ADK** (if not already installed):

   ```bash
   pip install google-adk
   ```

2. **Set your API key**:

   ```bash
   export SGAI_API_KEY=your-api-key-here
   ```

### Basic Integration Example

Create an agent file (e.g., `agent.py`) with the following configuration:

```python
import os

from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# Path to the scrapegraph-mcp server directory
SCRAPEGRAPH_MCP_PATH = "/path/to/scrapegraph-mcp"

# Path to the server.py file
SERVER_SCRIPT_PATH = os.path.join(
    SCRAPEGRAPH_MCP_PATH,
    "src",
    "scrapegraph_mcp",
    "server.py",
)

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='scrapegraph_assistant_agent',
    instruction='Help the user with web scraping and data extraction using ScrapeGraph AI. '
                'You can convert webpages to markdown, extract structured data using AI, '
                'perform web searches, crawl multiple pages, and automate complex scraping workflows.',
    tools=[
        MCPToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command='python3',
                    args=[
                        SERVER_SCRIPT_PATH,
                    ],
                    env={
                        'SGAI_API_KEY': os.getenv('SGAI_API_KEY'),
                    },
                ),
                timeout=300.0,
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=['markdownify', 'smartscraper', 'searchscraper']
        ),
    ],
)
```

### Configuration Options

**Timeout Settings:**
- The default timeout is 5 seconds, which may be too short for web scraping operations
- Recommended: set `timeout=300.0` (5 minutes), as in the example above
- Adjust based on your use case (crawling operations may need even longer timeouts)

**Tool Filtering:**
- By default, all 8 tools are exposed to the agent
- Use `tool_filter` to limit which tools are available:

  ```python
  tool_filter=['markdownify', 'smartscraper', 'searchscraper']
  ```

**API Key Configuration:**
- Set via environment variable: `export SGAI_API_KEY=your-key`
- Or pass it directly in the `env` dict: `'SGAI_API_KEY': 'your-key-here'`
- The environment variable approach is recommended for security

### Usage Example

Once configured, your agent can use natural language to drive the web scraping tools. It can handle queries like:

- "Convert https://example.com to markdown"
- "Extract all product prices from this e-commerce page"
- "Search for recent AI research papers and summarize them"
- "Crawl this documentation site and extract all API endpoints"

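To run such a query programmatically, one option is ADK's `Runner` with an in-memory session service; agents can also be launched with the ADK CLI (`adk run` / `adk web`). A minimal sketch under those assumptions (`run_agent.py` is a hypothetical file name, the `agent` import refers to the example above, and `create_session` is awaited here, which may vary across google-adk versions):

```python
# run_agent.py -- drive the agent defined above with a single scraping query.
import asyncio

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

from agent import root_agent  # the LlmAgent from the example above


async def main() -> None:
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="scrapegraph_app", user_id="user1"
    )
    runner = Runner(
        agent=root_agent,
        app_name="scrapegraph_app",
        session_service=session_service,
    )
    message = types.Content(
        role="user",
        parts=[types.Part(text="Convert https://example.com to markdown")],
    )
    async for event in runner.run_async(
        user_id="user1", session_id=session.id, new_message=message
    ):
        if event.is_final_response() and event.content:
            print(event.content.parts[0].text)


asyncio.run(main())
```
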
For more information about Google ADK, visit the [official documentation](https://github.com/google/adk).

## Example Use Cases

The server enables sophisticated queries across various scraping scenarios:

src/scrapegraph_mcp/server.py

Lines changed: 31 additions & 4 deletions
```diff
@@ -12,7 +12,7 @@
 import json
 import logging
 import os
-from typing import Any, Dict, Optional, List, Union
+from typing import Any, Dict, Optional, List, Union, Annotated

 import httpx
 from fastmcp import Context, FastMCP
```
```diff
@@ -916,7 +916,16 @@ def smartscraper(
     website_url: Optional[str] = None,
     website_html: Optional[str] = None,
     website_markdown: Optional[str] = None,
-    output_schema: Optional[Union[str, Dict[str, Any]]] = None,
+    output_schema: Optional[Annotated[Union[str, Dict[str, Any]], Field(
+        default=None,
+        description="JSON schema dict or JSON string defining the expected output structure",
+        json_schema_extra={
+            "oneOf": [
+                {"type": "string"},
+                {"type": "object"}
+            ]
+        }
+    )]] = None,
     number_of_scrolls: Optional[int] = None,
     total_pages: Optional[int] = None,
     render_heavy_js: Optional[bool] = None,
```
```diff
@@ -1157,8 +1166,26 @@ def agentic_scrapper(
     url: str,
     ctx: Context,
     user_prompt: Optional[str] = None,
-    output_schema: Optional[Union[str, Dict[str, Any]]] = None,
-    steps: Optional[Union[str, List[str]]] = None,
+    output_schema: Optional[Annotated[Union[str, Dict[str, Any]], Field(
+        default=None,
+        description="Desired output structure as a JSON schema dict or JSON string",
+        json_schema_extra={
+            "oneOf": [
+                {"type": "string"},
+                {"type": "object"}
+            ]
+        }
+    )]] = None,
+    steps: Optional[Annotated[Union[str, List[str]], Field(
+        default=None,
+        description="Step-by-step instructions for the agent as a list of strings or JSON array string",
+        json_schema_extra={
+            "oneOf": [
+                {"type": "string"},
+                {"type": "array", "items": {"type": "string"}}
+            ]
+        }
+    )]] = None,
     ai_extraction: Optional[bool] = None,
     persistent_session: Optional[bool] = None,
     timeout_seconds: Optional[float] = None
```
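The widened `steps` parameter accepts either a real list or a JSON-encoded array, which helps when a model serializes tool arguments as strings. A hypothetical normalization helper, not code from this repo, illustrating the two encodings:

```python
import json
from typing import List, Union


def normalize_steps(steps: Union[str, List[str]]) -> List[str]:
    """Accept either encoding allowed by the updated `steps` parameter."""
    if isinstance(steps, str):
        parsed = json.loads(steps)
        if not isinstance(parsed, list):
            raise ValueError("steps string must encode a JSON array")
        return [str(step) for step in parsed]
    return list(steps)


assert normalize_steps('["open pricing page", "extract plans"]') == \
       normalize_steps(["open pricing page", "extract plans"])
```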
