Conversation

@pranavkafle
Summary

  • expose per-engine parameter schemas as MCP resources (serpapi://engines, serpapi://engines/<engine>)
  • document engine resources in README

Why

  • The MCP tool existed but lacked in-context engine parameter documentation, which made it hard for LLMs to select correct params.
  • With resources, the MCP can work across all engines using the correct per-engine parameters.
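To illustrate the idea, a resource URI like serpapi://engines/google can resolve to a per-engine JSON schema on disk. This is a minimal sketch only; the helper name and the engines/<engine>.json layout are assumptions for illustration, not the PR's exact implementation:

```python
import json
from pathlib import Path

# Assumed layout: one schema file per engine under engines/.
ENGINES_DIR = Path("engines")

def resolve_engine_resource(uri: str) -> dict:
    """Map a serpapi://engines/<engine> URI to its JSON schema (illustrative)."""
    prefix = "serpapi://engines/"
    if not uri.startswith(prefix):
        raise ValueError(f"unsupported resource URI: {uri}")
    engine = uri[len(prefix):]
    return json.loads((ENGINES_DIR / f"{engine}.json").read_text())
```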

Data provenance

  • Engine schemas were generated from the SerpApi Playground JSON at https://serpapi.com/playground.
  • The parameter catalog is embedded in the page’s data-react-props attribute (HTML-encoded JSON).
  • Data was normalized to keep the schema concise and consistent.

Testing

  • Verified all engines in engines/ can be called via MCP.
  • Deprecated engines were excluded from the dataset.
  • MCP server verified via Docker + MCP Inspector.
  • Full test log will be attached as a PR comment (not committed).

@pranavkafle
Author

Test summary

  • Ran one pass across all engines using MCP search tool with params from the engine resources.
  • Retries were performed only when params were invalid (e.g., format corrections).
  • Deprecated engines were excluded from the dataset.

Full test log (engine-test-results.md):
https://gist.github.com/pranavkafle/8ca95aeaff74b4a8548435a1e2356c67

@pranavkafle
Author

Hi @vladm-serpapi, I just wanted to get your eyes on this contribution. The goal is to let this MCP work with every SerpApi engine, not just the few listed ones, and to document each engine's required parameters as MCP resources.

I am happy to rework or answer any questions you may have! Thanks!

@vladm-serpapi
Contributor

Hey @pranavkafle, thanks for the work! I will take a look at it this week or early next week to see the best way to integrate it.

@vladm-serpapi
Contributor

Just following up here: I am still working through the review.

@pranavkafle
Author

Thanks for the follow-up, @vladm-serpapi. Please feel free to ask any questions or flag anything; happy to rework or revise if needed.

@vladm-serpapi (Contributor) left a comment

The PR looks good overall and will add a lot of value for MCP users. The integration approach is also sensible.

There were a few questions I wanted to cover before merging:

  • Where were the engine JSON schemas sourced from, and what was the workflow? I think SerpApi has some functionality to expose parameter information, but I am not sure whether it is actually publicly accessible, so I am curious where the current JSON files came from.
  • Given the ongoing API updates, it would be ideal to generate the engine schemas on the fly or at build time. For instance, we could pull schemas from the publicly running API and include them in the Docker image (serialized into the engines/... directory). That approach would keep the schemas in sync with the live API's endpoint parameters and engines.

@pranavkafle Thanks for all the work on the PR! Let me know your thoughts on the above and I'll be happy to push this forward and get it merged asap.

@pranavkafle force-pushed the feature/engine-resources branch from 26afa3d to 40e5e0a on January 14, 2026 at 15:35
@pranavkafle
Author

Hi @vladm-serpapi, thanks for the review and great questions!

1. Engine Schema Source:

The engine JSON schemas are sourced from the SerpApi Playground (https://serpapi.com/playground). The playground page embeds all engine parameter metadata in a data-react-props attribute, which is publicly accessible.

I've added a build-engines.py script that handles this extraction:

import html
import json
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup  # third-party: beautifulsoup4

# Illustrative values; the script defines its own constants.
USER_AGENT = "engine-schema-builder/1.0"
TIMEOUT_SECONDS = 30

def fetch_props(url: str) -> dict[str, object]:
    """Fetch playground HTML and extract React props."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=TIMEOUT_SECONDS) as resp:
        page_html = resp.read().decode("utf-8", errors="ignore")
    soup = BeautifulSoup(page_html, "html.parser")
    node = soup.find(attrs={"data-react-props": True})
    if not node:
        raise RuntimeError("Failed to locate data-react-props in playground HTML.")
    return json.loads(html.unescape(node["data-react-props"]))

The script normalizes the extracted data (converts HTML descriptions to markdown, filters relevant fields like type, options, required) and outputs individual JSON files per engine to engines/.
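As a rough sketch of that normalization step, the script keeps only a small allowlist of fields per parameter so the emitted schema stays concise. The exact field set and function name here are illustrative, not the script's actual code:

```python
# Illustrative allowlist; the real script's field set may differ.
KEEP_FIELDS = {"type", "options", "required", "description"}

def normalize_params(raw_params: dict) -> dict:
    """Drop any per-parameter metadata outside the allowlist (sketch)."""
    return {
        name: {k: v for k, v in meta.items() if k in KEEP_FIELDS}
        for name, meta in raw_params.items()
    }
```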

2. Build-Time Generation:

Per your suggestion, I've committed this in the latest changes. The Dockerfile now generates fresh engine schemas at build time:

RUN uv sync
ENV PATH="/app/.venv/bin:$PATH"
RUN python /app/build-engines.py

This means:

  • Every Docker image build pulls the latest engine schemas from the playground
  • Schemas stay in sync with API updates automatically

We could probably use a GitHub Action to generate the engine schemas on a schedule in the future.
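A scheduled workflow for that could look roughly like the following. Everything here is a placeholder sketch (file name, cron schedule, dependency list, and the auto-commit action are assumptions, not part of this PR):

```yaml
# .github/workflows/refresh-engines.yml (hypothetical)
name: Refresh engine schemas
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Monday 06:00 UTC
  workflow_dispatch: {}
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install beautifulsoup4
      - run: python build-engines.py
```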

Let me know if you'd like any adjustments to this approach or if anything else is needed to get this merged!

@vladm-serpapi
Contributor

Interesting approach. Let me circle back with the team on that. I think we'll likely enable some form of direct generation based on what's available. Thanks for providing the expanded info!

@pranavkafle
Author

Thanks for the update, @vladm-serpapi.

While the Playground scraping works for now, I agree that a direct, structured source would be much more reliable. If SerpApi has a canonical JSON schema or an internal metadata endpoint you'd prefer I use, let me know.

I'm happy to update the script to point to a more stable source of truth to keep the MCP robust. Looking forward to the team's feedback!

@vladm-serpapi (Contributor) left a comment

The code looks good, just minor adjustments. I've left a few comments related to formatting, version bumps, and some sanitization logic.

I'll ask one of the team members to take a final look and I think we're good to merge. Thanks for the contribution!

@pranavkafle
Author

pranavkafle commented Jan 16, 2026

Thanks for the feedback! I’ve applied the requested changes: uv format reformatting, markdownify>=0.14.1 bump (+ lockfile update), and engine-name sanitization in src/server.py. uv format --check now passes. Please re-review when you have a moment.
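For reference, the engine-name sanitization can be as simple as an allowlist check before the name ever touches the filesystem. This is an illustrative sketch under that assumption, not the exact code in src/server.py:

```python
import re

# Allow only lowercase letters, digits, and underscores, so a resource
# URI can never smuggle path separators into the engines/ lookup.
_ENGINE_NAME = re.compile(r"[a-z0-9_]+")

def sanitize_engine_name(name: str) -> str:
    """Return the name unchanged if safe; raise on anything suspicious."""
    if not _ENGINE_NAME.fullmatch(name):
        raise ValueError(f"invalid engine name: {name!r}")
    return name
```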
