@GabrielDrapor GabrielDrapor commented Aug 15, 2025

PR Type

Enhancement


Description

  • Replace complex LLM-based manifest generation with API-based approach

  • Add GitHub workflow for automated manifest generation via PR

  • Simplify script to use chatxiv.org API instead of local processing

  • Add validation step to correct installations against README


Diagram Walkthrough

flowchart LR
  A["Repository URL"] --> B["API Call"]
  B --> C["Generate Manifest"]
  C --> D["Validate Installations"]
  D --> E["Save to Registry"]
  F["GitHub Workflow"] --> G["Create PR"]
  E --> G

File Walkthrough

Relevant files
Enhancement
get_manifest.py
Complete rewrite using API-based approach                               

scripts/get_manifest.py

  • Replaced 900+ line complex LLM-based generator with 200+ line API
    client
  • Added extract_json_from_content() for parsing API responses
  • Implemented validate_installations() to verify against README
  • Simplified manifest generation using chatxiv.org API
+202/-879
generate-manifest.yml
New GitHub workflow for manifest generation                           

.github/workflows/generate-manifest.yml

  • Added workflow for automated manifest generation
  • Includes manual trigger with repository URL input
  • Creates PR with generated manifest automatically
  • Sets up Python environment and API authentication
+81/-0   

Summary by CodeRabbit

  • New Features

    • Added a manual GitHub Actions workflow to generate an MCP manifest for a given repository and automatically open a pull request with the result.
  • Refactor

    • Streamlined the manifest generation tool to use an external API, reducing complexity and dependencies.
    • Simplified command-line usage and output handling for faster, more reliable manifest creation.
  • Chores

    • Standardized environment variable usage for API access.
    • Improved status messaging during manifest generation and validation.

coderabbitai bot commented Aug 15, 2025

Walkthrough

Introduces a GitHub Actions workflow to generate an MCP manifest for a given repository and open a PR. Replaces the previous LLM-heavy manifest script with a simplified, external API-driven implementation featuring JSON parsing, repository name derivation, API calls for generation/validation, and file output.

Changes

Cohort / File(s) Summary
CI workflow for manifest PRs
.github/workflows/generate-manifest.yml
Adds a manually triggered workflow (workflow_dispatch) taking repo_url, setting up Python 3.11, running scripts/get_manifest.py with ANYON_API_KEY, deriving repo/branch names, and creating a PR via peter-evans/create-pull-request@v5.
Manifest generator refactor
scripts/get_manifest.py
Replaces class-based LLM pipeline with functions: extract_json_from_content, get_repo_name_from_url, generate_manifest (ANYON API), validate_installations (API recheck), save_manifest (writes to mcp-registry/servers), and a CLI main(). Removes dotenv/logging/LLM logic.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant GitHub Actions as Workflow
  participant Repo as Target Repo
  participant Script as get_manifest.py
  participant ANYON as External API
  participant PR as create-pull-request

  User->>Workflow: workflow_dispatch (repo_url)
  Workflow->>Repo: actions/checkout
  Workflow->>Script: python scripts/get_manifest.py --repo_url
  Script->>ANYON: generate_manifest(repo_url)
  ANYON-->>Script: manifest content
  Script->>ANYON: validate_installations(manifest, repo_url)
  ANYON-->>Script: validated manifest
  Script->>Repo: write mcp-registry/servers/<repo>.json
  Workflow->>PR: create pull request (branch add-manifest-<repo>)
sequenceDiagram
  participant Main as main()
  participant Gen as generate_manifest()
  participant Val as validate_installations()
  participant FS as Filesystem
  participant API as ANYON API

  Main->>Gen: repo_url
  Gen->>API: request manifest
  API-->>Gen: content (possibly JSON in code block)
  Gen-->>Main: manifest dict or None
  Main->>Val: manifest, repo_url
  Val->>API: request installation validation
  API-->>Val: updated/confirmed manifest
  Val-->>Main: validated manifest
  Main->>FS: save_manifest(manifest)
  FS-->>Main: success/failure

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I thump my paws—new scripts arise,
A workflow hums beneath the skies.
Manifests bloom from API light,
Branches sprout and PRs take flight.
In tidy burrows, files now rest—
A rabbit nods: “Refactor, manifest!” 🐇✨

@qodo-merge-pro

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 Security concerns

External API key handling:
The script and workflow depend on ANYON_API_KEY. Ensure the secret is scoped, not exposed in logs, and avoid printing full API errors that might leak response details. Also, the user-supplied repo_url is interpolated into prompts and filenames; validate it is a GitHub URL and sanitize to prevent malicious input influencing paths or PR metadata.
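A minimal sketch of that URL check, assuming only plain `https://github.com/<owner>/<repo>` and `git@github.com:<owner>/<repo>` forms should be accepted. The function name `parse_github_repo` and the character allow-list are illustrative assumptions, not code from this PR.

```python
import re
from typing import Optional, Tuple

# Assumed allow-list for owner/repo segments: letters, digits, '.', '_' and '-'.
# This rejects '/', whitespace, quotes and shell metacharacters.
_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")
_PREFIXES = ("https://github.com/", "git@github.com:")


def parse_github_repo(repo_url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for a plain GitHub repository URL, else None."""
    url = repo_url.strip()
    for prefix in _PREFIXES:
        if url.startswith(prefix):
            path = url[len(prefix):]
            break
    else:
        return None  # not a GitHub URL in a form we accept

    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]

    parts = path.split("/")
    if len(parts) != 2:
        return None  # rejects deep links such as /tree/main and extra segments
    for segment in parts:
        # fullmatch (not match) so an embedded trailing newline cannot slip through;
        # '.' and '..' pass the allow-list but must never reach a file path.
        if not _SEGMENT.fullmatch(segment) or segment in (".", ".."):
            return None
    owner, repo = parts
    return owner, repo
```

If the script exits non-zero when this returns `None`, later workflow steps such as branch naming and PR creation never run for a rejected URL, because a failing step stops the job by default.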

⚡ Recommended focus areas for review

Possible Issue

The API response parsing assumes an OpenAI-compatible shape and single content field; if the provider returns tool messages, array content parts, or different keys, data["choices"][0]["message"]["content"] may fail. Consider defensive checks and supporting content-as-list.

    data = response.json()
    content = data["choices"][0]["message"]["content"]

    return extract_json_from_content(content)

except requests.RequestException as e:
    print(f"API request failed: {e}")
    return None
except (KeyError, IndexError) as e:
    print(f"Unexpected API response format: {e}")
    return None
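One way to make that lookup defensive, sketched under the assumption that the endpoint returns an OpenAI-style `choices[0].message.content` that may be a string, a list of content parts, or a dict with a `text` key. The helper name `extract_message_text` is hypothetical; its result would then be passed to `extract_json_from_content`.

```python
from typing import Any, Optional


def extract_message_text(data: Any) -> Optional[str]:
    """Return the first choice's message content as plain text, or None if absent."""
    if not isinstance(data, dict):
        return None
    choices = data.get("choices")
    if not isinstance(choices, list) or not choices:
        return None  # missing or empty "choices": avoids IndexError
    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    content = message.get("content") if isinstance(message, dict) else None

    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep text parts such as {"type": "text", "text": "..."} and bare strings;
        # skip anything else (e.g. image parts).
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif (
                isinstance(part, dict)
                and part.get("type") == "text"
                and isinstance(part.get("text"), str)
            ):
                parts.append(part["text"])
        return "\n".join(parts) if parts else None
    if isinstance(content, dict) and isinstance(content.get("text"), str):
        return content["text"]
    return None
```

Returning `None` instead of raising lets the caller keep a single "could not parse" path; it also covers a present-but-empty `choices` list, where indexing `[0]` would raise `IndexError`.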
Robustness

extract_json_from_content only matches fenced blocks labeled json with exact backticks and newlines; responses with ```JSON, missing trailing newline, or extra prose will fail. Broaden regex and add fallback to strip code fences and attempt tolerant JSON parsing.

def extract_json_from_content(content: str) -> Optional[dict]:
    """Extract JSON from the API response content."""
    # Look for JSON code block
    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
    if json_match:
        try:
            return json.loads(json_match.group(1))
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
            return None

    # Try to find JSON without code block markers
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        print(f"Could not extract valid JSON from response: {content}")
        return None
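A tolerant variant, sketched as an assumption rather than the PR's code: it accepts fences with an optional, case-insensitive `json` tag and LF or CRLF newlines, then falls back to `json.JSONDecoder.raw_decode`, which parses the first complete JSON value at a given offset and ignores whatever follows. The name `extract_json` is hypothetical.

```python
import json
import re
from typing import Any, Optional

# Fenced block with an optional language tag (json, JSON, ...) and LF or CRLF newlines.
_FENCE = re.compile(r"```[ \t]*(?:json)?[ \t]*\r?\n(.*?)\r?\n?```", re.DOTALL | re.IGNORECASE)


def extract_json(content: str) -> Optional[Any]:
    """Best-effort JSON extraction from model output; returns None on failure."""
    # 1) Prefer fenced code blocks, in order of appearance.
    for block in _FENCE.findall(content):
        try:
            return json.loads(block)
        except json.JSONDecodeError:
            continue
    # 2) Fall back to the first '{' or '[' that starts a complete JSON value.
    #    raw_decode stops at the end of that value, so trailing prose is ignored.
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[{\[]", content):
        try:
            value, _end = decoder.raw_decode(content, match.start())
            return value
        except json.JSONDecodeError:
            continue
    # Deliberately no echo of `content` here, to keep raw responses out of CI logs.
    return None
```

`raw_decode` is used instead of slicing from the first `{` to the end of the text because `json.loads` rejects trailing prose with an "Extra data" error.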
Workflow Safety

The workflow commits directly to a branch with write permissions and runs on user-provided URLs without validation. Add basic URL validation/sanitization and consider restricting to GitHub repos to avoid abuse or path traversal in file naming.

generate-manifest:
  runs-on: ubuntu-latest
  permissions:
    contents: write
    pull-requests: write

  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
      with:
        token: ${{ secrets.GITHUB_TOKEN }}

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install requests

    - name: Generate manifest
      env:
        ANYON_API_KEY: ${{ secrets.ANYON_API_KEY }}
      run: |
        python scripts/get_manifest.py "${{ github.event.inputs.repo_url }}"

    - name: Extract repo name for branch
      id: repo-info
      run: |
        REPO_URL="${{ github.event.inputs.repo_url }}"
        REPO_NAME=$(echo "$REPO_URL" | sed 's/.*github\.com[:/]//' | sed 's/\.git$//' | tr '/' '-')
        echo "repo_name=$REPO_NAME" >> $GITHUB_OUTPUT
        echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT

    - name: Create Pull Request

qodo-merge-pro bot commented Aug 15, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Normalize API response content

Add defensive parsing for API responses that may return a list of content parts
or dicts, not just a plain string. Normalize content into a string before
passing to extract_json_from_content to avoid KeyError/TypeError when providers
return structured content.

scripts/get_manifest.py [85-88]

-url = "https://anyon.chatxiv.org/api/v1/openai/v1/chat/completions"
-headers = {
-    "Authorization": f"Bearer {api_key}",
-    "Content-Type": "application/json"
-}
+...
+data = response.json()
+content = data.get("choices", [{}])[0].get("message", {}).get("content")
+# Normalize content to string
+if isinstance(content, list):
+    # extract text fields and join
+    parts = []
+    for c in content:
+        if isinstance(c, dict) and c.get("type") == "text" and isinstance(c.get("text"), str):
+            parts.append(c["text"])
+        elif isinstance(c, str):
+            parts.append(c)
+    content = "\n".join(parts)
+elif isinstance(content, dict):
+    # some providers wrap content differently
+    content = content.get("text") or ""
+if not isinstance(content, str):
+    raise ValueError(f"Unexpected API content format: {type(content)}")
+return extract_json_from_content(content)
 
-payload = {
-    "model": "x",
-    "messages": [
-        {
-            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": f"help me generate manifest json for this repo: {repo_url}"
-                }
-            ]
-        }
-    ]
-}
-
-try:
-    print(f"Generating manifest for {repo_url}...")
-    response = requests.post(url, headers=headers, json=payload)
-    response.raise_for_status()
-    
-    data = response.json()
-    content = data["choices"][0]["message"]["content"]
-    
-    return extract_json_from_content(content)
-

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that data["choices"][0]["message"]["content"] is brittle and could fail. It proposes a more robust way to parse the API response, handling different possible content structures and making the script more resilient to API variations.

Medium
Validate installations type before applying

Guard against non-dict manifest["installations"] and ensure validated
installations are a dict before assignment. This prevents type mismatches later
when saving or consuming the manifest.

scripts/get_manifest.py [151-170]

 current_installations = manifest.get("installations", {})
+if not isinstance(current_installations, (dict, list)):
+    current_installations = {}
+...
+data = response.json()
+content = data.get("choices", [{}])[0].get("message", {}).get("content")
+# normalize content as in generate_manifest
+if isinstance(content, list):
+    parts = []
+    for c in content:
+        if isinstance(c, dict) and c.get("type") == "text" and isinstance(c.get("text"), str):
+            parts.append(c["text"])
+        elif isinstance(c, str):
+            parts.append(c)
+    content = "\n".join(parts)
+elif isinstance(content, dict):
+    content = content.get("text") or ""
+validated_data = extract_json_from_content(content or "")
+if isinstance(validated_data, dict) and isinstance(validated_data.get("installations"), dict):
+    print("✓ Installations validated and corrected")
+    manifest["installations"] = validated_data["installations"]
+else:
+    print("⚠ Validation failed or wrong format, keeping original installations")
+return manifest
 
-payload = {
-    "model": "x",
-    "messages": [
-        {
-            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
-
-Repository: {repo_url}
-
-Current manifest installations:
-{json.dumps(current_installations, indent=2)}
-
-...
-7. The response should be in this exact format: {{"installations": {{...}}}}
-
-Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                }
-            ]
-        }
-    ]
-}
-
-try:
-    print("Validating installations against README...")
-    response = requests.post(url, headers=headers, json=payload)
-    response.raise_for_status()
-    
-    data = response.json()
-    content = data["choices"][0]["message"]["content"]
-    
-    validated_data = extract_json_from_content(content)
-    if validated_data and "installations" in validated_data:
-        print("✓ Installations validated and corrected")
-        manifest["installations"] = validated_data["installations"]
-        return manifest
-    else:
-        print("⚠ Validation failed, keeping original installations")
-        return manifest
-

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why: The suggestion correctly points out that the type of validated_data and validated_data["installations"] should be checked before assignment to prevent potential TypeError exceptions, improving the script's robustness.

Low
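A self-contained version of that guard, assuming (as the validation prompt in this PR implies) that each installation entry is itself an object with fields such as `command`, `args` and `env`. `apply_validated_installations` is a hypothetical helper; unlike the PR's code it returns a shallow copy instead of mutating the manifest in place.

```python
from typing import Any, Dict


def apply_validated_installations(manifest: Dict[str, Any], validated: Any) -> Dict[str, Any]:
    """Return a copy of manifest, taking validated["installations"] only if well-formed."""
    updated = dict(manifest)  # shallow copy so the caller's manifest is not mutated
    installations = validated.get("installations") if isinstance(validated, dict) else None
    # Assumed shape: a non-empty {"<method>": {...}, ...}. An empty or malformed
    # result is treated as a failed validation and the original field is kept.
    if (
        isinstance(installations, dict)
        and installations
        and all(isinstance(v, dict) for v in installations.values())
    ):
        updated["installations"] = installations
    return updated
```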
General
Harden JSON extraction heuristics

Make JSON extraction more resilient by trimming surrounding whitespace and
allowing for code fences with optional language and varying newlines. This
avoids false negatives when the API formats code blocks differently.

scripts/get_manifest.py [21-34]

-json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
+json_match = re.search(r'```(?:json)?\s*(.*?)\s*```', content, re.DOTALL | re.IGNORECASE)
 if json_match:
+    block = json_match.group(1).strip()
     try:
-        return json.loads(json_match.group(1))
+        return json.loads(block)
     except json.JSONDecodeError as e:
-        print(f"Error parsing JSON: {e}")
-        return None
-
-# Try to find JSON without code block markers
+        print(f"Error parsing JSON from fenced block: {e}")
+        # fall through to try raw content
+# Try to find JSON object/array within the text
+content_stripped = content.strip()
+# Heuristic: find first '{' or '[' and last matching bracket
+start = min((i for i in [content_stripped.find("{"), content_stripped.find("[")] if i != -1), default=-1)
+if start != -1:
+    candidate = content_stripped[start:]
+    try:
+        return json.loads(candidate)
+    except json.JSONDecodeError:
+        pass
 try:
-    return json.loads(content)
+    return json.loads(content_stripped)
 except json.JSONDecodeError:
-    print(f"Could not extract valid JSON from response: {content}")
+    print("Could not extract valid JSON from response.")
     return None
Suggestion importance[1-10]: 7

Why: The suggestion improves the extract_json_from_content function by making the regex for code blocks more flexible and adding better heuristics for finding JSON in raw text, which increases the likelihood of successfully parsing the API's response.

Medium
Organization best practice
Migrate argparse to Click with help option

Replace argparse with Click to align with our CLI standards and provide
consistent help behavior. Add @click.help_option("-h", "--help") and enhance the
docstring with a brief example using a backslash-escaped block.

scripts/get_manifest.py [196-223]

-def main():
-    parser = argparse.ArgumentParser(description="Generate MCP manifest JSON from repository URL")
-    parser.add_argument("repo_url", help="Repository URL to generate manifest for")
+import click
+
+@click.command()
+@click.help_option("-h", "--help")
+@click.argument("repo_url")
+def main(repo_url: str):
+    """Generate MCP manifest JSON from repository URL.
     
-    args = parser.parse_args()
+    Example:
     
+    \b
+        scripts/get_manifest.py https://github.com/owner/repo
+    """
     # Step 1: Generate initial manifest
     print("Step 1: Generating initial manifest...")
+    manifest = generate_manifest(repo_url)
+    if not manifest:
+        print("Failed to generate manifest")
+        sys.exit(1)
+    
+    # Step 2: Validate and correct installations
+    print("Step 2: Validating installations against README...")
+    manifest = validate_installations(manifest, repo_url)
+    
+    # Step 3: Save manifest
+    print("Step 3: Saving manifest...")
+    if not save_manifest(manifest, repo_url):
+        print("Failed to save manifest")
+        sys.exit(1)
+    
+    print("✓ Manifest generation completed successfully!")
 
+if __name__ == "__main__":
+    main()
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why:
Relevant best practice - When implementing command-line interfaces with Click, use consistent help option patterns and provide clear, structured help text with examples. Include both short (-h) and long (--help) options, and format examples using backslash-escaped blocks for proper display.

Low
Security
Sanitize output filename safely

Sanitize the derived filename to remove characters that are invalid or risky on
filesystems. This prevents path traversal or write failures when unusual repo
names or inputs are provided.

scripts/get_manifest.py [37-49]

 def get_repo_name_from_url(repo_url: str) -> str:
-    """Extract repository name from URL for filename."""
+    """Extract a safe repository name from URL for filename."""
     # Remove .git suffix if present
-    if repo_url.endswith('.git'):
+    if repo_url.endswith(".git"):
         repo_url = repo_url[:-4]
-    
     # Extract owner/repo from URL
     match = re.search(r'github\.com[:/]([^/]+/[^/]+)', repo_url)
     if match:
-        return match.group(1).replace('/', '-')
-    
-    # Fallback to last part of URL
-    return repo_url.split('/')[-1]
+        candidate = match.group(1).replace("/", "-")
+    else:
+        candidate = repo_url.split("/")[-1]
+    # Sanitize filename: allow alphanum, dash, underscore, dot; replace others with '-'
+    safe = re.sub(r"[^A-Za-z0-9._-]", "-", candidate)
+    # Collapse repeated dashes and trim
+    safe = re.sub(r"-{2,}", "-", safe).strip("-")
+    return safe or "manifest"
Suggestion importance[1-10]: 9

Why: This suggestion addresses a critical path traversal vulnerability by properly sanitizing the filename derived from the user-provided repo_url, preventing malicious file writes.

High
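The sanitizer in this suggestion already replaces `/`, so traversal through the name alone is unlikely; a cheap defence in depth is to resolve the final path and confirm it still sits directly inside the servers directory. A sketch, where `safe_manifest_path` is a hypothetical helper and trimming leading and trailing dots is an extra assumption so names like `..` fall back to `manifest`:

```python
import re
from pathlib import Path


def safe_manifest_path(servers_dir: Path, repo_name: str) -> Path:
    """Return servers_dir/<sanitized name>.json, refusing paths that escape servers_dir."""
    # Same allow-list idea as the suggestion: anything else becomes '-'.
    name = re.sub(r"[^A-Za-z0-9._-]", "-", repo_name)
    # Collapse dash runs, trim leading/trailing dashes and dots, fall back to a fixed name.
    name = re.sub(r"-{2,}", "-", name).strip("-.") or "manifest"
    base = servers_dir.resolve()
    target = (base / f"{name}.json").resolve()
    # resolve() follows symlinks, so a planted link pointing elsewhere is caught too.
    if target.parent != base:
        raise ValueError(f"refusing to write outside {base}: {target}")
    return target
```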
Prevent sensitive content leakage

Avoid printing full API content on JSON parse failures as it can leak secrets
and blow up logs. Log a concise error and return None, optionally truncating
content. This prevents accidental exposure of repository data or tokens embedded
in responses.

scripts/get_manifest.py [18-34]

 def extract_json_from_content(content: str) -> Optional[dict]:
     """Extract JSON from the API response content."""
     # Look for JSON code block
-    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
+    json_match = re.search(r'```json\s*(.*?)\s*```', content, re.DOTALL)
     if json_match:
         try:
             return json.loads(json_match.group(1))
         except json.JSONDecodeError as e:
-            print(f"Error parsing JSON: {e}")
+            print(f"Error parsing JSON from fenced block: {e}")
             return None
-      
     # Try to find JSON without code block markers
     try:
         return json.loads(content)
-    except json.JSONDecodeError:
-        print(f"Could not extract valid JSON from response: {content}")
+    except json.JSONDecodeError as e:
+        preview = (content[:300] + "...") if isinstance(content, str) and len(content) > 300 else content
+        print(f"Could not extract valid JSON from response. Error: {e}. Preview: {preview}")
         return None
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a potential information disclosure vulnerability by preventing the full API response from being logged, which is a good security practice.

Medium
Organization best practice
Send status logs to stderr

Route informational/status messages to stderr so stdout remains clean for data
(e.g., the manifest JSON) if needed. Use sys.stderr.write or logging for
user-facing progress updates.

scripts/get_manifest.py [81-219]

-print(f"Generating manifest for {repo_url}...")
+sys.stderr.write(f"Generating manifest for {repo_url}...\n")
 ...
-print("Validating installations against README...")
+sys.stderr.write("Validating installations against README...\n")
 ...
-print("✓ Installations validated and corrected")
+sys.stderr.write("✓ Installations validated and corrected\n")
 ...
-print("⚠ Validation failed, keeping original installations")
+sys.stderr.write("⚠ Validation failed, keeping original installations\n")
 ...
-print("Step 1: Generating initial manifest...")
+sys.stderr.write("Step 1: Generating initial manifest...\n")
 ...
-print("Step 2: Validating installations against README...")
+sys.stderr.write("Step 2: Validating installations against README...\n")
 ...
-print("Step 3: Saving manifest...")
+sys.stderr.write("Step 3: Saving manifest...\n")
 ...
-print("✓ Manifest generation completed successfully!")
+sys.stderr.write("✓ Manifest generation completed successfully!\n")

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why:
Relevant best practice - Prefer stderr when printing application status/log messages from subprocess-like interactions or API-driven workflows, reserving stdout for data output.

Low
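A small sketch of that split, using `print(..., file=sys.stderr)` rather than `sys.stderr.write` so newlines are added automatically. `status` and `emit_manifest` are hypothetical names; the PR currently writes the manifest to a file, so `emit_manifest` only matters if stdout output is added later.

```python
import json
import sys
from typing import Any, Dict


def status(message: str) -> None:
    # Progress and diagnostics go to stderr, flushed immediately so they
    # appear in order in CI logs.
    print(message, file=sys.stderr, flush=True)


def emit_manifest(manifest: Dict[str, Any]) -> None:
    # Only the data goes to stdout, so `... > manifest.json` captures clean JSON.
    json.dump(manifest, sys.stdout, indent=2, ensure_ascii=False)
    sys.stdout.write("\n")
```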
General
Pin action to stable major

Pin the action to a current, supported major version to avoid breaking changes
from upstream updates. Use v5 for setup-python, the latest supported major, to
keep the CI environment stable.

.github/workflows/generate-manifest.yml [24-27]

 - name: Set up Python
-  uses: actions/setup-python@v4
+  uses: actions/setup-python@v5
   with:
     python-version: '3.11'
Suggestion importance[1-10]: 5

Why: The suggestion correctly recommends pinning the GitHub Action to a major version (v5) for improved stability and to avoid unexpected breaking changes from the v4 tag.

Low

@github-actions

Summary

  • Introduces .github/workflows/generate-manifest.yml to let users trigger manifest generation via workflow_dispatch.
  • Replaces the large, dependency-heavy scripts/get_manifest.py with a lightweight CLI that calls the chatxiv API, validates the installations field, and saves the manifest to mcp-registry/servers/.

Review
Nice improvement—automation is clearer and the script has far fewer external deps. A few quick thoughts:

  • scripts/get_manifest.py
    • Consider adding simple retry/back-off around the requests.post calls to handle transient API/network errors.
    • extract_json_from_content assumes triple-backtick JSON blocks; the fallback patterns help, but a stricter JSON schema validation step would reduce bad PRs.
    • model: "x" is a placeholder—surfacing it as an arg/env var avoids future hard-coding.
  • .github/workflows/generate-manifest.yml
    • The job installs only requests; if the script later grows (e.g., adds jsonschema) remember to update this list.
    • The action deletes the branch automatically; good practice, but confirm your repo settings allow force-pushes to temporary branches.

Overall, the PR is solid and heads in the right direction.
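A dependency-free sketch of the retry/back-off idea, assuming transient failures surface as specific exception types. `with_retries` is a hypothetical helper; the `sleep` parameter exists only so the delays can be checked without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 10% jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("attempts must be >= 1")
```

In the script it could wrap the existing call together with an explicit timeout, e.g. `with_retries(lambda: requests.post(url, headers=headers, json=payload, timeout=60), retry_on=(requests.ConnectionError, requests.Timeout))`. HTTP error statuses raised later by `raise_for_status()` are deliberately not retried in this sketch.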


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (7)
scripts/get_manifest.py (5)

80-96: Add HTTP timeout to avoid hanging on network calls

Requests without timeouts can hang indefinitely on network issues.

Apply this diff:

-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)

116-150: Standardize validator request to string content for better API compatibility

Mirror the earlier change: use a plain string for message content and read model from ANYON_MODEL.

Apply this diff:

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set, skipping validation")
+        return manifest
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
 
 Repository: {repo_url}
 
 Current manifest installations:
 {json.dumps(current_installations, indent=2)}
 
 IMPORTANT INSTRUCTIONS:
 1. Access the README.md from the repository URL: {repo_url}
 2. Compare the current installations against the exact commands and configurations shown in the README.md
 3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
 4. Pay special attention to:
    - Exact command names (npx, uvx, docker, python, etc.)
    - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
    - Proper environment variable names and formats
    - Installation type matching the command used
 5. Fix any discrepancies between the manifest and the README
 6. Return ONLY a valid JSON object with the corrected installations field
 7. The response should be in this exact format: {{"installations": {{...}}}}
 
 Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                    }
-                ]
-            }
-        ]
-    }
+            }
+        ]
+    }
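The same payload shape is built twice in the script, once in `generate_manifest` and once in `validate_installations`; a small shared builder keeps them in sync. A sketch assuming the `ANYON_MODEL` variable proposed here; `build_chat_payload` is a hypothetical name, and unlike the diff it raises when no model is configured, leaving the caller to decide whether to skip validation or abort.

```python
import os
from typing import Any, Dict, Optional


def build_chat_payload(prompt: str, model: Optional[str] = None) -> Dict[str, Any]:
    """Build an OpenAI-style chat payload with plain-string message content."""
    # Explicit argument first, then ANYON_MODEL; no hard-coded "x" fallback.
    model = model or os.getenv("ANYON_MODEL")
    if not model:
        raise RuntimeError("set ANYON_MODEL or pass model explicitly")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
```

If this is adopted, the workflow's `Generate manifest` step would also need `ANYON_MODEL` in its `env`.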

151-171: Add timeout and narrow exception handling during validation call

Better to set a timeout and catch request-specific failures without masking other errors.

Apply this diff:

-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)
         response.raise_for_status()
@@
-    except Exception as e:
-        print(f"Error validating installations: {e}")
-        return manifest
+    except requests.RequestException as e:
+        print(f"Error validating installations (network): {e}")
+        return manifest
+    except (KeyError, TypeError, ValueError) as e:
+        print(f"Error validating installations (response parsing): {e}")
+        return manifest

175-183: Anchor output path to repo root to avoid CWD surprises

Writing relative to CWD may deposit files in unexpected locations if the script is run from another directory. Resolve path relative to the repository root (parent of scripts/).

Apply this diff:

-    # Create directory if it doesn't exist
-    servers_dir = Path("mcp-registry/servers")
+    # Create directory if it doesn't exist (relative to repo root)
+    script_dir = Path(__file__).parent
+    repo_root = script_dir.parent
+    servers_dir = repo_root / "mcp-registry" / "servers"
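Passing the directory in, rather than computing `Path(__file__)` inside `save_manifest`, keeps the root anchoring testable. A sketch under the assumption that the script stays at `<repo root>/scripts/`; `servers_dir_for` and `save_manifest_at` are hypothetical names, not the PR's `save_manifest(manifest, repo_url)`.

```python
import json
from pathlib import Path
from typing import Any, Dict


def servers_dir_for(script_path: Path) -> Path:
    """<repo root>/mcp-registry/servers, assuming script_path is <repo root>/scripts/<file>."""
    return script_path.resolve().parent.parent / "mcp-registry" / "servers"


def save_manifest_at(servers_dir: Path, repo_name: str, manifest: Dict[str, Any]) -> Path:
    """Write servers_dir/<repo_name>.json (creating the directory) and return its path."""
    servers_dir.mkdir(parents=True, exist_ok=True)
    path = servers_dir / f"{repo_name}.json"
    # ensure_ascii=False keeps non-ASCII text readable; the trailing newline keeps git diffs clean.
    path.write_text(json.dumps(manifest, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path
```

In the script, the call would look like `save_manifest_at(servers_dir_for(Path(__file__)), get_repo_name_from_url(repo_url), manifest)`.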

196-223: Run ruff formatting and optionally add unit tests for parsing helpers

  • Please run ruff (per guidelines) to ensure consistent formatting.
  • Consider unit tests for extract_json_from_content and get_repo_name_from_url (edge cases: SSH URLs, CRLF code fences, multi-part content).

I can generate targeted tests for these helpers if you want.
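A starting point for those tests, using only `unittest`. To keep the block runnable on its own, it inlines a copy of the PR's original `get_repo_name_from_url` as quoted in the qodo-merge-pro suggestion; in the repository the tests would import it from `scripts/get_manifest.py` instead. The test file can be run with `python -m unittest` followed by its path.

```python
import re
import unittest


def get_repo_name_from_url(repo_url: str) -> str:
    """Copy of the PR's helper, inlined so these tests run standalone."""
    if repo_url.endswith(".git"):
        repo_url = repo_url[:-4]
    match = re.search(r"github\.com[:/]([^/]+/[^/]+)", repo_url)
    if match:
        return match.group(1).replace("/", "-")
    return repo_url.split("/")[-1]


class GetRepoNameFromUrlTests(unittest.TestCase):
    def test_https_url(self):
        self.assertEqual(get_repo_name_from_url("https://github.com/owner/repo"), "owner-repo")

    def test_git_suffix_is_dropped(self):
        self.assertEqual(get_repo_name_from_url("https://github.com/owner/repo.git"), "owner-repo")

    def test_ssh_url(self):
        self.assertEqual(get_repo_name_from_url("git@github.com:owner/repo.git"), "owner-repo")

    def test_deep_link_keeps_owner_and_repo(self):
        self.assertEqual(get_repo_name_from_url("https://github.com/owner/repo/tree/main"), "owner-repo")

    def test_non_github_falls_back_to_last_segment(self):
        self.assertEqual(get_repo_name_from_url("https://gitlab.com/owner/repo"), "repo")
```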

.github/workflows/generate-manifest.yml (2)

29-33: Install jsonschema to enable schema validation step (if added)

If we add a schema validation step, ensure the dependency is present.

Apply this diff:

       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install requests
+          pip install requests jsonschema

34-47: Add a manifest schema validation step to catch issues before opening a PR

Leverage scripts/validate_manifest.py to fail fast on invalid JSON/schema.

Apply this diff to insert a validation step after generation:

       - name: Generate manifest
         env:
           ANYON_API_KEY: ${{ secrets.ANYON_API_KEY }}
         run: |
           python scripts/get_manifest.py "${{ github.event.inputs.repo_url }}"
 
+      - name: Validate manifest schema
+        run: |
+          python scripts/validate_manifest.py
+
       - name: Extract repo name for branch
         id: repo-info
         run: |
           REPO_URL="${{ github.event.inputs.repo_url }}"
           REPO_NAME=$(echo "$REPO_URL" | sed 's/.*github\.com[:/]//' | sed 's/\.git$//' | tr '/' '-')
           echo "repo_name=$REPO_NAME" >> $GITHUB_OUTPUT
           echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT
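To make the branch-naming step easy to unit-test, here is a Python sketch that mirrors the sed/tr pipeline in the "Extract repo name for branch" step one-to-one. branch_repo_name is a hypothetical helper and is not part of the PR:

```python
import re


def branch_repo_name(repo_url: str) -> str:
    """Mirror REPO_NAME=$(... | sed ... | sed ... | tr '/' '-') from the workflow."""
    # sed 's/.*github\.com[:/]//': greedy, so it strips through the last "github.com/" or "github.com:"
    name = re.sub(r".*github\.com[:/]", "", repo_url, count=1)
    # sed 's/\.git$//': drop a trailing ".git"
    name = re.sub(r"\.git$", "", name)
    # tr '/' '-': flatten "owner/repo" into "owner-repo"
    return name.replace("/", "-")


def branch_name(repo_url: str) -> str:
    """Mirror branch_name=add-manifest-$REPO_NAME."""
    return f"add-manifest-{branch_repo_name(repo_url)}"
```

Because the pattern accepts either ":" or "/", SSH URLs such as git@github.com:owner/repo.git work. However, a trailing slash or a /tree/... URL carries straight through into the branch name (for example "owner-repo-"), which is an edge case worth covering in tests.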
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c237095 and 55fa64b.

📒 Files selected for processing (2)
  • .github/workflows/generate-manifest.yml (1 hunks)
  • scripts/get_manifest.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

Always format Python code with ruff.

Files:

  • scripts/get_manifest.py
🧬 Code Graph Analysis (1)
scripts/get_manifest.py (2)
scripts/validate_manifest.py (1)
  • main (64-97)
scripts/categorization.py (1)
  • main (221-224)
🪛 actionlint (1.7.7)
.github/workflows/generate-manifest.yml

25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)
49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🪛 YAMLlint (1.37.1)
.github/workflows/generate-manifest.yml

[error] 54-54: trailing spaces (trailing-spaces)
[error] 56-56: trailing spaces (trailing-spaces)
[error] 58-58: trailing spaces (trailing-spaces)
[error] 63-63: trailing spaces (trailing-spaces)
[error] 65-65: trailing spaces (trailing-spaces)
[error] 67-67: trailing spaces (trailing-spaces)
[error] 70-70: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 76-76: trailing spaces (trailing-spaces)
[error] 78-78: trailing spaces (trailing-spaces)

🔇 Additional comments (2)
scripts/get_manifest.py (1)

65-78: Parameterize model and send string content; confirm ANYON model & API shape

The file scripts/get_manifest.py currently hard-codes the model "x" and sends message content as an array of {type, text} objects. Unless the endpoint accepts that model ID and multi-part content, the request will likely be rejected with HTTP 400 or parsed incorrectly. Parameterize the model (ANYON_MODEL) and send plain string content. Apply the same change in both places that build payloads.

  • Locations to change:
    • scripts/get_manifest.py — generate_manifest(...) payload
    • scripts/get_manifest.py — validate_installations(...) payload

Apply this diff (update both payloads):

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"help me generate manifest json for this repo: {repo_url}"
-                    }
-                ]
-            }
-        ]
-    }
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set")
+        return None
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"Help me generate a valid MCP manifest JSON for this repo: {repo_url}. Return only the JSON object."
+            }
+        ]
+    }

And for validate_installations (replace its similar payload):

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
-
-Repository: {repo_url}
-
-Current manifest installations:
-{json.dumps(current_installations, indent=2)}
-
-IMPORTANT INSTRUCTIONS:
-1. Access the README.md from the repository URL: {repo_url}
-2. Compare the current installations against the exact commands and configurations shown in the README.md
-3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
-4. Pay special attention to:
-   - Exact command names (npx, uvx, docker, python, etc.)
-   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
-   - Proper environment variable names and formats
-   - Installation type matching the command used
-5. Fix any discrepancies between the manifest and the README
-6. Return ONLY a valid JSON object with the corrected installations field
-7. The response should be in this exact format: {{"installations": {{...}}}}
-
-Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                    }
-                ]
-            }
-        ]
-    }
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set, skipping validation")
+        return manifest
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+
+Repository: {repo_url}
+
+Current manifest installations:
+{json.dumps(current_installations, indent=2)}
+
+IMPORTANT INSTRUCTIONS:
+1. Access the README.md from the repository URL: {repo_url}
+2. Compare the current installations against the exact commands and configurations shown in the README.md
+3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
+4. Pay special attention to:
+   - Exact command names (npx, uvx, docker, python, etc.)
+   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
+   - Proper environment variable names and formats
+   - Installation type matching the command used
+5. Fix any discrepancies between the manifest and the README
+6. Return ONLY a valid JSON object with the corrected installations field
+7. The response should be in this exact format: {{"installations": {{...}}}}
+
+Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
+            }
+        ]
+    }

Note: I could not find public API docs for anyon.chatxiv.org; the path used in the script (/api/v1/openai/v1/chat/completions) looks like an OpenAI-compatible chat/completions endpoint, but I could not confirm supported model IDs or any deviations from the OpenAI payload schema. Please confirm the exact model identifier(s) the host accepts (e.g., gpt-4o, gpt-4o-mini, etc.) and whether the endpoint expects standard OpenAI Chat Completions payloads.
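Both generate_manifest(...) and validate_installations(...) need the same model lookup and payload shape, so one option is to move them into a shared helper. The sketch below makes the same unconfirmed assumption as the note above (a standard OpenAI-style Chat Completions body), and build_chat_payload is a hypothetical name:

```python
import os
from typing import Optional


def build_chat_payload(prompt: str) -> Optional[dict]:
    """Build an OpenAI-style Chat Completions payload with plain-string content.

    Hypothetical helper: assumes the endpoint accepts the standard
    {"model", "messages"} body and that the model ID comes from ANYON_MODEL.
    """
    model = os.getenv("ANYON_MODEL")
    if not model:
        print("Error: ANYON_MODEL environment variable not set")
        return None
    return {
        "model": model,
        # A single user message whose content is a plain string, not a list of parts.
        "messages": [{"role": "user", "content": prompt}],
    }
```

Each caller then only decides what to do on None: generate_manifest(...) can abort, and validate_installations(...) can return the manifest unchanged.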

.github/workflows/generate-manifest.yml (1)

36-37: Ensure ANYON_API_KEY secret is configured

The workflow depends on ANYON_API_KEY being available in repository secrets or org-level secrets.

If not already set, add ANYON_API_KEY under Settings → Secrets and variables → Actions.

Comment on lines +25 to +27
uses: actions/setup-python@v4
with:
python-version: '3.11'

⚠️ Potential issue

Update setup-python action to v5 (Node16 deprecation fix)

actionlint flagged v4 as too old; v5 is required on modern runners.

Apply this diff:

-      - name: Set up Python
-        uses: actions/setup-python@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
           python-version: '3.11'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uses: actions/setup-python@v4
with:
python-version: '3.11'
uses: actions/setup-python@v5
with:
python-version: '3.11'
🧰 Tools
🪛 actionlint (1.7.7)

25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🤖 Prompt for AI Agents
In .github/workflows/generate-manifest.yml around lines 25 to 27, update the
setup-python GitHub Action version from v4 to v5 to address actionlint/runner
deprecation; change the uses line to use actions/setup-python@v5 and keep the
python-version input unchanged so the workflow runs on the modern runner without
Node16-related warnings.

echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT
- name: Create Pull Request
uses: peter-evans/create-pull-request@v5

⚠️ Potential issue

Bump create-pull-request action to v6 (Node16 deprecation fix)

actionlint flagged v5 as too old; v6 uses the supported runtime.

Apply this diff:

-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v5
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v6
📝 Committable suggestion

Suggested change
uses: peter-evans/create-pull-request@v5
uses: peter-evans/create-pull-request@v6
🧰 Tools
🪛 actionlint (1.7.7)

49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🤖 Prompt for AI Agents
In .github/workflows/generate-manifest.yml at line 49, the workflow uses
peter-evans/create-pull-request@v5 which is flagged as outdated; update the
action reference to peter-evans/create-pull-request@v6 to use the supported
Node 20 runtime and resolve the actionlint warning, then commit the change to the
workflow file.

Comment on lines +52 to +60
commit-message: |
feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
title: "feat: Add MCP manifest for ${{ steps.repo-info.outputs.repo_name }}"

⚠️ Potential issue

Remove trailing spaces in commit-message block (yamllint errors)

YAMLlint reports trailing spaces. Remove them to unblock CI.

Apply this diff:

           commit-message: |
             feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
-            
+
             Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
-            
+
             🤖 Generated with [Claude Code](https://claude.ai/code)
-            
+
             Co-Authored-By: Claude <[email protected]>
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 54-54: trailing spaces (trailing-spaces)
[error] 56-56: trailing spaces (trailing-spaces)
[error] 58-58: trailing spaces (trailing-spaces)

🤖 Prompt for AI Agents
.github/workflows/generate-manifest.yml around lines 52 to 60: the
commit-message block contains trailing spaces at the ends of lines causing
yamllint errors; remove all trailing whitespace characters from each line in
that multi-line scalar (including the blank line after the second line and any
spaces at the end of the emoji/Co-Authored lines), save the file, and rerun the
linter/CI to confirm the YAML no longer reports trailing-space issues.
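For a quick local cleanup before rerunning yamllint, here is a small Python sketch that strips trailing spaces and tabs and reports which 1-based lines changed. It assumes LF line endings and Python 3.9+ (the workflow pins 3.11), and strip_trailing_whitespace is a hypothetical helper:

```python
def strip_trailing_whitespace(text: str) -> tuple[str, list[int]]:
    """Remove trailing spaces/tabs from every line.

    Returns the cleaned text plus the 1-based numbers of the lines that changed,
    roughly the lines a trailing-spaces lint rule would flag.
    """
    cleaned_lines: list[str] = []
    changed: list[int] = []
    for number, line in enumerate(text.split("\n"), start=1):
        stripped = line.rstrip(" \t")  # blank lines made of spaces become truly empty
        if stripped != line:
            changed.append(number)
        cleaned_lines.append(stripped)
    return "\n".join(cleaned_lines), changed
```

Applied to .github/workflows/generate-manifest.yml, you would read the file with Path.read_text(), write the cleaned text back, and compare the returned line numbers with the yamllint report. The blank separator lines inside the block scalars stay in place and simply lose their spaces.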

Comment on lines +61 to +80
body: |
## Summary
This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
## Changes
- Added new manifest JSON file in `mcp-registry/servers/`
- Manifest was automatically generated using the chatxiv.org API
## Test plan
- [ ] Verify the generated JSON is valid
- [ ] Check that all required fields are present
- [ ] Validate installation instructions work correctly
---
🤖 Generated with [Claude Code](https://claude.ai/code)
branch: ${{ steps.repo-info.outputs.branch_name }}

⚠️ Potential issue

Remove trailing spaces in PR body (yamllint errors)

Clean up blank lines with trailing whitespace.

Apply this diff:

           body: |
             ## Summary
-            
+
             This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
-            
+
             ## Changes
-            
+
             - Added new manifest JSON file in `mcp-registry/servers/`
             - Manifest was automatically generated using the chatxiv.org API
-            
+
             ## Test plan
-            
+
             - [ ] Verify the generated JSON is valid
             - [ ] Check that all required fields are present
             - [ ] Validate installation instructions work correctly
-            
+
             ---
-            
+
             🤖 Generated with [Claude Code](https://claude.ai/code)
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 63-63: trailing spaces (trailing-spaces)
[error] 65-65: trailing spaces (trailing-spaces)
[error] 67-67: trailing spaces (trailing-spaces)
[error] 70-70: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 76-76: trailing spaces (trailing-spaces)
[error] 78-78: trailing spaces (trailing-spaces)

🤖 Prompt for AI Agents
.github/workflows/generate-manifest.yml around lines 61 to 80: the PR body
contains blank lines with trailing whitespace, which cause yamllint errors;
remove the trailing spaces at the ends of lines in the multiline body block
(preserve content, indentation, and the blank separator lines, but ensure
no line ends with spaces).

Comment on lines +18 to +34
def extract_json_from_content(content: str) -> Optional[dict]:
    """Extract JSON from the API response content."""
    # Look for JSON code block
    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
    if json_match:
        try:
            return json.loads(json_match.group(1))
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
            return None

    # Try to find JSON without code block markers
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        print(f"Could not extract valid JSON from response: {content}")
        return None

⚠️ Potential issue

Harden JSON extraction: support list-based content, CRLF fences, and safer logging

APIs often return message content as an array of parts; passing a list to re.search will crash. The regex is also strict on LF only. This patch supports array content, tolerates CRLF/code-fence variants, and avoids dumping the full response on failures.

Apply this diff:

-def extract_json_from_content(content: str) -> Optional[dict]:
-    """Extract JSON from the API response content."""
-    # Look for JSON code block
-    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
-    if json_match:
-        try:
-            return json.loads(json_match.group(1))
-        except json.JSONDecodeError as e:
-            print(f"Error parsing JSON: {e}")
-            return None
-    
-    # Try to find JSON without code block markers
-    try:
-        return json.loads(content)
-    except json.JSONDecodeError:
-        print(f"Could not extract valid JSON from response: {content}")
-        return None
+def extract_json_from_content(content: str) -> Optional[dict]:
+    """Extract JSON from the API response content."""
+    # Coerce array-of-content-parts into a single string (Anthropic/OpenAI multi-part)
+    if not isinstance(content, str):
+        try:
+            content = "".join(
+                part.get("text", "") if isinstance(part, dict) else str(part)
+                for part in content
+            )
+        except Exception:
+            print("Unexpected content type in API response; cannot parse JSON.")
+            return None
+
+    # Look for JSON code block (tolerate CRLF and language hints)
+    json_match = re.search(r'```json[^\n]*\r?\n(.*?)\r?\n```', content, re.DOTALL | re.IGNORECASE)
+    if not json_match:
+        json_match = re.search(r'```\s*[\w-]*\s*\r?\n(.*?)\r?\n```', content, re.DOTALL)
+    if json_match:
+        try:
+            return json.loads(json_match.group(1))
+        except json.JSONDecodeError as e:
+            print(f"Error parsing JSON from fenced block: {e}")
+            # fall through to raw parsing
+
+    # Try to parse raw JSON
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        preview = content[:500].replace("\n", " ")
+        print(f"Could not extract valid JSON from response. Preview: {preview}...")
+        return None
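The edge cases listed for unit tests (CRLF fences, multi-part content, bare JSON) can be checked against a compact, self-contained restatement of the patched function. This sketch keeps the diff's two regexes, builds the three-backtick marker at runtime, and drops the print-based logging. It is an illustration and not necessarily the code that was merged:

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # three backticks, built at runtime

# Same patterns as the suggested diff: a json-tagged fence first, then any fenced block.
JSON_FENCE = re.compile(FENCE + r"json[^\n]*\r?\n(.*?)\r?\n" + FENCE, re.DOTALL | re.IGNORECASE)
ANY_FENCE = re.compile(FENCE + r"\s*[\w-]*\s*\r?\n(.*?)\r?\n" + FENCE, re.DOTALL)


def extract_json_from_content(content: Any) -> Optional[dict]:
    # Join multi-part content ([{"type": "text", "text": ...}, ...]) into one string.
    if not isinstance(content, str):
        try:
            content = "".join(
                part.get("text", "") if isinstance(part, dict) else str(part)
                for part in content
            )
        except TypeError:  # e.g. content is None
            return None
    match = JSON_FENCE.search(content) or ANY_FENCE.search(content)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass  # fall through to parsing the whole string
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None
```

Joining the parts before searching matters because a model can split a fenced block across parts, so neither part contains a complete fence on its own.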

@GabrielDrapor GabrielDrapor merged commit ec67763 into main Aug 15, 2025
9 of 11 checks passed
@GabrielDrapor GabrielDrapor deleted the Jiarui/smart-registry-workflow branch August 15, 2025 06:39
mcpm-semantic-release bot pushed a commit that referenced this pull request Aug 15, 2025
# [2.7.0](v2.6.1...v2.7.0) (2025-08-15)

### Features

* add script and workflow for contributing registry ([#233](#233)) ([ec67763](ec67763))
@mcpm-semantic-release

🎉 This PR is included in version 2.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
