feat: add script and workflow for contributing registry #233

GabrielDrapor · 2025-08-15T06:27:22Z

PR Type

Enhancement

Description

Replace complex LLM-based manifest generation with API-based approach
Add GitHub workflow for automated manifest generation via PR
Simplify script to use chatxiv.org API instead of local processing
Add validation step to correct installations against README

Diagram Walkthrough

flowchart LR
  A["Repository URL"] --> B["API Call"]
  B --> C["Generate Manifest"]
  C --> D["Validate Installations"]
  D --> E["Save to Registry"]
  F["GitHub Workflow"] --> G["Create PR"]
  E --> G

File Walkthrough

Relevant files

Enhancement

scripts/get_manifest.py (+202/-879): Complete rewrite using API-based approach
- Replaced 900+ line complex LLM-based generator with 200+ line API client
- Added `extract_json_from_content()` for parsing API responses
- Implemented `validate_installations()` to verify against README
- Simplified manifest generation using chatxiv.org API

.github/workflows/generate-manifest.yml (+81/-0): New GitHub workflow for manifest generation
- Added workflow for automated manifest generation
- Includes manual trigger with repository URL input
- Creates PR with generated manifest automatically
- Sets up Python environment and API authentication

Summary by CodeRabbit

New Features
- Added a manual GitHub Actions workflow to generate an MCP manifest for a given repository and automatically open a pull request with the result.
Refactor
- Streamlined the manifest generation tool to use an external API, reducing complexity and dependencies.
- Simplified command-line usage and output handling for faster, more reliable manifest creation.
Chores
- Standardized environment variable usage for API access.
- Improved status messaging during manifest generation and validation.

coderabbitai · 2025-08-15T06:27:27Z

Walkthrough

Introduces a GitHub Actions workflow to generate an MCP manifest for a given repository and open a PR. Replaces the previous LLM-heavy manifest script with a simplified, external API-driven implementation featuring JSON parsing, repository name derivation, API calls for generation/validation, and file output.

Changes

Cohort / File(s)	Summary
CI workflow for manifest PRs `.github/workflows/generate-manifest.yml`	Adds a manually triggered workflow (workflow_dispatch) taking repo_url, setting up Python 3.11, running scripts/get_manifest.py with ANYON_API_KEY, deriving repo/branch names, and creating a PR via peter-evans/create-pull-request@v5.
Manifest generator refactor `scripts/get_manifest.py`	Replaces class-based LLM pipeline with functions: extract_json_from_content, get_repo_name_from_url, generate_manifest (ANYON API), validate_installations (API recheck), save_manifest (writes to mcp-registry/servers), and a CLI main(). Removes dotenv/logging/LLM logic.

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant GitHub Actions as Workflow
  participant Repo as Target Repo
  participant Script as get_manifest.py
  participant ANYON as External API
  participant PR as create-pull-request

  User->>Workflow: workflow_dispatch (repo_url)
  Workflow->>Repo: actions/checkout
  Workflow->>Script: python scripts/get_manifest.py --repo_url
  Script->>ANYON: generate_manifest(repo_url)
  ANYON-->>Script: manifest content
  Script->>ANYON: validate_installations(manifest, repo_url)
  ANYON-->>Script: validated manifest
  Script->>Repo: write mcp-registry/servers/<repo>.json
  Workflow->>PR: create pull request (branch add-manifest-<repo>)

sequenceDiagram
  participant Main as main()
  participant Gen as generate_manifest()
  participant Val as validate_installations()
  participant FS as Filesystem
  participant API as ANYON API

  Main->>Gen: repo_url
  Gen->>API: request manifest
  API-->>Gen: content (possibly JSON in code block)
  Gen-->>Main: manifest dict or None
  Main->>Val: manifest, repo_url
  Val->>API: request installation validation
  API-->>Val: updated/confirmed manifest
  Val-->>Main: validated manifest
  Main->>FS: save_manifest(manifest)
  FS-->>Main: success/failure

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I thump my paws—new scripts arise,
A workflow hums beneath the skies.
Manifests bloom from API light,
Branches sprout and PRs take flight.
In tidy burrows, files now rest—
A rabbit nods: “Refactor, manifest!” 🐇✨

qodo-merge-pro · 2025-08-15T06:27:48Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 Security concerns

External API key handling: The script and workflow depend on `ANYON_API_KEY`. Ensure the secret is scoped, not exposed in logs, and avoid printing full API errors that might leak response details. Also, the user-supplied `repo_url` is interpolated into prompts and filenames; validate it is a GitHub URL and sanitize to prevent malicious input influencing paths or PR metadata.

⚡ Recommended focus areas for review

Possible Issue

The API response parsing assumes an OpenAI-compatible shape and single content field; if the provider returns tool messages, array content parts, or different keys, `data["choices"][0]["message"]["content"]` may fail. Consider defensive checks and supporting content-as-list.

        data = response.json()
        content = data["choices"][0]["message"]["content"]

        return extract_json_from_content(content)

    except requests.RequestException as e:
        print(f"API request failed: {e}")
        return None
    except (KeyError, IndexError) as e:
        print(f"Unexpected API response format: {e}")
        return None

Robustness

`extract_json_from_content` only matches fenced blocks labeled json with exact backticks and newlines; responses with ```JSON, missing trailing newline, or extra prose will fail. Broaden regex and add fallback to strip code fences and attempt tolerant JSON parsing.

    def extract_json_from_content(content: str) -> Optional[dict]:
        """Extract JSON from the API response content."""
        # Look for JSON code block
        json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
        if json_match:
            try:
                return json.loads(json_match.group(1))
            except json.JSONDecodeError as e:
                print(f"Error parsing JSON: {e}")
                return None

        # Try to find JSON without code block markers
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            print(f"Could not extract valid JSON from response: {content}")
            return None

Workflow Safety

The workflow commits directly to a branch with write permissions and runs on user-provided URLs without validation. Add basic URL validation/sanitization and consider restricting to GitHub repos to avoid abuse or path traversal in file naming.

      generate-manifest:
        runs-on: ubuntu-latest
        permissions:
          contents: write
          pull-requests: write

        steps:
          - name: Checkout repository
            uses: actions/checkout@v4
            with:
              token: ${{ secrets.GITHUB_TOKEN }}

          - name: Set up Python
            uses: actions/setup-python@v4
            with:
              python-version: '3.11'

          - name: Install dependencies
            run: |
              python -m pip install --upgrade pip
              pip install requests

          - name: Generate manifest
            env:
              ANYON_API_KEY: ${{ secrets.ANYON_API_KEY }}
            run: |
              python scripts/get_manifest.py "${{ github.event.inputs.repo_url }}"

          - name: Extract repo name for branch
            id: repo-info
            run: |
              REPO_URL="${{ github.event.inputs.repo_url }}"
              REPO_NAME=$(echo "$REPO_URL" | sed 's/.*github\.com[:/]//' | sed 's/\.git$//' | tr '/' '-')
              echo "repo_name=$REPO_NAME" >> $GITHUB_OUTPUT
              echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT

          - name: Create Pull Request
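
The reviewer guide above asks that `repo_url` be validated as a GitHub URL before it reaches prompts, filenames, or PR metadata. A minimal sketch of such a check follows. The function name `parse_github_url` and the accepted URL shapes (HTTPS and SSH, `owner/repo` only) are assumptions for illustration and are not part of the PR.

```python
import re
from typing import Optional, Tuple

# Assumption: only https://github.com/owner/repo and git@github.com:owner/repo
# forms are accepted, with an optional ".git" suffix and trailing slash.
_GITHUB_URL = re.compile(
    r"^(?:https://github\.com/|git@github\.com:)"
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9-]*)/"
    r"(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def parse_github_url(repo_url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) for an accepted GitHub URL, otherwise None."""
    match = _GITHUB_URL.match(repo_url.strip())
    if not match:
        return None
    owner, repo = match.group("owner"), match.group("repo")
    # Reject names that would act as path components when used in a filename.
    if repo in {".", ".."}:
        return None
    return owner, repo
```

A caller could reject the input early (exit non-zero) when this returns `None`, so that nothing derived from an unvalidated URL is written to disk or used as a branch name.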

qodo-merge-pro · 2025-08-15T06:29:15Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue (Impact: Medium)

Normalize API response content

Add defensive parsing for API responses that may return a list of content parts or dicts, not just a plain string. Normalize `content` into a string before passing to `extract_json_from_content` to avoid KeyError/TypeError when providers return structured content.

scripts/get_manifest.py [85-88]

-url = "https://anyon.chatxiv.org/api/v1/openai/v1/chat/completions"
-headers = {
-    "Authorization": f"Bearer {api_key}",
-    "Content-Type": "application/json"
-}
+...
+data = response.json()
+content = data.get("choices", [{}])[0].get("message", {}).get("content")
+# Normalize content to string
+if isinstance(content, list):
+    # extract text fields and join
+    parts = []
+    for c in content:
+        if isinstance(c, dict) and c.get("type") == "text" and isinstance(c.get("text"), str):
+            parts.append(c["text"])
+        elif isinstance(c, str):
+            parts.append(c)
+    content = "\n".join(parts)
+elif isinstance(content, dict):
+    # some providers wrap content differently
+    content = content.get("text") or ""
+if not isinstance(content, str):
+    raise ValueError(f"Unexpected API content format: {type(content)}")
+return extract_json_from_content(content)
 
-payload = {
-    "model": "x",
-    "messages": [
-        {
-            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": f"help me generate manifest json for this repo: {repo_url}"
-                }
-            ]
-        }
-    ]
-}
-
-try:
-    print(f"Generating manifest for {repo_url}...")
-    response = requests.post(url, headers=headers, json=payload)
-    response.raise_for_status()
-
-    data = response.json()
-    content = data["choices"][0]["message"]["content"]
-
-    return extract_json_from_content(content)
-

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that `data["choices"][0]["message"]["content"]` is brittle and could fail. It proposes a more robust way to parse the API response, handling different possible content structures and making the script more resilient to API variations.
Possible issue (Impact: Low)

Validate installations type before applying

Guard against non-dict `manifest["installations"]` and ensure validated installations are a dict before assignment. This prevents type mismatches later when saving or consuming the manifest.

scripts/get_manifest.py [151-170]

 current_installations = manifest.get("installations", {})
+if not isinstance(current_installations, (dict, list)):
+    current_installations = {}
+...
+data = response.json()
+content = data.get("choices", [{}])[0].get("message", {}).get("content")
+# normalize content as in generate_manifest
+if isinstance(content, list):
+    parts = []
+    for c in content:
+        if isinstance(c, dict) and c.get("type") == "text" and isinstance(c.get("text"), str):
+            parts.append(c["text"])
+        elif isinstance(c, str):
+            parts.append(c)
+    content = "\n".join(parts)
+elif isinstance(content, dict):
+    content = content.get("text") or ""
+validated_data = extract_json_from_content(content or "")
+if isinstance(validated_data, dict) and isinstance(validated_data.get("installations"), dict):
+    print("✓ Installations validated and corrected")
+    manifest["installations"] = validated_data["installations"]
+else:
+    print("⚠ Validation failed or wrong format, keeping original installations")
+return manifest
 
-payload = {
-    "model": "x",
-    "messages": [
-        {
-            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
-
-Repository: {repo_url}
-
-Current manifest installations:
-{json.dumps(current_installations, indent=2)}
-
-...
-7. The response should be in this exact format: {{"installations": {{...}}}}
-
-Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                }
-            ]
-        }
-    ]
-}
-
-try:
-    print("Validating installations against README...")
-    response = requests.post(url, headers=headers, json=payload)
-    response.raise_for_status()
-
-    data = response.json()
-    content = data["choices"][0]["message"]["content"]
-
-    validated_data = extract_json_from_content(content)
-    if validated_data and "installations" in validated_data:
-        print("✓ Installations validated and corrected")
-        manifest["installations"] = validated_data["installations"]
-        return manifest
-    else:
-        print("⚠ Validation failed, keeping original installations")
-        return manifest
-

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 6

Why: The suggestion correctly points out that the type of `validated_data` and `validated_data["installations"]` should be checked before assignment to prevent potential `TypeError` exceptions, improving the script's robustness.
General (Impact: Medium)

Harden JSON extraction heuristics

Make JSON extraction more resilient by trimming surrounding whitespace and allowing for code fences with optional language and varying newlines. This avoids false negatives when the API formats code blocks differently.

scripts/get_manifest.py [21-34]

-json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
+json_match = re.search(r'```(?:json)?\s*(.*?)\s*```', content, re.DOTALL | re.IGNORECASE)
 if json_match:
+    block = json_match.group(1).strip()
     try:
-        return json.loads(json_match.group(1))
+        return json.loads(block)
     except json.JSONDecodeError as e:
-        print(f"Error parsing JSON: {e}")
-        return None
-
-# Try to find JSON without code block markers
+        print(f"Error parsing JSON from fenced block: {e}")
+        # fall through to try raw content
+# Try to find JSON object/array within the text
+content_stripped = content.strip()
+# Heuristic: find first '{' or '[' and last matching bracket
+start = min((i for i in [content_stripped.find("{"), content_stripped.find("[")] if i != -1), default=-1)
+if start != -1:
+    candidate = content_stripped[start:]
+    try:
+        return json.loads(candidate)
+    except json.JSONDecodeError:
+        pass
 try:
-    return json.loads(content)
+    return json.loads(content_stripped)
 except json.JSONDecodeError:
-    print(f"Could not extract valid JSON from response: {content}")
+    print("Could not extract valid JSON from response.")
     return None

Suggestion importance[1-10]: 7

Why: The suggestion improves the `extract_json_from_content` function by making the regex for code blocks more flexible and adding better heuristics for finding JSON in raw text, which increases the likelihood of successfully parsing the API's response.
Organization best practice (Impact: Low)

Migrate argparse to Click with help option

Replace argparse with Click to align with our CLI standards and provide consistent help behavior. Add @click.help_option("-h", "--help") and enhance the docstring with a brief example using a backslash-escaped block.

scripts/get_manifest.py [196-223]

-def main():
-    parser = argparse.ArgumentParser(description="Generate MCP manifest JSON from repository URL")
-    parser.add_argument("repo_url", help="Repository URL to generate manifest for")
+import click
+
+@click.command()
+@click.help_option("-h", "--help")
+@click.argument("repo_url")
+def main(repo_url: str):
+    """Generate MCP manifest JSON from repository URL.
 
-    args = parser.parse_args()
+    Example:
+    \b
+    scripts/get_manifest.py https://github.com/owner/repo
+    """
 
     # Step 1: Generate initial manifest
     print("Step 1: Generating initial manifest...")
+    manifest = generate_manifest(repo_url)
+    if not manifest:
+        print("Failed to generate manifest")
+        sys.exit(1)
+
+    # Step 2: Validate and correct installations
+    print("Step 2: Validating installations against README...")
+    manifest = validate_installations(manifest, repo_url)
+
+    # Step 3: Save manifest
+    print("Step 3: Saving manifest...")
+    if not save_manifest(manifest, repo_url):
+        print("Failed to save manifest")
+        sys.exit(1)
+
+    print("✓ Manifest generation completed successfully!")
 
+if __name__ == "__main__":
+    main()
+

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 6

Why: Relevant best practice - When implementing command-line interfaces with Click, use consistent help option patterns and provide clear, structured help text with examples. Include both short (-h) and long (--help) options, and format examples using backslash-escaped blocks for proper display.
Security (Impact: High)

Sanitize output filename safely

Sanitize the derived filename to remove characters that are invalid or risky on filesystems. This prevents path traversal or write failures when unusual repo names or inputs are provided.

scripts/get_manifest.py [37-49]

 def get_repo_name_from_url(repo_url: str) -> str:
-    """Extract repository name from URL for filename."""
+    """Extract a safe repository name from URL for filename."""
     # Remove .git suffix if present
-    if repo_url.endswith('.git'):
+    if repo_url.endswith(".git"):
         repo_url = repo_url[:-4]
-
     # Extract owner/repo from URL
     match = re.search(r'github\.com[:/]([^/]+/[^/]+)', repo_url)
     if match:
-        return match.group(1).replace('/', '-')
-
-    # Fallback to last part of URL
-    return repo_url.split('/')[-1]
+        candidate = match.group(1).replace("/", "-")
+    else:
+        candidate = repo_url.split("/")[-1]
+    # Sanitize filename: allow alphanum, dash, underscore, dot; replace others with '-'
+    safe = re.sub(r"[^A-Za-z0-9._-]", "-", candidate)
+    # Collapse repeated dashes and trim
+    safe = re.sub(r"-{2,}", "-", safe).strip("-")
+    return safe or "manifest"

Suggestion importance[1-10]: 9

Why: This suggestion addresses a critical path traversal vulnerability by properly sanitizing the filename derived from the user-provided `repo_url`, preventing malicious file writes.
Security (Impact: Medium)

Prevent sensitive content leakage

Avoid printing full API content on JSON parse failures as it can leak secrets and blow up logs. Log a concise error and return None, optionally truncating content. This prevents accidental exposure of repository data or tokens embedded in responses.

scripts/get_manifest.py [18-34]

 def extract_json_from_content(content: str) -> Optional[dict]:
     """Extract JSON from the API response content."""
     # Look for JSON code block
-    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
+    json_match = re.search(r'```json\s*(.*?)\s*```', content, re.DOTALL)
     if json_match:
         try:
             return json.loads(json_match.group(1))
         except json.JSONDecodeError as e:
-            print(f"Error parsing JSON: {e}")
+            print(f"Error parsing JSON from fenced block: {e}")
             return None
-
     # Try to find JSON without code block markers
     try:
         return json.loads(content)
-    except json.JSONDecodeError:
-        print(f"Could not extract valid JSON from response: {content}")
+    except json.JSONDecodeError as e:
+        preview = (content[:300] + "...") if isinstance(content, str) and len(content) > 300 else content
+        print(f"Could not extract valid JSON from response. Error: {e}. Preview: {preview}")
         return None

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a potential information disclosure vulnerability by preventing the full API response from being logged, which is a good security practice.
Organization best practice (Impact: Low)

Send status logs to stderr

Route informational/status messages to stderr so stdout remains clean for data (e.g., the manifest JSON) if needed. Use sys.stderr.write or logging for user-facing progress updates.

scripts/get_manifest.py [81-219]

-print(f"Generating manifest for {repo_url}...")
+sys.stderr.write(f"Generating manifest for {repo_url}...\n")
 ...
-print("Validating installations against README...")
+sys.stderr.write("Validating installations against README...\n")
 ...
-print("✓ Installations validated and corrected")
+sys.stderr.write("✓ Installations validated and corrected\n")
 ...
-print("⚠ Validation failed, keeping original installations")
+sys.stderr.write("⚠ Validation failed, keeping original installations\n")
 ...
-print("Step 1: Generating initial manifest...")
+sys.stderr.write("Step 1: Generating initial manifest...\n")
 ...
-print("Step 2: Validating installations against README...")
+sys.stderr.write("Step 2: Validating installations against README...\n")
 ...
-print("Step 3: Saving manifest...")
+sys.stderr.write("Step 3: Saving manifest...\n")
 ...
-print("✓ Manifest generation completed successfully!")
+sys.stderr.write("✓ Manifest generation completed successfully!\n")

`[To ensure code accuracy, apply this suggestion manually]`

Suggestion importance[1-10]: 6

Why: Relevant best practice - Prefer stderr when printing application status/log messages from subprocess-like interactions or API-driven workflows, reserving stdout for data output.
General (Impact: Low)

Pin action to stable major

Pin the action to a specific major and minor digest-compatible version to avoid breaking changes from upstream updates. Use v5 for setup-python which is the latest supported major, ensuring a stable CI environment.

.github/workflows/generate-manifest.yml [24-27]

 - name: Set up Python
-  uses: actions/setup-python@v4
+  uses: actions/setup-python@v5
   with:
     python-version: '3.11'

Suggestion importance[1-10]: 5

Why: The suggestion correctly recommends pinning the GitHub Action to a major version (`v5`) for improved stability and to avoid unexpected breaking changes from the `v4` tag.
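
The "Normalize API response content" suggestion in the table above can also be read as a small standalone helper. The sketch below follows the same branching as that suggestion (string, list of parts, dict with a `text` key). The function name `normalize_content` is hypothetical, and the response shapes it handles are the ones the suggestion assumes, not shapes confirmed for the chatxiv.org API.

```python
from typing import Any


def normalize_content(content: Any) -> str:
    """Coerce a chat-completion `content` value into a plain string.

    Assumed shapes (taken from the review suggestion, not from API docs):
    a str, a list of str or {"type": "text", "text": ...} parts, or a dict
    with a "text" key.
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif (
                isinstance(part, dict)
                and part.get("type") == "text"
                and isinstance(part.get("text"), str)
            ):
                parts.append(part["text"])
        return "\n".join(parts)
    if isinstance(content, dict):
        text = content.get("text")
        return text if isinstance(text, str) else ""
    raise ValueError(f"Unexpected API content format: {type(content)}")
```

Keeping the normalization in one function would let both `generate_manifest` and `validate_installations` share it instead of duplicating the branches.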

github-actions · 2025-08-15T06:33:37Z

Summary

Introduces .github/workflows/generate-manifest.yml to let users trigger manifest generation via workflow_dispatch.
Replaces the large, dependency-heavy scripts/get_manifest.py with a lightweight CLI that calls the chatxiv API, validates the installations field, and saves the manifest to mcp-registry/servers/.

Review
Nice improvement—automation is clearer and the script has far fewer external deps. A few quick thoughts:

scripts/get_manifest.py
- Consider adding simple retry/back-off around the requests.post calls to handle transient API/network errors.
- extract_json_from_content assumes triple-backtick JSON blocks; fallback patterns are helpful, but a stricter JSON schema validation step would reduce bad PRs.
- model: "x" is a placeholder—surfacing it as an arg/env var avoids future hard-coding.
.github/workflows/generate-manifest.yml
- The job installs only requests; if the script later grows (e.g., adds jsonschema) remember to update this list.
- The action deletes the branch automatically; good practice, but confirm your repo settings allow force-pushes to temporary branches.
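
The retry/back-off idea from the review above could look roughly like the following. This is a sketch under stated assumptions: `call_with_retries` is a hypothetical helper, not code from the PR, and it is written against a generic callable so it does not depend on any HTTP library.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential back-off."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

In the script this would presumably wrap the existing call, for example `call_with_retries(lambda: requests.post(url, headers=headers, json=payload, timeout=60), retry_on=(requests.RequestException,))`. The `sleep` parameter exists only so the delay can be replaced in tests.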

Overall, the PR is solid and heads in the right direction.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (7)

scripts/get_manifest.py (5)

80-96: Add HTTP timeout to avoid hanging on network calls

Requests without timeouts can hang indefinitely on network issues.

Apply this diff:

-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)

116-150: Standardize validator request to string content for better API compatibility

Mirror the earlier change: use a plain string for message content and read model from ANYON_MODEL.

Apply this diff:

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set, skipping validation")
+        return manifest
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
 
 Repository: {repo_url}
 
 Current manifest installations:
 {json.dumps(current_installations, indent=2)}
 
 IMPORTANT INSTRUCTIONS:
 1. Access the README.md from the repository URL: {repo_url}
 2. Compare the current installations against the exact commands and configurations shown in the README.md
 3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
 4. Pay special attention to:
    - Exact command names (npx, uvx, docker, python, etc.)
    - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
    - Proper environment variable names and formats
    - Installation type matching the command used
 5. Fix any discrepancies between the manifest and the README
 6. Return ONLY a valid JSON object with the corrected installations field
 7. The response should be in this exact format: {{"installations": {{...}}}}
 
 Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                    }
-                ]
-            }
-        ]
-    }
+            }
+        ]
+    }

151-171: Add timeout and narrow exception handling during validation call

Better to set a timeout and catch request-specific failures without masking other errors.

Apply this diff:

-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)
         response.raise_for_status()
@@
-    except Exception as e:
-        print(f"Error validating installations: {e}")
-        return manifest
+    except requests.RequestException as e:
+        print(f"Error validating installations (network): {e}")
+        return manifest
+    except (KeyError, TypeError, ValueError) as e:
+        print(f"Error validating installations (response parsing): {e}")
+        return manifest

175-183: Anchor output path to repo root to avoid CWD surprises

Writing relative to CWD may deposit files in unexpected locations if the script is run from another directory. Resolve path relative to the repository root (parent of scripts/).

Apply this diff:

-    # Create directory if it doesn't exist
-    servers_dir = Path("mcp-registry/servers")
+    # Create directory if it doesn't exist (relative to repo root)
+    script_dir = Path(__file__).parent
+    repo_root = script_dir.parent
+    servers_dir = repo_root / "mcp-registry" / "servers"
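
As a standalone illustration of the path anchoring in the diff above, the sketch below derives the output directory from the script's own location instead of the current working directory. The helper name `servers_dir_for` is hypothetical; the `scripts/` and `mcp-registry/servers` layout is taken from the review comment.

```python
from pathlib import Path


def servers_dir_for(script_path: Path) -> Path:
    """Return <repo root>/mcp-registry/servers for a script in <repo root>/scripts/."""
    # parent of the file is scripts/, and its parent is the repository root
    repo_root = script_path.resolve().parent.parent
    return repo_root / "mcp-registry" / "servers"
```

Inside `scripts/get_manifest.py` this would be called as `servers_dir_for(Path(__file__))`, which gives the same directory regardless of where the script is invoked from.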

196-223: Run ruff formatting and optionally add unit tests for parsing helpers

Please run ruff (per guidelines) to ensure consistent formatting.
Consider unit tests for extract_json_from_content and get_repo_name_from_url (edge cases: SSH URLs, CRLF code fences, multi-part content).

I can generate targeted tests for these helpers if you want.
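
A sketch of what such tests might cover is shown below. `get_repo_name_from_url` is reproduced from the original version quoted in this thread. `extract_json_tolerant` is a hypothetical variant with a relaxed fence pattern, since the original regex does not match CRLF or upper-case fences; it is an illustration, not the function in the PR.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built here to keep this example readable


def get_repo_name_from_url(repo_url: str) -> str:
    """Original helper as quoted in this thread."""
    if repo_url.endswith(".git"):
        repo_url = repo_url[:-4]
    match = re.search(r"github\.com[:/]([^/]+/[^/]+)", repo_url)
    if match:
        return match.group(1).replace("/", "-")
    return repo_url.split("/")[-1]


def extract_json_tolerant(content: str) -> Optional[dict]:
    """Hypothetical variant: optional language tag, any whitespace, any case."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, content, re.DOTALL | re.IGNORECASE)
    candidate = match.group(1) if match else content.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def test_ssh_and_https_urls() -> None:
    assert get_repo_name_from_url("git@github.com:owner/repo.git") == "owner-repo"
    assert get_repo_name_from_url("https://github.com/owner/repo") == "owner-repo"


def test_crlf_and_uppercase_fence() -> None:
    content = FENCE + 'JSON\r\n{"name": "demo"}\r\n' + FENCE
    assert extract_json_tolerant(content) == {"name": "demo"}


def test_unfenced_and_invalid_content() -> None:
    assert extract_json_tolerant('  {"a": 1}  ') == {"a": 1}
    assert extract_json_tolerant("no json here") is None
```

The `test_*` functions follow the naming that pytest collects by default, so the file could be dropped into a tests directory as-is if the project uses pytest.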

.github/workflows/generate-manifest.yml (2)

29-33: Install jsonschema to enable schema validation step (if added)

If we add a schema validation step, ensure the dependency is present.

Apply this diff:

       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install requests
+          pip install requests jsonschema

34-47: Add a manifest schema validation step to catch issues before opening a PR

Leverage scripts/validate_manifest.py to fail fast on invalid JSON/schema.

Apply this diff to insert a validation step after generation:

       - name: Generate manifest
         env:
           ANYON_API_KEY: ${{ secrets.ANYON_API_KEY }}
         run: |
           python scripts/get_manifest.py "${{ github.event.inputs.repo_url }}"
 
+      - name: Validate manifest schema
+        run: |
+          python scripts/validate_manifest.py
+
       - name: Extract repo name for branch
         id: repo-info
         run: |
           REPO_URL="${{ github.event.inputs.repo_url }}"
           REPO_NAME=$(echo "$REPO_URL" | sed 's/.*github\.com[:/]//' | sed 's/\.git$//' | tr '/' '-')
           echo "repo_name=$REPO_NAME" >> $GITHUB_OUTPUT
           echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c237095 and 55fa64b.

📒 Files selected for processing (2)

.github/workflows/generate-manifest.yml (1 hunks)
scripts/get_manifest.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

Always format Python code with ruff.

Files:

scripts/get_manifest.py

🧬 Code Graph Analysis (1)

scripts/get_manifest.py (2)

scripts/validate_manifest.py (1)

main (64-97)

scripts/categorization.py (1)

main (221-224)

🪛 actionlint (1.7.7)

.github/workflows/generate-manifest.yml

25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 YAMLlint (1.37.1)

.github/workflows/generate-manifest.yml

[error] 54-54: trailing spaces (trailing-spaces)
[error] 56-56: trailing spaces (trailing-spaces)
[error] 58-58: trailing spaces (trailing-spaces)
[error] 63-63: trailing spaces (trailing-spaces)
[error] 65-65: trailing spaces (trailing-spaces)
[error] 67-67: trailing spaces (trailing-spaces)
[error] 70-70: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 76-76: trailing spaces (trailing-spaces)
[error] 78-78: trailing spaces (trailing-spaces)

🔇 Additional comments (2)

scripts/get_manifest.py (1)

65-78: Parameterize model and send string content; confirm ANYON model & API shape

scripts/get_manifest.py currently hard-codes the model as "x" and sends message content as an array of {type, text} objects. If the endpoint does not accept that shape, the request will likely fail with HTTP 400 or be parsed incorrectly. Read the model from an ANYON_MODEL environment variable and send plain string content. Apply the same change in both places that build payloads.

Locations to change:
- scripts/get_manifest.py — generate_manifest(...) payload
- scripts/get_manifest.py — validate_installations(...) payload

Apply this diff (update both payloads):

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"help me generate manifest json for this repo: {repo_url}"
-                    }
-                ]
-            }
-        ]
-    }
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set")
+        return None
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"Help me generate a valid MCP manifest JSON for this repo: {repo_url}. Return only the JSON object."
+            }
+        ]
+    }

And for validate_installations (replace its similar payload):

-    payload = {
-        "model": "x",
-        "messages": [
-            {
-                "role": "user",
-                "content": [
-                    {
-                        "type": "text",
-                        "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
-
-Repository: {repo_url}
-
-Current manifest installations:
-{json.dumps(current_installations, indent=2)}
-
-IMPORTANT INSTRUCTIONS:
-1. Access the README.md from the repository URL: {repo_url}
-2. Compare the current installations against the exact commands and configurations shown in the README.md
-3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
-4. Pay special attention to:
-   - Exact command names (npx, uvx, docker, python, etc.)
-   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
-   - Proper environment variable names and formats
-   - Installation type matching the command used
-5. Fix any discrepancies between the manifest and the README
-6. Return ONLY a valid JSON object with the corrected installations field
-7. The response should be in this exact format: {{"installations": {{...}}}}
-
-Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                    }
-                ]
-            }
-        ]
-    }
+    model = os.getenv("ANYON_MODEL")
+    if not model:
+        print("Error: ANYON_MODEL environment variable not set, skipping validation")
+        return manifest
+
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+
+Repository: {repo_url}
+
+Current manifest installations:
+{json.dumps(current_installations, indent=2)}
+
+IMPORTANT INSTRUCTIONS:
+1. Access the README.md from the repository URL: {repo_url}
+2. Compare the current installations against the exact commands and configurations shown in the README.md
+3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
+4. Pay special attention to:
+   - Exact command names (npx, uvx, docker, python, etc.)
+   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
+   - Proper environment variable names and formats
+   - Installation type matching the command used
+5. Fix any discrepancies between the manifest and the README
+6. Return ONLY a valid JSON object with the corrected installations field
+7. The response should be in this exact format: {{"installations": {{...}}}}
+
+Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
+            }
+        ]
+    }

Note: I could not find public API docs for anyon.chatxiv.org; the path used in the script (/api/v1/openai/v1/chat/completions) looks like an OpenAI-compatible chat/completions endpoint, but I could not confirm supported model IDs or any deviations from the OpenAI payload schema. Please confirm the exact model identifier(s) the host accepts (e.g., gpt-4o, gpt-4o-mini, etc.) and whether the endpoint expects standard OpenAI Chat Completions payloads.
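
The payload change suggested above can be factored into a single builder so both call sites stay in sync. This is a sketch under stated assumptions: `build_chat_payload` is a hypothetical helper, `ANYON_MODEL` is the variable name proposed in this comment, and the payload shape follows the OpenAI-style chat format, which has not been confirmed for this endpoint.

```python
import os
from typing import Mapping, Optional


def build_chat_payload(prompt: str, env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Build a chat payload with a configurable model and plain string content."""
    env = os.environ if env is None else env
    # ANYON_MODEL is the variable name proposed in the review, not a documented setting.
    model = env.get("ANYON_MODEL")
    if not model:
        print("Error: ANYON_MODEL environment variable not set")
        return None
    # Content is a plain string rather than a list of {type, text} parts.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


if __name__ == "__main__":
    print(build_chat_payload("Return only the JSON object.", {"ANYON_MODEL": "example-model"}))
```

Passing `env` explicitly keeps the helper testable without touching the real environment.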

.github/workflows/generate-manifest.yml (1)

36-37: Ensure ANYON_API_KEY secret is configured

The workflow depends on ANYON_API_KEY being available in repository secrets or org-level secrets.

If not already set, add ANYON_API_KEY under Settings → Secrets and variables → Actions.
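
A missing secret is easier to diagnose if the script fails fast before making any API call. The following is a minimal sketch, assuming the script reads the key from the `ANYON_API_KEY` environment variable as the workflow's `env:` block suggests; `require_api_key` is a hypothetical helper.

```python
import os
import sys
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or exit with a non-zero status if it is missing."""
    env = os.environ if env is None else env
    key = env.get("ANYON_API_KEY", "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Error: ANYON_API_KEY is not set; add it to the repository's Actions secrets")
    return key


if __name__ == "__main__":
    print("key present:", bool(require_api_key({"ANYON_API_KEY": "dummy"})))
```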

coderabbitai · 2025-08-15T06:39:23Z

.github/workflows/generate-manifest.yml

+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'


⚠️ Potential issue

Update setup-python action to v5 (Node16 deprecation fix)

actionlint flagged v4 as too old; v5 is required on modern runners.

Apply this diff:

-      - name: Set up Python
-        uses: actions/setup-python@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
         with:
           python-version: '3.11'

🧰 Tools

🪛 actionlint (1.7.7)

25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🤖 Prompt for AI Agents

In .github/workflows/generate-manifest.yml around lines 25 to 27, update the setup-python GitHub Action version from v4 to v5 to address actionlint/runner deprecation; change the uses line to use actions/setup-python@v5 and keep the python-version input unchanged so the workflow runs on the modern runner without Node16-related warnings.

coderabbitai · 2025-08-15T06:39:24Z

.github/workflows/generate-manifest.yml

+          echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT
+
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v5


⚠️ Potential issue

Bump create-pull-request action to v6 (Node16 deprecation fix)

actionlint flagged v5 as too old; v6 uses the supported runtime.

Apply this diff:

-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v5
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v6

🧰 Tools

🪛 actionlint (1.7.7)

49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue (action)

🤖 Prompt for AI Agents

In .github/workflows/generate-manifest.yml at line 49, the workflow uses peter-evans/create-pull-request@v5, which is flagged as outdated because it runs on the deprecated Node16 runtime; update the action reference to peter-evans/create-pull-request@v6, which runs on a supported runtime, to resolve the actionlint warning, then commit the change to the workflow file.
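
The two version bumps requested so far could also be checked mechanically. The sketch below is illustrative, not a replacement for actionlint: `outdated_actions` is a hypothetical helper, and the minimum versions in `MIN_MAJOR` are taken only from these two review comments, not from an official list.

```python
import re

# Minimum major versions as requested in the review comments above (assumption, not exhaustive).
MIN_MAJOR = {"actions/setup-python": 5, "peter-evans/create-pull-request": 6}

# Matches lines such as "uses: owner/name@v4" and captures the action and major version.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w.-]+)@v(\d+)", re.MULTILINE)


def outdated_actions(workflow_text: str) -> list[tuple[str, int, int]]:
    """Return (action, found_major, required_major) for each action below its minimum."""
    found = []
    for action, major in USES_RE.findall(workflow_text):
        required = MIN_MAJOR.get(action)
        if required is not None and int(major) < required:
            found.append((action, int(major), required))
    return found


if __name__ == "__main__":
    sample = "steps:\n  - uses: actions/setup-python@v4\n  - uses: peter-evans/create-pull-request@v6\n"
    print(outdated_actions(sample))
```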

coderabbitai · 2025-08-15T06:39:24Z

.github/workflows/generate-manifest.yml

+          commit-message: |
+            feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
+            
+            Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
+            
+            🤖 Generated with [Claude Code](https://claude.ai/code)
+            
+            Co-Authored-By: Claude <[email protected]>
+          title: "feat: Add MCP manifest for ${{ steps.repo-info.outputs.repo_name }}"


⚠️ Potential issue

Remove trailing spaces in commit-message block (yamllint errors)

YAMLlint reports trailing spaces. Remove them to unblock CI.

Apply this diff:

           commit-message: |
             feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
-            
+
             Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
-            
+
             🤖 Generated with [Claude Code](https://claude.ai/code)
-            
+
             Co-Authored-By: Claude <[email protected]>

🧰 Tools

🪛 YAMLlint (1.37.1)

[error] 54-54: trailing spaces (trailing-spaces)
[error] 56-56: trailing spaces (trailing-spaces)
[error] 58-58: trailing spaces (trailing-spaces)

🤖 Prompt for AI Agents

.github/workflows/generate-manifest.yml around lines 52 to 60: the commit-message block contains trailing spaces at the ends of lines causing yamllint errors; remove all trailing whitespace characters from each line in that multi-line scalar (including the blank line after the second line and any spaces at the end of the emoji/Co-Authored lines), save the file, and rerun the linter/CI to confirm the YAML no longer reports trailing-space issues.
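
Trailing whitespace of the kind YAMLlint reports here can be located and removed with a few lines of Python. This is a minimal sketch with hypothetical helper names; it works on plain text and does not parse YAML.

```python
def trailing_whitespace_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping line endings."""
    return "".join(
        # Keep the original line ending, drop only the whitespace before it.
        line.rstrip(" \t\r\n") + line[len(line.rstrip("\r\n")):]
        for line in text.splitlines(keepends=True)
    )


if __name__ == "__main__":
    sample = "commit-message: |\n  feat: add manifest  \n  \n  body\n"
    print(trailing_whitespace_lines(sample))
    print(repr(strip_trailing_whitespace(sample)))
```

Blank lines are preserved; only the spaces and tabs at the end of each line are dropped.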

coderabbitai · 2025-08-15T06:39:24Z

.github/workflows/generate-manifest.yml

+          body: |
+            ## Summary
+            
+            This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
+            
+            ## Changes
+            
+            - Added new manifest JSON file in `mcp-registry/servers/`
+            - Manifest was automatically generated using the chatxiv.org API
+            
+            ## Test plan
+            
+            - [ ] Verify the generated JSON is valid
+            - [ ] Check that all required fields are present
+            - [ ] Validate installation instructions work correctly
+            
+            ---
+            
+            🤖 Generated with [Claude Code](https://claude.ai/code)
+          branch: ${{ steps.repo-info.outputs.branch_name }}


⚠️ Potential issue

Remove trailing spaces in PR body (yamllint errors)

Clean up blank lines with trailing whitespace.

Apply this diff:

           body: |
             ## Summary
-            
+
             This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
-            
+
             ## Changes
-            
+
             - Added new manifest JSON file in `mcp-registry/servers/`
             - Manifest was automatically generated using the chatxiv.org API
-            
+
             ## Test plan
-            
+
             - [ ] Verify the generated JSON is valid
             - [ ] Check that all required fields are present
             - [ ] Validate installation instructions work correctly
-            
+
             ---
-            
+
             🤖 Generated with [Claude Code](https://claude.ai/code)

🧰 Tools

🪛 YAMLlint (1.37.1)

[error] 63-63: trailing spaces (trailing-spaces)
[error] 65-65: trailing spaces (trailing-spaces)
[error] 67-67: trailing spaces (trailing-spaces)
[error] 70-70: trailing spaces (trailing-spaces)
[error] 72-72: trailing spaces (trailing-spaces)
[error] 76-76: trailing spaces (trailing-spaces)
[error] 78-78: trailing spaces (trailing-spaces)

🤖 Prompt for AI Agents

.github/workflows/generate-manifest.yml around lines 61 to 80: the PR body contains blank lines and lines with trailing whitespace causing yamllint errors; remove any trailing spaces at the ends of lines and clean up unnecessary blank lines in the multiline body block (preserve content and indentation, but ensure no lines end with spaces and there are no extra blank lines).

coderabbitai · 2025-08-15T06:39:24Z

scripts/get_manifest.py

+def extract_json_from_content(content: str) -> Optional[dict]:
+    """Extract JSON from the API response content."""
+    # Look for JSON code block
+    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
+    if json_match:
        try:
-            raw_url = self._convert_to_raw_url(repo_url)
-            response = requests.get(raw_url)
-
-            if response.status_code != 200 and "main" in raw_url:
-                logger.warning(
-                    f"Failed to fetch README.md from {repo_url} with {raw_url}. Status code: {response.status_code}"
-                )
-                raw_url = raw_url.replace("/main/", "/master/")
-                response = requests.get(raw_url)
-
-            if response.status_code != 200:
-                raise ValueError(
-                    f"Failed to fetch README.md from {repo_url} with {raw_url}. Status code: {response.status_code}"
-                )
-
-            return response.text
-        except Exception as e:
-            logger.error(f"Error fetching README from {repo_url}: {e}")
-            return ""
-
-    def _convert_to_raw_url(self, repo_url: str) -> str:
-        """Convert GitHub URL to raw content URL for README.md."""
-        if "github.com" not in repo_url:
-            raise ValueError(f"Invalid GitHub URL: {repo_url}")
-
-        # Handle subdirectory URLs (tree format)
-        if "/tree/" in repo_url:
-            # For URLs like github.com/user/repo/tree/branch/path/to/dir
-            parts = repo_url.split("/tree/")
-            base_url = parts[0].replace("github.com", "raw.githubusercontent.com")
-            path_parts = parts[1].split("/", 1)
-
-            if len(path_parts) > 1:
-                branch = path_parts[0]
-                subdir = path_parts[1]
-                return f"{base_url}/{branch}/{subdir}/README.md"
-            else:
-                branch = path_parts[0]
-                return f"{base_url}/{branch}/README.md"
-
-        # Handle direct file URLs
-        if "/blob/" in repo_url:
-            raw_url = repo_url.replace("/blob/", "/raw/")
-            if raw_url.endswith(".md"):
-                return raw_url
-            else:
-                return f"{raw_url}/README.md"
-
-        # Handle repository root URLs
-        raw_url = repo_url.replace("github.com", "raw.githubusercontent.com")
-        return f"{raw_url.rstrip('/')}/main/README.md"
-
-    @staticmethod
-    async def categorize_servers_with_llms(name, description) -> str:
-        """Categorize a server based on name and description.
-
-        Args:
-            name: Server name
-            description: Server description
-
-        Returns:
-            Category string
-        """
-        agent = CategorizationAgent()
-
-        result = await agent.execute(server_name=name, server_description=description, include_examples=True)
-
-        return result["category"]
-
-    def extract_with_llms(self, repo_url: str, readme_content: str) -> Dict:
-        """Extract manifest information using OpenAI with OpenRouter.
-
-        Args:
-            repo_url: GitHub repository URL
-            readme_content: Content of the README file
-
-        Returns:
-            Dictionary containing the extracted manifest information
-        """
-        # Initialize the complete manifest dictionary
-        complete_manifest = {}
-
-        # Step 1: Extract basic information (display_name, license, tags)
-        basic_info = self._extract_basic_info(repo_url, readme_content)
-        complete_manifest.update(basic_info)
-
-        # Step 2: Extract arguments
-        arguments = self._extract_arguments(repo_url, readme_content)
-        if arguments:
-            complete_manifest["arguments"] = arguments
-
-        # Step 3: Extract installations
-        installations = self._extract_installations(repo_url, readme_content)
-        if installations:
-            # post process
-            arguments = complete_manifest.get("arguments", {})
-            if arguments:
-                for install_type, installation in installations.items():
-                    new_installation, replacement = validate_arguments_in_installation(installation, arguments)
-                    if replacement:
-                        installations[install_type] = new_installation
-            complete_manifest["installations"] = installations
-
-        # Step 4: Extract examples
-        examples = self._extract_examples(repo_url, readme_content)
-        if examples:
-            complete_manifest["examples"] = examples
-
-        return complete_manifest
-
-    def _call_llm(self, repo_url: str, readme_content: str, schema: Dict, prompt: str) -> Dict:
-        """Generic helper method to call LLM with common retry pattern.
-
-        Args:
-            repo_url: GitHub repository URL
-            readme_content: README content
-            schema: JSON schema for the function call
-            prompt: User prompt for extraction
-            system_prompt: System prompt for extraction
-
-        Returns:
-            Extracted information or default value if failed
-        """
-        system_prompt = "You are a helpful assistant that extracts information from a GitHub repository about a server."
-
-        max_retries = 3
-        retry_count = 0
-
-        # Extract required fields from schema if available
-        required_fields = schema.get("parameters", {}).get("required", [])
-
-        while retry_count < max_retries:
-            try:
-                completion = self.client.chat.completions.create(
-                    extra_headers={"HTTP-Referer": os.environ.get("SITE_URL", "https://mcpm.sh"), "X-Title": "MCPM"},
-                    model="anthropic/claude-3.7-sonnet",
-                    messages=[
-                        {"role": "system", "content": system_prompt},
-                        {
-                            "role": "user",
-                            "content": f"GitHub URL: {repo_url}\n\nREADME Content:\n{readme_content}\n\n{prompt}",
-                        },
-                    ],
-                    tools=[{"type": "function", "function": schema}],
-                    temperature=0,
-                    tool_choice="required",
-                )
-
-                if not completion.choices or not completion.choices[0].message.tool_calls:
-                    logger.warning(f"Retry {retry_count + 1}/{max_retries}: No tool calls in response")
-                    retry_count += 1
-                    continue
-
-                tool_call = completion.choices[0].message.tool_calls[0]
-                result = json.loads(tool_call.function.arguments)
-
-                # Validate required fields if specified
-                if required_fields:
-                    missing_fields = [field for field in required_fields if field not in result]
-                    if missing_fields:
-                        logger.warning(f"Retry {retry_count + 1}/{max_retries}: Missing fields: {missing_fields}")
-                        retry_count += 1
-                        continue
-
-                return result
-
-            except Exception as e:
-                logger.error(f"Error extracting data with LLM (try {retry_count + 1}/{max_retries}): {e}")
-                retry_count += 1
-
-        logger.error(f"All {max_retries} attempts to extract data failed")
-
-        return {field: None for field in required_fields}
-
-    def _extract_basic_info(self, repo_url: str, readme_content: str) -> Dict:
-        """Extract basic information (display_name, license, tags) using LLM."""
-        schema = {
-            "name": "extract_basic_info",
-            "description": "Extract basic manifest information",
-            "parameters": {
-                "type": "object",
-                "required": ["display_name", "tags"],
-                "properties": {
-                    "display_name": {"type": "string", "description": "Human-readable server name"},
-                    "license": {"type": "string"},
-                    "tags": {"type": "array", "items": {"type": "string"}},
-                },
-                "additionalProperties": False,
-            },
-        }
-
-        return self._call_llm(
-            repo_url=repo_url,
-            readme_content=readme_content,
-            schema=schema,
-            prompt=(
-                "Extract the display_name, license, and tags from the README file. "
-                "The display_name should be a human-readable server name close to the name of the repository. "
-                "The tags should be a list of tags that describe the server."
-            ),
-        )
-
-    def _extract_arguments(self, repo_url: str, readme_content: str) -> Dict:
-        """Extract arguments information using LLM."""
-        schema = {
-            "name": "extract_arguments",
-            "description": "Extract arguments information",
-            "required": ["arguments"],
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "arguments": {
-                        "type": "array",
-                        "description": "An array of configuration arguments required by the server",
-                        "items": {
-                            "type": "object",
-                            "required": ["key", "description"],
-                            "properties": {
-                                "key": {"type": "string", "description": "The name of the argument"},
-                                "description": {"type": "string", "description": "Description of the argument"},
-                                "required": {"type": "boolean", "description": "Whether this argument is required"},
-                                "example": {"type": "string", "description": "Example value"},
-                            },
-                        },
-                    }
-                },
-            },
-        }
-
-        result = self._call_llm(
-            repo_url=repo_url,
-            readme_content=readme_content,
-            schema=schema,
-            prompt=(
-                """Extract the configuration arguments required by this server from the README file.
-The arguments should be a list of arguments that are required when running the server.
-It can often be found in the usage section of the README file.
-<Example>
-<README> Docker
-{
-  "mcpServers": {
-    "brave-search": {
-      "command": "docker",
-      "args": [
-        "run",
-        "-i",
-        "--rm",
-        "-e",
-        "BRAVE_API_KEY",
-        "mcp/brave-search"
-      ],
-      "env": {
-        "BRAVE_API_KEY": "YOUR_API_KEY_HERE"
-      }
-    }
-  }
-}
-NPX
-{
-  "mcpServers": {
-    "brave-search": {
-      "command": "npx",
-      "args": [
-        "-y",
-        "@modelcontextprotocol/server-brave-search"
-      ],
-      "env": {
-        "BRAVE_API_KEY": "YOUR_API_KEY_HERE"
-      }
-    }
-  }
-}
-<README/>
-From the example README, you should get:
-{
-  "arguments": [
-    {
-      "key": "BRAVE_API_KEY",
-      "description": "The API key for the Brave Search server",
-      "required": true,
-      "example": "YOUR_API_KEY_HERE"
+            return json.loads(json_match.group(1))
+        except json.JSONDecodeError as e:
+            print(f"Error parsing JSON: {e}")
+            return None
+
+    # Try to find JSON without code block markers
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        print(f"Could not extract valid JSON from response: {content}")
+        return None


⚠️ Potential issue

Harden JSON extraction: support list-based content, CRLF fences, and safer logging

APIs often return message content as an array of parts; passing a list to re.search will crash. The regex is also strict on LF only. This patch supports array content, tolerates CRLF/code-fence variants, and avoids dumping the full response on failures.

Apply this diff:

-def extract_json_from_content(content: str) -> Optional[dict]:
-    """Extract JSON from the API response content."""
-    # Look for JSON code block
-    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
-    if json_match:
-        try:
-            return json.loads(json_match.group(1))
-        except json.JSONDecodeError as e:
-            print(f"Error parsing JSON: {e}")
-            return None
-
-    # Try to find JSON without code block markers
-    try:
-        return json.loads(content)
-    except json.JSONDecodeError:
-        print(f"Could not extract valid JSON from response: {content}")
-        return None
+def extract_json_from_content(content: str) -> Optional[dict]:
+    """Extract JSON from the API response content."""
+    # Coerce array-of-content-parts into a single string (Anthropic/OpenAI multi-part)
+    if not isinstance(content, str):
+        try:
+            content = "".join(
+                part.get("text", "") if isinstance(part, dict) else str(part)
+                for part in content
+            )
+        except Exception:
+            print("Unexpected content type in API response; cannot parse JSON.")
+            return None
+
+    # Look for JSON code block (tolerate CRLF and language hints)
+    json_match = re.search(r'```json[^\n]*\r?\n(.*?)\r?\n```', content, re.DOTALL | re.IGNORECASE)
+    if not json_match:
+        json_match = re.search(r'```\s*[\w-]*\s*\r?\n(.*?)\r?\n```', content, re.DOTALL)
+    if json_match:
+        try:
+            return json.loads(json_match.group(1))
+        except json.JSONDecodeError as e:
+            print(f"Error parsing JSON from fenced block: {e}")
+            # fall through to raw parsing
+
+    # Try to parse raw JSON
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        preview = content[:500].replace("\n", " ")
+        print(f"Could not extract valid JSON from response. Preview: {preview}...")
+        return None
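
The two failure modes named in this comment, list-based content and CRLF code fences, can be exercised in isolation. The sketch below is a condensed variant for illustration, not the exact patch: `content_to_text` and `parse_fenced_json` are hypothetical names, and a single regex stands in for the two used in the suggestion.

```python
import json
import re
from typing import Any, Optional

# Accepts an optional "json" language hint and either LF or CRLF line endings.
FENCE_RE = re.compile(r"```(?:json)?[^\n]*\r?\n(.*?)\r?\n```", re.DOTALL | re.IGNORECASE)


def content_to_text(content: Any) -> str:
    """Join a list of content parts into one string; pass strings through unchanged."""
    if isinstance(content, str):
        return content
    return "".join(
        part.get("text", "") if isinstance(part, dict) else str(part)
        for part in content
    )


def parse_fenced_json(content: Any) -> Optional[dict]:
    """Parse JSON from a fenced block if one exists, otherwise from the raw text."""
    text = content_to_text(content)
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    parts = [{"type": "text", "text": "```json\r\n{\"name\": \"demo\"}\r\n```"}]
    print(parse_fenced_json(parts))
    print(parse_fenced_json("not json"))
```

Unlike the suggested patch, this variant does not fall back to raw parsing when a fenced block is found but fails to parse.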

mcpm-semantic-release · 2025-08-15T06:40:50Z

🎉 This PR is included in version 2.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

feat: add script and workflow for contributing registry

55fa64b

qodo-merge-pro bot added Possible security concern Review effort 2/5 labels Aug 15, 2025

GabrielDrapor added the codex-review label Aug 15, 2025

github-actions bot added codex-review-in-progress and removed codex-review labels Aug 15, 2025

github-actions bot added codex-review-completed and removed codex-review-in-progress labels Aug 15, 2025

coderabbitai bot reviewed Aug 15, 2025

GabrielDrapor merged commit ec67763 into main Aug 15, 2025
9 of 11 checks passed

GabrielDrapor deleted the Jiarui/smart-registry-workflow branch August 15, 2025 06:39

mcpm-semantic-release bot pushed a commit that referenced this pull request Aug 15, 2025

chore(release): 2.7.0 [skip ci]

8915d2c

# [2.7.0](v2.6.1...v2.7.0) (2025-08-15) ### Features * add script and workflow for contributing registry ([#233](#233)) ([ec67763](ec67763))
