feat: add script and workflow for contributing registry #233
Walkthrough
Introduces a GitHub Actions workflow that generates an MCP manifest for a given repository and opens a PR. Replaces the previous LLM-heavy manifest script with a simplified implementation driven by an external API, featuring JSON parsing, repository-name derivation, API calls for generation and validation, and file output.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Workflow as GitHub Actions
participant Repo as Target Repo
participant Script as get_manifest.py
participant ANYON as External API
participant PR as create-pull-request
User->>Workflow: workflow_dispatch (repo_url)
Workflow->>Repo: actions/checkout
Workflow->>Script: python scripts/get_manifest.py --repo_url
Script->>ANYON: generate_manifest(repo_url)
ANYON-->>Script: manifest content
Script->>ANYON: validate_installations(manifest, repo_url)
ANYON-->>Script: validated manifest
Script->>Repo: write mcp-registry/servers/<repo>.json
Workflow->>PR: create pull request (branch add-manifest-<repo>)
sequenceDiagram
participant Main as main()
participant Gen as generate_manifest()
participant Val as validate_installations()
participant FS as Filesystem
participant API as ANYON API
Main->>Gen: repo_url
Gen->>API: request manifest
API-->>Gen: content (possibly JSON in code block)
Gen-->>Main: manifest dict or None
Main->>Val: manifest, repo_url
Val->>API: request installation validation
API-->>Val: updated/confirmed manifest
Val-->>Main: validated manifest
Main->>FS: save_manifest(manifest)
FS-->>Main: success/failure
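The control flow in the second diagram amounts to a small driver: generate, bail out on failure, validate on a best-effort basis, then save. The sketch below is illustrative only; `run_pipeline` is a hypothetical name, and the three callables stand in for `generate_manifest`, `validate_installations`, and `save_manifest`, whose real signatures in scripts/get_manifest.py may differ.

```python
from typing import Callable, Optional

# Hypothetical alias: the script works with plain dicts for manifests.
Manifest = dict


def run_pipeline(
    repo_url: str,
    generate: Callable[[str], Optional[Manifest]],
    validate: Callable[[Manifest, str], Manifest],
    save: Callable[[Manifest], bool],
) -> int:
    """Mirror the main() flow from the diagram and return a process exit code."""
    manifest = generate(repo_url)
    if manifest is None:
        # Generation failed: there is nothing to validate or write.
        return 1
    # Validation is best-effort: per the diagram it returns an updated or
    # confirmed manifest, so the result is used unconditionally.
    manifest = validate(manifest, repo_url)
    # save reports success/failure, which becomes the exit code.
    return 0 if save(manifest) else 1
```

Injecting the three steps keeps the driver testable with fakes, without network access or filesystem writes.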
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Review
Overall, the PR is solid and heads in the right direction.
Actionable comments posted: 5
🧹 Nitpick comments (7)
scripts/get_manifest.py (5)
80-96: Add HTTP timeout to avoid hanging on network calls
Requests without timeouts can hang indefinitely on network issues.
Apply this diff:
-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)
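One way to make the timeout impossible to forget is to route every call through a tiny wrapper. This is a sketch with a hypothetical `post_json` helper and assumed timeout values; `requests` accepts either a single number or a `(connect, read)` tuple for `timeout`, and the HTTP function is injected so the helper can be exercised without the network.

```python
from typing import Any, Callable, Tuple, Union

# Assumed values for illustration: 10 s to connect, 60 s between bytes read.
Timeout = Union[float, Tuple[float, float]]
DEFAULT_TIMEOUT: Timeout = (10, 60)


def post_json(
    post: Callable[..., Any],
    url: str,
    headers: dict,
    payload: dict,
    timeout: Timeout = DEFAULT_TIMEOUT,
) -> Any:
    """Call `post` (e.g. requests.post) with a timeout always set."""
    return post(url, headers=headers, json=payload, timeout=timeout)
```

Usage would be `post_json(requests.post, url, headers, payload)`. Keep in mind that the read timeout bounds the wait between received bytes, not the total request time, so a slow LLM response can still take longer than 60 s overall.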
116-150: Standardize validator request to string content for better API compatibility
Mirror the earlier change: use a plain string for message content and read the model from ANYON_MODEL.
Apply this diff:
-        payload = {
-            "model": "x",
-            "messages": [
-                {
-                    "role": "user",
-                    "content": [
-                        {
-                            "type": "text",
-                            "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+        model = os.getenv("ANYON_MODEL")
+        if not model:
+            print("Error: ANYON_MODEL environment variable not set, skipping validation")
+            return manifest
+
+        payload = {
+            "model": model,
+            "messages": [
+                {
+                    "role": "user",
+                    "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.

 Repository: {repo_url}

 Current manifest installations:
 {json.dumps(current_installations, indent=2)}

 IMPORTANT INSTRUCTIONS:
 1. Access the README.md from the repository URL: {repo_url}
 2. Compare the current installations against the exact commands and configurations shown in the README.md
 3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
 4. Pay special attention to:
    - Exact command names (npx, uvx, docker, python, etc.)
    - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
    - Proper environment variable names and formats
    - Installation type matching the command used
 5. Fix any discrepancies between the manifest and the README
 6. Return ONLY a valid JSON object with the corrected installations field
 7. The response should be in this exact format: {{"installations": {{...}}}}

 Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                        }
-                    ]
-                }
-            ]
-        }
+                }
+            ]
+        }
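Since both call sites now need the same model lookup and string-content payload, the logic could live in one helper. A sketch: `build_chat_payload` is a hypothetical name, and the `env` parameter exists only so the function can be tested without touching the real process environment.

```python
import os
from typing import Mapping, Optional


def build_chat_payload(prompt: str, env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Build an OpenAI-style chat/completions body, or None if ANYON_MODEL is unset."""
    model = env.get("ANYON_MODEL")
    if not model:
        # Let the caller decide: abort generation, or skip validation.
        return None
    return {
        "model": model,
        # Plain-string content, as suggested, instead of a list of parts.
        "messages": [{"role": "user", "content": prompt}],
    }
```

With this helper, generate_manifest would return None when the payload is None, and validate_installations would return the manifest unchanged, which matches the behavior of the two diffs.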
151-171: Add timeout and narrow exception handling during validation call
Set a timeout and catch request-specific failures so that unrelated errors are not masked.
Apply this diff:
-        response = requests.post(url, headers=headers, json=payload)
+        response = requests.post(url, headers=headers, json=payload, timeout=60)
         response.raise_for_status()
@@
-    except Exception as e:
-        print(f"Error validating installations: {e}")
-        return manifest
+    except requests.RequestException as e:
+        print(f"Error validating installations (network): {e}")
+        return manifest
+    except (KeyError, TypeError, ValueError) as e:
+        print(f"Error validating installations (response parsing): {e}")
+        return manifest
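The response-parsing half of this suggestion can be isolated into a small function. A sketch, assuming the validator reads `choices[0].message.content` from an OpenAI-style response body (`extract_message_content` is a hypothetical name). If the code indexes `choices[0]`, an empty `choices` list raises `IndexError`, which the suggested `(KeyError, TypeError, ValueError)` tuple does not catch. `ValueError` in that tuple still makes sense for a body that is not valid JSON, since `json.JSONDecodeError` subclasses it.

```python
from typing import Optional


def extract_message_content(body: object) -> Optional[str]:
    """Return choices[0].message.content from a chat/completions body, or None."""
    try:
        content = body["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        # KeyError: a field is missing; IndexError: "choices" is empty;
        # TypeError: some level is not subscriptable (e.g. None).
        return None
    # Treat non-string content (e.g. null) as unusable.
    return content if isinstance(content, str) else None
```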
175-183: Anchor output path to repo root to avoid CWD surprises
Writing relative to the current working directory may put files in unexpected locations if the script is run from another directory. Resolve the path relative to the repository root (the parent of scripts/).
Apply this diff:
-    # Create directory if it doesn't exist
-    servers_dir = Path("mcp-registry/servers")
+    # Create directory if it doesn't exist (relative to repo root)
+    script_dir = Path(__file__).parent
+    repo_root = script_dir.parent
+    servers_dir = repo_root / "mcp-registry" / "servers"
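Here is the same idea as a testable function, using a hypothetical name `servers_dir_for`. It adds `resolve()`, which makes the result absolute even if `__file__` is relative. That can happen for the main script on Python versions before 3.9. On the workflow's Python 3.11, the diff's plain `Path(__file__).parent` already behaves correctly.

```python
from pathlib import Path


def servers_dir_for(script_file: str) -> Path:
    """Resolve mcp-registry/servers relative to the repo root (parent of scripts/)."""
    # resolve() yields an absolute path, so the result no longer depends on
    # the current working directory.
    repo_root = Path(script_file).resolve().parent.parent
    return repo_root / "mcp-registry" / "servers"


def manifest_path(script_file: str, repo_name: str) -> Path:
    """Hypothetical helper: the per-repo manifest file inside servers/."""
    return servers_dir_for(script_file) / f"{repo_name}.json"
```

Inside the script this would be called as `servers_dir_for(__file__).mkdir(parents=True, exist_ok=True)`.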
196-223: Run ruff formatting and optionally add unit tests for parsing helpers
- Please run ruff (per guidelines) to ensure consistent formatting.
- Consider unit tests for extract_json_from_content and get_repo_name_from_url (edge cases: SSH URLs, CRLF code fences, multi-part content).
I can generate targeted tests for these helpers if you want.
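The following is a starting point for those tests. Both functions are hypothetical stand-ins written to cover the listed edge cases. The script's real helpers may behave differently, and the diff context quoted later in this thread shows only the start of extract_json_from_content. The CRLF case matters because the original pattern expects a bare `\n` right after the json fence marker, so it does not match `\r\n` line endings.

```python
import json
import re
from typing import Optional

# Hypothetical stand-ins for the script's helpers; not the PR's code.


def get_repo_name_from_url(repo_url: str) -> str:
    """Map HTTPS or SSH GitHub URLs to 'owner-repo'."""
    path = re.sub(r"^.*github\.com[:/]", "", repo_url.strip())
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return path.replace("/", "-")


def extract_json_from_content(content: str) -> Optional[dict]:
    """Parse a ```json fenced block (LF or CRLF), else try the whole content."""
    match = re.search(r"```json\r?\n(.*?)\r?\n```", content, re.DOTALL)
    candidate = match.group(1) if match else content.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    # A manifest must be a JSON object, not an array or scalar.
    return parsed if isinstance(parsed, dict) else None
```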
.github/workflows/generate-manifest.yml (2)
29-33: Install jsonschema to enable the schema validation step (if added)
If we add a schema validation step, ensure the dependency is present.
Apply this diff:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install requests
+          pip install requests jsonschema
34-47: Add a manifest schema validation step to catch issues before opening a PR
Leverage scripts/validate_manifest.py to fail fast on invalid JSON or schema errors.
Apply this diff to insert a validation step after generation:
       - name: Generate manifest
         env:
           ANYON_API_KEY: ${{ secrets.ANYON_API_KEY }}
         run: |
           python scripts/get_manifest.py "${{ github.event.inputs.repo_url }}"
+      - name: Validate manifest schema
+        run: |
+          python scripts/validate_manifest.py
+
       - name: Extract repo name for branch
         id: repo-info
         run: |
           REPO_URL="${{ github.event.inputs.repo_url }}"
           REPO_NAME=$(echo "$REPO_URL" | sed 's/.*github\.com[:/]//' | sed 's/\.git$//' | tr '/' '-')
           echo "repo_name=$REPO_NAME" >> $GITHUB_OUTPUT
           echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT
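To see which branch name the "Extract repo name for branch" step produces for a given input, the sed/tr pipeline can be mirrored in Python. This is a hypothetical helper (`branch_info`) for local checking only. It is not part of the workflow.

```python
import re


def branch_info(repo_url: str) -> dict:
    """Python mirror of the workflow's sed/tr pipeline for repo_name/branch_name."""
    # sed 's/.*github\.com[:/]//': drop everything up to "github.com/" or "github.com:"
    name = re.sub(r".*github\.com[:/]", "", repo_url, count=1)
    # sed 's/\.git$//': drop a trailing ".git"
    name = re.sub(r"\.git$", "", name)
    # tr '/' '-': flatten "owner/repo" into "owner-repo"
    name = name.replace("/", "-")
    return {"repo_name": name, "branch_name": f"add-manifest-{name}"}
```

One edge case: a trailing slash in the input survives as a trailing hyphen (`owner-repo-`). The step derives the name independently of the script, which writes `mcp-registry/servers/<repo>.json`. If the two derivations diverge, the branch name and manifest filename will not match, so the shared cases are worth a test.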
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/generate-manifest.yml (1 hunks)
scripts/get_manifest.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
Always format Python code with ruff.
Files:
scripts/get_manifest.py
🧬 Code Graph Analysis (1)
scripts/get_manifest.py (2)
scripts/validate_manifest.py (1)
main (64-97)
scripts/categorization.py (1)
main (221-224)
🪛 actionlint (1.7.7)
.github/workflows/generate-manifest.yml
25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 YAMLlint (1.37.1)
.github/workflows/generate-manifest.yml
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 56-56: trailing spaces
(trailing-spaces)
[error] 58-58: trailing spaces
(trailing-spaces)
[error] 63-63: trailing spaces
(trailing-spaces)
[error] 65-65: trailing spaces
(trailing-spaces)
[error] 67-67: trailing spaces
(trailing-spaces)
[error] 70-70: trailing spaces
(trailing-spaces)
[error] 72-72: trailing spaces
(trailing-spaces)
[error] 76-76: trailing spaces
(trailing-spaces)
[error] 78-78: trailing spaces
(trailing-spaces)
🔇 Additional comments (2)
scripts/get_manifest.py (1)
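65-78: Parameterize model and send string content; confirm ANYON model & API shape
The file scripts/get_manifest.py currently hard-codes the model "x" and sends message content as an array of {type, text} objects. "x" is unlikely to be a model ID the host accepts. OpenAI's own Chat Completions API does accept an array of content parts, but some OpenAI-compatible backends accept only plain-string content and may return a 400 or parse the request incorrectly. Parameterize the model (ANYON_MODEL) and send plain string content. Apply the same change in both places that build payloads.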
- Locations to change:
- scripts/get_manifest.py — generate_manifest(...) payload
- scripts/get_manifest.py — validate_installations(...) payload
Apply this diff (update both payloads):
-        payload = {
-            "model": "x",
-            "messages": [
-                {
-                    "role": "user",
-                    "content": [
-                        {
-                            "type": "text",
-                            "text": f"help me generate manifest json for this repo: {repo_url}"
-                        }
-                    ]
-                }
-            ]
-        }
+        model = os.getenv("ANYON_MODEL")
+        if not model:
+            print("Error: ANYON_MODEL environment variable not set")
+            return None
+
+        payload = {
+            "model": model,
+            "messages": [
+                {
+                    "role": "user",
+                    "content": f"Help me generate a valid MCP manifest JSON for this repo: {repo_url}. Return only the JSON object."
+                }
+            ]
+        }
And for validate_installations (replace its similar payload):
-        payload = {
-            "model": "x",
-            "messages": [
-                {
-                    "role": "user",
-                    "content": [
-                        {
-                            "type": "text",
-                            "text": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
-
-Repository: {repo_url}
-
-Current manifest installations:
-{json.dumps(current_installations, indent=2)}
-
-IMPORTANT INSTRUCTIONS:
-1. Access the README.md from the repository URL: {repo_url}
-2. Compare the current installations against the exact commands and configurations shown in the README.md
-3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
-4. Pay special attention to:
-   - Exact command names (npx, uvx, docker, python, etc.)
-   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
-   - Proper environment variable names and formats
-   - Installation type matching the command used
-5. Fix any discrepancies between the manifest and the README
-6. Return ONLY a valid JSON object with the corrected installations field
-7. The response should be in this exact format: {{"installations": {{...}}}}
-
-Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
-                        }
-                    ]
-                }
-            ]
-        }
+        model = os.getenv("ANYON_MODEL")
+        if not model:
+            print("Error: ANYON_MODEL environment variable not set, skipping validation")
+            return manifest
+
+        payload = {
+            "model": model,
+            "messages": [
+                {
+                    "role": "user",
+                    "content": f"""Please carefully validate and correct the installations field in this manifest by checking the original README.md from the repository.
+
+Repository: {repo_url}
+
+Current manifest installations:
+{json.dumps(current_installations, indent=2)}
+
+IMPORTANT INSTRUCTIONS:
+1. Access the README.md from the repository URL: {repo_url}
+2. Compare the current installations against the exact commands and configurations shown in the README.md
+3. Ensure the command, args, and env variables exactly match what's documented in the README. Remove the installation methods which are not mentioned in README.
+4. Pay special attention to:
+   - Exact command names (npx, uvx, docker, python, etc.)
+   - Correct package names and arguments (e.g., for npx command, it should usually be "-y [package_name]"
+   - Proper environment variable names and formats
+   - Installation type matching the command used
+5. Fix any discrepancies between the manifest and the README
+6. Return ONLY a valid JSON object with the corrected installations field
+7. The response should be in this exact format: {{"installations": {{...}}}}
+
+Focus on accuracy - the installations must work exactly as documented in the README. If the README shows different installation methods, include all valid ones."""
+                }
+            ]
+        }
Note: I could not find public API docs for anyon.chatxiv.org. The path used in the script (/api/v1/openai/v1/chat/completions) looks like an OpenAI-compatible chat/completions endpoint, but I could not confirm the supported model IDs or any deviations from the OpenAI payload schema. Please confirm the exact model identifier(s) the host accepts (e.g., gpt-4o, gpt-4o-mini) and whether the endpoint expects standard OpenAI Chat Completions payloads.
.github/workflows/generate-manifest.yml (1)
36-37: Ensure the ANYON_API_KEY secret is configured
The workflow depends on ANYON_API_KEY being available as a repository or organization secret.
If not already set, add ANYON_API_KEY under Settings → Secrets and variables → Actions.
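A missing secret otherwise shows up only as an API error. The script could instead fail fast before making any request. The sketch below uses hypothetical helpers `missing_env` and `require_env`. `ANYON_MODEL` applies only if the model-parameterization suggestion is adopted. In that case, the workflow's "Generate manifest" step, which currently passes only `ANYON_API_KEY`, would need to pass it as well.

```python
import os
import sys
from typing import Iterable, List, Mapping


def missing_env(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


def require_env(names: Iterable[str]) -> None:
    """Exit with status 1 and a clear message if any variable is missing."""
    missing = missing_env(names)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")
```

For example, calling `require_env(["ANYON_API_KEY", "ANYON_MODEL"])` at the top of main() would surface a misconfigured workflow immediately.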
uses: actions/setup-python@v4
with:
  python-version: '3.11'
Update setup-python action to v5 (Node 16 deprecation fix)
actionlint flags v4 as too old because it targets the deprecated Node 16 runtime; v5 runs on Node 20.
Apply this diff:
- - name: Set up Python
- uses: actions/setup-python@v4
+ - name: Set up Python
+ uses: actions/setup-python@v5
with:
    python-version: '3.11'
🧰 Tools
🪛 actionlint (1.7.7)
25-25: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/generate-manifest.yml around lines 25 to 27, update the
setup-python GitHub Action version from v4 to v5 to address actionlint/runner
deprecation; change the uses line to use actions/setup-python@v5 and keep the
python-version input unchanged so the workflow runs on the modern runner without
Node16-related warnings.
echo "branch_name=add-manifest-$REPO_NAME" >> $GITHUB_OUTPUT
- name: Create Pull Request
  uses: peter-evans/create-pull-request@v5
Bump create-pull-request action to v6 (Node 16 deprecation fix)
actionlint flags v5 as too old; v6 runs on the supported Node 20 runtime.
Apply this diff:
- - name: Create Pull Request
- uses: peter-evans/create-pull-request@v5
+ - name: Create Pull Request
+ uses: peter-evans/create-pull-request@v6
🧰 Tools
🪛 actionlint (1.7.7)
49-49: the runner of "peter-evans/create-pull-request@v5" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/generate-manifest.yml at line 49, the workflow uses
peter-evans/create-pull-request@v5 which is flagged as outdated; update the
action reference to peter-evans/create-pull-request@v6 to use the supported
Node16 runtime and resolve the actionlint warning, then commit the change to the
workflow file.
commit-message: |
  feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
  Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <[email protected]>
title: "feat: Add MCP manifest for ${{ steps.repo-info.outputs.repo_name }}"
Remove trailing spaces in commit-message block (yamllint errors)
YAMLlint reports trailing spaces. Remove them to unblock CI.
Apply this diff:
commit-message: |
feat: add manifest for ${{ steps.repo-info.outputs.repo_name }}
-
+
Generated manifest JSON for repository: ${{ github.event.inputs.repo_url }}
-
+
🤖 Generated with [Claude Code](https://claude.ai/code)
-
+
Co-Authored-By: Claude <[email protected]>
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 56-56: trailing spaces
(trailing-spaces)
[error] 58-58: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents
.github/workflows/generate-manifest.yml around lines 52 to 60: the
commit-message block contains trailing spaces at the ends of lines causing
yamllint errors; remove all trailing whitespace characters from each line in
that multi-line scalar (including the blank line after the second line and any
spaces at the end of the emoji/Co-Authored lines), save the file, and rerun the
linter/CI to confirm the YAML no longer reports trailing-space issues.
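yamllint's trailing-spaces errors can also be fixed locally before pushing. The sketch below uses a hypothetical `strip_trailing_whitespace` helper. It keeps blank lines and removes only their whitespace, as the diffs above do. It also reports 1-based line numbers that can be compared with yamllint's output. It assumes LF line endings.

```python
from typing import List, Tuple


def strip_trailing_whitespace(text: str) -> Tuple[str, List[int]]:
    """Remove trailing spaces/tabs from each line; report 1-based lines changed."""
    fixed_lines: List[int] = []
    out: List[str] = []
    # split("\n") keeps a final empty element when text ends with a newline,
    # so the join below preserves the trailing newline.
    for number, line in enumerate(text.split("\n"), start=1):
        cleaned = line.rstrip(" \t")
        if cleaned != line:
            fixed_lines.append(number)
        out.append(cleaned)
    return "\n".join(out), fixed_lines
```

Writing the cleaned text back to .github/workflows/generate-manifest.yml and rerunning yamllint should then report no trailing-spaces errors for the fixed lines.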
body: |
  ## Summary
  This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
  ## Changes
  - Added new manifest JSON file in `mcp-registry/servers/`
  - Manifest was automatically generated using the chatxiv.org API
  ## Test plan
  - [ ] Verify the generated JSON is valid
  - [ ] Check that all required fields are present
  - [ ] Validate installation instructions work correctly
  ---
  🤖 Generated with [Claude Code](https://claude.ai/code)
branch: ${{ steps.repo-info.outputs.branch_name }}
Remove trailing spaces in PR body (yamllint errors)
Clean up blank lines with trailing whitespace.
Apply this diff:
body: |
## Summary
-
+
This PR adds a new MCP server manifest generated from the repository: ${{ github.event.inputs.repo_url }}
-
+
## Changes
-
+
- Added new manifest JSON file in `mcp-registry/servers/`
- Manifest was automatically generated using the chatxiv.org API
-
+
## Test plan
-
+
- [ ] Verify the generated JSON is valid
- [ ] Check that all required fields are present
- [ ] Validate installation instructions work correctly
-
+
---
-
+
🤖 Generated with [Claude Code](https://claude.ai/code)
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 63-63: trailing spaces
(trailing-spaces)
[error] 65-65: trailing spaces
(trailing-spaces)
[error] 67-67: trailing spaces
(trailing-spaces)
[error] 70-70: trailing spaces
(trailing-spaces)
[error] 72-72: trailing spaces
(trailing-spaces)
[error] 76-76: trailing spaces
(trailing-spaces)
[error] 78-78: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents
.github/workflows/generate-manifest.yml around lines 61 to 80: the PR body
contains blank lines and lines with trailing whitespace causing yamllint errors;
remove any trailing spaces at the ends of lines and clean up unnecessary blank
lines in the multiline body block (preserve content and indentation, but ensure
no lines end with spaces and there are no extra blank lines).
def extract_json_from_content(content: str) -> Optional[dict]:
    """Extract JSON from the API response content."""
    # Look for JSON code block
    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
    if json_match:
        try:
            raw_url = self._convert_to_raw_url(repo_url)
            response = requests.get(raw_url)

            if response.status_code != 200 and "main" in raw_url:
                logger.warning(
                    f"Failed to fetch README.md from {repo_url} with {raw_url}. Status code: {response.status_code}"
                )
                raw_url = raw_url.replace("/main/", "/master/")
                response = requests.get(raw_url)

            if response.status_code != 200:
                raise ValueError(
                    f"Failed to fetch README.md from {repo_url} with {raw_url}. Status code: {response.status_code}"
                )

            return response.text
        except Exception as e:
            logger.error(f"Error fetching README from {repo_url}: {e}")
            return ""

    def _convert_to_raw_url(self, repo_url: str) -> str:
        """Convert GitHub URL to raw content URL for README.md."""
        if "github.com" not in repo_url:
            raise ValueError(f"Invalid GitHub URL: {repo_url}")

        # Handle subdirectory URLs (tree format)
        if "/tree/" in repo_url:
            # For URLs like github.com/user/repo/tree/branch/path/to/dir
            parts = repo_url.split("/tree/")
            base_url = parts[0].replace("github.com", "raw.githubusercontent.com")
            path_parts = parts[1].split("/", 1)

            if len(path_parts) > 1:
                branch = path_parts[0]
                subdir = path_parts[1]
                return f"{base_url}/{branch}/{subdir}/README.md"
            else:
                branch = path_parts[0]
                return f"{base_url}/{branch}/README.md"

        # Handle direct file URLs
        if "/blob/" in repo_url:
            raw_url = repo_url.replace("/blob/", "/raw/")
            if raw_url.endswith(".md"):
                return raw_url
            else:
                return f"{raw_url}/README.md"

        # Handle repository root URLs
        raw_url = repo_url.replace("github.com", "raw.githubusercontent.com")
        return f"{raw_url.rstrip('/')}/main/README.md"

    @staticmethod
    async def categorize_servers_with_llms(name, description) -> str:
        """Categorize a server based on name and description.
        Args:
            name: Server name
            description: Server description
        Returns:
            Category string
        """
        agent = CategorizationAgent()

        result = await agent.execute(server_name=name, server_description=description, include_examples=True)

        return result["category"]

    def extract_with_llms(self, repo_url: str, readme_content: str) -> Dict:
        """Extract manifest information using OpenAI with OpenRouter.
        Args:
            repo_url: GitHub repository URL
            readme_content: Content of the README file
        Returns:
            Dictionary containing the extracted manifest information
        """
        # Initialize the complete manifest dictionary
        complete_manifest = {}

        # Step 1: Extract basic information (display_name, license, tags)
        basic_info = self._extract_basic_info(repo_url, readme_content)
        complete_manifest.update(basic_info)

        # Step 2: Extract arguments
        arguments = self._extract_arguments(repo_url, readme_content)
        if arguments:
            complete_manifest["arguments"] = arguments

        # Step 3: Extract installations
        installations = self._extract_installations(repo_url, readme_content)
        if installations:
            # post process
            arguments = complete_manifest.get("arguments", {})
            if arguments:
                for install_type, installation in installations.items():
                    new_installation, replacement = validate_arguments_in_installation(installation, arguments)
                    if replacement:
                        installations[install_type] = new_installation
            complete_manifest["installations"] = installations

        # Step 4: Extract examples
        examples = self._extract_examples(repo_url, readme_content)
        if examples:
            complete_manifest["examples"] = examples

        return complete_manifest

    def _call_llm(self, repo_url: str, readme_content: str, schema: Dict, prompt: str) -> Dict:
        """Generic helper method to call LLM with common retry pattern.
        Args:
            repo_url: GitHub repository URL
            readme_content: README content
            schema: JSON schema for the function call
            prompt: User prompt for extraction
            system_prompt: System prompt for extraction
        Returns:
            Extracted information or default value if failed
        """
        system_prompt = "You are a helpful assistant that extracts information from a GitHub repository about a server."

        max_retries = 3
        retry_count = 0

        # Extract required fields from schema if available
        required_fields = schema.get("parameters", {}).get("required", [])

        while retry_count < max_retries:
            try:
                completion = self.client.chat.completions.create(
                    extra_headers={"HTTP-Referer": os.environ.get("SITE_URL", "https://mcpm.sh"), "X-Title": "MCPM"},
                    model="anthropic/claude-3.7-sonnet",
                    messages=[
                        {"role": "system", "content": system_prompt},
                        {
                            "role": "user",
                            "content": f"GitHub URL: {repo_url}\n\nREADME Content:\n{readme_content}\n\n{prompt}",
                        },
                    ],
                    tools=[{"type": "function", "function": schema}],
                    temperature=0,
                    tool_choice="required",
                )

                if not completion.choices or not completion.choices[0].message.tool_calls:
                    logger.warning(f"Retry {retry_count + 1}/{max_retries}: No tool calls in response")
                    retry_count += 1
                    continue

                tool_call = completion.choices[0].message.tool_calls[0]
                result = json.loads(tool_call.function.arguments)

                # Validate required fields if specified
                if required_fields:
                    missing_fields = [field for field in required_fields if field not in result]
                    if missing_fields:
                        logger.warning(f"Retry {retry_count + 1}/{max_retries}: Missing fields: {missing_fields}")
                        retry_count += 1
                        continue

                return result

            except Exception as e:
                logger.error(f"Error extracting data with LLM (try {retry_count + 1}/{max_retries}): {e}")
                retry_count += 1

        logger.error(f"All {max_retries} attempts to extract data failed")

        return {field: None for field in required_fields}

    def _extract_basic_info(self, repo_url: str, readme_content: str) -> Dict:
        """Extract basic information (display_name, license, tags) using LLM."""
        schema = {
            "name": "extract_basic_info",
            "description": "Extract basic manifest information",
            "parameters": {
                "type": "object",
                "required": ["display_name", "tags"],
                "properties": {
                    "display_name": {"type": "string", "description": "Human-readable server name"},
                    "license": {"type": "string"},
                    "tags": {"type": "array", "items": {"type": "string"}},
                },
                "additionalProperties": False,
            },
        }

        return self._call_llm(
            repo_url=repo_url,
            readme_content=readme_content,
            schema=schema,
            prompt=(
                "Extract the display_name, license, and tags from the README file. "
                "The display_name should be a human-readable server name close to the name of the repository. "
                "The tags should be a list of tags that describe the server."
            ),
        )

    def _extract_arguments(self, repo_url: str, readme_content: str) -> Dict:
        """Extract arguments information using LLM."""
        schema = {
            "name": "extract_arguments",
            "description": "Extract arguments information",
            "required": ["arguments"],
            "parameters": {
                "type": "object",
                "properties": {
                    "arguments": {
                        "type": "array",
                        "description": "An array of configuration arguments required by the server",
                        "items": {
                            "type": "object",
                            "required": ["key", "description"],
                            "properties": {
                                "key": {"type": "string", "description": "The name of the argument"},
                                "description": {"type": "string", "description": "Description of the argument"},
                                "required": {"type": "boolean", "description": "Whether this argument is required"},
                                "example": {"type": "string", "description": "Example value"},
                            },
                        },
                    }
                },
            },
        }

        result = self._call_llm(
            repo_url=repo_url,
            readme_content=readme_content,
            schema=schema,
            prompt=(
                """Extract the configuration arguments required by this server from the README file.
The arguments should be a list of arguments that are required when running the server.
It can often be found in the usage section of the README file.
<Example>
<README> Docker
{
  "mcpServers": {
    "brave-search": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "BRAVE_API_KEY",
        "mcp/brave-search"
      ],
      "env": {
        "BRAVE_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
NPX
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-brave-search"
      ],
      "env": {
        "BRAVE_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}
<README/>
From the example README, you should get:
{
  "arguments": [
    {
      "key": "BRAVE_API_KEY",
      "description": "The API key for the Brave Search server",
      "required": true,
      "example": "YOUR_API_KEY_HERE"
            return json.loads(json_match.group(1))
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")
            return None

    # Try to find JSON without code block markers
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        print(f"Could not extract valid JSON from response: {content}")
        return None
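For reference, the retry pattern in the removed _call_llm tail can be written as a small standalone helper. This is a minimal sketch, not the script's code. extract is a hypothetical callable standing in for the LLM tool call, and the sketch normalizes required_fields to an empty list up front so the final fallback never iterates over None.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


def call_with_required_fields(
    extract: Callable[[], Dict],
    required_fields: Optional[List[str]] = None,
    max_retries: int = 3,
) -> Dict:
    """Call extract() until its result has every required field, up to max_retries times."""
    required_fields = required_fields or []  # never iterate over None below
    for attempt in range(1, max_retries + 1):
        try:
            result = extract()
        except Exception as e:  # like the original: log any error and retry
            logger.error("Attempt %d/%d failed: %s", attempt, max_retries, e)
            continue
        missing = [field for field in required_fields if field not in result]
        if missing:
            logger.warning("Attempt %d/%d: missing fields %s", attempt, max_retries, missing)
            continue
        return result
    logger.error("All %d attempts failed", max_retries)
    # Same fallback shape as the original: every required key mapped to None.
    return {field: None for field in required_fields}
```

Suppose the first reply lacks tags and the second includes them. In that case the helper logs one warning and returns the second reply. Looping over attempts with for, rather than incrementing a counter by hand, means no retry path can forget the increment.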
Harden JSON extraction: support list-based content, CRLF fences, and safer logging
APIs often return message content as an array of parts, and passing a list to re.search raises a TypeError. The regex also matches only LF line endings, so CRLF-terminated fences are missed. This patch accepts array content, tolerates CRLF and other code-fence variants, and logs only a 500-character preview instead of the full response on failure.
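Below is a minimal, self-contained sketch of the same idea. It reuses the two regexes from the suggested diff and joins content parts the same way. It assumes each part is either a dict with a "text" key or a plain value. The helper names coerce_text and parse_json_reply are hypothetical. The fence string is built from "`" * 3 only so the snippet can sit inside a Markdown code fence.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # three backticks, built indirectly to keep this block fence-safe

# Patterns from the suggested diff: a json fence first (any info string,
# LF or CRLF line endings), then any other fence as a fallback.
JSON_FENCE = re.compile(FENCE + r"json[^\n]*\r?\n(.*?)\r?\n" + FENCE, re.DOTALL | re.IGNORECASE)
ANY_FENCE = re.compile(FENCE + r"\s*[\w-]*\s*\r?\n(.*?)\r?\n" + FENCE, re.DOTALL)


def coerce_text(content: Any) -> str:
    """Join a list of content parts into one string; strings pass through unchanged."""
    if isinstance(content, str):
        return content
    # Dict parts contribute their "text" value; anything else is stringified.
    return "".join(p.get("text", "") if isinstance(p, dict) else str(p) for p in content)


def parse_json_reply(content: Any) -> Optional[Any]:
    """Return the parsed JSON value from a model reply, or None if nothing parses."""
    text = coerce_text(content)
    match = JSON_FENCE.search(text) or ANY_FENCE.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass  # fall through to parsing the whole reply, as the diff does
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

The original function passes a list straight to re.search, which raises TypeError before any parsing happens. Coercing to a string first makes both fence patterns applicable to multi-part replies.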
Apply this diff:
-def extract_json_from_content(content: str) -> Optional[dict]:
-    """Extract JSON from the API response content."""
-    # Look for JSON code block
-    json_match = re.search(r'```json\n(.*?)\n```', content, re.DOTALL)
-    if json_match:
-        try:
-            return json.loads(json_match.group(1))
-        except json.JSONDecodeError as e:
-            print(f"Error parsing JSON: {e}")
-            return None
-
-    # Try to find JSON without code block markers
-    try:
-        return json.loads(content)
-    except json.JSONDecodeError:
-        print(f"Could not extract valid JSON from response: {content}")
-        return None
+def extract_json_from_content(content: str) -> Optional[dict]:
+    """Extract JSON from the API response content."""
+    # Coerce array-of-content-parts into a single string (Anthropic/OpenAI multi-part)
+    if not isinstance(content, str):
+        try:
+            content = "".join(
+                part.get("text", "") if isinstance(part, dict) else str(part)
+                for part in content
+            )
+        except Exception:
+            print("Unexpected content type in API response; cannot parse JSON.")
+            return None
+
+    # Look for JSON code block (tolerate CRLF and language hints)
+    json_match = re.search(r'```json[^\n]*\r?\n(.*?)\r?\n```', content, re.DOTALL | re.IGNORECASE)
+    if not json_match:
+        json_match = re.search(r'```\s*[\w-]*\s*\r?\n(.*?)\r?\n```', content, re.DOTALL)
+    if json_match:
+        try:
+            return json.loads(json_match.group(1))
+        except json.JSONDecodeError as e:
+            print(f"Error parsing JSON from fenced block: {e}")
+            # fall through to raw parsing
+
+    # Try to parse raw JSON
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        preview = content[:500].replace("\n", " ")
+        print(f"Could not extract valid JSON from response. Preview: {preview}...")
+        return None

# [2.7.0](v2.6.1...v2.7.0) (2025-08-15)
### Features
* add script and workflow for contributing registry ([#233](#233)) ([ec67763](ec67763))
🎉 This PR is included in version 2.7.0 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
PR Type
Enhancement
Description
Replace complex LLM-based manifest generation with API-based approach
Add GitHub workflow for automated manifest generation via PR
Simplify script to use chatxiv.org API instead of local processing
Add validation step to correct installations against README
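The generate-then-validate flow described above can be sketched as a small orchestration function. Here generate and validate are hypothetical stand-ins for generate_manifest() and validate_installations(), whose real signatures and return types may differ. Falling back to the unvalidated manifest when validation returns None is an assumption of this sketch, not documented behavior of the script.

```python
from typing import Callable, Optional

Manifest = dict  # hypothetical alias; the real manifest schema is not shown here


def build_manifest(
    repo_url: str,
    generate: Callable[[str], Optional[Manifest]],
    validate: Callable[[Manifest, str], Optional[Manifest]],
) -> Optional[Manifest]:
    """Generate a manifest, then run a second pass that checks its installations."""
    manifest = generate(repo_url)
    if manifest is None:
        return None  # nothing to validate or save
    validated = validate(manifest, repo_url)
    # Assumption: keep the unvalidated manifest if the validation pass fails.
    return validated if validated is not None else manifest
```

Injecting the two calls keeps the orchestration testable without network access. In the script, they would wrap the external API requests.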
Diagram Walkthrough
File Walkthrough
get_manifest.py
Complete rewrite using API-based approach
scripts/get_manifest.py
client
extract_json_from_content() for parsing API responses
validate_installations() to verify against README
generate-manifest.yml
New GitHub workflow for manifest generation
.github/workflows/generate-manifest.yml
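Per the sequence diagram at the top of the page, the workflow writes to mcp-registry/servers/<repo>.json and opens a PR from an add-manifest-<repo> branch. The following is a minimal sketch of deriving those names. It assumes <repo> is the last path segment of the repository URL with any .git suffix removed. The helper names are hypothetical, and the script's own derivation may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def repo_name_from_url(repo_url: str) -> str:
    """Return the last path segment of a repository URL, without a trailing .git."""
    path = urlparse(repo_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    if not name:
        raise ValueError(f"cannot derive a repository name from {repo_url!r}")
    return name


def manifest_path(repo_url: str) -> PurePosixPath:
    # Registry location as shown in the sequence diagram.
    return PurePosixPath("mcp-registry/servers") / f"{repo_name_from_url(repo_url)}.json"


def branch_name(repo_url: str) -> str:
    # PR branch naming as shown in the sequence diagram.
    return f"add-manifest-{repo_name_from_url(repo_url)}"
```

Raising on an empty name, rather than writing mcp-registry/servers/.json, makes a malformed repo_url input fail the workflow early.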
Summary by CodeRabbit
New Features
Refactor
Chores