Commit f1d611f

Add comprehensive D4D Assistant workflows and MCP configuration (#57)
* Add D4D Assistant workflow instructions for GitHub Actions

- Add dedicated instruction files for D4D Assistant in .github/workflows/
  - d4d_assistant_create.md: Complete workflow for creating new datasheets
  - d4d_assistant_edit.md: Complete workflow for editing existing datasheets
- Update CLAUDE.md to reference these instruction files

Both instruction files include:
- Step-by-step processes for metadata extraction and datasheet generation
- PR creation workflows with descriptive templates
- User notification templates for GitHub issue comments
- Validation requirements and error handling
- Schema reference and constraint checking

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

* Enhance D4D Assistant with MCP configuration, scope limits, and validation guidance

## MCP Server Setup
- Add .mcp.json with GitHub and ARTL MCP server configurations
- Update .claude/settings.json to enable project MCP servers
- Add permissions for mcp__github__*, mcp__artl__*, WebSearch, WebFetch
- Create .github/workflows/README.md with MCP setup and troubleshooting guide

## D4D Assistant Scope
- Add explicit scope limitations (D4D tasks only)
- Provide polite redirect template for non-D4D questions
- Document available MCP tools and their purposes:
  - GitHub MCP: PR/issue management, repository operations
  - ARTL MCP: Academic literature search and retrieval
  - WebSearch: Find dataset documentation
  - WebFetch: Retrieve content from URLs

## Validation Enhancement
- Add comprehensive validation instructions to both workflows
- Document common validation errors with fixes
- Provide step-by-step debugging guidance
- Emphasize validation is required before PR creation
- Include alternative validation methods

## Documentation
- Add setup guide in .github/workflows/README.md
- Document MCP server capabilities and authentication
- Include troubleshooting section for common issues
- Provide security notes about MCP server trust

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

* Add PR modification workflows and explicit instruction file reading

## PR Modification Support
- Add "Modifying an Existing PR" section to both workflow files
- Document when to modify vs. create new PR
- Provide step-by-step workflow for PR updates:
  - Find and checkout existing PR branch
  - Make requested changes
  - Validate and commit updates
  - Comment on PR with changes
  - Optionally notify in issue

## Example Scenarios
- User requests additional fields after review
- User corrects values in existing PR
- Validation errors discovered after PR creation
- New source documentation provided

## CLAUDE.md Enhancements
- Add explicit instruction to READ workflow files FIRST
- List all three task types: create, edit, modify PR
- Provide quick reference workflow
- Add critical notes about validation and communication
- Reference both workflow files contain PR modification sections

This ensures the D4D Assistant can iterate on PRs based on user feedback instead of always creating new PRs.

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
1 parent 49afe36 commit f1d611f

6 files changed: +1498 -2 lines changed

.claude/settings.json

Lines changed: 7 additions & 2 deletions
@@ -14,7 +14,12 @@
       "Bash(poetry:*)",
       "Bash(make:*)",
       "Bash(python:*)",
-      "Bash(uv:*)"
+      "Bash(uv:*)",
+      "mcp__github__*",
+      "mcp__artl__*",
+      "WebSearch",
+      "WebFetch"
     ]
-  }
+  },
+  "enableAllProjectMcpServers": true
 }

.github/workflows/README.md

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
# D4D Assistant Setup

This directory contains instruction files for the D4D Assistant, which helps create and edit Datasheets for Datasets (D4D) via GitHub Actions.

## Instruction Files

- **`d4d_assistant_create.md`** - Instructions for creating new D4D datasheets
- **`d4d_assistant_edit.md`** - Instructions for editing existing D4D datasheets

These files guide the D4D Assistant through workflows for metadata extraction, YAML generation, validation, and pull request creation.

## MCP Server Configuration

The D4D Assistant requires Model Context Protocol (MCP) servers to be configured. These are already set up in the repository:

### Configured MCP Servers

**`.mcp.json`** (at repository root):
```json
{
  "mcpServers": {
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/"
    },
    "artl": {
      "command": "uvx",
      "args": ["artl-mcp"]
    }
  }
}
```

**`.claude/settings.json`**:
- Enables project MCP servers with `enableAllProjectMcpServers: true`
- Allows MCP tool usage with permissions for `mcp__github__*` and `mcp__artl__*`
- Includes permissions for `WebSearch` and `WebFetch` tools

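The settings described above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the repository: the function name `check_settings` is hypothetical, and it assumes the allow list lives under `permissions.allow`, which the diff of `.claude/settings.json` does not show explicitly.

```python
import json

# Entries this commit adds to the allow list in .claude/settings.json.
REQUIRED_TOOLS = ["mcp__github__*", "mcp__artl__*", "WebSearch", "WebFetch"]


def check_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # Assumption: the allow list is stored under permissions.allow.
    allowed = settings.get("permissions", {}).get("allow", [])
    for tool in REQUIRED_TOOLS:
        if tool not in allowed:
            problems.append(f"missing permission: {tool}")
    if settings.get("enableAllProjectMcpServers") is not True:
        problems.append("enableAllProjectMcpServers is not set to true")
    return problems


# Inline sample with one entry deliberately left out. To check the real file,
# load it with json.loads(pathlib.Path(".claude/settings.json").read_text()).
sample = json.loads("""
{
  "permissions": {"allow": ["Bash(uv:*)", "mcp__github__*", "WebSearch", "WebFetch"]},
  "enableAllProjectMcpServers": true
}
""")
print(check_settings(sample))  # ['missing permission: mcp__artl__*']
```

An empty list from `check_settings` suggests the file matches what this README describes; it does not prove that Claude Code will accept the configuration.
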
### What Each MCP Server Does

1. **GitHub MCP** (`mcp__github__*`)
   - Create branches, commits, and pull requests
   - Comment on issues and PRs
   - Read repository files and structure
   - Manage labels and milestones

2. **ARTL MCP** (`mcp__artl__*`)
   - Search and retrieve academic literature about datasets
   - Find papers by DOI, PMID, or PMCID
   - Extract metadata from academic publications
   - Useful for finding dataset documentation in scholarly articles

3. **WebSearch** (built-in)
   - Search the web for dataset documentation
   - Find official dataset pages and documentation
   - Discover related resources

4. **WebFetch** (built-in)
   - Fetch content from URLs
   - Download and extract text from PDFs
   - Access API documentation

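To illustrate how a wildcard entry such as `mcp__github__*` covers every tool exposed by one server, the sketch below matches tool names against the allow-list patterns with Python's `fnmatch`. This is only an approximation for illustration: Claude Code's actual matching rules may differ, and the tool names in the loop are hypothetical examples rather than a list of what the servers expose.

```python
from fnmatch import fnmatchcase

# Patterns taken from the allow list in .claude/settings.json.
ALLOW_PATTERNS = ["mcp__github__*", "mcp__artl__*", "WebSearch", "WebFetch"]


def is_allowed(tool_name: str, patterns=ALLOW_PATTERNS) -> bool:
    """Return True if the tool name matches any allow-list pattern (case-sensitive)."""
    return any(fnmatchcase(tool_name, pattern) for pattern in patterns)


# Hypothetical tool names, used only to show the matching behavior.
for name in [
    "mcp__github__create_pull_request",
    "mcp__artl__search",
    "WebFetch",
    "mcp__notion__search",
]:
    print(name, is_allowed(name))
```

Under these assumptions, only the last name is rejected, because no pattern covers a `notion` server.
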
## First-Time Setup

When running the D4D Assistant for the first time:

1. **Approve Project MCP Servers**
   - Claude Code will prompt to approve MCP servers from `.mcp.json`
   - Accept the project MCP servers (GitHub and ARTL)

2. **Authenticate GitHub MCP (if needed)**
   - If GitHub MCP requires OAuth, use `/mcp` command
   - Follow browser prompts to authenticate
   - Authentication is stored securely and refreshed automatically

3. **Install ARTL MCP**
   - The ARTL MCP server uses `uvx artl-mcp`
   - Ensure `uvx` (uv tool runner) is available on your system
   - Install with: `pip install uv` or follow [uv installation guide](https://docs.astral.sh/uv/)

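The `uvx` prerequisite from step 3 can be checked from Python before starting the assistant. This is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical.

```python
import shutil


def find_missing_tools(tools=("uvx",)):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("uvx is available")
```

If `uvx` is reported as missing, installing `uv` as described in step 3 should provide it.
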
## Troubleshooting

### MCP Server Not Starting

If you see errors like "Connection closed" or "MCP server failed to start":

1. **Check uvx availability**:
   ```bash
   which uvx
   uvx --version
   ```

2. **Test ARTL MCP manually**:
   ```bash
   uvx artl-mcp
   ```

3. **Reset MCP configuration**:
   ```bash
   claude mcp reset-project-choices
   ```

### Authentication Issues

If GitHub MCP fails to authenticate:

1. Use `/mcp` command in Claude Code
2. Select "Authenticate" for GitHub
3. Complete OAuth flow in browser
4. Use "Clear authentication" if you need to re-authenticate

### Permission Denied

If you see permission errors:

1. Check `.claude/settings.json` has MCP permissions enabled
2. Ensure `enableAllProjectMcpServers: true` is set
3. Restart Claude Code to apply configuration changes

## Manual MCP Management

You can also manage MCP servers via CLI:

```bash
# List configured servers
claude mcp list

# Get details for a specific server
claude mcp get github

# Add a new server (example)
claude mcp add --transport http notion https://mcp.notion.com/mcp

# Remove a server
claude mcp remove server-name
```

For more information, see the [Claude Code MCP documentation](https://code.claude.com/docs/en/mcp).

## D4D Assistant Capabilities

With these MCP servers configured, the D4D Assistant can:

- ✅ Create new D4D datasheets from URLs and documentation
- ✅ Edit existing D4D YAML files based on user requests
- ✅ Search academic literature for dataset papers
- ✅ Fetch content from web pages and PDFs
- ✅ Create pull requests with changes
- ✅ Comment on GitHub issues with status updates
- ✅ Validate YAML against the D4D schema
- ✅ Generate HTML previews of datasheets

## Security Note

MCP servers execute code and access external services. The configured servers and tools are:

- **GitHub MCP**: GitHub's hosted remote MCP server (`https://api.githubcopilot.com/mcp/`) for GitHub operations
- **ARTL MCP**: Academic literature search tool (`uvx artl-mcp`)
- **WebSearch/WebFetch**: Built-in Claude Code tools

These are trusted sources, but always review MCP server configurations before approving them.
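
As an aid to that review, the sketch below prints what each entry in `.mcp.json` would contact or execute. It is an illustration only: the function name `describe_servers` is hypothetical, and the JSON is inlined from the configuration shown earlier in this README rather than read from disk.

```python
import json

# Inlined copy of the .mcp.json shown in "Configured MCP Servers".
MCP_JSON = """
{
  "mcpServers": {
    "github": {"type": "http", "url": "https://api.githubcopilot.com/mcp/"},
    "artl": {"command": "uvx", "args": ["artl-mcp"]}
  }
}
"""


def describe_servers(config: dict) -> list[str]:
    """Summarize each configured server as a remote endpoint or a local command."""
    lines = []
    for name, server in sorted(config.get("mcpServers", {}).items()):
        if "url" in server:
            lines.append(f"{name}: remote endpoint {server['url']}")
        elif "command" in server:
            command = " ".join([server["command"], *server.get("args", [])])
            lines.append(f"{name}: local command `{command}`")
        else:
            lines.append(f"{name}: unrecognized entry, inspect manually")
    return lines


for line in describe_servers(json.loads(MCP_JSON)):
    print(line)
# artl: local command `uvx artl-mcp`
# github: remote endpoint https://api.githubcopilot.com/mcp/
```

Entries that run a local command deserve the closest look, since they execute code on the machine running the assistant.
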
