@justaddcoffee justaddcoffee commented Nov 6, 2025

Summary

This PR establishes comprehensive workflows and configuration for the D4D Assistant, enabling automated creation and editing of Datasheets for Datasets (D4D) via GitHub Actions.

What's Included

1. D4D Assistant Instruction Files

New Files:

  • .github/workflows/d4d_assistant_create.md - Complete workflow for creating new D4D datasheets
  • .github/workflows/d4d_assistant_edit.md - Complete workflow for editing existing datasheets
  • .github/workflows/README.md - MCP server setup and troubleshooting guide

Key Features:

  • Step-by-step workflows for metadata extraction and YAML generation
  • Comprehensive validation instructions with common error fixes
  • Pull request creation and modification procedures
  • User communication templates for issues and PRs
  • Scope limitations (D4D tasks only) with polite redirect templates
  • Error handling guidance for various scenarios

2. MCP Server Configuration

New Files:

  • .mcp.json - Project-scoped MCP server configuration

Updated Files:

  • .claude/settings.json - Enabled project MCP servers and permissions

Configured MCP Tools:

  • GitHub MCP (HTTP): Repository operations, PR/issue management
  • ARTL MCP (stdio): Academic literature search and retrieval
  • WebSearch: Find dataset documentation on the web
  • WebFetch: Retrieve content from URLs and PDFs
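
A minimal sketch of how a project-scoped config could pair one HTTP server with one stdio server, as the list above describes. The key names (`mcpServers`, `type`, `url`, `command`, `args`), the placeholder URL, and the `artl-mcp` package name are assumptions for illustration; the actual `.mcp.json` in this PR may differ.

```python
import json

# Placeholder only: NOT the real GitHub MCP endpoint.
GITHUB_MCP_URL = "https://example.invalid/mcp"


def build_mcp_config(github_url: str) -> dict:
    """Return a config dict with one HTTP server and one stdio server."""
    return {
        "mcpServers": {
            # HTTP transport: the client connects to a remote URL.
            "github": {"type": "http", "url": github_url},
            # stdio transport: the client launches a local process.
            # "artl-mcp" is an assumed package name, run through uvx.
            "artl": {"type": "stdio", "command": "uvx", "args": ["artl-mcp"]},
        }
    }


def transports(config: dict) -> dict:
    """Map each configured server name to its transport type."""
    return {name: entry["type"] for name, entry in config["mcpServers"].items()}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(GITHUB_MCP_URL), indent=2))
```

WebSearch and WebFetch are not listed here because, per this PR, they are enabled through permissions in `.claude/settings.json` rather than as MCP servers.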

3. Enhanced CLAUDE.md

Updated Section: "D4D Assistant Instructions (GitHub Actions)"

Changes:

  • Explicit instruction to READ workflow files FIRST
  • Quick reference workflow (8-step process)
  • Critical notes about validation requirements
  • Links to all instruction files
  • Guidance on modifying existing PRs

D4D Assistant Capabilities

With these workflows, the D4D Assistant can:

  • Create new datasheets from dataset documentation URLs
  • Edit existing datasheets based on user requests
  • Modify existing PRs with additional changes
  • Validate YAML against D4D schema with detailed error handling
  • Search academic literature for dataset papers (ARTL MCP)
  • Fetch web content from documentation sources
  • Create pull requests with comprehensive descriptions
  • Notify users in GitHub issues with PR links and instructions
  • Handle scope limitations by redirecting non-D4D questions

Workflow Examples

Creating a New Datasheet

  1. User opens issue: @d4dassistant create a datasheet for XYZ dataset (with URLs)
  2. Assistant reads .github/workflows/d4d_assistant_create.md
  3. Fetches content from URLs using WebFetch/WebSearch
  4. Searches academic literature with ARTL MCP if needed
  5. Extracts metadata and generates YAML conforming to D4D schema
  6. Validates YAML (MUST pass before proceeding)
  7. Generates HTML preview for reviewers
  8. Creates PR with detailed description
  9. Comments on issue with PR link and next steps
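
The ordering of the steps above, and the rule that nothing after validation runs unless it passes, can be sketched as follows. This is an illustrative model only: `run_create_workflow`, the step names, and the `validate` / `generate_yaml` callables are hypothetical stand-ins, not the assistant's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CreateRun:
    """Records which steps ran so the ordering can be inspected."""
    steps: list = field(default_factory=list)
    pr_created: bool = False


def run_create_workflow(
    urls: list,
    validate: Callable[[str], list],
    generate_yaml: Callable[[list], str],
    need_literature: bool = False,
) -> CreateRun:
    run = CreateRun()
    run.steps.append("read_instructions")      # step 2
    run.steps.append("fetch_sources")          # step 3
    if need_literature:
        run.steps.append("search_literature")  # step 4, only if needed
    yaml_text = generate_yaml(urls)            # step 5
    run.steps.append("generate_yaml")
    errors = validate(yaml_text)               # step 6
    run.steps.append("validate")
    if errors:
        # Validation failed: stop before any preview or PR is produced.
        return run
    run.steps.append("generate_html_preview")  # step 7
    run.steps.append("create_pr")              # step 8
    run.pr_created = True
    run.steps.append("comment_on_issue")       # step 9
    return run
```

Passing the validator in as a callable keeps the gate explicit: a non-empty error list ends the run at step 6.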

Editing an Existing Datasheet

  1. User requests: @d4dassistant add instance_count field to dataset_xyz.yaml
  2. Assistant reads .github/workflows/d4d_assistant_edit.md
  3. Locates existing YAML file
  4. Makes requested changes
  5. Validates updated YAML
  6. Regenerates HTML preview
  7. Creates PR with before/after comparison
  8. Comments on issue with PR link
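
As a rough illustration of the "before/after comparison" in step 7, the sketch below adds a field to a small YAML snippet and renders a unified diff with the standard library. The file contents, the value `1000`, and the placement of `instance_count` as a top-level key are made up for the example; a real edit would more likely go through a YAML parser and the D4D schema.

```python
import difflib


def add_field(yaml_text: str, key: str, value: str) -> str:
    """Append a top-level `key: value` line unless the key already exists."""
    lines = yaml_text.splitlines()
    if any(line.startswith(f"{key}:") for line in lines):
        return yaml_text
    lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def before_after(before: str, after: str, filename: str) -> str:
    """Render a unified diff that could be pasted into a PR description."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)


if __name__ == "__main__":
    original = "id: dataset_xyz\ntitle: XYZ dataset\n"  # hypothetical content
    updated = add_field(original, "instance_count", "1000")
    print(before_after(original, updated, "dataset_xyz.yaml"))
```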

Modifying an Existing PR

  1. User comments on PR: "Can you also add the preprocessing section?"
  2. Assistant identifies this as a PR modification request
  3. Checks out existing PR branch with gh pr checkout <number>
  4. Makes additional changes
  5. Validates again
  6. Commits and pushes to existing PR branch
  7. Comments on PR listing new changes
  8. Optionally updates issue if changes are substantial
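
The shell side of this flow can be sketched as a list of commands built in Python. Only `gh pr checkout <number>` comes from the workflow above; the `git add` / `git commit` / `git push` and `gh pr comment` calls are the usual way to complete steps 6 and 7 and are an assumption here, as is the helper name `pr_update_commands`. The validation step is left out because the validator command is not given in this description.

```python
def pr_update_commands(pr_number: int, files: list, message: str, comment: str) -> list:
    """Build (but do not run) the commands for updating an existing PR."""
    return [
        # Step 3: switch to the branch behind the existing PR.
        ["gh", "pr", "checkout", str(pr_number)],
        # Step 6: commit the changed files and push to the same branch.
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push"],
        # Step 7: tell reviewers what changed.
        ["gh", "pr", "comment", str(pr_number), "--body", comment],
    ]


if __name__ == "__main__":
    # Hypothetical PR number and file name, for illustration only.
    for cmd in pr_update_commands(
        42, ["dataset_xyz.yaml"], "Add preprocessing section", "Added the preprocessing section."
    ):
        print(" ".join(cmd))
```

Returning argument lists instead of running them makes the sequence easy to review or pass to `subprocess.run` one command at a time.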

Validation Requirements

All workflows enforce strict validation:

  • YAML MUST validate against schema before PR creation
  • Common validation errors documented with fixes
  • Step-by-step debugging guidance provided
  • Alternative validation methods listed
  • NO PRs with invalid YAML
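
One way to enforce "no PRs with invalid YAML" is to treat the validator's exit code as a hard gate. The sketch below assumes the validator is some command-line tool that exits non-zero on failure; the actual command used by the workflows is not stated here, so it is passed in as a parameter, and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import Callable


class ValidationFailed(Exception):
    """Raised when the validator exits non-zero; carries its output."""


def require_valid(validator_cmd: list) -> None:
    """Run the validator command and raise if it reports failure."""
    result = subprocess.run(validator_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise ValidationFailed(result.stderr or result.stdout)


def create_pr_if_valid(validator_cmd: list, create_pr: Callable[[], None]) -> bool:
    """Call create_pr only when validation passes; return whether it ran."""
    try:
        require_valid(validator_cmd)
    except ValidationFailed:
        return False
    create_pr()
    return True


if __name__ == "__main__":
    # Stand-in "validator" that always fails, so no PR is created.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(create_pr_if_valid(failing, lambda: print("PR created")))
```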

User Communication

The D4D Assistant keeps users informed:

  • Creates detailed PR descriptions with review instructions
  • Comments on issues with PR links and status updates
  • Updates PRs when changes are made
  • Provides clear next steps for reviewers
  • Links back to original issues for context

MCP Setup

First-time users need to:

  1. Approve project MCP servers when prompted
  2. Authenticate GitHub MCP if needed (OAuth via /mcp command)
  3. Ensure uvx is available for ARTL MCP

See .github/workflows/README.md for detailed setup and troubleshooting.
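
For step 3, a quick preflight check along these lines can confirm that `uvx` is on the PATH before the ARTL MCP server is started. The helper name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(required: list) -> list:
    """Return the names from `required` that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["uvx"])
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```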

Files Changed

New Files (4)

  • .mcp.json
  • .github/workflows/d4d_assistant_create.md
  • .github/workflows/d4d_assistant_edit.md
  • .github/workflows/README.md

Modified Files (2)

  • .claude/settings.json
  • CLAUDE.md

Testing

To test these workflows:

  1. Merge this PR
  2. Open a test issue requesting a D4D datasheet
  3. Mention the D4D Assistant with dataset URLs
  4. Verify the assistant follows the documented workflow
  5. Check PR quality and user notifications

Related Issues

This PR builds upon the D4D AI Assistant GitHub integration (#55) by providing comprehensive workflows and tool access.


🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

justaddcoffee and others added 3 commits November 6, 2025 14:01
- Add dedicated instruction files for D4D Assistant in .github/workflows/
- d4d_assistant_create.md: Complete workflow for creating new datasheets
- d4d_assistant_edit.md: Complete workflow for editing existing datasheets
- Update CLAUDE.md to reference these instruction files

Both instruction files include:
- Step-by-step processes for metadata extraction and datasheet generation
- PR creation workflows with descriptive templates
- User notification templates for GitHub issue comments
- Validation requirements and error handling
- Schema reference and constraint checking

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
…ation guidance

## MCP Server Setup
- Add .mcp.json with GitHub and ARTL MCP server configurations
- Update .claude/settings.json to enable project MCP servers
- Add permissions for mcp__github__*, mcp__artl__*, WebSearch, WebFetch
- Create .github/workflows/README.md with MCP setup and troubleshooting guide

## D4D Assistant Scope
- Add explicit scope limitations (D4D tasks only)
- Provide polite redirect template for non-D4D questions
- Document available MCP tools and their purposes:
  - GitHub MCP: PR/issue management, repository operations
  - ARTL MCP: Academic literature search and retrieval
  - WebSearch: Find dataset documentation
  - WebFetch: Retrieve content from URLs

## Validation Enhancement
- Add comprehensive validation instructions to both workflows
- Document common validation errors with fixes
- Provide step-by-step debugging guidance
- Emphasize validation is required before PR creation
- Include alternative validation methods

## Documentation
- Add setup guide in .github/workflows/README.md
- Document MCP server capabilities and authentication
- Include troubleshooting section for common issues
- Provide security notes about MCP server trust

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
## PR Modification Support
- Add "Modifying an Existing PR" section to both workflow files
- Document when to modify vs. create new PR
- Provide step-by-step workflow for PR updates:
  - Find and checkout existing PR branch
  - Make requested changes
  - Validate and commit updates
  - Comment on PR with changes
  - Optionally notify in issue

## Example Scenarios
- User requests additional fields after review
- User corrects values in existing PR
- Validation errors discovered after PR creation
- New source documentation provided

## CLAUDE.md Enhancements
- Add explicit instruction to READ workflow files FIRST
- List all three task types: create, edit, modify PR
- Provide quick reference workflow
- Add critical notes about validation and communication
- Note that both workflow files contain PR modification sections

This ensures the D4D Assistant can iterate on PRs based on user feedback
instead of always creating new PRs.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@justaddcoffee justaddcoffee merged commit f1d611f into main Nov 6, 2025
7 of 8 checks passed
@justaddcoffee justaddcoffee deleted the d4d-assistant-workflows branch November 6, 2025 19:47