Commit 49afe36

Add D4D AI Assistant GitHub integration (#55)
* Add D4D AI Assistant GitHub integration

  Set up @d4dassistant GitHub AI agent for automatic D4D generation.

  Features:
  - GitHub Action workflow that triggers on @d4dassistant mentions
  - Claude Code agent (via dragon-ai-agent/run-goose-obo)
  - Automated D4D YAML generation from dataset documentation
  - Schema validation before PR creation
  - Unique timestamp-based ID generation to avoid conflicts

  Files added:
  - .github/ai-controllers.json - Authorized users list
  - .github/workflows/d4d-agent.yml - GitHub Action workflow
  - .goosehints - Agent instructions and workflow
  - .github/D4D_ASSISTANT_README.md - User documentation

  Generated D4Ds are saved to: html-demos/user_d4ds/

  Usage: @d4dassistant Create D4D for https://example.com/dataset

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Add goose config file

* Switch D4D assistant from Goose to Claude Code

  Replace run-goose-obo action with [email protected] to use Claude Code directly. Configure CBORG API endpoint via .claude/settings.json for consistent model access across environments.

  Changes:
  - Update workflow to use dragon-ai-agent/[email protected]
  - Remove openai-api-key parameter (Claude-only)
  - Add Claude Code configuration parameters
  - Create .claude/settings.json with CBORG base URL and model settings
  - Remove obsolete .config/goose/ directory

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
1 parent 0982496 commit 49afe36

6 files changed: +701 -0 lines changed

.claude/settings.json

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{
+  "model": "anthropic/claude-sonnet",
+  "apiKeyHelper": "echo $ANTHROPIC_API_KEY",
+  "env": {
+    "ANTHROPIC_BASE_URL": "https://api.cborg.lbl.gov",
+    "ANTHROPIC_MODEL": "anthropic/claude-sonnet",
+    "ANTHROPIC_SMALL_FAST_MODEL": "anthropic/claude-haiku",
+    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1"
+  },
+  "permissions": {
+    "allow": [
+      "Bash(git:*)",
+      "Bash(gh:*)",
+      "Bash(poetry:*)",
+      "Bash(make:*)",
+      "Bash(python:*)",
+      "Bash(uv:*)"
+    ]
+  }
+}
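
The `permissions.allow` entries in this settings file share a `Tool(pattern)` shape. As a rough illustration only, the sketch below splits such entries into a tool name and a command prefix and checks a command against them. The helpers `parseRule` and `isAllowed` are hypothetical, and the prefix-matching semantics are an assumption made for this example, not a description of how Claude Code itself evaluates these rules.

```javascript
// Hypothetical helpers for illustration; not part of Claude Code.
const allow = [
  "Bash(git:*)",
  "Bash(gh:*)",
  "Bash(poetry:*)",
  "Bash(make:*)",
  "Bash(python:*)",
  "Bash(uv:*)",
];

// Split an entry such as "Bash(git:*)" into { tool: "Bash", prefix: "git" }.
function parseRule(rule) {
  const match = /^(\w+)\((.*)\)$/.exec(rule);
  if (!match) return null;
  const [, tool, pattern] = match;
  // Assumption: a trailing ":*" means "this prefix followed by anything".
  const prefix = pattern.endsWith(":*") ? pattern.slice(0, -2) : pattern;
  return { tool, prefix };
}

// Assumed semantics: allowed when the command equals a prefix or starts with "<prefix> ".
function isAllowed(tool, command, rules) {
  return rules
    .map(parseRule)
    .filter((r) => r !== null && r.tool === tool)
    .some((r) => command === r.prefix || command.startsWith(r.prefix + " "));
}

console.log(isAllowed("Bash", "git status", allow)); // true
console.log(isAllowed("Bash", "rm -rf /tmp/x", allow)); // false
```

Under these assumed semantics, anything outside the six listed command families is rejected, which is the kind of narrowing the allow list appears intended to provide.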

.github/D4D_ASSISTANT_README.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+# D4D AI Assistant
+
+This repository has an AI assistant (`@d4dassistant`) that can automatically generate D4D (Datasheets for Datasets) YAML files from dataset documentation.
+
+## How to Use
+
+### Request D4D Generation
+
+Authorized users (listed in `.github/ai-controllers.json`) can mention `@d4dassistant` in GitHub issues to request D4D generation:
+
+```markdown
+@d4dassistant Please create a D4D for this dataset: https://example.com/dataset-page
+
+Additional context: This dataset contains medical imaging data for cancer research...
+```
+
+### What the Assistant Does
+
+1. **Analyzes** your dataset description and any provided URLs
+2. **Fetches** documentation from web pages, PDFs, or repositories
+3. **Generates** a valid D4D YAML file conforming to the LinkML schema
+4. **Validates** the YAML against the schema
+5. **Creates** a pull request with the D4D file in `html-demos/user_d4ds/`
+6. **Comments** on your issue with a link to the PR
+
+### Example Requests
+
+**With URL:**
+```markdown
+@d4dassistant Create a D4D for the Bridge2AI VOICE dataset
+
+URL: https://physionet.org/content/b2ai-voice/
+This is a voice biomarker dataset for health research.
+```
+
+**With description only:**
+```markdown
+@d4dassistant Generate a D4D for my diabetes study dataset
+
+Dataset name: T2D Longitudinal Study
+Description: 5-year longitudinal study of 1000 Type 2 diabetes patients
+Format: CSV files with clinical measurements and lab results
+License: CC-BY-4.0
+```
+
+**With GitHub repository:**
+```markdown
+@d4dassistant Create D4D from this repo: https://github.com/org/dataset-repo
+
+The README has all the dataset details.
+```
+
+## What Information to Provide
+
+The more information you provide, the better the D4D will be. Useful information includes:
+
+- **URLs**: Dataset landing pages, documentation, PDFs, GitHub repos
+- **Dataset name**: Short and descriptive
+- **Description**: What the dataset contains and why it exists
+- **Creators**: Who created/maintains the dataset
+- **Size**: Number of instances, file size
+- **Format**: CSV, JSON, Parquet, etc.
+- **License**: How the data can be used
+- **Collection details**: How and when data was gathered
+- **Use cases**: What tasks it's intended for
+
+## What Gets Generated
+
+The assistant creates a YAML file following the D4D schema with sections like:
+
+- **Motivation**: Why the dataset was created
+- **Composition**: What it contains (instances, splits, etc.)
+- **Collection**: How data was gathered
+- **Preprocessing**: Data cleaning steps
+- **Uses**: Recommended and discouraged applications
+- **Distribution**: Access information and licensing
+- **Maintenance**: Who maintains it and how to get support
+
+## File Location
+
+Generated D4D files are saved to: `html-demos/user_d4ds/{dataset_name}_d4d.yaml`
+
+Each filename includes a timestamp or unique identifier to avoid conflicts.
+
+## Reviewing the Generated D4D
+
+Once the PR is created:
+
+1. Review the generated YAML file
+2. Check that metadata is accurate
+3. Request changes if needed (comment on the PR)
+4. Merge when satisfied
+
+The assistant can update the D4D based on your feedback - just comment on the PR with your requested changes.
+
+## Authorization
+
+To add users who can invoke the assistant, edit `.github/ai-controllers.json`:
+
+```json
+["username1", "username2", "username3"]
+```
+
+Only authorized users can trigger the assistant by mentioning `@d4dassistant`.
+
+## Technical Details
+
+- **Agent**: Powered by Claude Code via a `dragon-ai-agent` GitHub Action (see `.github/workflows/d4d-agent.yml`)
+- **Schema**: Uses LinkML schema from `src/data_sheets_schema/schema/`
+- **Validation**: Runs `make test-examples` to ensure schema compliance
+- **Examples**: References `src/data/examples/valid/` for guidance
+
+## Troubleshooting
+
+**Assistant didn't respond:**
+- Check that you're in the authorized users list
+- Ensure you mentioned `@d4dassistant` (not `@d4d-assistant` or similar)
+- Check GitHub Actions logs for errors
+
+**Generated D4D is incomplete:**
+- Provide more information in a follow-up comment
+- Share additional URLs or documentation
+- The assistant can update the D4D based on new info
+
+**Validation errors:**
+- The assistant should fix these automatically
+- If the PR has validation errors, comment with details
+- The assistant will update the PR
+
+## Support
+
+For issues or questions:
+- Open a GitHub issue
+- Tag authorized users for assistance
+- Check `.goosehints` file for assistant instructions
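
The README's "File Location" section says generated files land in `html-demos/user_d4ds/` and that each filename carries a timestamp or unique identifier. The exact naming scheme is not shown in this commit, so the following is only a minimal sketch of one way such a path could be built. The function `d4dOutputPath`, the slug rules, and the timestamp layout are all assumptions for illustration.

```javascript
// Hypothetical helper: the naming scheme actually used by the assistant is not shown here.
function d4dOutputPath(datasetName, now = new Date()) {
  // Lowercase the name and collapse every run of non-alphanumerics into "_".
  const slug = datasetName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  // Compact UTC timestamp: 2025-01-02T03:04:05.000Z becomes 20250102T030405Z.
  const stamp = now.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "Z");
  return `html-demos/user_d4ds/${slug}_${stamp}_d4d.yaml`;
}

const fixedTime = new Date(Date.UTC(2025, 0, 2, 3, 4, 5));
console.log(d4dOutputPath("T2D Longitudinal Study", fixedTime));
// html-demos/user_d4ds/t2d_longitudinal_study_20250102T030405Z_d4d.yaml
```

A second-resolution timestamp only avoids collisions between requests made at different times; two requests for the same dataset within the same second would still need an extra disambiguator.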

.github/ai-controllers.json

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+["justaddcoffee", "monicacecilia", "caufieldjh", "realmarcin", "jniestroy"]
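
The workflow in this commit reads this file and checks the triggering user's login against it. The sketch below approximates that check outside of GitHub Actions. The helpers `loadControllers` and `isController` are hypothetical names, and the demo writes to a temporary file rather than the real `.github/ai-controllers.json`.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Read a JSON array of GitHub logins; use the fallback if the file is missing or invalid.
function loadControllers(file, fallback = []) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed) ? parsed : fallback;
  } catch (error) {
    return fallback;
  }
}

// Exact, case-sensitive comparison, like `allowedUsers.includes(userLogin)` in the workflow.
function isController(login, controllers) {
  return controllers.includes(login);
}

// Demo against a temporary file (hypothetical location, for illustration only).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "d4d-"));
const demoFile = path.join(demoDir, "ai-controllers.json");
fs.writeFileSync(demoFile, JSON.stringify(["justaddcoffee", "realmarcin"]));

const controllers = loadControllers(demoFile);
console.log(isController("realmarcin", controllers)); // true
console.log(isController("someone-else", controllers)); // false
console.log(loadControllers(path.join(demoDir, "missing.json"))); // []
```

Unlike this sketch, which falls back to an empty list, the workflow falls back to a hard-coded login when the file cannot be read, so a broken config there does not disable the assistant entirely.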

.github/workflows/d4d-agent.yml

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
+name: D4D AI Assistant GitHub Mentions
+
+on:
+  issues:
+    types: [opened, edited]
+  issue_comment:
+    types: [created, edited]
+  pull_request:
+    types: [opened, edited]
+  pull_request_review_comment:
+    types: [created, edited]
+  workflow_dispatch:
+    inputs:
+      item-type:
+        description: 'Type of item (issue or pull_request)'
+        required: true
+        type: choice
+        options:
+          - issue
+          - pull_request
+      item-number:
+        description: 'Issue or PR number'
+        required: true
+        type: number
+
+jobs:
+  check-mention:
+    runs-on: ubuntu-latest
+    outputs:
+      qualified-mention: ${{ steps.detect.outputs.qualified-mention }}
+      prompt: ${{ steps.detect.outputs.prompt }}
+      user: ${{ steps.detect.outputs.user }}
+      item-type: ${{ steps.detect.outputs.item-type }}
+      item-number: ${{ steps.detect.outputs.item-number }}
+      controllers: ${{ steps.detect.outputs.controllers }}
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Detect AI mention
+        id: detect
+        uses: actions/github-script@v7
+        with:
+          github-token: ${{ secrets.PAT_FOR_PR }}
+          script: |
+            // Load allowed users from config
+            const fs = require('fs');
+            let allowedUsers = [];
+            try {
+              const configContent = fs.readFileSync('.github/ai-controllers.json', 'utf8');
+              allowedUsers = JSON.parse(configContent);
+            } catch (error) {
+              console.log('Error loading allowed users:', error);
+              // Use fallback controllers if provided
+              const fallback = 'jtr4v';
+              allowedUsers = fallback ? fallback.split(',').map(u => u.trim()) : [];
+            }
+
+            // Get content and user from event payload
+            let content = '';
+            let userLogin = '';
+            let itemType = '';
+            let itemNumber = 0;
+
+            if (context.eventName === 'workflow_dispatch') {
+              // Manual trigger - fetch the issue/PR from GitHub API
+              itemType = context.payload.inputs['item-type'];
+              itemNumber = parseInt(context.payload.inputs['item-number']);
+              userLogin = context.actor; // Use the person who triggered the workflow
+
+              if (itemType === 'issue') {
+                // First check issue body
+                const issue = await github.rest.issues.get({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: itemNumber
+                });
+                content = issue.data.body || '';
+
+                // If no @d4dassistant in body, check comments
+                if (!content.includes('@d4dassistant')) {
+                  const comments = await github.rest.issues.listComments({
+                    owner: context.repo.owner,
+                    repo: context.repo.repo,
+                    issue_number: itemNumber
+                  });
+                  // Find the most recent comment with @d4dassistant
+                  for (let i = comments.data.length - 1; i >= 0; i--) {
+                    if (comments.data[i].body && comments.data[i].body.includes('@d4dassistant')) {
+                      content = comments.data[i].body;
+                      break;
+                    }
+                  }
+                }
+              } else if (itemType === 'pull_request') {
+                const pr = await github.rest.pulls.get({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  pull_number: itemNumber
+                });
+                content = pr.data.body || '';
+
+                // If no @d4dassistant in body, check comments
+                if (!content.includes('@d4dassistant')) {
+                  const comments = await github.rest.issues.listComments({
+                    owner: context.repo.owner,
+                    repo: context.repo.repo,
+                    issue_number: itemNumber
+                  });
+                  // Find the most recent comment with @d4dassistant
+                  for (let i = comments.data.length - 1; i >= 0; i--) {
+                    if (comments.data[i].body && comments.data[i].body.includes('@d4dassistant')) {
+                      content = comments.data[i].body;
+                      break;
+                    }
+                  }
+                }
+              }
+            } else if (context.eventName === 'issues') {
+              content = context.payload.issue.body || '';
+              userLogin = context.payload.issue.user.login;
+              itemType = 'issue';
+              itemNumber = context.payload.issue.number;
+            } else if (context.eventName === 'pull_request') {
+              content = context.payload.pull_request.body || '';
+              userLogin = context.payload.pull_request.user.login;
+              itemType = 'pull_request';
+              itemNumber = context.payload.pull_request.number;
+            } else if (context.eventName === 'issue_comment') {
+              content = context.payload.comment.body || '';
+              userLogin = context.payload.comment.user.login;
+              itemType = 'issue';
+              itemNumber = context.payload.issue.number;
+            } else if (context.eventName === 'pull_request_review_comment') {
+              content = context.payload.comment.body || '';
+              userLogin = context.payload.comment.user.login;
+              itemType = 'pull_request';
+              itemNumber = context.payload.pull_request.number;
+            }
+
+            // Check if user is allowed and mention exists (`.` stops at a line break, so the prompt is one line)
+            const isAllowed = allowedUsers.includes(userLogin);
+            const mentionRegex = new RegExp('@d4dassistant\\s+(.*)', 'i');
+            const mentionMatch = content.match(mentionRegex);
+
+            const qualifiedMention = isAllowed && mentionMatch !== null;
+            const prompt = qualifiedMention ? mentionMatch[1].trim() : '';
+
+            console.log(`User: ${userLogin}, Allowed: ${isAllowed}, Has mention: ${mentionMatch !== null}, Content: "${content}"`);
+
+            // Set outputs
+            core.setOutput('qualified-mention', qualifiedMention);
+            core.setOutput('prompt', prompt);
+            core.setOutput('user', userLogin);
+            core.setOutput('item-type', itemType);
+            core.setOutput('item-number', itemNumber);
+            core.setOutput('controllers', allowedUsers.map(u => '@' + u).join(', '));
+
+            return {
+              qualifiedMention,
+              itemType,
+              itemNumber,
+              prompt,
+              user: userLogin,
+              controllers: allowedUsers.map(u => '@' + u).join(', ')
+            };
+
+  respond-to-mention:
+    needs: check-mention
+    if: needs.check-mention.outputs.qualified-mention == 'true'
+    permissions:
+      contents: write
+      pull-requests: write
+      issues: write
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          token: ${{ secrets.PAT_FOR_PR }}
+
+      - name: Respond with AI Agent
+        uses: dragon-ai-agent/[email protected]
+        with:
+          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          github-token: ${{ secrets.PAT_FOR_PR }}
+          prompt: ${{ needs.check-mention.outputs.prompt }}
+          user: ${{ needs.check-mention.outputs.user }}
+          item-type: ${{ needs.check-mention.outputs.item-type }}
+          item-number: ${{ needs.check-mention.outputs.item-number }}
+          controllers: ${{ needs.check-mention.outputs.controllers }}
+          agent-name: 'd4dassistant'
+          branch-prefix: 'd4dassistant'
+          robot-version: 'v1.9.7'
+          enable-robot: 'true'
+          enable-obo-scripts: 'true'
+          enable-python-tools: 'true'
+          python-packages: 'aurelian jinja2-cli "wrapt>=1.17.2"'
+          claude-allowed-tools: '["Bash(git:*)", "Bash(gh:*)", "FileSystem(*)"]'
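
The "Detect AI mention" step in this workflow extracts the prompt with a single regular expression. The sketch below reuses that expression in isolation to show what it captures. The wrapper `extractPrompt` is a hypothetical name introduced here, and the sample inputs are made up for illustration.

```javascript
// Same pattern and flag as the workflow's mention detection.
const mentionRegex = new RegExp('@d4dassistant\\s+(.*)', 'i');

// Hypothetical wrapper: returns the captured prompt, or null when there is no usable mention.
function extractPrompt(content) {
  const match = content.match(mentionRegex);
  return match === null ? null : match[1].trim();
}

console.log(extractPrompt('@d4dassistant Create D4D for https://example.com/dataset'));
// Create D4D for https://example.com/dataset

// The match is case-insensitive, and `.` stops at the first line break.
console.log(extractPrompt('@D4DAssistant Create a D4D\n\nURL: https://example.com/dataset'));
// Create a D4D

// A bare mention with nothing after it does not match, because `\s+` needs whitespace.
console.log(extractPrompt('@d4dassistant'));
// null
```

As far as this expression goes, details placed on later lines of a request (such as a `URL:` line) are not part of the `prompt` output, so the agent would have to read them from the issue or PR itself.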
