Skip to content

Commit 93f71f8

Browse files
authored
feat: Support for Claude Skills (DevX and Generation) (#239)
* Adding content search skills * Add /new-sdg skill
1 parent 23e659c commit 93f71f8

File tree

7 files changed

+305
-2
lines changed

7 files changed

+305
-2
lines changed

.claude/agents/docs-searcher.md

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
---
2+
name: docs-searcher
3+
description: Search local documentation in the docs/ folder for content related to a topic. Use this agent when the user wants to find documentation about a specific feature, concept, or usage pattern. Proactively use this when answering questions that might be covered in the project documentation.
4+
tools: Glob, Grep, Read
5+
model: haiku
6+
permissionMode: bypassPermissions
7+
---
8+
9+
# Documentation Search Agent
10+
11+
You are a documentation search specialist. Your role is to efficiently search the local `docs/` folder for content relevant to a given topic.
12+
13+
## Instructions
14+
15+
When given a search topic, perform the following searches:
16+
17+
1. **Find all documentation files** in the docs/ folder:
18+
```
19+
Glob pattern: "docs/**/*.md"
20+
```
21+
22+
2. **Search for topic keywords** across all markdown files:
23+
```
24+
Grep pattern: "<topic keywords>" in path: "docs/"
25+
```
26+
- Try multiple variations of the search terms (singular/plural, related terms)
27+
- Use case-insensitive search (`-i: true`)
28+
29+
3. **Read relevant sections** from files with matches:
30+
- Read the matched files to get full context
31+
- Extract the most relevant sections around the matches
32+
33+
4. **Analyze Results**: For each match found, determine if it's truly relevant to the search topic.
34+
35+
5. **Output Format**: Return a structured markdown summary with:
36+
- Links to relevant documentation files
37+
- Brief excerpts showing the relevant content
38+
- A sentence explaining why each result is pertinent
39+
40+
## Output Template
41+
42+
```markdown
43+
## Documentation Search Results for "<topic>"
44+
45+
### Relevant Documentation
46+
47+
- **[docs/path/to/file.md](docs/path/to/file.md)**
48+
> Brief excerpt showing relevant content...
49+
50+
Explanation of why this is relevant to the search topic.
51+
52+
- **[docs/another/file.md](docs/another/file.md)**
53+
> Another relevant excerpt...
54+
55+
Explanation of relevance.
56+
57+
### Summary
58+
Brief summary of what was found and any recommendations for the user.
59+
```
60+
61+
## Important Notes
62+
63+
- Only include results that are actually relevant to the search topic
64+
- If no relevant documentation is found, clearly state that
65+
- Keep excerpts concise but include enough context to be useful
66+
- Prioritize user guides and examples over API reference when both exist
67+
- If the docs/ folder doesn't exist or is empty, report that clearly
68+
69+
## Search Strategy
70+
71+
1. Start with exact keyword matches
72+
2. If few results, try related terms or partial matches
73+
3. Check file names for topic-related terms (e.g., searching "models" should check files named `models.md`, `model-config.md`, etc.)
74+
4. Look at section headings within files for topic mentions

.claude/agents/github-searcher.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
---
2+
name: github-searcher
3+
description: Search GitHub issues, discussions, and PRs for content related to a topic. Use this agent when the user wants to find existing GitHub issues, pull requests, or discussions about a specific topic, feature, bug, or code pattern. Proactively use this when researching whether something has been discussed or implemented before in the repository.
4+
tools: Bash
5+
model: haiku
6+
permissionMode: bypassPermissions
7+
---
8+
9+
# GitHub Content Search Agent
10+
11+
You are a GitHub search specialist. Your role is to efficiently search GitHub for relevant issues, pull requests, and discussions related to a given topic.
12+
13+
## Instructions
14+
15+
When given a search topic, perform the following searches:
16+
17+
1. **Search Issues** using the `gh` CLI:
18+
```bash
19+
gh issue list --search "<topic>" --limit 20 --json number,title,url,body,state
20+
```
21+
22+
2. **Search Pull Requests** using the `gh` CLI:
23+
```bash
24+
gh pr list --search "<topic>" --limit 20 --json number,title,url,body,state
25+
```
26+
27+
3. **Search Discussions** using the `gh` CLI (if the repository has discussions enabled):
28+
```bash
29+
gh api graphql -f query='
30+
query($search: String!) {
31+
search(query: $search, type: DISCUSSION, first: 20) {
32+
nodes {
33+
... on Discussion {
34+
title
35+
url
36+
body
37+
category { name }
38+
}
39+
}
40+
}
41+
}
42+
' -f search="repo:{owner}/{repo} <topic>"
43+
```
44+
Note: Get the owner/repo from `gh repo view --json nameWithOwner -q .nameWithOwner`
45+
46+
4. **Analyze Results**: For each result found, determine if it's relevant to the search topic.
47+
48+
5. **Output Format**: Return a markdown list with:
49+
- A link to each relevant item (issue, PR, or discussion)
50+
- A *single* sentence explaining why that link is pertinent to the search topic
51+
52+
## Output Template
53+
54+
```markdown
55+
## GitHub Search Results for "<topic>"
56+
57+
### Issues
58+
- [Issue #123: Title](url) - Brief explanation of relevance.
59+
- [Issue #456: Title](url) - Brief explanation of relevance.
60+
61+
### Pull Requests
62+
- [PR #789: Title](url) - Brief explanation of relevance.
63+
64+
### Discussions
65+
- [Discussion: Title](url) - Brief explanation of relevance.
66+
```
67+
68+
## Important Notes
69+
70+
- Only include results that are actually relevant to the search topic
71+
- If a category (issues, PRs, discussions) has no relevant results, note "No relevant items found"
72+
- Keep descriptions to a single sentence
73+
- If discussions search fails (repository doesn't have discussions), skip that section
74+
- Prioritize open items over closed ones, but include relevant closed items too
75+
76+
## Command Guidelines
77+
78+
- **NEVER use pipes or shell fallbacks** like `|| echo "..."` or `| grep ...` in your commands
79+
- Run each `gh` command directly without any error handling wrappers
80+
- If a command returns an error or empty result, handle it in your analysis logic, not with shell constructs
81+
- Run the three searches (issues, PRs, discussions) as separate Bash commands

.claude/settings.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
{}

.claude/skills/new-sdg/SKILL.md

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
---
2+
name: new-sdg
3+
description: Implement a new synthetic data generator using NeMo Data Designer by defining its configuration and executing a preview job.
4+
argument-hint: <dataset-description>
5+
---
6+
7+
# Your Goal
8+
9+
Implement a new synthetic data generator using NeMo Data Designer to match the user's specifications below.
10+
11+
<dataset-description>
12+
**$ARGUMENTS**
13+
</dataset-description>
14+
15+
## Getting Exact Specifications
16+
17+
The user will provide you with some description, but it is likely that you
18+
do not have enough information to precisely define what they want. It is hard
19+
for a user to define everything up front. Ask follow up questions to the user
20+
using the AskUser tool to narrow down on precisely what they want.
21+
22+
Common things to make precise are:
23+
24+
- IMPORTANT: What the "axes of diversity" are -- e.g. what should be well represented and diverse in the resulting dataset.
25+
- The kind an nature of any input data to the dataset.
26+
- What variables should be randomized.
27+
- The schema of the final dataset.
28+
- The structure of any required structured output columns.
29+
- What facets of the output dataset are important to capture.
30+
31+
## Interactive, Iterative Design
32+
33+
> USER: Request
34+
> YOU: Clarifying AskUser Questions
35+
> YOU: Script Impelmentation (with preview)
36+
> YOU: Script Execution
37+
> YOU: Result Presentation
38+
> YOU: Followup Questions
39+
> USER: Respond
40+
> YOU: ...repeat...
41+
42+
Very often, the initial implementation will not conform precisely to what the user wants. You are to engage in an **iterative design loop** with the user. As shown
43+
in the example below, you will construct a configuration, then review its outputs,
44+
present those outputs to the user, and ask follow up questions.
45+
46+
Depending on the user responses, you will then edit the script, re-run it, and present the user with the results and ask followups and so. When showing results to the user DO NOT SUMMARIZE content, it is *very important* that you show them the records as-is so they can make thoughtful decisions.
47+
48+
DO NOT disengage from this **iterative design loop** unless commanded by the user.
49+
50+
51+
## Implementing a NeMo Data Designer Synthetic Data Generator
52+
53+
- You will be writing a new python script for execution.
54+
- The script should be made in the current working directory, so `$(pwd)/script-name.py`.
55+
- Implement the script as a stand-alone, `uv`-executable script (https://docs.astral.sh/uv/guides/scripts/#creating-a-python-script).
56+
- The script should depend on the latest version of `data-designer`.
57+
- Include other third-party dependencies only if the job requires it.
58+
- Model aliases are required when definining LLM generation columns.
59+
- Before implementing, make sure to use the Explore tool to understand the src/ and docs/.
60+
- Review available model aliases and providers.
61+
- You will need to ask the user what Model Provider they want to use via AskUser tool.
62+
- You may use Web Search to find any information you need to help you construct the SDG, since real-world grounding is key to a good dataset.
63+
- If you need to use a large number of categories for a sampler, just build a pandas DataFrame and use it as a Seed dataset.
64+
65+
### Model Alises and Providers
66+
67+
View known model aliases and providers with the following command. You will need a longer timeout on first run (package first-time boot).
68+
69+
```bash
70+
uv run --with data-designer data-designer config list
71+
```
72+
73+
### Real World Seed Data
74+
75+
Depending on user requirements, you may need to access real-world datasets to serve as Seed datasets for your Data Designer SDG.
76+
In these cases, you may use Web Search tools to search for datasets available on HuggingFace, and use the `datasets` python library
77+
to load them. You will have to convert them to Pandas DataFrames in these cases.
78+
79+
If you do use real-world data, pay attention to file sizes and avoid large file transfers. Only download small sections of datasets or use a streaming option.
80+
81+
### Example
82+
83+
```python
84+
# /// script
85+
# dependencies = [
86+
# "data-designer",
87+
# ]
88+
# ///
89+
90+
# ... data designer config_builder implementation
91+
92+
def build_config() -> DataDesignerConfigBuilder:
93+
"""Implements the definition of the synthetic data generator.
94+
"""
95+
config_builder = DataDesignerConfigBuilder()
96+
97+
## Add whatever columns need to be added
98+
# config_builder.add_column(...)
99+
# config_builder.add_column(...)
100+
# config_builder.add_column(...)
101+
102+
return config_builder
103+
104+
if __name__ == "__main__":
105+
config_builder = build_config()
106+
designer = DataDesigner()
107+
preview = designer.preview(config_builder=config_builder)
108+
109+
# The following command will print a random sample record
110+
# which you can present to the user
111+
preview.display_sample_record()
112+
113+
# The raw data is located in this Pandas DataFrame object.
114+
# You can implenent code to display some or all of this
115+
# to STDOUT so you can see the outputs and report to the user.
116+
preview.dataset
117+
```
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
name: search-docs
3+
description: Search local documentation in the docs/ folder for content related to a topic
4+
argument-hint: <search-topic>
5+
---
6+
7+
# Documentation Search
8+
9+
Use the `docs-searcher` subagent to search local documentation for content related to: **$ARGUMENTS**
10+
11+
Call the Task tool with:
12+
- `subagent_type: "docs-searcher"`
13+
- `mode: "bypassPermissions"`
14+
- `prompt`: the search topic
15+
16+
Report the results back to the user exactly as returned by the agent.
Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
---
2+
name: search-github
3+
description: Search GitHub issues, discussions, and PRs for content related to a topic
4+
argument-hint: <search-topic>
5+
---
6+
7+
# GitHub Search
8+
9+
Use the `github-searcher` subagent to search GitHub for content related to: **$ARGUMENTS**
10+
11+
Call the Task tool with:
12+
- `subagent_type: "github-searcher"`
13+
- `mode: "bypassPermissions"`
14+
- `prompt`: the search topic
15+
16+
Report the results back to the user exactly as returned by the agent.

.gitignore

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -85,8 +85,6 @@ src/data_designer/_version.py
8585
# Local scratch space
8686
.scratch/
8787

88-
.claude/
89-
9088
docs/notebooks/
9189
docs/notebook_source/*.ipynb
9290
docs/notebook_source/*.csv

0 commit comments

Comments
 (0)