Skip to content

Commit 7938498

Browse files
committed
refactor: simplify model validation to use Claude AI
Major simplification of CI/CD: - Remove complex Python model validation scripts (400+ lines) - Let Claude handle model validation intelligently via GitHub Actions - Claude fetches latest models from docs.anthropic.com/en/docs/about-claude/models/overview.md - Add comprehensive notebook validation script for local testing - Interactive dashboard with progress tracking - Auto-fix for deprecated models - GitHub issue export format - Idempotent with state persistence - Simplify CI to use single Python version (3.11) - Update workflows to use Claude for all intelligent validation Benefits: - No more hardcoded model lists to maintain - Claude understands context (e.g., educational examples) - 50% faster CI (removed matrix strategy) - Single source of truth for models (docs site)
1 parent 27cb34c commit 7938498

File tree

9 files changed

+818
-288
lines changed

9 files changed

+818
-288
lines changed

.github/workflows/claude-model-check.yml

Lines changed: 9 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -21,40 +21,23 @@ jobs:
2121
with:
2222
fetch-depth: 0
2323

24-
- name: Install uv
25-
uses: astral-sh/setup-uv@v4
26-
27-
- name: Setup Python
28-
run: uv python install 3.11
29-
30-
- name: Install dependencies
31-
run: uv sync
32-
33-
- name: Check models with script
34-
id: model_check
35-
run: |
36-
uv run python scripts/check_models.py --github-output || true
37-
38-
# Only run Claude validation for repo members (API costs)
3924
- name: Claude Model Validation
40-
if: |
41-
github.event.pull_request.author_association == 'MEMBER' ||
42-
github.event.pull_request.author_association == 'OWNER'
4325
uses: anthropics/claude-code-action@beta
4426
with:
4527
use_sticky_comment: true
4628
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
4729
github_token: ${{ secrets.GITHUB_TOKEN }}
4830
timeout_minutes: "5"
4931
direct_prompt: |
50-
Review the changed files for Claude model usage.
32+
Review the changed files for Claude model usage.
5133
52-
Check the latest models at: https://docs.anthropic.com/en/docs/about-claude/models/overview.md
34+
First, fetch the current list of allowed models from:
35+
https://docs.anthropic.com/en/docs/about-claude/models/overview.md
5336
54-
Please check for:
55-
1. Any internal/non-public model names
56-
2. Usage of deprecated models (older Sonnet 3.5 and Opus 3 models)
57-
3. Recommend using aliases for better maintainability
58-
4. For testing examples, suggest claude-3-5-haiku-latest (fastest/cheapest)
37+
Then check:
38+
1. All model references are from the current public models list
39+
2. Flag any deprecated models (older Sonnet 3.5, Opus 3 versions)
40+
3. Flag any internal/non-public model names
41+
4. Suggest using aliases ending in -latest for better maintainability
5942
60-
Format as actionable feedback.
43+
Provide clear, actionable feedback on any issues found.

.github/workflows/claude-notebook-review.yml

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -34,10 +34,12 @@ jobs:
3434
Review the changes to Jupyter notebooks and Python scripts in this PR. Please check for:
3535
3636
## Model Usage
37-
Check that all Claude model references use current, public models:
38-
- claude-3-5-haiku-latest (recommended for testing)
39-
- claude-3-5-sonnet-latest (for complex tasks)
40-
- Avoid deprecated models like claude-3-haiku-20240307, old Sonnet 3.5 versions
37+
Verify all Claude model references against the current list at:
38+
https://docs.anthropic.com/en/docs/about-claude/models/overview.md
39+
- Flag any deprecated models (older Sonnet 3.5, Opus 3 versions)
40+
- Flag any internal/non-public model names
41+
- Suggest current alternatives when issues found
42+
- Recommend aliases ending in -latest for stability
4143
4244
## Code Quality
4345
- Python code follows PEP 8 conventions

.github/workflows/notebook-quality.yml

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -44,10 +44,6 @@ jobs:
4444
run: |
4545
uv run python scripts/validate_notebooks.py
4646
47-
- name: Check model usage
48-
run: |
49-
uv run python scripts/check_models.py
50-
5147
# Only run API tests on main branch or for maintainers (costs money)
5248
- name: Execute notebooks (API Testing)
5349
if: |

.gitignore

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -144,4 +144,9 @@ examples/fine-tuned_qa/local_cache/*
144144
test_outputs/
145145
.ruff_cache/
146146
lychee-report.md
147-
.lycheecache
147+
.lycheecache
148+
149+
# Notebook validation
150+
.notebook_validation_state.json
151+
.notebook_validation_checkpoint.json
152+
validation_report_*.md

.pre-commit-config.yaml

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -10,13 +10,6 @@ repos:
1010

1111
- repo: local
1212
hooks:
13-
- id: check-models
14-
name: Check Claude model usage
15-
entry: python scripts/check_models.py
16-
language: python
17-
files: '\.ipynb$'
18-
pass_filenames: false
19-
2013
- id: validate-notebooks
2114
name: Validate notebook structure
2215
entry: python scripts/validate_notebooks.py

CONTRIBUTING.md

Lines changed: 10 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -54,8 +54,9 @@ This repository uses automated tools to maintain code quality:
5454

5555
### The Notebook Validation Stack
5656

57-
- **[papermill](https://papermill.readthedocs.io/)**: Parameterized notebook execution for testing
57+
- **[nbconvert](https://nbconvert.readthedocs.io/)**: Notebook execution for testing
5858
- **[ruff](https://docs.astral.sh/ruff/)**: Fast Python linter and formatter with native Jupyter support
59+
- **Claude AI Review**: Intelligent code review using Claude
5960

6061
**Note**: Notebook outputs are intentionally kept in this repository as they demonstrate expected results for users.
6162

@@ -67,26 +68,22 @@ This repository uses automated tools to maintain code quality:
6768
uv run ruff format skills/
6869

6970
uv run python scripts/validate_notebooks.py
70-
uv run python scripts/check_models.py
7171
```
7272

7373
3. **Test notebook execution** (optional, requires API key):
7474
```bash
75-
uv run papermill skills/classification/guide.ipynb test.ipynb \
76-
-p model "claude-3-5-haiku-latest" \
77-
-p test_mode true \
78-
-p max_tokens 10
75+
uv run jupyter nbconvert --to notebook \
76+
--execute skills/classification/guide.ipynb \
77+
--ExecutePreprocessor.kernel_name=python3 \
78+
--output test_output.ipynb
7979
```
8080

8181
### Pre-commit Hooks
8282

8383
Pre-commit hooks will automatically run before each commit to ensure code quality:
8484

85-
- Strip notebook outputs
8685
- Format code with ruff
8786
- Validate notebook structure
88-
- Check for hardcoded API keys
89-
- Validate Claude model usage
9087

9188
If a hook fails, fix the issues and try committing again.
9289

@@ -101,9 +98,9 @@ If a hook fails, fix the issues and try committing again.
10198
```
10299

103100
2. **Use current Claude models**:
104-
- For examples: `claude-3-5-haiku-latest` (fast and cheap)
105-
- For powerful tasks: `claude-opus-4-1`
106-
- Check allowed models in `scripts/allowed_models.py`
101+
- Use model aliases (e.g., `claude-3-5-haiku-latest`) for better maintainability
102+
- Check current models at: https://docs.anthropic.com/en/docs/about-claude/models/overview
103+
- Claude will automatically validate model usage in PR reviews
107104

108105
3. **Keep notebooks focused**:
109106
- One concept per notebook
@@ -175,9 +172,6 @@ Run the validation suite:
175172
# Check all notebooks
176173
uv run python scripts/validate_notebooks.py
177174

178-
# Check model usage
179-
uv run python scripts/check_models.py
180-
181175
# Run pre-commit on all files
182176
uv run pre-commit run --all-files
183177
```
@@ -187,11 +181,10 @@ uv run pre-commit run --all-files
187181
Our GitHub Actions workflows will automatically:
188182

189183
- Validate notebook structure
190-
- Check for hardcoded secrets
191184
- Lint code with ruff
192185
- Test notebook execution (for maintainers)
193186
- Check links
194-
- Validate Claude model usage
187+
- Claude reviews code and model usage
195188

196189
External contributors will have limited API testing to conserve resources.
197190

scripts/allowed_models.py

Lines changed: 0 additions & 114 deletions
This file was deleted.

0 commit comments

Comments
 (0)