
Commit 9aa24e4

PedramNavid and claude authored
feat: add cookbook-audit skill for automated notebook validation (anthropics#242)
* feat: add cookbook-audit skill for automated notebook validation

  Refactor the notebook-review command to delegate validation to a new cookbook-audit skill.

  Add a comprehensive automated validation script (validate_notebook.py) that:
  - Checks for hardcoded secrets and API keys
  - Validates notebook structure and introductions
  - Detects code quality issues (variable names, verbosity)
  - Identifies deprecated API patterns and invalid models
  - Converts notebooks to markdown for easier review

  Add a detailed audit rubric (SKILL.md) with:
  - Structured audit workflow and report format
  - Scoring framework across 4 dimensions (20 points total)
  - Concrete examples of high- and low-scoring audits
  - Comprehensive checklist and content philosophy
  - Style and structural requirements for cookbook notebooks

  The validate_notebook.py script runs automated checks and generates a markdown version of notebooks (saved to the gitignored tmp/ folder) for more efficient context usage during manual review.

* feat(security): add detect-secrets configuration and Anthropic credentials detector

  Add a baseline configuration for the detect-secrets library with a custom plugin to detect Anthropic API keys and credentials in notebooks. Includes a comprehensive set of built-in detectors and heuristic filters to prevent secrets from being committed to the repository.

  feat(cookbook-audit): integrate detect-secrets for hardcoded credential detection

  Enhanced the notebook validation to use detect-secrets for identifying hardcoded API keys and credentials. The implementation:
  - Runs detect-secrets-hook on notebooks with the baseline configuration
  - Automatically locates the baseline at `scripts/detect-secrets/.secrets.baseline`
  - Falls back to basic pattern matching if detect-secrets is unavailable
  - Provides detailed output for manual review of potential secrets

  Updated the documentation to reflect the automated secret scanning capability.
* chore(workflows): remove unnecessary id-token permission

  Remove the id-token: write permission from Claude Code workflow files, as it is not needed for these operations. The workflows only require:
  - contents: read (to check out repository code)
  - pull-requests: write (to comment on pull requests)

  The id-token: write permission is used for OIDC authentication with cloud providers (AWS, GCP, Azure), which these workflows do not use. This follows the principle of least privilege and reduces the security attack surface.

  Affected workflows:
  - claude-notebook-review.yml
  - claude-link-review.yml

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* use relative paths and run ruff on notebook script

---------

Co-authored-by: Claude <[email protected]>
1 parent 5cb47e1 commit 9aa24e4

File tree

9 files changed (+907, -29 lines)

Lines changed: 3 additions & 26 deletions

@@ -1,36 +1,13 @@
 ---
-allowed-tools: Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*)
+allowed-tools: Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(echo:*),Read,Glob,Grep,WebFetch
 description: Comprehensive review of Jupyter notebooks and Python scripts
 ---
 
-Review the changes to Jupyter notebooks and Python scripts in this PR. Please check for:
-
-## Code Quality
-- Python code follows PEP 8 conventions
-- Proper error handling
-- Clear variable names and documentation
-- No hardcoded API keys (use os.getenv("ANTHROPIC_API_KEY"))
-
-## Notebook Structure
-- Clear introduction explaining what the notebook demonstrates and why it's useful
-- Configuration instructions (how to set up API keys, install dependencies, etc.)
-- Connecting explanations between cells that help users understand the flow
-- Clear markdown explanations between code cells
-- Logical flow from simple to complex
-- Outputs preserved for educational value
-- Dependencies properly imported
-
-## Security
-- Check for any hardcoded API keys or secrets (not just Anthropic keys)
-- Ensure all sensitive credentials use environment variables (os.environ, getenv, etc.)
-- Flag any potential secret patterns (tokens, passwords, private keys)
-- Note: Educational examples showing "what not to do" are acceptable if clearly marked
-- Safe handling of user inputs
-- Appropriate use of environment variables
+Review the changes to Jupyter notebooks and Python scripts in this PR using the Notebook review skill.
 
 Provide a clear summary with:
 - ✅ What looks good
 - ⚠️ Suggestions for improvement
 - ❌ Critical issues that must be fixed
 
-**IMPORTANT: Post your review as a comment on the pull request using the command: `gh pr comment $PR_NUMBER --body "your review"`**
+**IMPORTANT: Post your review as a comment on the pull request using the command: `gh pr comment $PR_NUMBER --body "your review"`**
Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+# Temporary files generated during notebook review
+tmp/
+*.pyc
+__pycache__/
Lines changed: 282 additions & 0 deletions

@@ -0,0 +1,282 @@

---
name: cookbook-audit
description: Audit an Anthropic Cookbook notebook based on a rubric. Use whenever a notebook review or audit is requested.
---
# Cookbook Audit

## Instructions

Review the requested Cookbook notebook using the following guidelines. Provide a score based on the scoring guidelines, along with recommendations for improving the cookbook.

## Workflow

Follow these steps for a comprehensive audit:

1. **Identify the notebook**: Ask the user for the path if not provided
2. **Run automated checks**: Use `python3 validate_notebook.py <path>` to catch technical issues and generate markdown
   - The script automatically runs detect-secrets to scan for hardcoded API keys and credentials
   - Uses custom patterns defined in `scripts/detect-secrets/plugins.py`
   - Checks against the baseline at `scripts/detect-secrets/.secrets.baseline`
3. **Review markdown output**: The script generates a markdown file in the `tmp/` folder for easier review (saves context vs. raw .ipynb)
   - The `tmp/` folder is gitignored to avoid committing review artifacts
   - The markdown includes code cells but excludes outputs for cleaner review
4. **Manual review**: Read through the markdown version, evaluating it against the rubric
5. **Score each dimension**: Apply the scoring guidelines objectively
6. **Generate report**: Follow the audit report format below
7. **Provide specific examples**: Show concrete improvements with line references
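Step 2's fallback behavior, basic pattern matching when detect-secrets is unavailable, might look like the following sketch. The function name and regexes here are illustrative assumptions, not the actual `validate_notebook.py` implementation:

```python
import json
import re

# Illustrative fallback patterns; the real script defers to detect-secrets
# when it is installed. These regexes are assumptions, not the actual plugin.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),  # Anthropic-style API key
    re.compile(r"(?i)(api_key|token|password)\s*=\s*['\"][^'\"]{8,}['\"]"),
]

def scan_cells_for_secrets(notebook: dict) -> list[str]:
    """Return findings for suspicious strings in a parsed .ipynb dict."""
    findings = []
    for i, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        for pattern in SECRET_PATTERNS:
            for match in pattern.finditer(source):
                findings.append(f"cell {i}: possible secret: {match.group(0)}")
    return findings

# Usage: with open("notebook.ipynb") as f:
#     findings = scan_cells_for_secrets(json.load(f))
```

A cell that reads the key via `os.getenv("ANTHROPIC_API_KEY")` passes cleanly, while a hardcoded assignment is flagged for manual review.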
## Audit Report Format

Present your audit using this structure:

### Executive Summary
- **Overall Score**: X/20
- **Key Strengths** (2-3 bullet points)
- **Critical Issues** (2-3 bullet points)

### Detailed Scoring

#### 1. Narrative Quality: X/5
[Brief justification with specific examples]

#### 2. Code Quality: X/5
[Brief justification with specific examples]

#### 3. Technical Accuracy: X/5
[Brief justification with specific examples]

#### 4. Actionability & Understanding: X/5
[Brief justification with specific examples]

### Specific Recommendations

[Prioritized, actionable list of improvements with references to specific sections]

### Examples & Suggestions

[Show specific excerpts from the notebook with concrete suggestions for improvement]
## Quick Reference Checklist

Use this to ensure comprehensive coverage:

**Structure & Organization**
- [ ] Has clear introduction (1-2 paragraphs)
- [ ] States problem, audience, and outcome
- [ ] Lists prerequisites clearly
- [ ] Has logical section progression
- [ ] Includes conclusion/summary

**Code Quality**
- [ ] All code blocks have explanatory text before them
- [ ] No hardcoded API keys (automatically checked by detect-secrets)
- [ ] Meaningful variable names
- [ ] Comments explain "why" not "what"
- [ ] Follows language best practices
- [ ] Model name defined as constant at top of notebook

**Output Management**
- [ ] pip install logs suppressed with %%capture
- [ ] No verbose debug output
- [ ] Shows relevant API responses
- [ ] Stack traces only when demonstrating error handling

**Content Quality**
- [ ] Explains why approaches work
- [ ] Discusses when to use this approach
- [ ] Mentions limitations/considerations
- [ ] Provides transferable knowledge
- [ ] Appropriate model selection

**Technical Requirements**
- [ ] Executable without modification (except API keys)
- [ ] Uses non-deprecated API patterns
- [ ] Uses valid model names (claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1)
- [ ] Model name defined as constant at top of notebook
- [ ] Includes dependency specifications
- [ ] Assigned to primary category
- [ ] Has relevant tags
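Several of the checklist items above are mechanically checkable. Here is a sketch of a model-name check, using the valid-model list from the checklist; the function and regexes are hypothetical and not necessarily what `validate_notebook.py` does:

```python
import re

# Valid model names taken from the checklist above; this set is an
# assumption and may need updating as new models are released.
VALID_MODELS = {"claude-sonnet-4-5", "claude-haiku-4-5", "claude-opus-4-1"}

def check_model_usage(notebook_code: str) -> list[str]:
    """Flag model names outside the checklist's valid set, and note
    when no MODEL-style constant is defined in the code."""
    issues = []
    for name in re.findall(r"claude-[a-z0-9.-]+", notebook_code):
        if name not in VALID_MODELS:
            issues.append(f"invalid or deprecated model name: {name}")
    if not re.search(r"^[A-Z_]*MODEL[A-Z_]*\s*=", notebook_code, re.MULTILINE):
        issues.append("model name is not defined as a constant")
    return issues
```

A notebook that defines `MODEL = "claude-sonnet-4-5"` at the top passes both checks; a call hardcoding an old model name trips both.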
### Content Philosophy: Action + Understanding

Cookbooks are primarily action-oriented but strategically incorporate understanding, informed by the Diataxis framework.

- **Practical focus**: Show users how to accomplish specific tasks with working code
- **Builder's perspective**: Written from the user's point of view, solving real problems
- **Agency-building**: Help users understand why approaches work, not just how
- **Transferable knowledge**: Teach patterns and principles that apply beyond the specific example
- **Critical thinking**: Encourage users to question outputs, recognize limitations, and make informed choices

### What Makes a Good Cookbook

A good cookbook doesn't just help users solve today's problem; it also helps them understand the underlying principles behind the solutions, encouraging them to recognize when and how to adapt approaches. Users will be able to make more informed decisions about AI system design, develop judgement about model outputs, and build skills that transfer to future AI systems.

### What Cookbooks Are NOT

- **Pure tutorials**: We assume users have basic technical skills and API familiarity. We clearly state prerequisites in our cookbooks and direct users to the Academy to learn more about a topic.
- **Comprehensive explanations**: We don't teach transformer architecture or probability theory. Our users follow cookbooks to solve problems they face today; they are busy, in the midst of learning or building, and want to apply what they learn to their immediate needs.
- **Reference docs**: We don't exhaustively document every parameter; we link to appropriate resources in our documentation as needed.
- **Simple tips and tricks**: We don't teach "hacks" that only work for the current model generation. We don't over-promise and under-deliver.
- **Production-ready code**: Cookbooks showcase use cases and capabilities, not production patterns. Excessive error handling is not required.
### Style Guidelines

#### Voice & Tone

- Educational and agency-building
- Professional but approachable
- Respectful of user intelligence and time
- Either second person ("you") or first person plural ("we") - be consistent within a notebook

#### Writing Quality

- Clear, concise explanations
- Active voice preferred
- Short paragraphs (3-5 sentences)
- Avoid jargon without definition
- Use headers to break up sections

#### Code Presentation

- Every code block should be preceded by explanatory text
- Comments should explain why, not what
- Use meaningful variable names

#### Output Handling

Remove extraneous output (e.g., with %%capture):
- pip install logs
- Verbose debug statements
- Lengthy stack traces (unless demonstrating error handling)

Show relevant output:
- API responses that demonstrate functionality
- Examples of successful execution
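The pip-install rule above can be checked mechanically against a parsed `.ipynb` file. A minimal sketch, assuming the standard notebook JSON layout (the function name is illustrative, not part of the actual tooling):

```python
def find_unsuppressed_pip_installs(notebook: dict) -> list[int]:
    """Return indices of code cells that run pip install without %%capture."""
    offenders = []
    for i, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        # %%capture must be the first statement in the cell to take effect.
        if "pip install" in source and not source.lstrip().startswith("%%capture"):
            offenders.append(i)
    return offenders
```

Markdown cells that merely mention pip install are ignored; only code cells missing the magic are reported.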
### Structural Requirements

#### Required Sections

**1. Introduction (Required)**

[Cookbook Title]

[1-2 paragraphs covering:]
- What problem this solves
- Who this is for
- What you'll build/accomplish

Prerequisites:
- Required technical skills
- API keys needed
- Dependencies to install

**2. Main Content (Required)**

Organized by logical steps or phases, each with:
- Clear section headers
- Explanatory text before code blocks
- Code examples
- Expected outputs (where relevant)
- Understanding callouts: brief explanations of why approaches work, when to use them, or important considerations

**3. Conclusion (Recommended)**

- Summary of what was accomplished
- Limitations or considerations
- Next steps or related resources

#### Optional Sections

- **How It Works**: Brief explanation of the underlying approach or mechanism
- **When to Use This**: Guidance on appropriate use cases and contexts
- **Limitations & Considerations**: Important caveats, failure modes, or constraints
- **Troubleshooting**: Common issues and solutions
- **Variations**: Alternative approaches or extensions
- **Performance Notes**: Optimization considerations
- **Further Reading**: Links to relevant docs, papers, or deeper explanations
## Examples

### Example 1: High-Quality Notebook Audit (Score: 18/20)

**Notebook**: "Building a Customer Support Agent with Tool Use"

#### Executive Summary
- **Overall Score**: 18/20
- **Key Strengths**:
  - Excellent narrative flow from problem to solution
  - Clean, well-documented code with proper error handling
  - Strong focus on transferable patterns (tool schema design, error recovery)
- **Critical Issues**:
  - Missing %%capture on pip install cells
  - Could benefit from a limitations section discussing when NOT to use this approach

#### Detailed Scoring

**1. Narrative Quality: 5/5**
Opens with a clear problem statement about reducing support ticket volume. Each section builds logically. Concludes with a discussion of production considerations.

**2. Code Quality: 4/5**
Excellent structure and naming. Clean, idiomatic code. Model defined as a constant. Minor issue: pip install output not suppressed in cells 1-2.

**3. Technical Accuracy: 5/5**
Demonstrates best practices for tool use. Appropriate model selection (uses the valid claude-sonnet-4-5 model). Correct API usage with streaming.

**4. Actionability & Understanding: 4/5**
Very practical with clear adaptation points. Explains why tool schemas are designed certain ways. Could add more discussion of when this approach isn't suitable.

#### Specific Recommendations
1. Add `%%capture` to cells 1-2 to suppress pip install logs
2. Add a "Limitations & Considerations" section discussing scenarios where simpler approaches might be better
3. Consider adding a "Variations" section showing how to adapt for different support scenarios
---

### Example 2: Needs Improvement Notebook Audit (Score: 11/20)

**Notebook**: "Text Classification with Claude"

#### Executive Summary
- **Overall Score**: 11/20
- **Key Strengths**:
  - Working code that demonstrates basic classification
  - Covers multiple classification approaches
- **Critical Issues**:
  - No introduction explaining the use case or prerequisites
  - Code blocks lack explanatory text
  - No discussion of why approaches work or when to use them
  - Missing error handling and best practices

#### Detailed Scoring

**1. Narrative Quality: 2/5**
Jumps directly into code without context. No introduction explaining what problem this solves or who it's for. Sections lack connecting narrative.

**2. Code Quality: 3/5**
Code is functional but lacks structure. Variable names like `x1`, `result`, `temp` are unclear. No comments explaining non-obvious choices. Model not defined as a constant at the top.

**3. Technical Accuracy: 3/5**
API calls work but use invalid or deprecated model names. Model selection is not explained. No discussion of token efficiency or performance.

**4. Actionability & Understanding: 3/5**
Shows multiple approaches but doesn't explain when to use each. No discussion of trade-offs. Unclear how to adapt to different classification tasks.

#### Specific Recommendations

**High Priority:**
1. Add an introduction section (1-2 paragraphs) explaining:
   - What classification problems this addresses
   - Prerequisites (basic Python, API key, familiarity with classification)
   - What readers will accomplish
2. Add explanatory text before EVERY code block explaining what it does and why
3. Update to current API patterns and explain the model selection rationale

**Medium Priority:**
4. Improve variable names: `x1` → `sample_text`, `result` → `classification_result`
5. Define the model as a constant at the top: `MODEL = 'claude-sonnet-4-5'`
6. Update to use valid model names (claude-sonnet-4-5, claude-haiku-4-5, or claude-opus-4-1)
7. Add a "When to Use This" section explaining which approach fits which scenario

**Low Priority:**
8. Add a conclusion summarizing trade-offs between approaches
9. Add a "Limitations" section discussing accuracy considerations
10. Consider adding an evaluation metrics example
