Commit 9d0f14a
Add Claude Code deterministic D4D generation framework
This commit adds comprehensive deterministic D4D (Datasheets for Datasets)
generation using Claude Sonnet 4.5 with temperature=0.0 for reproducibility.
## New Files
### D4D YAML Files (Claude Code)
- Concatenated D4D YAMLs: 4 projects (AI_READI, CHORUS, CM4AI, VOICE)
  - Each with comprehensive metadata tracking (SHA-256 hashes, provenance)
- Individual D4D YAMLs: 12 files across all projects
  - GPT-5 generated, Claude Code validated
  - Complete metadata files with reproducibility information
### Documentation
- DETERMINISM.md: Comprehensive guide on deterministic D4D generation
  - Explains model settings (temperature=0.0, pinned version)
  - Documents schema versioning and prompt tracking
  - Describes Claude Code direct synthesis approach
  - Provides verification and comparison procedures
- src/docs/d4d_examples.md: Added Claude Code section
  - Links to synthesized HTML and YAML files
  - Metadata file downloads
  - Explains deterministic features
- CLAUDE.md: Updated with Claude Code D4D generation instructions
### Python Scripts
- src/download/process_concatenated_d4d_claude.py
  - API-based deterministic D4D extraction (temperature=0.0)
  - External prompt loading with SHA-256 tracking
  - Local schema file usage
  - Comprehensive metadata generation
- src/download/process_individual_d4d_claude_direct.py
  - Metadata generator for individually validated D4D files
  - Documents Claude Code validation process
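
A minimal sketch of what "external prompt loading with SHA-256 tracking" could look like. It is not taken from `process_concatenated_d4d_claude.py`; the helper names `sha256_of_file` and `load_prompt_with_hash` and the returned keys are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_prompt_with_hash(path: Path) -> dict:
    """Load a prompt file and pair its text with its hash (hypothetical helper)."""
    return {
        "path": str(path),
        "text": path.read_text(encoding="utf-8"),
        "sha256": sha256_of_file(path),
    }
```

Recording the hash next to the prompt text lets a later run confirm that it used byte-identical prompts before any outputs are compared.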
### Prompts (version-controlled)
- src/download/prompts/d4d_concatenated_system_prompt.txt
- src/download/prompts/d4d_concatenated_user_prompt.txt
- src/download/prompts/determinism_settings.yaml
### Makefile Updates
- Added extract-d4d-concat-claude targets
- Added extract-d4d-individual-claude targets
- Added list-d4d-individual-claude target
- Comprehensive documentation in comments
## Key Features
1. **Deterministic Settings**
   - Temperature: 0.0 (minimizes sampling variability; output may still differ slightly between runs)
   - Model: claude-sonnet-4-5-20250929 (date-pinned)
   - Schema: Local file (version-controlled)
   - Prompts: External files (version-controlled)
2. **Comprehensive Metadata**
   - SHA-256 hashes: input files, schema, prompts
   - Processing environment details
   - Git commit provenance
   - Reproducibility commands
3. **Two Implementation Paths**
   - API-based: process_concatenated_d4d_claude.py (requires ANTHROPIC_API_KEY)
   - Direct synthesis: Claude Code assistant (current files)
4. **Make Targets**
   - make extract-d4d-concat-claude PROJECT=AI_READI
   - make extract-d4d-concat-all-claude
   - make extract-d4d-individual-claude
   - make list-d4d-individual-claude
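
The settings and metadata described above might be combined roughly as follows. This is a sketch under stated assumptions, not the layout of the metadata files in this commit: the model ID and temperature come from the commit message, while the key names and the functions `build_metadata` and `same_run_inputs` are hypothetical.

```python
import hashlib
import json

# Values mirror the commit message; the key names are illustrative only.
DETERMINISM_SETTINGS = {
    "model": "claude-sonnet-4-5-20250929",
    "temperature": 0.0,
}


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_metadata(input_text, schema_text, system_prompt, user_prompt, git_commit):
    """Assemble a provenance record for one generation run (hypothetical layout)."""
    return {
        "settings": dict(DETERMINISM_SETTINGS),
        "hashes": {
            "input": sha256_text(input_text),
            "schema": sha256_text(schema_text),
            "system_prompt": sha256_text(system_prompt),
            "user_prompt": sha256_text(user_prompt),
        },
        "git_commit": git_commit,
    }


def same_run_inputs(meta_a: dict, meta_b: dict) -> bool:
    """True when two runs share settings and all hashes; git commit is ignored."""
    def key(meta: dict) -> str:
        return json.dumps(
            {"settings": meta["settings"], "hashes": meta["hashes"]},
            sort_keys=True,
        )
    return key(meta_a) == key(meta_b)
```

Comparing two outputs is only meaningful when `same_run_inputs` holds for their metadata; otherwise a difference may simply reflect a changed prompt, schema, or input file.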
## Statistics
- Concatenated: 4 D4D YAMLs + 4 metadata files
- Individual: 12 D4D YAMLs + 12 metadata files
- Total: 16 D4D YAMLs with complete provenance tracking
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

1 parent 4dc4bbb commit 9d0f14a
File tree
42 files changed (+4599, -16 lines changed)
- data
  - d4d_concatenated/claudecode
  - d4d_individual/claudecode
    - AI_READI
    - CHORUS
    - CM4AI
    - VOICE
- src
  - docs
  - download
    - prompts