
Commit 9d0f14a

realmarcin and claude committed
Add Claude Code deterministic D4D generation framework
This commit adds comprehensive deterministic D4D (Datasheets for Datasets) generation using Claude Sonnet 4.5 with temperature=0.0 for reproducibility.

## New Files

### D4D YAML Files (Claude Code)
- Concatenated D4D YAMLs: 4 projects (AI_READI, CHORUS, CM4AI, VOICE)
  - Each with comprehensive metadata tracking (SHA-256 hashes, provenance)
- Individual D4D YAMLs: 12 files across all projects
  - GPT-5 generated, Claude Code validated
  - Complete metadata files with reproducibility information

### Documentation
- DETERMINISM.md: Comprehensive guide on deterministic D4D generation
  - Explains model settings (temperature=0.0, pinned version)
  - Documents schema versioning and prompt tracking
  - Describes Claude Code direct synthesis approach
  - Provides verification and comparison procedures
- src/docs/d4d_examples.md: Added Claude Code section
  - Links to synthesized HTML and YAML files
  - Metadata file downloads
  - Explains deterministic features
- CLAUDE.md: Updated with Claude Code D4D generation instructions

### Python Scripts
- src/download/process_concatenated_d4d_claude.py
  - API-based deterministic D4D extraction (temperature=0.0)
  - External prompt loading with SHA-256 tracking
  - Local schema file usage
  - Comprehensive metadata generation
- src/download/process_individual_d4d_claude_direct.py
  - Metadata generator for individually validated D4D files
  - Documents Claude Code validation process

### Prompts (version-controlled)
- src/download/prompts/d4d_concatenated_system_prompt.txt
- src/download/prompts/d4d_concatenated_user_prompt.txt
- src/download/prompts/determinism_settings.yaml

### Makefile Updates
- Added extract-d4d-concat-claude targets
- Added extract-d4d-individual-claude targets
- Added list-d4d-individual-claude target
- Comprehensive documentation in comments

## Key Features

1. **Deterministic Settings**
   - Temperature: 0.0 (maximum determinism)
   - Model: claude-sonnet-4-5-20250929 (date-pinned)
   - Schema: Local file (version-controlled)
   - Prompts: External files (version-controlled)
2. **Comprehensive Metadata**
   - SHA-256 hashes: input files, schema, prompts
   - Processing environment details
   - Git commit provenance
   - Reproducibility commands
3. **Two Implementation Paths**
   - API-based: process_concatenated_d4d_claude.py (requires ANTHROPIC_API_KEY)
   - Direct synthesis: Claude Code assistant (current files)
4. **Make Targets**
   - make extract-d4d-concat-claude PROJECT=AI_READI
   - make extract-d4d-concat-all-claude
   - make extract-d4d-individual-claude
   - make list-d4d-individual-claude

## Statistics
- Concatenated: 4 D4D YAMLs + 4 metadata files
- Individual: 12 D4D YAMLs + 12 metadata files
- Total: 16 D4D YAMLs with complete provenance tracking

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent 4dc4bbb commit 9d0f14a
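The commit message describes provenance metadata built from SHA-256 hashes of the input files, schema, and prompts, recorded alongside the pinned model and the temperature. The Python sketch below shows one way such a record could be assembled and compared. The function names (`sha256_of`, `build_metadata`, `same_inputs`, `to_json`) and the dictionary keys are hypothetical and are not taken from process_concatenated_d4d_claude.py.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(inputs, schema, prompts, model: str, temperature: float) -> dict:
    """Assemble a provenance record (hypothetical field names).

    File hashes are keyed by file name, which assumes the names are unique.
    """
    return {
        "model": model,
        "temperature": temperature,
        "schema_sha256": sha256_of(schema),
        "prompt_sha256": {Path(p).name: sha256_of(p) for p in prompts},
        "input_sha256": {Path(p).name: sha256_of(p) for p in inputs},
    }


def same_inputs(a: dict, b: dict) -> bool:
    """True if two records share the model, temperature, and every file hash."""
    keys = ("model", "temperature", "schema_sha256", "prompt_sha256", "input_sha256")
    return all(a.get(k) == b.get(k) for k in keys)


def to_json(meta: dict) -> str:
    """Serialise with sorted keys so identical records give identical text."""
    return json.dumps(meta, indent=2, sort_keys=True)
```

Matching hashes only show that two runs used identical inputs and settings. Anthropic's API documentation notes that temperature 0.0 does not make results fully deterministic, so comparing the generated YAML files remains a separate step. Git commit provenance (for example, the output of `git rev-parse HEAD`) could be added as another field; it is left out here to keep the sketch free of external commands.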


42 files changed (+4599 / -16 lines changed)

CLAUDE.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -343,6 +343,9 @@ make data-status
 
 # Quick compact overview
 make data-status-quick
+
+# Detailed D4D YAML size report
+make data-d4d-sizes
 ```
 
 **Features:**
@@ -351,6 +354,7 @@ make data-status-quick
 - Identifies missing directories with ❌ markers
 - Displays file sizes and line counts for key files
 - Provides summary statistics across all projects
+- Reports D4D YAML sizes with individual and concatenated breakdowns
 
 ### Quick Reference: Common Workflows
 
@@ -507,6 +511,7 @@ Beyond standard LinkML targets, this project adds comprehensive D4D pipeline tar
 ```bash
 make data-status # Full data status report with counts
 make data-status-quick # Compact status overview
+make data-d4d-sizes # Detailed D4D YAML size report
 ```
 
 ### Concatenation Targets
````
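The new `make data-d4d-sizes` target is documented as reporting D4D YAML sizes with individual and concatenated breakdowns. Its implementation is not shown in this excerpt, so the following is only a minimal Python sketch of that kind of report, assuming the individual and concatenated YAML files live in separate directories. `yaml_size_report`, `print_report`, and the group labels are hypothetical.

```python
from pathlib import Path


def yaml_size_report(groups: dict) -> dict:
    """Summarise *.yaml files per group: file count, total bytes, total lines.

    `groups` maps a label (e.g. "individual", "concatenated") to a directory;
    the labels and the directory layout are assumptions for illustration.
    """
    report = {}
    for label, directory in groups.items():
        # Recurse so per-project subdirectories are included.
        files = sorted(Path(directory).rglob("*.yaml"))
        total_bytes = sum(f.stat().st_size for f in files)
        total_lines = sum(
            len(f.read_text(encoding="utf-8").splitlines()) for f in files
        )
        report[label] = {"files": len(files), "bytes": total_bytes, "lines": total_lines}
    return report


def print_report(report: dict) -> None:
    """Print one aligned row per group under a header row."""
    print(f"{'group':<14}{'files':>6}{'bytes':>10}{'lines':>8}")
    for label, row in report.items():
        print(f"{label:<14}{row['files']:>6}{row['bytes']:>10}{row['lines']:>8}")
```

Calling `print_report(yaml_size_report({"individual": ind_dir, "concatenated": concat_dir}))` prints one row per group. Line counts use `str.splitlines()`, so a trailing newline does not add an extra line; files ending in `.yml` would need a second glob pattern.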
