@realmarcin
Collaborator

No description provided.

realmarcin and others added 7 commits November 22, 2025 23:37
Makefile install target was missing --no-root flag, causing package installation errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Prevent package installation errors in documentation deployment workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update test matrix to use Python 3.12 instead of 3.9 for modern Python support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added --no-root flag to poetry install command in pypi-publish.yaml to match
all other workflow files and prevent package installation errors.

This was the final location calling poetry install without --no-root.
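
To confirm that no other call site was missed, a check along these lines could be run over the Makefile and workflow files. This is a minimal sketch, not part of the PR: the function name `find_missing_no_root` and the sample text are hypothetical, and it only does a line-by-line text match.

```python
import re


def find_missing_no_root(text: str) -> list[int]:
    """Return 1-based line numbers that call `poetry install` without --no-root."""
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        calls_install = re.search(r"\bpoetry\s+install\b", line)
        if calls_install and "--no-root" not in line:
            missing.append(number)
    return missing


# Hypothetical workflow fragment used only for illustration.
sample = """\
steps:
  - run: poetry install --no-root
  - run: poetry install
"""

print(find_missing_no_root(sample))
```

A command split across several lines would not be caught by this simple match, so it complements a manual review rather than replacing it.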

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
PyYAML 6.0 doesn't have pre-built wheels for Python 3.12 and fails to build
from source with Cython errors. Updated to PyYAML 6.0.3 which includes
Python 3.12 wheels.

Fixes: AttributeError: cython_sources in PEP517 build

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
greenlet 1.1.2 doesn't have Python 3.12 wheels and fails to build from source
due to incompatibility with Python 3.12's internal C API changes. Updated to
greenlet 3.2.4 which includes Python 3.12 wheels.

Fixes: Build errors with CFrame, exc_type, recursion_depth in Python 3.12
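
The two dependency bumps above can be expressed as a small check on pinned versions. This is a sketch under assumptions: the names `TARGETS`, `version_tuple`, and `older_than_target` are hypothetical, the versions are simply the ones these commits updated to (not necessarily the earliest releases with Python 3.12 wheels), and only plain numeric versions such as `6.0.3` are handled.

```python
# Versions this PR updated to, taken from the commit messages above.
TARGETS = {"PyYAML": "6.0.3", "greenlet": "3.2.4"}


def version_tuple(version: str) -> tuple[int, ...]:
    """Convert a plain numeric version string like '6.0.3' to (6, 0, 3)."""
    return tuple(int(part) for part in version.split("."))


def older_than_target(pins: dict[str, str]) -> list[str]:
    """Return the package names whose pinned version is older than the target."""
    return sorted(
        name
        for name, target in TARGETS.items()
        if name in pins and version_tuple(pins[name]) < version_tuple(target)
    )


# The pins before this PR, as described in the commit messages.
print(older_than_target({"PyYAML": "6.0", "greenlet": "1.1.2"}))
```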

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Resolved conflicts from schema relocation to cookiecutter standard paths.
All schema files now at: src/model_card_schema/schema/

Changes:
- src/modelcards/ → src/model_card_schema/
- modelcards.yaml → model_card_schema.yaml
- Updated about.yaml to point to correct path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@realmarcin realmarcin requested a review from Copilot November 24, 2025 00:04
Contributor

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.



realmarcin and others added 2 commits November 23, 2025 16:58
This commit implements Phase 1 of the Datasheets for Datasets (D4D) integration,
providing a production-ready harmonized schema with comprehensive examples and
documentation.

## New Files

**src/model_card_schema/schema/model_card_schema_d4dharmonized.yaml** (~1,500 lines):
- Production D4D harmonized schema using external reference pattern
- Three new reference classes: CreatorReference, DatasetReference, GrantReference
- Replaces simple classes with D4D references (owner → CreatorReference, dataSet → DatasetReference)
- Adds provenance metadata (created_by, modified_by, created_on, modified_on)
- Preserves ALL extended template features (DOE, compute infrastructure, reproducibility)
- No schema imports - avoids naming conflicts
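
The external reference pattern can be illustrated roughly as follows. This is a sketch only, not the schema itself: the `path` field, the `referenced_paths` helper, and the card's `name`, `created_by`, and `created_on` values are hypothetical, while the two file paths come from the example suite listed below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreatorReference:
    path: str  # hypothetical field: location of a D4D Creator file


@dataclass(frozen=True)
class DatasetReference:
    path: str  # hypothetical field: location of a D4D Dataset file


def referenced_paths(card: dict) -> list[str]:
    """Collect every external D4D file that a model card points at."""
    paths = [card["owner"].path]
    paths += [ref.path for ref in card["dataSet"]]
    return paths


# The card stores pointers to D4D instances instead of embedding their content.
card = {
    "name": "climate-forecasting-model",  # hypothetical value
    "owner": CreatorReference("creators/jane-smith-creator.yaml"),
    "dataSet": [DatasetReference("datasets/noaa-historical-climate-dataset.yaml")],
    "created_by": "jane-smith",  # hypothetical provenance value
    "created_on": "2025-11-23",
}

print(referenced_paths(card))
```

Because the card holds only references, the same Creator or Dataset file can be reused by several model cards without importing the D4D schema.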

**D4D_HARMONIZATION.md** (comprehensive user guide):
- Overview of D4D harmonization and benefits
- Quick start guide
- Key concepts (CreatorReference, DatasetReference, GrantReference, Provenance)
- Schema comparison table (deprecated vs new classes)
- Complete migration guide with step-by-step examples
- Best practices for URLs, provenance, creator attribution
- FAQ section
- References and support information

**src/data/examples/d4d_integration/** (complete example suite):
- climate-forecasting-model-card.yaml - Full model card using D4D schema
- creators/jane-smith-creator.yaml - D4D Creator (Person) with ORCID, CRediT roles
- creators/climate-ai-lab-creator.yaml - D4D Creator (Organization) with ROR
- datasets/noaa-historical-climate-dataset.yaml - D4D Dataset (200+ fields)
- grants/doe-scidac-grant.yaml - D4D Grant with PI, budget, objectives
- README.md - Complete usage guide with validation instructions

## Modified Files

**INTEGRATION_GUIDE.md**:
- Updated status to "Phase 1 COMPLETED"
- Updated Pattern 1 section with actual D4D implementation
- Updated implementation status with completed tasks
- Updated references to point to new examples
- Changed version to 2.0, date to November 23, 2025

**CLAUDE.md**:
- Updated "Current Status" to mention Phase 1 COMPLETED
- Updated "Schema Source Files" section with correct paths
- Added comprehensive D4D Harmonized Schema description
- Updated "Implementation Status" section
- Updated "D4D Harmonization" section with completion status
- Updated "Important Notes" to list two production schemas

## Deleted Files

**src/model_card_schema/schema/model_card_schema_harmonized.yaml**:
- Removed old conceptual harmonized schema
- Replaced by model_card_schema_d4dharmonized.yaml (production version)

## Key Achievements

**Schema Enhancements**:
- Upgraded dataset documentation from 7 fields → 200+ fields (60+ D4D classes)
- Enhanced creator attribution: simple name/contact → ORCID, CRediT roles, affiliations
- Enhanced funding: string → structured Grant with PI, budget, objectives
- Added provenance tracking at two levels (modelCard root, ModelDetails)

**Implementation Approach**:
- External reference pattern (no schema imports)
- Clean separation of concerns
- No naming conflicts
- Backward compatible migration path

**Comprehensive Documentation**:
- D4D_HARMONIZATION.md - User-facing guide (complete migration guide, examples, FAQ)
- INTEGRATION_GUIDE.md - Technical implementation guide
- ALIGNMENT_ANALYSIS.md - Schema comparison (existing)
- Example README - Detailed usage instructions

**Complete Examples**:
- Real-world climate model example
- 2 Creator instances (Person + Organization)
- 1 comprehensive Dataset instance (motivation, composition, collection, preprocessing, uses, privacy, distribution, maintenance)
- 1 Grant instance (DOE SciDAC)

## Benefits

**For Users**:
- Single source of truth for datasets (document once, reference many times)
- Comprehensive documentation (7 fields → 200+ fields)
- Rich creator attribution (ORCID, CRediT roles)
- Detailed funding transparency
- Provenance tracking
- No breaking changes to existing model cards

**For Developers**:
- Practical working examples
- Clear integration patterns
- Documented technical approach
- Phased implementation roadmap

## Migration Path

Users can choose:
1. **Base schema** - Simple model cards without D4D integration
2. **D4D harmonized schema** - Comprehensive dataset/creator documentation

Migration is straightforward:
1. Create D4D instances (Creator, Dataset, Grant)
2. Update model card to reference D4D instances
3. Add provenance metadata

See D4D_HARMONIZATION.md for complete migration guide.
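
The three migration steps above might look like this in a script. It is a sketch under assumptions, not the documented procedure: the `migrate_card` helper, the `reference` key, and the shape of the old card are hypothetical; only the `owner`, `dataSet`, and provenance field names come from this PR.

```python
from copy import deepcopy
from datetime import date


def migrate_card(card, creator_path, dataset_paths, editor, today):
    """Return a copy of `card` that references D4D instances and carries provenance."""
    migrated = deepcopy(card)
    # Steps 1-2: the D4D files already exist; point the card at them.
    migrated["owner"] = {"reference": creator_path}
    migrated["dataSet"] = [{"reference": path} for path in dataset_paths]
    # Step 3: add provenance, keeping any creation metadata already present.
    migrated.setdefault("created_by", editor)
    migrated.setdefault("created_on", today.isoformat())
    migrated["modified_by"] = editor
    migrated["modified_on"] = today.isoformat()
    return migrated


# Hypothetical pre-migration card with a simple name/contact owner.
old_card = {
    "owner": {"name": "Jane Smith", "contact": "jane@example.org"},
    "dataSet": [{"name": "NOAA historical climate"}],
}

new_card = migrate_card(
    old_card,
    "creators/jane-smith-creator.yaml",
    ["datasets/noaa-historical-climate-dataset.yaml"],
    editor="jane-smith",
    today=date(2025, 11, 23),
)
print(new_card["owner"], new_card["modified_on"])
```

The helper works on a copy, so the original card stays untouched and can still be validated against the base schema.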

## References

- INTEGRATION_GUIDE.md - Technical integration patterns
- D4D_HARMONIZATION.md - User guide and migration
- ALIGNMENT_ANALYSIS.md - Schema comparison analysis
- src/data/examples/d4d_integration/README.md - Example usage guide
- Datasheets for Datasets: https://github.com/bridge2ai/data-sheets-schema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
macOS-specific .DS_Store files should not be tracked in version control.
.DS_Store is already listed in .gitignore, which prevents future additions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@realmarcin realmarcin requested a review from Copilot December 4, 2025 19:15
@realmarcin realmarcin merged commit 157f940 into main Dec 4, 2025
5 checks passed
@realmarcin realmarcin deleted the schema-update branch December 4, 2025 19:17
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


