Add comprehensive enum metadata and stewardship framework #32

cmungall · 2025-11-25T20:59:32Z

Summary

This PR enhances the value sets project with comprehensive metadata and stewardship tracking for all 516 enums across 109 schema files.

Key Features

🏛️ Custom Metamodel for Validation

Created src/valuesets/metamodel/valuesets_meta.yaml with metaclasses:

ValueSetEnumDefinition: Requires title, description, status, contributors
ValueSetEnumDefinitionWithStewardship: Adds stewards, publishers, endorsers
ReferenceEnumDefinition: For reference/controlled vocabulary enums
Additional specialized metaclasses for sources and conformance
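The metaclass hierarchy can be illustrated with a small Python sketch. It models each metaclass as a plain dict with an optional is_a parent and a list of required fields, then collects requirements along the is_a chain. This is not the LinkML validator. The dict layout and the names METACLASSES, required_fields and missing_fields are hypothetical. Only the field names come from this PR.

```python
from typing import Any, Optional

# Hypothetical in-memory model of the metaclasses described above.
# Field names come from this PR; the dict layout is an assumption.
METACLASSES: dict[str, dict[str, Any]] = {
    "ValueSetEnumDefinition": {
        "is_a": None,
        "required": ["title", "description", "status", "contributors"],
    },
    "ValueSetEnumDefinitionWithStewardship": {
        "is_a": "ValueSetEnumDefinition",
        "required": [],  # adds optional stewardship fields only
        "optional": ["stewards", "publishers", "endorsers"],
    },
}


def required_fields(metaclass: str) -> list[str]:
    """Collect required fields by walking the is_a chain up to the root."""
    fields: list[str] = []
    current: Optional[str] = metaclass
    while current is not None:
        spec = METACLASSES[current]
        fields.extend(spec.get("required", []))
        current = spec.get("is_a")
    return fields


def missing_fields(enum_def: dict[str, Any], metaclass: str) -> list[str]:
    """Return required fields that are absent or empty on an enum definition."""
    return [f for f in required_fields(metaclass) if not enum_def.get(f)]
```

For example, missing_fields({"title": "Example", "status": "DRAFT"}, "ValueSetEnumDefinition") returns ["description", "contributors"]. A subclass inherits its parent's requirements without restating them.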

🌐 Stewardship Value Set

New stewardship.yaml with 15 recognized organizations:

GA4GH, Gene Ontology Consortium, Genomic Standards Consortium
Phenopackets, EMBL-EBI, NCBI, CDISC, HL7, ISO, W3C, etc.
Uses ROR IDs for authoritative organization identification
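ROR identifiers have a fixed shape: a leading "0", six Crockford base32 characters (digits plus lowercase letters except i, l, o, u), and a two-digit checksum. A cheap format check can therefore catch typos before they land in stewardship.yaml. This is a minimal sketch. It assumes IDs are stored either bare or as https://ror.org/ URLs, it checks the format only and does not verify the checksum, and looks_like_ror_id is a hypothetical helper name.

```python
import re

# Shape of a ROR ID, optionally prefixed with the ror.org URL.
# Character class = Crockford base32 alphabet (no i, l, o, u).
ROR_ID = re.compile(r"(?:https://ror\.org/)?0[0-9a-hj-km-np-tv-z]{6}[0-9]{2}")


def looks_like_ror_id(value: str) -> bool:
    """Return True if value has the shape of a ROR ID (checksum not verified)."""
    return ROR_ID.fullmatch(value.strip()) is not None
```

The strings in the tests below are made-up format examples, not real organization IDs.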

📋 Complete Metadata Coverage

Updated all 516 enums with:

✅ Human-readable titles
✅ Descriptions
✅ Status (a StandardsMaturityLevel value such as DRAFT or STANDARD)
✅ Contributors (default: orcid:0000-0002-6601-2165 and github.com/anthropics/claude-code)
✅ instantiates references to the appropriate metaclass
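A batch update like this can be sketched as a pass that fills missing fields without overwriting anything already present. This sketch is not the script used in the PR. The default contributors and the base metaclass CURIE are quoted from this PR. The DRAFT default status, the CamelCase-splitting title fallback, and the names apply_default_metadata and humanize are assumptions made for illustration. Descriptions are left untouched because they need human input.

```python
import copy
import re
from typing import Any

# Defaults quoted from this PR; everything else here is illustrative.
DEFAULT_CONTRIBUTORS = [
    "orcid:0000-0002-6601-2165",
    "github.com/anthropics/claude-code",
]
BASE_METACLASS = "valuesets_meta:ValueSetEnumDefinition"

# Split CamelCase names: "GOAspect" -> "GO Aspect".
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def humanize(name: str) -> str:
    """Hypothetical title fallback derived from the enum name."""
    return _CAMEL_BOUNDARY.sub(" ", name)


def apply_default_metadata(
    enums: dict[str, dict[str, Any]], default_status: str = "DRAFT"
) -> dict[str, dict[str, Any]]:
    """Fill in missing metadata on each enum without overwriting existing values."""
    updated = copy.deepcopy(enums)  # leave the caller's data untouched
    for name, enum_def in updated.items():
        enum_def.setdefault("title", humanize(name))
        enum_def.setdefault("status", default_status)
        enum_def.setdefault("contributors", list(DEFAULT_CONTRIBUTORS))
        if not enum_def.get("instantiates"):
            enum_def["instantiates"] = [BASE_METACLASS]
        # description is deliberately not generated: it needs human input
    return updated
```

Using setdefault keeps the pass idempotent. Running it twice yields the same result, and hand-curated titles or statuses always win over the defaults.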

🎯 Conservative Stewardship Application

Applied stewardship metadata to well-established value sets:

Gene Ontology (4 enums):

go_aspect.yaml, go_causality.yaml, go_evidence.yaml
Stewards/Publishers: https://geneontology.org

Phenopackets (11 enums):

KaryotypicSexEnum, PhenotypicSexEnum, AllelicStateEnum, ACMGPathogenicityEnum, etc.
Stewards: https://phenopackets.org | Publishers: https://ga4gh.org

INSDC/GSC (2 enums):

insdc_geographic_locations.yaml, insdc_missing_values.yaml
Stewards: https://gensc.org | Publishers: https://www.insdc.org

📖 Governance Documentation

Created docs/governance.md covering:

Source of Truth model (mirrored vs community value sets)
Maturity levels and lifecycle management
Stewardship roles and responsibilities
Contributing process and quality standards
Working group structure (placeholder for community expansion)

🐛 Bug Fix

Fixed a pre-existing venv issue: stale shebangs pointed to the old project location (common-value-sets). After recreating the venv:

✅ just site now succeeds
✅ just test passes (63 tests)
✅ All LinkML generators work correctly
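The PR fixed this by recreating the venv. To spot the same problem elsewhere, one could scan a venv's bin directory for scripts whose shebang interpreter no longer exists. This is a minimal sketch. find_stale_shebangs is a hypothetical helper, and it reads only the first line of each file, so wrapper scripts that re-exec another interpreter are not caught.

```python
import os
from pathlib import Path


def find_stale_shebangs(bin_dir: Path) -> list[tuple[str, str]]:
    """List (script name, interpreter) pairs whose shebang interpreter is missing."""
    stale = []
    for script in sorted(bin_dir.iterdir()):
        if not script.is_file():  # skips directories and broken symlinks
            continue
        with script.open("rb") as fh:
            first_line = fh.readline()
        if not first_line.startswith(b"#!"):
            continue  # binaries and plain files have no shebang
        parts = first_line[2:].decode("utf-8", "replace").split()
        if not parts:
            continue
        interpreter = parts[0]
        if not os.path.exists(interpreter):
            stale.append((script.name, interpreter))
    return stale
```

For example, find_stale_shebangs(Path(".venv/bin")) would list any entry points still pointing into a moved project directory.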

Technical Notes

Stewardship metadata is stored in an annotations: block because LinkML's EnumDefinition doesn't support custom slots directly:

enums:
  GOAspect:
    instantiates:
    - valuesets_meta:ValueSetEnumDefinitionWithStewardship
    annotations:
      stewards: https://geneontology.org
      publishers: https://geneontology.org
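Downstream tooling has to read these values back out of the annotations block. The sketch below works on the enum after the YAML has been loaded into plain Python dicts (loading omitted). It accepts both the shorthand form shown above (tag: value) and an expanded form (tag: {tag: ..., value: ...}). Both layouts are assumptions about the loaded data, and annotation_values is a hypothetical helper name.

```python
from typing import Any


def annotation_values(enum_def: dict[str, Any], tag: str) -> list[str]:
    """Return the values of one annotation tag as a list of strings."""
    raw = (enum_def.get("annotations") or {}).get(tag)
    if isinstance(raw, dict):  # expanded form: {tag: ..., value: ...}
        raw = raw.get("value")
    if raw is None:
        return []
    if isinstance(raw, list):
        return [str(v) for v in raw]
    return [str(raw)]


# The GOAspect example above, as it might look once loaded from YAML.
go_aspect = {
    "instantiates": ["valuesets_meta:ValueSetEnumDefinitionWithStewardship"],
    "annotations": {
        "stewards": "https://geneontology.org",
        "publishers": "https://geneontology.org",
    },
}
```

Returning a list in every case means callers need not care whether an enum has one steward or several.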

Testing

✅ All 63 tests pass
✅ just site generates documentation
✅ just validate runs without errors
✅ Schema validation passes

Impact

All enums now have complete, consistent metadata
Foundation for community governance and stewardship tracking
Validation framework via custom metamodel
Clear path for expanding stewardship annotations

Next Steps

Future work could include:

Expanding stewardship annotations to additional value sets
Setting up working groups for domain-specific governance
Creating automated validation for metadata completeness

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

This commit enhances the value sets project with comprehensive metadata and stewardship tracking for all 516 enums across 109 schema files.

## Key Changes

### 1. Custom Metamodel for Validation (src/valuesets/metamodel/)

Created `valuesets_meta.yaml` defining metaclasses for enum validation:
- `ValueSetEnumDefinition`: Base metaclass requiring title, description, status, and contributors
- `ValueSetEnumDefinitionWithStewardship`: Extends base with stewardship annotations (stewards, publishers, endorsers)
- `ValueSetEnumDefinitionWithSource`: For enums with external source
- `ValueSetEnumDefinitionWithConformance`: For enums conforming to standards
- `ReferenceEnumDefinition`: For reference/controlled vocabulary enums

### 2. Stewardship Value Set (src/valuesets/schema/stewardship.yaml)

Created enum of recognized steward organizations with ROR IDs:
- GA4GH, Gene Ontology Consortium, Genomic Standards Consortium
- Phenopackets, EMBL-EBI, NCBI, CDISC, HL7, ISO, W3C, etc.
- Supports tracking organizational responsibility for value sets

### 3. Batch Metadata Updates (All 516 Enums)

Updated all enums with required metadata fields:
- Added `title` (human-readable names)
- Ensured `description` present
- Set `status` (using StandardsMaturityLevel: DRAFT, STANDARD, etc.)
- Added default `contributors`: orcid:0000-0002-6601-2165 and github.com/anthropics/claude-code
- Added `instantiates` references to appropriate metaclasses

### 4. Stewardship Annotations (Conservative Application)

Applied stewardship metadata to well-established value sets:

**Gene Ontology (4 enums):**
- go_aspect.yaml, go_causality.yaml, go_evidence.yaml
- Stewards: https://geneontology.org
- Publishers: https://geneontology.org

**Phenopackets (11 enums in clinical/phenopackets.yaml):**
- KaryotypicSexEnum, PhenotypicSexEnum, AllelicStateEnum, ACMGPathogenicityEnum, etc.
- Stewards: https://phenopackets.org
- Publishers: https://ga4gh.org

**INSDC/GSC (2 enums):**
- insdc_geographic_locations.yaml, insdc_missing_values.yaml
- Stewards: https://gensc.org
- Publishers: https://www.insdc.org

Note: Used `annotations:` block since LinkML EnumDefinition doesn't support custom stewardship slots directly.

### 5. Governance Documentation (docs/governance.md)

Created comprehensive governance documentation covering:
- Source of Truth model (mirrored vs community value sets)
- Maturity levels and lifecycle
- Stewardship roles and responsibilities
- Contributing process and quality standards
- Working group structure (placeholder)

### 6. Bug Fix: Stale Virtual Environment

Fixed pre-existing issue where `.venv` had stale shebangs pointing to old project location (`common-value-sets`). Recreated venv, enabling:
- `just site` now succeeds
- `just test` passes (63 tests)
- All LinkML generators work correctly

## Impact

- All 516 enums now have complete, consistent metadata
- Stewardship tracking enables community governance
- Custom metamodel provides validation framework
- Foundation for future expansion of stewardship annotations

## Testing

- ✅ All 63 tests pass
- ✅ `just site` generates documentation successfully
- ✅ `just validate` runs without errors
- ✅ Schema validation passes

claude · 2025-11-25T20:59:50Z

Claude finished @cmungall's task.

PR Review: Add comprehensive enum metadata and stewardship framework

Review Checklist:

Analyze metamodel design and validation approach
Review enum metadata consistency across sample files
Examine stewardship implementation and organization data
Review governance documentation structure
Check for potential bugs and security issues
Assess performance implications
Evaluate test coverage and validation
Provide comprehensive feedback and recommendations

🔍 Review Summary

This is an excellent foundational PR that adds crucial governance infrastructure to the valuesets project. The implementation demonstrates thoughtful design and comprehensive coverage.

✅ Strengths

1. Well-Designed Metamodel Architecture

Clean separation of concerns: The metamodel (src/valuesets/metamodel/valuesets_meta.yaml) properly separates validation rules from the main schema
Hierarchical inheritance: Smart use of is_a relationships (ValueSetEnumDefinition → ValueSetEnumDefinitionWithStewardship)
FHIR alignment: Stewardship roles (stewards/publishers/endorsers) follow established HL7 FHIR MetadataResource patterns
Future-proof design: Uses instantiates to declare conformance, which can be enforced once LinkML supports validating it

2. Comprehensive Metadata Coverage

516 enums updated: All enums now have complete metadata (title, description, status, contributors)
Consistent application: Excellent consistency across all domain files I sampled
Proper ontology mappings: Good use of meaning: fields with valid CURIEs (GO:, NCIT:, etc.)
Conservative stewardship: Appropriately applied only to well-established standards

3. Robust Stewardship Implementation

The stewardship.yaml design is particularly well done:

ROR IDs preferred: Uses Research Organization Registry IDs for authoritative identification
Comprehensive coverage: 15 key organizations spanning bioinformatics, standards bodies, and research institutions
Rich metadata: Each organization includes establishment dates, websites, and contextual annotations

4. Excellent Governance Documentation

The governance.md provides:

Clear stewardship model: Good distinction between mirrored vs. community value sets
Maturity progression: Well-defined advancement path from DRAFT to STANDARD
Future-oriented: Placeholder structure for working groups and steering committee

🐛 Issues Found

1. Minor Schema Inconsistencies

ID namespace inconsistency: Some files use https://w3id.org/linkml/valuesets/ while others use https://w3id.org/valuesets/ or https://w3id.org/common-value-sets/ (seen in maturity_levels.yaml:5)
Missing source attribution: Some enums with external origins lack source: fields
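To act on the namespace finding, schema ids can be bucketed by prefix. More than one bucket signals the inconsistency. This sketch assumes the caller has already collected each file's top-level id, for example by loading each YAML file. group_by_namespace is a hypothetical name, and the three prefixes are the ones quoted in this review.

```python
from collections import defaultdict

# Namespace prefixes quoted in the review above.
KNOWN_NAMESPACES = (
    "https://w3id.org/linkml/valuesets/",
    "https://w3id.org/valuesets/",
    "https://w3id.org/common-value-sets/",
)


def group_by_namespace(schema_ids: dict[str, str]) -> dict[str, list[str]]:
    """Group schema files by the known namespace their id starts with."""
    groups: dict[str, list[str]] = defaultdict(list)
    for filename, schema_id in sorted(schema_ids.items()):
        prefix = next(
            (ns for ns in KNOWN_NAMESPACES if schema_id.startswith(ns)), "other"
        )
        groups[prefix].append(filename)
    return dict(groups)
```

Once one pattern is chosen, every file outside that bucket is a candidate for updating.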

2. Validation Gaps

No metamodel validation enforcement: Current LinkML doesn't validate instantiates constraints
Missing status validation: No validation that status values match StandardsMaturityLevel enum
Stewardship requirements unclear: ValueSetEnumDefinitionWithStewardship requires "at least one steward OR publisher", but LinkML validation cannot currently enforce this
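Until LinkML validates instantiates, the status and stewardship gaps listed above could be covered by a small custom check. This sketch assumes enums are plain dicts with stewardship stored in the shorthand annotations form. check_enum is a hypothetical name. The caller must pass the full set of StandardsMaturityLevel values, which is not reproduced here; the test uses a two-value subset only.

```python
from typing import Any, Iterable

STEWARDSHIP_METACLASS = "valuesets_meta:ValueSetEnumDefinitionWithStewardship"


def check_enum(enum_def: dict[str, Any], allowed_statuses: Iterable[str]) -> list[str]:
    """Return human-readable problems for the status and stewardship rules."""
    problems = []
    status = enum_def.get("status")
    if status not in set(allowed_statuses):
        problems.append(f"status {status!r} is not a StandardsMaturityLevel value")
    metaclasses = enum_def.get("instantiates") or []
    if STEWARDSHIP_METACLASS in metaclasses:
        annotations = enum_def.get("annotations") or {}
        # The rule: at least one steward OR publisher must be present.
        if not (annotations.get("stewards") or annotations.get("publishers")):
            problems.append("needs at least one steward or publisher")
    return problems
```

Returning a list of messages rather than raising lets a single run report every problem across all enums.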

3. Documentation Issues

Placeholder URLs: governance.md contains placeholder URLs (https://example.org/valuesets-stewardship-signup)
Missing validation commands: Should mention running just validate after changes per CLAUDE.md guidelines

🔒 Security Assessment

No security issues identified: All changes are schema/metadata updates
Safe external references: All stewardship URLs point to legitimate organizations
No credentials exposed: No secrets or sensitive information in the changes

⚡ Performance Considerations

Minimal performance impact: Metadata additions don't affect runtime performance
Large file count: 100 changed files is significant but well-organized
Generation efficiency: Should verify just site generation time isn't significantly impacted

🧪 Test Coverage Assessment

Reviewed existing test suite:

Good enum functionality tests: /tests/test_rich_enums.py provides comprehensive coverage
Missing metamodel tests: No tests for new validation metaclasses
No governance validation tests: Should add tests for stewardship metadata validation

📋 Recommendations

High Priority

Standardize ID namespaces: Choose one pattern and apply consistently across all schema files
Add metamodel tests: Create tests validating that enums properly instantiate their declared metaclasses
Fix placeholder URLs: Replace example.org URLs in governance.md with real endpoints or mark as "TBD"

Medium Priority

Add validation helper: Create a just validate-stewardship command to check stewardship metadata completeness
Document stewardship criteria: Add guidelines for when to apply stewardship annotations
Consider status validation: Add validation that status values are from StandardsMaturityLevel enum

Low Priority

Enhance test coverage: Add tests for new governance features
Performance baseline: Measure and document site generation time impact
Working group structure: Begin organizing domain-specific working groups mentioned in governance.md

💡 Architecture Praise

The annotation-based stewardship approach is particularly clever:

annotations:
  stewards: https://geneontology.org
  publishers: https://geneontology.org

This works around LinkML's current limitation where EnumDefinition doesn't support custom slots directly, while maintaining clean semantics and future extensibility.

🎯 Conclusion

This PR represents exceptional work that establishes crucial governance foundations. The metamodel design is sophisticated, the metadata coverage is comprehensive, and the stewardship model is well-architected. The minor issues identified are easily addressable and don't detract from the overall quality.

Recommendation: Approve with minor fixes

The identified namespace inconsistencies and placeholder URLs should be addressed, but the core contribution is excellent and ready for adoption.

Changes:
- Changed all enums to use `valuesets_meta:ValueSetEnumDefinition` instead of `ValueSetEnumDefinitionWithStewardship`
- Single base metaclass is simpler and cleaner
- Stewardship metadata remains in the annotations block
- Removed Python 3.9 from CI test matrix (EOL)
- Updated 6 schema files: GO (3), INSDC (2), Phenopackets (1)
- Regenerated Python datamodels

All enums now consistently use the same base metaclass while maintaining stewardship annotations where appropriate.
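The metaclass change in this commit amounts to rewriting each enum's instantiates list while leaving its annotations alone. This sketch works on plain dicts and omits YAML reading and writing. It is not the script actually used, and collapse_to_base_metaclass is a hypothetical name.

```python
import copy
from typing import Any

OLD = "valuesets_meta:ValueSetEnumDefinitionWithStewardship"
NEW = "valuesets_meta:ValueSetEnumDefinition"


def collapse_to_base_metaclass(
    enums: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Swap the stewardship metaclass for the base one; annotations are untouched."""
    updated = copy.deepcopy(enums)
    for enum_def in updated.values():
        instantiates = enum_def.get("instantiates") or []
        if not instantiates:
            continue
        new_list: list[str] = []
        for metaclass in instantiates:
            metaclass = NEW if metaclass == OLD else metaclass
            if metaclass not in new_list:  # avoid listing the base metaclass twice
                new_list.append(metaclass)
        enum_def["instantiates"] = new_list
    return updated
```

Because stewardship lives in annotations rather than in the metaclass, dropping the subclass loses no information.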

claude · 2025-11-25T21:06:03Z

Claude finished @cmungall's task.

Copilot

Pull request overview

This PR enhances the valuesets project by adding comprehensive metadata and stewardship tracking for all 516 enums across 109 schema files. The changes establish a custom metamodel for validation, define stewardship organizations, and apply metadata to all enums in the repository.

Key changes:

Created a custom metamodel (valuesets_meta.yaml) with metaclasses for enum validation
Introduced stewardship.yaml defining 15 recognized organizations for value set governance
Updated all 516 enums with title, description, status, contributors, and instantiates references
Applied stewardship metadata to 17 well-established value sets (GO, Phenopackets, INSDC/GSC)
Fixed pre-existing venv issue with stale shebangs
Created governance documentation

Reviewed changes

Copilot reviewed 132 out of 136 changed files in this pull request and generated no comments.

Summary per file:

File	Description
src/valuesets/metamodel/valuesets_meta.yaml	Defines metaclasses for enum validation and stewardship tracking
src/valuesets/schema/stewardship.yaml	New enum defining 15 stewardship organizations with ROR IDs
src/valuesets/schema/*/.yaml	109 schema files updated with enum metadata (title, status, contributors, instantiates)
src/valuesets/schema/clinical/phenopackets.yaml	Added stewardship annotations to 11 Phenopackets enums
docs/governance.md	New governance documentation
justfile	Added gen-slides to site target

cmungall requested a review from Copilot November 26, 2025 00:10

Copilot started reviewing on behalf of cmungall November 26, 2025 00:11

Copilot finished reviewing on behalf of cmungall November 26, 2025 00:11

Copilot AI reviewed Nov 26, 2025

cmungall merged commit 1f21cae into main Nov 26, 2025
11 checks passed
