@cmungall

Summary

This PR enhances the value sets project with comprehensive metadata and stewardship tracking for all 516 enums across 109 schema files.

Key Features

🏛️ Custom Metamodel for Validation

Created src/valuesets/metamodel/valuesets_meta.yaml with metaclasses:

  • ValueSetEnumDefinition: Requires title, description, status, contributors
  • ValueSetEnumDefinitionWithStewardship: Adds stewards, publishers, endorsers
  • ReferenceEnumDefinition: For reference/controlled vocabulary enums
  • Additional specialized metaclasses for sources and conformance
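
As a rough illustration of what the base metaclass asks for, here is a minimal sketch that assumes an enum definition has already been parsed into a plain Python dict. The helper name `missing_required_fields` and the sample enum are hypothetical; only the four field names come from the list above.

```python
# Hypothetical helper, not part of the PR. The four field names are the ones
# the description says ValueSetEnumDefinition requires.
REQUIRED_FIELDS = ("title", "description", "status", "contributors")


def missing_required_fields(enum_def: dict) -> list[str]:
    """Return the required metadata fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not enum_def.get(name)]


# Illustrative enum definition, as it might look after parsing the YAML.
example = {
    "title": "GO Aspect",
    "description": "Aspects of the Gene Ontology.",
    "status": "DRAFT",
}
print(missing_required_fields(example))  # ['contributors']
```

An empty result would mean the enum carries all four required fields.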

🌐 Stewardship Value Set

New stewardship.yaml with 15 recognized organizations:

  • GA4GH, Gene Ontology Consortium, Genomic Standards Consortium
  • Phenopackets, EMBL-EBI, NCBI, CDISC, HL7, ISO, W3C
  • Uses ROR IDs for authoritative organization identification
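
A quick shape check for ROR identifiers could look like the sketch below. It assumes the commonly documented ROR format (a leading 0, six characters from a restricted base32 alphabet, then two checksum digits) and does not verify the checksum. The function name and the sample strings are illustrative, not taken from stewardship.yaml.

```python
import re

# Assumption: ROR IDs are "0" + six characters (digits and lowercase letters
# except i, l, o, u) + a two-digit checksum. The checksum is not verified.
ROR_PATTERN = re.compile(r"(?:https://ror\.org/)?0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def looks_like_ror_id(value: str) -> bool:
    """Return True if value has the shape of a ROR ID (bare or as a URL)."""
    return ROR_PATTERN.fullmatch(value) is not None


print(looks_like_ror_id("https://ror.org/02catss52"))  # True
print(looks_like_ror_id("https://geneontology.org"))   # False
```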

📋 Complete Metadata Coverage

Updated all 516 enums with:

  • ✅ Human-readable titles
  • ✅ Descriptions
  • ✅ Status (DRAFT, STANDARD, etc. using StandardsMaturityLevel)
  • ✅ Contributors (default: ORCID + claude-code)
  • ✅ Metaclass instantiation references

🎯 Conservative Stewardship Application

Applied stewardship metadata to well-established value sets:

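Gene Ontology (4 enums):

  • go_aspect.yaml, go_causality.yaml, go_evidence.yaml
  • Stewards and publishers: https://geneontology.org

Phenopackets (11 enums):

  • KaryotypicSexEnum, PhenotypicSexEnum, AllelicStateEnum, ACMGPathogenicityEnum, etc.
  • Stewards: https://phenopackets.org; publishers: https://ga4gh.org

INSDC/GSC (2 enums):

  • insdc_geographic_locations.yaml, insdc_missing_values.yaml
  • Stewards: https://gensc.org; publishers: https://www.insdc.org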

📖 Governance Documentation

Created docs/governance.md covering:

  • Source of Truth model (mirrored vs community value sets)
  • Maturity levels and lifecycle management
  • Stewardship roles and responsibilities
  • Contributing process and quality standards
  • Working group structure (placeholder for community expansion)

🐛 Bug Fix

Fixed a pre-existing venv issue in which stale shebangs pointed to the old project location:

  • just site now succeeds
  • just test passes (63 tests)
  • ✅ All LinkML generators work correctly

Technical Notes

Stewardship metadata lives in an annotations: block because LinkML's EnumDefinition does not accept custom slots directly:

enums:
  GOAspect:
    instantiates:
    - valuesets_meta:ValueSetEnumDefinitionWithStewardship
    annotations:
      stewards: https://geneontology.org
      publishers: https://geneontology.org
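
To show how a consumer might read these annotations back, the sketch below works on a dict shaped like the YAML above. The helper `annotation_values` is hypothetical; it assumes an annotation may appear either in the compact `key: value` form used here or in an expanded form with `tag` and `value` keys.

```python
def annotation_values(enum_def: dict, tag: str) -> list[str]:
    """Return the values recorded for one annotation tag, always as a list."""
    raw = (enum_def.get("annotations") or {}).get(tag)
    # Expanded form (assumed): {"tag": "stewards", "value": "..."}
    if isinstance(raw, dict):
        raw = raw.get("value")
    if raw is None:
        return []
    if isinstance(raw, str):
        return [raw]
    return list(raw)


# Mirrors the GOAspect example above.
go_aspect = {
    "instantiates": ["valuesets_meta:ValueSetEnumDefinitionWithStewardship"],
    "annotations": {
        "stewards": "https://geneontology.org",
        "publishers": "https://geneontology.org",
    },
}
print(annotation_values(go_aspect, "stewards"))   # ['https://geneontology.org']
print(annotation_values(go_aspect, "endorsers"))  # []
```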

Testing

  • ✅ All 63 tests pass
  • just site generates documentation
  • just validate runs without errors
  • ✅ Schema validation passes

Impact

  • All enums now have complete, consistent metadata
  • Foundation for community governance and stewardship tracking
  • Validation framework via custom metamodel
  • Clear path for expanding stewardship annotations

Next Steps

Future work could include:

  • Expanding stewardship annotations to additional value sets
  • Setting up working groups for domain-specific governance
  • Creating automated validation for metadata completeness

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

This commit enhances the value sets project with comprehensive metadata
and stewardship tracking for all 516 enums across 109 schema files.

## Key Changes

### 1. Custom Metamodel for Validation (src/valuesets/metamodel/)
Created `valuesets_meta.yaml` defining metaclasses for enum validation:
- `ValueSetEnumDefinition`: Base metaclass requiring title, description,
  status, and contributors
- `ValueSetEnumDefinitionWithStewardship`: Extends base with stewardship
  annotations (stewards, publishers, endorsers)
- `ValueSetEnumDefinitionWithSource`: For enums with external source
- `ValueSetEnumDefinitionWithConformance`: For enums conforming to standards
- `ReferenceEnumDefinition`: For reference/controlled vocabulary enums

### 2. Stewardship Value Set (src/valuesets/schema/stewardship.yaml)
Created enum of recognized steward organizations with ROR IDs:
- GA4GH, Gene Ontology Consortium, Genomic Standards Consortium
- Phenopackets, EMBL-EBI, NCBI, CDISC, HL7, ISO, W3C, etc.
- Supports tracking organizational responsibility for value sets

### 3. Batch Metadata Updates (All 516 Enums)
Updated all enums with required metadata fields:
- Added `title` (human-readable names)
- Ensured `description` present
- Set `status` (using StandardsMaturityLevel: DRAFT, STANDARD, etc.)
- Added default `contributors`: orcid:0000-0002-6601-2165 and
  github.com/anthropics/claude-code
- Added `instantiates` references to appropriate metaclasses
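
The batch update can be pictured with a sketch like the following. It is not the script used for this commit: the function names, the CamelCase title derivation, and the choice of `DRAFT` as the default status are assumptions made for illustration. The contributor values and the metaclass reference are the ones named in this PR.

```python
import re

# Contributor defaults as listed above.
DEFAULT_CONTRIBUTORS = [
    "orcid:0000-0002-6601-2165",
    "github.com/anthropics/claude-code",
]

# Assumed title rule: split CamelCase words and drop a trailing "Enum".
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def title_from_name(name: str) -> str:
    """Derive a human-readable title from an enum name."""
    return _BOUNDARY.sub(" ", name).removesuffix(" Enum")


def apply_default_metadata(name: str, enum_def: dict) -> dict:
    """Return a copy of enum_def with missing metadata filled in."""
    updated = dict(enum_def)
    updated.setdefault("title", title_from_name(name))
    updated.setdefault("status", "DRAFT")  # assumed default
    updated.setdefault("contributors", list(DEFAULT_CONTRIBUTORS))
    updated.setdefault("instantiates", ["valuesets_meta:ValueSetEnumDefinition"])
    return updated


print(title_from_name("KaryotypicSexEnum"))  # Karyotypic Sex
print(title_from_name("GOAspect"))           # GO Aspect
```

Existing values are left untouched, so running the function twice gives the same result as running it once.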

### 4. Stewardship Annotations (Conservative Application)
Applied stewardship metadata to well-established value sets:

**Gene Ontology (4 enums):**
- go_aspect.yaml, go_causality.yaml, go_evidence.yaml
- Stewards: https://geneontology.org
- Publishers: https://geneontology.org

**Phenopackets (11 enums in clinical/phenopackets.yaml):**
- KaryotypicSexEnum, PhenotypicSexEnum, AllelicStateEnum,
  ACMGPathogenicityEnum, etc.
- Stewards: https://phenopackets.org
- Publishers: https://ga4gh.org

**INSDC/GSC (2 enums):**
- insdc_geographic_locations.yaml, insdc_missing_values.yaml
- Stewards: https://gensc.org
- Publishers: https://www.insdc.org

Note: Used `annotations:` block since LinkML EnumDefinition doesn't
support custom stewardship slots directly.

### 5. Governance Documentation (docs/governance.md)
Created comprehensive governance documentation covering:
- Source of Truth model (mirrored vs community value sets)
- Maturity levels and lifecycle
- Stewardship roles and responsibilities
- Contributing process and quality standards
- Working group structure (placeholder)

### 6. Bug Fix: Stale Virtual Environment
Fixed pre-existing issue where `.venv` had stale shebangs pointing to
old project location (`common-value-sets`). Recreated venv, enabling:
- `just site` now succeeds
- `just test` passes (63 tests)
- All LinkML generators work correctly
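
For anyone hitting the same symptom, a check along these lines can confirm it before recreating the venv. This is a heuristic sketch, not something included in the commit; the function name is hypothetical and only shebangs that name a Python interpreter directly are inspected.

```python
from pathlib import Path


def stale_shebangs(bin_dir: Path, venv_root: Path) -> list[Path]:
    """Return scripts whose shebang names a Python outside venv_root."""
    stale = []
    for script in sorted(bin_dir.iterdir()):
        if not script.is_file():
            continue
        with script.open("rb") as handle:
            first_line = handle.readline(512)
        if not first_line.startswith(b"#!"):
            continue
        parts = first_line[2:].decode("utf-8", errors="replace").split()
        if not parts:
            continue
        interpreter = Path(parts[0])
        # Heuristic: "#!/bin/sh" and "#!/usr/bin/env python" are left alone.
        if interpreter.name.startswith("python") and venv_root not in interpreter.parents:
            stale.append(script)
    return stale
```

Calling `stale_shebangs(Path(".venv/bin").resolve(), Path(".venv").resolve())` from the project root would list the entry points still pointing at the old location. The root should be an absolute path, since shebangs store absolute interpreter paths.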

## Impact
- All 516 enums now have complete, consistent metadata
- Stewardship tracking enables community governance
- Custom metamodel provides validation framework
- Foundation for future expansion of stewardship annotations

## Testing
- ✅ All 63 tests pass
- ✅ `just site` generates documentation successfully
- ✅ `just validate` runs without errors
- ✅ Schema validation passes
@claude

claude bot commented Nov 25, 2025

Claude finished @cmungall's task

PR Review: Add comprehensive enum metadata and stewardship framework

Review Checklist:

  • Analyze metamodel design and validation approach
  • Review enum metadata consistency across sample files
  • Examine stewardship implementation and organization data
  • Review governance documentation structure
  • Check for potential bugs and security issues
  • Assess performance implications
  • Evaluate test coverage and validation
  • Provide comprehensive feedback and recommendations

🔍 Review Summary

This is an excellent foundational PR that adds crucial governance infrastructure to the valuesets project. The implementation demonstrates thoughtful design and comprehensive coverage.

✅ Strengths

1. Well-Designed Metamodel Architecture

  • Clean separation of concerns: The metamodel (src/valuesets/metamodel/valuesets_meta.yaml) properly separates validation rules from the main schema
  • Hierarchical inheritance: Smart use of is_a relationships (ValueSetEnumDefinition → ValueSetEnumDefinitionWithStewardship)
  • FHIR alignment: Stewardship roles (stewards/publishers/endorsers) follow established HL7 FHIR MetadataResource patterns
  • Future-proof design: Uses instantiates for declarative conformance that can enable validation when LinkML supports it

2. Comprehensive Metadata Coverage

  • 516 enums updated: All enums now have complete metadata (title, description, status, contributors)
  • Consistent application: Excellent consistency across all domain files I sampled
  • Proper ontology mappings: Good use of meaning: fields with valid CURIEs (GO:, NCIT:, etc.)
  • Conservative stewardship: Appropriately applied only to well-established standards

3. Robust Stewardship Implementation

The stewardship.yaml design is particularly well done:

  • ROR IDs preferred: Uses Research Organization Registry IDs for authoritative identification
  • Comprehensive coverage: 15 key organizations spanning bioinformatics, standards bodies, and research institutions
  • Rich metadata: Each organization includes establishment dates, websites, and contextual annotations

4. Excellent Governance Documentation

The governance.md provides:

  • Clear stewardship model: Good distinction between mirrored vs. community value sets
  • Maturity progression: Well-defined advancement path from DRAFT to STANDARD
  • Future-oriented: Placeholder structure for working groups and steering committee

🐛 Issues Found

1. Minor Schema Inconsistencies

  • ID namespace inconsistency: Some files use https://w3id.org/linkml/valuesets/ while others use https://w3id.org/valuesets/ or https://w3id.org/common-value-sets/ (seen in maturity_levels.yaml:5)
  • Missing source attribution: Some enums with external origins lack source: fields
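
One way to surface the namespace split is to group schema files by the prefix of their `id`. The sketch below assumes the schema ids have already been collected into a dict keyed by filename; the function names and the sample ids are hypothetical, while the three prefixes are the ones noted above.

```python
from collections import defaultdict

# The three namespace patterns observed in this review.
NAMESPACES = (
    "https://w3id.org/linkml/valuesets/",
    "https://w3id.org/valuesets/",
    "https://w3id.org/common-value-sets/",
)


def namespace_of(schema_id: str) -> str:
    """Return the known namespace a schema id starts with, or 'unknown'."""
    for prefix in NAMESPACES:
        if schema_id.startswith(prefix):
            return prefix
    return "unknown"


def files_by_namespace(schema_ids: dict[str, str]) -> dict[str, list[str]]:
    """Group filenames by the namespace of their schema id."""
    groups = defaultdict(list)
    for filename, schema_id in schema_ids.items():
        groups[namespace_of(schema_id)].append(filename)
    return dict(groups)


# Hypothetical ids, for illustration only.
sample = {
    "a.yaml": "https://w3id.org/linkml/valuesets/a",
    "b.yaml": "https://w3id.org/valuesets/b",
    "c.yaml": "https://w3id.org/valuesets/c",
}
print(files_by_namespace(sample))
```

More than one key in the result signals that the files do not share a single namespace.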

2. Validation Gaps

  • No metamodel validation enforcement: Current LinkML doesn't validate instantiates constraints
  • Missing status validation: No validation that status values match StandardsMaturityLevel enum
  • Stewardship requirements unclear: ValueSetEnumDefinitionWithStewardship requires "at least one steward OR publisher" but this isn't enforceable
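
Until LinkML enforces these constraints, a small project-side check could cover the last two gaps. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the status set lists only the two values named in this PR, so a real check would need the full StandardsMaturityLevel enum.

```python
# Assumption: only DRAFT and STANDARD are named in this PR; the real
# StandardsMaturityLevel enum contains more values.
KNOWN_STATUSES = {"DRAFT", "STANDARD"}


def stewardship_problems(enum_def: dict) -> list[str]:
    """Return human-readable problems for an enum that declares stewardship."""
    problems = []
    status = enum_def.get("status")
    if status not in KNOWN_STATUSES:
        problems.append(f"unrecognized status: {status!r}")
    annotations = enum_def.get("annotations") or {}
    # The "at least one steward OR publisher" rule described above.
    if not (annotations.get("stewards") or annotations.get("publishers")):
        problems.append("needs at least one steward or publisher")
    return problems


ok = {"status": "DRAFT", "annotations": {"stewards": "https://geneontology.org"}}
print(stewardship_problems(ok))  # []
```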

3. Documentation Issues

  • Placeholder URLs: governance.md contains placeholder URLs (https://example.org/valuesets-stewardship-signup)
  • Missing validation commands: Should mention running just validate after changes per CLAUDE.md guidelines

🔒 Security Assessment

  • No security issues identified: All changes are schema/metadata updates
  • Safe external references: All stewardship URLs point to legitimate organizations
  • No credentials exposed: No secrets or sensitive information in the changes

⚡ Performance Considerations

  • Minimal performance impact: Metadata additions don't affect runtime performance
  • Large file count: 100 changed files is significant but well-organized
  • Generation efficiency: Should verify just site generation time isn't significantly impacted

🧪 Test Coverage Assessment

Reviewed existing test suite:

  • Good enum functionality tests: /tests/test_rich_enums.py provides comprehensive coverage
  • Missing metamodel tests: No tests for new validation metaclasses
  • No governance validation tests: Should add tests for stewardship metadata validation

📋 Recommendations

High Priority

  1. Standardize ID namespaces: Choose one pattern and apply consistently across all schema files
  2. Add metamodel tests: Create tests validating that enums properly instantiate their declared metaclasses
  3. Fix placeholder URLs: Replace example.org URLs in governance.md with real endpoints or mark as "TBD"
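
For the second item, a metamodel test might start from something like the sketch below. It assumes the enums have been loaded into a dict keyed by enum name; the helper and test names are hypothetical, the metaclass names are those listed in the PR, and the prefix is the one used in its YAML example.

```python
# Metaclass names as listed in the PR description.
METACLASSES = {
    "ValueSetEnumDefinition",
    "ValueSetEnumDefinitionWithStewardship",
    "ValueSetEnumDefinitionWithSource",
    "ValueSetEnumDefinitionWithConformance",
    "ReferenceEnumDefinition",
}
PREFIX = "valuesets_meta:"


def undeclared_metaclasses(enums: dict) -> dict[str, list[str]]:
    """Map enum names to instantiates entries that name no known metaclass."""
    bad = {}
    for name, enum_def in enums.items():
        refs = enum_def.get("instantiates") or []
        if not refs:
            bad[name] = ["<none declared>"]
            continue
        unknown = [
            ref for ref in refs
            if not (ref.startswith(PREFIX) and ref[len(PREFIX):] in METACLASSES)
        ]
        if unknown:
            bad[name] = unknown
    return bad


def test_enums_instantiate_known_metaclasses():
    # Illustrative input; a real test would load the schema files instead.
    enums = {
        "GOAspect": {
            "instantiates": ["valuesets_meta:ValueSetEnumDefinitionWithStewardship"]
        }
    }
    assert undeclared_metaclasses(enums) == {}
```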

Medium Priority

  1. Add validation helper: Create just validate-stewardship command to check stewardship metadata completeness
  2. Document stewardship criteria: Add guidelines for when to apply stewardship annotations
  3. Consider status validation: Add validation that status values are from StandardsMaturityLevel enum

Low Priority

  1. Enhance test coverage: Add tests for new governance features
  2. Performance baseline: Measure and document site generation time impact
  3. Working group structure: Begin organizing domain-specific working groups mentioned in governance.md

💡 Architecture Praise

The annotation-based stewardship approach is particularly clever:

annotations:
  stewards: https://geneontology.org
  publishers: https://geneontology.org

This works around LinkML's current limitation where EnumDefinition doesn't support custom slots directly, while maintaining clean semantics and future extensibility.

🎯 Conclusion

This PR represents exceptional work that establishes crucial governance foundations. The metamodel design is sophisticated, the metadata coverage is comprehensive, and the stewardship model is well-architected. The minor issues identified are easily addressable and don't detract from the overall quality.

Recommendation: Approve with minor fixes

The identified namespace inconsistencies and placeholder URLs should be addressed, but the core contribution is excellent and ready for adoption.

Changes:
- Changed all enums to use `valuesets_meta:ValueSetEnumDefinition`
  instead of `ValueSetEnumDefinitionWithStewardship`
- Single base metaclass is simpler and cleaner
- Stewardship metadata remains via annotations block
- Removed Python 3.9 from CI test matrix (EOL)
- Updated 6 schema files: GO (3), INSDC (2), Phenopackets (1)
- Regenerated Python datamodels

All enums now consistently use the same base metaclass while
maintaining stewardship annotations where appropriate.


Copilot AI left a comment

Pull request overview

This PR enhances the valuesets project by adding comprehensive metadata and stewardship tracking for all 516 enums across 109 schema files. The changes establish a custom metamodel for validation, define stewardship organizations, and apply metadata to all enums in the repository.

Key changes:

  • Created a custom metamodel (valuesets_meta.yaml) with metaclasses for enum validation
  • Introduced stewardship.yaml defining 15 recognized organizations for value set governance
  • Updated all 516 enums with title, description, status, contributors, and instantiates references
  • Applied stewardship metadata to 17 well-established value sets (GO, Phenopackets, INSDC/GSC)
  • Fixed pre-existing venv issue with stale shebangs
  • Created governance documentation

Reviewed changes

Copilot reviewed 132 out of 136 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/valuesets/metamodel/valuesets_meta.yaml` | Defines metaclasses for enum validation and stewardship tracking |
| `src/valuesets/schema/stewardship.yaml` | New enum defining 15 stewardship organizations with ROR IDs |
| `src/valuesets/schema/**/*.yaml` | 109 schema files updated with enum metadata (title, status, contributors, instantiates) |
| `src/valuesets/schema/clinical/phenopackets.yaml` | Added stewardship annotations to 11 Phenopackets enums |
| `docs/governance.md` | New governance documentation |
| `justfile` | Added gen-slides to site target |

@cmungall cmungall merged commit 1f21cae into main Nov 26, 2025
11 checks passed