Add new value sets and improve existing enums #40
## New Value Sets

### Plant Experimental Conditions (PECO-based)
- `PlantExposureEnum` - Dynamic enum for all plant exposures (abiotic + biotic)
- `PlantStudyConditionEnum` - Field, greenhouse, growth chamber, laboratory studies
- `SeasonalEnvironmentExposureEnum` - Spring, summer, autumn, winter, dry/rainy seasons, kharif/rabi (South Asian agricultural seasons)
- `EcologicalEnvironmentExposureEnum` - Climate zones (tropical, sub-tropical, temperate, desert), altitude, atmospheric pressure, lowland/upland regions
- `PlantGrowthMediumExposureEnum` - Soil, hydroponic, aeroponic, in vitro media

### Radiation Types (ENVO-based)
- `RadiationTypeEnum` - Dynamic enum for all radiation types
- `ElectromagneticRadiationTypeEnum` - Full EM spectrum from gamma rays to radio waves, plus PAR and longwave radiation
- `InfraredRadiationTypeEnum` - Near, mid, and far infrared
- `AcousticRadiationTypeEnum` - Acoustic, ultrasound, infrasound

### Publishing
- `ArxivCategoryEnum` - Complete arXiv subject classification (physics, math, cs, q-bio, q-fin, stat, eess, econ)

### Other New Enums
- `PlannedProcessCompletionStatus` - COB-based status for planned processes (completed/failed)
- Various domain-specific enums: genetics, geospatial formats, FAO soil, carcinogenicity, radionuclides, stewardship

## Ontology Mapping Fixes

### PlantSexEnum
- Removed inappropriate UBERON and GSSO mappings (these ontologies are for animals, not plants)
- Changed HERMAPHRODITIC from UBERON:0007197 to PATO:0001340
- Removed GSSO mappings from DIOECIOUS and MONOECIOUS (no appropriate PATO/PO terms exist)

## Enum Enhancements

### GOEvidenceCode
- Added detailed descriptions for all evidence codes
- Added structured annotations: is_experimental, is_manual, is_high_throughput
- Enhanced documentation explaining evidence categories

### MonthOfYearEnum
- Changed month_number annotations to use standard LinkML `rank` field for ordering

### CellCyclePhase
- Added ontology mappings and enhanced metadata

## Infrastructure

### New CLI
- Added `src/valuesets/cli.py` with typer-based command interface

### New Utilities
- `classifier.py` - Value set classification utilities
- `query_describer.py` - Query description utilities

### Scripts
- `add_enum_metadata.py` - Add metadata to enums
- `add_instantiates.py` - Add instantiates declarations
- `enrich_enum_docs.py` - Enrich enum documentation
- `fix_enum_titles.py` - Fix enum titles

### Build System
- Updated justfile with new targets
- Added project.justfile for project-specific tasks
- Updated pyproject.toml dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
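The `MonthOfYearEnum` change above replaces a custom `month_number` annotation with the standard LinkML `rank` field. As a rough illustration of why a uniform ordering key is convenient, the sketch below sorts values by `rank`. The `PermissibleValue` dataclass and the sample values are hypothetical stand-ins, not the generated classes from this repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PermissibleValue:
    """Hypothetical stand-in for a generated enum member with metadata."""
    text: str
    rank: Optional[int] = None  # mirrors the LinkML `rank` slot


def ordered(values):
    """Sort by rank; unranked values go last, ties are broken by text."""
    return sorted(values, key=lambda v: (v.rank is None, v.rank or 0, v.text))


months = [
    PermissibleValue("MARCH", rank=3),
    PermissibleValue("JANUARY", rank=1),
    PermissibleValue("FEBRUARY", rank=2),
]

print([m.text for m in ordered(months)])
```

Because every enum would expose the same key, a single helper like `ordered` could serve months, phases, or any other ranked value set without knowing the name of a custom annotation.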
**Claude finished @cmungall's task**

## PR Review Complete ✅

I've thoroughly reviewed this substantial PR that adds multiple new value sets and improves existing enums. Here's my comprehensive analysis:

### Review Checklist

### ✅ Strengths
1. Excellent LinkML Schema Organization
2. High-Quality Ontology Mappings
3. Rich Metadata and Documentation
4. Robust Python Code

### 🔍 Areas for Improvement
1. Schema Validation Concerns
2. Test Coverage
3. Minor Code Quality Issues

```python
# Current: Generic ImportError handling
except ImportError as e:
    raise typer.BadParameter(f"Cannot import module '{module}': {e}")
# Suggest: More specific error types for different failure modes
```

4. Documentation

### 🚨 Potential Issues
1. Performance Considerations
2. Security

### 📋 Specific Recommendations

### 🎯 Domain-Specific Assessment
- Plant Experimental Conditions (PECO-based): ✅ Excellent coverage of plant research conditions with proper PECO mappings
- Radiation Types (ENVO-based): ✅ Comprehensive electromagnetic spectrum coverage with accurate scientific terminology
- Carcinogenicity Classifications: ✅ Authoritative use of IARC, EPA IRIS classifications with proper evidence levels
- arXiv Categories: ✅ Complete taxonomy with appropriate NCIT/MeSH mappings where available

### 🏁 Overall Assessment
This is a high-quality PR that significantly enhances the value sets collection. The code follows LinkML best practices, uses authoritative ontology sources, and includes well-designed Python utilities. The modular approach and rich metadata make these value sets highly usable for scientific applications.

Recommendation: Approve with minor suggestions

The improvements to GOEvidenceCode and the PlantSexEnum ontology mapping fixes alone make this a valuable contribution. The new domain-specific value sets are well-researched and properly implemented.
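The review's suggestion to use more specific error types for different import failure modes could look roughly like the following. This is only a sketch: `load_enum` and the exception classes are hypothetical names, not part of `src/valuesets/cli.py`, and it uses plain exceptions so that it runs without typer installed.

```python
import importlib


class ModuleLoadError(Exception):
    """Base class for the hypothetical loader errors below."""


class ModuleMissingError(ModuleLoadError):
    """The module (or one of its dependencies) is not installed."""


class ModuleBrokenError(ModuleLoadError):
    """The module was found but failed while being imported."""


class EnumMissingError(ModuleLoadError):
    """The module imported fine but lacks the requested attribute."""


def load_enum(module: str, name: str):
    """Import `module` and return attribute `name`, one error type per failure mode."""
    try:
        mod = importlib.import_module(module)
    except ModuleNotFoundError as e:
        # Must come first: ModuleNotFoundError is a subclass of ImportError.
        raise ModuleMissingError(f"Module not found: {e.name!r}") from e
    except ImportError as e:
        raise ModuleBrokenError(f"Cannot import module {module!r}: {e}") from e
    try:
        return getattr(mod, name)
    except AttributeError as e:
        raise EnumMissingError(f"{module!r} has no attribute {name!r}") from e
```

In a typer command, each of these could then be translated into `typer.BadParameter` with a message tailored to the failure, rather than one generic message for every `ImportError`.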
## Pull request overview
This PR adds several new value sets across multiple domains and improves existing enums with better ontology mappings and documentation. The changes include new plant experimental conditions enums (PECO-based), radiation type enums (ENVO-based), the complete arXiv subject classification system, and various domain-specific enums for genetics, geospatial formats, FAO soil types, carcinogenicity classifications, and radionuclides. Additionally, the PR fixes incorrect ontology mappings in PlantSexEnum, enhances GOEvidenceCode with descriptions and structured annotations, standardizes MonthOfYearEnum to use the rank field, and introduces new utility modules for enum classification and query description along with a CLI interface using typer.
- New value sets for plant experimental conditions, radiation types, arXiv categories, and various domain-specific classifications
- Fixed ontology mappings in PlantSexEnum and enhanced GOEvidenceCode with structured annotations
- Added utility modules for classifier functionality and query description, plus new CLI with typer
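
The `PlantSexEnum` fix summarized above removes mappings to ontologies that the PR description says are scoped to animals (UBERON, GSSO). A check of that kind could be sketched as below; the function name and the dictionary layout are assumptions for illustration, and only the CURIEs quoted in the PR description are used.

```python
# Prefixes the PR description calls inappropriate for plant enums.
ANIMAL_SCOPED_PREFIXES = {"UBERON", "GSSO"}


def disallowed_mappings(mappings):
    """Return the entries whose CURIE prefix is animal-scoped.

    `mappings` maps a permissible value name to a CURIE string,
    or to None when no ontology term is mapped.
    """
    return {
        name: curie
        for name, curie in mappings.items()
        if curie is not None and curie.split(":", 1)[0] in ANIMAL_SCOPED_PREFIXES
    }


# Mapping as described before and after the fix.
before = {"HERMAPHRODITIC": "UBERON:0007197"}
after = {"HERMAPHRODITIC": "PATO:0001340", "DIOECIOUS": None, "MONOECIOUS": None}

print(disallowed_mappings(before))
print(disallowed_mappings(after))
```

A check like this could run as a test over every plant-related schema file so that animal-scoped mappings do not creep back in.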
### Reviewed changes
Copilot reviewed 53 out of 56 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/valuesets/utils/query_describer.py | Adds utility for describing dynamic enum queries in human-readable text |
| src/valuesets/utils/classifier.py | Implements classifier utilities for enum values based on numeric range annotations |
| src/valuesets/utils/__init__.py | Exports new classifier functions |
| src/valuesets/schema/valuesets.yaml | Adds import for arxiv_categories schema |
| src/valuesets/schema/time/temporal.yaml | Updates MonthOfYearEnum to use standard rank field instead of custom month_number annotation |
| src/valuesets/schema/publishing/arxiv_categories.yaml | Adds complete arXiv subject classification system with all categories |
| src/valuesets/schema/physics/radiation.yaml | Adds ENVO-based radiation type enums |
| src/valuesets/schema/investigation.yaml | Adds PlannedProcessCompletionStatus enum |
| src/valuesets/schema/demographics.yaml | Adds OccupationEnum dynamic enum |
| src/valuesets/schema/bio/plant_sex.yaml | Removes inappropriate UBERON/GSSO mappings for plant sex terms |
| src/valuesets/schema/bio/plant_experimental_conditions.yaml | Adds PECO-based plant experimental condition enums |
| src/valuesets/schema/bio/go_evidence.yaml | Enhances GOEvidenceCode with descriptions and structured annotations |
| src/valuesets/schema/bio/cell_cycle.yaml | Adds succeeded_by annotations to meiotic phases |
| src/valuesets/generators/modular_rich_generator.py | Adds support for rank field in metadata building |
| src/valuesets/enums/ (multiple files) | Generated enum files for new value sets |
| src/valuesets/cli.py | New CLI implementation using typer |
| scripts/ (multiple files) | New utility scripts for enum enrichment and metadata management |
| pyproject.toml | Adds typer dependency and CLI entry point |
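
The table describes `classifier.py` as classifying enum values based on numeric range annotations. The general idea might be sketched as follows. Everything here is hypothetical: the `RangedValue` class, the `classify` function, and the altitude thresholds are made up for illustration and are not taken from the repository or its schemas.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RangedValue:
    """Hypothetical enum value carrying numeric range annotations."""
    name: str
    min_value: Optional[float] = None  # inclusive lower bound; None = unbounded
    max_value: Optional[float] = None  # exclusive upper bound; None = unbounded


def classify(x: float, values: Sequence[RangedValue]) -> Optional[str]:
    """Return the name of the first value whose [min, max) range contains x."""
    for v in values:
        above = v.min_value is None or x >= v.min_value
        below = v.max_value is None or x < v.max_value
        if above and below:
            return v.name
    return None


# Illustrative thresholds only (metres above sea level); not from the schema.
ALTITUDE_BANDS = [
    RangedValue("LOWLAND", max_value=500),
    RangedValue("UPLAND", min_value=500, max_value=1500),
    RangedValue("HIGHLAND", min_value=1500),
]

print(classify(120, ALTITUDE_BANDS))
```

Using half-open ranges means a boundary value such as 500 belongs to exactly one band, which avoids ambiguous classifications when adjacent ranges share an endpoint.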
```python
    output = sys.stdout if output_path is None else open(output_path, "w", newline="")
    try:
        writer = csv.DictWriter(output, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    finally:
        if output_path:
            output.close()
elif output_format == "jsonl":
    output = sys.stdout if output_path is None else open(output_path, "w")
    try:
        for row in results:
            output.write(json.dumps(row) + "\n")
    finally:
        if output_path:
            output.close()
```
Copilot AI (Dec 15, 2025):
File handle should be opened using a context manager to ensure proper closure even if an exception occurs. Wrap the file operations in a with statement or ensure output.close() is called in all code paths.
Suggested change:

```python
    if output_path:
        with open(output_path, "w", newline="") as output:
            writer = csv.DictWriter(output, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)
    else:
        output = sys.stdout
        writer = csv.DictWriter(output, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
elif output_format == "jsonl":
    if output_path:
        with open(output_path, "w") as output:
            for row in results:
                output.write(json.dumps(row) + "\n")
    else:
        output = sys.stdout
        for row in results:
            output.write(json.dumps(row) + "\n")
```
Copilot AI (Dec 15, 2025), on the same code:
File handle should be opened using a context manager to ensure proper closure even if an exception occurs. Wrap the file operations in a with statement or ensure output.close() is called in all code paths.
Suggested change:

```python
    if output_path is None:
        writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    else:
        with open(output_path, "w", newline="") as output:
            writer = csv.DictWriter(output, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)
elif output_format == "jsonl":
    if output_path is None:
        for row in results:
            sys.stdout.write(json.dumps(row) + "\n")
    else:
        with open(output_path, "w") as output:
            for row in results:
                output.write(json.dumps(row) + "\n")
```
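
Both suggestions above duplicate the writing logic across the file and stdout branches. One possible alternative, shown here only as a sketch, is a small context manager that yields either an opened file or `sys.stdout` and closes only what it opened. The names `open_output` and `write_results` are hypothetical, and the `"csv"` format name for the first branch is an assumption, since the excerpt does not show that condition.

```python
import csv
import json
import sys
from contextlib import contextmanager


@contextmanager
def open_output(output_path=None, newline=None):
    """Yield a writable text stream: the file at output_path, or sys.stdout.

    Only a file opened here is closed on exit; sys.stdout is left open.
    """
    if output_path is None:
        yield sys.stdout
    else:
        with open(output_path, "w", newline=newline) as handle:
            yield handle


def write_results(results, fieldnames, output_format, output_path=None):
    """Write rows (dicts) as CSV or JSON Lines to a file or to stdout."""
    if output_format == "csv":  # assumed format name
        with open_output(output_path, newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)
    elif output_format == "jsonl":
        with open_output(output_path) as out:
            for row in results:
                out.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"Unsupported output format: {output_format!r}")
```

With this shape, each format has a single code path, and the file is still closed if writing raises, which is the property the review comment asks for.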
Co-authored-by: Copilot <[email protected]>
## Summary
This PR adds several new value sets across multiple domains and improves existing enums with better ontology mappings and documentation.

## New Value Sets

### Plant Experimental Conditions (PECO-based)
- `PlantExposureEnum` - Dynamic enum for all plant exposures
- `PlantStudyConditionEnum` - Field, greenhouse, growth chamber, laboratory
- `SeasonalEnvironmentExposureEnum` - Standard seasons + South Asian agricultural seasons
- `EcologicalEnvironmentExposureEnum` - Climate zones, altitude, land characteristics
- `PlantGrowthMediumExposureEnum` - Soil, hydroponic, aeroponic, in vitro

### Radiation Types (ENVO-based)
- `RadiationTypeEnum` - Dynamic enum for all radiation types
- `ElectromagneticRadiationTypeEnum` - Full EM spectrum
- `InfraredRadiationTypeEnum` - Near, mid, far IR
- `AcousticRadiationTypeEnum` - Acoustic, ultrasound, infrasound

### Other
- `ArxivCategoryEnum` - Complete arXiv subject classification
- `PlannedProcessCompletionStatus` - COB-based process status

## Fixes

## Enhancements
- `rank` field instead of custom annotations

## Test plan
- `just site`

🤖 Generated with Claude Code