**Merged** · 47 commits
Commits (all dated Nov 23, 2025):

- `0c04448` Extend LinkML schema with 100% KOGUT template coverage for DOE scient… (realmarcin)
- `314724e` Fix deprecated GitHub Actions versions in test workflow (realmarcin)
- `8bfb0cd` Add KOGUT template source files and update gitignore (realmarcin)
- `54ebef5` Update src/linkml/modelcards.yaml (realmarcin)
- `d4baa5c` Update src/data/examples/kogut/climate-model-kogut.yaml (realmarcin)
- `7b23d27` Update src/linkml/modelcards.yaml (realmarcin)
- `fec2929` Update src/data/examples/kogut/README.md (realmarcin)
- `25c0db0` Update src/data/examples/kogut/climate-model-kogut.yaml (realmarcin)
- `4f13a00` Update src/data/examples/kogut/climate-model-kogut.yaml (realmarcin)
- `6c07691` Update src/data/examples/kogut/README.md (realmarcin)
- `37eef9e` Update src/data/examples/kogut/README.md (realmarcin)
- `0a34e21` Initial plan (Copilot)
- `13cc800` Initial plan (Copilot)
- `1606602` Update src/linkml/modelcards.yaml (realmarcin)
- `0f633b2` Update src/linkml/modelcards.yaml (realmarcin)
- `6a00750` Change orcid field from string to uri range for consistency (Copilot)
- `9bd9daf` Initial plan (Copilot)
- `1af2022` Make training_data_separate and evaluation_data_separate boolean glob… (Copilot)
- `8234110` Fix evaluation_data_separate global slot to be boolean type (Copilot)
- `aa84b59` Fix Poetry package configuration and CI installation issues (realmarcin)
- `39aec46` Fix schema path references in about.yaml and Makefile (realmarcin)
- `d29b999` Fix test module import errors in test_data.py (realmarcin)
- `861cae5` Fix firewall blocking Poetry installation by using pre-installed version (realmarcin)
- `d00605f` Install Poetry via pip instead of pre-installed version (realmarcin)
- `7d5eda3` Fix Makefile to use poetry install --no-root (realmarcin)
- `9ed3a79` Fix deploy_documentation.yml to use --no-root flag (realmarcin)
- `bf07991` Update Python version from 3.9 to 3.12 in test workflow (realmarcin)
- `f160469` Fix missing --no-root flag in PyPI publish workflow (realmarcin)
- `571211c` Update schema: separate base and extended slots, rename KOGUT to exte… (Copilot)
- `6a445b0` Rename kogut directory and files to extended, update all references (Copilot)
- `ce1cad5` Clarify KOGUT path reference is for original template source (Copilot)
- `639bbc4` Update test to look for extended template examples instead of kogut (realmarcin)
- `64cff55` Fix CI/CD: Python 3.12, --no-root flags, and pyproject.toml deprecation (realmarcin)
- `35c65ae` Fix CI/CD: Python 3.12, --no-root flags, and pyproject.toml deprecation (realmarcin)
- `c3b7a40` Fix CI/CD: Python 3.12, --no-root flags, and pyproject.toml deprecation (realmarcin)
- `e947968` Merge main into schema-extend, keeping 'extended' terminology (realmarcin)
- `e3dd67c` Update PyYAML to 6.0.3 for Python 3.12 compatibility (realmarcin)
- `5e2093c` Update PyYAML to 6.0.3 for Python 3.12 compatibility (realmarcin)
- `12be021` Update PyYAML to 6.0.3 for Python 3.12 compatibility (realmarcin)
- `17e4c66` Update PyYAML to 6.0.3 for Python 3.12 compatibility (realmarcin)
- `54e09cb` Update greenlet to 3.2.4 for Python 3.12 compatibility (realmarcin)
- `d4600ad` Update greenlet to 3.2.4 for Python 3.12 compatibility (realmarcin)
- `0ae032d` Update greenlet to 3.2.4 for Python 3.12 compatibility (realmarcin)
- `2d638b4` Update greenlet to 3.2.4 for Python 3.12 compatibility (realmarcin)
- `7c2429d` Merge pull request #20 from bridge2ai/copilot/sub-pr-19 (realmarcin)
- `98604ec` Merge pull request #21 from bridge2ai/copilot/sub-pr-19-again (realmarcin)
- `8dc97a9` Merge pull request #22 from bridge2ai/copilot/sub-pr-19-another-one (realmarcin)
**`.gitignore`**: 1 addition, 0 deletions (hunk `@@ -129,3 +129,4 @@ dmypy.json`). Lines shown in the hunk: `# Pyre type checker`, `.pyre/`, `.DS_Store`.
**`CLAUDE.md`**: 218 additions, 0 deletions (hunk `@@ -282,3 +282,221 @@`; context: See `utils/README.md` for complete tool documentation.)
- `modelcards.yaml` - Current production schema
- `modelcards_harmonized.yaml` - Proposed harmonized schema (conceptual, has naming conflicts)
- External reference pattern (recommended) - See examples in `src/data/examples/harmonized/`

## Model Card Extended Template

### Branch: `schema-extend`

The schema has been extended on the `schema-extend` branch so that every section of the extended model card template for DOE scientific models maps to a schema field (**100% coverage**; see the coverage table below). The extended template emphasizes compute infrastructure, reproducibility, and mission relevance for scientific computing applications.

### Extensions Overview

- **Schema Size**: ~1,500 lines (up from 967 in the baseline)
- **New Classes**: 10 extended template classes
- **Enhanced Classes**: 6 existing classes
- **New Slots**: ~40 new fields
- **New Enums**: 1 (ContributorRoleEnum)

### New Classes (10)

1. **Contributor** - Role-based contributor attribution
   - Fields: name, role (ContributorRoleEnum), email, orcid, affiliation
   - Enhances the simpler `owner` class, which is preserved for backward compatibility
   - Example: `{name: "Jane Doe", role: developed_by, orcid: "https://orcid.org/0000-0002-1234-5678"}`

2. **ComputeInfrastructure** - Hardware/software used for training
   - Fields: hardware, hardware_list, software, software_dependencies, training_speed
   - Captures DOE facility information (NERSC, ALCF, OLCF)
   - Example: `hardware_list: ["64 nodes × 4 NVIDIA A100 GPUs", "NERSC Perlmutter"]`

3. **Hyperparameters** - Complete training hyperparameters
   - Fields: optimizer, learning_rate, batch_size, training_epochs, training_steps, etc.
   - Supports LLM-specific fields (prompting_template, fine_tuning_method)
   - Example: `{optimizer: AdamW, learning_rate: 0.0001, batch_size: 512}`

4. **ReproducibilityInfo** - Reproducibility documentation
   - Fields: random_seed, environment_config, pipeline_url, hyperparameters
   - Example: `{random_seed: 42, hyperparameters: {...}}`

5. **CodeExample** - Code snippets with language
   - Fields: code, code_language, description
   - Example: `{code: "import torch...", code_language: python}`

6. **UsageDocumentation** - Installation and usage
   - Fields: installation_instructions, training_configuration, inference_configuration, code_examples
   - Supports conda/docker/SLURM workflows

7. **MissionRelevance** - DOE mission alignment
   - Fields: doe_project, doe_facility, funding_source, description
   - Example: `{doe_facility: "NERSC Perlmutter", doe_project: "Climate Model Development"}`

8. **OutOfScopeUse** - Prohibited uses
   - Fields: description
   - Example: `{description: "Not for real-time weather forecasting"}`

9. **TrainingProcedure** - Training methodology
   - Fields: description, methodology, reproducibility_info, pre_training_info, training_data_separate
   - Nests `reproducibility_info`, which in turn holds `hyperparameters`

10. **EvaluationProcedure** - Evaluation methodology
    - Fields: description, benchmarks, baselines, sota_comparison, uncertainty_quantification, evaluation_data_separate
    - Example: Benchmark comparisons, SOTA references, uncertainty analysis
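To make the shape of a `Contributor` entry concrete, here is a minimal Python sketch that mirrors the fields listed above as a dataclass. It is not generated from the LinkML schema. `ContributorRole` is a hypothetical stand-in for `ContributorRoleEnum` that contains only the two roles this document mentions (`developed_by`, `contributed_by`), and the ORCID check assumes ORCIDs are stored as full `https://orcid.org/` URIs, as in the example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ContributorRole(str, Enum):
    # Hypothetical subset of ContributorRoleEnum: only the roles named in this guide.
    DEVELOPED_BY = "developed_by"
    CONTRIBUTED_BY = "contributed_by"


# Assumption: ORCIDs are full URIs, matching the example value in this guide.
ORCID_PREFIX = "https://orcid.org/"


@dataclass
class Contributor:
    name: str
    role: ContributorRole  # plain strings such as "developed_by" are coerced below
    email: Optional[str] = None
    orcid: Optional[str] = None
    affiliation: Optional[str] = None

    def __post_init__(self) -> None:
        # ContributorRole(...) raises ValueError for an unknown role string.
        self.role = ContributorRole(self.role)
        if self.orcid is not None and not self.orcid.startswith(ORCID_PREFIX):
            raise ValueError(f"orcid must be a URI starting with {ORCID_PREFIX}")


def contributors_with_role(contributors: List[Contributor], role) -> List[str]:
    """Return the names of contributors whose role matches `role`."""
    target = ContributorRole(role)
    return [c.name for c in contributors if c.role is target]
```

With this shape, the template's "Developed By" and "Shared By" rows become filters over a single list, e.g. `contributors_with_role(people, "developed_by")`, rather than separate fields.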

### Enhanced Classes (6)

1. **Version** - Added `last_updated`, `superseded_by`
2. **License** - Added `license_name`, `license_link` for custom licenses
3. **ModelDetails** - Added `short_description`, `contributors` (role-based)
4. **ModelParameters** - Added `compute_infrastructure`, `training_procedure`
5. **QuantitativeAnalysis** - Added `evaluation_procedure`
6. **Considerations** - Added `out_of_scope_uses`

### New Root-Level Fields (2)

Added to `modelCard` class:
- `mission_relevance` (MissionRelevance)
- `usage_documentation` (UsageDocumentation)

### Extended Template Coverage

| Template Section | Schema Mapping | Coverage |
|---------------|----------------|----------|
| Model Details → Description | `model_details.short_description` | ✅ 100% |
| Model Details → Developed By | `model_details.contributors` (role: developed_by) | ✅ 100% |
| Model Details → Shared By | `model_details.contributors` (role: contributed_by) | ✅ 100% |
| Model Details → Version | `model_details.version` (enhanced) | ✅ 100% |
| Model Details → License | `model_details.licenses` (enhanced) | ✅ 100% |
| Compute Infrastructure → Hardware | `model_parameters.compute_infrastructure.hardware_list` | ✅ 100% |
| Compute Infrastructure → Software | `model_parameters.compute_infrastructure.software_dependencies` | ✅ 100% |
| Training → Dataset | `model_parameters.data` | ✅ 100% |
| Training → Procedure | `model_parameters.training_procedure` | ✅ 100% |
| Training → Reproducibility | `model_parameters.training_procedure.reproducibility_info` | ✅ 100% |
| Training → Hyperparameters | `model_parameters.training_procedure.reproducibility_info.hyperparameters` | ✅ 100% |
| Evaluation → Metrics | `quantitative_analysis.performance_metrics` | ✅ 100% |
| Evaluation → Procedure | `quantitative_analysis.evaluation_procedure` | ✅ 100% |
| Uses → Intended Uses | `considerations.use_cases` | ✅ 100% |
| Uses → Out-of-Scope | `considerations.out_of_scope_uses` | ✅ 100% |
| Limitations | `considerations.limitations` | ✅ 100% |
| Ethical Considerations | `considerations.ethical_considerations` | ✅ 100% |
| DOE Mission Relevance | `mission_relevance` | ✅ 100% |
| Usage Documentation | `usage_documentation` | ✅ 100% |

**Overall Coverage**: ✅ **100%**
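The coverage table can double as a checklist for an individual card. The sketch below assumes a model card has already been loaded into nested Python dicts (for example with a YAML parser) and lists the mapped schema paths that the card leaves absent or empty. The path list is transcribed from the table with every path written out from the `modelCard` root; `get_path`, `is_empty`, and `missing_sections` are hypothetical helpers, not part of the repository, and this is a completeness check only, not a substitute for schema validation.

```python
from collections.abc import Mapping
from typing import Any, List, Sequence

# Schema paths from the coverage table, written from the modelCard root.
# The "Developed By" and "Shared By" rows share model_details.contributors.
TEMPLATE_PATHS: List[str] = [
    "model_details.short_description",
    "model_details.contributors",
    "model_details.version",
    "model_details.licenses",
    "model_parameters.compute_infrastructure.hardware_list",
    "model_parameters.compute_infrastructure.software_dependencies",
    "model_parameters.data",
    "model_parameters.training_procedure",
    "model_parameters.training_procedure.reproducibility_info",
    "model_parameters.training_procedure.reproducibility_info.hyperparameters",
    "quantitative_analysis.performance_metrics",
    "quantitative_analysis.evaluation_procedure",
    "considerations.use_cases",
    "considerations.out_of_scope_uses",
    "considerations.limitations",
    "considerations.ethical_considerations",
    "mission_relevance",
    "usage_documentation",
]


def get_path(card: Mapping, dotted: str) -> Any:
    """Follow a dotted path through nested mappings; return None if any key is missing."""
    node: Any = card
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def is_empty(value: Any) -> bool:
    """Treat None and empty strings, lists, and dicts as 'not filled in'."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_sections(card: Mapping, paths: Sequence[str] = TEMPLATE_PATHS) -> List[str]:
    """Return the mapped paths that `card` leaves absent or empty."""
    return [p for p in paths if is_empty(get_path(card, p))]
```

Because every extended field is optional, a non-empty result is not a validation error; it only shows which template sections a given card has not filled in yet.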

### Examples

**Extended Template Example**: `src/data/examples/extended/climate-model-extended.yaml`
- Complete ClimateNet-v2 model card
- Demonstrates all extended template features
- Realistic DOE scientific model (climate AI)
- Includes:
  - Role-based contributors with ORCID
  - NERSC Perlmutter compute infrastructure
  - Complete hyperparameters (optimizer, learning rate, batch size, etc.)
  - Reproducibility info (random seed, environment)
  - DOE mission relevance (BER funding, NERSC facility)
  - Complete usage documentation (conda/docker/SLURM)
  - Code examples in Python and Bash

**Example Documentation**: `src/data/examples/extended/README.md`
- Complete extended template feature documentation
- Before/after migration examples
- Coverage table
- Validation instructions

### Validation

Schema validates successfully with linkml-lint:
```bash
poetry run linkml-lint src/linkml/modelcards.yaml
```

The linter reports only non-blocking naming-convention warnings, the same ones the baseline schema produces.

### Use Cases

The extended template is ideal for:

1. **DOE Scientific Models**
   - Climate models (E3SM, CESM, MPAS)
   - Materials science, fusion, bioinformatics
   - Any model trained at DOE facilities

2. **HPC/Supercomputing Applications**
   - Models trained on NERSC Perlmutter, ALCF Polaris/Aurora, OLCF Frontier
   - Large-scale distributed training
   - Petabyte-scale datasets

3. **Reproducible Science**
   - Complete environment specifications
   - Random seeds and hyperparameters
   - Training pipeline URLs
   - Detailed methodology

4. **DOE Mission-Aligned Projects**
   - Office of Science grants (BER, ASCR, NP, HEP)
   - Facility-specific documentation
   - Funding transparency

### Backward Compatibility

All extended template features are **fully backward compatible**:
- Existing model cards remain valid
- Extended fields are optional
- Legacy `owner` class preserved (alongside new `contributors`)
- No breaking changes to existing schema

### Migration Path

To upgrade an existing model card with extended template features:

1. **Add contributors** (optional, recommended):
   ```yaml
   model_details:
     contributors:
       - name: "Jane Doe"
         role: developed_by
         orcid: "https://orcid.org/0000-0002-1234-5678"
   ```

2. **Add compute infrastructure** (optional):
   ```yaml
   model_parameters:
     compute_infrastructure:
       hardware_list: ["64 × NVIDIA A100 GPUs"]
       software_dependencies: "pytorch=2.1.0\nhorovod=0.28.1"
   ```

3. **Add reproducibility info** (optional):
   ```yaml
   model_parameters:
     training_procedure:
       reproducibility_info:
         random_seed: 42
         hyperparameters:
           optimizer: AdamW
           learning_rate: 0.0001
   ```

4. **Add DOE mission relevance** (optional):
   ```yaml
   mission_relevance:
     doe_facility: "NERSC Perlmutter"
     doe_project: "My DOE Project"
   ```

5. **Add usage documentation** (optional):
   ```yaml
   usage_documentation:
     installation_instructions: "pip install my-model"
     code_examples:
       - code: "import my_model"
         code_language: "python"
   ```
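Since every extended field is optional, the migration steps above amount to adding keys without touching existing ones. A minimal sketch of that idea, assuming the card is handled as plain Python dicts: `merge_extended` and `EXTENDED_ADDITIONS` are hypothetical names, the additions mirror steps 1 and 4, and the `name` key in the usage example is illustrative only. Nested dicts are merged key by key, and any value already present in the card is kept.

```python
import copy
from typing import Any, Dict


def merge_extended(card: Dict[str, Any], additions: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new card with `additions` merged into a deep copy of `card`.

    Nested dicts are merged key by key. A value already present in `card`
    (including a list such as an existing `contributors` list) is kept as-is,
    so the migration only fills in fields that are missing.
    """
    result = copy.deepcopy(card)
    for key, value in additions.items():
        if key not in result:
            # New optional field: copy it in so later edits don't alias `additions`.
            result[key] = copy.deepcopy(value)
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both sides are mappings: recurse so sibling keys are preserved.
            result[key] = merge_extended(result[key], value)
        # Otherwise the existing value wins and is left untouched.
    return result


# Hypothetical additions mirroring migration steps 1 and 4.
EXTENDED_ADDITIONS: Dict[str, Any] = {
    "model_details": {
        "contributors": [
            {
                "name": "Jane Doe",
                "role": "developed_by",
                "orcid": "https://orcid.org/0000-0002-1234-5678",
            },
        ],
    },
    "mission_relevance": {
        "doe_facility": "NERSC Perlmutter",
        "doe_project": "My DOE Project",
    },
}
```

For example, `merge_extended({"model_details": {"name": "my-model"}}, EXTENDED_ADDITIONS)` keeps `name` and adds `contributors` and `mission_relevance`. Lists are deliberately not merged element by element, so a card that already has a `contributors` list needs new entries appended by hand.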

### Related Files

- **Schema**: `src/linkml/modelcards.yaml` (on `schema-extend` branch)
- **Template Source**: `data/input_docs/KOGUT/model-card.md` (original LBNL DOE KOGUT template; the path is preserved for historical reference)
- **Example**: `src/data/examples/extended/climate-model-extended.yaml`
- **Example Docs**: `src/data/examples/extended/README.md`