@@ -282,3 +282,221 @@ See `utils/README.md` for complete tool documentation.
282282 - `modelcards.yaml` - Current production schema
283283 - `modelcards_harmonized.yaml` - Proposed harmonized schema (conceptual, has naming conflicts)
284284 - External reference pattern (recommended) - See examples in `src/data/examples/harmonized/`
285+
286+ ## Model Card Extended Template
287+
288+ ### Branch: `schema-extend`
289+
290+ The schema has been extended on the `schema-extend` branch with an extended template for DOE scientific models. The extension provides **100% coverage** of the template sections listed in the coverage table below, and emphasizes compute infrastructure, reproducibility, and mission relevance for scientific computing applications.
291+
292+ ### Extensions Overview
293+
294+ - **Schema Size**: ~1,500 lines (from 967 baseline)
295+ - **New Classes**: 10 extended template classes
296+ - **Enhanced Classes**: 6 existing classes
297+ - **New Slots**: ~40 new fields
298+ - **New Enums**: 1 (`ContributorRoleEnum`)
299+
300+ ### New Classes (10)
301+
302+ 1. **Contributor** - Role-based contributor attribution
303+    - Fields: name, role (ContributorRoleEnum), email, orcid, affiliation
304+    - Supplements the simple `owner` class, which is preserved for backward compatibility
305+    - Example: `{name: "Jane Doe", role: developed_by, orcid: "https://orcid.org/0000-0002-1234-5678"}`
306+
307+ 2. **ComputeInfrastructure** - Hardware/software used for training
308+    - Fields: hardware, hardware_list, software, software_dependencies, training_speed
309+    - Captures DOE facility information (NERSC, ALCF, OLCF)
310+    - Example: `hardware_list: ["64 nodes × 4 NVIDIA A100 GPUs", "NERSC Perlmutter"]`
311+
312+ 3. **Hyperparameters** - Complete training hyperparameters
313+    - Fields: optimizer, learning_rate, batch_size, training_epochs, training_steps, etc.
314+    - Supports LLM-specific fields (prompting_template, fine_tuning_method)
315+    - Example: `{optimizer: AdamW, learning_rate: 0.0001, batch_size: 512}`
316+
317+ 4. **ReproducibilityInfo** - Reproducibility documentation
318+    - Fields: random_seed, environment_config, pipeline_url, hyperparameters
319+    - Example: `{random_seed: 42, hyperparameters: {...}}`
320+
321+ 5. **CodeExample** - Code snippets with language
322+    - Fields: code, code_language, description
323+    - Example: `{code: "import torch...", code_language: python}`
324+
325+ 6. **UsageDocumentation** - Installation and usage
326+    - Fields: installation_instructions, training_configuration, inference_configuration, code_examples
327+    - Supports conda/docker/SLURM workflows
328+
329+ 7. **MissionRelevance** - DOE mission alignment
330+    - Fields: doe_project, doe_facility, funding_source, description
331+    - Example: `{doe_facility: "NERSC Perlmutter", doe_project: "Climate Model Development"}`
332+
333+ 8. **OutOfScopeUse** - Prohibited uses
334+    - Fields: description
335+    - Example: `{description: "Not for real-time weather forecasting"}`
336+
337+ 9. **TrainingProcedure** - Training methodology
338+    - Fields: description, methodology, reproducibility_info, pre_training_info, training_data_separate
339+    - Nested hyperparameters and reproducibility info
340+
341+ 10. **EvaluationProcedure** - Evaluation methodology
342+     - Fields: description, benchmarks, baselines, sota_comparison, uncertainty_quantification, evaluation_data_separate
343+     - Example: Benchmark comparisons, SOTA references, uncertainty analysis
344+
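The nesting described for `TrainingProcedure` can be sketched with plain Python dataclasses. This is only an illustration: the class and field names follow the lists above, but the types, the defaults, and the subset of fields shown are assumptions, not the classes generated from the LinkML schema. The `drop_empty` helper is hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Hyperparameters:
    # Subset of the fields listed above; the types are assumed.
    optimizer: Optional[str] = None
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None


@dataclass
class ReproducibilityInfo:
    random_seed: Optional[int] = None
    pipeline_url: Optional[str] = None
    hyperparameters: Optional[Hyperparameters] = None


@dataclass
class TrainingProcedure:
    description: Optional[str] = None
    reproducibility_info: Optional[ReproducibilityInfo] = None


def drop_empty(value):
    """Recursively remove None entries so only populated fields remain."""
    if isinstance(value, dict):
        return {k: drop_empty(v) for k, v in value.items() if v is not None}
    return value


procedure = TrainingProcedure(
    description="Distributed data-parallel training",
    reproducibility_info=ReproducibilityInfo(
        random_seed=42,
        hyperparameters=Hyperparameters(
            optimizer="AdamW", learning_rate=0.0001, batch_size=512
        ),
    ),
)

# asdict() converts nested dataclasses to nested dicts, mirroring the
# training_procedure -> reproducibility_info -> hyperparameters layout.
nested = drop_empty(asdict(procedure))
```

Because every field defaults to `None`, a card can populate only the parts it has information for, which matches the optional nature of the extended fields.
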
345+ ### Enhanced Classes (6)
346+
347+ 1. **Version** - Added `last_updated`, `superseded_by`
348+ 2. **License** - Added `license_name`, `license_link` for custom licenses
349+ 3. **ModelDetails** - Added `short_description`, `contributors` (role-based)
350+ 4. **ModelParameters** - Added `compute_infrastructure`, `training_procedure`
351+ 5. **QuantitativeAnalysis** - Added `evaluation_procedure`
352+ 6. **Considerations** - Added `out_of_scope_uses`
353+
354+ ### New Root-Level Fields (2)
355+
356+ Added to the `modelCard` class:
357+ - `mission_relevance` (MissionRelevance)
358+ - `usage_documentation` (UsageDocumentation)
359+
360+ ### Extended Template Coverage
361+
362+ | Template Section | Schema Mapping | Coverage |
363+ |------------------|----------------|----------|
364+ | Model Details → Description | `model_details.short_description` | ✅ 100% |
365+ | Model Details → Developed By | `model_details.contributors` (role: developed_by) | ✅ 100% |
366+ | Model Details → Shared By | `model_details.contributors` (role: contributed_by) | ✅ 100% |
367+ | Model Details → Version | `model_details.version` (enhanced) | ✅ 100% |
368+ | Model Details → License | `model_details.licenses` (enhanced) | ✅ 100% |
369+ | Compute Infrastructure → Hardware | `compute_infrastructure.hardware_list` | ✅ 100% |
370+ | Compute Infrastructure → Software | `compute_infrastructure.software_dependencies` | ✅ 100% |
371+ | Training → Dataset | `model_parameters.data` | ✅ 100% |
372+ | Training → Procedure | `model_parameters.training_procedure` | ✅ 100% |
373+ | Training → Reproducibility | `training_procedure.reproducibility_info` | ✅ 100% |
374+ | Training → Hyperparameters | `reproducibility_info.hyperparameters` | ✅ 100% |
375+ | Evaluation → Metrics | `quantitative_analysis.performance_metrics` | ✅ 100% |
376+ | Evaluation → Procedure | `quantitative_analysis.evaluation_procedure` | ✅ 100% |
377+ | Uses → Intended Uses | `considerations.use_cases` | ✅ 100% |
378+ | Uses → Out-of-Scope | `considerations.out_of_scope_uses` | ✅ 100% |
379+ | Limitations | `considerations.limitations` | ✅ 100% |
380+ | Ethical Considerations | `considerations.ethical_considerations` | ✅ 100% |
381+ | DOE Mission Relevance | `mission_relevance` | ✅ 100% |
382+ | Usage Documentation | `usage_documentation` | ✅ 100% |
383+
384+ **Overall Coverage**: ✅ **100%**
385+
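The coverage table describes what the schema can express. Whether a particular card actually fills those sections is a separate question, which a small script can answer. The sketch below assumes the card has already been loaded into nested Python dicts. The full dotted paths (for example the `model_parameters.` prefix on the compute and training entries) are inferred from the class descriptions in this section, and the function names are hypothetical.

```python
# Dotted paths into a model card loaded as nested dicts. The full paths are
# assumptions based on the class descriptions, not taken from the schema file.
EXTENDED_PATHS = [
    "model_details.contributors",
    "model_parameters.compute_infrastructure.hardware_list",
    "model_parameters.training_procedure.reproducibility_info.hyperparameters",
    "quantitative_analysis.evaluation_procedure",
    "considerations.out_of_scope_uses",
    "mission_relevance",
    "usage_documentation",
]


def resolve(card, dotted_path):
    """Return the value at dotted_path, or None if any step is missing."""
    current = card
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def missing_sections(card, paths=EXTENDED_PATHS):
    """List the paths that are absent or empty in the card."""
    return [p for p in paths if resolve(card, p) in (None, "", [], {})]


card = {
    "model_details": {
        "contributors": [{"name": "Jane Doe", "role": "developed_by"}]
    },
    "mission_relevance": {"doe_facility": "NERSC Perlmutter"},
}

# Report which extended sections this card has not filled in yet.
for path in missing_sections(card):
    print(f"missing: {path}")
```

A report like this is advisory only: the extended fields are optional, so a card with missing sections is still valid.
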
386+ ### Examples
387+
388+ **Extended Template Example**: `src/data/examples/extended/climate-model-extended.yaml`
389+ - Complete ClimateNet-v2 model card
390+ - Demonstrates all extended template features
391+ - Realistic DOE scientific model (climate AI)
392+ - Includes:
393+   - Role-based contributors with ORCID
394+   - NERSC Perlmutter compute infrastructure
395+   - Complete hyperparameters (optimizer, learning rate, batch size, etc.)
396+   - Reproducibility info (random seed, environment)
397+   - DOE mission relevance (BER funding, NERSC facility)
398+   - Complete usage documentation (conda/docker/SLURM)
399+   - Code examples in Python and Bash
400+
401+ **Example Documentation**: `src/data/examples/extended/README.md`
402+ - Complete extended template feature documentation
403+ - Before/after migration examples
404+ - Coverage table
405+ - Validation instructions
406+
407+ ### Validation
408+
409+ The schema passes `linkml-lint`:
410+ ```bash
411+ poetry run linkml-lint src/linkml/modelcards.yaml
412+ ```
413+
414+ The only output is non-blocking naming-convention warnings, the same as on the baseline schema.
415+
416+ ### Use Cases
417+
418+ The extended template is ideal for:
419+
420+ 1. **DOE Scientific Models**
421+    - Climate models (E3SM, CESM, MPAS)
422+    - Materials science, fusion, bioinformatics
423+    - Any model trained at DOE facilities
424+
425+ 2. **HPC/Supercomputing Applications**
426+    - Models trained on NERSC Perlmutter, ALCF Polaris/Aurora, OLCF Frontier
427+    - Large-scale distributed training
428+    - Petabyte-scale datasets
429+
430+ 3. **Reproducible Science**
431+    - Complete environment specifications
432+    - Random seeds and hyperparameters
433+    - Training pipeline URLs
434+    - Detailed methodology
435+
436+ 4. **DOE Mission-Aligned Projects**
437+    - Office of Science grants (BER, ASCR, NP, HEP)
438+    - Facility-specific documentation
439+    - Funding transparency
440+
441+ ### Backward Compatibility
442+
443+ All extended template features are **fully backward compatible**:
444+ - Existing model cards remain valid
445+ - Extended fields are optional
446+ - Legacy `owner` class preserved (alongside new `contributors`)
447+ - No breaking changes to existing schema
448+
449+ ### Migration Path
450+
451+ To upgrade an existing model card with extended template features:
452+
453+ 1. **Add contributors** (optional, recommended):
454+    ```yaml
455+    model_details:
456+      contributors:
457+        - name: "Jane Doe"
458+          role: developed_by
459+          orcid: "https://orcid.org/0000-0002-1234-5678"
460+    ```
461+
462+ 2. **Add compute infrastructure** (optional):
463+    ```yaml
464+    model_parameters:
465+      compute_infrastructure:
466+        hardware_list: ["64 × NVIDIA A100 GPUs"]
467+        software_dependencies: "pytorch=2.1.0\nhorovod=0.28.1"
468+    ```
469+
470+ 3. **Add reproducibility info** (optional):
471+    ```yaml
472+    model_parameters:
473+      training_procedure:
474+        reproducibility_info:
475+          random_seed: 42
476+          hyperparameters:
477+            optimizer: AdamW
478+            learning_rate: 0.0001
479+    ```
480+
481+ 4. **Add DOE mission relevance** (optional):
482+    ```yaml
483+    mission_relevance:
484+      doe_facility: "NERSC Perlmutter"
485+      doe_project: "My DOE Project"
486+    ```
487+
488+ 5. **Add usage documentation** (optional):
489+    ```yaml
490+    usage_documentation:
491+      installation_instructions: "pip install my-model"
492+      code_examples:
493+        - code: "import my_model"
494+          code_language: "python"
495+    ```
496+
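The first migration step can be partly automated for cards that already list owners. The sketch below is a hypothetical helper, not part of the repository. It assumes legacy owners are stored under `model_details.owners` with `name` and optional `contact` keys, and that mapping every owner to the `developed_by` role is acceptable; both assumptions should be checked against your own cards.

```python
import copy


def add_contributors_from_owners(card, default_role="developed_by"):
    """Return a copy of card with `contributors` derived from legacy owners.

    Assumes owners live at model_details.owners with `name` and optional
    `contact` keys. The legacy owners are left in place.
    """
    upgraded = copy.deepcopy(card)
    details = upgraded.setdefault("model_details", {})
    if details.get("contributors"):
        return upgraded  # already migrated; leave existing entries untouched

    contributors = []
    for owner in details.get("owners", []):
        entry = {"name": owner["name"], "role": default_role}
        if owner.get("contact"):
            entry["email"] = owner["contact"]
        contributors.append(entry)

    if contributors:
        details["contributors"] = contributors
    return upgraded


legacy = {
    "model_details": {
        "name": "my-model",
        "owners": [{"name": "Jane Doe", "contact": "jane@example.org"}],
    }
}
upgraded = add_contributors_from_owners(legacy)
```

The helper copies the card instead of editing it in place and keeps the `owners` entries, so the result stays readable by tools that only understand the baseline schema.
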
497+ ### Related Files
498+
499+ - **Schema**: `src/linkml/modelcards.yaml` (on `schema-extend` branch)
500+ - **Template Source**: `data/input_docs/KOGUT/model-card.md` (original LBNL DOE KOGUT template - path preserved for historical reference)
501+ - **Example**: `src/data/examples/extended/climate-model-extended.yaml`
502+ - **Example Docs**: `src/data/examples/extended/README.md`