You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add v5 datasheets and semantic evaluations to documentation
- Add new 'Latest: v5 Datasheets and Semantic Evaluations' section
- Include links to all 8 v5 HTML files (4 datasheets + 4 evaluations)
- Document rubric20-semantic evaluation framework and results
- Show evaluation scores: VOICE (96.4%), AI-READI (94.0%), CM4AI (91.7%), CHORUS (84.5%)
- Add comprehensive About section explaining v5 generation and evaluation process
This enables GitHub Pages deployment of v5 HTML files.
**About v5 Datasheets**: Generated using Claude Sonnet 4.5 with deterministic settings (temperature=0.0, model: claude-sonnet-4-5-20250929) on December 20, 2025. Each datasheet has been evaluated using the Rubric20-Semantic framework, which assesses 20 questions across 4 categories (Structural Completeness, Metadata Quality, Technical Documentation, FAIRness & Accessibility) with semantic validation of correctness and consistency.
94
+
95
+
**Average Score**: 77.0/84 (91.7%) across all 4 projects.
96
+
73
97
## Individual Dataset Datasheets
74
98
75
99
These datasheets were created from specific dataset metadata sources:
@@ -134,6 +158,45 @@ See [DETERMINISM.md](https://github.com/bridge2ai/data-sheets-schema/blob/main/D
134
158
### Individual Dataset Datasheets
135
159
The **Individual Dataset Datasheets** provide detailed metadata for specific datasets from each project's primary data repository (FAIRHub, Dataverse, PhysioNet). These focus on individual dataset instances rather than project-level metadata.
136
160
161
+
### v5 Datasheets and Semantic Evaluations
162
+
The **v5 Datasheets** represent the latest generation (December 2025) of comprehensive project metadata created using Claude Sonnet 4.5 with fully deterministic settings:
163
+
164
+
**Generation Process:**
165
+
1. Multiple project-related documents concatenated in reproducible order
166
+
2. AI-powered extraction and synthesis using Claude Sonnet 4.5
167
+
3. Temperature=0.0 for deterministic output
168
+
4. Pinned model version (claude-sonnet-4-5-20250929) for reproducibility
169
+
5. Validation against the LinkML D4D schema
170
+
171
+
**Semantic Evaluation Framework:**
172
+
Each v5 datasheet has been evaluated using the **Rubric20-Semantic** framework, which provides:
173
+
174
+
-**20 Questions** across 4 categories:
175
+
1. Structural Completeness (max 24 points) - Schema field population and required elements
176
+
2. Metadata Quality (max 22 points) - Accuracy, specificity, and completeness of information
- Semantic analysis findings (correctness and consistency checks)
197
+
- Strengths, weaknesses, and recommendations
198
+
- Detailed justifications for all scoring decisions
199
+
137
200
## Schema Information
138
201
139
202
All datasheets conform to the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) framework by Gebru et al., implemented using the [Bridge2AI LinkML schema](https://github.com/bridge2ai/data-sheets-schema).
0 commit comments