Commit d619e7e

Add D4D HTML and YAML files to GitHub Pages
Changes:
- Update .gitignore to track docs/html_output/ and docs/yaml_output/
- Add curated comprehensive datasheets (HTML + YAML) for AI-READI, CM4AI, VOICE
- Add GPT-5 synthesized datasheets (HTML + YAML) for all 4 projects
- Add individual dataset datasheets (FAIRHub, Dataverse, PhysioNet)
- Add d4d_examples.md page with links to all datasheets
- Add datasheet-common.css for styling

This enables GitHub Pages to serve the D4D datasheet HTML files that are linked from the README.
1 parent a85987b commit d619e7e

31 files changed, 41445 insertions(+), 1 deletion(-)

.gitignore

Lines changed: 9 additions & 1 deletion
@@ -1,7 +1,15 @@
-/docs/
+# Ignore generated docs but keep GitHub Pages content
 /project/docs/
 /tmp/
 
+# Track GitHub Pages content in docs/
+# (Generated markdown/html files from LinkML are ignored, but D4D examples are tracked)
+docs/*
+!docs/*.md
+!docs/*.css
+!docs/html_output/
+!docs/yaml_output/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
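
Reviewer note: the switch from `/docs/` to `docs/*` matters because Git does not descend into an excluded directory, so a `!` rule cannot re-include anything beneath it. The sketch below is a deliberately simplified model of that behaviour, not Git's real matcher. It assumes anchored patterns only, matches segment by segment, and lets the last matching rule win; `is_ignored`, `RULES`, and the path `docs/schema/index.html` are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase

# The docs-related rules added in this commit, in file order.
RULES = [
    "docs/*",
    "!docs/*.md",
    "!docs/*.css",
    "!docs/html_output/",
    "!docs/yaml_output/",
]


def _matches(pattern, parts, is_dir):
    """Match an anchored pattern against path segments (simplified model)."""
    if pattern.endswith("/") and not is_dir:
        return False  # a trailing slash restricts the pattern to directories
    pat_parts = pattern.strip("/").split("/")
    return len(pat_parts) == len(parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(parts, pat_parts)
    )


def is_ignored(path, rules=RULES):
    """Return True if `path` would be ignored under this simplified model."""
    parts = path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = parts[:depth]
        is_dir = depth < len(parts)
        ignored = False
        for rule in rules:  # the last matching rule decides
            negated = rule.startswith("!")
            if _matches(rule.lstrip("!"), prefix, is_dir):
                ignored = not negated
        if ignored:
            return True  # an excluded parent hides everything below it
    return False


print(is_ignored("docs/d4d_examples.md"))
print(is_ignored("docs/html_output/D4D_-_AI-READI_FAIRHub_v3_linkml.html"))
print(is_ignored("docs/schema/index.html"))
# With a directory-level exclusion, the re-include has no effect:
print(is_ignored("docs/html_output/x.html", ["/docs/", "!docs/html_output/"]))
```

On a real checkout, `git check-ignore -v <path>` reports which rule decides a given path and is the authoritative check.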

docs/about.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# data-sheets-schema
+
+A LinkML schema for Datasheets for Datasets.

docs/d4d_examples.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# D4D Examples
+
+This page provides links to rendered Datasheet for Datasets (D4D) examples for Bridge2AI data generating projects.
+
+## Curated Comprehensive Datasheets
+
+These are the most comprehensive datasheets for each project, created through extensive AI-powered synthesis:
+
+### AI-READI
+- [Human Readable HTML](html_output/concatenated/curated/AI_READI_human_readable.html)
+- [LinkML Format HTML](html_output/concatenated/curated/AI_READI_linkml.html)
+- [Download YAML](yaml_output/concatenated/curated/AI_READI_curated.yaml)
+
+### CM4AI
+- [Human Readable HTML](html_output/concatenated/curated/CM4AI_human_readable.html)
+- [LinkML Format HTML](html_output/concatenated/curated/CM4AI_linkml.html)
+- [Download YAML](yaml_output/concatenated/curated/CM4AI_curated.yaml)
+
+### VOICE
+- [Human Readable HTML](html_output/concatenated/curated/VOICE_human_readable.html)
+- [LinkML Format HTML](html_output/concatenated/curated/VOICE_linkml.html)
+- [Download YAML](yaml_output/concatenated/curated/VOICE_curated.yaml)
+
+## GPT-5 Synthesized Datasheets
+
+These datasheets were automatically synthesized from multiple documents using GPT-5:
+
+### AI-READI
+- [Synthesized HTML](html_output/concatenated/AI_READI_d4d_synthesized.html)
+- [Download YAML](yaml_output/concatenated/gpt5/AI_READI_d4d.yaml)
+
+### CHORUS
+- [Synthesized HTML](html_output/concatenated/CHORUS_d4d_synthesized.html)
+- [Download YAML](yaml_output/concatenated/gpt5/CHORUS_d4d.yaml)
+
+### CM4AI
+- [Synthesized HTML](html_output/concatenated/CM4AI_d4d_synthesized.html)
+- [Download YAML](yaml_output/concatenated/gpt5/CM4AI_d4d.yaml)
+
+### VOICE
+- [Synthesized HTML](html_output/concatenated/VOICE_d4d_synthesized.html)
+- [Download YAML](yaml_output/concatenated/gpt5/VOICE_d4d.yaml)
+
+## Individual Dataset Datasheets
+
+These datasheets were created from specific dataset metadata sources:
+
+### AI-READI (FAIRHub v3)
+- [Human Readable](html_output/D4D_-_AI-READI_FAIRHub_v3_human_readable.html)
+- [LinkML Format](html_output/D4D_-_AI-READI_FAIRHub_v3_linkml.html)
+
+### CM4AI (Dataverse v3)
+- [Human Readable](html_output/D4D_-_CM4AI_Dataverse_v3_human_readable.html)
+- [LinkML Format](html_output/D4D_-_CM4AI_Dataverse_v3_linkml.html)
+
+### VOICE (PhysioNet v3)
+- [Human Readable](html_output/D4D_-_VOICE_PhysioNet_v3_human_readable.html)
+- [LinkML Format](html_output/D4D_-_VOICE_PhysioNet_v3_linkml.html)
+
+## About the Datasheets
+
+### Curated Comprehensive Datasheets
+The **Curated Comprehensive Datasheets** represent the most complete and authoritative metadata for each project, created through extensive AI-powered synthesis of multiple data sources and documentation. These files include both human-readable HTML renderings and downloadable YAML source files.
+
+### GPT-5 Synthesized Datasheets
+The **GPT-5 Synthesized Datasheets** were created by:
+1. Concatenating multiple project-related documents in reproducible order
+2. Processing with GPT-5 to extract and synthesize D4D metadata
+3. Validating against the LinkML schema
+4. Rendering to human-readable HTML format
+
+These provide automated comprehensive project-level metadata and include both HTML views and downloadable YAML files.
+
+### Individual Dataset Datasheets
+The **Individual Dataset Datasheets** provide detailed metadata for specific datasets from each project's primary data repository (FAIRHub, Dataverse, PhysioNet).
+
+## Schema Information
+
+All datasheets conform to the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) framework by Gebru et al., implemented using the [Bridge2AI LinkML schema](https://github.com/bridge2ai/data-sheets-schema).
+
+The YAML files can be validated, transformed, and processed using LinkML tools. See the [LinkML documentation](https://linkml.io/) for more information.
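
Reviewer note: step 1 of the GPT-5 workflow in `d4d_examples.md` ("concatenating multiple project-related documents in reproducible order") is not shown in this commit. The following is a minimal sketch of one way such a step could work, under stated assumptions: inputs are UTF-8 text files in a single directory tree, ordering is by sorted relative path, and a SHA-256 digest is recorded so that two runs can be compared. The function name `concatenate_documents`, the separator format, and the default suffixes are all hypothetical and are not taken from the repository.

```python
import hashlib
from pathlib import Path


def concatenate_documents(directory, suffixes=(".txt", ".md")):
    """Join text files under `directory` in a stable, sorted order.

    Returns the concatenated text and its SHA-256 hex digest.
    Hypothetical helper for illustration; not the project's actual script.
    """
    root = Path(directory)
    # Sort by POSIX-style relative path so the order does not depend on
    # filesystem enumeration order or the host operating system.
    paths = sorted(
        (p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    chunks = []
    for path in paths:
        rel = path.relative_to(root).as_posix()
        body = path.read_text(encoding="utf-8")
        # A header line marks where each source document begins.
        chunks.append(f"===== {rel} =====\n{body}\n")
    text = "".join(chunks)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return text, digest
```

Storing the digest next to the synthesized YAML would let a later run confirm that the model saw the same input, which is one reasonable reading of "reproducible order" in the list above.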

docs/html_output/D4D_-_AI-READI_FAIRHub_v3_human_readable.html

Lines changed: 917 additions & 0 deletions
Large diffs are not rendered by default.
