README.md: 196 additions & 0 deletions
@@ -23,6 +23,202 @@ We are also tracking related developments, such as augmented Datasheets for Data
Python datamodel
* [tests/](tests/) - Python tests

## D4D Metadata Generation

This repository supports two distinct approaches for generating D4D (Datasheets for Datasets) metadata from dataset documentation:

### Approach 1: Automated LLM API Agents 🤖

**Use when**: You need to batch-process many files automatically with minimal human intervention.

Automated scripts that use LLM APIs (OpenAI/Anthropic) to extract D4D metadata from dataset documentation. These agents run autonomously and can process hundreds of files in batch mode.
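The batch mode described above can be pictured as a loop over input files. The following is a minimal sketch under stated assumptions, not the repository's actual script: `batch_extract`, `extract_fn`, the `.txt` input suffix, and the `_d4d.yaml` output suffix are all hypothetical names chosen for illustration. The LLM call is left as a caller-supplied function, so the example runs without network access or API keys, and a failure on one file is logged and skipped rather than stopping the whole batch.

```python
from pathlib import Path
from typing import Callable, List


def batch_extract(
    input_dir: Path,
    output_dir: Path,
    extract_fn: Callable[[str], str],
    suffix: str = ".txt",
) -> List[Path]:
    """Run extract_fn over every matching file and write one output per input.

    extract_fn is a hypothetical stand-in for an LLM API call: it receives the
    documentation text and returns D4D metadata as a YAML string.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Sort so that repeated runs process files in a stable order.
    for src in sorted(input_dir.glob(f"*{suffix}")):
        text = src.read_text(encoding="utf-8")
        try:
            metadata = extract_fn(text)
        except Exception as err:
            # Keep going so one bad file does not stop the batch.
            print(f"skipped {src.name}: {err}")
            continue
        dest = output_dir / f"{src.stem}_d4d.yaml"
        dest.write_text(metadata, encoding="utf-8")
        written.append(dest)
    return written
```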
Simpler version without validation steps, suitable for clean input data.

**Requirements for API Agents**:
- Set the `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` environment variable
- Wrappers use GPT-5 by default (configurable)
- Files organized in column directories
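One way to check the API-key requirement before starting a batch run is sketched below. `available_providers` and `KEY_VARS` are hypothetical names, not part of this repository; only the two environment variable names come from the list above. The helper reports every provider whose key is set rather than choosing one, since the wrappers' actual precedence is not stated here. Passing a mapping instead of reading `os.environ` directly keeps the check easy to test.

```python
import os
from typing import List, Mapping, Optional

# Environment variable names taken from the requirements list above.
KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is set to a non-empty value.

    Raises RuntimeError when neither key is present, so a batch job fails
    before any file is processed.
    """
    env = os.environ if env is None else env
    found = [name for name, var in KEY_VARS.items() if env.get(var, "").strip()]
    if not found:
        raise RuntimeError(
            "Set ANTHROPIC_API_KEY or OPENAI_API_KEY before running the API agents."
        )
    return found
```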

---

### Approach 2: Interactive Coding Agents 👨‍💻

**Use when**: You need human oversight, domain expertise, or customized metadata extraction.

Use coding assistants like **Claude Code**, **GitHub Copilot**, or **Cursor** to generate D4D metadata interactively. This approach provides human-in-the-loop quality control and domain-specific reasoning.

#### 2.1 Using Claude Code (Recommended)

**Step 1**: Provide the schema and dataset documentation to Claude Code

```
Please generate D4D (Datasheets for Datasets) metadata for the dataset at: