Convert HBCD (HEALthy Brain and Child Development) LORIS data dictionaries to ReproSchema with an automated, change‑gated GitHub Actions pipeline.
- Converts LORIS CSVs into a three‑layer ReproSchema structure (Protocol → Activities → Items).
- Detects significant changes between CSV versions and only converts when thresholds are met.
- Validates generated schemas and publishes tagged releases after PRs are merged.
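The change gate can be pictured as a row-level diff of two CSV versions followed by a threshold check. The sketch below is a minimal, hypothetical version. The key column `field_name`, the threshold names, and their values are assumptions for illustration. They are not the actual rules in `config/change_detection.yml`.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical thresholds; the real ones come from config/change_detection.yml.
DEFAULT_THRESHOLDS = {"min_added": 1, "min_removed": 1, "min_modified": 5}


@dataclass
class ChangeSummary:
    added: int
    removed: int
    modified: int


def index_rows(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Map each row's key value (e.g. a field name) to the full row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def summarize_changes(old_csv: str, new_csv: str, key: str = "field_name") -> ChangeSummary:
    """Count rows added, removed, and modified between two CSV versions."""
    old, new = index_rows(old_csv, key), index_rows(new_csv, key)
    shared = old.keys() & new.keys()
    # A shared row counts as modified if any column value differs.
    modified = sum(1 for k in shared if old[k] != new[k])
    return ChangeSummary(
        added=len(new.keys() - old.keys()),
        removed=len(old.keys() - new.keys()),
        modified=modified,
    )


def is_significant(summary: ChangeSummary, thresholds: dict[str, int] = DEFAULT_THRESHOLDS) -> bool:
    """Convert only if at least one (hypothetical) threshold is met."""
    return (
        summary.added >= thresholds["min_added"]
        or summary.removed >= thresholds["min_removed"]
        or summary.modified >= thresholds["min_modified"]
    )
```

With these example thresholds, a single relabelled field does not trigger conversion. Any added or removed field does.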
- Configure secrets: add `HBCD_USERNAME` and `HBCD_PASSWORD` under Repository → Settings → Secrets and variables → Actions.
- Run “Retrieve HBCD Data Dictionary”, either manually or on its weekly schedule. It:
  - Downloads the latest CSV, compares it with the previous version, and decides whether conversion is needed.
  - If the changes are significant, commits the CSV and dispatches “Automated LORIS to ReproSchema Update”.
- “Automated LORIS to ReproSchema Update” creates a PR with the generated schemas, logs, and a report.
- Merge the PR. After the merge, a tag `vYYYY.MM.DD(.N)` is created and a GitHub Release is published.
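The `vYYYY.MM.DD(.N)` scheme implies a rule for several releases on the same day. The minimal sketch below assumes the first release of a day gets the bare date tag and later ones append `.1`, `.2`, and so on. The release workflow's actual numbering may differ.

```python
import datetime as dt
import re


def next_release_tag(existing_tags: list[str], today: dt.date) -> str:
    """Return vYYYY.MM.DD, or vYYYY.MM.DD.N if that date is already tagged.

    Assumption: the first release of a day is the bare date tag and later
    ones use .1, .2, ...; the actual release workflow may number differently.
    """
    base = f"v{today:%Y.%m.%d}"
    same_day = re.compile(rf"^{re.escape(base)}(?:\.(\d+))?$")
    used = []
    for tag in existing_tags:
        m = same_day.match(tag)
        if m:
            used.append(int(m.group(1) or 0))  # bare date tag counts as 0
    return base if not used else f"{base}.{max(used) + 1}"
```

For example, `next_release_tag(["v2025.10.05"], dt.date(2025, 10, 5))` returns `"v2025.10.05.1"`.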
```bash
# Env (Python 3.10)
micromamba create -n hbcd -c conda-forge python=3.10
micromamba activate hbcd
micromamba install -c conda-forge requests pandas pyyaml beautifulsoup4
pip install reproschema pre-commit

# Convert + validate
python scripts/loris2reproschema.py \
  --csv_file loris_data_dictionaries/hbcd_data_dictionary_YYYY-MM-DD.csv \
  --config_file config/conversion.yml \
  --output_path reproschema_output
reproschema validate reproschema_output/HBCD_LORIS/HBCD_LORIS_schema
```

Repository layout:

```text
config/                   # YAML configs (conversion, thresholds, pipeline)
reproschema_output/       # Generated protocol + activities
loris_data_dictionaries/  # Downloaded LORIS CSVs
scripts/                  # Conversion + automation tools
.github/workflows/        # CI workflows (retrieve, automated update)
logs/, reports/, docs/    # Artifacts, summaries, and comparison data
```
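The conversion command needs a concrete date in place of `YYYY-MM-DD`, and a comparison needs the previous CSV. Both can be found by ordering the downloaded files by the date in their names. This is a hypothetical helper, not one of the repository's scripts. It assumes every file follows the `hbcd_data_dictionary_YYYY-MM-DD.csv` pattern shown above.

```python
import datetime as dt
import re
from typing import Iterable, Optional, Tuple

# Matches the dated filename used in the conversion example.
CSV_NAME = re.compile(r"^hbcd_data_dictionary_(\d{4}-\d{2}-\d{2})\.csv$")


def previous_and_latest(names: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous, latest) dictionary filenames, ordered by embedded date."""
    dated = []
    for name in names:
        m = CSV_NAME.match(name)
        if not m:
            continue  # ignore files that do not follow the naming pattern
        try:
            dated.append((dt.date.fromisoformat(m.group(1)), name))
        except ValueError:
            continue  # skip impossible dates such as 2025-13-01
    dated.sort()
    latest = dated[-1][1] if dated else None
    previous = dated[-2][1] if len(dated) >= 2 else None
    return previous, latest
```

One way to call it is `previous_and_latest(p.name for p in Path("loris_data_dictionaries").glob("*.csv"))`, using `pathlib.Path`. Parsing the date, rather than sorting the raw strings, also lets malformed names be skipped instead of silently winning the sort.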
- `config/conversion.yml`: column mappings, type overrides, and protocol settings.
- `config/change_detection.yml`: thresholds and column rules (aligned to the CSV headers).
- `config/pipeline.yml`: output paths, validation options, and comparison settings.
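The conversion settings can be illustrated with plain dictionaries. This minimal sketch assumes the config has already been parsed (for example with PyYAML's `yaml.safe_load`) into the hypothetical shape below. The LORIS headers, key names, and override are invented for illustration. The sketch shows column mappings and a type override feeding the Protocol → Activities → Items grouping. It is not the actual logic or output of `scripts/loris2reproschema.py`.

```python
from collections import defaultdict

# Hypothetical parsed config/conversion.yml; real keys and headers may differ.
CONFIG = {
    "column_mappings": {  # LORIS CSV header -> internal field name
        "Field Name": "item",
        "Instrument": "activity",
        "Question": "label",
        "Type": "input_type",
    },
    "type_overrides": {"dob": "date"},  # item name -> forced input type
    "protocol": {"name": "HBCD_LORIS"},
}


def apply_mappings(row: dict[str, str], config: dict) -> dict[str, str]:
    """Rename mapped columns (dropping unmapped ones) and apply type overrides."""
    mapping = config["column_mappings"]
    mapped = {mapping[col]: value for col, value in row.items() if col in mapping}
    override = config["type_overrides"].get(mapped.get("item"))
    if override is not None:
        mapped["input_type"] = override
    return mapped


def build_layers(rows: list[dict[str, str]], config: dict) -> dict:
    """Group mapped rows into Protocol -> Activities -> Items."""
    activities = defaultdict(list)
    for row in rows:
        mapped = apply_mappings(row, config)
        activities[mapped["activity"]].append(mapped)
    # dict() keeps the first-seen order of activities.
    return {"protocol": config["protocol"]["name"], "activities": dict(activities)}
```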
- Orchestrator: `./.github/workflows/automated_update.yml` delegates to two reusable workflows.
  - `reusable-update.yml`: checks out the code, runs conversion and validation, opens an auto PR if there are changes, and posts a one-line schema diff summary (main → HEAD). The `pr-schema-diff` JSON is attached as an artifact.
- Release on merge: `release_on_auto_pr_merge.yml` triggers when the auto-update PR is merged. It then creates a tag (`vYYYY.MM.DD(.N)`), creates a GitHub Release, and publishes comparison JSON for recent tags into `docs/data/`.
- Composite actions live under `./.github/actions/` for easy reuse:
  - `setup-python-deps`: sets up Python 3.10 and installs pinned dependencies.
  - `pr-schema-diff`: generates a JSON diff and emits a concise summary for PRs.
  - `publish-comparisons`: generates and commits website comparison data on main.
- On‑demand comparisons: compare any pair of refs without regenerating everything.
  - Action: “Generate On‑Demand Comparison” (`workflow_dispatch`). Inputs: `from_ref`, `to_ref`, and `publish` (`false` by default).
  - The JSON is always uploaded as an artifact. When `publish=true`, the workflow also commits `docs/data/<from>_to_<to>.json` so the website can show it.
  - CLI example: `gh workflow run compare_on_demand.yml -f from_ref=v2025.09.15 -f to_ref=v2025.10.05 -f publish=true --ref main`
- Diff website: `docs/index.html` lists tag versions dynamically and includes an On‑Demand panel.
  - Versions are tags only. They are loaded from `docs/data/index.json` (published on main), with a GitHub API fallback.
  - Enter refs to preview a published pair. If the pair is not found, run the on‑demand Action with `publish=true` to publish `docs/data/<from>_to_<to>.json`.
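The two website rules above can be mirrored in Python as a sketch. Only tags are listed as versions, and a pair can be previewed only once its comparison file is published. This is not the code in `docs/index.html`. The `published` set is a hypothetical stand-in for whatever `docs/data/index.json` actually records.

```python
import re
from typing import Iterable, Optional

# vYYYY.MM.DD with an optional .N suffix for same-day releases.
TAG = re.compile(r"^v(\d{4})\.(\d{2})\.(\d{2})(?:\.(\d+))?$")


def tag_key(tag: str) -> tuple:
    """Numeric sort key; a bare date tag sorts before its .N siblings.

    Only call this on refs that already match TAG.
    """
    year, month, day, n = TAG.match(tag).groups()
    return int(year), int(month), int(day), int(n or 0)


def release_versions(refs: Iterable[str]) -> list[str]:
    """Keep tags only (branches such as main are dropped), oldest first."""
    return sorted((r for r in refs if TAG.match(r)), key=tag_key)


def published_comparison(from_ref: str, to_ref: str, published: set[str]) -> Optional[str]:
    """Return the docs/data path for a pair, or None if it still needs publishing."""
    path = f"docs/data/{from_ref}_to_{to_ref}.json"
    return path if path in published else None
```

Sorting numerically matters once a day has ten or more releases. As plain strings, `v2025.10.05.10` would sort before `v2025.10.05.2`.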
- Pre-commit: `pre-commit install && pre-commit run --all-files` (Black, YAML/JSON checks, optional validation hook).
- Validate schemas: `reproschema validate reproschema_output/HBCD_LORIS/HBCD_LORIS_schema`.
- Data quality checks (non-failing): `python scripts/check_data_quality.py`.
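“Non-failing” means the checks report problems without breaking the run. The minimal sketch below shows that pattern. Its column names and checks (duplicate field names, empty labels) are hypothetical, and it is not the content of `scripts/check_data_quality.py`.

```python
import csv
import io
import sys
from collections import Counter


def quality_warnings(csv_text: str, key: str = "field_name", label: str = "label") -> list[str]:
    """Collect data-quality warnings; the column names are hypothetical."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    warnings = []
    counts = Counter(row[key] for row in rows)
    for name, n in sorted(counts.items()):
        if n > 1:
            warnings.append(f"duplicate {key}: {name} ({n} rows)")
    for i, row in enumerate(rows, start=1):  # 1-based data row number
        if not (row.get(label) or "").strip():
            warnings.append(f"row {i}: empty {label} for {row[key]}")
    return warnings


def main(csv_text: str) -> int:
    """Print warnings to stderr and always return exit code 0."""
    for warning in quality_warnings(csv_text):
        print(f"WARNING: {warning}", file=sys.stderr)
    # Non-failing by design: problems are reported, never raised.
    return 0
```

A CLI wrapper would read the CSV, call `main`, and pass its return value to `sys.exit`, so these warnings never fail CI.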
- Tags/releases are created only after the auto PR is merged.
- Comparisons are generated as artifacts; they do not block CI.
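The comparison JSON and the one-line PR summary from `pr-schema-diff` both come down to a diff between two schema versions. The minimal sketch below assumes each version has already been reduced to a hypothetical `{activity: set of item names}` map. The real action's JSON format and summary wording may differ.

```python
import json


def schema_diff(base: dict[str, set[str]], head: dict[str, set[str]]) -> dict[str, list[str]]:
    """Diff two {activity: item names} maps, e.g. main vs. HEAD."""
    shared = base.keys() & head.keys()
    return {
        "activities_added": sorted(head.keys() - base.keys()),
        "activities_removed": sorted(base.keys() - head.keys()),
        # Items are compared only within activities present on both sides.
        "items_added": sorted(f"{a}/{i}" for a in shared for i in head[a] - base[a]),
        "items_removed": sorted(f"{a}/{i}" for a in shared for i in base[a] - head[a]),
    }


def one_line_summary(diff: dict[str, list[str]]) -> str:
    """Condense the diff into a single line suitable for a PR comment."""
    return (
        f"activities +{len(diff['activities_added'])}/-{len(diff['activities_removed'])}, "
        f"items +{len(diff['items_added'])}/-{len(diff['items_removed'])} (main → HEAD)"
    )


def diff_artifact(diff: dict[str, list[str]]) -> str:
    """Serialize the diff as JSON, e.g. for upload as a CI artifact."""
    return json.dumps(diff, indent=2, sort_keys=True)
```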
MIT — see LICENSE.