hbcd-loris2reproschema

Convert HBCD (HEALthy Brain and Child Development) LORIS data dictionaries to ReproSchema with an automated, change‑gated GitHub Actions pipeline.

What This Repo Does

Converts LORIS CSVs into a three‑layer ReproSchema structure (Protocol → Activities → Items).
Detects significant changes between CSV versions and only converts when thresholds are met.
Validates generated schemas and publishes tagged releases after PRs are merged.
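
The three-layer layout can be pictured with a minimal sketch that groups flat data-dictionary rows into one protocol, its activities, and their items. The column names `instrument` and `name` are hypothetical stand-ins for whatever config/conversion.yml maps, and the output is a plain dict rather than real ReproSchema JSON-LD.

```python
from collections import defaultdict


def build_three_layers(rows, protocol_name="HBCD_LORIS"):
    """Group flat rows into protocol -> activities -> items (illustrative only)."""
    activities = defaultdict(list)
    for row in rows:
        # "instrument" and "name" are assumed column names, not the real CSV headers.
        activities[row["instrument"]].append(row["name"])
    return {
        "protocol": protocol_name,
        "activities": [
            {"activity": activity, "items": items}
            for activity, items in activities.items()
        ],
    }


# Made-up rows for illustration; not taken from the HBCD dictionary.
rows = [
    {"instrument": "demographics", "name": "age"},
    {"instrument": "demographics", "name": "sex"},
    {"instrument": "sleep", "name": "bedtime"},
]
layers = build_three_layers(rows)
```

Here `layers` holds one protocol with two activities, each listing its items in CSV order.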

Recommended: Online Workflow (GitHub Actions)

Configure secrets: add HBCD_USERNAME and HBCD_PASSWORD (Repository → Settings → Secrets and variables → Actions).
Run the “Retrieve HBCD Data Dictionary” workflow (manually, or let it run on its weekly schedule). It:
- Downloads latest CSV → compares with previous → decides if conversion is needed.
- If significant: commits the CSV and dispatches “Automated LORIS to ReproSchema Update”.
The dispatched “Automated LORIS to ReproSchema Update” workflow opens a PR containing the generated schemas, logs, and a report.
Merge the PR. After merge, a tag vYYYY.MM.DD(.N) is created and a GitHub Release is published.
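
The tag pattern vYYYY.MM.DD(.N) can be illustrated with a short sketch that picks the next free tag for a given day. The numbering rule is an assumption (the first release of a day gets no suffix, later ones get .1, .2, and so on), and `next_release_tag` is a hypothetical helper, not the code in the release workflow.

```python
from datetime import date


def next_release_tag(existing_tags, today):
    """Return the next unused tag for `today` in the form vYYYY.MM.DD(.N)."""
    base = f"v{today:%Y.%m.%d}"
    used = []
    for tag in existing_tags:
        if tag == base:
            used.append(0)
        elif tag.startswith(base + "."):
            suffix = tag[len(base) + 1:]
            if suffix.isdigit():
                used.append(int(suffix))
    if not used:
        return base
    # Assumed convention: bump the highest suffix seen for this date.
    return f"{base}.{max(used) + 1}"


first = next_release_tag(["v2025.09.15"], date(2025, 10, 5))
second = next_release_tag(["v2025.09.15", "v2025.10.05"], date(2025, 10, 5))
```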

Optional: Local Use (Developers)

# Create the environment (Python 3.10)
micromamba create -n hbcd python=3.10
micromamba activate hbcd
micromamba install -c conda-forge requests pandas pyyaml beautifulsoup4
pip install reproschema pre-commit

# Convert + validate
python scripts/loris2reproschema.py \
  --csv_file loris_data_dictionaries/hbcd_data_dictionary_YYYY-MM-DD.csv \
  --config_file config/conversion.yml \
  --output_path reproschema_output
reproschema validate reproschema_output/HBCD_LORIS/HBCD_LORIS_schema
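
To fill in the YYYY-MM-DD placeholder above, one option is to pick the newest dated CSV by filename. This is a minimal sketch that assumes every dictionary follows the hbcd_data_dictionary_YYYY-MM-DD.csv pattern shown in the command; `latest_dictionary` is a hypothetical helper, not a script in this repository.

```python
import re
from datetime import date

CSV_PATTERN = re.compile(r"^hbcd_data_dictionary_(\d{4})-(\d{2})-(\d{2})\.csv$")


def latest_dictionary(filenames):
    """Return the filename with the most recent date, or None if none match."""
    dated = []
    for name in filenames:
        match = CSV_PATTERN.match(name)
        if not match:
            continue
        try:
            stamp = date(*map(int, match.groups()))
        except ValueError:
            # Skip names that look right but carry an impossible date.
            continue
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


newest = latest_dictionary(
    [
        "hbcd_data_dictionary_2025-09-15.csv",
        "hbcd_data_dictionary_2025-10-05.csv",
        "notes.txt",
    ]
)
```

In practice the list could come from `os.listdir("loris_data_dictionaries")`.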

Repository Structure

config/                 # YAML configs (conversion, thresholds, pipeline)
reproschema_output/     # Generated protocol + activities
loris_data_dictionaries/ # Downloaded LORIS CSVs
scripts/                # Conversion + automation tools
.github/workflows/      # CI workflows (retrieve, automated update)
logs/, reports/, docs/  # Artifacts, summaries, and comparison data

Configuration

config/conversion.yml: column mappings, type overrides, and protocol settings.
config/change_detection.yml: thresholds and column rules (aligned to CSV headers).
config/pipeline.yml: output paths, validation options, and comparison settings.
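
The idea behind threshold-based change detection can be sketched as follows. The key column `name`, the threshold keys, and their values are all assumptions made for illustration; the real rules live in config/change_detection.yml and may differ.

```python
import csv
import io


def summarize_changes(old_csv, new_csv, key="name"):
    """Count rows added, removed, and modified between two CSV texts."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    modified = {k for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": len(added), "removed": len(removed), "modified": len(modified)}


def is_significant(summary, thresholds):
    """True when any count reaches its (hypothetical) threshold."""
    return any(summary[kind] >= limit for kind, limit in thresholds.items())


# Made-up sample data; the real dictionaries have different headers.
OLD = "name,description\nage,Age in months\nsex,Sex at birth\n"
NEW = "name,description\nage,Age in weeks\nsex,Sex at birth\nbedtime,Usual bedtime\n"

summary = summarize_changes(OLD, NEW)
convert = is_significant(summary, {"added": 1, "removed": 1, "modified": 5})
```

With these sample inputs one row is added and one is modified, so the assumed `added` threshold of 1 triggers a conversion.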

CI Overview

Orchestrator: ./.github/workflows/automated_update.yml delegates to two reusable workflows.
- reusable-update.yml: checks out code, runs conversion + validation, opens an auto PR if changes exist, and posts a one‑line schema diff summary (main → HEAD). Attaches pr-schema-diff JSON as an artifact.
- Release on merge: release_on_auto_pr_merge.yml triggers when the auto-update PR is merged, then tags (vYYYY.MM.DD(.N)), creates a GitHub Release, and publishes comparison JSON for recent tags into docs/data/.
Composite actions: live under ./.github/actions/ for easy reuse.
- setup-python-deps: sets up Python 3.10 and installs pinned deps.
- pr-schema-diff: generates a JSON diff and emits a concise summary for PRs.
- publish-comparisons: generates/commits website comparison data on main.
On‑demand comparisons: run any pair without regenerating everything.
- Action: “Generate On‑Demand Comparison” (workflow_dispatch). Inputs: from_ref, to_ref, publish (false by default).
  - Always uploads the JSON as an artifact; when publish=true, it commits docs/data/<from>_to_<to>.json so the website can show it.
- CLI example: gh workflow run compare_on_demand.yml -f from_ref=v2025.09.15 -f to_ref=v2025.10.05 -f publish=true --ref main.
Diff website: docs/index.html lists tag versions dynamically and includes an On‑Demand panel.
- Versions are tags only, loaded from docs/data/index.json (published on main) with a GitHub API fallback.
- Enter refs to preview a published pair; if not found, run the on‑demand Action with publish=true to publish docs/data/<from>_to_<to>.json.
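
The published-pair lookup described above can be sketched in a few lines. The naming scheme docs/data/<from>_to_<to>.json comes from this README; the helper names and the shape of the file listing are assumptions, and the website itself may resolve pairs differently.

```python
from pathlib import PurePosixPath


def comparison_path(from_ref, to_ref, data_dir="docs/data"):
    """Build the path where a published comparison for this pair would live."""
    return PurePosixPath(data_dir) / f"{from_ref}_to_{to_ref}.json"


def find_published(from_ref, to_ref, published_files):
    """Return the comparison filename if already published, else None."""
    target = comparison_path(from_ref, to_ref).name
    return target if target in published_files else None


path = comparison_path("v2025.09.15", "v2025.10.05")
hit = find_published(
    "v2025.09.15", "v2025.10.05", {"v2025.09.15_to_v2025.10.05.json"}
)
miss = find_published("v2025.09.15", "v2025.10.05", set())
```

A `None` result corresponds to the case where the on-demand Action needs to be run with publish=true.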

Development & Quality

Pre-commit: pre-commit install && pre-commit run --all-files (Black, YAML/JSON checks, optional validation hook).
Validate schemas: reproschema validate reproschema_output/HBCD_LORIS/HBCD_LORIS_schema.
Data quality checks (non-failing): python scripts/check_data_quality.py.

Notes

Tags/releases are created only after the auto PR is merged.
Comparisons are generated as artifacts; they do not block CI.

License

MIT — see LICENSE.
