# KG-Alzheimers
[Documentation](https://Knowledge-Graph-Hub.github.io/kg-alzheimers/)

KG-Alzheimers builds and distributes an Alzheimer’s disease-focused biomedical
knowledge graph by harmonizing Monarch Initiative and partner data sources into
Biolink Model-compliant KGX exports, DuckDB/SQLite databases, JSONL feeds, and
search indexes.

## Highlights

- Integrates dozens of genomics, phenomics, pathway, and literature resources
  behind a repeatable ETL pipeline.
- Ships a Typer-powered CLI (`ingest`) that orchestrates download, transform,
  merge, QC, export, and packaging steps.
- Produces denormalized TSV bundles, RDF/JSONL snapshots, and Solr-ready
  indexes suitable for analytics or downstream applications.
- Uses Poetry for dependency management and reproducible environments; CI/CD via
  Jenkins keeps public releases up to date.

## Getting Started

> Requires Python 3.10+ and [Poetry](https://python-poetry.org/).

```bash
git clone https://github.com/Knowledge-Graph-Hub/kg-alzheimers.git
cd kg-alzheimers
poetry install

# Optional: activate the virtual environment created by Poetry.
# (On Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin;
# `eval "$(poetry env activate)"` is the built-in alternative.)
poetry shell
```

To download all referenced datasets and run the full build pipeline locally:

```bash
# Retrieve source data declared in src/kg_alzheimers/download.yaml
poetry run ingest download --all --write-metadata

# Run Phenio preprocessing plus all Koza ingests
poetry run ingest transform --all --log --rdf --write-metadata

# Merge transformed outputs into a unified KG bundle with QC checks
poetry run ingest merge

# (Optional) Generate closure-enriched denormalized tables
poetry run ingest closure

# Prepare release artifacts (gzipped DuckDB/TSV/JSONL bundles)
poetry run ingest prepare-release
```
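
If you run the whole build regularly, the steps above can be chained in a small Bash function that stops at the first failing step. This is a sketch under stated assumptions, not something shipped with the repository: the function name `build_kg` and the `SKIP_CLOSURE` switch are illustrative, and only the `ingest` subcommands and flags are taken from this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with kg-alzheimers): chains the documented
# `ingest` steps in order and stops at the first step that fails.
set -euo pipefail

build_kg() {
  # Each entry is an `ingest` subcommand plus its flags, as listed in the README.
  local -a steps=(
    "download --all --write-metadata"
    "transform --all --log --rdf --write-metadata"
    "merge"
  )
  # Closure enrichment is optional; set SKIP_CLOSURE=1 to leave it out.
  if [[ "${SKIP_CLOSURE:-0}" != "1" ]]; then
    steps+=("closure")
  fi
  steps+=("prepare-release")

  local step
  for step in "${steps[@]}"; do
    echo ">>> ingest ${step}" >&2
    # $step is left unquoted on purpose so the subcommand and flags split into words.
    # shellcheck disable=SC2086
    poetry run ingest ${step} || {
      echo "build stopped: 'ingest ${step}' failed" >&2
      return 1
    }
  done
}
```

To try it, save the function in a script of your choosing, append a `build_kg` call, and run the script with `bash` from the repository root; prefixing the command with `SKIP_CLOSURE=1` would leave out the optional closure step.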

Additional commands are documented in [`docs/CLI.md`](docs/CLI.md) and include:

- `poetry run ingest export`: create filtered TSV/JSONL dumps defined in
  `src/kg_alzheimers/data-dump-config.yaml`.
- `poetry run ingest report`: run DuckDB QC SQL scripts to audit the merge.
- `poetry run ingest sqlite` / `poetry run ingest solr`: load artifacts into
  local SQLite or Solr instances for exploration.
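
For a quick sanity check on a TSV export, a per-predicate edge count can help spot an ingest that produced unexpectedly few edges. The helper below is hypothetical: `count_predicates` is not part of the CLI, and it assumes a KGX-style edges file whose header row names a `predicate` column. It finds that column by name and reads standard input when no file is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper: counts edges per predicate in a KGX-style edges TSV.
# The `predicate` column is located by header name, so column order does not matter.
set -euo pipefail

count_predicates() {
  # Read the file named in $1, or standard input when no argument is given.
  # Output: one "predicate<TAB>count" line per predicate, most frequent first.
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "predicate") col = i
      if (!col) { print "no predicate column in header" > "/dev/stderr"; exit 1 }
      next
    }
    { counts[$col]++ }
    END { for (p in counts) printf "%s\t%d\n", p, counts[p] }
  ' "${1:--}" | sort -t $'\t' -k2,2nr -k1,1
}
```

For example, `gzip -dc path/to/edges.tsv.gz | count_predicates | head` would list the most frequent predicates first; the path is a placeholder for whichever edges file your export produced.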

> **Note:** The `ingest release` command is deprecated; releases are created by
> the Jenkins pipeline described in `Jenkinsfile`.

## Documentation

- Detailed guides, modeling principles, and ingest walkthroughs live at
  [Knowledge-Graph-Hub.github.io/kg-alzheimers](https://Knowledge-Graph-Hub.github.io/kg-alzheimers/).
- Source for the documentation site resides in the [`docs/`](docs) directory
  and is built with MkDocs Material (`mkdocs.yaml`).

## Development

- Run the unit test suite:

  ```bash
  poetry run pytest
  ```

- Format and lint using the configured tooling (Black and Ruff):

  ```bash
  poetry run black src tests
  poetry run ruff check src tests
  ```
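
Before opening a pull request, it can be convenient to run all three checks and see every failure at once instead of stopping at the first. The sketch below assumes the same tools are installed in the Poetry environment; `run_checks` is a hypothetical helper, and it uses Black's `--check` mode so that files are reported rather than rewritten.

```bash
#!/usr/bin/env bash
# Hypothetical pre-push helper: runs the documented test, format, and lint
# commands, keeps going after a failure, and reports every check that failed.
set -euo pipefail

run_checks() {
  local -a failed=()
  # `black --check` reports files that would be reformatted without rewriting them.
  poetry run black --check src tests || failed+=("black")
  poetry run ruff check src tests || failed+=("ruff")
  poetry run pytest || failed+=("pytest")

  if ((${#failed[@]} > 0)); then
    echo "failed checks: ${failed[*]}" >&2
    return 1
  fi
  echo "all checks passed" >&2
}
```

If only Ruff reports problems, for instance, `run_checks` prints `failed checks: ruff` to standard error and returns a non-zero status, which makes it usable from a Git pre-push hook.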

Issues and contributions are welcome via GitHub. To propose a new ingest, follow
the workflow documented under `docs/Create-an-Ingest/`.