# NHP Model - Copilot Coding Agent Instructions

## Repository Overview

This is the **New Hospital Programme (NHP) Demand Model**, a Python package for healthcare activity prediction. It models inpatient, outpatient, and A&E (Accident & Emergency) activity, is built as a Python library using modern packaging tools, and is deployed as both a Python package and a Docker container to Azure.

**Key Facts:**
- **Project Type:** Python package/library with Docker containerization
- **Python Version:** Requires Python 3.11 or higher (specified in `pyproject.toml`)
- **Package Manager:** `uv` (modern Python package manager from Astral)
- **Build System:** setuptools with setuptools-scm for versioning
- **Primary Language:** Python
- **Project Size:** Medium
- **Main Modules:** `nhp.model` (core model code), `nhp.docker` (Docker runtime)

## Environment Setup and Build Instructions

### Initial Setup

**ALWAYS start by installing uv and project dependencies:**

```bash
# Install uv using the recommended approach from Astral
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project dependencies (production only)
uv sync

# Install with dev dependencies for development/testing (RECOMMENDED for development)
uv sync --extra dev

# Install with docs dependencies for documentation
uv sync --extra docs

# Install multiple extras at once
uv sync --extra dev --extra docs
```

**Important:** The `uv sync` command only installs production dependencies. For development work (linting, testing), use `uv sync --extra dev` to install the dev dependencies.

**Python Version:** The project requires Python 3.11+. The CI uses Python 3.11 specifically via `uv python install` in workflows.

### Build Commands

**To build the package:**

```bash
# Standard build - creates wheel and source distribution
uv build

# Build for development (sets version to 0.dev0)
SETUPTOOLS_SCM_PRETEND_VERSION=0.dev0 uv build
```

The build creates:
- `dist/nhp_model-<version>-py3-none-any.whl`
- `dist/nhp_model-<version>.tar.gz`

**Note:** The Dockerfile includes a TODO comment about forcing version numbers during Docker builds. Currently it uses `ENV SETUPTOOLS_SCM_PRETEND_VERSION=v0.0.0` as a workaround.

### Testing

**Unit Tests (ALWAYS run these before committing):**

```bash
# Run all unit tests
uv run pytest tests/unit --verbose

# Run unit tests with coverage report
uv run pytest --cov=. tests/unit --ignore=tests --cov-branch --cov-report xml:coverage.xml
```

**Integration Tests:**

```bash
# Integration tests require test data in a specific format
# These are located in tests/integration/ but may require data setup
uv run pytest tests/integration --verbose
```

**All unit tests must pass. Test failures are NOT acceptable.**

### Linting and Formatting

**ALWAYS run linting before committing. All linting checks MUST pass:**

```bash
# Run ruff linting check
uvx ruff check .

# Run ruff format check (no auto-formatting)
uvx ruff format --check .

# Auto-format code (if needed)
uvx ruff format .

# Run type checking with ty
uvx ty check .
```

**Linting Configuration:**
- Ruff config is in `pyproject.toml` under `[tool.ruff]`
- Line length: 100 characters
- Target Python version: 3.11
- Excludes: `notebooks/` directory
- Key rules: pydocstyle (D), pycodestyle (E/W), isort (I), pylint (PL), pandas-vet (PD), numpy (NPY), ruff-specific (RUF)
- Docstring convention: Google style
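As an illustrative sketch only (check `pyproject.toml` for the authoritative values — the exact rule selection and layout may differ), a configuration matching the settings above might look like:

```toml
[tool.ruff]
line-length = 100
target-version = "py311"
exclude = ["notebooks"]

[tool.ruff.lint]
select = ["D", "E", "W", "I", "PL", "PD", "NPY", "RUF"]

[tool.ruff.lint.pydocstyle]
convention = "google"
```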

**The notebooks directory is excluded from linting and should not be linted.**

### Documentation

```bash
# Build documentation (requires docs dependencies)
uv run mkdocs build --clean

# Serve documentation locally
uv run mkdocs serve
```

Documentation is deployed automatically to Connect via CI on main branch pushes.

### Running the Model

**Local execution:**

```bash
# Run with sample parameters (requires data in specified path)
uv run python -m nhp.model queue/params-sample.json -d data/synth --type all

# Run single model type
uv run python -m nhp.model queue/params-sample.json -d data --type ip   # inpatients
uv run python -m nhp.model queue/params-sample.json -d data --type op   # outpatients
uv run python -m nhp.model queue/params-sample.json -d data --type aae  # A&E

# Run specific model iteration for debugging
uv run python -m nhp.model queue/params-sample.json -d data --model-run 1 --type ip
```

**Command-line arguments:**
- `params_file`: Path to JSON parameters file (default: `queue/params-sample.json`)
- `-d, --data-path`: Path to data directory (default: `data`)
- `-r, --model-run`: Which model iteration to run (default: 1)
- `-t, --type`: Model type - `all`, `ip`, `op`, or `aae` (default: `all`)
- `--save-full-model-results`: Save complete model results
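A minimal argparse sketch of the interface described above — hypothetical code mirroring the documented flags, not the actual parser in `src/nhp/model/__main__.py`, which may differ in detail:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented CLI; flag names and defaults come from the docs above."""
    parser = argparse.ArgumentParser(prog="nhp.model")
    parser.add_argument(
        "params_file",
        nargs="?",
        default="queue/params-sample.json",
        help="path to JSON parameters file",
    )
    parser.add_argument("-d", "--data-path", default="data", help="path to data directory")
    parser.add_argument(
        "-r", "--model-run", type=int, default=1, help="which model iteration to run"
    )
    parser.add_argument(
        "-t", "--type", choices=["all", "ip", "op", "aae"], default="all", help="model type"
    )
    parser.add_argument(
        "--save-full-model-results", action="store_true", help="save complete model results"
    )
    return parser


# With no arguments, all documented defaults apply
args = build_parser().parse_args([])
```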

**Data Requirements:**
The model expects data in parquet format organized by fiscal year and dataset:
- Format: `{data_path}/{file}/fyear={year}/dataset={dataset}/`
- Required files: `ip`, `op`, `aae`, `demographic_factors`, `birth_factors`, `hsa_activity_tables`, `hsa_gams` (pickle)
- Sample data location: `data/synth/` (synthetic dataset for testing - see GitHub issue #347)
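The hive-style partition layout above can be sketched as a small path helper. This is a hypothetical illustration (the helper name and the `"synthetic"` dataset value are assumptions, not the package's actual API):

```python
from pathlib import Path


def partition_path(data_path: str, file: str, year: int, dataset: str) -> Path:
    """Build the documented {data_path}/{file}/fyear={year}/dataset={dataset}/ layout."""
    return Path(data_path) / file / f"fyear={year}" / f"dataset={dataset}"


# e.g. where inpatient parquet files for fiscal year 2019 would live
p = partition_path("data/synth", "ip", 2019, "synthetic")
```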

## Project Structure

### Directory Layout

**Core Directories:**
- `.github/workflows/` - CI/CD pipelines (linting, codecov, build, deploy)
- `src/nhp/model/` - Core model: `__main__.py`, `model.py`, `inpatients.py`, `outpatients.py`, `aae.py`, `run.py`, `results.py`, `data/`
- `src/nhp/docker/` - Docker runtime with Azure Storage integration
- `tests/unit/` - Unit tests
- `tests/integration/` - Integration tests (require data)
- `docs/` - MkDocs documentation
- `notebooks/` - Databricks notebooks (excluded from linting)
- `queue/` - Parameter files (`params-sample.json`)

**Key Configuration Files:**
- `pyproject.toml` - Project metadata, dependencies, ruff/pytest/setuptools config
- `uv.lock` - Locked dependency versions (DO NOT modify manually)
- `params-schema.json` - JSON schema for model parameters (deployed to GitHub Pages)

### Architecture Overview

**Model Hierarchy:**
- `Model` (base class in `model.py`) - Common model functionality
  - `InpatientsModel` - Inpatient demand modeling
  - `OutpatientsModel` - Outpatient demand modeling
  - `AaEModel` - A&E demand modeling

**Execution Flow:**
1. `__main__.py` parses CLI arguments and loads parameters
2. `run.py` orchestrates model execution (single or parallel runs)
3. `ModelIteration` runs a single model iteration
4. Results are aggregated and saved by `results.py`
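The steps above can be sketched with hypothetical stand-ins — the function names mirror the docs, but the real `run.py` and `ModelIteration` differ (e.g. they support parallel execution):

```python
def run_iteration(params: dict, model_run: int) -> dict:
    """Stand-in for ModelIteration: run one iteration and return its results."""
    # A real iteration would sample parameters and run the ip/op/aae models
    return {"model_run": model_run, "seed": params.get("seed", 0) + model_run}


def run_all(params: dict, n_runs: int) -> list[dict]:
    """Stand-in for the run.py orchestration: run n_runs iterations sequentially."""
    return [run_iteration(params, i) for i in range(1, n_runs + 1)]


results = run_all({"seed": 100}, n_runs=3)
```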

**Data Loading:**
- An abstract `Data` interface allows multiple data sources
- `Local` loads from local parquet files
- `DatabricksNational` loads from Databricks (used in notebooks)
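A minimal sketch of that pattern, assuming hypothetical method names and return types — the real interface in `src/nhp/model/data/` is different:

```python
from abc import ABC, abstractmethod


class Data(ABC):
    """Abstract data source; concrete subclasses decide where records come from."""

    @abstractmethod
    def get_ip(self) -> list[dict]:
        """Load inpatient records."""


class Local(Data):
    """Loads from local files, following the documented partition layout."""

    def __init__(self, data_path: str, year: int, dataset: str) -> None:
        self.data_path, self.year, self.dataset = data_path, year, dataset

    def get_ip(self) -> list[dict]:
        # A real implementation would read parquet from
        # {data_path}/ip/fyear={year}/dataset={dataset}/
        source = f"{self.data_path}/ip/fyear={self.year}/dataset={self.dataset}"
        return [{"source": source}]


data = Local("data/synth", 2019, "synthetic")
```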

## CI/CD Validation Pipeline

### Pull Request Checks

**Every pull request triggers these workflows (ALL MUST PASS):**

1. **Linting** (`.github/workflows/linting.yaml`):
   - `ruff check` - Code quality checks
   - `ruff format --check` - Code formatting verification
   - `ty check .` - Type checking

2. **Code Coverage** (`.github/workflows/codecov.yaml`):
   - Runs unit tests with coverage
   - Uploads to Codecov
   - Requires passing tests

**IMPORTANT:** All linting and test checks must pass before merge. DO NOT skip or disable these checks.

### Main Branch / Release Workflows

On push to main or tags:

1. **build_app.yaml**: Builds Python wheel, uploads to Azure Storage and GitHub releases
2. **build_schema.yaml**: Deploys `params-schema.json` to GitHub Pages
3. **build_container.yaml**: Builds and pushes Docker image to GitHub Container Registry
4. **deploy_docs.yaml**: Builds and deploys MkDocs documentation to RStudio Connect

### Docker Deployment

The model is containerized using:
- Base image: `ghcr.io/astral-sh/uv:python3.11-alpine`
- Build args: `app_version`, `data_version`, `storage_account`
- Entry point: `python -m nhp.docker`
- Tags: `dev` (PRs), `v*.*.*` (releases), `latest` (latest release)

## Common Issues and Workarounds

**Known Issues:**
1. **Dockerfile version**: Uses `ENV SETUPTOOLS_SCM_PRETEND_VERSION=v0.0.0` because setuptools-scm needs git metadata (TODO: build wheel and copy instead)
2. **Data structure**: Model expects parquet files at `{data_path}/{file}/fyear={year}/dataset={dataset}/`. Missing files cause runtime errors.
3. **Notebooks**: The `notebooks/` directory is excluded from linting - don't lint these Databricks notebooks.

**Environment Variables (Docker):**
- `APP_VERSION`, `DATA_VERSION` (default: `"dev"`)
- `STORAGE_ACCOUNT` (required for Azure), `BATCH_SIZE` (default: 16)
- A `.env` file is supported via python-dotenv for local development
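A sketch of how a runtime might read those variables, applying the documented defaults — the function and its return shape are hypothetical, not the actual `nhp.docker` code:

```python
import os


def load_config(env: dict[str, str]) -> dict:
    """Read the documented Docker environment variables, applying their defaults."""
    return {
        "app_version": env.get("APP_VERSION", "dev"),
        "data_version": env.get("DATA_VERSION", "dev"),
        "storage_account": env.get("STORAGE_ACCOUNT"),  # required for Azure; None if unset
        "batch_size": int(env.get("BATCH_SIZE", "16")),
    }


config = load_config(dict(os.environ))
```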

## Testing Strategy

- **Unit Tests**: `tests/unit/` - Mock-based, parameterized. **ALWAYS run before committing.**
- **Integration Tests**: `tests/integration/` - Require properly formatted test data, test end-to-end runs
- **Test Organization**: pytest-mock for mocking, fixtures in `tests/conftest.py`
- **Coverage**: High coverage maintained via Codecov integration
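A hypothetical example of the mock-based style, written here with stdlib `unittest.mock` (the suite itself uses pytest-mock, whose `mocker` fixture wraps the same machinery); the function under test is invented for illustration:

```python
from unittest import mock


def load_and_count(loader) -> int:
    """Toy function under test: count records returned by a data loader."""
    return len(loader.get_ip())


def test_load_and_count():
    # Replace the real data source with a mock so no files are needed
    loader = mock.Mock()
    loader.get_ip.return_value = [{"id": 1}, {"id": 2}]
    assert load_and_count(loader) == 2
    loader.get_ip.assert_called_once()


test_load_and_count()
```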

## Best Practices for Coding Agents

1. **ALWAYS install dependencies first**: Run `uv sync --extra dev` before any development work.

2. **ALWAYS run linting before committing**: Run `uvx ruff check .` and `uvx ruff format --check .` - these MUST pass.

3. **ALWAYS run unit tests**: Run `uv run pytest tests/unit` before committing - all tests MUST pass.

4. **Follow Google docstring convention**: All public functions/classes must have Google-style docstrings (enforced by ruff).

5. **Respect line length**: Maximum 100 characters per line (ruff will enforce this).

6. **Don't modify notebooks**: The `notebooks/` directory is excluded from linting for a reason. These are Databricks notebooks with special formatting.

7. **Use uv for all Python commands**: Prefix commands with `uv run` to ensure correct virtual environment usage.

8. **Don't modify uv.lock manually**: Use `uv sync` to update dependencies.

9. **Test locally before pushing**: The CI checks are strict and will fail if linting/tests don't pass.

10. **Understand the data structure**: The model requires specific data formats. If testing model execution, ensure proper test data is available or use existing test fixtures.

## Quick Reference

```bash
# Setup (production + dev dependencies)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra dev

# Lint (MUST pass)
uvx ruff check .
uvx ruff format --check .

# Test (MUST pass)
uv run pytest tests/unit --verbose

# Build
uv build

# Run model (requires data)
uv run python -m nhp.model queue/params-sample.json -d data --type all

# Build docs (requires docs extras)
uv sync --extra docs
uv run mkdocs build --clean
```

**When in doubt, check the CI workflows in `.github/workflows/` - they define the exact validation steps used in the pipeline.**