Commit 94ec4a0

Merge pull request #504 from The-Strategy-Unit/copilot/add-copilot-instructions-file

Add comprehensive Copilot instructions for repository onboarding

2 parents 1194e1f + b80ff2f

1 file changed: `.github/copilot-instructions.md` (+292, -0)
# NHP Model - Copilot Coding Agent Instructions

## Repository Overview

This is the **New Hospital Programme (NHP) Demand Model**, a Python package for predicting healthcare activity. It models demand for inpatient, outpatient, and A&E (Accident & Emergency) services. It is built as a Python library using modern packaging tools and is deployed both as a Python package and as a Docker container to Azure.

**Key Facts:**

- **Project Type:** Python package/library with Docker containerization
- **Python Version:** Requires Python 3.11 or higher (specified in `pyproject.toml`)
- **Package Manager:** `uv` (modern Python package manager from Astral)
- **Build System:** setuptools, with setuptools-scm for versioning
- **Primary Language:** Python
- **Project Size:** Medium-sized Python project
- **Main Modules:** `nhp.model` (core model code), `nhp.docker` (Docker runtime)
## Environment Setup and Build Instructions

### Initial Setup

**ALWAYS start by installing uv and project dependencies:**

```bash
# Install uv using the recommended approach from Astral
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install project dependencies (production only)
uv sync

# Install with dev dependencies for development/testing (RECOMMENDED for development)
uv sync --extra dev

# Install with docs dependencies for documentation
uv sync --extra docs

# Install multiple extras at once
uv sync --extra dev --extra docs
```

**Important:** Plain `uv sync` installs production dependencies only. For development work (linting, testing), use `uv sync --extra dev` to install the dev dependencies.

**Python Version:** The project requires Python 3.11+. CI pins Python 3.11 specifically via `uv python install` in the workflows.
### Build Commands

**To build the package:**

```bash
# Standard build - creates wheel and source distribution
uv build

# Build for development (sets version to 0.dev0)
SETUPTOOLS_SCM_PRETEND_VERSION=0.dev0 uv build
```

The build creates:

- `dist/nhp_model-<version>-py3-none-any.whl`
- `dist/nhp_model-<version>.tar.gz`

**Note:** The Dockerfile includes a TODO comment about forcing version numbers during Docker builds. Currently it uses `ENV SETUPTOOLS_SCM_PRETEND_VERSION=v0.0.0` as a workaround.
### Testing

**Unit Tests (ALWAYS run these before committing):**

```bash
# Run all unit tests
uv run pytest tests/unit --verbose

# Run unit tests with coverage report
uv run pytest --cov=. tests/unit --ignore=tests --cov-branch --cov-report xml:coverage.xml
```

**Integration Tests:**

```bash
# Integration tests live in tests/integration/ and require test data
# in a specific format, so they may need data setup first
uv run pytest tests/integration --verbose
```

**All unit tests must pass. Test failures are NOT acceptable.**
### Linting and Formatting

**ALWAYS run linting before committing. All linting checks MUST pass:**

```bash
# Run ruff linting check
uvx ruff check .

# Run ruff format check (no auto-formatting)
uvx ruff format --check .

# Auto-format code (if needed)
uvx ruff format .

# Run type checking with ty
uvx ty check .
```
**Linting Configuration:**

- Ruff config lives in `pyproject.toml` under `[tool.ruff]`
- Line length: 100 characters
- Target Python version: 3.11
- Excludes: `notebooks/` directory
- Key rules: pydocstyle (D), pycodestyle (E/W), isort (I), pylint (PL), pandas-vet (PD), numpy (NPY), ruff-specific (RUF)
- Docstring convention: Google style

**The `notebooks/` directory is excluded from linting and should not be linted.**
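Assembled from the settings listed above, the relevant `pyproject.toml` fragment looks roughly like this (a sketch reconstructed from the list, not a verbatim copy of the repository's file):

```toml
# Sketch of the documented ruff settings; the actual pyproject.toml may
# split or name these sections slightly differently.
[tool.ruff]
line-length = 100
target-version = "py311"
exclude = ["notebooks"]

[tool.ruff.lint]
select = ["D", "E", "W", "I", "PL", "PD", "NPY", "RUF"]

[tool.ruff.lint.pydocstyle]
convention = "google"
```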
### Documentation

```bash
# Build documentation (requires docs dependencies)
uv run mkdocs build --clean

# Serve documentation locally
uv run mkdocs serve
```

Documentation is deployed automatically to Connect via CI on pushes to main.
### Running the Model

**Local execution:**

```bash
# Run with sample parameters (requires data in the specified path)
uv run python -m nhp.model queue/params-sample.json -d data/synth --type all

# Run a single model type
uv run python -m nhp.model queue/params-sample.json -d data --type ip   # inpatients
uv run python -m nhp.model queue/params-sample.json -d data --type op   # outpatients
uv run python -m nhp.model queue/params-sample.json -d data --type aae  # A&E

# Run a specific model iteration for debugging
uv run python -m nhp.model queue/params-sample.json -d data --model-run 1 --type ip
```
**Command-line arguments:**

- `params_file`: Path to JSON parameters file (default: `queue/params-sample.json`)
- `-d, --data-path`: Path to data directory (default: `data`)
- `-r, --model-run`: Which model iteration to run (default: 1)
- `-t, --type`: Model type - `all`, `ip`, `op`, or `aae` (default: `all`)
- `--save-full-model-results`: Save complete model results
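The argument list above can be sketched as a minimal `argparse` parser. This is illustrative only: the real parser lives in `src/nhp/model/__main__.py` and may differ in detail.

```python
# Illustrative parser matching the documented CLI; the real implementation
# in src/nhp/model/__main__.py may differ in detail.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m nhp.model")
    parser.add_argument("params_file", nargs="?", default="queue/params-sample.json",
                        help="path to JSON parameters file")
    parser.add_argument("-d", "--data-path", default="data",
                        help="path to data directory")
    parser.add_argument("-r", "--model-run", type=int, default=1,
                        help="which model iteration to run")
    parser.add_argument("-t", "--type", choices=["all", "ip", "op", "aae"],
                        default="all", help="model type to run")
    parser.add_argument("--save-full-model-results", action="store_true",
                        help="save complete model results")
    return parser


args = build_parser().parse_args(["-d", "data/synth", "--type", "ip"])
print(args.data_path, args.type, args.model_run)
```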
**Data Requirements:**

The model expects data in parquet format, organized by fiscal year and dataset:

- Format: `{data_path}/{file}/fyear={year}/dataset={dataset}/`
- Required files: `ip`, `op`, `aae`, `demographic_factors`, `birth_factors`, `hsa_activity_tables`, `hsa_gams` (pickle)
- Sample data location: `data/synth/` (synthetic dataset for testing - see GitHub issue #347)
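As a concrete illustration of the partition layout above, this hypothetical helper (not part of the nhp codebase) builds the expected path:

```python
# Illustrative helper showing how the documented Hive-style partition layout
# maps to a concrete directory path; this helper is not part of the codebase.
from pathlib import Path


def partition_path(data_path: str, file: str, year: int, dataset: str) -> Path:
    """Build the partition path the model expects for one file/year/dataset."""
    return Path(data_path) / file / f"fyear={year}" / f"dataset={dataset}"


print(partition_path("data/synth", "ip", 2019, "synthetic"))
```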
## Project Structure

### Directory Layout

**Core Directories:**

- `.github/workflows/` - CI/CD pipelines (linting, codecov, build, deploy)
- `src/nhp/model/` - Core model: `__main__.py`, `model.py`, `inpatients.py`, `outpatients.py`, `aae.py`, `run.py`, `results.py`, `data/`
- `src/nhp/docker/` - Docker runtime with Azure Storage integration
- `tests/unit/` - Unit tests
- `tests/integration/` - Integration tests (require data)
- `docs/` - MkDocs documentation
- `notebooks/` - Databricks notebooks (excluded from linting)
- `queue/` - Parameter files (`params-sample.json`)

**Key Configuration Files:**

- `pyproject.toml` - Project metadata, dependencies, ruff/pytest/setuptools config
- `uv.lock` - Locked dependency versions (DO NOT modify manually)
- `params-schema.json` - JSON schema for model parameters (deployed to GitHub Pages)
### Architecture Overview

**Model Hierarchy:**

- `Model` (base class in `model.py`) - Common model functionality
  - `InpatientsModel` - Inpatient demand modeling
  - `OutpatientsModel` - Outpatient demand modeling
  - `AaEModel` - A&E demand modeling
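The hierarchy above can be sketched as follows. The method shown is illustrative only; the real classes in `src/nhp/model/` have a richer interface.

```python
# Minimal sketch of the documented class hierarchy; the real base class in
# src/nhp/model/model.py exposes a different, richer interface.
from abc import ABC, abstractmethod


class Model(ABC):
    """Common model functionality shared by all activity models (illustrative)."""

    @abstractmethod
    def activity_type(self) -> str:
        """Return the short code for this activity type."""


class InpatientsModel(Model):
    def activity_type(self) -> str:
        return "ip"


class OutpatientsModel(Model):
    def activity_type(self) -> str:
        return "op"


class AaEModel(Model):
    def activity_type(self) -> str:
        return "aae"
```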
**Execution Flow:**

1. `__main__.py` parses CLI arguments and loads parameters
2. `run.py` orchestrates model execution (single or parallel runs)
3. `ModelIteration` runs a single model iteration
4. Results are aggregated and saved by `results.py`
**Data Loading:**

- An abstract `Data` interface allows multiple data sources
- `Local` loads from local parquet files
- `DatabricksNational` loads from Databricks (used in notebooks)
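The abstract-data-source pattern described above looks roughly like this. The method name `describe` is hypothetical and not taken from the actual `nhp.model.data` API:

```python
# Sketch of the abstract data-source pattern described above; method names
# are illustrative and not taken from the actual nhp.model.data API.
from abc import ABC, abstractmethod
from pathlib import Path


class Data(ABC):
    """Abstract interface over a data source (illustrative)."""

    @abstractmethod
    def describe(self) -> str:
        """Say where this source reads from."""


class Local(Data):
    def __init__(self, data_path: str) -> None:
        self.data_path = Path(data_path)

    def describe(self) -> str:
        return f"local parquet files under {self.data_path}"


class DatabricksNational(Data):
    def describe(self) -> str:
        return "Databricks tables (used in notebooks)"
```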
## CI/CD Validation Pipeline

### Pull Request Checks

**Every pull request triggers these workflows (ALL MUST PASS):**

1. **Linting** (`.github/workflows/linting.yaml`):
   - `ruff check` - Code quality checks
   - `ruff format --check` - Code formatting verification
   - `ty check .` - Type checking

2. **Code Coverage** (`.github/workflows/codecov.yaml`):
   - Runs unit tests with coverage
   - Uploads the report to Codecov
   - Requires passing tests

**IMPORTANT:** All linting and test checks must pass before merge. DO NOT skip or disable these checks.
### Main Branch / Release Workflows

On pushes to main or to tags:

1. **build_app.yaml**: Builds the Python wheel, uploads it to Azure Storage and GitHub releases
2. **build_schema.yaml**: Deploys `params-schema.json` to GitHub Pages
3. **build_container.yaml**: Builds and pushes the Docker image to GitHub Container Registry
4. **deploy_docs.yaml**: Builds and deploys the MkDocs documentation to RStudio Connect
### Docker Deployment

The model is containerized using:

- Base image: `ghcr.io/astral-sh/uv:python3.11-alpine`
- Build args: `app_version`, `data_version`, `storage_account`
- Entry point: `python -m nhp.docker`
- Tags: `dev` (PRs), `v*.*.*` (releases), `latest` (latest release)
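Putting those facts together, the Dockerfile is shaped roughly like the outline below. This is a sketch assembled from the documented base image, build args, version workaround, and entry point; the install steps and exact entrypoint form are assumptions, and the repository's actual Dockerfile differs:

```dockerfile
# Illustrative outline only - the repository's actual Dockerfile differs.
FROM ghcr.io/astral-sh/uv:python3.11-alpine

# Build args documented above
ARG app_version
ARG data_version
ARG storage_account

# Documented workaround: setuptools-scm has no git metadata inside the build
ENV SETUPTOOLS_SCM_PRETEND_VERSION=v0.0.0

WORKDIR /app
COPY . .
RUN uv sync

# Documented entry point: python -m nhp.docker
ENTRYPOINT ["uv", "run", "python", "-m", "nhp.docker"]
```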
## Common Issues and Workarounds

**Known Issues:**

1. **Dockerfile version**: Uses `ENV SETUPTOOLS_SCM_PRETEND_VERSION=v0.0.0` because setuptools-scm needs git metadata (TODO: build the wheel and copy it in instead)
2. **Data structure**: The model expects parquet files at `{data_path}/{file}/fyear={year}/dataset={dataset}/`. Missing files cause runtime errors.
3. **Notebooks**: The `notebooks/` directory is excluded from linting - don't lint these Databricks notebooks.

**Environment Variables (Docker):**

- `APP_VERSION`, `DATA_VERSION` (default: "dev")
- `STORAGE_ACCOUNT` (required for Azure), `BATCH_SIZE` (default: 16)
- A `.env` file is supported via python-dotenv for local development
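Reading those variables with the documented defaults can be sketched as below. The function name `read_config` is hypothetical; the actual `nhp.docker` startup code may read them differently (for local development, python-dotenv's `load_dotenv()` would populate the environment from `.env` first):

```python
# Sketch of reading the documented Docker environment variables; the actual
# nhp.docker startup code may differ. For local development the docs note
# that python-dotenv loads a .env file first, e.g.:
#   from dotenv import load_dotenv; load_dotenv()
import os


def read_config(env: dict[str, str]) -> dict:
    """Collect the documented settings, applying the documented defaults."""
    return {
        "app_version": env.get("APP_VERSION", "dev"),
        "data_version": env.get("DATA_VERSION", "dev"),
        "batch_size": int(env.get("BATCH_SIZE", "16")),
        "storage_account": env.get("STORAGE_ACCOUNT"),  # required for Azure
    }


print(read_config(dict(os.environ)))
```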
## Testing Strategy

- **Unit tests**: `tests/unit/` - mock-based and parameterized. **ALWAYS run before committing.**
- **Integration tests**: `tests/integration/` - require properly formatted test data; exercise end-to-end runs
- **Test organization**: pytest-mock for mocking, shared fixtures in `tests/conftest.py`
- **Coverage**: High coverage is maintained via the Codecov integration
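The mock-based unit-test shape looks roughly like this. The repo's tests use pytest-mock's `mocker` fixture, which wraps the stdlib `unittest.mock` shown here; the names `run_with_loader` and `fake_loader` are hypothetical, not from the codebase:

```python
# Illustrative mock-based test shape; the repo uses pytest-mock's `mocker`
# fixture, which wraps the stdlib unittest.mock used here. All names below
# are hypothetical and not taken from the codebase.
from unittest.mock import MagicMock


def run_with_loader(loader) -> dict:
    """Toy function under test: delegate data loading to an injected object."""
    data = loader.load("ip")
    return {"rows": len(data)}


def test_run_with_loader():
    fake_loader = MagicMock()
    fake_loader.load.return_value = [1, 2, 3]

    result = run_with_loader(fake_loader)

    fake_loader.load.assert_called_once_with("ip")
    assert result == {"rows": 3}


test_run_with_loader()
```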
## Best Practices for Coding Agents

1. **ALWAYS install dependencies first**: Run `uv sync --extra dev` before any development work.
2. **ALWAYS run linting before committing**: Run `uvx ruff check .` and `uvx ruff format --check .` - these MUST pass.
3. **ALWAYS run unit tests**: Run `uv run pytest tests/unit` before committing - all tests MUST pass.
4. **Follow the Google docstring convention**: All public functions/classes must have Google-style docstrings (enforced by ruff).
5. **Respect the line length**: Maximum 100 characters per line (ruff enforces this).
6. **Don't modify notebooks**: The `notebooks/` directory is excluded from linting for a reason - these are Databricks notebooks with special formatting.
7. **Use uv for all Python commands**: Prefix commands with `uv run` to ensure the correct virtual environment is used.
8. **Don't modify `uv.lock` manually**: Use `uv sync` to update dependencies.
9. **Test locally before pushing**: The CI checks are strict and will fail if linting or tests don't pass.
10. **Understand the data structure**: The model requires specific data formats. If testing model execution, ensure proper test data is available or use existing test fixtures.
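For reference, a Google-style docstring of the kind ruff's pydocstyle rules enforce here looks like this (the function itself is illustrative, not from the codebase):

```python
# Example of the Google docstring convention enforced by ruff's pydocstyle
# rules; the function itself is illustrative, not from the codebase.
def scale_activity(count: int, factor: float) -> float:
    """Scale an activity count by a demand factor.

    Args:
        count: Baseline number of activity records.
        factor: Multiplicative demand factor.

    Returns:
        The scaled activity count.

    Raises:
        ValueError: If count is negative.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return count * factor
```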
## Quick Reference

```bash
# Setup (production + dev dependencies)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra dev

# Lint (MUST pass)
uvx ruff check .
uvx ruff format --check .

# Test (MUST pass)
uv run pytest tests/unit --verbose

# Build
uv build

# Run model (requires data)
uv run python -m nhp.model queue/params-sample.json -d data --type all

# Build docs (requires docs extras)
uv sync --extra docs
uv run mkdocs build --clean
```

**When in doubt, check the CI workflows in `.github/workflows/` - they define the exact validation steps used in the pipeline.**

0 commit comments

Comments
 (0)