
Commit a9360fa

lmz and nick-fournier authored
Add API Documentation with MkDocs (#31)
* read TSVs
* bugfix when log dir is missing
* copy over from 2023 runner
* cache cleanup
* add warning if linked_trip_id exists
* add safeguard to prevent null IDs
* mostly working 2019
* cleanup caching prints
* Remove duplicate logging
* pipeline runs for 2019!
* clear deprecation warning
* bugfix bad if else
* Fix comments and update configuration to make 2019 / 2023 comparison easier
* Fix bug where travel date wasn't getting set
* quick purpose by mode analysis
* added 2019 weights, debug with 0 or nulls
* Add mkdocs-based documentation
* Add MkDocs documentation with GitHub Pages deployment
* Fix backslash
* Allow deployment from mkdocs branch
* Fix url
* Update TOC and add section on generating API Documentation
* Add mkdocs-include-markdown-plugin, to include existing Readmes into mkdocs
* Move read_write documentation into docstrings, rather than maintaining parallel documentation in Readme
* Move link_trips documentation into docstrings
* Improve documentation formatting and move more documentation to docstrings from Readme
* Consolidate multiple versions of extract tours documentation, and add imputation placeholder
* Consolidate and move CTRAMP-formatting related documentation to python docstrings
* Fix pre-commit errors
* Fix case problem for include
* Consolidate and move daysim-formatting documentation to python docstrings
* Update Readmes
* Move final check documentation in python docstring
* Fix spacing

---------

Co-authored-by: nick-fournier <45876721+nick-fournier@users.noreply.github.com>
1 parent beb9283 commit a9360fa


49 files changed: +1853 −1127 lines changed

.github/workflows/docs.yml

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
name: Deploy Documentation

on:
  push:
    branches: [main, mkdocs]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

# Allow one concurrent deployment
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    name: Build Documentation
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --group dev

      - name: Build documentation
        run: uv run mkdocs build

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./site

  deploy:
    name: Deploy to GitHub Pages
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
.gitignore

Lines changed: 3 additions & 0 deletions
@@ -78,6 +78,9 @@ instance/
 # Sphinx documentation
 docs/_build/
 
+# MkDocs documentation
+site/
+
 # PyBuilder
 .pybuilder/
 target/

README.md

Lines changed: 54 additions & 22 deletions
@@ -9,17 +9,17 @@ Tools for processing and validating travel diary survey data into standardized f
 - [Architecture](#architecture)
   - [Conceptual Diagram](#conceptual-diagram)
   - [Pipeline Steps](#pipeline-steps)
-- [Usage](#usage)
-  - [Quick Start](#quick-start)
-    - [1. Installing UV & Virtual Environment Setup](#1-installing-uv--virtual-environment-setup)
-    - [2. Configuration](#2-configuration)
-    - [3. Pipeline Runner](#3-pipeline-runner)
-  - [Data Models and Validation](#data-models-and-validation)
-    - [`step` Decorator and Validation](#step-decorator-and-validation)
-- [Documentation](#documentation)
+- [Quick Start](#quick-start)
+  - [1. Installing UV & Virtual Environment Setup](#1-installing-uv--virtual-environment-setup)
+  - [2. Configuration](#2-configuration)
+  - [3. Pipeline Runner](#3-pipeline-runner)
+- [Data Models and Validation](#data-models-and-validation)
+  - [`step` Decorator and Validation](#step-decorator-and-validation)
+- [Additional Documentation](#additional-documentation)
 - [Work Plan](#work-plan)
 - [Development](#development)
   - [Project Structure](#project-structure)
+  - [Generating API Documentation](#generating-api-documentation)
   - [Running Tests](#running-tests)
   - [Code Quality](#code-quality)
   - [Pre-commit Hooks](#pre-commit-hooks)

@@ -51,6 +51,7 @@ Tools for processing and validating travel diary survey data into standardized f
 The usage pattern for the pipeline is a bit different than the typical numbered scripts you might see elsewhere. *There is no monolithic integrated script*. Instead there is a standardized data processing pipeline that is configurable via YAML files and executed via a runner script.
 
 There are three main components:
+
 * **Setup**
   * This contains the point of entry defined in `project/run.py` and
   * Pipeline configuration defined in `project/config.yaml`

@@ -179,20 +180,20 @@ The data processing pipeline consists of modular steps that transform raw survey
 
 #### Core Processing Steps
 
-1. **[Load Data](src/processing/read_write/README.md)** - Loads canonical survey tables from CSV, Parquet, or geospatial files into memory
-2. **[Cleaning](src/processing/cleaning/README.md)** - Project-specific data cleaning operations (e.g., fixing time/distance errors, adding missing records)
-3. **Imputation** *(placeholder)* - Imputes missing values for key variables (e.g., mode, purpose, locations)
-4. **[Link Trips](src/processing/link_trips/README.md)** - Aggregates individual trip segments into complete journey records by detecting mode changes and transfers
-5. **[Detect Joint Trips](src/processing/joint_trips/README.md)** - Identifies shared household trips using spatial-temporal similarity matching
-6. **[Extract Tours](src/processing/tours/README.md)** - Builds hierarchical tour structures (home-based tours and work-based subtours) from linked trips
-7. **Weighting** *(placeholder)* - Calculates expansion weights to match survey sample to population targets
-8. **[Format Output](src/processing/formatting/daysim/README.md)** - Transforms canonical data to model-specific formats (DaySim, ActivitySim, etc.)
-   - **[DaySim Format](src/processing/formatting/daysim/README.md)** - Formats data for DaySim model input
-   - **[CT-RAMP Format](src/processing/formatting/ctramp/README.md)** - Formats data for CT-RAMP model input
-9. **[Final Check](src/processing/final_check/README.md)** - Validates complete dataset against canonical schemas before export
-10. **[Write Data](src/processing/read_write/README.md)** - Writes processed tables to output files with optional validation
-
-Each step README provides detailed documentation on:
+1. **[Load Data](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/read_write/)** - Loads canonical survey tables from CSV, Parquet, or geospatial files into memory
+2. **[Cleaning](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/cleaning/)** - Project-specific data cleaning operations (e.g., fixing time/distance errors, adding missing records)
+3. **[Imputation](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/imputation/)** *(placeholder)* - Imputes missing values for key variables (e.g., mode, purpose, locations)
+4. **[Link Trips](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/link_trips/)** - Aggregates individual trip segments into complete journey records by detecting mode changes and transfers
+5. **[Detect Joint Trips](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/detect_joint_trips/)** - Identifies shared household trips using spatial-temporal similarity matching
+6. **[Extract Tours](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/extract_tours/)** - Builds hierarchical tour structures (home-based tours and work-based subtours) from linked trips
+7. **[Weighting](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/weighting/)** *(placeholder)* - Calculates expansion weights to match survey sample to population targets
+8. **Format Output** - Transforms canonical data to model-specific formats (DaySim, ActivitySim, etc.)
+   - **[DaySim Format](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/format_output/daysim/)** - Formats data for DaySim model input
+   - **[CT-RAMP Format](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/format_output/ctramp/)** - Formats data for CT-RAMP model input
+9. **[Final Check](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/final_check/)** - Validates complete dataset against canonical schemas before export
+10. **[Write Data](https://bayareametro.github.io/travel-diary-survey-tools/pipeline_steps/read_write/)** - Writes processed tables to output files with optional validation
+
+Each step links to documentation generated by the step's docstring, and provides detailed documentation on:
 - Input/output data requirements
 - Core algorithm and processing logic
 - Configuration parameters

@@ -443,6 +444,7 @@ def new_processing_step(
 
 ## Additional Documentation
 For more details, see:
+* [API Documentation](https://bayareametro.github.io/travel-diary-survey-tools/) - Auto-generated API documentation for data models, pipeline, and processing functions (deployed to GitHub Pages).
 * [Validation Framework Documentation](docs/VALIDATION_README.md) - Which goes into more detail on the validation framework architecture and usage.
 * [Column Requirements Documentation](docs/COLUMN_REQUIREMENTS.md) - Contains auto-generated tables and enums for easy reference on which fields are required for each processing step. Essentially summarizes the data models in a table.
 

@@ -510,6 +512,36 @@ travel-diary-survey-tools/
 └── docs/                  # Documentation
 ```
 
+### Generating API Documentation
+
+The project uses [MkDocs](https://www.mkdocs.org/) with [Material theme](https://squidfunk.github.io/mkdocs-material/) to generate API documentation from docstrings.
+
+**Building locally:**
+```bash
+# Build documentation
+uv run mkdocs build --strict
+
+# Preview with live reload
+uv run mkdocs serve
+# View at http://127.0.0.1:8000
+```
+
+**How it works:**
+- `mkdocstrings[python]` auto-generates docs from Python docstrings and type hints
+- `griffe-pydantic` extension handles Pydantic model documentation
+- `mkdocs-include-markdown-plugin` embeds algorithm documentation from processing module READMEs
+- Documentation structure defined in `mkdocs.yml`
+- Source files in `docs/` directory (markdown files reference Python modules)
+
+**Adding new pages:**
+1. Create markdown file in `docs/`
+2. Add to navigation in `mkdocs.yml`
+3. Reference Python modules using `::: module.path.ClassName` syntax
+
+**Deployment:**
+- Automatic via GitHub Actions on push to `main` branch
+- Published to: https://bayareametro.github.io/travel-diary-survey-tools/
+- Workflow defined in [`.github/workflows/docs.yml`](.github/workflows/docs.yml)
 
 ### Running Tests
 Tests can be run using `pytest` via VSCode extension or command line:
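The `::: module.path.ClassName` directives and the `mkdocs.yml` navigation mentioned in the README section above come together in a single config file. As a hypothetical sketch only (plugin options and nav entries are illustrative, not copied from the repo's actual `mkdocs.yml`), such a config might look like:

```yaml
site_name: Travel Diary Survey Tools
theme:
  name: material

plugins:
  - search
  - include-markdown          # mkdocs-include-markdown-plugin, embeds existing READMEs
  - mkdocstrings:             # renders `::: module.path` directives from docstrings
      handlers:
        python: {}

nav:
  - Home: index.md
  - Codebook: codebook.md
  - Data Models:
      - models/index.md
      - DaySim: models/daysim.md
      - CTRAMP: models/ctramp.md
```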

docs/codebook.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# Codebook

The codebook modules define enumerated value labels and standardized coding schemes used throughout the survey processing pipeline.

## Overview

Codebook enumerations use the `LabeledEnum` pattern to provide both numeric codes and human-readable labels. These are used for:

- Data validation and type checking
- Consistent coding across different survey years
- Output formatting for travel demand models
- Documentation and data dictionaries

## Usage Example

```python
from data_canon.codebook.trips import Mode, Purpose

# Access code and label
mode_code = Mode.WALK_TRANSIT.value   # 11
mode_label = Mode.WALK_TRANSIT.label  # "Walk to transit"

# Validate and look up
purpose = Purpose(4)  # Purpose.SHOPPING_ERRANDS
print(purpose.label)  # "Appointment, shopping, or errands (e.g., gas)"
```

---

::: data_canon.codebook.households

::: data_canon.codebook.vehicles

::: data_canon.codebook.persons

::: data_canon.codebook.trips

::: data_canon.codebook.tours

::: data_canon.codebook.days

## Project/Format-specific

::: data_canon.codebook.daysim

::: data_canon.codebook.ctramp
docs/index.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Travel Diary Survey Tools

Documentation for travel diary survey data processing tools.

## Overview

This project provides tools to process and analyze travel diary survey data with standardized data models and validation.

## Documentation Structure

### [Codebook](codebook.md)
Enumerated value labels and coding schemes for survey data fields. Includes definitions for:

- Trip purposes, modes, and characteristics
- Person demographics and employment
- Household attributes
- Tour patterns
- Model-specific codes (DaySim, CTRAMP)

### [Data Models](models/index.md)
Pydantic data models for validation and processing:

- Survey data models (households, persons, trips, tours)
- Model-specific output formats (DaySim, CTRAMP)
- Validation rules and constraints

## Quick Links

- [Project README](https://github.com/BATS/travel-diary-survey-tools/blob/main/README.md)
- [Column Requirements](COLUMN_REQUIREMENTS.md)
- [Validation Documentation](VALIDATION_README.md)

docs/models/ctramp.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# CTRAMP Models

Output file format models for the CT-RAMP (Coordinated Travel-Regional Activity Modeling Platform) travel demand model.

::: data_canon.models.ctramp
    options:
      show_root_heading: true
      members:
        - HouseholdCTRAMPModel
        - PersonCTRAMPModel
        - MandatoryLocationCTRAMPModel
        - IndividualTourCTRAMPModel
        - JointTourCTRAMPModel
        - IndividualTripCTRAMPModel
        - JointTripCTRAMPModel

docs/models/daysim.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# DaySim Models

Output file format models for the DaySim activity-based travel demand model.

Based on [DaySim Input Data File Documentation](https://github.com/RSGInc/DaySim/wiki/docs/Daysim%20Input%20Data%20File%20Documentation.docx)

::: data_canon.models.daysim
    options:
      show_root_heading: true
      members:
        - HouseholdDaysimModel
        - PersonDaysimModel
        - HouseholdDayDaysimModel
        - PersonDayDaysimModel
        - TourDaysimModel
        - LinkedTripDaysimModel

docs/models/index.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Data Models

Pydantic data models provide validation and type checking for survey data processing.

## Overview

Data models represent individual records (rows) and define:

- Required and optional fields
- Field validation rules and constraints
- Foreign key relationships between tables
- Pipeline step requirements

Models use Pydantic's `BaseModel` with custom field validators to ensure data quality throughout the processing pipeline.

## Key Features

### Field Validation
Each field includes validation rules:
```python
age: AgeCategory = step_field(required_in_steps=["extract_tours"])
home_lat: float = step_field(ge=-90, le=90, required_in_steps=["extract_tours"])
```

### Foreign Key Relationships
Models enforce referential integrity:
```python
hh_id: int = step_field(
    ge=1,
    fk_to="households.hh_id",
    required_child=True,
)
```

### Pipeline Step Requirements
Fields specify which processing steps require them:
```python
person_num: int = step_field(ge=1, required_in_steps=["format_ctramp", "format_daysim"])
```

## Usage Example

```python
from data_canon.models.survey import PersonModel

person = PersonModel(
    person_id=1,
    hh_id=100,
    person_num=1,
    age=AgeCategory.AGE_35_64,
    gender=Gender.FEMALE,
    employment=Employment.FULL_TIME,
    student=Student.NOT_STUDENT,
    # ... other fields
)
```

## Survey Data Models

Core data models used in the processing pipeline for households, persons, days, trips, and tours.

::: data_canon.models.survey.HouseholdModel

::: data_canon.models.survey.PersonModel

::: data_canon.models.survey.PersonDayModel

::: data_canon.models.survey.UnlinkedTripModel

::: data_canon.models.survey.LinkedTripModel

::: data_canon.models.survey.TourModel

::: data_canon.models.survey.JointTripModel

## Travel Model-formatted Data Models

### [DaySim Models](daysim.md)
Output file format models for the DaySim activity-based travel demand model.

### [CTRAMP Models](ctramp.md)
Output file format models for the CT-RAMP travel demand model.