
Commit 2d274d7

Author: Nick Fournier
Phase 10A Populationsim Updates (#192)
* use modern pyproject package definition
* pin python to 3.8 for now
* Cleanup test runners
* Allow hard constraints on balancer.
* minor performance enhancements and bug fixes.
* repop fix
* add oceanside repop example
* Fix oceanside inputs
* bugfix summarize empty
* Major cleanup of tests and examples to share data and configs.
* Pytests gitaction
* Minor update to test gha
* cleanup gha testing
* disable linting for now.
* bugfix weighting test
* simplify test_steps
* Normalize df to hash
* debug data hash.
* debug
* test
* more debug
* test sorting
* debug
* further sort
* debug
* debugging
* more debug
* Revert "more debug" This reverts commit 0972171.
* Revert "debugging" This reverts commit b86cf01.
* debug
* more debug...
* Linux - Windowx ortools bugfix.
* Cleanup tests and stabilize.
* Working refactor of activitysim pipeline into populationsim
* linting
* Possible fix for repop error.
* Cleanup unused code
* cleanup dependencies and test python versions
* Cleanup imports
* Pinned versions to work with python 3.12
* Dropped support for Python 3.13 because ortools must be <=3.12
* Cleaned up future warnings, expanded tests, and resurrected the lp_cvx option.
* iter version
* Add pre-commit
* Fixed test bug.
* Import bugfix
* Numba balancer
* Implemented Numba for significant perf improvement. Need to cleanup SimultanousListBalancer.
* Test fix. But needs organizing in sub_balance and do_balance.
* Update test_balancer.py
* cleanup uv lock
* Organize into modules
* split numba functions
* fixed import paths
* more organizing
* Added configurable optimizer timeout parameter in settings. Also further cleanup.
* Cleanup unused code.
* Added CLI option
* Bugfix CLI option
* Bugfixes
* Revert "Bugfixes" This reverts commit 4474f60.
* Bugfix max delta
* Hardcode constants instead of as args
* Update pyproject.toml
* Fixed issue #196
1 parent 358409c commit 2d274d7

File tree

193 files changed

+115983
-93802
lines changed

Lines changed: 37 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,37 @@
1+
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
2+
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
3+
4+
name: Python package
5+
6+
on:
7+
push:
8+
branches: [ "master", "develop"]
9+
pull_request:
10+
branches: [ "master" ]
11+
12+
jobs:
13+
build:
14+
15+
runs-on: ubuntu-latest
16+
strategy:
17+
fail-fast: false
18+
matrix:
19+
python-version: ["3.9", "3.10", "3.11", "3.12"]
20+
21+
steps:
22+
- uses: actions/checkout@v4
23+
24+
- name: Install uv and set the python version
25+
uses: astral-sh/setup-uv@v5
26+
with:
27+
python-version: ${{ matrix.python-version }}
28+
version: "0.6.14"
29+
30+
- name: Install the project
31+
run: uv sync --all-extras --dev
32+
33+
- name: Lint with ruff
34+
uses: astral-sh/ruff-action@v3
35+
36+
- name: Run tests
37+
run: uv run pytest

.gitignore

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
1-
sandbox/
1+
scripts/calm_validation_results/
2+
23
regress/
3-
example_test_no_integerizing/
4-
example_mtc/
54
.idea
65
.ipynb_checkpoints
76

7+
.coverage.*
88

99
# Byte-compiled / optimized / DLL files
1010
__pycache__/

.pre-commit-config.yaml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
repos:
2+
- repo: https://github.com/pre-commit/pre-commit-hooks
3+
rev: v5.0.0 # Use latest stable version
4+
hooks:
5+
# - id: check-yaml
6+
- id: end-of-file-fixer
7+
- id: trailing-whitespace
8+
9+
- repo: https://github.com/psf/black
10+
rev: 24.3.0 # Use latest Black version
11+
hooks:
12+
- id: black
13+
language_version: python3 # Ensures compatibility with Python 3+
14+
15+
- repo: https://github.com/astral-sh/ruff-pre-commit
16+
rev: v0.3.3 # Replace with latest Ruff release
17+
hooks:
18+
- id: ruff
19+
args: [--fix] # Optional: auto-fix simple issues

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
3.12

.vscode/launch.json

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
{
2+
// Use IntelliSense to learn about possible attributes.
3+
// Hover to view descriptions of existing attributes.
4+
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
5+
"version": "0.2.0",
6+
"configurations": [
7+
{
8+
"name": "Python Debugger: Current File",
9+
"type": "debugpy",
10+
"request": "launch",
11+
"program": "${file}",
12+
"console": "integratedTerminal",
13+
"justMyCode": true,
14+
}
15+
]
16+
}

.vscode/settings.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
{
2+
"python.testing.pytestArgs": [
3+
"populationsim",
4+
"tests"
5+
],
6+
"python.testing.unittestEnabled": false,
7+
"python.testing.pytestEnabled": true,
8+
"ruff.enable": true,
9+
}

MANIFEST.in

Lines changed: 0 additions & 7 deletions
This file was deleted.

README.md

Lines changed: 10 additions & 0 deletions
@@ -10,6 +10,16 @@ easily adapted for statewide, regional, and urban transportation planning
1010
needs. PopulationSim is implemented in the
1111
[ActivitySim](https://github.com/activitysim/activitysim) framework.
1212

13+
## Command-Line Interface
14+
15+
PopulationSim can be run directly from the command line:
16+
17+
```bash
18+
populationsim -c /path/to/configs -d /path/to/data -o /path/to/output
19+
```
20+
21+
See the [examples directory](examples/) for more information on using the command-line interface.
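For scripted or batch runs, the same invocation can be assembled from Python. This is a minimal sketch, assuming the `populationsim` entry point is installed on `PATH` and using only the three flags shown above; `build_command` and `run_populationsim` are hypothetical helper names, not part of the PopulationSim package.

```python
import shlex
import subprocess


def build_command(configs, data, output):
    """Assemble the CLI call from the README (flags -c, -d, -o only)."""
    return [
        "populationsim",
        "-c", str(configs),
        "-d", str(data),
        "-o", str(output),
    ]


def run_populationsim(configs, data, output):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    cmd = build_command(configs, data, output)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Building the command has no side effects, so it is safe to inspect first.
command = build_command("configs", "data", "output")
print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a path contains spaces.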
22+
1323
## Documentation
1424

1525
https://activitysim.github.io/populationsim/

docs/application_configuration.rst

Lines changed: 24 additions & 24 deletions
@@ -121,7 +121,7 @@ PopulationSim is configured using the settings.yaml file. PopulationSim can be c
121121

122122
:regular mode:
123123

124-
The regular configuration runs PopulationSim from beginning to end and produces a new synthetic population. This can run either single-process or multi-processed to save on runtime.
124+
The regular configuration runs PopulationSim from beginning to end and produces a new synthetic population. This can run either single-process or multi-processed to save on runtime.
125125

126126
:repop mode:
127127

@@ -263,17 +263,17 @@ This sub-directory is populated at the end of the PopulationSim run. The table b
263263
Configuring Settings File
264264
~~~~~~~~~~~~~~~~~~~~~~~~~
265265

266-
PopulationSim is configured using the *configs/settings.yaml* file. The user has the flexibility to specify algorithm functionality, list geographies, invoke tracing, provide inputs specifications, select outputs, list the steps to run, and specify multiprocess settings.
266+
PopulationSim is configured using the *configs/settings.yaml* file. The user has the flexibility to specify algorithm functionality, list geographies, invoke tracing, provide inputs specifications, select outputs, list the steps to run, and specify multiprocess settings.
267267

268268
.. note::
269-
When running PopulationSim, multiple settings files can be specified so long as the ``inherit_settings: True`` setting is included in
269+
When running PopulationSim, multiple settings files can be specified so long as the ``inherit_settings: True`` setting is included in
270270
subsequent files. This feature is used for the multi-processing configuration described below. To utilize this feature, one can run PopulationSim
271-
with the following command: ``python run_populationsim.py -c configs_mp -c configs``. This command specifies two config folders, each with
271+
with the following command: ``python run_populationsim.py -c configs_mp -c configs``. This command specifies two config folders, each with
272272
a settings file, and the ``configs_mp`` settings inherit from the earlier ``configs`` settings.
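The inheritance behaviour in the note above can be pictured with plain dictionaries. This is an illustrative sketch only, not ActivitySim's actual loader: it assumes the first `-c` directory takes precedence, that later directories only fill in missing keys, and that the chain stops at the first settings file without `inherit_settings: True`. The function name `resolve_settings` and the keys `multiprocess`, `num_processes` and `data_dir` are chosen for illustration.

```python
def resolve_settings(settings_by_dir, config_dirs):
    """Merge parsed settings files in command-line order, earliest winning.

    settings_by_dir maps a config directory name to its already-parsed
    settings.yaml (a plain dict, so the sketch needs no YAML library).
    """
    merged = {}
    for directory in config_dirs:
        settings = settings_by_dir[directory]
        for key, value in settings.items():
            # A key set by an earlier directory is never overwritten.
            merged.setdefault(key, value)
        # Only continue to later directories if this file asks to inherit.
        if not settings.get("inherit_settings", False):
            break
    return merged


settings_by_dir = {
    "configs_mp": {"inherit_settings": True, "multiprocess": True, "num_processes": 2},
    "configs": {"multiprocess": False, "data_dir": "data"},
}

# Mirrors: python run_populationsim.py -c configs_mp -c configs
resolved = resolve_settings(settings_by_dir, ["configs_mp", "configs"])
print(resolved)
```

Under these assumptions the multiprocessing overrides from `configs_mp` win, while everything else falls through from `configs`.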
273273

274274
The settings shown below are from the PopulationSim application for the CALM region as an example of how a run can be configured. The meta geography for CALM region is named as *Region*, the seed geography is *PUMA* and the two sub-seed geographies are *TRACT* and *TAZ*. The settings below are for this four geography application, but the user can configure PopulationSim for any number of geographies and use different geography names.
275275

276-
Some of the settings are configured differently for the *repop* mode. The settings specific to the *repop* mode are described in the :ref:`settings_repop` section. The settings specific to the *multiprocessing* setup are described in the :ref:`settings_mp` section.
276+
Some of the settings are configured differently for the *repop* mode. The settings specific to the *repop* mode are described in the :ref:`settings_repop` section. The settings specific to the *multiprocessing* setup are described in the :ref:`settings_mp` section.
277277

278278
**Algorithm/Software Configuration**:
279279

@@ -395,11 +395,11 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
395395
- tablename: households
396396
filename : seed_households.csv
397397
index_col: hh_id
398-
column_map:
398+
rename_columns:
399399
hhnum: hh_id
400400
- tablename: persons
401401
filename : seed_persons.csv
402-
column_map:
402+
rename_columns:
403403
hhnum: hh_id
404404
SPORDER: per_num
405405
# drop mixed type fields that appear to have been incorrectly generated
@@ -414,7 +414,7 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
414414
- naicsp07
415415
- tablename: geo_cross_walk
416416
filename : geo_cross_walk.csv
417-
column_map:
417+
rename_columns:
418418
TRACTCE: TRACT
419419
- tablename: TAZ_control_data
420420
filename : control_totals_taz.csv
@@ -454,7 +454,7 @@ Note that Seed-Households, Seed-Persons and Geographic CrossWalk are all require
454454
+--------------+---------------------------------------------------------------------------------------+
455455
| index_col | Name of the unique ID field in the seed household data |
456456
+--------------+---------------------------------------------------------------------------------------+
457-
| column_map | Column map of fields to be renamed. The format for the column map is as follows: |br| |
457+
| rename_columns | Column map of fields to be renamed. The format for the column map is as follows: |br| |
458458
| | ``Name in CSV: New Name`` |
459459
+--------------+---------------------------------------------------------------------------------------+
460460
| drop_columns | List of columns to be dropped from the input data |
@@ -627,17 +627,17 @@ For detailed information on software implementation refer to :ref:`core_componen
627627
Configuring Settings File for Multiprocessing
628628
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
629629

630-
This section describes the settings that are additionally configured for running PopulationSim with
631-
multiprocessing to reduce runtime. PopulationSim uses ActivitySim's multiprocessing capabilities, which
630+
This section describes the settings that are additionally configured for running PopulationSim with
631+
multiprocessing to reduce runtime. PopulationSim uses ActivitySim's multiprocessing capabilities, which
632632
are described in more detail `here <https://activitysim.github.io/activitysim/howitworks.html#multiprocessing>`_.
633633

634-
The example below can be found in the ``example_calm\configs_mp\settings.yaml`` file. The group of model steps
635-
identified as ``mp_seed_balancing`` and starting with ``input_pre_processor``
636-
are run single process until the next group of model steps identified as ``mp_sub_balancing_TAZ`` and starting with
634+
The example below can be found in the ``example_calm\configs_mp\settings.yaml`` file. The group of model steps
635+
identified as ``mp_seed_balancing`` and starting with ``input_pre_processor``
636+
are run single process until the next group of model steps identified as ``mp_sub_balancing_TAZ`` and starting with
637637
``sub_balancing.geography=TAZ`` is reached, at which time PopulationSim runs these steps in parallel using two processors
638-
by slicing the problem into separate geographic batches based on the ``slice_geography: TRACT`` setting. It then
639-
returns to single process with the final group of model steps identified as ``mp_summarize`` and
640-
beginning with ``expand_households``.
638+
by slicing the problem into separate geographic batches based on the ``slice_geography: TRACT`` setting. It then
639+
returns to single process with the final group of model steps identified as ``mp_summarize`` and
640+
beginning with ``expand_households``.
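The slicing idea described above, where zones are grouped by the ``slice_geography`` so that every TAZ in a given TRACT is handled by the same process, can be sketched in a few lines. This is a hypothetical illustration: the round-robin assignment and the function name `slice_by_geography` are assumptions, and ActivitySim's own slicing logic may differ.

```python
from collections import defaultdict


def slice_by_geography(crosswalk, slice_col, num_processes):
    """Split crosswalk rows into batches, keeping each slice_col value together."""
    slice_ids = sorted({row[slice_col] for row in crosswalk})
    # Assumption: slice ids are dealt out round-robin across processes.
    batch_of = {sid: i % num_processes for i, sid in enumerate(slice_ids)}
    batches = defaultdict(list)
    for row in crosswalk:
        batches[batch_of[row[slice_col]]].append(row)
    return [batches[i] for i in range(num_processes)]


# Toy crosswalk: four TAZs nested in three TRACTs.
crosswalk = [
    {"TAZ": 1, "TRACT": 100},
    {"TAZ": 2, "TRACT": 100},
    {"TAZ": 3, "TRACT": 200},
    {"TAZ": 4, "TRACT": 300},
]

batches = slice_by_geography(crosswalk, "TRACT", num_processes=2)
for i, batch in enumerate(batches):
    print(i, [row["TAZ"] for row in batch])
```

Keeping whole TRACTs together means each process can balance its TAZs against its own TRACT-level results without needing data held by another process.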
641641

642642
::
643643

@@ -666,8 +666,8 @@ beginning with ``expand_households``.
666666
- trace_TAZ_weights
667667
- name: mp_summarize
668668
begin: expand_households
669-
670-
669+
670+
671671
+-------------------------------+--------------------------------------------------------------------------------------------------------------+
672672
| Attribute | Description |
673673
+===============================+==============================================================================================================+
@@ -859,7 +859,7 @@ Some conventions for writing expressions:
859859
* Expressions must be vectorized expressions and can use most numpy and pandas expressions.
860860
* When editing the CSV files in Excel, use single quote ' or space at the start of a cell to get Excel to accept the expression
861861

862-
.. _importance:
862+
.. _importance:
863863

864864
What are importance weights
865865
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -882,18 +882,18 @@ Where, :math:`z_{i}` are relaxation factors and :math:`a_{in}` are incidence val
882882

883883
Where, :math:`u_{i}` are the penalties termed as importance factors or importance weights in PopulationSim.
884884

885-
:math:`x_{n}` and :math:`z_{i}` are the parameters solved by the optimization while importance weights (:math:`u_{i}`) are the hyperparameters that are exposed to the user and impact the optimization externally. The objective of the relative entropy optimization is to find a set of weights that are uniform and satisfy marginal controls. The importance weights allow the user to trade-off between these objectives. High importance weights (e.g., 1E10) on all controls result in a hard constrained optimization which gives a high preference to matching marginal controls. Low importance weights (e.g., <50) results in an almost unconstrained problem. The user may also specify different importance weights for each marginal control. In this case, the controls with higher importance weights are given preference over the ones with low importance weights. Therefore, both absolute and relative value of the importance weights impacts the optimization problem and the solution.
885+
:math:`x_{n}` and :math:`z_{i}` are the parameters solved by the optimization while importance weights (:math:`u_{i}`) are the hyperparameters that are exposed to the user and impact the optimization externally. The objective of the relative entropy optimization is to find a set of weights that are uniform and satisfy marginal controls. The importance weights allow the user to trade-off between these objectives. High importance weights (e.g., 1E10) on all controls result in a hard constrained optimization which gives a high preference to matching marginal controls. Low importance weights (e.g., <50) results in an almost unconstrained problem. The user may also specify different importance weights for each marginal control. In this case, the controls with higher importance weights are given preference over the ones with low importance weights. Therefore, both absolute and relative value of the importance weights impacts the optimization problem and the solution.
886886

887-
.. _setting-importance:
887+
.. _setting-importance:
888888

889889
Setting importance weights
890890
~~~~~~~~~~~~~~~~~~~~~~~~~~~
891891

892892
Given the flexibility that importance weights offer to the user, they need to be tuned to get the desired optimality in the outputs for the given seed sample and marginal controls. The quality of the outputs is defined by a uniformity measure of the weights and goodness of fit across marginal controls. Here are general guidelines on setting importance weights:
893893

894894
* Start with a reasonable importance factor value across all controls (e.g., 1000 has typically worked well for multiple regions). This excludes the control on the total number of households which should be set to very high importance to ensure that the right number of households is generated for each zone.
895-
* After achieving reasonable goodness of fit across controls, the importance weights can be increased/decreased to favor one control over the other, or all importance weights can be reduced to improve the uniformity of the weights. Which controls to favor depends on the type of application and the quality of the marginal data.
896-
* The importance weights are generally updated in factors of 10. The user may need to run PopulationSim multiple times using various combinations of importance weights to reach the desired quality of outputs.
895+
* After achieving reasonable goodness of fit across controls, the importance weights can be increased/decreased to favor one control over the other, or all importance weights can be reduced to improve the uniformity of the weights. Which controls to favor depends on the type of application and the quality of the marginal data.
896+
* The importance weights are generally updated in factors of 10. The user may need to run PopulationSim multiple times using various combinations of importance weights to reach the desired quality of outputs.
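The tuning guidelines above can be expressed as a small helper for generating candidate weight sets between runs. This is a sketch under stated assumptions: the control names are hypothetical, 1000 is the starting value suggested in the guidelines, and 1e10 stands in for "very high importance" on the total-households control. Neither function is part of PopulationSim.

```python
def initial_importance(controls, total_hh_control, base=1000, total_hh_weight=1e10):
    """Give every control the same starting importance, except total households."""
    return {
        name: (total_hh_weight if name == total_hh_control else base)
        for name in controls
    }


def rescale(weights, control, steps):
    """Return a copy with one control moved by `steps` factors of 10."""
    factor = 10 ** abs(steps)
    updated = dict(weights)
    if steps >= 0:
        updated[control] = weights[control] * factor
    else:
        updated[control] = weights[control] / factor
    return updated


# Hypothetical control names for illustration.
controls = ["num_hh", "hh_size_1", "hh_age_15_24"]
weights = initial_importance(controls, total_hh_control="num_hh")

# Favor household size over age after inspecting goodness of fit.
trial = rescale(weights, "hh_size_1", steps=1)
trial = rescale(trial, "hh_age_15_24", steps=-1)
print(trial)
```

Each trial dictionary corresponds to one PopulationSim run; comparing weight uniformity and control fit across runs guides the next adjustment.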
897897

898898

899899
