
Commit 1a94629

dhensle, jpn--, yueshuaing, asiripanich, and aletzdy authored
Estimation Enhancements (#917)
* multiprocess initial commit
* blacken
* parquet format for EDBs
* adding pkl, fixing edb concat and write
* fixing double naming of coefficient files
* blacken
* fixing missing cdap coefficients file, write pickle function
* compact edb writing, index duplication, parquet datatypes
* sorting dest choice bundles
* adding coalesce edbs as its own step
* CI testing initial commit
* infer.py CI testing
* estimation sampling for non-mandatory and joint tours
* adding survey choice to choices_df in interaction_sample
* adding option to delete the mp edb subdirs
* changes supporting sandag abm3 estimation mode
* running test sandag example through trip dest sample
* Estimation Pydantic (#2)
* pydantic for estimation settings
* allow df as type in config
* fix table_info
* repair for Pydantic
* df is attribute
* Estimation settings pydantic update
* new compact formatting
* handling multiple columns for parquet write
* dropping duplicate columns
* actually removing duplicate columns
* dfs with correct indexes and correct mp sorting
* ignore index on sort for mp coalesce edbs
* updating estimation checks to allow for non-zero household_sample_size
* Re-estimation (#3)
* pydantic for estimation settings
* allow df as type in config
* fix table_info
* auto ownership
* repair for pydantic
* update for ruff
* updated for simple models
* repair for Pydantic
* simple simulate and location choice
* df is attribute
* scheduling
* stop freq
* test locations
* cdap
* nonmand_and_joint_tour_dest_choice
* nonmand_tour_freq
* fix ci to stop using mamba
* test updates
* use larch6 from pip
* use numba for stop freq
* fix for pandas 1.5
* fix stop freq test for numba
* Sharrow Cache Dir Setting (#893)
* setting necessary filesystem changes from settings file
* set for multiprocessing
* repair github actions
* github action updates (#903)
* script to make data
* unified script for making data
* remove older
* bug
* doc note
* load from parquet if available
* add original alt ids to EDB output when using compact
* fix MP race
* script arg to skip to EDB
* clean up CDAP and blacken
* refactor model_estimation_table_types change to estimation_table_types, to avoid pydantic namespace clash
* repair drop_dupes
* blacken
* location choice with compact
* choice_def for compact
* spec changes for simple-simulate
* re-estimation demo for auto ownership
* clean up status messages
* change name to stop pydantic warnings
* edit configs
* default estimation sample size is same as regular sample size
* allow location alts not in cv format
* dummy zones for location choice
* update scheduling model estimation
* various cleanup
* stop freq
* tidy build script
* update 02 school location for larger example
* update notebook 04
* editable model re-estimation for location choice
* fix test names
* update notebooks
* cdap print filenames as loading
* notebook 07
* tests thru 07
* notebooks 08 09
* build the data first
* runnable script
* change larch version dependency
* keep pandas<2
* notebooks 10 11
* notebook 12
* remove odd print
* add matplotlib
* notebook 13 14
* test all the notebooks
* add xlsxwriter to tests
* notebook 15
* CDAP revise model spec demo
* notebook 16
* notebook 17
* longer timeout
* notebook 18
* notebook 19
* notebook 20
* smaller notebook 15
* configurable est mode setup
* notebook 21
* notebook 22
* config sample size in GA
* notebook 23
* updates for larch and graphviz
* change default to compact
* compare model 03
* test updates
* rename test targets
* repair_av_zq
* move doctor up
* add another repair
* oops

---------

Co-authored-by: David Hensle <[email protected]>

* Removing estimation.yaml settings that are no longer needed
* fixing unit tests, setting parquet edb default
* one more missed estimation.yaml
* using df.items for pandas 2 compatibility
* tidy doc
* updating edb file name for NMTF
* updating numba and pandas in the conda env files
* Improve test stability (#4)
* handle dev versions of Larch
* test stability
* pin multimethod < 2.0
* add availability_expression
* starting est docs
* Resolve package version conflicts (#923)
* limit multimethod version to 2.0 and earlier
* add multimethod version to other settings
* [makedocs] update installer download link
* [makedocs] update branch docs
* GitHub Actions updates (#926)
* use libmamba solver
* add permissions [makedocs]
* add write permission for dev docs [makedocs]
* conda-solver: classic
* trace proto tables if available, otherwise synthetic population (#901)
  Co-authored-by: Jeffrey Newman <[email protected]>
* release instructions (#927)
* use libmamba solver
* add permissions [makedocs]
* add write permission for dev docs [makedocs]
* conda-solver: classic
* include workflow dispatch option for tests
* update release instructions
* add installer build to instructions
* Pin mamba for now, per conda-incubator/setup-miniconda#392
* conda-remove-defaults
* when no unavailability parameters are included
* some general estimation docs
* Use pandas 2 for docbuild environment (#928)
* fix link
* allow failure to import larch
* workflow
* blacken
* try some pins
* speed up docbuild
* use pandas 2 for docs
* oops wrong file
* restore foundation
* Update HOW_TO_RELEASE.md
* refactor(shadow_pricing.py): remove a duplicated `default_segment_to_name_dict` (#930)
* fix typo
* fixing disaggregate accessibility bug in zone sampler
* Revert "fixing disaggregate accessibility bug in zone sampler"
  This reverts commit be5d093.
* notes on size terms
* clean up docbuild
* fix version check
* add some doc
* tidy
* estimation docs
* more on alternative avail
* model evaluation
* add doc on component_model
* documentation enhancements
* larch6 is now larch>6
* branch docs on workflow_dispatch
* missing doc section on model respec

---------

Co-authored-by: Yue Shuai <[email protected]>
Co-authored-by: David Hensle <[email protected]>
Co-authored-by: amarin <[email protected]>
Co-authored-by: Ali Etezady <[email protected]>
Co-authored-by: Sijia Wang <[email protected]>

* handling missing data or availability conditions
* add docs on locking size terms
* include constants in CDAP
* bump larch requirement
* require larch 6.0.40
* add xlsxwriter to envs
* require larch 6.0.41
* add links
* fix typos and formatting
* cdap hh and per parquet read match csv
* add missing x_validator for mode choice and nonmand tour freq
* add tour mode choice edit example
* add to docs
* union not addition on sets
* restore nb kernel
* blacken
* replacing conda with uv in estimation tests
* add requests to github-action dependencies
* running with created virtual env instead
* Fix estimation notebook tests (#8)
* Update scheduling.py

---------

Co-authored-by: Jeffrey Newman <[email protected]>
Co-authored-by: Yue Shuai <[email protected]>
Co-authored-by: amarin <[email protected]>
Co-authored-by: Ali Etezady <[email protected]>
Co-authored-by: Sijia Wang <[email protected]>
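Several bullets above (parquet format for EDBs, "adding coalesce edbs as its own step", "dropping duplicate columns", "dfs with correct indexes and correct mp sorting") concern merging per-process estimation data bundles (EDBs) back into a single table. A minimal pandas sketch of that coalescing idea; the function name and the drop-duplicate-columns detail are illustrative assumptions, not ActivitySim's actual implementation:

```python
import pandas as pd

def coalesce_edb(frames: list[pd.DataFrame]) -> pd.DataFrame:
    # Illustrative sketch only: stack the same EDB table as written by each
    # multiprocess subdirectory, drop any duplicated columns, and sort by
    # index so the combined table matches what a single-process run would
    # have produced.
    combined = pd.concat(frames)
    combined = combined.loc[:, ~combined.columns.duplicated()]
    return combined.sort_index()
```

Combining two per-process fragments this way restores a stable row order regardless of which process handled which households.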
1 parent 10f8f44 commit 1a94629

File tree

161 files changed: +68,669 -42,774 lines


.github/workflows/core_tests.yml

Lines changed: 78 additions & 1 deletion
@@ -321,6 +321,83 @@ jobs:
       run: |
         uv run pytest activitysim/estimation/test/test_larch_estimation.py --durations=0

+  estimation_notebooks:
+    needs: foundation
+    env:
+      python-version: "3.10"
+      label: win-64
+    defaults:
+      run:
+        shell: pwsh
+    name: Estimation Notebooks Test
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: "Set up Python"
+        uses: actions/setup-python@v5
+        with:
+          python-version-file: ".python-version"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "0.7.12"
+          enable-cache: true
+          cache-dependency-glob: "uv.lock"
+
+      - name: setup graphviz
+        uses: ts-graphviz/setup-graphviz@v2
+
+      - name: Install activitysim
+        run: |
+          uv sync --locked --group github-action
+
+      - name: Create Estimation Data
+        run: >
+          uv run --group github-action python activitysim/examples/example_estimation/notebooks/est_mode_setup.py
+          --household_sample_size 5000
+
+      - name: Test Estimation Notebooks
+        run: >
+          uv run --group github-action pytest activitysim/examples/example_estimation/notebooks
+          --nbmake-timeout=3000
+          --ignore=activitysim/examples/example_estimation/notebooks/01_estimation_mode.ipynb
+          --ignore-glob=activitysim/examples/example_estimation/notebooks/test-estimation-data/**
+
+  estimation_edb_creation:
+    needs: foundation
+    env:
+      python-version: "3.10"
+      label: win-64
+    defaults:
+      run:
+        shell: pwsh
+    name: estimation_edb_creation_test
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "0.7.12"
+          enable-cache: true
+          cache-dependency-glob: "uv.lock"
+
+      - name: "Set up Python"
+        uses: actions/setup-python@v5
+        with:
+          python-version-file: ".python-version"
+
+      - name: Install activitysim
+        run: |
+          uv sync --locked --only-group github-action
+
+      - name: Test Estimation EDB Creation
+        run: |
+          uv run pytest activitysim/estimation/test/test_edb_creation/test_edb_formation.py --durations=0
+
   expression-profiling:
     needs: foundation
     env:
@@ -397,4 +474,4 @@ jobs:
       github_token: ${{ secrets.GITHUB_TOKEN }}
       # Token is created automatically by Github Actions, no other config needed
       publish_dir: ./docs/_build/html
-      destination_dir: develop
+      destination_dir: develop

activitysim/abm/models/cdap.py

Lines changed: 2 additions & 2 deletions
@@ -195,7 +195,7 @@ def cdap_simulate(
         estimator.write_coefficients(coefficients_df, model_settings)
         estimator.write_table(
             cdap_interaction_coefficients,
-            "interaction_coefficients",
+            "cdap_interaction_coefficients",
             index=False,
             append=False,
         )
@@ -204,7 +204,7 @@ def cdap_simulate(
             spec = cdap.get_cached_spec(state, hhsize)
             estimator.write_table(spec, "spec_%s" % hhsize, append=False)
             if add_joint_tour_utility:
-                joint_spec = cdap.get_cached_joint_spec(hhsize)
+                joint_spec = cdap.get_cached_joint_spec(state, hhsize)
                 estimator.write_table(
                     joint_spec, "joint_spec_%s" % hhsize, append=False
                 )

activitysim/abm/models/disaggregate_accessibility.py

Lines changed: 6 additions & 5 deletions
@@ -764,11 +764,12 @@ def get_disaggregate_logsums(
        state.filesystem, model_name + ".yaml"
    )
    model_settings.SAMPLE_SIZE = disagg_model_settings.DESTINATION_SAMPLE_SIZE
-    estimator = estimation.manager.begin_estimation(state, trace_label)
-    if estimator:
-        location_choice.write_estimation_specs(
-            state, estimator, model_settings, model_name + ".yaml"
-        )
+    # estimator = estimation.manager.begin_estimation(state, trace_label)
+    # if estimator:
+    #     location_choice.write_estimation_specs(
+    #         state, estimator, model_settings, model_name + ".yaml"
+    #     )
+    estimator = None

    # Append table references in settings with "proto_"
    # This avoids having to make duplicate copies of config files for disagg accessibilities

activitysim/abm/models/joint_tour_frequency.py

Lines changed: 4 additions & 1 deletion
@@ -192,16 +192,19 @@ def joint_tour_frequency(
        print(f"len(joint_tours) {len(joint_tours)}")

        different = False
+        # need to check households as well because the full survey sample may not be used
+        # (e.g. if we set household_sample_size in settings.yaml)
        survey_tours_not_in_tours = survey_tours[
            ~survey_tours.index.isin(joint_tours.index)
+            & survey_tours.household_id.isin(households.index)
        ]
        if len(survey_tours_not_in_tours) > 0:
            print(f"survey_tours_not_in_tours\n{survey_tours_not_in_tours}")
            different = True
        tours_not_in_survey_tours = joint_tours[
            ~joint_tours.index.isin(survey_tours.index)
        ]
-        if len(survey_tours_not_in_tours) > 0:
+        if len(tours_not_in_survey_tours) > 0:
            print(f"tours_not_in_survey_tours\n{tours_not_in_survey_tours}")
            different = True
        assert not different
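The added `household_id.isin(households.index)` guard matters whenever `household_sample_size` limits the run to a subset of the survey households. A self-contained toy illustration of the filter, with invented data:

```python
import pandas as pd

# invented toy data: three surveyed joint tours, but only households 1 and 2
# were actually simulated (household_sample_size != 0)
survey_tours = pd.DataFrame(
    {"household_id": [1, 2, 3]}, index=pd.Index([10, 11, 12], name="tour_id")
)
joint_tours = pd.DataFrame(
    {"household_id": [1]}, index=pd.Index([10], name="tour_id")
)
households = pd.DataFrame(index=pd.Index([1, 2], name="household_id"))

# without the household filter, tour 12 would be flagged as missing even
# though its household was never simulated; with it, only tour 11 is flagged
survey_tours_not_in_tours = survey_tours[
    ~survey_tours.index.isin(joint_tours.index)
    & survey_tours.household_id.isin(households.index)
]
print(list(survey_tours_not_in_tours.index))  # [11]
```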

activitysim/abm/models/location_choice.py

Lines changed: 2 additions & 35 deletions
@@ -19,7 +19,6 @@
 from activitysim.core.interaction_sample_simulate import interaction_sample_simulate
 from activitysim.core.util import reindex

-
 """
 The school/workplace location model predicts the zones in which various people will
 work or attend school.
@@ -140,7 +139,7 @@ def _location_sample(

    sample_size = model_settings.SAMPLE_SIZE

-    if estimator:
+    if estimator and model_settings.ESTIMATION_SAMPLE_SIZE >= 0:
        sample_size = model_settings.ESTIMATION_SAMPLE_SIZE
        logger.info(
            f"Estimation mode for {trace_label} using sample size of {sample_size}"
@@ -423,7 +422,7 @@ def location_presample(

    # choose a MAZ for each DEST_TAZ choice, choice probability based on MAZ size_term fraction of TAZ total
    maz_choices = tour_destination.choose_MAZ_for_TAZ(
-        state, taz_sample, MAZ_size_terms, trace_label
+        state, taz_sample, MAZ_size_terms, trace_label, model_settings
    )

    assert DEST_MAZ in maz_choices
@@ -512,38 +511,6 @@ def run_location_sample(
        trace_label=trace_label,
    )

-    # adding observed choice to alt set when running in estimation mode
-    if estimator:
-        # grabbing survey values
-        survey_persons = estimation.manager.get_survey_table("persons")
-        if "school_location" in trace_label:
-            survey_choices = survey_persons["school_zone_id"].reset_index()
-        elif ("workplace_location" in trace_label) and ("external" not in trace_label):
-            survey_choices = survey_persons["workplace_zone_id"].reset_index()
-        else:
-            return choices
-        survey_choices.columns = ["person_id", "alt_dest"]
-        survey_choices = survey_choices[
-            survey_choices["person_id"].isin(choices.index)
-            & (survey_choices.alt_dest > 0)
-        ]
-        # merging survey destination into table if not available
-        joined_data = survey_choices.merge(
-            choices, on=["person_id", "alt_dest"], how="left", indicator=True
-        )
-        missing_rows = joined_data[joined_data["_merge"] == "left_only"]
-        missing_rows["pick_count"] = 1
-        if len(missing_rows) > 0:
-            new_choices = missing_rows[
-                ["person_id", "alt_dest", "prob", "pick_count"]
-            ].set_index("person_id")
-            choices = choices.append(new_choices, ignore_index=False).sort_index()
-            # making probability the mean of all other sampled destinations by person
-            # FIXME is there a better way to do this? Does this even matter for estimation?
-            choices["prob"] = choices["prob"].fillna(
-                choices.groupby("person_id")["prob"].transform("mean")
-            )
-
    return choices
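In the `_location_sample` hunk above, the estimation override now applies only when `ESTIMATION_SAMPLE_SIZE` is non-negative; a negative setting leaves the regular sample size in place. A hedged sketch of that gating (the helper name is invented; in ActivitySim the values come from the model settings object):

```python
def effective_sample_size(
    sample_size: int, estimation_sample_size: int, estimating: bool
) -> int:
    # mirrors `if estimator and model_settings.ESTIMATION_SAMPLE_SIZE >= 0`:
    # only estimation mode with a non-negative configured value overrides the
    # regular sample size (0 conventionally meaning "use all alternatives")
    if estimating and estimation_sample_size >= 0:
        return estimation_sample_size
    return sample_size
```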
activitysim/abm/models/non_mandatory_tour_frequency.py

Lines changed: 14 additions & 4 deletions
@@ -289,14 +289,22 @@ def non_mandatory_tour_frequency(
        )

        if estimator:
-            estimator.write_spec(model_settings, bundle_directory=True)
+            bundle_directory = True
+            # writing to separate subdirectory for each segment if multiprocessing
+            if state.settings.multiprocess:
+                bundle_directory = False
+            estimator.write_spec(model_settings, bundle_directory=bundle_directory)
            estimator.write_model_settings(
-                model_settings, model_settings_file_name, bundle_directory=True
+                model_settings,
+                model_settings_file_name,
+                bundle_directory=bundle_directory,
            )
            # preserving coefficients file name makes bringing back updated coefficients more straightforward
            estimator.write_coefficients(coefficients_df, segment_settings)
            estimator.write_choosers(chooser_segment)
-            estimator.write_alternatives(alternatives, bundle_directory=True)
+            estimator.write_alternatives(
+                alternatives, bundle_directory=bundle_directory
+            )

            # FIXME #interaction_simulate_estimation_requires_chooser_id_in_df_column
            # should we do it here or have interaction_simulate do it?
@@ -435,8 +443,10 @@ def non_mandatory_tour_frequency(
    if estimator:
        # make sure they created the right tours
        survey_tours = estimation.manager.get_survey_table("tours").sort_index()
+        # need the household_id check below in case household_sample_size != 0
        non_mandatory_survey_tours = survey_tours[
-            survey_tours.tour_category == "non_mandatory"
+            (survey_tours.tour_category == "non_mandatory")
+            & survey_tours.household_id.isin(persons.household_id)
        ]
        # need to remove the pure-escort tours from the survey tours table for comparison below
        if state.is_table("school_escort_tours"):
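The `bundle_directory` switch above recurs across several models in this commit: single-process runs write shared files to the model's bundle directory, while multiprocess runs write one subdirectory per process/segment that a later coalesce step combines. A sketch of the implied layout; the helper and directory names are assumptions for illustration, not ActivitySim's actual API:

```python
from pathlib import Path

def edb_output_dir(
    edb_root: str, model_name: str, multiprocess: bool, subdir: str = ""
) -> Path:
    # illustrative only: shared bundle directory in single-process runs,
    # per-process subdirectory under multiprocessing (coalesced afterwards)
    base = Path(edb_root) / model_name
    return base / subdir if multiprocess else base
```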

activitysim/abm/models/school_escorting.py

Lines changed: 4 additions & 1 deletion
@@ -503,7 +503,10 @@ def school_escorting(
                coefficients_df, file_name=stage.upper() + "_COEFFICIENTS"
            )
            estimator.write_choosers(choosers)
-            estimator.write_alternatives(alts, bundle_directory=True)
+            if state.settings.multiprocess:
+                estimator.write_alternatives(alts, bundle_directory=False)
+            else:
+                estimator.write_alternatives(alts, bundle_directory=True)

            # FIXME #interaction_simulate_estimation_requires_chooser_id_in_df_column
            # should we do it here or have interaction_simulate do it?

activitysim/abm/models/stop_frequency.py

Lines changed: 14 additions & 4 deletions
@@ -191,9 +191,15 @@ def stop_frequency(

        if estimator:
            estimator.write_spec(segment_settings, bundle_directory=False)
-            estimator.write_model_settings(
-                model_settings, model_settings_file_name, bundle_directory=True
-            )
+            # writing to separate subdirectory for each segment if multiprocessing
+            if state.settings.multiprocess:
+                estimator.write_model_settings(
+                    model_settings, model_settings_file_name, bundle_directory=False
+                )
+            else:
+                estimator.write_model_settings(
+                    model_settings, model_settings_file_name, bundle_directory=True
+                )
            estimator.write_coefficients(coefficients_df, segment_settings)
            estimator.write_choosers(chooser_segment)

@@ -265,7 +271,11 @@ def stop_frequency(

        survey_trips = estimation.manager.get_survey_table(table_name="trips")
        different = False
-        survey_trips_not_in_trips = survey_trips[~survey_trips.index.isin(trips.index)]
+        # need the check below on household_id in case household_sample_size != 0
+        survey_trips_not_in_trips = survey_trips[
+            ~survey_trips.index.isin(trips.index)
+            & survey_trips.household_id.isin(trips.household_id)
+        ]
        if len(survey_trips_not_in_trips) > 0:
            print(f"survey_trips_not_in_trips\n{survey_trips_not_in_trips}")
            different = True
