This repository contains the code to test the Parquet file format with SUMO. It also shows the power of Parquet + DeckGL.
- `docker` and `docker-compose` installed
- `sumo` installed (only using the Python tools)
- `python` installed
- `pandas` and `polars`
- `duckdb` for quick Parquet transformations
The app in point_webworker displays the power of using GeoParquet as the intermediate format for SUMO.
It reads in the emission output from SUMO and replays the movement of all cars in Tuscaloosa on a second-by-second basis.
An example of using it is:
# Convert parquet to geoparquet (w/ DuckDB)
bash ./scripts/parquet_2_geoparquet.sh ./SUMO_SIM/final_model_20240725/NEMA_tls_micro/emission.parquet ./SUMO_SIM/final_model_20240725/NEMA_tls_micro/emissions-geoparquet
# set this to a path to a geoparquet file (this can be generated from the above script)
export PARQUET_FILE="./SUMO_SIM/final_model_20240725/NEMA_tls_micro/emissions-geoparquet/time_group=2/data_0.parquet"
# set this to the path to the webworker app
export APP_DIR="./parquet-test/point_webworker"
cd docker/webapp
# the docker compose I set up for this is annoying. Excuse my --no-cache
docker compose build --no-cache && docker compose up
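The second-by-second replay described above amounts to grouping vehicle positions by timestep and stepping through the groups in order. The following is a minimal sketch of that idea in Python, not the app's actual implementation. The row layout (`time`, vehicle id, `x`, `y`), the sample values, and the `build_frames` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (time in seconds, vehicle id, x, y).
# The layout and values are illustrative, not taken from the repository's data.
rows = [
    (0.0, "veh0", 10.0, 5.0),
    (0.0, "veh1", 20.0, 7.5),
    (1.0, "veh0", 12.0, 5.0),
    (1.0, "veh1", 21.5, 7.5),
    (2.0, "veh1", 23.0, 7.5),
]


def build_frames(rows):
    """Group vehicle positions by whole-second timestep."""
    frames = defaultdict(list)
    for time, veh_id, x, y in rows:
        frames[int(time)].append((veh_id, x, y))
    # Sort by timestep so playback runs in time order.
    return dict(sorted(frames.items()))


frames = build_frames(rows)
for second, positions in frames.items():
    # A real viewer would hand each frame to the renderer here.
    print(second, positions)
```

Each frame holds every vehicle visible in that second, so a renderer only needs the current frame to draw the scene.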
I wrote a quick script to demo the difference in SUMO -> Intermediate -> Python DataFrame speed, using the emissions export, when the intermediate is saved as .xml vs. .parquet.
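To illustrate where the XML read time goes, here is a minimal sketch of streaming an emission export into rows with the Python standard library. It is not the repository's benchmark script. The XML layout (an `emission-export` root, `timestep` elements with a `time` attribute, `vehicle` elements with `id`, `CO2`, `x`, `y`, `speed`) is an assumption about SUMO's emission output, the sample data is made up, and `read_emission_xml` is a hypothetical helper.

```python
import io
import time
import xml.etree.ElementTree as ET

# Tiny stand-in for an emission export. The layout is assumed, the values are made up.
SAMPLE_XML = b"""<emission-export>
  <timestep time="0.00">
    <vehicle id="veh0" CO2="2624.72" x="10.00" y="5.00" speed="0.00"/>
  </timestep>
  <timestep time="1.00">
    <vehicle id="veh0" CO2="3100.10" x="12.00" y="5.00" speed="2.00"/>
    <vehicle id="veh1" CO2="2624.72" x="20.00" y="7.50" speed="0.00"/>
  </timestep>
</emission-export>"""


def read_emission_xml(source):
    """Stream the XML and return one dict per vehicle per timestep."""
    rows = []
    current_time = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "timestep":
            # Attributes are already available on the start event.
            current_time = float(elem.get("time"))
        elif event == "end" and elem.tag == "vehicle":
            row = {"time": current_time, "id": elem.get("id")}
            # Every numeric value has to be converted from text.
            for key in ("CO2", "x", "y", "speed"):
                row[key] = float(elem.get(key))
            rows.append(row)
        elif event == "end" and elem.tag == "timestep":
            # Free the finished timestep so memory stays bounded.
            elem.clear()
    return rows


start = time.perf_counter()
rows = read_emission_xml(io.BytesIO(SAMPLE_XML))
elapsed = time.perf_counter() - start
print(f"parsed {len(rows)} rows in {elapsed:.6f} s")
```

Every value in the XML path is parsed from text and converted one attribute at a time. A Parquet file stores typed columns, so a reader such as `pandas.read_parquet` or `polars.read_parquet` can skip that step.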
To run the tests, you must first build the Docker images for SUMO with and without Parquet support (I use separate images: one built at SUMO HEAD and one at SUMO HEAD + Parquet support).
cd docker && docker-compose build
bash ./scripts/run_sim_docker.sh ./<PATH TO SIM DIR>
For example:
bash ./scripts/run_sim_docker.sh ./SUMO_SIM/final_model_20240725/default_tls_micro

| Format | Sim Time (s) | Read Time (s) | Total Time (s) | File Size (MB) |
|---|---|---|---|---|
| XML | 393.6 | 127.5 | 521.18 | 1200 |
| Parquet | 363.1 | 0.972 | 364.10 | 336 |
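The ratios implied by the table can be worked out directly. The short sketch below only restates the numbers from the table above. The dictionary layout and variable names are illustrative.

```python
# Figures copied from the results table above.
results = {
    "XML": {"sim_s": 393.6, "read_s": 127.5, "total_s": 521.18, "size_mb": 1200},
    "Parquet": {"sim_s": 363.1, "read_s": 0.972, "total_s": 364.10, "size_mb": 336},
}

xml, pq = results["XML"], results["Parquet"]

# How many times faster or smaller the Parquet path is than the XML path.
read_speedup = xml["read_s"] / pq["read_s"]
total_speedup = xml["total_s"] / pq["total_s"]
size_ratio = xml["size_mb"] / pq["size_mb"]

print(f"read speedup:  {read_speedup:.1f}x")
print(f"total speedup: {total_speedup:.2f}x")
print(f"size ratio:    {size_ratio:.2f}x")
```

On these numbers the read step is roughly 131x faster with Parquet, the end-to-end run is about 1.43x faster, and the file is about 3.6x smaller. Most of the end-to-end gain comes from the read step, since the simulation time differs by only about 30 s.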