Real-time courier delivery simulation on Databricks. Uses Lakeflow Spark Declarative Pipelines for streaming spatial analytics and a Databricks App for live map visualization.
Demonstrates GPS tracking, destination proximity via ST_DWithin (POI point vs ping), ST_Buffer rings on the map for 1 km / 100 m context, route display, and the Overture Maps road network, all built on Unity Catalog spatial SQL (ST_Point, ST_Buffer, ST_Intersects, ST_DWithin on supported warehouses).
```
Simulation (notebook or CLI)
└─ emits JSON events → Unity Catalog Volume

Lakeflow SDP Pipeline (serverless, continuous streaming)
├── Streaming tables — raw ingest via Auto Loader
└── Materialized views — spatial analytics (geofences, routes, road network)

Databricks App (courier-delivery-map)
└── Live pydeck map with 15 s auto-refresh
```
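The simulator writes newline-delimited JSON events to the Volume, which Auto Loader then picks up. A minimal sketch of what one such event might look like; the field names here are assumptions for illustration, not the package's actual schema (see `events.py` in the `python/` package for the real one):

```python
import json
import time

# Illustrative shape of a single GPS ping event as it might land in the Volume.
# Field names are assumptions, not the real event schema.
event = {
    "event_type": "gps_event",
    "courier_id": "courier-007",
    "lon": -97.7431,   # Austin-ish coordinates
    "lat": 30.2672,
    "ts": time.time(),
}
line = json.dumps(event)  # one JSON object per line pairs well with Auto Loader JSON ingest
```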
The Databricks Asset Bundle (dab/) is the primary end-to-end deliverable — it deploys both the pipeline and the app together. The standalone simulation notebook can also be used independently without the full DAB deployment.
The Python simulator assigns delivery targets from two shuffled queues — inner (central fraction of the bbox) and outer (the rest) — with a default 75% / 25% preference for inner-first vs outer-first picks, to spread deliveries beyond dense downtown cores. See python/README.md and docs/user_guide_simulation.md.
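The two-queue assignment can be sketched as follows. This is a minimal sketch of the idea, not the package's actual API; the function name, queue contents, and fallback behavior are illustrative:

```python
import random

def pick_target(inner, outer, inner_first_prob=0.75, rng=random):
    """Pick the next delivery target from two shuffled queues.

    Prefers the inner (downtown) queue with probability inner_first_prob,
    falling back to the other queue when the preferred one is empty.
    """
    first, second = (inner, outer) if rng.random() < inner_first_prob else (outer, inner)
    for queue in (first, second):
        if queue:
            return queue.pop()
    return None  # both queues exhausted

rng = random.Random(42)
inner = ["downtown_a", "downtown_b"]   # central fraction of the bbox
outer = ["suburb_a", "suburb_b"]       # the rest
rng.shuffle(inner)
rng.shuffle(outer)
target = pick_target(inner, outer, rng=rng)
```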
```
lakeflow_sdp/
├── dab/                          # PRIMARY DELIVERABLE — Databricks Asset Bundle
│   ├── databricks.yml            # Bundle config (targets: dev, prod)
│   ├── resources/
│   │   ├── courier_delivery_pipeline.pipeline.yml
│   │   └── courier_delivery_app.app.yml
│   ├── src/app/                  # Databricks App source (Dash + pydeck)
│   │   ├── app.py                # Layout, auto-refresh, viewport tracking
│   │   ├── queries.py            # SQL fetch per layer
│   │   ├── layers.py             # pydeck layer builders
│   │   ├── app.yaml              # App runtime config + env vars
│   │   └── requirements.txt
│   └── README.md                 # DAB-specific deploy guide
│
├── python/                       # lakeflow_delivery_sim package
│   ├── src/lakeflow_delivery_sim/
│   │   ├── config.py             # SimConfig, OvertureSource, AUSTIN_BBOX
│   │   ├── runner.py             # High-level run_sim_with_progress()
│   │   ├── simulation.py         # Core simulation engine
│   │   ├── events.py             # Event types + JSON serialization
│   │   ├── routing.py            # OSMnx graph + route planning
│   │   ├── overture.py           # Overture data loading + division lookup
│   │   ├── stac_download.py      # Overture STAC download (idempotent)
│   │   └── geofence.py           # Geofence checks
│   ├── tests/
│   └── pyproject.toml
│
├── notebooks/
│   └── courier_delivery_simulation.ipynb  # Standalone simulation notebook
│
├── scripts/                      # Build and deploy scripts
│   ├── build_dab_local.py        # Validate DAB locally (no deploy)
│   ├── push_dab_to_workspace.py  # Build WHL + deploy DAB
│   ├── push_whl_to_volume.py     # Upload WHL to Volume
│   ├── push_notebook_to_workspace.py
│   ├── databricks_config.example.env  # Copy to databricks_config.env
│   └── databricks_config.env     # Local auth config (gitignored)
│
├── docs/                         # Reference documentation
│   └── user_guide_simulation.md  # Simulation usage guide
│
└── .cursor/commands/             # Cursor IDE shortcuts
    ├── sdp-build-dab-local       # Validate DAB locally
    ├── sdp-push-dab              # Build + deploy to Databricks
    ├── sdp-push-whl              # Upload WHL only
    └── sdp-push-notebook         # Push simulation notebook
```
notebooks/courier_delivery_simulation.ipynb runs the delivery simulation independently — no DAB deployment required. Use this to:
- Download Overture Maps data (places, roads, divisions) for any city
- Run a live simulation (e.g. 50 couriers over Austin TX for 5 minutes)
- Generate bulk historical event data (date ranges, multiple windows)
- Look up named administrative divisions (e.g. Travis County bounding box)
The notebook runs on a Databricks cluster and writes events to a Unity Catalog Volume. It also works locally with python/ installed and Overture data downloaded to disk.
Quick start (in notebook):

```python
from lakeflow_delivery_sim.config import SimConfig, OvertureSource, AUSTIN_BBOX
from lakeflow_delivery_sim.runner import run_sim_with_progress

config = SimConfig(
    source=OvertureSource.databricks(catalog="main", schema="courier_delivery"),
    bounds=AUSTIN_BBOX,
    num_couriers=50,
)
mgr = config.create_manager()
result = run_sim_with_progress(mgr, event_dir, seed=42)
# Wall limit after first assign defaults to config.sim_duration_sec; pass sim_duration=... to override.
```

The Databricks Asset Bundle in dab/ deploys the full demo stack:
| Resource | Name | Description |
|---|---|---|
| DLT Pipeline | `[dev] Courier Delivery Spatial Pipeline` | Streaming ingest + spatial MVs |
| Databricks App | `courier-delivery-map-dev` | Live pydeck map |
| Table | Type | Description |
|---|---|---|
| `simulation_event` | Streaming | Raw sim config events |
| `gps_event` | Streaming | GPS pings with ST_Point geometry |
| `poi_event` | Streaming | Planned routes per delivery |
| `delivery_completed_event` | Streaming | Completed deliveries |
| `delivery_abandoned_event` | Streaming | Timed-out deliveries |
| `simulation_bbox` | MV | Latest sim bounding box polygon |
| `poi_1km_geofence` | MV | 1 km ST_Buffer polygon per POI for display; courier proximity uses ST_DWithin (POI point vs GPS) in app SQL |
| `poi_100m_geofence` | MV | 100 m ST_Buffer for display; proximity via ST_DWithin in app SQL |
| `active_route` | MV | Current in-flight route per courier |
| `road_network` | MV | Overture road segments (from Volume) |
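The ST_DWithin proximity check mentioned above can be sketched as the kind of query the app might build per refresh. This is an illustrative sketch only: the `lon`/`lat` column names and the join shape are assumptions, not the app's actual SQL:

```python
def couriers_near_poi_sql(catalog: str, schema: str, radius_m: float = 100.0) -> str:
    """Build an illustrative proximity query: couriers within radius_m of a POI.

    Column names (lon, lat, courier_id, poi_id) are assumptions for the sketch.
    """
    return f"""
SELECT g.courier_id, p.poi_id
FROM {catalog}.{schema}.gps_event AS g
JOIN {catalog}.{schema}.poi_event AS p
  ON ST_DWithin(ST_Point(g.lon, g.lat), ST_Point(p.lon, p.lat), {radius_m})
"""

sql = couriers_near_poi_sql("main", "courier_delivery")
```

ST_DWithin on the raw points keeps the proximity test in SQL, while the ST_Buffer geofence MVs exist purely so the map has polygons to draw.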
- Databricks CLI (system-level Go binary, not `pip install databricks-cli`):

  ```shell
  brew install databricks   # macOS
  brew upgrade databricks   # must be 0.239.0+ for apps support
  databricks --version
  ```

- Python `.venv`:

  ```shell
  python3 -m venv .venv && .venv/bin/pip install -e 'python/[databricks]'
  ```

- Config: Copy `scripts/databricks_config.example.env` to `scripts/databricks_config.env` and set:
  - `DATABRICKS_HOST` + `DATABRICKS_TOKEN` (or `DATABRICKS_CONFIG_PROFILE`)
  - `SDP_VOL_SIM_DIR` — Volume path for the WHL and simulation event output
  - `SDP_WS_DAB` — Workspace base path; short form preferred, e.g. `Users/you@databricks.com/Lakeflow` (see `dab/README.md` and `.cursor/commands/sdp-push-dab.md`)
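A filled-in `scripts/databricks_config.env` might look like the following; every value below is a placeholder, so substitute your own host, token, and paths:

```shell
# Example scripts/databricks_config.env — all values are placeholders
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi...
SDP_VOL_SIM_DIR=/Volumes/main/courier_delivery/sim
SDP_WS_DAB=Users/you@databricks.com/Lakeflow
```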
Validate locally (no deploy):

```shell
bash .cursor/commands/sdp-build-dab-local.sh
# or
python scripts/build_dab_local.py
```

Deploy to Databricks:
```shell
bash .cursor/commands/sdp-push-dab.sh
# or
python scripts/push_dab_to_workspace.py
```

Runs three steps: WHL build + upload to Volume, `databricks bundle deploy`, then `databricks workspace import-dir` so `.../files/` contains the pipeline SQL, explorations notebook, and app code.

Deploys to: `{SDP_WS_DAB}/courier-delivery-sdp/dev/` (bundle files under `.../dev/files/`).
- Start pipeline: Workflows → Delta Live Tables → `[dev] Courier Delivery Spatial Pipeline` → Start
- Run simulation: Open the exploration notebook in the workspace at `.../courier-delivery-sdp/dev/files/src/courier_delivery_pipeline/explorations/austin_delivery_sim`, download Overture data (first time), then run the simulation cells
- Open app: Compute → Apps → `courier-delivery-map-dev` → Open
See dab/README.md for the full walkthrough, layer guide, and cleanup instructions.
Recommended flow for a clean, repeatable demo (e.g. showing full table refresh and live data):
- Deploy the DAB — Run `bash .cursor/commands/sdp-push-dab.sh` (or `python scripts/push_dab_to_workspace.py`). Ensure the pipeline and app exist in the workspace.
- Run the notebook — Open the exploration notebook (`austin_delivery_sim`) and Run all. The notebook downloads Overture data (first time), cleans the event directory, then runs the simulation and writes events to the Volume.
- Full table refresh — After the notebook gets past the cleaning phase (event dir cleared), stop the current SDP pipeline (Workflows → DLT → Stop). Then start it again with Run with full table refresh. This drops and recreates tables so the pipeline ingests from a known-good state.
- Open the app — Once the app is fully started (Compute → Apps → `courier-delivery-map-dev` → Open), watch progress. The map and dashboard refresh every 15 seconds. If tables are still refreshing, the app shows a short message and retries; it does not hang.
Tips: Open the app before or during the full refresh if you want to show “Tables refreshing… → data appears” as the pipeline comes back. The app is resilient: during full refresh it may show “Temporary error — retry in 15s” or “unreadable” counts until tables exist again, then data appears on the next cycle.
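The "show a message and retry" behavior amounts to catching fetch errors per refresh cycle instead of letting them crash the page. A minimal sketch of that pattern; `fetch_layer`, the dict shape, and the error text are illustrative, not the app's actual code:

```python
def refresh(fetch_layer, retry_secs=15):
    """One refresh cycle: return layer data, or a retry notice if the fetch fails.

    During a full table refresh, queries against dropped tables raise; the app
    surfaces a message and tries again on the next auto-refresh tick.
    """
    try:
        return {"status": "ok", "data": fetch_layer()}
    except Exception as exc:  # e.g. table not found while the pipeline rebuilds
        return {"status": "retry", "message": f"Temporary error - retry in {retry_secs}s ({exc})"}

def failing_fetch():
    raise RuntimeError("TABLE_OR_VIEW_NOT_FOUND")

ok = refresh(lambda: [1, 2, 3])
retrying = refresh(failing_fetch)
```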
For a step-by-step script, timing, and troubleshooting, see dab/HOW_TO_DEMO.md.
The python/ package is the simulation engine. It can run standalone (local Overture data) or on Databricks (Unity Catalog tables). See python/README.md for the full API reference.
Key cities with pre-defined bounding boxes:

```python
from lakeflow_delivery_sim.config import get_sample_bboxes

bboxes = get_sample_bboxes()
# Austin TX, San Francisco CA, Chicago IL, New York NY,
# Seattle WA, Denver CO, London UK, Paris FR
```

Install:
```shell
pip install -e 'python/[databricks]'   # Databricks + Spark deps
pip install -e 'python/[dev]'          # local dev + tests
```

```shell
# Set up venv
python3 -m venv .venv
source .venv/bin/activate
pip install -e 'python/[dev]'

# Run tests
cd python && pytest

# Run a quick simulation locally (requires Overture data)
lakeflow-delivery-sim run --local-path data/overture/austin --num-couriers 10 --sim-duration 60
```

`scripts/databricks_config.env` is gitignored — copy from `scripts/databricks_config.example.env` after clone. To see everything Git is ignoring: `git status --ignored --short | grep '^!!'`. Full table of patterns: docs/gitignore-reference.md.