Real-time courier delivery simulation on Databricks. Uses Lakeflow Spark Declarative Pipelines for streaming spatial analytics and a Databricks App for live map visualization.
Demonstrates GPS tracking, destination proximity via ST_DWithin (POI point vs ping), ST_Buffer rings on the map for 1 km / 100 m context, route display, and the Overture Maps road network, all built on Unity Catalog spatial SQL (ST_Point, ST_Buffer, ST_Intersects, ST_DWithin on supported warehouses).
```
Simulation (notebook or CLI)
└─ emits JSON events → Unity Catalog Volume

Lakeflow SDP Pipeline (serverless, continuous streaming)
├── Streaming tables — raw ingest via Auto Loader
└── Materialized views — spatial analytics (geofences, routes, road network)

Databricks App (courier-delivery-map)
└── Live pydeck map with 15 s auto-refresh
```
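The simulator writes newline-delimited JSON events to the Volume, which Auto Loader then picks up. A minimal sketch of what one such event might look like; the field names here are assumptions for illustration, not the package's actual schema (see `events.py` in the `python/` package for the real one):

```python
import json
import time

# Illustrative shape of a single GPS ping event as it might land in the Volume.
# Field names are assumptions, not the real event schema.
event = {
    "event_type": "gps_event",
    "courier_id": "courier-007",
    "lon": -97.7431,   # Austin-ish coordinates
    "lat": 30.2672,
    "ts": time.time(),
}
line = json.dumps(event)  # one JSON object per line pairs well with Auto Loader JSON ingest
```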
The Databricks Asset Bundle (dab/) is the primary end-to-end deliverable — it deploys both the pipeline and the app together. The standalone simulation notebook can also be used independently without the full DAB deployment.
The Python simulator assigns delivery targets from two shuffled queues — inner (central fraction of the bbox) and outer (the rest) — with a default 75% / 25% preference for inner-first vs outer-first picks, to spread deliveries beyond dense downtown cores. See python/README.md and docs/user_guide_simulation.md.
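The two-queue assignment can be sketched as follows. This is a minimal sketch of the idea, not the package's actual API; the function name, queue contents, and fallback behavior are illustrative:

```python
import random

def pick_target(inner, outer, inner_first_prob=0.75, rng=random):
    """Pick the next delivery target from two shuffled queues.

    Prefers the inner (downtown) queue with probability inner_first_prob,
    falling back to the other queue when the preferred one is empty.
    """
    first, second = (inner, outer) if rng.random() < inner_first_prob else (outer, inner)
    for queue in (first, second):
        if queue:
            return queue.pop()
    return None  # both queues exhausted

rng = random.Random(42)
inner = ["downtown_a", "downtown_b"]   # central fraction of the bbox
outer = ["suburb_a", "suburb_b"]       # the rest
rng.shuffle(inner)
rng.shuffle(outer)
target = pick_target(inner, outer, rng=rng)
```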
```
lakeflow_sdp/
├── dab/                          # PRIMARY DELIVERABLE — Databricks Asset Bundle
│   ├── databricks.yml            # Bundle config (targets: dev, prod)
│   ├── resources/
│   │   ├── courier_delivery_pipeline.pipeline.yml
│   │   └── courier_delivery_app.app.yml
│   ├── src/app/                  # Databricks App source (Dash + pydeck)
│   │   ├── app.py                # Layout, auto-refresh, viewport tracking
│   │   ├── queries.py            # SQL fetch per layer
│   │   ├── layers.py             # pydeck layer builders
│   │   ├── app.yaml              # App runtime config + env vars
│   │   └── requirements.txt
│   └── README.md                 # DAB-specific deploy guide
│
├── python/                       # lakeflow_delivery_sim package
│   ├── src/lakeflow_delivery_sim/
│   │   ├── config.py             # SimConfig, OvertureSource, AUSTIN_BBOX
│   │   ├── runner.py             # High-level run_sim_with_progress()
│   │   ├── simulation.py         # Core simulation engine
│   │   ├── events.py             # Event types + JSON serialization
│   │   ├── routing.py            # OSMnx graph + route planning
│   │   ├── overture.py           # Overture data loading + division lookup
│   │   ├── stac_download.py      # Overture STAC download (idempotent)
│   │   └── geofence.py           # Geofence checks
│   ├── tests/
│   └── pyproject.toml
│
├── notebooks/
│   └── courier_delivery_simulation.ipynb  # Standalone simulation notebook
│
├── scripts/                      # Build and deploy scripts
│   ├── build_dab_local.py        # Validate DAB locally (no deploy)
│   ├── push_dab_to_workspace.py  # Build WHL + deploy DAB
│   ├── push_whl_to_volume.py     # Upload WHL to Volume
│   ├── push_notebook_to_workspace.py
│   ├── databricks_config.example.env  # Copy to databricks_config.env
│   └── databricks_config.env     # Local auth config (gitignored)
│
├── docs/                         # Reference documentation
│   └── user_guide_simulation.md  # Simulation usage guide
│
└── .cursor/commands/             # Cursor IDE shortcuts
    ├── sdp-build-dab-local       # Validate DAB locally
    ├── sdp-push-dab              # Build + deploy to Databricks
    ├── sdp-push-whl              # Upload WHL only
    └── sdp-push-notebook         # Push simulation notebook
```
notebooks/courier_delivery_simulation.ipynb runs the delivery simulation independently — no DAB deployment required. Use this to:
- Download Overture Maps data (places, roads, divisions) for any city
- Run a live simulation (e.g. 50 couriers over Austin TX for 5 minutes)
- Generate bulk historical event data (date ranges, multiple windows)
- Look up named administrative divisions (e.g. Travis County bounding box)
The notebook runs on a Databricks cluster and writes events to a Unity Catalog Volume. It also works locally with python/ installed and Overture data downloaded to disk.
Quick start (in notebook):

```python
from lakeflow_delivery_sim.config import SimConfig, OvertureSource, AUSTIN_BBOX
from lakeflow_delivery_sim.runner import run_sim_with_progress

config = SimConfig(
    source=OvertureSource.databricks(catalog="main", schema="courier_delivery"),
    bounds=AUSTIN_BBOX,
    num_couriers=50,
)
mgr = config.create_manager()
result = run_sim_with_progress(mgr, event_dir, seed=42)
# Wall limit after first assign defaults to config.sim_duration_sec; pass sim_duration=... to override.
```

The Databricks Asset Bundle in dab/ deploys the full demo stack:
| Resource | Name | Description |
|---|---|---|
| DLT Pipeline | `[dev] Courier Delivery Spatial Pipeline` | Streaming ingest + spatial MVs |
| Databricks App | `courier-delivery-map-dev` | Live pydeck map |
| Table | Type | Description |
|---|---|---|
| `simulation_event` | Streaming | Raw sim config events |
| `gps_event` | Streaming | GPS pings with ST_Point geometry |
| `poi_event` | Streaming | Planned routes per delivery |
| `delivery_completed_event` | Streaming | Completed deliveries |
| `delivery_abandoned_event` | Streaming | Timed-out deliveries |
| `simulation_bbox` | MV | Latest sim bounding box polygon |
| `poi_1km_geofence` | MV | 1 km ST_Buffer polygon per POI for display; courier proximity uses ST_DWithin (POI point vs GPS) in app SQL |
| `poi_100m_geofence` | MV | 100 m ST_Buffer for display; proximity via ST_DWithin in app SQL |
| `active_route` | MV | Current in-flight route per courier |
| `road_network` | MV | Overture road segments (from Volume) |
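The ST_DWithin proximity check mentioned above can be sketched as the kind of query the app might build per refresh. This is an illustrative sketch only: the `lon`/`lat` column names and the join shape are assumptions, not the app's actual SQL:

```python
def couriers_near_poi_sql(catalog: str, schema: str, radius_m: float = 100.0) -> str:
    """Build an illustrative proximity query: couriers within radius_m of a POI.

    Column names (lon, lat, courier_id, poi_id) are assumptions for the sketch.
    """
    return f"""
SELECT g.courier_id, p.poi_id
FROM {catalog}.{schema}.gps_event AS g
JOIN {catalog}.{schema}.poi_event AS p
  ON ST_DWithin(ST_Point(g.lon, g.lat), ST_Point(p.lon, p.lat), {radius_m})
"""

sql = couriers_near_poi_sql("main", "courier_delivery")
```

ST_DWithin on the raw points keeps the proximity test in SQL, while the ST_Buffer geofence MVs exist purely so the map has polygons to draw.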
- Databricks CLI (system-level Go binary, not `pip install databricks-cli`):

  ```shell
  brew install databricks   # macOS
  brew upgrade databricks   # must be 0.239.0+ for apps support
  databricks --version
  ```

- Python `.venv`:

  ```shell
  python3 -m venv .venv && .venv/bin/pip install -e 'python/[databricks]'
  ```

- Config: Copy `scripts/databricks_config.example.env` to `scripts/databricks_config.env` and set:
  - `DATABRICKS_HOST` + `DATABRICKS_TOKEN` (or `DATABRICKS_CONFIG_PROFILE`)
  - `SDP_VOL_SIM_DIR` — Volume path for the WHL and simulation event output
  - `SDP_WS_DAB` — Workspace base path; short form preferred, e.g. `Users/you@databricks.com/Lakeflow` (see `dab/README.md` and `.cursor/commands/sdp-push-dab.md`)
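A filled-in `scripts/databricks_config.env` might look like the following; every value below is a placeholder, so substitute your own host, token, and paths:

```shell
# Example scripts/databricks_config.env — all values are placeholders
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi...
SDP_VOL_SIM_DIR=/Volumes/main/courier_delivery/sim
SDP_WS_DAB=Users/you@databricks.com/Lakeflow
```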
Validate locally (no deploy):

```shell
bash .cursor/commands/sdp-build-dab-local.sh
# or
python scripts/build_dab_local.py
```

Deploy to Databricks:
```shell
bash .cursor/commands/sdp-push-dab.sh
# or
python scripts/push_dab_to_workspace.py
```

Runs three steps: WHL build + upload to Volume, `databricks bundle deploy`, then `databricks workspace import-dir` so `.../files/` contains the pipeline SQL, explorations notebook, and app code.

Deploys to: `{SDP_WS_DAB}/courier-delivery-sdp/dev/` (bundle files under `.../dev/files/`).
- Start pipeline: Workflows → Delta Live Tables → `[dev] Courier Delivery Spatial Pipeline` → Start
- Run simulation: Open the exploration notebook in the workspace at `.../courier-delivery-sdp/dev/files/src/courier_delivery_pipeline/explorations/austin_delivery_sim`, download Overture data (first time), then run the simulation cells
- Open app: Compute → Apps → `courier-delivery-map-dev` → Open
See dab/README.md for the full walkthrough, layer guide, and cleanup instructions.
Recommended flow for a clean, repeatable demo (e.g. showing full table refresh and live data):
- Deploy the DAB — Run `bash .cursor/commands/sdp-push-dab.sh` (or `python scripts/push_dab_to_workspace.py`). Ensure the pipeline and app exist in the workspace.
- Run the notebook — Open the exploration notebook (`austin_delivery_sim`) and Run all. The notebook downloads Overture data (first time), cleans the event directory, then runs the simulation and writes events to the Volume.
- Full table refresh — After the notebook gets past the cleaning phase (event dir cleared), stop the current SDP pipeline (Workflows → DLT → Stop). Then start it again with Run with full table refresh. This drops and recreates tables so the pipeline ingests from a known-good state.
- Open the app — Once the app is fully started (Compute → Apps → `courier-delivery-map-dev` → Open), watch progress. The map and dashboard refresh every 15 seconds. If tables are still refreshing, the app shows a short message and retries; it does not hang.
Tips: Open the app before or during the full refresh if you want to show “Tables refreshing… → data appears” as the pipeline comes back. The app is resilient: during full refresh it may show “Temporary error — retry in 15s” or “unreadable” counts until tables exist again, then data appears on the next cycle.
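The "show a message and retry" behavior amounts to catching fetch errors per refresh cycle instead of letting them crash the page. A minimal sketch of that pattern; `fetch_layer`, the dict shape, and the error text are illustrative, not the app's actual code:

```python
def refresh(fetch_layer, retry_secs=15):
    """One refresh cycle: return layer data, or a retry notice if the fetch fails.

    During a full table refresh, queries against dropped tables raise; the app
    surfaces a message and tries again on the next auto-refresh tick.
    """
    try:
        return {"status": "ok", "data": fetch_layer()}
    except Exception as exc:  # e.g. table not found while the pipeline rebuilds
        return {"status": "retry", "message": f"Temporary error - retry in {retry_secs}s ({exc})"}

def failing_fetch():
    raise RuntimeError("TABLE_OR_VIEW_NOT_FOUND")

ok = refresh(lambda: [1, 2, 3])
retrying = refresh(failing_fetch)
```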
For a step-by-step script, timing, and troubleshooting, see dab/HOW_TO_DEMO.md.
The python/ package is the simulation engine. It can run standalone (local Overture data) or on Databricks (Unity Catalog tables). See python/README.md for the full API reference.
Key cities with pre-defined bounding boxes:

```python
from lakeflow_delivery_sim.config import get_sample_bboxes

bboxes = get_sample_bboxes()
# Austin TX, San Francisco CA, Chicago IL, New York NY,
# Seattle WA, Denver CO, London UK, Paris FR
```

Install:
```shell
pip install -e 'python/[databricks]'   # Databricks + Spark deps
pip install -e 'python/[dev]'          # local dev + tests
```

```shell
# Set up venv
python3 -m venv .venv
source .venv/bin/activate
pip install -e 'python/[dev]'

# Run tests
cd python && pytest

# Run a quick simulation locally (requires Overture data)
lakeflow-delivery-sim run --local-path data/overture/austin --num-couriers 10 --sim-duration 60
```

`scripts/databricks_config.env` is gitignored — copy from `scripts/databricks_config.example.env` after clone. To see everything Git is ignoring: `git status --ignored --short | grep '^!!'`. Full table of patterns: docs/gitignore-reference.md.