databricks-solutions/lakeflow-deliveries

Courier Delivery Spatial Demo — Lakeflow SDP

Real-time courier delivery simulation on Databricks. Uses Lakeflow Spark Declarative Pipelines for streaming spatial analytics and a Databricks App for live map visualization.

Demonstrates GPS tracking, destination proximity via ST_DWithin (POI point vs ping), ST_Buffer rings on the map for 1 km / 100 m context, route display, and the Overture Maps road network, all built on Unity Catalog spatial SQL (ST_Point, ST_Buffer, ST_Intersects, and ST_DWithin on supported warehouses).


Overview

Simulation (notebook or CLI)
  └─ emits JSON events → Unity Catalog Volume

Lakeflow SDP Pipeline  (serverless, continuous streaming)
  ├── Streaming tables  — raw ingest via Auto Loader
  └── Materialized views — spatial analytics (geofences, routes, road network)

Databricks App  (courier-delivery-map)
  └── Live pydeck map with 15 s auto-refresh

The Databricks Asset Bundle (dab/) is the primary end-to-end deliverable — it deploys both the pipeline and the app together. The standalone simulation notebook can also run on its own, without the full DAB deployment.

The Python simulator assigns delivery targets from two shuffled queues: inner (the central fraction of the bbox) and outer (the rest), with a default 75% / 25% preference for inner-first vs outer-first picks, to spread deliveries beyond dense downtown cores. See python/README.md and docs/user_guide_simulation.md.
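The inner/outer queue preference described above can be sketched as follows. This is an illustrative helper, not the package's actual implementation — the real selection logic lives in lakeflow_delivery_sim and may differ in detail:

```python
import random

def pick_target(inner, outer, inner_first_prob=0.75, rng=None):
    """Pick the next delivery target from two shuffled queues.

    With probability `inner_first_prob` the inner (downtown) queue is
    tried first, otherwise the outer queue; the non-preferred queue is
    the fallback when the preferred one is empty. Sketch only: the 75/25
    split matches the README default, the function itself is hypothetical.
    """
    rng = rng or random.Random()
    prefer_inner = rng.random() < inner_first_prob
    preferred, fallback = (inner, outer) if prefer_inner else (outer, inner)
    if preferred:
        return preferred.pop(0)
    if fallback:
        return fallback.pop(0)
    return None  # both queues exhausted
```

With both queues non-empty, roughly three in four picks drain the inner queue first, which is what pushes deliveries out of the dense core once the inner queue empties.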


Repository Structure

lakeflow_sdp/
├── dab/                         # PRIMARY DELIVERABLE — Databricks Asset Bundle
│   ├── databricks.yml           # Bundle config (targets: dev, prod)
│   ├── resources/
│   │   ├── courier_delivery_pipeline.pipeline.yml
│   │   └── courier_delivery_app.app.yml
│   ├── src/app/                 # Databricks App source (Dash + pydeck)
│   │   ├── app.py               # Layout, auto-refresh, viewport tracking
│   │   ├── queries.py           # SQL fetch per layer
│   │   ├── layers.py            # pydeck layer builders
│   │   ├── app.yaml             # App runtime config + env vars
│   │   └── requirements.txt
│   └── README.md                # DAB-specific deploy guide
│
├── python/                      # lakeflow_delivery_sim package
│   ├── src/lakeflow_delivery_sim/
│   │   ├── config.py            # SimConfig, OvertureSource, AUSTIN_BBOX
│   │   ├── runner.py            # High-level run_sim_with_progress()
│   │   ├── simulation.py        # Core simulation engine
│   │   ├── events.py            # Event types + JSON serialization
│   │   ├── routing.py           # OSMnx graph + route planning
│   │   ├── overture.py          # Overture data loading + division lookup
│   │   ├── stac_download.py     # Overture STAC download (idempotent)
│   │   └── geofence.py          # Geofence checks
│   ├── tests/
│   └── pyproject.toml
│
├── notebooks/
│   └── courier_delivery_simulation.ipynb   # Standalone simulation notebook
│
├── scripts/                     # Build and deploy scripts
│   ├── build_dab_local.py       # Validate DAB locally (no deploy)
│   ├── push_dab_to_workspace.py # Build WHL + deploy DAB
│   ├── push_whl_to_volume.py    # Upload WHL to Volume
│   ├── push_notebook_to_workspace.py
│   ├── databricks_config.example.env  # Copy to databricks_config.env
│   └── databricks_config.env    # Local auth config (gitignored)
│
├── docs/                        # Reference documentation
│   └── user_guide_simulation.md # Simulation usage guide
│
└── .cursor/commands/            # Cursor IDE shortcuts
    ├── sdp-build-dab-local      # Validate DAB locally
    ├── sdp-push-dab             # Build + deploy to Databricks
    ├── sdp-push-whl             # Upload WHL only
    └── sdp-push-notebook        # Push simulation notebook

Standalone Simulation Notebook

notebooks/courier_delivery_simulation.ipynb runs the delivery simulation independently — no DAB deployment required. Use this to:

  • Download Overture Maps data (places, roads, divisions) for any city
  • Run a live simulation (e.g. 50 couriers over Austin TX for 5 minutes)
  • Generate bulk historical event data (date ranges, multiple windows)
  • Look up named administrative divisions (e.g. Travis County bounding box)

The notebook runs on a Databricks cluster and writes events to a Unity Catalog Volume. It also works locally with python/ installed and Overture data downloaded to disk.

Quick start (in notebook):

from lakeflow_delivery_sim.config import SimConfig, OvertureSource, AUSTIN_BBOX
from lakeflow_delivery_sim.runner import run_sim_with_progress

config = SimConfig(
    source=OvertureSource.databricks(catalog="main", schema="courier_delivery"),
    bounds=AUSTIN_BBOX,
    num_couriers=50,
)
mgr = config.create_manager()
result = run_sim_with_progress(mgr, event_dir, seed=42)
# Wall limit after first assign defaults to config.sim_duration_sec; pass sim_duration=... to override.

DAB — Primary Deliverable

The Databricks Asset Bundle in dab/ deploys the full demo stack:

Resource         Name                                       Description
DLT Pipeline     [dev] Courier Delivery Spatial Pipeline    Streaming ingest + spatial MVs
Databricks App   courier-delivery-map-dev                   Live pydeck map

What the pipeline produces

Table                      Type       Description
simulation_event           Streaming  Raw sim config events
gps_event                  Streaming  GPS pings with ST_Point geometry
poi_event                  Streaming  Planned routes per delivery
delivery_completed_event   Streaming  Completed deliveries
delivery_abandoned_event   Streaming  Timed-out deliveries
simulation_bbox            MV         Latest sim bounding box polygon
poi_1km_geofence           MV         1 km ST_Buffer polygon per POI for display; courier proximity uses ST_DWithin (POI point vs GPS) in app SQL
poi_100m_geofence          MV         100 m ST_Buffer for display; proximity via ST_DWithin in app SQL
active_route               MV         Current in-flight route per courier
road_network               MV         Overture road segments (from Volume)
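The buffer MVs are display-only; the actual proximity test the app runs in SQL is a distance predicate, ST_DWithin(poi_point, gps_point, meters). A pure-Python haversine sketch of what that predicate computes (the warehouse implementation is authoritative; this function is illustrative only):

```python
import math

def within_meters(lon1, lat1, lon2, lat2, meters):
    """True if two lon/lat points are within `meters` of each other.

    Haversine great-circle distance on a spherical Earth, illustrating
    the check ST_DWithin performs in the app's SQL. Hypothetical helper,
    not part of the lakeflow_delivery_sim package.
    """
    r = 6_371_000.0  # mean Earth radius, meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a)) <= meters
```

For example, near downtown Austin a 0.01° shift in longitude is roughly 960 m, so it passes a 1 km check but fails a 900 m one.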

Prerequisites

  1. Databricks CLI (system-level Go binary, not pip install databricks-cli):

    brew install databricks        # macOS
    brew upgrade databricks        # must be 0.239.0+ for apps support
    databricks --version
  2. Python .venv:

    python3 -m venv .venv && .venv/bin/pip install -e 'python/[databricks]'
  3. Config: Copy scripts/databricks_config.example.env to scripts/databricks_config.env and set:

    • DATABRICKS_HOST + DATABRICKS_TOKEN (or DATABRICKS_CONFIG_PROFILE)
    • SDP_VOL_SIM_DIR — Volume path for the WHL and simulation event output
    • SDP_WS_DAB — Workspace base path; short form preferred, e.g. Users/you@databricks.com/Lakeflow (see dab/README.md and .cursor/commands/sdp-push-dab.md)

Build and deploy

Validate locally (no deploy):

bash .cursor/commands/sdp-build-dab-local.sh
# or
python scripts/build_dab_local.py

Deploy to Databricks:

bash .cursor/commands/sdp-push-dab.sh
# or
python scripts/push_dab_to_workspace.py

The deploy script runs three steps: it builds and uploads the WHL to the Volume, runs databricks bundle deploy, then runs databricks workspace import-dir so .../files/ contains the pipeline SQL, explorations notebook, and app code.

Deploys to: {SDP_WS_DAB}/courier-delivery-sdp/dev/ (bundle files under .../dev/files/).

After deploy

  1. Start pipeline: Workflows → Delta Live Tables → [dev] Courier Delivery Spatial Pipeline → Start
  2. Run simulation: Open the exploration notebook in the workspace at .../courier-delivery-sdp/dev/files/src/courier_delivery_pipeline/explorations/austin_delivery_sim, download Overture data (first time), then run the simulation cells
  3. Open app: Compute → Apps → courier-delivery-map-dev → Open

See dab/README.md for the full walkthrough, layer guide, and cleanup instructions.

How to Demo

Recommended flow for a clean, repeatable demo (e.g. showing full table refresh and live data):

  1. Deploy the DAB — Run bash .cursor/commands/sdp-push-dab.sh (or python scripts/push_dab_to_workspace.py). Ensure the pipeline and app exist in the workspace.
  2. Run the notebook — Open the exploration notebook (austin_delivery_sim) and Run all. The notebook will download Overture data (first time), clean the event directory, then run the simulation and write events to the Volume.
  3. Full table refresh — After the notebook gets past the cleaning phase (event dir cleared), stop the current SDP pipeline (Workflows → DLT → Stop). Then start it again with Run with full table refresh. This drops and recreates tables so the pipeline ingests from a known-good state.
  4. Open the app — Once the app is fully started (Compute → Apps → courier-delivery-map-dev → Open), open it to watch progress. The map and dashboard refresh every 15 seconds. If tables are still refreshing, the app shows a short message and retries; it does not hang.

Tips: Open the app before or during the full refresh if you want to show “Tables refreshing… → data appears” as the pipeline comes back. The app is resilient: during full refresh it may show “Temporary error — retry in 15s” or “unreadable” counts until tables exist again, then data appears on the next cycle.
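The app's resilience during a full refresh comes down to degrading each query to a placeholder instead of failing the refresh cycle. A minimal sketch of that pattern (hypothetical helper; the real code lives under dab/src/app):

```python
def safe_count(fetch_count):
    """Fetch one dashboard count, degrading gracefully.

    During a full table refresh the underlying table may not exist yet,
    so any query error becomes the "unreadable" placeholder shown in the
    UI rather than an exception that would break the 15 s refresh loop.
    `fetch_count` stands in for a real SQL query callable.
    """
    try:
        return fetch_count()
    except Exception:
        return "unreadable"
```

Each refresh tick calls this per layer, so a half-refreshed pipeline yields a mix of live counts and placeholders, and the placeholders resolve on the next cycle once the tables exist again.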

For a step-by-step script, timing, and troubleshooting, see dab/HOW_TO_DEMO.md.


lakeflow_delivery_sim Package

The python/ package is the simulation engine. It can run standalone (local Overture data) or on Databricks (Unity Catalog tables). See python/README.md for the full API reference.

Key cities with pre-defined bounding boxes:

from lakeflow_delivery_sim.config import get_sample_bboxes
bboxes = get_sample_bboxes()
# Austin TX, San Francisco CA, Chicago IL, New York NY,
# Seattle WA, Denver CO, London UK, Paris FR

Install:

pip install -e 'python/[databricks]'   # Databricks + Spark deps
pip install -e 'python/[dev]'          # local dev + tests

Local Development

# Set up venv
python3 -m venv .venv
source .venv/bin/activate
pip install -e 'python/[dev]'

# Run tests
cd python && pytest

# Run a quick simulation locally (requires Overture data)
lakeflow-delivery-sim run --local-path data/overture/austin --num-couriers 10 --sim-duration 60

Git and secrets

scripts/databricks_config.env is gitignored — copy from scripts/databricks_config.example.env after clone. To see everything Git is ignoring: git status --ignored --short | grep '^!!'. Full table of patterns: docs/gitignore-reference.md.
