3 changes: 3 additions & 0 deletions multi_region_serving/.gitignore
@@ -0,0 +1,3 @@
.vscode
.databricks
.scratch
60 changes: 60 additions & 0 deletions multi_region_serving/README.md
@@ -0,0 +1,60 @@
# Multi-region Serving

This Databricks Asset Bundle (DAB) is an example tool for syncing resources between a main
workspace and remote workspaces, simplifying the workflow for serving models or features
across multiple regions.

## How to use this example
1. Download this example

2. Make changes as needed. Some files to highlight:
* databricks.yml - DAB bundle configuration including variable names and default values.
  * src/manage_endpoint.ipynb - Notebook for creating / updating serving endpoints.
* src/manage_share.ipynb - Notebook for syncing dependencies of a shared model.

## How to trigger the workflows

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspaces, if you have not done so already:
```
$ databricks configure
```

3. Validate bundle variables

If you don't set a default value for a variable defined in `databricks.yml`, you
need to provide that variable when running any command. You can validate that all variables
are provided:
```
$ MY_BUNDLE_VARS="share_name=<SHARE_NAME>,model_name=<MODEL_NAME>,model_version=<MODEL_VERSION>,endpoint_name=<ENDPOINT_NAME>,notification_email=<EMAIL_ADDRESS>"
$ databricks bundle validate --var=$MY_BUNDLE_VARS
```

4. To deploy a copy to your main workspace:
```
$ databricks bundle deploy --target main --var=$MY_BUNDLE_VARS
```
(Note that "main" is the target name defined in databricks.yml)

This deploys everything that's defined for this project.
For example, the default template would deploy a job called
`[dev yourname] manage_serving_job` to your workspace.
You can find that job by opening your workspace and clicking on **Workflows**.

5. Similarly, to deploy to a remote workspace, type:
```
$ databricks bundle -p <DATABRICKS_PROFILE> deploy --target remote1 --var=$MY_BUNDLE_VARS
```

Use `-p` to specify the Databricks CLI profile used by this command. The profile needs to be
configured in `~/.databrickscfg`.

6. To run the workflow to sync a share, use the "run" command:
```
$ databricks bundle -t main -p <DATABRICKS_PROFILE> run manage_share_job --var=$MY_BUNDLE_VARS
```

7. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.
40 changes: 40 additions & 0 deletions multi_region_serving/databricks.yml
@@ -0,0 +1,40 @@
# This is a Databricks asset bundle definition for manage_serving.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: manage_serving

variables:
  notification_email:
    description: Email address to notify when a job fails.
  model_name:
    description: Name of the model in the main workspace.
  remote_model_name:
    description: The model name in the recipient workspace. This is typically the original model name with a new catalog name in the recipient workspace.
  model_version:
    description: Version of the model to deploy.
  endpoint_name:
    description: Name of the endpoint to deploy.
  share_name:
    description: Name of the share.

include:
  - resources/*.yml

targets:
  main:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
    mode: development
    default: true
    workspace:
      host: https://myworkspace.databricks.com

  remote1:
    # The remote workspace that serves the model
    mode: development
    workspace:
      host: https://myworkspace-remote.databricks.com


29 changes: 29 additions & 0 deletions multi_region_serving/requirements-dev.txt
@@ -0,0 +1,29 @@
## requirements-dev.txt: dependencies for local development.
##
## For defining dependencies used by jobs in Databricks Workflows, see
## https://docs.databricks.com/dev-tools/bundles/library-dependencies.html

## Add code completion support for DLT
databricks-dlt

## pytest is the default package used for testing
pytest

## Dependencies for building wheel files
setuptools
wheel

## databricks-connect can be used to run parts of this project locally.
## See https://docs.databricks.com/dev-tools/databricks-connect.html.
##
## databricks-connect is automatically installed if you're using Databricks
## extension for Visual Studio Code
## (https://docs.databricks.com/dev-tools/vscode-ext/dev-tasks/databricks-connect.html).
##
## To manually install databricks-connect, either follow the instructions
## at https://docs.databricks.com/dev-tools/databricks-connect.html
## to install the package system-wide, or uncomment the line below to install a
## version of databricks-connect that corresponds to the Databricks Runtime version
## used for this project.
#
# databricks-connect>=15.4,<15.5
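
As a rough sketch of what local execution with databricks-connect looks like (the profile name below is a placeholder for an entry in `~/.databrickscfg`, not something defined by this project):
```python
# Sketch: requires databricks-connect to be installed and a workspace profile
# configured in ~/.databrickscfg. "DEFAULT" is a placeholder profile name.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()
print(spark.range(5).collect())  # simple smoke test against the remote workspace
```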
18 changes: 18 additions & 0 deletions multi_region_serving/resources/manage_serving.job.yml
@@ -0,0 +1,18 @@
resources:
  jobs:
    manage_serving_job:
      name: manage_serving_job
      email_notifications:
        on_failure:
          - ${var.notification_email}
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/manage_endpoint.ipynb
      parameters:
        - name: endpoint_name
          default: ${var.endpoint_name}
        - name: model_name
          default: ${var.remote_model_name}
        - name: model_version
          default: ${var.model_version}
18 changes: 18 additions & 0 deletions multi_region_serving/resources/manage_share.job.yml
@@ -0,0 +1,18 @@
resources:
  jobs:
    manage_share_job:
      name: manage_share_job
      email_notifications:
        on_failure:
          - ${var.notification_email}
      tasks:
        - task_key: notebook_task
          notebook_task:
            notebook_path: ../src/manage_share.ipynb
      parameters:
        - name: model_name
          default: ${var.model_name}
        - name: max_number_of_versions_to_sync
          default: '10'
        - name: share_name
          default: ${var.share_name}
4 changes: 4 additions & 0 deletions multi_region_serving/scratch/README.md
@@ -0,0 +1,4 @@
# scratch

This folder is reserved for personal, exploratory notebooks.
By default these are not committed to Git, as 'scratch' is listed in .gitignore.
25 changes: 25 additions & 0 deletions multi_region_serving/src/lib/rest_client.py
@@ -0,0 +1,25 @@
import urllib.request
import urllib.error
import json

from databricks.sdk.runtime import spark


class RestClient:
    """Minimal REST client for calling workspace APIs with the notebook's API token."""

    def __init__(self, context):
        self.base_url = "https://" + spark.conf.get("spark.databricks.workspaceUrl")
        self.token = context.apiToken().get()

    def get_share_info(self, share_name: str):
        return self._get(
            f"api/2.1/unity-catalog/shares/{share_name}?include_shared_data=true"
        )

    def _get(self, uri):
        url = f"{self.base_url}/{uri}"
        headers = {"Authorization": f"Bearer {self.token}"}
        req = urllib.request.Request(url, headers=headers)
        try:
            response = urllib.request.urlopen(req)
            return json.load(response)
        except urllib.error.HTTPError as e:
            # Log the failure and fall through, returning None to the caller.
            result = e.read().decode()
            print((e.code, result))
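
A minimal usage sketch, assuming the client is constructed inside a Databricks notebook where `dbutils` is available (the import path and share name below are placeholders):
```python
# Sketch: the notebook context object exposes apiToken(), which RestClient expects.
from lib.rest_client import RestClient  # assumes the notebook runs from the src/ folder

context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
client = RestClient(context)

share_info = client.get_share_info("my_share")  # "my_share" is a placeholder share name
print(share_info)
```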
79 changes: 79 additions & 0 deletions multi_region_serving/src/manage_endpoint.py
@@ -0,0 +1,79 @@
# Databricks notebook source
# MAGIC %md
# MAGIC # Create or Update Model Serving Endpoint
# MAGIC
# MAGIC Create or Update the deployed serving endpoints with a new model version.
# MAGIC
# MAGIC * Make sure you've created online tables for all the required feature tables.
# MAGIC * Run this job on the workspace where you want to serve the model.

# COMMAND ----------

# MAGIC %pip install "databricks-sdk>=0.38.0"
# MAGIC %restart_python

# COMMAND ----------

dbutils.widgets.text("endpoint_name", defaultValue="")
dbutils.widgets.text("model_name", defaultValue="")
dbutils.widgets.text("model_version", defaultValue="")

# COMMAND ----------

ARGS = dbutils.widgets.getAll()

endpoint_name = ARGS["endpoint_name"]
model_name = ARGS["model_name"]
model_version = ARGS["model_version"]

# COMMAND ----------

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ServedEntityInput, EndpointCoreConfigInput
from databricks.sdk.errors import ResourceDoesNotExist

workspace = WorkspaceClient()

# COMMAND ----------

try:
    endpoint = workspace.serving_endpoints.get(name=endpoint_name)
except ResourceDoesNotExist:
    endpoint = None

if endpoint is None:
    workspace.serving_endpoints.create(
        name=endpoint_name,
        config=EndpointCoreConfigInput(
            served_entities=[
                ServedEntityInput(
                    entity_name=model_name,
                    entity_version=model_version,
                    scale_to_zero_enabled=True,
                    workload_size="Small",
                )
            ]
        ),
    )
    print(f"Created endpoint {endpoint_name}")
elif endpoint.pending_config is not None:
    print(f"A pending update for endpoint {endpoint_name} is being processed.")
elif (
    endpoint.config.served_entities[0].entity_name != model_name
    or endpoint.config.served_entities[0].entity_version != model_version
):
    # Update endpoint
    workspace.serving_endpoints.update_config(
        name=endpoint_name,
        served_entities=[
            ServedEntityInput(
                entity_name=model_name,
                entity_version=model_version,
                scale_to_zero_enabled=True,
                workload_size="Small",
            )
        ],
    )
    print(f"Updated endpoint {endpoint_name}")
else:
    print("Endpoint already up-to-date")