
Commit 08ba9ff

pipelines: simple end to end test (#3253)
## Changes
Tests end-to-end pipelines commands in the cli-pipelines template.

## Tests
- Generate a cli-pipelines template project with `init`
- Deploy and run the pipeline
- Create a second pipeline, deploy and run it
- Stop both pipelines
- Destroy the project
1 parent: f97b9ee

16 files changed, +376 −0 lines changed
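The transcript in this diff records the full lifecycle. In shell terms, the flow it exercises is roughly the following sketch, where `pipelines` stands for the `[PIPELINES]` placeholder in the transcript; the harness wiring and the `cd` into the generated project are assumed:

```
pipelines init --output-dir output     # scaffold the my_project template
cd output/my_project                   # assumed: enter the generated project
pipelines deploy                       # first deploy: one pipeline
pipelines run                          # run my_project_pipeline
# add resources/my_project_pipeline_2.yml (last file in this diff), then:
pipelines deploy                       # redeploy, now with two pipelines
pipelines run my_project_pipeline_2
pipelines stop my_project_pipeline
pipelines stop my_project_pipeline_2
pipelines destroy --auto-approve
```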
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
Local = true
Cloud = false

[EnvMatrix]
DATABRICKS_CLI_DEPLOYMENT = ["terraform", "direct-exp"]
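Assuming this is the test's acceptance config (the file name is not preserved on this page), `Local` and `Cloud` would control whether the test runs locally and/or against a cloud workspace, and the `[EnvMatrix]` entry would fan the test out into one run per listed value, roughly:

```
# Assumed expansion of the EnvMatrix above: the same test executed once per
# deployment backend (the runner invocation here is hypothetical).
for deployment in terraform direct-exp; do
  DATABRICKS_CLI_DEPLOYMENT="$deployment" go test ./acceptance -run TestAccept
done
```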
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@

=== E2E Test: Complete pipeline lifecycle (init, deploy, run, stop, destroy)
=== Initialize pipeline project
>>> [PIPELINES] init --output-dir output

Welcome to the template for pipelines!


Your new project has been created in the 'my_project' directory!

Refer to the README.md file for "getting started" instructions!

=== Deploy pipeline
>>> [PIPELINES] deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Run pipeline
>>> [PIPELINES] run
Update URL: [DATABRICKS_URL]/#joblist/pipelines/[UUID]/updates/[UUID]


=== Edit project by creating and running a new second pipeline
>>> [PIPELINES] deploy
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Assert the second pipeline is created
>>> [CLI] pipelines get [UUID]
{
  "creator_user_name":"[USERNAME]",
  "last_modified":[UNIX_TIME_MILLIS],
  "name":"[dev [USERNAME]] my_project_pipeline_2",
  "pipeline_id":"[UUID]",
  "run_as_user_name":"[USERNAME]",
  "spec": {
    "channel":"CURRENT",
    "deployment": {
      "kind":"BUNDLE",
      "metadata_file_path":"/Workspace/Users/[USERNAME]/.bundle/my_project/dev/state/metadata.json"
    },
    "development":true,
    "edition":"ADVANCED",
    "id":"[UUID]",
    "name":"[dev [USERNAME]] my_project_pipeline_2",
    "storage":"dbfs:/pipelines/[UUID]"
  },
  "state":"IDLE"
}

>>> [PIPELINES] run my_project_pipeline_2
Update URL: [DATABRICKS_URL]/#joblist/pipelines/[UUID]/updates/[UUID]


=== Stop both pipelines before destroy
>>> [PIPELINES] stop my_project_pipeline
Stopping my_project_pipeline...
my_project_pipeline has been stopped.

>>> [PIPELINES] stop my_project_pipeline_2
Stopping my_project_pipeline_2...
my_project_pipeline_2 has been stopped.

=== Destroy project
>>> [PIPELINES] destroy --auto-approve
The following resources will be deleted:
  delete pipeline my_project_pipeline
  delete pipeline my_project_pipeline_2

All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/my_project/dev

Deleting files...
Destroy complete!
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Typings for Pylance in Visual Studio Code
# see https://github.com/microsoft/pyright/blob/main/docs/builtins.md
from databricks.sdk.runtime import *
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "recommendations": [
    "databricks.databricks",
    "ms-python.vscode-pylance",
    "redhat.vscode-yaml"
  ]
}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
{
  "python.analysis.stubPath": ".vscode",
  "databricks.python.envFile": "${workspaceFolder}/.env",
  "jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
  "jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
  "python.testing.pytestArgs": [
    "."
  ],
  "python.testing.unittestEnabled": false,
  "python.testing.pytestEnabled": true,
  "python.analysis.extraPaths": ["resources/my_project_pipeline"],
  "files.exclude": {
    "**/*.egg-info": true,
    "**/__pycache__": true,
    ".pytest_cache": true,
  },
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
  },
}
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# my_project

The 'my_project' project was generated by using the CLI Pipelines template.

## Setup

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Install the Pipelines CLI:
   ```
   $ databricks install-pipelines-cli
   ```

3. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ pipelines auth login
   ```

4. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
   https://docs.databricks.com/dev-tools/vscode-ext.html. Or the PyCharm plugin from
   https://www.databricks.com/blog/announcing-pycharm-integration-databricks.

## Pipeline Structure

This folder defines all source code for the my_project_pipeline pipeline:

- `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline.
- `transformations`: All dataset definitions and transformations.
- `utilities` (optional): Utility functions and Python modules used in this pipeline.

## Getting Started

To get started, go to the `transformations` folder -- most of the relevant source code lives there:

* By convention, every dataset under `transformations` is in a separate file.
* Take a look at the sample under "sample_trips_my_project.py" to get familiar with the syntax.
  Read more about the syntax at https://docs.databricks.com/dlt/python-ref.html.

For more tutorials and reference material, see https://docs.databricks.com/dlt.

## Deploying pipelines

1. To deploy a development copy of this project, type:
   ```
   $ pipelines deploy --target dev
   ```
   (Note that "dev" is the default target, so the `--target` parameter
   is optional here.)

2. Similarly, to deploy a production copy, type:
   ```
   $ pipelines deploy --target prod
   ```

3. To run a pipeline, use the "run" command:
   ```
   $ pipelines run
   ```
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# This is a Databricks pipelines definition for my_project.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: my_project
  uuid: [UUID]

include:
  - resources/*.yml
  - resources/*/*.yml
  - ./*.yml

# Variable declarations. These variables are assigned in the dev/prod targets below.
variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use
  notifications:
    description: The email addresses to use for failure notifications

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed pipelines get prefixed with '[dev my_user_name]'
    mode: development
    default: true
    workspace:
      host: [DATABRICKS_URL]
    variables:
      catalog: hive_metastore
      schema: ${workspace.current_user.short_name}
      notifications: []

  prod:
    mode: production
    workspace:
      host: [DATABRICKS_URL]
      # We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
      root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [USERNAME]
        level: CAN_MANAGE
    variables:
      catalog: hive_metastore
      schema: default
      notifications: [[USERNAME]]
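The `dev` and `prod` targets assign concrete values to the variables declared at the top, so the same pipeline definitions resolve differently per environment. Using the deploy commands from the project README, the resolution works out as follows:

```
pipelines deploy --target dev    # catalog=hive_metastore, schema=${workspace.current_user.short_name}, notifications=[]
pipelines deploy --target prod   # catalog=hive_metastore, schema=default, notifications=[[USERNAME]]
```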
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "cellMetadata": {},
     "inputWidgets": {},
     "nuid": "[UUID]",
     "showTitle": false,
     "tableResultSettingsMap": {},
     "title": ""
    }
   },
   "source": [
    "### Example Exploratory Notebook\n",
    "\n",
    "Use this notebook to explore the data generated by the pipeline in your preferred programming language.\n",
    "\n",
    "**Note**: This notebook is not executed as part of the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "application/vnd.databricks.v1+cell": {
     "cellMetadata": {},
     "inputWidgets": {},
     "nuid": "[UUID]",
     "showTitle": false,
     "tableResultSettingsMap": {},
     "title": ""
    }
   },
   "outputs": [],
   "source": [
    "# !!! Before performing any data analysis, make sure to run the pipeline to materialize the sample datasets. The tables referenced in this notebook depend on that step.\n",
    "\n",
    "display(spark.sql(\"SELECT * FROM hive_metastore.[USERNAME].my_project\"))"
   ]
  }
 ],
 "metadata": {
  "application/vnd.databricks.v1+notebook": {
   "computePreferences": null,
   "dashboards": [],
   "environmentMetadata": null,
   "inputWidgetPreferences": null,
   "language": "python",
   "notebookMetadata": {
    "pythonIndentUnit": 2
   },
   "notebookName": "sample_exploration",
   "widgets": {}
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
resources:
  pipelines:
    my_project_pipeline:
      name: my_project_pipeline
      serverless: true
      catalog: ${var.catalog}
      schema: ${var.schema}
      root_path: "."
      libraries:
        - glob:
            include: transformations/**
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
resources:
  pipelines:
    my_project_pipeline_2:
      name: my_project_pipeline_2
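This second definition is deliberately minimal: only the name is set, and the `pipelines get` output in the transcript above shows the remaining spec filled in with defaults (CURRENT channel, ADVANCED edition, dbfs storage). In the test flow it plausibly enters the project as an edit step like the following sketch, where the file content is exactly this diff but the surrounding commands are assumed:

```
# Assumed edit step from the test: add a minimal second pipeline, then redeploy.
cat > resources/my_project_pipeline_2.yml <<'EOF'
resources:
  pipelines:
    my_project_pipeline_2:
      name: my_project_pipeline_2
EOF
pipelines deploy
```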
