
Commit fdd9f8c

Implement dynamic_version attribute in artifacts section (#2520)
## Changes

Add an opt-in field `artifacts.*.dynamic_version` (bool). If set to true, wheels produced or referenced by this artifact are "patched" to break dependency caching. Patching appends the current timestamp to the version after a `+` sign and rewrites the wheel's internals, such as METADATA and RECORD. Details of patching: #2427

Patching involves two phases:

1. During the build phase we check which wheels were produced (or referenced via the `files.source` field) and apply PatchWheel to them.
2. During the deploy phase we go through the places where wheels are referenced and update unpatched files to patched ones. The places are `jobs/task/libraries`, `jobs/task/for_each_task/libraries`, and `jobs/environments/spec/dependencies`.

## Why

We would like to switch the default template to pyproject.toml without having to carry a setuptools dependency. Currently we rely on setuptools to patch the version at build time: #1034

Previously I added integration tests showing this problem: #2477 #2519. Those tests showed that the problem exists only for interactive clusters and only when DATA_SECURITY_MODE=USER_ISOLATION. However, this PR enables patching everywhere, since we don't have any guarantees about this behaviour.

## Tests

New local acceptance test `whl_dynamic` checks that patching works across all three places. New integration tests check that patched wheels run properly in classic and serverless environments: `interactive_cluster_dynamic_version`, `serverless_dynamic_version`.
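To make the patching concrete, here is a minimal Python sketch of the idea: append a `+<timestamp>` local-version suffix (PEP 440) and rewrite the wheel's internals. This is illustrative only — the CLI's actual implementation is in Go, and the helper name `patch_wheel`, the nanosecond timestamp, and the simplified RECORD handling are assumptions:

```python
import time
import zipfile
from pathlib import Path


def patch_wheel(src: str, out_dir: str) -> Path:
    """Copy a wheel, appending +<timestamp> to its version."""
    # Wheel filenames look like: name-version-py3-none-any.whl
    name, version, rest = Path(src).name.split("-", 2)
    new_version = f"{version}+{time.time_ns()}"  # e.g. 0.0.1+1712345678901234567
    old_info = f"{name}-{version}.dist-info"
    new_info = f"{name}-{new_version}.dist-info"
    dst = Path(out_dir) / f"{name}-{new_version}-{rest}"
    dst.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item)
            # Rename the dist-info directory to carry the new version.
            arcname = item.filename.replace(old_info, new_info)
            if arcname == f"{new_info}/METADATA":
                # Rewrite the Version header inside METADATA.
                data = data.replace(
                    f"Version: {version}".encode(),
                    f"Version: {new_version}".encode(),
                )
            elif arcname == f"{new_info}/RECORD":
                # RECORD references the dist-info directory by name. A complete
                # implementation would also refresh the recorded hash of METADATA.
                data = data.replace(old_info.encode(), new_info.encode())
            zout.writestr(arcname, data)
    return dst
```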
1 parent 104d231 commit fdd9f8c

File tree

24 files changed: +740, −47 lines


NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,5 +12,6 @@
 * Added support for dashboards in deployment bind/unbind commands ([#2516](https://github.com/databricks/cli/pull/2516))
 * Added support for registered models in deployment bind/unbind commands ([#2556](https://github.com/databricks/cli/pull/2556))
 * Added a mismatch check when host is defined in config and as an env variable ([#2549](https://github.com/databricks/cli/pull/2549))
+* New attribute on artifacts entries: `dynamic_version`. When set to true, the wheel is patched with a dynamic version suffix so that Databricks environments always pick up the latest build, even if the original wheel version is unchanged. Intended for the development loop on interactive clusters. ([#2520](https://github.com/databricks/cli/pull/2520))

 ### API Changes

acceptance/bin/ziplist.py

Lines changed: 14 additions & 0 deletions
#!/usr/bin/env python3
"""
List files in zip archives given as arguments.
"""

import sys
import zipfile

for zip_path in sys.argv[1:]:
    with zipfile.ZipFile(zip_path.strip()) as z:
        for info in z.infolist():
            print(info.filename)
Lines changed: 50 additions & 0 deletions
bundle:
  name: python-wheel

artifacts:
  my_test_code:
    type: whl
    path: "./my_test_code"
    # 'python' is used here because 'python3' does not exist in a virtualenv on Windows.
    build: python setup.py bdist_wheel
    dynamic_version: true
  my_prebuilt_whl:
    type: whl
    files:
      - source: prebuilt/other_test_code-0.0.1-py3-none-any.whl
    dynamic_version: true

resources:
  jobs:
    test_job:
      name: "[${bundle.target}] My Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "0717-132531-5opeqon1"
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
          libraries:
            - whl: ./my_test_code/dist/*.whl
            - whl: prebuilt/other_test_code-0.0.1-py3-none-any.whl
          for_each_task:
            inputs: "[1]"
            task:
              task_key: SubTask
              existing_cluster_id: "0717-132531-5opeqon1"
              python_wheel_task:
                package_name: "my_test_code"
                entry_point: "run"
              libraries:
                - whl: ./my_test_code/dist/*.whl
        - task_key: ServerlessTestTask
          python_wheel_task:
            package_name: "my_test_code"
            entry_point: "run"
          environment_key: "test_env"
      environments:
        - environment_key: "test_env"
          spec:
            client: "1"
            dependencies:
              - ./my_test_code/dist/*.whl
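For context on the deploy phase, here is a hedged Python sketch of the path-rewriting traversal over the three places the commit message lists. Field names follow the config above and the JSON request bodies shown below; the real code is Go, and `rewrite_wheel_refs` is a hypothetical name:

```python
def rewrite_wheel_refs(job: dict, patched: dict[str, str]) -> None:
    """Replace original wheel paths with patched ones, in place.

    `patched` maps an original wheel path to its patched counterpart.
    Covers task libraries, for_each_task libraries, and environment
    dependencies.
    """

    def rewrite_task(task: dict) -> None:
        for lib in task.get("libraries", []):
            if lib.get("whl") in patched:
                lib["whl"] = patched[lib["whl"]]
        # for_each_task nests a full task definition; recurse into it.
        inner = task.get("for_each_task", {}).get("task")
        if inner is not None:
            rewrite_task(inner)

    for task in job.get("tasks", []):
        rewrite_task(task)
    for env in job.get("environments", []):
        spec = env.get("spec", {})
        spec["dependencies"] = [patched.get(d, d) for d in spec.get("dependencies", [])]
```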
Lines changed: 170 additions & 0 deletions

>>> [CLI] bundle validate -o json
{
  "my_prebuilt_whl": {
    "dynamic_version": true,
    "files": [
      {
        "source": "[TEST_TMP_DIR]/prebuilt/other_test_code-0.0.1-py3-none-any.whl"
      }
    ],
    "path": "[TEST_TMP_DIR]",
    "type": "whl"
  },
  "my_test_code": {
    "build": "python setup.py bdist_wheel",
    "dynamic_version": true,
    "path": "[TEST_TMP_DIR]/my_test_code",
    "type": "whl"
  }
}

>>> [CLI] bundle deploy
Building my_test_code...
Uploading .databricks/bundle/default/patched_wheels/my_prebuilt_whl_other_test_code/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading .databricks/bundle/default/patched_wheels/my_test_code_my_test_code/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/python-wheel/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== There are 2 original wheels and 2 patched ones
>>> find.py --expect 4 whl
.databricks/bundle/default/patched_wheels/my_prebuilt_whl_other_test_code/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl
.databricks/bundle/default/patched_wheels/my_test_code_my_test_code/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl
my_test_code/dist/my_test_code-0.0.1-py3-none-any.whl
prebuilt/other_test_code-0.0.1-py3-none-any.whl

=== Verify contents of the zip file
>>> find.py --expect 1 .databricks/.*my_test_code.*whl
src/__init__.py
src/__main__.py
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/METADATA
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/WHEEL
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/entry_points.txt
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/top_level.txt
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/RECORD

=== Expecting 2 patched wheels in libraries section in /jobs/create
>>> jq -s .[] | select(.path=="/api/2.1/jobs/create") | .body.tasks out.requests.txt
[
  {
    "environment_key": "test_env",
    "python_wheel_task": {
      "entry_point": "run",
      "package_name": "my_test_code"
    },
    "task_key": "ServerlessTestTask"
  },
  {
    "existing_cluster_id": "0717-132531-5opeqon1",
    "for_each_task": {
      "inputs": "[1]",
      "task": {
        "existing_cluster_id": "0717-132531-5opeqon1",
        "libraries": [
          {
            "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
          }
        ],
        "python_wheel_task": {
          "entry_point": "run",
          "package_name": "my_test_code"
        },
        "task_key": "SubTask"
      }
    },
    "libraries": [
      {
        "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
      },
      {
        "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
      }
    ],
    "python_wheel_task": {
      "entry_point": "run",
      "package_name": "my_test_code"
    },
    "task_key": "TestTask"
  }
]

=== Expecting 2 patched wheels to be uploaded
>>> jq .path
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/files/my_test_code/dist/my_test_code-0.0.1-py3-none-any.whl"
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/files/prebuilt/other_test_code-0.0.1-py3-none-any.whl"

=== Updating the local wheel and deploying again
>>> [CLI] bundle deploy
Building my_test_code...
Uploading .databricks/bundle/default/patched_wheels/my_prebuilt_whl_other_test_code/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading .databricks/bundle/default/patched_wheels/my_test_code_my_test_code/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/python-wheel/default/files...
Deploying resources...
Updating deployment state...
Deployment complete!

=== Verify contents, it should now have new_module.py
>>> find.py --expect 1 .databricks/.*my_test_code.*whl
src/__init__.py
src/__main__.py
src/new_module.py
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/METADATA
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/WHEEL
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/entry_points.txt
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/top_level.txt
my_test_code-0.0.1+[TIMESTAMP_NS].dist-info/RECORD

=== Expecting 2 patched wheels in libraries section in /jobs/reset
>>> jq -s .[] | select(.path=="/api/2.1/jobs/reset") | .body.new_settings.tasks out.requests.txt
[
  {
    "environment_key": "test_env",
    "python_wheel_task": {
      "entry_point": "run",
      "package_name": "my_test_code"
    },
    "task_key": "ServerlessTestTask"
  },
  {
    "existing_cluster_id": "0717-132531-5opeqon1",
    "for_each_task": {
      "inputs": "[1]",
      "task": {
        "existing_cluster_id": "0717-132531-5opeqon1",
        "libraries": [
          {
            "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
          }
        ],
        "python_wheel_task": {
          "entry_point": "run",
          "package_name": "my_test_code"
        },
        "task_key": "SubTask"
      }
    },
    "libraries": [
      {
        "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
      },
      {
        "whl": "/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
      }
    ],
    "python_wheel_task": {
      "entry_point": "run",
      "package_name": "my_test_code"
    },
    "task_key": "TestTask"
  }
]

=== Expecting 2 patched wheels to be uploaded (Bad: it is currently uploaded twice)
>>> jq .path
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/artifacts/.internal/other_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl"
"/api/2.0/workspace-files/import-file/Workspace/Users/[USERNAME]/.bundle/python-wheel/default/files/my_test_code/dist/my_test_code-0.0.1-py3-none-any.whl"
Lines changed: 36 additions & 0 deletions
trap "rm -fr out.requests.txt databricks.yml my_test_code prebuilt" EXIT

cp -r $TESTDIR/../whl_explicit/my_test_code .
mkdir prebuilt
cp -r $TESTDIR/../whl_prebuilt_multiple/dist/lib/other_test_code-0.0.1-py3-none-any.whl prebuilt

trace $CLI bundle validate -o json | jq .artifacts

trace $CLI bundle deploy

title "There are 2 original wheels and 2 patched ones"
trace find.py --expect 4 whl

title "Verify contents of the zip file"
trace find.py --expect 1 '.databricks/.*my_test_code.*whl' | xargs ziplist.py

title "Expecting 2 patched wheels in libraries section in /jobs/create"
trace jq -s '.[] | select(.path=="/api/2.1/jobs/create") | .body.tasks' out.requests.txt

title "Expecting 2 patched wheels to be uploaded"
trace jq .path < out.requests.txt | grep import | grep whl | sort

rm out.requests.txt

title "Updating the local wheel and deploying again"
touch my_test_code/src/new_module.py
trace $CLI bundle deploy

title "Verify contents, it should now have new_module.py"
trace find.py --expect 1 '.databricks/.*my_test_code.*whl' | xargs ziplist.py

title "Expecting 2 patched wheels in libraries section in /jobs/reset"
trace jq -s '.[] | select(.path=="/api/2.1/jobs/reset") | .body.new_settings.tasks' out.requests.txt

title "Expecting 2 patched wheels to be uploaded (Bad: it is currently uploaded twice)"
trace jq .path < out.requests.txt | grep import | grep whl | sort
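For readers unfamiliar with the test harness: `find.py` is a repo helper whose internals are not shown in this diff. A hypothetical stand-in with the behavior this script relies on (print paths matching a regex, fail unless the count equals `--expect`) might look like:

```python
#!/usr/bin/env python3
# Hypothetical stand-in for the find.py helper used above; the real helper
# may differ. It walks the current directory, prints paths matching a regex,
# and exits non-zero if the match count differs from --expect.
import argparse
import re
import sys
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--expect", type=int)
parser.add_argument("pattern")
args = parser.parse_args()

matches = sorted(
    str(p).replace("\\", "/")  # normalize Windows separators
    for p in Path(".").rglob("*")
    if p.is_file() and re.search(args.pattern, str(p).replace("\\", "/"))
)
print("\n".join(matches))
if args.expect is not None and len(matches) != args.expect:
    sys.exit(f"expected {args.expect} matches, got {len(matches)}")
```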
Lines changed: 7 additions & 0 deletions
[[Repls]]
Old = '\\\\'
New = '/'

[[Server]]
Pattern = "POST /api/2.1/jobs/reset"
Response.Body = '{}'
Binary file not shown.
Lines changed: 37 additions & 0 deletions
bundle:
  name: wheel-task

workspace:
  root_path: "~/.bundle/$UNIQUE_NAME"

artifacts:
  python_artifact:
    type: whl
    build: uv build --wheel
    files:
      - source: dist/*.whl
    dynamic_version: true

resources:
  clusters:
    test_cluster:
      cluster_name: "test-cluster-$UNIQUE_NAME"
      spark_version: "$DEFAULT_SPARK_VERSION"
      node_type_id: "$NODE_TYPE_ID"
      num_workers: 1
      data_security_mode: $DATA_SECURITY_MODE

  jobs:
    some_other_job:
      name: "[${bundle.target}] Test Wheel Job $UNIQUE_NAME"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "${resources.clusters.test_cluster.cluster_id}"
          python_wheel_task:
            package_name: my_test_code
            entry_point: run
          parameters:
            - "one"
            - "two"
          libraries:
            - whl: ./dist/*.whl
Lines changed: 45 additions & 0 deletions

>>> [CLI] bundle deploy
Building python_artifact...
Uploading .databricks/bundle/default/patched_wheels/python_artifact_my_test_code/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]/files...
Deploying resources...
Updating deployment state...
Deployment complete!

>>> [CLI] bundle run some_other_job
Run URL: [DATABRICKS_URL]/?o=[NUMID]#job/[NUMID]/run/[NUMID]

[TIMESTAMP] "[default] Test Wheel Job [UNIQUE_NAME]" RUNNING
[TIMESTAMP] "[default] Test Wheel Job [UNIQUE_NAME]" TERMINATED SUCCESS
Hello from my func
Got arguments:
['my_test_code', 'one', 'two']

=== Make a change to code without version change and run the job again
>>> [CLI] bundle deploy
Building python_artifact...
Uploading .databricks/bundle/default/patched_wheels/python_artifact_my_test_code/my_test_code-0.0.1+[TIMESTAMP_NS]-py3-none-any.whl...
Uploading bundle files to /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]/files...
Deploying resources...
Updating deployment state...
Deployment complete!

>>> [CLI] bundle run some_other_job
Run URL: [DATABRICKS_URL]/?o=[NUMID]#job/[NUMID]/run/[NUMID]

[TIMESTAMP] "[default] Test Wheel Job [UNIQUE_NAME]" RUNNING
[TIMESTAMP] "[default] Test Wheel Job [UNIQUE_NAME]" TERMINATED SUCCESS
UPDATED MY FUNC
Got arguments:
['my_test_code', 'one', 'two']

>>> [CLI] bundle destroy --auto-approve
The following resources will be deleted:
  delete cluster test_cluster
  delete job some_other_job

All files and directories at the following location will be deleted: /Workspace/Users/[USERNAME]/.bundle/[UNIQUE_NAME]

Deleting files...
Destroy complete!
Lines changed: 10 additions & 0 deletions
envsubst < databricks.yml.tmpl > databricks.yml
cp -r $TESTDIR/../interactive_cluster/{setup.py,my_test_code} .
trap "errcode trace '$CLI' bundle destroy --auto-approve" EXIT
trace $CLI bundle deploy
trace $CLI bundle run some_other_job

title "Make a change to code without version change and run the job again"
update_file.py my_test_code/__main__.py 'Hello from my func' 'UPDATED MY FUNC'
trace $CLI bundle deploy
trace $CLI bundle run some_other_job
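The first line of this script depends on `envsubst`. As a reference point, a minimal Python equivalent of that substitution step (an illustration, not part of the test) could be:

```python
# Minimal stand-in for `envsubst < databricks.yml.tmpl > databricks.yml`.
# string.Template uses the same $VAR / ${VAR} syntax, and safe_substitute
# leaves references such as ${bundle.target} untouched, which should match
# envsubst's handling of names that are not valid environment variables.
import os
from string import Template

with open("databricks.yml.tmpl") as f:
    template = Template(f.read())
with open("databricks.yml", "w") as f:
    f.write(template.safe_substitute(os.environ))
```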
