-
Notifications
You must be signed in to change notification settings - Fork 125
Initial implementation of the local cache layer #3678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…able caching in populate_current_user_cached
| "bundle_mode": "TYPE_UNSPECIFIED", | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM" | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM", | ||
| "local_cache_measurements_ms": [...redacted...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) maybe worth recording the keys for the int map.
| // TestFingerprintStability tests that the fingerprintToHash function returns the same hash for the same input. | ||
| func TestFingerprintStability(t *testing.T) { | ||
| fingerprint1 := struct { | ||
| Key string `json:"key"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there's an embedded struct with an override for the same field:
struct foo {
A string
B string
}
struct bar {
foo
A string
}
Do bar{A:"abc"} and bar{foo{A:"abc"}} compute to the same has or different?
This is a scenario that arises in dashboards so atleast we should document what happens in a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be a separate hash, but why dashboards matter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why dashboards matter here?
Maybe someday we might want to cache dashboards... Not a real usecase but atleast worth pointing out that the hashing is not bulletproof.
|
|
||
| // getAuthorizationHeader extracts the Authorization header from the workspace client configuration. | ||
| // If it fails to authenticate, it returns an empty string. | ||
| func (b *Bundle) getAuthorizationHeader() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When doing the cache miss analysis, it'll be interesting to corellate that with auth type. I would expect oauth-u2m to have more misses due to short lived API tokens.
| === First call in a session is expected to be a cache miss: | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] using cache key: [SHA256_HASH] | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] failed to stat cache file: (redacted) | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] cache miss, computing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should include function name that is being computed? This maybe more relevant in the future where is more than one function.
| account Databricks Account Commands | ||
| api Perform Databricks API call | ||
| auth Authentication related commands | ||
| cache Local cache related commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the description it's not obvious that is user-level cache (and not whatever we have in .databricks). Perhaps make it explicit "user-level cache"? cc @juliacrawf-db
| Use: "cache", | ||
| Short: "Local cache related commands", | ||
| Long: "Manage local cache used by the Databricks CLI for improved performance", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the short/long for this.
pietern
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewnester Please take a look at the PR summary as well.
The implementation has drifted from the original summary.
|
Commit: 100a4a6
22 interesting tests: 14 flaky, 7 RECOVERED, 1 SKIP
Top 50 slowest tests (at least 2 minutes):
|
Changes
This PR introduces a local file-based cache layer for the Databricks CLI to improve performance of repeated operations.
New libs/cache package:
Cache Modes:
New CLI command:
Bundle Integration:
Why
This is the first attempt to speed up subsequent
databricks bundlecommands that a bundle developer runs while developing a bundleTests
libs/cache