Initial implementation of the local cache layer#3678
Conversation
…able caching in populate_current_user_cached
| "bundle_mode": "TYPE_UNSPECIFIED", | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM" | ||
| "workspace_artifact_path_type": "WORKSPACE_FILE_SYSTEM", | ||
| "local_cache_measurements_ms": [...redacted...] |
There was a problem hiding this comment.
(optional) maybe worth recording the keys for the int map.
| // TestFingerprintStability tests that the fingerprintToHash function returns the same hash for the same input. | ||
| func TestFingerprintStability(t *testing.T) { | ||
| fingerprint1 := struct { | ||
| Key string `json:"key"` |
There was a problem hiding this comment.
What happens if there's an embedded struct with an override for the same field:
struct foo {
A string
B string
}
struct bar {
foo
A string
}
Do bar{A:"abc"} and bar{foo{A:"abc"}} compute to the same has or different?
This is a scenario that arises in dashboards so atleast we should document what happens in a unit test.
There was a problem hiding this comment.
This has to be a separate hash, but why dashboards matter here?
There was a problem hiding this comment.
but why dashboards matter here?
Maybe someday we might want to cache dashboards... Not a real usecase but atleast worth pointing out that the hashing is not bulletproof.
|
|
||
| // getAuthorizationHeader extracts the Authorization header from the workspace client configuration. | ||
| // If it fails to authenticate, it returns an empty string. | ||
| func (b *Bundle) getAuthorizationHeader() string { |
There was a problem hiding this comment.
When doing the cache miss analysis, it'll be interesting to corellate that with auth type. I would expect oauth-u2m to have more misses due to short lived API tokens.
| === First call in a session is expected to be a cache miss: | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] using cache key: [SHA256_HASH] | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] failed to stat cache file: (redacted) | ||
| [DEBUG_TIMESTAMP] Debug: [Local Cache] cache miss, computing |
There was a problem hiding this comment.
nit: should include function name that is being computed? This maybe more relevant in the future where is more than one function.
| account Databricks Account Commands | ||
| api Perform Databricks API call | ||
| auth Authentication related commands | ||
| cache Local cache related commands |
There was a problem hiding this comment.
From the description it's not obvious that is user-level cache (and not whatever we have in .databricks). Perhaps make it explicit "user-level cache"? cc @juliacrawf-db
| Use: "cache", | ||
| Short: "Local cache related commands", | ||
| Long: "Manage local cache used by the Databricks CLI for improved performance", | ||
| } |
There was a problem hiding this comment.
Please update the short/long for this.
pietern
left a comment
There was a problem hiding this comment.
@andrewnester Please take a look at the PR summary as well.
The implementation has drifted from the original summary.
|
Commit: 100a4a6
22 interesting tests: 14 flaky, 7 RECOVERED, 1 SKIP
Top 50 slowest tests (at least 2 minutes):
|
Changes
This PR introduces a local file-based cache layer for the Databricks CLI to improve performance of repeated operations.
New libs/cache package:
Cache Modes:
New CLI command:
Bundle Integration:
Why
This is the first attempt to speed up subsequent
databricks bundlecommands that a bundle developer runs while developing a bundleTests
libs/cache