Client modifications needed for the local cache#3175
Open
bbockelm wants to merge 2 commits intoPelicanPlatform:mainfrom
Open
Client modifications needed for the local cache#3175bbockelm wants to merge 2 commits intoPelicanPlatform:mainfrom
bbockelm wants to merge 2 commits intoPelicanPlatform:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR expands the Pelican client’s library capabilities to support the refactored local cache implementation by centralizing transfer/stat logic in the client and adding new features (director response caching, early metadata, ETag handling, byte-range downloads, and token-provider abstractions).
Changes:
- Add a prefix-matching TTL cache for director responses and integrate it into transfer job setup.
- Introduce early-transfer metadata and ETag propagation/validation in download/upload paths, plus byte-range download support.
- Add
TokenProvider(including federation-token support) and thread the new token options through client APIs, tests, and director query logic (incl. cache-embedded mode).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| client/sharing_url.go | Update director query call signature to include cache-mode flag. |
| client/pelican_fs.go | Switch to internal director lookup + new token generator; update stat signature. |
| client/main_test.go | Add test coverage around SciTokens acceptability logic (new scenario). |
| client/main.go | Add ETag to FileInfo and introduce TransferMetadata struct for early headers. |
| client/handle_http_test.go | Update downloadHTTP call sites for new signature; add tests for ETag, metadata channel, byte-range, and upload ETag. |
| client/handle_http.go | Core implementation: director caching integration, fed token support, cache-embedded mode, ETag handling, byte-range support, metadata channel, and close panic mitigation. |
| client/director_test.go | Update queryDirector signature in tests. |
| client/director.go | Add cache-embedded director routing and internal getDirectorInfoForPath API. |
| client/dir_resp_cache_test.go | New tests for TTL/prefix matching and path reconstitution behavior. |
| client/dir_resp_cache.go | New implementation of prefix-matching director response cache with TTL + in-flight coalescing. |
| client/bearer_auth_test.go | Update token generator construction in tests. |
| client/acquire_token.go | Introduce TokenProvider and static provider; adjust token acceptability parsing and token-source tracking behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The refactored local cache work uses the client for all its "heavy lifting"; the goal is to keep all download, upload, and stat work centralized into a single location so any bugs (or improvements) only need to be written once. Accordingly, we really had to "bulk up" some of the functionality of the client to make it work better as a library and not just a download/upload tool. The two potentially-breaking changes in this work are: - Director responses are cached by the transfer engine in a TTLCache. Previously, each separate item transferred incurred the cost of a director lookup. - The internal "stat" function prefers to "stat" against a cache in cases where we will be downloading from the cache. If the object turns out to be a collection, it'll redo the stat with the collections URL. Other important changes include: - Makes the `ETag` and other headers evailable to the client even before the data transfer starts. This will be used by the cache to determine where / how to store the object. - Some `ETag` awareness in the download paths: if different caches provide the same object with different `ETag`s, abort the transfer. - Downloads can be done for a given byte-range instead of a whole object. - If the PUT response includes an ETag, then this is included in the transfer results returned to the client. - Uploads and downloads can be done against a Reader/Writer instead of a file on the filesystem. - The client can be put in "cache mode", contacting origins directly even for origins that refuse direct reads from end-users. For obvious reasons, this is not exposed to the CLI. - The client can provide a second, "federation token", that the cache uses to provide it's not an end-user to origins that reject. - Token generation is now an interface, `TokenProvider`, in case the caller has a separate implementation of the logic. Further, this dulls the edge on - but doesn't fix - a medium-sized design flaw in the client: both the transfer client and transfer engine try to close the results channel as if they own it. So, if you are careless about the ordering of shutdown, you can trigger a panic on close. This fixes the panic but not the underlying "shared ownership" problem.
edebd7f to
8fe6c81
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The refactored local cache work uses the client for all its "heavy lifting"; the goal is to keep all download, upload, and stat work centralized into a single location so any bugs (or improvements) only need to be written once.
Accordingly, we really had to "bulk up" some of the functionality of the client to make it work better as a library and not just a download/upload tool.
The two potentially-breaking changes in this work are:
Other important changes include:
ETagand other headers evailable to the client even before the data transfer starts. This will be used by the cache to determine where / how to store the object.ETagawareness in the download paths: if different caches provide the same object with differentETags, abort the transfer.TokenProvider, in case the caller has a separate implementation of the logic.Further, this dulls the edge on - but doesn't fix - a medium-sized design flaw in the client: both the transfer client and transfer engine try to close the results channel as if they own it. So, if you are careless about the ordering of shutdown, you can trigger a panic on close. This fixes the panic but not the underlying "shared ownership" problem.