fix(cache): include artifact URIs in cache key to prevent incorrect r… #12751
Aman-Cool wants to merge 3 commits into kubeflow:master
…euse
Previously, only artifact names were used in cache key generation, causing incorrect cache hits when different data sources shared the same artifact name. Cache keys now use the 'name@uri' format so that different URIs produce different keys.
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
114ad57 to
3f53c0f
@mprahl @droctothorpe, this fixes incorrect cache reuse by including input artifact URIs in cache keys. Existing cached runs may miss once, as expected.
/ok-to-test
zazulam left a comment
@Aman-Cool Thanks for this contribution, but can you remove the comments from the proto file? I think just documenting it in cache.go would be fine.
@zazulam, removed the comments from the proto file and kept the documentation in
1a8c39d to
9a01cd0
@Aman-Cool, can you run the repo's pre-commit hooks?
@zazulam, pre-commit hooks have been run locally and the resulting fixes are pushed.
Fix Cache Key Collisions for Input Artifacts with Different URIs
Summary
This change fixes an issue in KFP v2 cache key generation where only input artifact logical names were considered.
As a result, tasks could incorrectly reuse cached outputs even when they were executed with different input data locations (URIs).
Problem
Cache keys were generated using artifact names only, ignoring the artifact URI that identifies the actual data source.
If the same pipeline was run multiple times with the same artifact name but different URIs, the cache would treat them as identical and return previously cached results.
This caused silent and incorrect cache hits when:
Fix
The cache key now includes the artifact URI alongside the name (name@uri) when generating the cache identifier.
This ensures cache hits occur only when input artifacts truly refer to the same data.
Artifacts without a URI continue to fall back to name-only behavior for backward compatibility.
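The key format described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the inputArtifact type, the artifactKeyElement and fingerprint helpers, the newline separator, and the choice of SHA-256 are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// inputArtifact is a hypothetical stand-in for an input artifact; the real
// KFP types are not reproduced here.
type inputArtifact struct {
	Name string
	URI  string
}

// artifactKeyElement returns "name@uri" when a URI is present and falls back
// to the bare name otherwise, mirroring the behavior described in this PR.
func artifactKeyElement(a inputArtifact) string {
	if a.URI == "" {
		return a.Name
	}
	return a.Name + "@" + a.URI
}

// fingerprint hashes the sorted key elements so the result does not depend on
// the order in which artifacts are supplied. SHA-256 and the newline
// separator are assumptions made for this sketch.
func fingerprint(artifacts []inputArtifact) string {
	elems := make([]string, 0, len(artifacts))
	for _, a := range artifacts {
		elems = append(elems, artifactKeyElement(a))
	}
	sort.Strings(elems)
	sum := sha256.Sum256([]byte(strings.Join(elems, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := []inputArtifact{{Name: "dataset", URI: "s3://bucket/2024/data.csv"}}
	b := []inputArtifact{{Name: "dataset", URI: "s3://bucket/2025/data.csv"}}
	// Same logical name, different data location: the fingerprints differ.
	fmt.Println(fingerprint(a) == fingerprint(b)) // false
}
```

With a name-only element, both inputs above would reduce to "dataset" and collide, which is the incorrect reuse this change is meant to prevent.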
Tests
Existing cache tests were updated, and new tests were added to validate that:
Behavior Change
Executions cached under the old name-only key format will no longer match the new cache key format, so affected tasks miss the cache once and re-run.
This is expected and correct, as it prevents reuse of outputs computed from different inputs.
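The one-time miss can be illustrated with a small simulation. It is a sketch under stated assumptions: cacheStore, oldKey, newKey, and lookupOrRun are hypothetical names, the store is a plain in-memory map, and raw key strings are used instead of hashed fingerprints to keep the example short.

```go
package main

import "fmt"

// cacheStore is a hypothetical in-memory stand-in for the cache backend,
// mapping a cache key to a stored execution ID.
type cacheStore map[string]string

// oldKey reproduces the name-only format described as the previous behavior.
func oldKey(name, uri string) string { return name }

// newKey uses name@uri, falling back to the name when no URI is set.
func newKey(name, uri string) string {
	if uri == "" {
		return name
	}
	return name + "@" + uri
}

// lookupOrRun returns the cached execution ID and true on a hit. On a miss it
// stores freshID under the key and returns it with false.
func lookupOrRun(store cacheStore, key, freshID string) (string, bool) {
	if id, ok := store[key]; ok {
		return id, true
	}
	store[key] = freshID
	return freshID, false
}

func main() {
	store := cacheStore{}
	name, uri := "dataset", "gs://bucket/train.csv"

	// Entry written before the change, under the name-only key.
	lookupOrRun(store, oldKey(name, uri), "exec-1")

	// First run after the change: the new key is absent, so the task re-runs.
	_, hit := lookupOrRun(store, newKey(name, uri), "exec-2")
	fmt.Println(hit) // false

	// Second run after the change: the new key is now present.
	id, hit := lookupOrRun(store, newKey(name, uri), "exec-3")
	fmt.Println(id, hit) // exec-2 true
}
```

In this sketch an artifact with an empty URI yields the same key under both functions, so such entries would still hit, consistent with the name-only fallback described in the Fix section.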
Result