Skip to content

fix(cache): include artifact URIs in cache key to prevent incorrect r…#12751

Open
Aman-Cool wants to merge 3 commits intokubeflow:masterfrom
Aman-Cool:fix/cache-key-include-artifact-uris
Open

fix(cache): include artifact URIs in cache key to prevent incorrect r…#12751
Aman-Cool wants to merge 3 commits intokubeflow:masterfrom
Aman-Cool:fix/cache-key-include-artifact-uris

Conversation

@Aman-Cool
Copy link

Fix Cache Key Collisions for Input Artifacts with Different URIs

Summary

This change fixes an issue in KFP v2 cache key generation where only input artifact logical names were considered.
As a result, tasks could incorrectly reuse cached outputs even when they were executed with different input data locations (URIs).

Problem

Cache keys were generated using artifact names only, ignoring the artifact URI that identifies the actual data source.
If the same pipeline was run multiple times with the same artifact name but different URIs, the cache would treat them as identical and return previously cached results.

This caused silent and incorrect cache hits when:

  • Re-running pipelines with new dataset versions
  • Sharing pipeline templates across teams or tenants
  • Running experiments or A/B tests on different input data

Fix

The cache key now includes the artifact URI alongside the name (name@uri) when generating the cache identifier.
This ensures cache hits occur only when input artifacts truly refer to the same data.

Artifacts without a URI continue to fall back to name-only behavior for backward compatibility.

Tests

Existing cache tests were updated, and new tests were added to validate that:

  • Same artifact name + different URI → different cache fingerprints
  • Same name + same URI → cache hit
  • Artifacts without URIs remain supported

Behavior Change

Previously cached executions will no longer match under the new cache key format, resulting in cache misses.
This is expected and correct, as it prevents reuse of outputs computed from different inputs.

Result

  • Correct cache behavior for dataset versioning and multi-tenant usage
  • No more silent reuse of outputs from different data sources
  • Deterministic and trustworthy caching semantics

@github-actions
Copy link

🎉 Welcome to the Kubeflow Pipelines repo! 🎉

Thanks for opening your first PR! We're excited to have you onboard 🚀

Next steps:

Feel free to ask questions in the comments.

@google-oss-prow
Copy link

Hi @Aman-Cool. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

…euse

Previously only artifact names were used in cache key generation, causing
incorrect cache hits when different data sources had same artifact names.
Now uses 'name@uri' format to ensure different URIs produce different keys.

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool Aman-Cool force-pushed the fix/cache-key-include-artifact-uris branch from 114ad57 to 3f53c0f Compare January 31, 2026 21:45
@Aman-Cool
Copy link
Author

@mprahl @droctothorpe ,Fixes incorrect cache reuse by including input artifact URIs in cache keys; existing cached runs may miss once, as expected.

@hbelmiro
Copy link
Contributor

hbelmiro commented Feb 2, 2026

/ok-to-test

Copy link
Contributor

@zazulam zazulam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aman-Cool Thanks for this contribution, but can you remove the comments from the proto file? I think just documenting it in cache.go would be fine.

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hbelmiro for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Aman-Cool
Copy link
Author

@zazulam , Removed the comments from the proto file and kept the documentation in cache.go only, as suggested.

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool Aman-Cool force-pushed the fix/cache-key-include-artifact-uris branch from 1a8c39d to 9a01cd0 Compare February 2, 2026 16:32
@zazulam
Copy link
Contributor

zazulam commented Feb 2, 2026

@Aman-Cool can you run the repo's pre-commit hooks?

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool
Copy link
Author

@zazulam ,Pre-commit hooks have been run locally and the resulting fixes are pushed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants