Maximilien-R
Contributor

@Maximilien-R Maximilien-R commented Jul 28, 2025

This PR improves the performance of TaskRun reconciliation by resolving StepActions concurrently and refactors the resolution logic for better efficiency.

The problem

Currently, when a Task contains multiple steps that reference StepActions, the resolution of these references is performed sequentially. This can lead to significant delays in starting a TaskRun, particularly when using remote resolvers like git, as each resolution adds to the total time.

Additionally, the existing code performs a deep copy of every step, regardless of whether it references a StepAction, leading to unnecessary memory allocations.

The changes

This pull request introduces two main improvements to StepAction resolution:

  1. Concurrent resolution: StepActions are now resolved concurrently using an errgroup. This reduces the time required to process TaskRuns that contain multiple steps with remote StepAction references, such as those from a git repository.

  2. Code refactoring: The resolution logic in taskspec.go has been refactored for clarity and maintainability. This includes:

    • Introducing a HasStepRefs function for an early exit if no StepActions need to be resolved.
    • Creating a resolveStepRef function to encapsulate the logic of resolving a single StepAction.
    • Splitting the process into two phases: concurrent resolution and sequential merging of results.
    • Adding an updateTaskRunProvenance function to handle status updates cleanly.
    • Optimizing DeepCopy to only occur when a step.Ref is present.

/kind feature

The resolution of `StepActions` within a `TaskRun` is now performed concurrently, which can significantly reduce the time it takes for a `TaskRun` to start, especially when using multiple remote `StepActions`.

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 28, 2025

linux-foundation-easycla bot commented Jul 28, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Maximilien-R / name: Maximilien Raulic (9b1f2e7)

@tekton-robot tekton-robot requested review from dibyom and twoGiants July 28, 2025 15:46
@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 28, 2025
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/default.go 87.5% 88.3% 0.8
pkg/reconciler/taskrun/resources/taskspec.go 100.0% 97.1% -2.9

@Maximilien-R
Contributor Author

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 28, 2025
@Maximilien-R Maximilien-R force-pushed the feat/parallel-stepaction-resolution branch from fbe1cd7 to e48f5d1 Compare July 28, 2025 19:21
@waveywaves waveywaves self-assigned this Jul 29, 2025
Member

@afrittoli afrittoli left a comment

Thanks for this, it looks good!
A few minor comments, nothing blocking.
/approve

}

// HasStepRefs provides a fast check to see if any steps in a TaskSpec contain a reference to a StepAction.
func HasStepRefs(taskSpec *v1.TaskSpec) bool {
Member

Is this exported only so that it may have dedicated unit tests?

Contributor Author

Indeed. The function was initially private, but because I didn't want to modify the test file too much, I made it public, thinking it could be useful in other contexts.

However, I introduced this commit to make it private if that makes more sense to you.

Let me know what your preference is and I'd be happy to squash or remove that commit.

Member

@afrittoli afrittoli Aug 4, 2025

Thanks @Maximilien-R for the extra commit. Either way is probably ok.

We usually don't export functions unless we need to, but we also have a policy (which we don't always honour) to only write tests for exported functions, meaning that other functions can only be tested indirectly through their calling function.

In this case, I'm not sure which of the two policies would win; it seems reasonable to have unit tests specifically for that function. @vdemeester any preference?

Member

@afrittoli @vdemeester should we document this in the contribution guide, if it isn't documented already?

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2025
Member

@waveywaves waveywaves left a comment

The default-step-action-parallelism-limit key doesn't specify exactly what is being parallelized, and I don't think these would be running in parallel: your PR throttles the concurrency of the StepAction resolution goroutines (g.Go() usage), which doesn't guarantee parallel CPU execution. I believe the term "concurrency" is much better here.

What do you think about renaming the config key to an identifier that reflects this better, such as default-step-ref-concurrency-limit (since this is about resolving the references) or default-step-action-concurrency-limit (replacing "parallelism" in the key from this PR with "concurrency"), or maybe something else?

@waveywaves
Member

/retest

@Maximilien-R Maximilien-R force-pushed the feat/parallel-stepaction-resolution branch from c3c4e4a to c2ae737 Compare August 4, 2025 12:04
@Maximilien-R
Contributor Author

@afrittoli, I have applied the various proposed corrections and converted the public HasStepRefs function into a private one. Let me know if this seems better to you.

@waveywaves As suggested, I replaced the various occurrences of "parallelism" with "concurrency" and of "step action" with "step ref". Indeed, this seems more coherent to me.

I also took the opportunity to expand the documentation around the configuration key to make things more explicit and detailed.

Note: I've pushed separate commits to make the changes easier to review and identify. When you're satisfied with the results, I'd be glad to squash everything into one commit and update the body and message of my initial commit.

@waveywaves
Member

/retest

Member

@waveywaves waveywaves left a comment

Based on #8925 (comment), I see a -2.9% delta in unit test coverage on one of the files. Can we try to get a smaller delta, under 0.3-0.5%?

Apart from that it looks good, I can lgtm after we have this one update 👼

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, waveywaves

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [afrittoli,waveywaves]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Maximilien-R Maximilien-R force-pushed the feat/parallel-stepaction-resolution branch from c2ae737 to 72fc43f Compare August 20, 2025 09:02
@Maximilien-R Maximilien-R changed the title feat: resolve steps referencing StepActions in parallel feat: resolve steps referencing StepActions concurrently Aug 20, 2025
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/default.go 87.5% 88.3% 0.8

@Maximilien-R
Contributor Author

Maximilien-R commented Aug 20, 2025

Hi @waveywaves,

I added two more commits:

  • This one to add missing test cases to cover missing branches.
  • This one to remove unnecessary, overly conservative conditions which can't actually be hit with the current implementation of the underlying functions.

I also took the opportunity to rebase my branch on main and squash my previous commits into one while modifying the content of the commit message to replace occurrences of parallel with concurrent.

If the two added commits are fine with you I can squash them too 👍

@waveywaves
Member

@Maximilien-R thank you for your work, I'll review it soon, if you can squash it that would be great

Avoids unnecessary DeepCopy operations on steps that do not reference a StepAction.

Introduces concurrent resolution of steps that reference StepActions to improve the performance of TaskRun reconciliation, especially when using remote resolvers like git. The key changes include:
- `hasStepRefs` function: A new function that quickly checks if a `TaskSpec` contains any steps referencing `StepActions`. This allows for an early exit if no resolution is needed, avoiding unnecessary work.
- `resolveStepRef` function: This new function encapsulates the logic for resolving a single `StepAction` reference. It handles fetching the remote resource, merging the `StepAction` with the step's specification, and returning the resolved step.
- Two-phase resolution: The `GetStepActionsData` function is now split into two distinct phases:
  - Concurrent Resolution: All `StepAction` references are resolved concurrently using an `errgroup`.
  - Sequential Merging: The resolved steps and their provenance are merged into the final step list and the `TaskRun` status sequentially.
- `updateTaskRunProvenance` function: A dedicated function for updating the TaskRun's status with provenance information.

The maximum number of StepActions that can be resolved concurrently is defined by the default config and its `default-step-ref-concurrency-limit` key.
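
The early exit and the selective copy described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual implementation: the `Step` and `Ref` types are simplified stand-ins for the Tekton API types, `deepCopy` is hand-written here rather than generated, and `prepareSteps` is a hypothetical helper that only exists to show where the check and the copy happen.

```go
package main

import "fmt"

// Ref is a simplified, hypothetical stand-in for a StepAction reference.
type Ref struct{ Name string }

// Step is a simplified, hypothetical stand-in for a Tekton step.
type Step struct {
	Name string
	Ref  *Ref
	Args []string
}

// hasStepRefs reports whether any step references a StepAction, so callers
// can return early when there is nothing to resolve.
func hasStepRefs(steps []Step) bool {
	for _, s := range steps {
		if s.Ref != nil {
			return true
		}
	}
	return false
}

// deepCopy returns a copy of the step that shares no memory with the original.
func (s Step) deepCopy() Step {
	c := s
	if s.Ref != nil {
		r := *s.Ref
		c.Ref = &r
	}
	c.Args = append([]string(nil), s.Args...)
	return c
}

// prepareSteps returns the input untouched when no step has a Ref. Otherwise
// it deep-copies only the steps that have one, and reports how many copies
// were made.
func prepareSteps(steps []Step) ([]Step, int) {
	if !hasStepRefs(steps) {
		return steps, 0 // early exit: no allocation at all
	}
	out := make([]Step, len(steps))
	copies := 0
	for i, s := range steps {
		if s.Ref == nil {
			out[i] = s // inline steps are passed through without a deep copy
			continue
		}
		out[i] = s.deepCopy()
		copies++
	}
	return out, copies
}

func main() {
	steps := []Step{
		{Name: "echo", Args: []string{"hello"}},
		{Name: "clone", Ref: &Ref{Name: "git-clone"}},
	}
	_, copies := prepareSteps(steps)
	fmt.Println("deep copies made:", copies)
}
```

Only the steps that will be mutated during resolution pay for a copy, which is where the saving in allocations comes from when most steps are inline.
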
@Maximilien-R Maximilien-R force-pushed the feat/parallel-stepaction-resolution branch from 72fc43f to 9b1f2e7 Compare August 20, 2025 10:12
@waveywaves
Member

/ok-to-test

@tekton-robot tekton-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 21, 2025
@waveywaves
Member

/lgtm

thank you for your work on this very useful feature!

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2025
@JordanGoasdoue

/lgtm

thank you for your work on this very useful feature!

@waveywaves It seems the e2e tests failed; is it possible to rerun them?
We would like to have this feature in the next release 🙏

Thank you

@waveywaves
Member

/retest

@tekton-robot tekton-robot merged commit 8669ca1 into tektoncd:main Aug 26, 2025
47 of 48 checks passed