Add Task to scale down replicas for addons and karpenter in order to fix the aiml-load Pipeline #576

chithreshazad · 2026-01-26T18:23:15Z

Description / Motivation:
Add-ons such as coredns and ebs-csi-controller have PodDisruptionBudgets (PDBs) set, which prevent Karpenter from scaling down the NodePools. This PR adds a Task that scales these add-ons' replicas to 0 before the NodePools are scaled down in the aiml-load pipeline.
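A minimal sketch of such a Task, for illustration only. The Task this PR actually adds (the diff touches tests/tekton-resources/tasks/teardown/karpenter/kubectl-karpenter-scale.yaml) may differ. The Task name, param, container image, and the assumption that both add-ons run as Deployments named `coredns` and `ebs-csi-controller` in `kube-system` are hypothetical, and the step assumes the container already has a kubeconfig for the test cluster.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: scale-down-addons            # hypothetical name
spec:
  params:
    - name: addon-namespace
      default: kube-system           # assumed namespace for the EKS add-ons
  steps:
    - name: scale-addons-to-zero
      image: bitnami/kubectl:latest  # assumed image; any image with kubectl and sh works
      script: |
        #!/bin/sh
        set -e
        # Scaling a Deployment down deletes its pods directly rather than evicting them,
        # so the PDBs do not block it; with no pods left, Karpenter can drain the nodes.
        for deploy in coredns ebs-csi-controller; do   # assumed Deployment names
          kubectl scale deployment "$deploy" -n "$(params.addon-namespace)" --replicas=0
        done
```

Running this before the NodePool scale-down step keeps the ordering described above: the PDB-protected pods are gone before Karpenter tries to consolidate the nodes.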

To stop the Karpenter logs Task after the NodePools are scaled down, I am adding another step to the Task that sets the Karpenter replicas to zero. This forces the Karpenter logs Task to stop; otherwise it keeps running even after the job is done and prevents the Teardown step from running.
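The extra step could look roughly like the sketch below, written as one more entry under a Task's `steps:` list. It is hypothetical: the step name, image, the `karpenter` namespace and Deployment name, and the `app.kubernetes.io/name=karpenter` pod label are assumptions, not taken from this PR.

```yaml
# Hypothetical extra entry under the Task's `steps:` list.
- name: scale-down-karpenter
  image: bitnami/kubectl:latest      # assumed image
  script: |
    #!/bin/sh
    set -e
    NS=karpenter                     # assumed namespace of the Karpenter controller
    kubectl scale deployment karpenter -n "$NS" --replicas=0   # assumed Deployment name
    # Poll until no controller pods remain (assumed label); once they are gone,
    # any `kubectl logs -f` stream on them ends and the log Task can finish.
    while [ -n "$(kubectl get pods -n "$NS" -l app.kubernetes.io/name=karpenter -o name)" ]; do
      sleep 5
    done
```

Polling for the pods to disappear, rather than returning right after `kubectl scale`, makes the step finish only once the log followers have actually lost their pods.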

Desktop Testing: Tested by triggering a Tekton test run.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

hakuna-matatah · 2026-01-27T11:43:10Z

tests/tekton-resources/pipelines/eks/awscli-eks-aiml-load.yaml

      kind: Task
      name: helm-karpenter-install
  - name: get-karp-logs
+    timeout: "4h"


Why an explicit 4h timeout?

This is for cases where the pipeline fails in any step before the stop-karpenter-logs step. Without a timeout, this step keeps running and blocks the Teardown step. Based on my analysis of previous runs, 4h is more than enough time for the pipeline to finish successfully.

Gotcha!

If we don't timeout then this step will keep running which blocks the Teardown step

The pipeline's default timeout will kick in if this is not set.
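For context on the default-timeout point: in the Tekton v1 API, a PipelineRun can bound the whole run and reserve time for `finally` tasks through `spec.timeouts`, instead of placing a per-task timeout on get-karp-logs. This sketch assumes Teardown runs as a `finally` task, which this PR does not confirm; the run name, Pipeline name, and durations are hypothetical.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: aiml-load-run-       # hypothetical
spec:
  pipelineRef:
    name: awscli-eks-aiml-load       # hypothetical; use the Pipeline's metadata.name
  timeouts:
    pipeline: "6h"                   # upper bound for the entire run
    tasks: "5h"                      # budget for the regular tasks
    finally: "1h"                    # budget reserved for finally tasks such as teardown
```

`tasks` plus `finally` must not exceed `pipeline`; splitting the budget this way means a hung regular task times out while teardown still gets its own time slot.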

Instead of an arbitrary timeout, since we scale down the Karpenter pods, the get-karp-logs task could check whether the Karpenter pods have been deleted and then exit.

It looks like this is already handled here:

kubernetes-iteration-toolkit/tests/tekton-resources/tasks/teardown/karpenter/kubectl-get-karpenter-logs.yaml

Lines 48 to 49 in 3ec214d

# Follow logs continuously - will exit when pod is deleted
kubectl logs "$pod" -n $(params.namespace) -f &
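The pattern in that excerpt (follow each pod's logs in the background and let each stream end when its pod is deleted) can be sketched as a complete step. This is an illustration, not the contents of kubectl-get-karpenter-logs.yaml; the step name, image, the `namespace` param, and the `app.kubernetes.io/name=karpenter` label selector are assumptions.

```yaml
# Hypothetical step; assumes the enclosing Task declares a `namespace` param.
- name: follow-karpenter-logs
  image: bitnami/kubectl:latest      # assumed image
  script: |
    #!/bin/sh
    # Assumed label selector for the Karpenter controller pods.
    for pod in $(kubectl get pods -n $(params.namespace) -l app.kubernetes.io/name=karpenter -o name); do
      # Follow logs continuously in the background; the stream ends when the pod is deleted.
      kubectl logs "$pod" -n $(params.namespace) -f &
    done
    # Block until every background log stream has ended, then let the step exit.
    wait
```

Because the step only returns after `wait`, it finishes exactly when the last Karpenter pod goes away, which is why scaling Karpenter to zero is enough to unblock Teardown.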

Were you not seeing the task exit even after the scale-down Karpenter step ran?

Were you not seeing the task exit even after the scale-down Karpenter step ran?

Yes, it does exit after the scale-down Karpenter step runs. As mentioned previously, the timeout is only for the scenario where we never reach the scale-down step (for example, because an earlier step failed), though I haven't seen that in my testing so far.

I will remove the timeout for now. We can revisit it if we see problems with this step when running this Pipeline in prod.

tests/tekton-resources/tasks/teardown/karpenter/kubectl-karpenter-scale.yaml


hakuna-matatah reviewed Jan 27, 2026


chithreshazad force-pushed the pdb-fixes branch from 97d7857 to d8256ee January 27, 2026 21:24

chithreshazad force-pushed the pdb-fixes branch from d8256ee to cdab558 January 28, 2026 18:35

hakuna-matatah approved these changes Jan 28, 2026


hakuna-matatah merged commit 3c5dfb8 into awslabs:main Jan 28, 2026
4 checks passed
