**Describe the bug**

Seems related to https://github.com/aws-controllers-k8s/community/issues/2592.

JobRun objects are kept on the cluster indefinitely, although there is a TTL set to delete them 24h after completion. Removing the finalizer `finalizers.emrcontainers.services.k8s.aws/JobRun` by patching the object seems to resolve it (a sketch of the patch I use is included at the end of this report).

**Steps to reproduce**

I have an Argo CronWorkflow that runs every 20 minutes and creates 6 EMR job runs sequentially. The JobRun objects are not deleted automatically and accumulate over time. Once a few thousand of them have piled up, the controller tries to cancel them (including those that are already completed and therefore not cancellable) and gets stuck in a reconciliation error loop.

Example error:

```
{"level":"error","ts":"2025-07-16T08:05:05.079Z","msg":"Reconciler error","controller":"jobrun","controllerGroup":"emrcontainers.services.k8s.aws","controllerKind":"JobRun","JobRun":{"name":"redacted","namespace":"[[redacted]]"},"namespace":"redacted","name":"redacted","reconcileID":"e1e8eef7-e5b2-4d6b-941f-b6fe13d2de47","error":"operation error EMR containers: CancelJobRun, failed to get rate limit token, retry quota exceeded, 1 available, 5 requested","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255"}
```

Example of a completed job run:

```
Name:         [[redacted]]
Namespace:    [[redacted]]
Labels:       <none>
Annotations:  <none>
API Version:  emrcontainers.services.k8s.aws/v1alpha1
Kind:         JobRun
Metadata:
  Creation Timestamp:             2025-08-04T15:41:22Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2025-08-04T16:47:36Z
  Finalizers:
    finalizers.emrcontainers.services.k8s.aws/JobRun
  Generation:  2
  Owner References:
    API Version:           argoproj.io/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Workflow
    Name:                  [[redacted]]
    UID:                   0ab79df5-f92d-4c7c-89b0-646369c0237b
  Resource Version:        95869895
  UID:                     3b2fd48a-48e7-4e63-b703-4ef75459bf99
Spec:
  Configuration Overrides:
    ApplicationConfiguration:
    - classification: spark-defaults
      properties:
        spark.kubernetes.container.image: [[redacted]]
        spark.kubernetes.driver.podTemplateFile: [[redacted]]
        spark.kubernetes.executor.podTemplateFile: [[redacted]]
        spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.path: /var/spark/spill
        spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.readOnly: "false"
        spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.sizeLimit: 80Gi
        spark.kubernetes.node.selector.topology.kubernetes.io/zone: us-east-1b
        spark.local.dir: /var/spark/spill
    - classification: emr-containers-defaults
      properties:
        logging.image: [[redacted]]
  Execution Role ARN:  [[redacted]]
  Job Driver:
    Spark Submit Job Driver:
      Entry Point:  redacted
      Entry Point Arguments:
        --step
        app-sync
        --run-id
        20250804152002
      Spark Submit Parameters:  --conf spark.executor.instances=20 --conf spark.executor.memory=21G --conf spark.driver.memory=20G --conf spark.executor.cores=4 --conf spark.driver.cores=6
  Name:           redacted
  Release Label:  emr-7.7.0-latest
  Virtual Cluster Ref:
    From:
      Name:  redacted
Status:
  Ack Resource Metadata:
    Arn:               redacted
    Owner Account ID:  redacted
    Region:            us-east-1
  Conditions:
    Last Transition Time:  2025-08-04T16:47:25Z
    Status:                True
    Type:                  ACK.ReferencesResolved
    Status:                True
    Type:                  ACK.ResourceSynced
    Message:               ValidationException: Job run 0000000363lnqe40g2e is not in a cancellable state
    Status:                True
    Type:                  ACK.Terminal
  Id:     0000000363lnqe40g2e
  State:  COMPLETED
Events:   <none>
```

My workflow manifest:

```
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: [[redacted]]
  namespace: [[redacted]]
spec:
  ttlStrategy:
    secondsAfterCompletion: 300
  schedules:
    - '*/20 * * * *'
  workflowSpec:
    serviceAccountName: emr-application-creator
    entrypoint: main
    metrics:
      prometheus:
        - name: [[redacted]]
          help: "Duration gauge"
          gauge:
            value: "{{workflow.duration}}"
    synchronization:
      mutexes:
        - name: [[redacted]]
    templates:
      - name: main
        dag:
          tasks: [[redacted]]
      - name: data-pipeline-job
        inputs:
          parameters:
            - name: step
            - name: run-id
        ttlStrategy:
          secondsAfterCompletion: 300
        resource:
          action: create
          setOwnerReference: true
          successCondition: status.state == COMPLETED
          failureCondition: status.state == FAILED
          manifest: |
            apiVersion: emrcontainers.services.k8s.aws/v1alpha1
            kind: JobRun
            metadata:
              name: {{workflow.name}}-{{inputs.parameters.step}}
              namespace: [[redacted]]
            spec:
              name: {{workflow.name}}-{{inputs.parameters.step}}
              virtualClusterRef:
                from:
                  name: [[redacted]]
              executionRoleARN: [[redacted]]
              releaseLabel: emr-7.7.0-latest
              jobDriver:
                sparkSubmitJobDriver:
                  entryPoint: [[redacted]]
                  entryPointArguments:
                    - --step
                    - '{{inputs.parameters.step}}'
                    - --run-id
                    - '{{inputs.parameters.run-id}}'
                  sparkSubmitParameters: "--conf spark.executor.instances=20 \
                    --conf spark.executor.memory=21G \
                    --conf spark.driver.memory=20G \
                    --conf spark.executor.cores=4 \
                    --conf spark.driver.cores=6"
              configurationOverrides: |
                ApplicationConfiguration:
                  - classification: spark-defaults
                    properties:
                      spark.kubernetes.container.image: [[redacted]]
                      spark.kubernetes.driver.podTemplateFile: [[redacted]]
                      spark.kubernetes.executor.podTemplateFile: [[redacted]]
                      spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.path: /var/spark/spill
                      spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.readOnly: "false"
                      spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-sparkspill.mount.sizeLimit: 80Gi
                      spark.kubernetes.node.selector.topology.kubernetes.io/zone: us-east-1b
                      spark.local.dir: /var/spark/spill
                  - classification: emr-containers-defaults
                    properties:
                      logging.image: [[redacted]]
```

**Expected outcome**

I expect JobRuns older than the provided TTL to be deleted from the cluster.

**Environment**

* Kubernetes version: 1.32
* Using EKS (yes/no), if so version? eks.13 / v1.32.5-eks-5d4a308
* AWS service targeted (S3, RDS, etc.): EMR
* EMR containers controller version: 1.0.26
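**Workaround**

For anyone hitting the same problem, a minimal sketch of the finalizer-removal workaround mentioned above. The JobRun name and namespace are placeholders, and it assumes the ACK finalizer is the only finalizer left on the object:

```
# Hypothetical example: clear the stuck ACK finalizer on a single JobRun so the
# pending deletion can complete. <jobrun-name> and <namespace> are placeholders.
kubectl patch jobruns.emrcontainers.services.k8s.aws <jobrun-name> \
  -n <namespace> --type=merge -p '{"metadata":{"finalizers":null}}'
```

This only lets Kubernetes finish deleting the object that already has a deletion timestamp; it does not touch the job run on the EMR side.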