Description
Provide environment information
System:
- OS: Linux 6.12 Debian GNU/Linux 11 (bullseye) 11 (bullseye)
- CPU: (4) x64 AMD EPYC 7R13 Processor
- Memory: 4.57 GB / 7.55 GB
- Container: Yes
- Shell: 5.1.4 - /bin/bash
Binaries:
- Node: 20.11.1 - /usr/local/bin/node
- npm: 10.2.4 - /usr/local/bin/npm
- pnpm: 8.15.5 - /usr/local/bin/pnpm
Deployment:
- Trigger.dev: v4.0.0-beta.23
- Helm Chart: v4.0.0-beta.18
- Registry: AWS ECR
- Authentication: EKS IRSA (IAM Roles for Service Accounts)
Describe the bug
When running npx trigger.dev@v4-beta deploy against a self-hosted instance on EKS using IRSA for ECR authentication, deployments fail if DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set.
The CLI returns the following error:
Failed to start deployment: Failed to get deployment image ref
And the webapp logs show a ValidationError from the AWS SDK:
{
"assumeRole":{},
"sessionName":"TriggerWebappECRAccess_1753172908266_70oxc8",
"error":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"http":{"requestId":"P4kJ62bmgEtC8hVTFmwrk","path":"/api/v1/deployments"},
"level":"error",
"message":"Failed to assume role"
}
{
"cause":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"level":"error",
"message":"Failed to get deployment image ref"
}
This appears to happen because the code always attempts an AssumeRole operation, even when the pod already has the necessary ECR permissions via IRSA's default credential chain. As a result, users must configure workarounds that add complexity and may depart from typical IRSA practice.
Root Cause
The implementation in initializeDeployment.server.ts unconditionally passes an assumeRole object to getDeploymentImageRef, even when the corresponding environment variables are undefined:
// This always creates an object, never undefined
assumeRole: {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN, // undefined in my case
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID, // also undefined
}
This prevents the code from using the default credential chain and causes the AWS SDK to throw a ValidationError because roleArn is null.
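The following minimal sketch illustrates why the object is never "absent" from the callee's point of view, and why the log above shows "assumeRole":{}. The env object here is a hypothetical stand-in for the webapp's environment, and whether getDeploymentImageRef actually uses a plain presence check is an assumption, not something confirmed from the source.

```typescript
// Hypothetical stand-in for the webapp's env object; both variables are unset.
const env: {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
} = {};

// Same construction as in the snippet above: an object literal is always created.
const assumeRole = {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
};

// An object is truthy even when all of its fields are undefined,
// so a presence check such as `if (assumeRole)` would still pass.
const isTruthy = Boolean(assumeRole);

// JSON.stringify drops keys whose value is undefined,
// which is consistent with the `"assumeRole":{}` entry in the webapp log.
const serialized = JSON.stringify({ assumeRole });

console.log(isTruthy, serialized); // true {"assumeRole":{}}
```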
Use Case Comparison
It seems there might be two different use cases for ECR integration:
- Trigger.dev Cloud (Cross-Account): AssumeRole is necessary for the central webapp to access a user's ECR in another account. This works as expected.
- Self-Hosted on EKS (Same Account): The webapp and ECR are in the same account, and the pod already has direct permissions via IRSA. In this common setup, an AssumeRole operation is typically not needed.
The current implementation appears to be designed for the first use case.
Reproduction repo
Not applicable - this is a deployment configuration issue.
To reproduce
- Create IRSA for Trigger.dev on EKS:
    eksctl create iamserviceaccount \
      --cluster=my-cluster \
      --namespace=trigger-dev \
      --name=trigger-dev-webapp \
      --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/TriggerDevECRAccess \
      --approve
- Deploy Trigger.dev with Helm and Kustomize:
  values.yaml:
    registry:
      deploy: false
      repositoryNamespace: "trigger"
      external:
        host: "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com"
        auth:
          enabled: false
  kustomization.yaml:
    patches:
      - path: sa-patch.yaml
        target:
          kind: ServiceAccount
          name: trigger-dev-webapp
  sa-patch.yaml:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: trigger-dev-webapp
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/eksctl-my-cluster-addon-iamserviceaccount-trigger-dev-webapp"
- Attempt to deploy a project:
    npx trigger.dev@v4-beta deploy
- Observe the ValidationError in the webapp logs.
Expected Behavior
When DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set, npx trigger.dev@v4-beta deploy should succeed by using the pod's default AWS credential chain (provided by IRSA), without attempting an AssumeRole operation.
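As a rough sketch of this expected behavior, the decision could be modeled as a small pure function that returns either "use the default chain" or "assume this role". The type names and selectCredentialStrategy are hypothetical and exist only for illustration; they are not part of the Trigger.dev codebase.

```typescript
// Hypothetical shapes for illustration; not the webapp's actual types.
type AssumeRoleConfig = { roleArn: string; externalId?: string };

type RegistryEnv = {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
};

type CredentialStrategy =
  | { kind: "default-chain" } // e.g. IRSA credentials already available to the pod
  | { kind: "assume-role"; assumeRole: AssumeRoleConfig };

// Only request an AssumeRole when a non-empty role ARN is configured.
function selectCredentialStrategy(env: RegistryEnv): CredentialStrategy {
  const roleArn = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?.trim();
  if (!roleArn) {
    return { kind: "default-chain" };
  }

  const externalId = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID;
  return {
    kind: "assume-role",
    assumeRole: externalId ? { roleArn, externalId } : { roleArn },
  };
}

// Same-account IRSA setup: nothing configured, so no AssumeRole is attempted.
console.log(selectCredentialStrategy({}).kind);
```

Treating an empty or whitespace-only value the same as an unset variable is a design choice in this sketch; it avoids sending a blank roleArn to STS if the variable is defined but left empty in a values file.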
Additional information
Suggested Fix
A possible solution could be to only construct the assumeRole object when the DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN environment variable is explicitly set. This would allow the default credential chain to be used when the variable is absent.
// In apps/webapp/app/v3/services/initializeDeployment.server.ts
+ const assumeRole = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN
+   ? {
+       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
+       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
+     }
+   : undefined;
  const [imageRefError, imageRefResult] = await tryCatch(
    getDeploymentImageRef({
      // ... other params
-     assumeRole: {
-       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
-       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
-     },
+     assumeRole,
    })
  );
Discussion on Workarounds
Without this change, users in a same-account IRSA setup need to implement workarounds that have notable drawbacks:
- Setting the ARN to the pod's own role: This requires configuring the role's trust policy to allow self-assumption, which is an uncommon pattern and adds extra STS API calls.
- Creating a dedicated intermediate role: This adds the complexity of creating and maintaining an additional IAM role.
Both approaches seem to add unnecessary complexity for a standard self-hosted EKS setup. It would be great if Trigger.dev could natively support the direct use of IRSA credentials, which aligns with the "optional STS assume role support" mentioned in PR #2224.