Contact Details [Optional]
Available on GitHub for follow-up comments
System Information
Output of zenml info -a -s: not applicable. This bug is in the CloudFormation
infrastructure template, not in the ZenML Python package itself.
File affected: infra/aws/aws-ecr-s3-sagemaker.yaml
Branch: main
What happened?
While reviewing the CloudFormation template at infra/aws/aws-ecr-s3-sagemaker.yaml,
I found 3 bugs that cause silent failures or broken deployments.
Bug 1 — ECR Repository tag uses !Sub instead of !Ref
Current code (wrong):
ECRRepository:
  Tags:
    - Key: !Ref TagName
      Value: !Sub TagValue # outputs literal string "TagValue"
Expected code:
ECRRepository:
  Tags:
    - Key: !Ref TagName
      Value: !Ref TagValue # correctly resolves the parameter
!Sub TagValue does not resolve the TagValue parameter — it outputs
the literal string "TagValue". Every other resource in this file
correctly uses !Ref TagValue. ECR tags will always show "TagValue"
instead of the actual user-provided value like "zenml".
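To make the difference concrete, here is a minimal Python sketch of the substitution rule, under the assumption that it is enough to model only the `${Name}` placeholder form. It is not CloudFormation's implementation: the helper names `sub` and `ref` and the parameter values are hypothetical, and the real Fn::Sub supports more syntax (for example the `${!Literal}` escape).

```python
import re

def sub(template: str, params: dict[str, str]) -> str:
    """Simplified model of Fn::Sub: only ${Name} placeholders are replaced."""
    def replace(match: re.Match) -> str:
        # An unknown name raises KeyError, loosely mirroring a validation error.
        return params[match.group(1)]
    return re.sub(r"\$\{([A-Za-z0-9]+)\}", replace, template)

def ref(name: str, params: dict[str, str]) -> str:
    """Simplified model of Ref for a template parameter: look the name up."""
    return params[name]

# Hypothetical parameter values a user might pass at deploy time.
params = {"TagName": "project", "TagValue": "zenml"}

print(sub("TagValue", params))     # no placeholder, so the text is returned unchanged
print(sub("${TagValue}", params))  # placeholder form resolves the parameter
print(ref("TagValue", params))     # Ref resolves the parameter directly
```

In this model the first call returns the bare word because there is nothing to substitute, which matches the behavior described above. Writing the value as a `${TagValue}` placeholder would also resolve it, but `!Ref TagValue` keeps the resource consistent with the rest of the file.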
Bug 2 — CodeBuild PrivilegedMode: false breaks Docker-in-Docker builds
Current code (wrong):
Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: false
Expected code:
Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: true
The image bentolor/docker-dind-awscli is a Docker-in-Docker (dind) image
that requires running a Docker daemon inside the container. AWS CodeBuild
requires PrivilegedMode: true for this. With false, every docker build
command fails with a permission error — making the CodeBuild=true option
completely broken.
Reference: https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref-compute-types.html
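A quick text-level check can catch this combination before deployment. The sketch below is a heuristic under stated assumptions, not a CloudFormation validator: it assumes a Docker-in-Docker image can be recognized by "dind" in its name, it scans the raw template text instead of parsing YAML, and the function name `dind_without_privileged` is hypothetical.

```python
import re

def dind_without_privileged(template_text: str) -> bool:
    """Return True if the text names a dind image while PrivilegedMode is false."""
    # Assumption: a Docker-in-Docker image has "dind" somewhere in its name.
    uses_dind = re.search(
        r"^\s*Image:\s*\S*dind\S*", template_text, re.MULTILINE
    ) is not None
    privileged_off = re.search(
        r"^\s*PrivilegedMode:\s*['\"]?false['\"]?\s*$",
        template_text,
        re.MULTILINE | re.IGNORECASE,
    ) is not None
    return uses_dind and privileged_off

current = """\
Environment:
  Image: bentolor/docker-dind-awscli
  PrivilegedMode: false
"""
expected = current.replace("PrivilegedMode: false", "PrivilegedMode: true")

print(dind_without_privileged(current))   # flags the current snippet
print(dind_without_privileged(expected))  # does not flag the corrected snippet
```

Because the check works on plain text, it does not associate the two settings with a specific resource; for a template with several CodeBuild projects, a proper YAML parse would be needed.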
Bug 3 — Lambda runtime python3.8 is deprecated by AWS
Current code (wrong):
InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.8
Expected code:
InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.12
AWS officially deprecated the Python 3.8 Lambda runtime. New Lambda
functions using python3.8 will fail to deploy. Since this Lambda
auto-registers the ZenML stack during CloudFormation deployment,
the entire RegisterZenMLStack flow breaks silently for new deployments.
Reference: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
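A similar text scan can list Python runtimes that are older than a chosen version. This is only a sketch: the function name `outdated_python_runtimes` is hypothetical, the threshold is supplied by the caller and should be checked against the runtimes page linked above, and the example simply uses the version proposed in this issue.

```python
import re

def outdated_python_runtimes(template_text: str, older_than: tuple[int, int]) -> list[str]:
    """List 'Runtime: pythonX.Y' values whose version is below older_than."""
    found = re.findall(
        r"^\s*Runtime:\s*python(\d+)\.(\d+)\s*$", template_text, re.MULTILINE
    )
    return [
        f"python{major}.{minor}"
        for major, minor in found
        if (int(major), int(minor)) < older_than
    ]

current = """\
InvokeZenMLAPIFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.8
"""
expected = current.replace("python3.8", "python3.12")

# Threshold taken from the fix proposed in this issue; adjust as needed.
print(outdated_python_runtimes(current, older_than=(3, 12)))
print(outdated_python_runtimes(expected, older_than=(3, 12)))
```

Versions are compared as integer tuples so that 3.12 sorts after 3.8, which a plain string comparison would get wrong.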
Reproduction steps
### Bug 1: ECR Tag
- Open infra/aws/aws-ecr-s3-sagemaker.yaml
- Find the ECRRepository resource (~line 60)
- Check the Tags section, which contains Value: !Sub TagValue
- Deploy the CloudFormation stack with TagValue = "zenml"
- Go to AWS Console → ECR → check the repository tags
- The tag value shows the literal "TagValue" instead of "zenml"
### Bug 2: CodeBuild PrivilegedMode
- Deploy the CloudFormation stack with the parameter CodeBuild=true
- Trigger a ZenML pipeline that uses CodeBuild as the image builder
- Go to AWS Console → CodeBuild → check the build logs
- The build fails with a Docker daemon permission error because PrivilegedMode: false prevents Docker from running inside the container
### Bug 3: Lambda Runtime
- Open infra/aws/aws-ecr-s3-sagemaker.yaml
- Find the InvokeZenMLAPIFunction resource
- Note Runtime: python3.8
- Deploy the CloudFormation stack with ZenMLServerURL and ZenMLServerAPIToken filled in
- The Lambda fails to deploy or raises a deprecation warning
- ZenML stack auto-registration silently fails
Relevant log output
### Bug 2 — Expected CodeBuild error with PrivilegedMode: false:
Error response from daemon: cannot start a stopped process: unknown
exec /usr/local/bin/dockerd-entrypoint.sh: operation not permitted
### Bug 3 — Expected Lambda deployment warning:
The runtime parameter of python3.8 is no longer supported for
creating or updating AWS Lambda functions. We recommend you use
the new runtime (python3.12) while creating or updating functions.
Code of Conduct