-
Notifications
You must be signed in to change notification settings - Fork 67
Make model-engine FIPS compliant by updating base chainguard image #724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make model-engine FIPS compliant by updating base chainguard image #724
Conversation
…ssues when creating endpoints
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
| {{- end }} | ||
| {{- end }} | ||
| {{- if $config_values }} | ||
| - name: service-config-volume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious if you needed to add this for specific reason? do you actually use batch-job-orchestration-job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we don't add this change, there are integration tests running batch jobs that will fail because they can't find the service configs. See explanation below:
This is part of fixing the integration tests bug in the file below (rest_api_utils.py). Context (I debugged all this by SSH-ing into the instance running the CircleCI workflows and inspecting the kubernetes logs):
- Some of the integration tests were using a hardcoded model engine image tag (
830c81ecba2a147022e504917c6ce18b00c2af44) to run - seeCREATE_DOCKER_IMAGE_BATCH_JOB_BUNDLE_REQUEST,CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST.. - The integration tests would spin up kubernetes pods for some batch jobs, and they were being created from the hardcoded model engine image. But that meant that new changes to model engine server might not actually be reflected in the integration tests, so I updated the
rest_api_utils.pyfile to rebuild the container.
Fixing this bug caused the integration tests to fail because the old hardcoded image had the service configs copied inside the container image, whereas new images need to mount them instead. I reran these tests on another branch where I only changed the model engine image tag used to confirm this is an isolated issue - no other changes (i.e. Dockerfile):
| @@ -1,9 +1,9 @@ | |||
| # federal/Dockerfile.chainguard | |||
| FROM cgr.dev/scale.com/python-fips:3.10.15-dev | |||
| FROM cgr.dev/scale.com/python-fips:3.10.19-dev | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andytang-scale can you check if this change is ok w/ your use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm outside of changing the dockerfile used by fed. tagged @andytang-scale for review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just checking though that by removing the sitecustomize.py file these other md5 changes I see implemented allow the dockerfile to run? I just did the general sitecustomize.py to make sure it catches all hashing calls into fips compliance
|
@andytang-scale Thank you 🙌🏻 Yes, previously the I tested with a trivy scan and the new Also tested (with CircleCI integration tests) that both the standard |
Pull Request Summary
This PR upgrades the model-engine Docker base image to use Chainguard's FIPS-compliant Python image, and fixes bugs in the CircleCI integration tests.
FIPS compliance changes:
Dockerfileto use chainguard base image for FIPS compliancefederal/Dockerfilecopycelery_enable_sha256: truein all configs for FIPS complianceFixing integration tests:
Test Plan and Usage Guide