Make model-engine FIPS compliant by updating base chainguard image #724

ValentineDragan · 2025-10-16T11:56:29Z

Pull Request Summary

This PR upgrades the model-engine Docker base image to use Chainguard's FIPS-compliant Python image, and fixes bugs in the CircleCI integration tests.

FIPS compliance changes:

Update Dockerfile to use chainguard base image for FIPS compliance
- Delete the now-identical federal/Dockerfile copy
Upgrade SQLAlchemy to 2.0.21, which uses FIPS-compliant md5 hashing
- This removes the need to monkey patch the hashing library with sitecustomize.py, which was making integration tests fail because md5 is still needed for non-security hashing (e.g. generating Git/CircleCI hashes)
Set celery_enable_sha256: true in all configs for FIPS compliance

Fixing integration tests:

Update integration tests to use the current/latest model-engine image instead of a hardcoded image tag from 2024
Update helm chart to mount service configs in CircleCI
Add chainctl authentication to CircleCI to enable pulling the chainguard base image

Test Plan and Usage Guide

All unit tests and integration tests pass
- (previously, the integration tests weren't reflecting the latest repo changes because they used a hardcoded image)


socket-security · 2025-10-28T21:48:51Z

Review the following changes in direct dependencies.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	sqlalchemy@2.0.4 ⏵ 2.0.21

dmchoiboi · 2025-10-29T16:18:34Z

charts/model-engine/templates/service_template_config_map.yaml

-          {{- end }}
+            {{- end }}
+            {{- if $config_values }}
+            - name: service-config-volume


Curious if you needed to add this for a specific reason? Do you actually use batch-job-orchestration-job?

ValentineDragan · 2025-11-02

Yes. Without this change, the integration tests that run batch jobs fail because they can't find the service configs. See the explanation below:

This is part of fixing the integration tests bug in the file below (rest_api_utils.py). Context (I debugged all this by SSH-ing into the instance running the CircleCI workflows and inspecting the kubernetes logs):

Some of the integration tests were using a hardcoded model engine image tag (830c81ecba2a147022e504917c6ce18b00c2af44) to run; see CREATE_DOCKER_IMAGE_BATCH_JOB_BUNDLE_REQUEST and CREATE_FINE_TUNE_DI_BATCH_JOB_BUNDLE_REQUEST.

The integration tests would spin up kubernetes pods for some batch jobs, and they were being created from the hardcoded model engine image. But that meant that new changes to model engine server might not actually be reflected in the integration tests, so I updated the rest_api_utils.py file to rebuild the container.

Fixing this bug caused the integration tests to fail because the old hardcoded image had the service configs copied inside the container image, whereas new images need to mount them instead. To confirm this is an isolated issue, I reran these tests on another branch where I changed only the model engine image tag and nothing else (e.g. no Dockerfile changes):

The integration tests succeed with the old hardcoded image tags

The tests fail if you only change the model engine image to use the latest changes
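
The idea of replacing a pinned image tag with the tag of the code under test might look roughly like the sketch below. It is only an illustration under stated assumptions: the environment variable `MODEL_ENGINE_IMAGE_TAG`, the functions `resolve_image_tag` and `build_batch_job_bundle_request`, and the payload fields are hypothetical names, not the actual contents of rest_api_utils.py.

```python
import os
import subprocess
from typing import Mapping, Optional


def resolve_image_tag(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model-engine image tag for integration tests.

    Prefers an explicit environment variable (hypothetical name) and falls
    back to the current git commit SHA, so the tests track the code under
    test instead of a tag pinned long ago.
    """
    env = os.environ if env is None else env
    tag = env.get("MODEL_ENGINE_IMAGE_TAG")
    if tag:
        return tag
    # Fallback: assumes the tests run inside a git checkout.
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()


def build_batch_job_bundle_request(image_tag: str) -> dict:
    """Build a request payload; the field names are illustrative only."""
    return {
        "name": "test-docker-image-batch-job-bundle",
        "image_repository": "model-engine",
        "image_tag": image_tag,
    }
```

Resolving the tag at call time, rather than in a module-level constant, keeps the request definitions from silently drifting behind the repository.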

dmchoiboi · 2025-11-03T16:29:43Z

model-engine/Dockerfile.fips

@@ -1,9 +1,9 @@
-# federal/Dockerfile.chainguard
-FROM cgr.dev/scale.com/python-fips:3.10.15-dev
+FROM cgr.dev/scale.com/python-fips:3.10.19-dev


@andytang-scale can you check if this change is ok w/ your use case?

dmchoiboi

lgtm outside of changing the dockerfile used by fed. tagged @andytang-scale for review

andytang-scale

LGTM. Just checking, though: with the sitecustomize.py file removed, do the other md5 changes I see implemented still allow the Dockerfile to run? I added the general sitecustomize.py to make sure it caught all hashing calls and brought them into FIPS compliance.

ValentineDragan · 2025-11-03T17:23:17Z

@andytang-scale Thank you 🙌🏻 Yes, previously sitecustomize.py addressed the FIPS findings by redirecting all md5 calls to sha256 (because md5 is not safe for hashing secrets or for other security purposes). However, this caused the CircleCI integration tests to fail, because some CircleCI code does need md5 to compute git/image hashes. So instead, I bumped sqlalchemy to a version that fixed its FIPS-compliance issues, and marked the md5 calls we make in our own code as non-security (we don't use them for secrets; we use them for image/commit hashes).

I tested with a trivy scan and the new Dockerfile.fips is fips compliant.

I also tested (with the CircleCI integration tests) that both the standard Dockerfile and Dockerfile.fips run correctly and pass all integration tests.
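
Marking an md5 call as non-security, as described in the comment above, can be sketched with the standard library alone. This is a minimal illustration, assuming Python 3.9 or later (where `hashlib` constructors accept `usedforsecurity`); the function name `short_content_hash` is hypothetical and does not come from this PR.

```python
import hashlib


def short_content_hash(payload: bytes) -> str:
    """Return an md5 hex digest used only as a non-security identifier.

    usedforsecurity=False (Python 3.9+) declares that the digest does not
    protect secrets; it is meant to let md5 be used for such purposes in
    restricted (for example FIPS-enabled) environments.
    """
    return hashlib.md5(payload, usedforsecurity=False).hexdigest()


if __name__ == "__main__":
    # Example: derive a stable identifier from an image reference string.
    print(short_content_hash(b"model-engine:latest"))
```

Unlike a global monkey patch, this leaves md5 available to third-party code that legitimately needs it, while each call site states its intent explicitly.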

ValentineDragan added 24 commits October 16, 2025 12:55

Update Dockerfile

cf6c084

Update circleci config to login to chainguard

0201f0c

fix typo in circleci config

8ece2a5

Add code to debug circleci errors

c3ef4e5

Debug missing chainguard token

93f579a

Debug failing oidc token swap

3853517

Update config

9dac2cd

Retry OIDC token swap with updated chainguard identity

8104ff1

Update audience for token exchange request

4ddee82

Simplify chainguard authentication with chainctl

e5642e8

Specify audience cgr.dev in auth login

44b59ed

Update system packages in Dockerfile

82fe322

Update Dockerfile packages for chainguard compatbility

209a34a

update Dockerfile

6f79179

Revert circleci python version to 3.10.14

ae1bb4e

Update hardcoded model-engine image tag used in integration tests

a465e51

Fix CircleCI config trying to use hardcoded model-engine image tag for batch jobs pod

0eca9cb

Mount service_config_circleci.yaml in batch job pods

fe8d764

Fix broken helm template

b53b7c9

Add missing infra config and service template config to batch job pods

b054e67

remove redundant config for batch job pods

c142b27

enable SHA256 checksums for Celery S3 backend to avoid MD5 decoding issues when creating endpoints

f312dfe

Fix failing md5 monkey patch

f742965

bump sqlalchemy to 2.0.21 to address md5 FIPS compliance

c8a2c66

ValentineDragan added 4 commits October 28, 2025 22:51

Fix black linting errors

5e4fcf2

wrap Dockerfile layers between root and nonroot user

506b0bf

Remove the federal/ directory since Dockerfile is now FIPS compliant and doesn't require monkey patching

66cbd33

set celery_enable_sha256 to true in all configs for FIPS compliance

e27a32d

ValentineDragan changed the title ~~Update Dockerfile with Chainguard base image~~ Make model-engine FIPS compliant by updating base chainguard image Oct 28, 2025

ValentineDragan marked this pull request as ready for review October 28, 2025 23:48

ValentineDragan requested a review from dmchoiboi October 28, 2025 23:49

dmchoiboi reviewed Oct 29, 2025

ValentineDragan and others added 3 commits November 2, 2025 23:51

make changes backwards compatible by having separate Dockerfiles

fb479f6

formatting

9b11f29

Merge branch 'main' into fix/fix-vulnerabilities-in-model-engine-image

aec1fe0

dmchoiboi reviewed Nov 3, 2025

dmchoiboi approved these changes Nov 3, 2025

dmchoiboi requested a review from andytang-scale November 3, 2025 16:30

andytang-scale approved these changes Nov 3, 2025

ValentineDragan merged commit 210fa3e into main Nov 3, 2025
7 checks passed

ValentineDragan deleted the fix/fix-vulnerabilities-in-model-engine-image branch November 3, 2025 17:23
