@pablintino pablintino commented Oct 13, 2025

Closes: OCPBUGS-62714

- What I did

This commit addresses OCPBUGS-62714 by implementing a temporary container image policy override for rpm-ostree operations when pulling images from local container storage.

The Problem:
When using PinnedImageSets with restrictive container image policies, rpm-ostree fails to pull OS images from local storage during updates. The issue occurs because:

  • PinnedImageSets pre-pull images into local container storage
  • Restrictive policies (where users removed the default "insecureAcceptAnything" policy) don't explicitly allow the containers-storage transport
  • rpm-ostree is blocked by the policy when attempting to rebase from the locally stored image

The Solution:
The change adds a temporary, non-invasive policy override mechanism that:

  • Generates a temporary policy file (/run/tmp-rpm-ostree-policy.json) that includes an insecureAcceptAnything rule specifically for the target image in containers-storage
  • Uses a systemd drop-in to bind-mount this temporary policy over /etc/containers/policy.json for the rpm-ostreed service only
  • Automatically cleans up both the temporary policy and drop-in after the rebase operation completes
  • Skips the override entirely when existing policies are already permissive enough

This ensures rpm-ostree can always pull from local storage when using PinnedImageSets without permanently modifying the system's security policies.

- How to verify it

  1. Spin up a 4.21 cluster that is not running the latest available version (just for the sake of performing the update). Important: To truly test the change, select an install version whose CoreOS image differs from the one you will use for the update.
  2. Pin the CoreOS image of the update
# Get the CoreOS digest
COREOS_DIGEST=$(oc adm release info <update-release-image-pullspec> -o=jsonpath='{.references.spec.tags[?(@.name=="rhel-coreos")].from.name}')

# Apply the PIS resource and ensure the image is pulled in master nodes
cat <<EOF | oc apply -f - 
  apiVersion: machineconfiguration.openshift.io/v1
  kind: PinnedImageSet
  metadata:
    labels:
      machineconfiguration.openshift.io/role: master
    name: master-pinned-images
  spec:
    pinnedImages:
     - name: $COREOS_DIGEST
EOF
  3. Wait for the image to be pulled in all the master nodes
oc get node -l node-role.kubernetes.io/master -o name | \
xargs -I {} oc debug {} -- chroot /host podman images --filter reference=$COREOS_DIGEST

Starting pod/pabrodri-test-c4w4f-master-0-debug-hj8b6 ...
To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
REPOSITORY                                      TAG         IMAGE ID      CREATED     SIZE
quay.io/openshift-release-dev/ocp-v4.0-art-dev  <none>      f590b495c6cd  4 days ago  2.64 GB
### OUTPUT CROPPED FOR BREVITY. SAME OUTPUT FOR ALL MASTER NODES ###
  4. Apply a restrictive pull policy:
oc patch image.config.openshift.io/cluster --type=merge -p '
{
  "spec": {
    "allowedRegistriesForImport": [
      {
        "domainName": "registry.ci.openshift.org",
        "insecure": false
      },
      {
        "domainName": "quay.io",
        "insecure": false
      },
      {
        "domainName": "registry.redhat.io",
        "insecure": false
      },
      {
        "domainName": "registry.connect.redhat.com",
        "insecure": false
      },
      {
        "domainName": "registry.access.redhat.com",
        "insecure": false
      },
      {
        "domainName": "registry-proxy.engineering.redhat.com",
        "insecure": false
      },
      {
        "domainName": "registry.stage.redhat.io",
        "insecure": false
      },
      {
        "domainName": "ghcr.io",
        "insecure": false
      }
    ],
    "registrySources": {
      "allowedRegistries": [
        "registry.ci.openshift.org",
        "quay.io",
        "registry.redhat.io",
        "registry.connect.redhat.com",
        "registry.access.redhat.com",
        "registry-proxy.engineering.redhat.com",
        "registry.stage.redhat.io",
        "ghcr.io"
      ]
    }
  }
}
'
  5. Wait for the MCPs to roll out the change
  6. Ensure the policy has been deployed. It should look like this on all nodes:
oc get node -l node-role.kubernetes.io/master -o name | \
xargs -I {} oc debug {} -- chroot /host cat /etc/containers/policy.json
Starting pod/pabrodri-test-c4w4f-master-0-debug-xv7c5 ...
To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
{
  "default": [
    {
      "type": "reject"
    }
  ],
  "transports": {
    "atomic": {
      "ghcr.io": [
        {
          "type": "insecureAcceptAnything"
        }
      ],
      "quay.io": [
        {
          "type": "insecureAcceptAnything"
        }
      ],
      "registry-proxy.engineering.redhat.com": [
        {
          "type": "insecureAcceptAnything"
        }
      ],
      "registry.access.redhat.com": [
        {
          "type": "insecureAcceptAnything"
        }
      ],
      "registry.ci.openshift.org": [
        {
          "type": "insecureAcceptAnything"
        }
### OUTPUT CROPPED FOR BREVITY ###
  7. Trigger the cluster update
oc adm upgrade --to-image=<update-release-image-pullspec>
  8. Wait for the update to finish

- Description for the changelog

Fix rpm-ostree rebase failures from local container storage when using PinnedImageSets with restrictive image policies

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Oct 13, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label Oct 13, 2025
@pablintino pablintino changed the base branch from main to release-4.19 October 13, 2025 17:14
openshift-ci bot commented Oct 13, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Oct 13, 2025
@openshift-ci openshift-ci bot added the approved label Oct 13, 2025
@pablintino pablintino force-pushed the ocpbugs-62714 branch 3 times, most recently from f398cb2 to 5d18375 Compare October 15, 2025 11:08
@pablintino pablintino marked this pull request as ready for review October 15, 2025 11:08
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Oct 15, 2025
@pablintino pablintino force-pushed the ocpbugs-62714 branch 3 times, most recently from af4bdb7 to e15a4ae Compare October 17, 2025 12:37
@pablintino

/jira refresh

@openshift-ci-robot

@pablintino: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

@pablintino pablintino changed the title from "[OCPBUGS-62714] Temporary policy.json for PIS rpm-ostree rebasing" to "OCPBUGS-62714: Temporary policy.json for PIS rpm-ostree rebasing" Oct 17, 2025
@openshift-ci-robot openshift-ci-robot added the jira/severity-important, jira/valid-reference, and jira/invalid-bug labels Oct 17, 2025
@openshift-ci-robot

@pablintino: This pull request references Jira Issue OCPBUGS-62714, which is invalid:

  • expected the bug to target the "4.19.z" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-62714 to depend on a bug targeting a version in 4.20.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@yuqi-zhang yuqi-zhang left a comment

overall structure makes sense to me. It looks like we opted for Colin's suggestion in https://issues.redhat.com/browse/OCPBUGS-62714 which I think is a safe path.

Some questions/suggestions inline:


// PodmanStorageConfig contains storage configuration from Podman.
type PodmanStorageConfig struct {
GraphDriverName string `json:"graphDriverName"`
Contributor

out of curiosity are these names and fields copied over from podman somewhere? Or just what we need to construct the full pull spec?

Contributor Author

The values come from

# Default storage driver, must be set for proper operation.
driver = "overlay"
# Temporary storage location
runroot = "/run/containers/storage"
# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

I found podman to be the more convenient way to get them, rather than reading the file, because podman takes into account user overrides placed in ~/.config/containers/storage.conf
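
As a rough illustration of that approach, the sketch below decodes the storage settings from JSON such as the output of `podman info --format json`. It is not the code from this PR: `storageConfig`, `podmanInfo`, and `parseStorageConfig` are hypothetical names, and the `graphRoot` and `runRoot` JSON keys are assumptions (only `graphDriverName` appears in the diff above).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storageConfig holds the three storage values discussed above. Only the
// graphDriverName tag is taken from the PR; the other two are assumed.
type storageConfig struct {
	GraphDriverName string `json:"graphDriverName"`
	GraphRoot       string `json:"graphRoot"`
	RunRoot         string `json:"runRoot"`
}

// podmanInfo wraps the "store" object of the podman info output.
type podmanInfo struct {
	Store storageConfig `json:"store"`
}

// parseStorageConfig extracts the storage settings from podman info JSON.
func parseStorageConfig(infoJSON []byte) (storageConfig, error) {
	var info podmanInfo
	if err := json.Unmarshal(infoJSON, &info); err != nil {
		return storageConfig{}, fmt.Errorf("decoding podman info: %w", err)
	}
	if info.Store.GraphDriverName == "" {
		return storageConfig{}, fmt.Errorf("podman info did not report a graph driver")
	}
	return info.Store, nil
}
```

Keeping the parsing separate from the command execution means it can be exercised with a canned JSON document and no podman binary.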

Id string `json:"Id"`
Digest string `json:"Digest"`
RepoDigests []string `json:"RepoDigests"`
RepoDigest string `json:"-"` // Filled with matching digest from RepoDigests
Contributor

I was a bit confused about this field at first, but I guess the intent here is that Id, Digest, and RepoDigests get unmarshaled from podman images, and instead of having a new field we pass around, we have a pre-filtered RepoDigest field we populate after the fact?

Interesting pattern that I'm not sure we employ elsewhere and should work, so I'm not against it, just wanted to make sure I'm understanding that right

Contributor Author

You got it right. I did it to avoid passing a tuple everywhere, but if it's not clear enough I can always switch to that other approach.
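
The pattern discussed in this thread can be sketched as follows. This is an illustration only: `podmanImage` and `findImage` are hypothetical names, and matching by exact string equality against the pull spec is an assumption about how the digest is selected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podmanImage mirrors the fields shown in the diff above. RepoDigest is
// excluded from JSON decoding and filled in after the fact.
type podmanImage struct {
	ID          string   `json:"Id"`
	Digest      string   `json:"Digest"`
	RepoDigests []string `json:"RepoDigests"`
	RepoDigest  string   `json:"-"`
}

// findImage decodes a JSON array of images and returns the first image
// whose RepoDigests contains pullSpec, with RepoDigest set to that entry.
// It returns nil when no image matches.
func findImage(imagesJSON []byte, pullSpec string) (*podmanImage, error) {
	var images []podmanImage
	if err := json.Unmarshal(imagesJSON, &images); err != nil {
		return nil, fmt.Errorf("decoding podman images: %w", err)
	}
	for i := range images {
		for _, repoDigest := range images[i].RepoDigests {
			if repoDigest == pullSpec {
				images[i].RepoDigest = repoDigest
				return &images[i], nil
			}
		}
	}
	return nil, nil
}
```

Callers then receive a single value carrying both the image metadata and the matching digest, which is the alternative to returning an (image, digest) pair mentioned in the reply.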

}

_, containerStoragePoliciesPresent := policy.Transports[imagePolicyTransportContainerStorage]
if (reflect.DeepEqual(policy.Default[0], signature.PolicyRequirements{signature.NewPRInsecureAcceptAnything()}) && !containerStoragePoliciesPresent) {
Contributor

Curious, can there be other fields in the policy.Default[0] if we have insecureAcceptAnything?

Contributor Author

In theory there shouldn't be. Based on the reverse engineering I've done, many checks in the containers repo assume the insecure policy is a single-element list.
To be 100% sure I added the length condition because the requirements are all "and"-aggregated; thus, if there are more elements I cannot guarantee the policy will accept the pull, and I prefer to patch the policy and try with our temporary entry.
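
The skip condition described here can be illustrated with a simplified model of policy.json. This is a sketch under stated assumptions, not the PR's code: it uses plain structs instead of the containers/image `signature` package, and `requirement`, `policy`, and `canSkipOverride` are hypothetical names.

```go
package main

// requirement is a simplified policy requirement; only the type is modeled.
type requirement struct {
	Type string `json:"type"`
}

// policy is a simplified view of policy.json.
type policy struct {
	Default    []requirement                       `json:"default"`
	Transports map[string]map[string][]requirement `json:"transports"`
}

// canSkipOverride reports whether the existing policy is already permissive
// enough for local storage: the default must be exactly one
// insecureAcceptAnything requirement, and no containers-storage rules may
// be present, since those would take precedence over the default.
func canSkipOverride(p policy) bool {
	if _, scoped := p.Transports["containers-storage"]; scoped {
		return false
	}
	return len(p.Default) == 1 && p.Default[0].Type == "insecureAcceptAnything"
}
```

The length check matters because requirements in a list must all pass: a default of insecureAcceptAnything followed by reject would still refuse the pull, so the override is applied in that case.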

// the local image
isOsImagePresent := false
var podmanImageInfo *PodmanImageInfo
if isPisConfigured {
Contributor

(non blocking) do we plan on removing the PIS requirement, or keep this functionality for the PIS use case only?

Contributor Author

I'm not 100% sure about this one, and I don't have enough context on why this was done only for PIS. I'll ask the team.
BTW, shouldn't we remove the FeatureGate checks now that it is GA?

assert.True(t, os.IsNotExist(err))
}

func TestIsImagePresent(t *testing.T) {
Contributor

Do you plan to unit test the podman info commands?

Contributor Author

I avoided doing so because we have a ton of exec calls all over the MCD that are not tested, and I thought the ones the new podman file performs were just "a few more".
I've reworked the code to try to land an abstraction layer that helps testing with mocked commands. I'll create a Jira story in the Tech Debt epic to increase its usage in the MCD.
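
One common shape for such an abstraction layer is a small interface with a real and a fake implementation. The sketch below is only an assumption about what that could look like; `commandRunner`, `execRunner`, `fakeRunner`, and `listImagesJSON` are hypothetical names and do not come from this PR.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// commandRunner abstracts process execution so callers can be unit tested.
type commandRunner interface {
	Run(name string, args ...string) ([]byte, error)
}

// execRunner runs real processes through os/exec.
type execRunner struct{}

func (execRunner) Run(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).Output()
}

// fakeRunner returns canned output keyed by the full command line and
// records every call it receives.
type fakeRunner struct {
	outputs map[string]string
	calls   []string
}

func (f *fakeRunner) Run(name string, args ...string) ([]byte, error) {
	cmdline := strings.Join(append([]string{name}, args...), " ")
	f.calls = append(f.calls, cmdline)
	out, ok := f.outputs[cmdline]
	if !ok {
		return nil, fmt.Errorf("unexpected command: %s", cmdline)
	}
	return []byte(out), nil
}

// listImagesJSON returns the raw JSON listing of local images.
func listImagesJSON(r commandRunner) ([]byte, error) {
	out, err := r.Run("podman", "images", "--format", "json")
	if err != nil {
		return nil, fmt.Errorf("listing podman images: %w", err)
	}
	return out, nil
}
```

Production code would pass `execRunner{}`, while a unit test passes a `*fakeRunner` preloaded with the output it expects podman to produce and then inspects `calls` to confirm which commands were issued.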

@pablintino pablintino force-pushed the ocpbugs-62714 branch 4 times, most recently from 3373ccd to 9dc9d5e Compare October 21, 2025 13:45
@yuqi-zhang yuqi-zhang left a comment

The updated workflow looks sane to me, thanks for addressing the comments!

I didn't mean to increase the scope with my original test comment, so apologies for that; hopefully there are no conflicts on the backport.

Speaking of, I noticed that you made this against the 4.19 branch. Should we not do this on main and backport?

One last suggestion inline as well:

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Oct 22, 2025
@pablintino pablintino changed the base branch from release-4.19 to main October 22, 2025 09:01
@openshift-ci-robot

@pablintino: This pull request references Jira Issue OCPBUGS-62714, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase label Oct 22, 2025
@pablintino

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Oct 22, 2025
@openshift-ci-robot

@pablintino: This pull request references Jira Issue OCPBUGS-62714, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci openshift-ci bot requested a review from sergiordlr October 22, 2025 09:04
@pablintino

/retest-required

@openshift-ci-robot

@pablintino: This pull request references Jira Issue OCPBUGS-62714, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@pablintino

/retest-required

openshift-ci bot commented Oct 22, 2025

@pablintino: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                       Commit   Details  Required  Rerun command
ci/prow/e2e-openstack           05a2848  link     false     /test e2e-openstack
ci/prow/e2e-aws-ovn             05a2848  link     true      /test e2e-aws-ovn
ci/prow/images                  05a2848  link     true      /test images
ci/prow/okd-scos-e2e-aws-ovn    05a2848  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/bootstrap-unit          05a2848  link     false     /test bootstrap-unit
ci/prow/okd-scos-e2e-aws-ovn    05a2848  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-hypershift          05a2848  link     true      /test e2e-hypershift
ci/prow/security                05a2848  link     false     /test security
ci/prow/e2e-aws-ovn-upgrade     05a2848  link     true      /test e2e-aws-ovn-upgrade
ci/prow/periodics-images        05a2848  link     true      /test periodics-images
ci/prow/bootstrap-unit          05a2848  link     false     /test bootstrap-unit
ci/prow/e2e-gcp-op-single-node  05a2848  link     true      /test e2e-gcp-op-single-node
ci/prow/e2e-gcp-op              05a2848  link     true      /test e2e-gcp-op
ci/prow/okd-scos-images         05a2848  link     true      /test okd-scos-images
ci/prow/unit                    05a2848  link     true      /test unit
ci/prow/verify-deps             05a2848  link     true      /test verify-deps
ci/prow/verify                  05a2848  link     true      /test verify

@yuqi-zhang yuqi-zhang left a comment

/lgtm

I think all my concerns are addressed. Will let verification process and CI ensure we don't break anything.

/payload 4.21 nightly blocking

Just for additional safety

@openshift-ci openshift-ci bot added the lgtm label Oct 22, 2025
openshift-ci bot commented Oct 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pablintino, yuqi-zhang

Needs approval from an approver in each of these files:
  • OWNERS [pablintino,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

approved, jira/severity-important, jira/valid-bug, jira/valid-reference, lgtm
