
Conversation

Member

@QiWang19 QiWang19 commented Nov 24, 2025

WIP PR: cancelable reader and timeout goroutine. We'd like to package a binary with these changes for the customer to test.

@github-actions github-actions bot added the image Related to "image" package label Nov 24, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 24, 2025
@podmanbot

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6533

Contributor

@mtrmac mtrmac left a comment


(For reference, compare the earlier #421.)

Testing and research are always good to do.

Do note that we know a bit more than at the time when #421 was filed: there seems to be something wrong with tracking the state of “the write end of the --command-fd pipe”. That pipe is not directly driven by caller-provided inputs to GPGME; so if the hypothesis is correct, and if the state tracking causes incorrectly reused / closed file descriptors, that would still affect, and probably break, CRI-O (compare “the write ends of all of these pipes are owned by conmon” in Jira).

// Import the keys with a 60s timeout to avoid hanging indefinitely. See issues.redhat.com/browse/OCPBUGS-57893
mech, trustedIdentities, err := newEphemeralGPGSigningMechanism(data)
if err != nil {
	return sarRejected, nil, err
}
Contributor


I really quite strongly don’t want this complexity in core policy enforcement code.

Also, I think the core complexity of the cleanup attempt here comes from separating the new…Mechanism initialization and the defer mech.Close() into different goroutines. I think running all of PolicyContext.IsRunningImageAllowed within a goroutine with a timeout would automatically handle cleanup (… or it would not, and would leak a C thread plus a goroutine). In what respect is this approach better?


This is really orthogonal to the other part of the PR: if the approach of terminating GPGME via a failing callback works, moving the code into a concurrent goroutine does not need to exist (and it continues to have the issues of #421).

Member Author

@QiWang19 QiWang19 Nov 25, 2025


I agree about the complexity, and the hang is still in the C library. For debugging, I'll drop the select waiting on the timeout from the core policy code so we can debug the cancelable reader.

Thanks for the note.

@mtrmac mtrmac marked this pull request as draft November 25, 2025 03:33
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 25, 2025
// FIXME: move this to per-context initialization
mech, trustedIdentities, err := newEphemeralGPGSigningMechanism(data)
// Pass the context so the cancelable reader can detect cancellation and potentially abort GPGME operations
mech, trustedIdentities, err := newEphemeralGPGSigningMechanism(ctx, data)
Member Author


Suggested change
-	mech, trustedIdentities, err := newEphemeralGPGSigningMechanism(ctx, data)
+	importCtx, cancel := context.WithTimeout(ctx, importKeyTimeOut)
+	defer cancel()
+	mech, trustedIdentities, err := newEphemeralGPGSigningMechanism(importCtx, data)

@mtrmac Updated the code. Do we still need to pass a timeout context down for the cancellable reader to work? If we don't hardcode a timeout here, will the caller cancel the context?

Contributor


I’m afraid I can’t parse the question. Are you asking whether the CRI-O server already has a timeout / request cancellation logic of some kind (maybe triggering when/if the Kubelet client gives up)? That would be a CRI-O question.

And if CRI-O did pass to c/image/copy.Image a context that would in some situations be canceled, yes, I think such a context should be forwarded all the way to this code.


I would, by default, expect the timeout, if any, to be set at a pretty high level; I was thinking one for the whole RPC operation served by CRI-O … except that that doesn’t work well for pulls, which, on slow links, may be making steady progress over many minutes, so there is no good timeout that can be set for the whole pull by default.

So, I guess, a timeout in the caller of IsRunningImageAllowed in c/image would be at least better than here in prSignedBy; IsRunningImageAllowed is not really constant-time, but it is fairly bounded, and it would mean less new code in c/image/signature.

[There’s also CRI-O’s PullProgressTimeout which, last time I looked at it, made unwarranted assumptions about how c/image reports progress, and thus was ~unusable. But if it, hypothetically, did not trigger on false positives where c/image does not report progress but may spend a lot of time, it would trigger cancellation of the context and that should be visible in the GPGME mechanism.]

podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Nov 25, 2025
Comment on lines +54 to +62
// Check if context is cancelled before each read
if err := r.ctx.Err(); err != nil {
return 0, err
}
n, err := r.reader.Read(p)
// Check again after read in case cancellation happened during the read
if err == nil && r.ctx.Err() != nil {
return n, r.ctx.Err()
}
Member


Does this even work/address the problem in any case? The issue is that the Read is blocking, so checking the context before/after will not solve anything if the bug is that read() blocks forever.

I mean, sure, it does work if the problem is that it keeps reading indefinitely over and over in some loop and we want to time that out, but per #421 I got the impression that is not the case.

If we use a real pipe as the reader, Go exposes SetDeadline(), so that could be used to limit the actual time the read would block.

Contributor


GPGME is a C library running gnupg as a subprocess, and communicating with it using a ~poll loop — and something goes wrong and hangs there. (My limited investigation suggests that this centers on a control FD which is not visible in the GPGME API at all, but that might be wrong or a consequence of some earlier failure.)

Replacing one of the data sources with something that can indicate failure is hypothetically a way to interrupt that poll loop — if the code has a reason to execute one more iteration of the poll. There’s not much reason to think that the code will actually have a chance to poll again, but, also, that’s the ~only cancellation mechanism we have. That would absolutely not be a fix but it might be a pragmatic workaround if it happened to be proven to affect things.

The issue is that the Read is blocking, so checking the context before/after will not solve anything if the bug is that read() blocks forever.

In this PR, the cancellation is on a source that is passed as a byte array, so it should never block. OTOH that also makes it fairly unlikely that the cancellation opportunity would manifest.

Member


Ack, then I don't think the context check on Read can help to work around it.

