camilamacedo86
Contributor

@camilamacedo86 camilamacedo86 commented Aug 27, 2025

When upgrading operators, CRD validation errors can be very large (50KB+). Kubernetes rejects condition messages longer than 32768 bytes (32KB) with "Too long: may not be more than 32768 bytes", so the status update fails and the ClusterExtension upgrade gets stuck.

This change truncates oversized messages: the start of the message, which carries the important information, is kept and a "... [message truncated]" suffix is appended. Upgrades now complete successfully even with large CRD validation errors.
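For readers following along, the truncation described above could look roughly like the sketch below. The function name `truncateMessage` appears in the review diff, but its body is not shown in this thread, so the constant names and the UTF-8 handling here are assumptions rather than the merged implementation.

```go
package main

import "unicode/utf8"

const (
	// maxConditionMessageBytes mirrors the 32768-byte limit quoted in the
	// API server error (assumed name, not taken from the PR).
	maxConditionMessageBytes = 32768
	// truncationSuffix is the marker described in the PR description.
	truncationSuffix = "... [message truncated]"
)

// truncateMessage returns msg unchanged when it fits within the limit.
// Otherwise it keeps the leading bytes and appends truncationSuffix so the
// result never exceeds maxConditionMessageBytes.
func truncateMessage(msg string) string {
	if len(msg) <= maxConditionMessageBytes {
		return msg
	}
	cut := maxConditionMessageBytes - len(truncationSuffix)
	// Back up to a rune boundary so a multi-byte UTF-8 character is not
	// split in half.
	for cut > 0 && !utf8.RuneStart(msg[cut]) {
		cut--
	}
	return msg[:cut] + truncationSuffix
}
```

Cutting on a rune boundary means the result can be a byte or two under the limit for non-ASCII input, which seems preferable to emitting invalid UTF-8.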

Added unit tests for truncation logic and CRD error scenarios.

Reviewer Checklist

  • [N/A] API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • [N/A] Links to related GitHub Issue(s)

@camilamacedo86 camilamacedo86 requested a review from a team as a code owner August 27, 2025 13:18

netlify bot commented Aug 27, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 87fb307
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/68b7cf45327ed30008791c72
😎 Deploy Preview https://deploy-preview-2169--olmv1.netlify.app

@camilamacedo86 camilamacedo86 changed the title 🐛 Fix: Truncate large error messages in status conditions 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) Aug 27, 2025
@joelanford
Member

Can you include some details of the messages that are too long? I feel like arbitrarily truncating the message is sort of papering over the underlying issue, which is that 30k-byte messages in conditions are a poor UX, and the real solution would be to make the message shorter to begin with.

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 27, 2025
@@ -160,7 +160,7 @@ func ensureAllConditionsWithReason(ext *ocv1.ClusterExtension, reason v1alpha1.C
 Type: condType,
 Status: metav1.ConditionFalse,
 Reason: string(reason),
-Message: message,
+Message: truncateMessage(message),
Member

If there's a limit imposed on all condition messages, it seems like we need to make sure that we truncate all condition messages.

This is one of many places where we set condition messages, right?

We may need to implement a wrapper around meta.SetStatusCondition() that:

  1. truncates messages
  2. is used everywhere in our project.
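A rough sketch of the wrapper idea, for illustration only. To stay free of Kubernetes dependencies it uses a local `Condition` type and a simplified `setStatusCondition` as stand-ins for `metav1.Condition` and the apimachinery helper `meta.SetStatusCondition` (the real helper also manages `LastTransitionTime`, which is omitted here). The wrapper name `SetCondition` and the constants are hypothetical, not the names used in the merged change.

```go
package main

import "unicode/utf8"

// Condition is a minimal stand-in for metav1.Condition so the sketch
// compiles without Kubernetes dependencies.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

const (
	maxMessageBytes = 32768 // limit quoted in the API server error
	truncatedSuffix = "... [message truncated]"
)

// truncate keeps the start of msg and appends truncatedSuffix when msg
// exceeds maxMessageBytes, without splitting a UTF-8 character.
func truncate(msg string) string {
	if len(msg) <= maxMessageBytes {
		return msg
	}
	cut := maxMessageBytes - len(truncatedSuffix)
	for cut > 0 && !utf8.RuneStart(msg[cut]) {
		cut--
	}
	return msg[:cut] + truncatedSuffix
}

// setStatusCondition is a simplified stand-in for meta.SetStatusCondition:
// it overwrites the condition of the same Type or appends a new one.
func setStatusCondition(conds *[]Condition, c Condition) {
	for i := range *conds {
		if (*conds)[i].Type == c.Type {
			(*conds)[i] = c
			return
		}
	}
	*conds = append(*conds, c)
}

// SetCondition is the hypothetical project-wide wrapper: if every caller
// goes through it, no condition message can exceed the limit.
func SetCondition(conds *[]Condition, c Condition) {
	c.Message = truncate(c.Message)
	setStatusCondition(conds, c)
}
```

Routing every condition update through one function keeps the limit in a single place, so call sites such as `ensureAllConditionsWithReason` would not need to remember to truncate.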

Contributor Author

Yes, I think a wrapper would be better as well. +1

Contributor Author

Done

@camilamacedo86 camilamacedo86 changed the title 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) WIP 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) Aug 27, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 27, 2025
@camilamacedo86

This comment was marked as outdated.

@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) Aug 29, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 29, 2025
@camilamacedo86 camilamacedo86 force-pushed the fix-preflight branch 2 times, most recently from aec2345 to b30dc47 Compare August 29, 2025 20:47

codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.77%. Comparing base (c56a811) to head (87fb307).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2169      +/-   ##
==========================================
+ Coverage   72.73%   72.77%   +0.03%     
==========================================
  Files          79       79              
  Lines        7384     7391       +7     
==========================================
+ Hits         5371     5379       +8     
+ Misses       1666     1665       -1     
  Partials      347      347              
Flag Coverage Δ
e2e 44.18% <88.23%> (-0.02%) ⬇️
experimental-e2e 56.17% <88.23%> (-0.03%) ⬇️
unit 58.28% <100.00%> (+0.05%) ⬆️

@camilamacedo86 camilamacedo86 changed the title 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) WIP 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) Aug 29, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 29, 2025
@joelanford
Member

/hold cancel

Thanks Camila! +1 on solving this both ways. Truncate long messages AND try to avoid long messages in the first place!

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 29, 2025
@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) 🐛 Fix: Truncate large error messages in status conditions (OCPBUGS-59518, OCPBUGS-38567) Aug 30, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2025
@joelanford
Member

> CRD validation errors can be very large (50KB+)

I've noticed this as well. Not only are they large, but the information density is low. IIRC, the CRD upgrade check output can contain "diff-style" output that shows context around the diff, which isn't really useful.

I think we should revisit our message formatting specifically in the case of the CRD upgrade check output to make it easier to understand:

  1. Which field(s) is/are problematic?
  2. Which changes were considered breaking?
  3. (no other information)
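One possible shape for such a compact message, as a sketch only. The `finding` type, the `summarize` function and the sample values used with it are hypothetical; the `version: field: change` layout loosely follows the error fragment quoted later in this thread, not any existing formatter in the project.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// finding is a hypothetical record of one breaking change reported by the
// CRD upgrade check.
type finding struct {
	Version string // CRD version, e.g. "v1alpha1"
	Field   string // path of the problematic field, e.g. "^.status.resources"
	Change  string // short description of the breaking change
}

// summarize renders one line per finding, sorted by version and field, and
// deliberately includes no diff context.
func summarize(crd string, findings []finding) string {
	sorted := append([]finding(nil), findings...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Version != sorted[j].Version {
			return sorted[i].Version < sorted[j].Version
		}
		return sorted[i].Field < sorted[j].Field
	})
	var b strings.Builder
	fmt.Fprintf(&b, "CRD %q upgrade check failed:", crd)
	for _, f := range sorted {
		fmt.Fprintf(&b, "\n- %s: %s: %s", f.Version, f.Field, f.Change)
	}
	return b.String()
}
```

A message built this way grows with the number of breaking changes rather than with the size of the CRD schema, so it should normally stay far below the 32768-byte limit; truncation would remain as a safety net.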

When upgrading operators, CRD validation errors can be very large (50KB+).
Kubernetes rejects status updates over 32KB with "Too long: may not be more than 32768 bytes".
This causes ClusterExtension upgrades to fail and get stuck.

Assisted-by: Cursor
@camilamacedo86
Contributor Author

Hi @joelanford @perdasilva

I think this is ready to merge; we should truncate anyway.
However, the real fix for the issue seems to me to be #2179.

I updated the last comment in https://issues.redhat.com/browse/OCPBUGS-59518

The problem here is v1alpha1: ^.status.resources: unhandled: unhandled changes found :

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 3, 2025

openshift-ci bot commented Sep 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 3, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 81be2e9 into operator-framework:main Sep 3, 2025
24 checks passed
@camilamacedo86 camilamacedo86 deleted the fix-preflight branch September 3, 2025 13:21
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

3 participants