-
Notifications
You must be signed in to change notification settings - Fork 128
🐛 fail fast on MTU mismatch before SelfSubjectAccessReview #1319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
🐛 fail fast on MTU mismatch before SelfSubjectAccessReview #1319
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Shrushti72 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
WalkthroughAdded a local MTU detection helper and an MTU-based validation guard in CreateSelfSubjectAccessReviews; adjusted the function parameter list formatting, reformatted a client call chain to multiple lines, and added fmt and net imports. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/common/helpers/ssar.go (1)
36-65: Enhance test coverage for new MTU validation logic.The test file (
ssar_test.go) currently has a TODO comment acknowledging it needs enhancement. While existing tests call the function with different SSAR types, they don't cover the new MTU detection path:
- MTU detection success (mtu ≤ 1500)
- MTU detection failure (detectLocalMTU error)
- MTU mismatch error (mtu > 1500)
All three callers handle errors generically with
fmt.Errorfwrapping, so the new error message won't break existing error handling patterns. However, add test cases for the MTU validation scenarios to ensure the detection logic works as intended.
🤖 Fix all issues with AI agents
In @pkg/common/helpers/ssar.go:
- Around line 13-34: The detectLocalMTU function wrongly returns the maximum MTU
across all "up" interfaces (often the loopback MTU), causing false positives;
update detectLocalMTU to ignore loopback and non-routable interfaces by
filtering out interfaces with net.FlagLoopback and/or interfaces lacking an
IPv4/IPv6 address, or better: determine the interface used to reach the hub
endpoint (e.g., perform a UDP/TCP dial to the hub address and inspect the local
address/route) and return the MTU for that specific interface; ensure references
to net.Interfaces(), iface.Flags, and iface.MTU in detectLocalMTU are updated
accordingly so loopback/non-routable interfaces are excluded or the correct
outbound interface is selected.
- Around line 42-50: The MTU detection error is currently ignored, so update the
SSAR MTU check to log a warning via klog when detectLocalMTU() returns a non-nil
error while preserving the fail-open behavior; specifically, after calling
detectLocalMTU() (the detectLocalMTU symbol) add klog.Warningf("failed to detect
local MTU: %v; skipping MTU safety check", err) (or equivalent) and then
continue without changing the existing control flow that proceeds when err !=
nil, but keep the existing error/return path when err == nil and mtu > 1500.
🧹 Nitpick comments (1)
pkg/common/helpers/ssar.go (1)
39-39: Unnecessary trailing comma.The trailing comma after
selfSubjectAccessReviewsis a stylistic choice typically used for multi-element lists that span multiple lines. For a function signature with parameters already on separate lines, it's not strictly necessary and adds minor churn to the diff.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/common/helpers/ssar.go
🔇 Additional comments (2)
pkg/common/helpers/ssar.go (2)
5-6: LGTM!The new imports are necessary for MTU detection and error formatting.
54-56: LGTM - improved readability.Reformatting the method chain to multiple lines makes the client-go API call pattern easier to read without changing behavior.
| mtu, err := detectLocalMTU() | ||
| if err == nil && mtu > 1500 { | ||
| return false, nil, fmt.Errorf( | ||
| "MTU mismatch detected: managed cluster MTU (%d) exceeds typical hub path MTU (1500). "+ | ||
| "This can cause TLS handshake timeouts during bootstrap or certificate rotation. "+ | ||
| "Please align MTU values between hub and managed cluster.", | ||
| mtu, | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's read the ssar.go file to see the full context
cat -n pkg/common/helpers/ssar.goRepository: open-cluster-management-io/ocm
Length of output: 6793
🏁 Script executed:
# Find logging patterns used in this package and broader codebase
rg -n 'klog\.|logr\.|zap\.|fmt\.Printf|log\.' --type=go pkg/common/helpers/ | head -20Repository: open-cluster-management-io/ocm
Length of output: 355
🏁 Script executed:
# Also check for logging in the parent package
rg -n 'klog\.|logr\.|zap\.|fmt\.Printf|log\.' --type=go pkg/common/ | head -30Repository: open-cluster-management-io/ocm
Length of output: 591
Silent error handling masks MTU detection failures.
When detectLocalMTU() returns an error, it's silently ignored and the code proceeds with the SSAR check. This "fail-open" approach may mask real issues:
- Permission errors preventing network interface enumeration
- Platform-specific issues where
net.Interfaces()fails - Configuration problems that should be surfaced
Since the codebase uses klog for logging (already in use in pkg/common/helpers/), log a warning when MTU detection fails so operators can diagnose why the safety check isn't running.
💡 Suggested enhancement
mtu, err := detectLocalMTU()
+ if err != nil {
+ klog.Warningf("MTU detection failed: %v. Skipping MTU validation check.", err)
+ } else if mtu > 1500 {
return false, nil, fmt.Errorf(
"MTU mismatch detected: managed cluster MTU (%d) exceeds typical hub path MTU (1500). "+
"This can cause TLS handshake timeouts during bootstrap or certificate rotation. "+
"Please align MTU values between hub and managed cluster.",
mtu,
)
}This maintains the fail-open behavior while providing operator visibility into why the check was skipped.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mtu, err := detectLocalMTU() | |
| if err == nil && mtu > 1500 { | |
| return false, nil, fmt.Errorf( | |
| "MTU mismatch detected: managed cluster MTU (%d) exceeds typical hub path MTU (1500). "+ | |
| "This can cause TLS handshake timeouts during bootstrap or certificate rotation. "+ | |
| "Please align MTU values between hub and managed cluster.", | |
| mtu, | |
| ) | |
| } | |
| mtu, err := detectLocalMTU() | |
| if err != nil { | |
| klog.Warningf("MTU detection failed: %v. Skipping MTU validation check.", err) | |
| } else if mtu > 1500 { | |
| return false, nil, fmt.Errorf( | |
| "MTU mismatch detected: managed cluster MTU (%d) exceeds typical hub path MTU (1500). "+ | |
| "This can cause TLS handshake timeouts during bootstrap or certificate rotation. "+ | |
| "Please align MTU values between hub and managed cluster.", | |
| mtu, | |
| ) | |
| } |
🤖 Prompt for AI Agents
In @pkg/common/helpers/ssar.go around lines 42 - 50, The MTU detection error is
currently ignored, so update the SSAR MTU check to log a warning via klog when
detectLocalMTU() returns a non-nil error while preserving the fail-open
behavior; specifically, after calling detectLocalMTU() (the detectLocalMTU
symbol) add klog.Warningf("failed to detect local MTU: %v; skipping MTU safety
check", err) (or equivalent) and then continue without changing the existing
control flow that proceeds when err != nil, but keep the existing error/return
path when err == nil and mtu > 1500.
|
actually pr was not verified ,it was creating issue so many commits was pushed because of this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @pkg/common/helpers/ssar.go:
- Around line 1-12: The import block in package helpers (ssar.go) is not
grouped/ordered per gci rules (standard, external, local); fix by reordering and
grouping the imports accordingly (standard libs first: context, fmt, net; then
external k8s packages; then any local packages) and run the gci formatter on the
file (e.g., use "gci write" for this file) to apply the correct ordering and
formatting.
- Around line 168-170: Remove the unnecessary trailing blank lines inserted in
pkg/common/helpers/ssar.go (around the end of the affected block near lines
168–170); delete the extra empty lines so there is no trailing whitespace or
blank lines after the surrounding code (ensure file ends cleanly and run go fmt
if needed).
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/common/helpers/ssar.go
🧰 Additional context used
🪛 golangci-lint (2.5.0)
pkg/common/helpers/ssar.go
[error] 1-1: File is not properly formatted
(gci)
🔇 Additional comments (3)
pkg/common/helpers/ssar.go (3)
40-41: Idiomatic trailing comma for multi-line signature.The trailing comma in the parameter list follows Go's style conventions for multi-line function signatures.
55-57: Formatting change improves readability.Breaking the chained method call across multiple lines is a reasonable style choice with no functional impact.
43-51: Silent error handling masks MTU detection failures (as previously flagged).When
detectLocalMTU()fails, the error is ignored and execution continues. This fail-open behavior provides no visibility into why the safety check was skipped, masking permission errors, platform-specific failures, or configuration issues.As noted in the previous review, add logging (the codebase uses
klog) to warn operators when MTU detection fails.Based on previous review feedback on this PR.
Likely an incorrect or invalid review comment.
|
|
||
| package helpers | ||
|
|
||
| import ( | ||
| "context" | ||
| "fmt" | ||
| "net" | ||
|
|
||
| authorizationv1 "k8s.io/api/authorization/v1" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/client-go/kubernetes" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix import ordering per gci linter.
The imports should be properly grouped and ordered according to the gci linter rules (standard library, external packages, local packages).
Run the following to fix:
#!/bin/bash
# Format imports according to gci rules
gci write pkg/common/helpers/ssar.go🧰 Tools
🪛 golangci-lint (2.5.0)
[error] 1-1: File is not properly formatted
(gci)
🤖 Prompt for AI Agents
In @pkg/common/helpers/ssar.go around lines 1 - 12, The import block in package
helpers (ssar.go) is not grouped/ordered per gci rules (standard, external,
local); fix by reordering and grouping the imports accordingly (standard libs
first: context, fmt, net; then external k8s packages; then any local packages)
and run the gci formatter on the file (e.g., use "gci write" for this file) to
apply the correct ordering and formatting.
| func detectLocalMTU() (int, error) { | ||
| ifaces, err := net.Interfaces() | ||
| if err != nil { | ||
| return 0, err | ||
| } | ||
|
|
||
| maxMTU := 0 | ||
| for _, iface := range ifaces { | ||
| if iface.Flags&net.FlagUp == 0 { | ||
| continue | ||
| } | ||
| if iface.MTU > maxMTU { | ||
| maxMTU = iface.MTU | ||
| } | ||
| } | ||
|
|
||
| if maxMTU == 0 { | ||
| return 0, fmt.Errorf("unable to detect MTU") | ||
| } | ||
|
|
||
| return maxMTU, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: loopback MTU will cause false positives (as previously flagged).
This function returns the maximum MTU across all "up" interfaces, including loopback (typically 65536 on Linux). The validation at line 44 (mtu > 1500) will fail in nearly all deployments, blocking legitimate cluster registrations even when the actual hub-facing interface has the correct MTU.
As noted in the previous review, you must either:
- Filter out loopback/non-routable interfaces (check
net.FlagLoopback), or - Detect the specific interface used to reach the hub endpoint
Based on previous review feedback on this PR.
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary trailing whitespace.
The added blank lines serve no purpose and should be removed.
♻️ Remove the blank lines
})
}
return reviews
}
-
-
- 📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| }) | |
| } | |
| return reviews | |
| } |
🤖 Prompt for AI Agents
In @pkg/common/helpers/ssar.go around lines 168 - 170, Remove the unnecessary
trailing blank lines inserted in pkg/common/helpers/ssar.go (around the end of
the affected block near lines 168–170); delete the extra empty lines so there is
no trailing whitespace or blank lines after the surrounding code (ensure file
ends cleanly and run go fmt if needed).
qiujian16
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this detection to be opt-in and disabled by default. It might break in existing env otherwise
| selfSubjectAccessReviews []authorizationv1.SelfSubjectAccessReview, | ||
| ) (bool, *authorizationv1.SelfSubjectAccessReview, error) { | ||
|
|
||
| mtu, err := detectLocalMTU() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this mtu detection in a separated func instead of calling inside CreateSelfSubjectAccessReviews
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this mtu detection in a separated func instead of calling inside CreateSelfSubjectAccessReviews
I agree that MTU detection shouldn’t be embedded directly inside CreateSelfSubjectAccessReviews.
I’ll refactor this if wanted so MTU detection lives in a separate helper and is passed in only when the feature is enabled.
| ) (bool, *authorizationv1.SelfSubjectAccessReview, error) { | ||
|
|
||
| mtu, err := detectLocalMTU() | ||
| if err == nil && mtu > 1500 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if local MTU is larger than 1500 and remote is also larger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if local MTU is larger than 1500 and remote is also larger?
You’re right, local MTU alone is only a heuristic.
The intent here is not to guarantee a mismatch, but to conservatively reduce SSAR payload size only when jumbo frames are detected locally and the opt-in flag is enabled.
| ) | ||
|
|
||
| func detectLocalMTU() (int, error) { | ||
| ifaces, err := net.Interfaces() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to clarify system permission it needs to execute this command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to clarify system permission it needs to execute this command.
Good catch. net.Interfaces() relies on standard OS networking APIs and does not require elevated privileges in typical container environments, but this should be documented.
I’ll gate the MTU-aware behavior behind an explicit flag e.g. env var or config option, and keep the default path unchanged when it’s disabled. |
Fix klusterlet TLS handshake timeouts caused by MTU mismatch
Klusterlet bootstrap and certificate rotation can fail when the managed
cluster MTU exceeds the hub path MTU due to large TLS handshake packets
during SelfSubjectAccessReview calls.
This change adds a pre-flight MTU validation to fail fast with a clear,
actionable error message instead of a silent TLS handshake timeout.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.