CA-384228: Xapi fails to start on slave when "Synchronising bonds on slave with master" #6817
Conversation
The PR title has " - <title>" and the description has " - <another_title>", and the link between them is not clear to someone outside XenServer. I assume the PR title references the original issue this fixes, and the one in the description is the solution found to fix it?
Sorry, I've added detailed context for the fix in the description. Please review it again, thanks.
Fixes slave xapi hang during pool join when jumbo frames are configured but the network path doesn't support them.

Problem: When an MTU mismatch occurs (interface configured for 9000 but the path supports only 1500), RPC connections hang on large requests (~1613 bytes). The hanging connection holds a database lock, blocking all other DB operations and causing the entire slave xapi to become unresponsive during pool join.

Root cause: Without Path MTU Discovery, TCP cannot detect when the path MTU is smaller than the configured interface MTU. When ICMP "Fragmentation Needed" messages are blocked by firewalls, TCP has no feedback mechanism to reduce packet size. Packets exceeding the path MTU are silently dropped by network infrastructure, leading to connection timeouts. The application-level retry logic (in master_connection.ml) attempts reconnection, but each retry encounters the same issue while holding a database lock, causing extended hangs.

Solution: Enable TCP PMTUD to allow automatic MTU detection and adaptation.

Configuration:
- net.ipv4.tcp_mtu_probing=1: enable automatic MTU detection when an ICMP blackhole is detected (recommended setting)
- net.ipv4.tcp_base_mss=1024: base MSS for MTU probing

With PMTUD enabled, TCP detects packet loss patterns indicating MTU issues and proactively reduces packet size to find a working MTU. This works even when ICMP "Fragmentation Needed" messages are blocked by firewalls, allowing connections to succeed and preventing database lock contention.

Files:
- scripts/92-xapi-tcp-mtu.conf: new sysctl configuration file
- scripts/Makefile: install the sysctl config to /usr/lib/sysctl.d/

The conf file is installed to /usr/lib/sysctl.d/ (the package-owned location) rather than /etc/sysctl.d/ (user config space). The "92" prefix ensures it loads after basic network configuration (91-net-ipv6.conf), and admins can override it with files in /etc/sysctl.d/ if needed.
Reference: https://blog.cloudflare.com/path-mtu-discovery-in-practice/

Signed-off-by: Gang Ji <[email protected]>
Add diagnostic tests during pool join to detect and warn about MTU mismatches, particularly when higher MTU values are configured but the network path doesn't support them. While TCP PMTUD (enabled in the previous commit) fixes the hang automatically, this provides visibility into MTU configuration issues so customers can fix their network infrastructure.

The diagnostics:
1. Query the master's management network MTU via RPC
2. Detect VLAN configuration and account for the 4-byte overhead
3. Calculate the ICMP payload dynamically: MTU - IP header (20) - ICMP header (8) - VLAN (4 if present)
4. Test the standard MTU (1500) with ICMP ping
5. Test the configured MTU if > 1500
6. Create a pool-level alert when the CA-384228 scenario is detected:
   - the standard MTU (1500) works
   - the configured higher MTU fails
   - this indicates path MTU < configured MTU

Key design decisions:
- Does NOT block pool join (ICMP may be blocked by firewalls)
- Queries the master's DB via verified RPC (the slave's DB is not yet synced)
- Called after certificate exchange, over a verified connection
- Creates a pool-level alert for customer visibility in XenCenter/CLI
- Relies on TCP PMTUD (enabled by the previous commit) to prevent hangs
- Diagnostics are informational only, providing visibility

The implementation dynamically calculates test packet sizes based on the actual configured MTU rather than assuming fixed values, making it work correctly with any MTU configuration (not just jumbo frames). The warning format highlights the issue clearly and references the TCP PMTUD fix that handles it automatically, with guidance for infrastructure improvements.

Signed-off-by: Gang Ji <[email protected]>
Apologies for the notification noise just now—I accidentally deleted the branch and closed the PR due to a typo in my git push command.
changlei-li left a comment:
Nice fix. But I think the critical issue is the infrastructure/MTU configuration in the CA ticket; this PR is an improvement.
IMO, commit 1 enables TCP PMTUD, so it should go into some Foundation repo, not XAPI. It also takes effect on the whole host, which should be considered carefully: it may change the behavior of existing test cases. It only makes sense to add it to the XAPI repo if it needs to be configured and managed by XAPI.
For commit 2, I'm not sure it's worth spending extra time diagnosing MTU during pool join.
Closing this PR as the author can't continue working on it. It can be re-opened if someone would like to continue based on this; otherwise, it may be picked up again in the future with preparation.
Problem: CA-384228 - Xapi fails to start on slave during pool join
When a slave joins a pool with jumbo frames (MTU 9000) configured, xapi can hang
during "Synchronising bonds on slave with master" if the network path doesn't
support the configured MTU.
Root Cause:
By default, TCP relies on ICMP "Fragmentation Needed" messages to discover path MTU.
When the interface is configured with MTU=9000 but the network path only supports
1500 bytes:
If ICMP is working: Router sends "Fragmentation Needed" ICMP message back to TCP,
TCP reduces packet size to 1500, connection works fine.
If ICMP is blocked (CA-384228 scenario): Router drops large packets but the ICMP
message is blocked by firewalls. TCP has no way to discover the mismatch. It keeps
retrying with 9000-byte packets that get silently dropped, leading to connection hangs and
database lock contention.
Before this fix: TCP depends entirely on ICMP for MTU discovery. When ICMP is
blocked, TCP cannot adapt, causing extended hangs and database deadlocks.
After this fix: TCP can detect packet loss patterns and proactively reduce packet
size even without ICMP feedback, preventing hangs and allowing pool operations to
complete successfully.
Solution Overview:
This fix has two parts:
1. Enable TCP PMTUD by default, so TCP can detect and adapt to the path MTU, preventing hangs
2. Add MTU diagnostics during pool join for visibility; this creates an alert for customer awareness
Commit 1: CA-384228: Enable TCP Path MTU Discovery by default
Add sysctl configuration to enable TCP PMTUD on all XenServer hosts.
This prevents TCP connection hangs when path MTU is smaller than configured
interface MTU (e.g., jumbo frames configured but network infrastructure
doesn't support them).
How it fixes the hang:
With PMTUD enabled, TCP can now automatically:
- detect packet loss patterns that indicate an MTU issue, even when ICMP "Fragmentation Needed" messages are blocked
- reduce packet size to find a working path MTU
This prevents the database lock contention that causes slave xapi to hang completely.
Configuration:
- net.ipv4.tcp_mtu_probing=1: enable automatic MTU detection when ICMP is blackholed (recommended setting)
- net.ipv4.tcp_base_mss=1024: base MSS for MTU probing
Files:
- scripts/92-xapi-tcp-mtu.conf: new sysctl configuration file
- scripts/Makefile: install the sysctl config to /usr/lib/sysctl.d/
The "92" prefix ensures this loads after basic network configuration
(91-net-ipv6.conf) but before local administrator overrides (99-*).
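Based on the two settings described above, the installed file would contain little more than the following (a sketch; the exact comments and formatting in the actual 92-xapi-tcp-mtu.conf may differ):

```
# /usr/lib/sysctl.d/92-xapi-tcp-mtu.conf
# Enable TCP Path MTU Discovery when an ICMP blackhole is detected
net.ipv4.tcp_mtu_probing = 1
# Base MSS to fall back to when probing for a working path MTU
net.ipv4.tcp_base_mss = 1024
```

An administrator can override either value by dropping a higher-numbered file into /etc/sysctl.d/, which sysctl processes after /usr/lib/sysctl.d/.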
Reference: https://blog.cloudflare.com/path-mtu-discovery-in-practice/
Commit 2: CA-384228: Add MTU diagnostics during pool join
Add diagnostic tests during pool join to detect and warn about MTU
mismatches, particularly when higher MTU values are configured but
the network path doesn't support them.
Why diagnostics are needed:
While TCP PMTUD (commit 1) fixes the hang automatically, customers need
visibility into MTU configuration issues. This creates an alert visible
in XenCenter/CLI when path MTU < configured MTU, prompting infrastructure
fixes to prevent performance degradation.
The diagnostics:
1. Query the master's management network MTU via RPC
2. Detect VLAN configuration and account for the 4-byte overhead
3. Calculate the ICMP payload dynamically: MTU - IP header (20) - ICMP header (8) - VLAN (4 if present)
4. Test the standard MTU (1500) with ICMP ping
5. Test the configured MTU if > 1500
6. Create a pool-level alert when the CA-384228 scenario is detected (the standard MTU works but the configured higher MTU fails, indicating path MTU < configured MTU)
Key design decisions:
- Does NOT block pool join (ICMP may be blocked by firewalls)
- Queries the master's DB via verified RPC (the slave's DB is not yet synced)
- Called after certificate exchange, over a verified connection
- Creates a pool-level alert for customer visibility in XenCenter/CLI
- Relies on TCP PMTUD (commit 1) to prevent hangs
- Diagnostics are informational only, providing visibility
The implementation dynamically calculates test packet sizes based on
actual configured MTU rather than assuming fixed values, making it
work correctly with any MTU configuration (not just jumbo frames).
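The packet-size calculation described above (MTU minus the 20-byte IP header and 8-byte ICMP header, minus the 4-byte VLAN tag when present) can be sketched in shell. This is illustrative only — the actual implementation lives in xapi's OCaml code, and the function name here is hypothetical:

```shell
#!/bin/bash
# icmp_payload MTU VLAN: compute the ICMP echo payload size that fills
# a frame of the given MTU. VLAN is 1 if the management interface is on
# a VLAN (4 extra bytes of tag overhead), 0 otherwise.
icmp_payload() {
  local mtu=$1 vlan=$2
  local overhead=$((20 + 8))          # IP header + ICMP header
  [ "$vlan" -eq 1 ] && overhead=$((overhead + 4))
  echo $((mtu - overhead))
}

icmp_payload 1500 0   # standard MTU, no VLAN -> 1472
icmp_payload 9000 1   # jumbo frames on a VLAN -> 8968

# A non-fragmenting ping then tests whether the path actually carries
# that size (MASTER_ADDRESS is a placeholder):
#   ping -M do -c 3 -s "$(icmp_payload 9000 1)" "$MASTER_ADDRESS"
```

The `-M do` flag sets the Don't Fragment bit, so if the path MTU is smaller than the configured MTU the probe fails instead of being fragmented by intermediate routers, which is exactly the CA-384228 condition the diagnostics detect.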
The warning format highlights the issue clearly and references the TCP PMTUD fix that handles it automatically, with guidance for persistent problems.