agent, server: improve packet framing and use TLS 1.3 #11503
This pull request refactors the TLS framing and buffer management in the `Link` class to improve correctness and maintainability, and updates the SSL context initialization to use TLS 1.3 for enhanced security. CloudStack uses a 4-byte header for TLS packets. Earlier, it was not sent within the TLS application data, which affected maintainability and the implementation of agent-server communication using a different language. The most important changes are grouped below.

* Reworked the TLS buffer handling in `Link.java`, replacing legacy header and packet assembly logic with a more robust system using `netBuffer`, `appBuffer`, and an explicit `headerBuffer` for frame length management. This improves frame parsing and avoids buffer overflows.
* Refactored the read and write logic: the `read` method now correctly assembles frames from TLS streams, handling buffer resizing and edge cases, while the `doWrite` method builds TLS packets with a 4-byte length header and payload, ensuring correct framing and handshake handling.
* Simplified the message sending and writing logic by removing manual header prepending and using the new framing system; the write queue now contains only payload buffers, and the header is added during the TLS wrap process.
* Updated SSL context initialization in `Link.java` to use `SSLUtils.getSSLContextWithLatestVersion()`, ensuring that TLS 1.3 is used for all server, client, and management SSL contexts.
* Added a new method `getSSLContextWithLatestVersion()` in `SSLUtils.java`, which returns an `SSLContext` instance for TLS 1.3.

Signed-off-by: Abhishek Kumar <[email protected]>
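For readers unfamiliar with length-prefixed framing, here is a minimal sketch of the idea described above: a 4-byte length header followed by the payload, carried inside the decrypted application data. The class `FrameSketch` and its methods are hypothetical names for illustration only. This is not the code in `Link.java`, and the big-endian header encoding is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not CloudStack's Link class.
public class FrameSketch {
    // Size of the length header, as stated in the PR description.
    public static final int HEADER_SIZE = 4;

    // Build one frame: 4-byte length (big-endian, an assumption) followed by the payload.
    public static ByteBuffer encode(byte[] payload) {
        ByteBuffer frame = ByteBuffer.allocate(HEADER_SIZE + payload.length);
        frame.putInt(payload.length);
        frame.put(payload);
        frame.flip(); // ready to be read, e.g. handed to SSLEngine.wrap
        return frame;
    }

    // Pull every complete frame out of a buffer of decrypted application data.
    // The buffer is expected in write mode; bytes of a partial frame are kept
    // at the start of the buffer so the next read can complete it.
    public static List<byte[]> decode(ByteBuffer appData) {
        List<byte[]> frames = new ArrayList<>();
        appData.flip(); // switch to read mode
        while (appData.remaining() >= HEADER_SIZE) {
            appData.mark();
            int length = appData.getInt();
            if (length < 0) {
                throw new IllegalStateException("Negative frame length: " + length);
            }
            if (appData.remaining() < length) {
                appData.reset(); // payload incomplete, rewind to the header
                break;
            }
            byte[] payload = new byte[length];
            appData.get(payload);
            frames.add(payload);
        }
        appData.compact(); // keep leftover bytes, back to write mode
        return frames;
    }
}
```

Because the header travels inside the TLS application data, a peer written in another language only needs a standard TLS library plus this small amount of framing logic.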
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #11503 +/- ##
============================================
- Coverage 17.55% 17.55% -0.01%
+ Complexity 15543 15534 -9
============================================
Files 5910 5910
Lines 529334 529359 +25
Branches 64654 64656 +2
============================================
- Hits 92944 92909 -35
- Misses 425933 425995 +62
+ Partials 10457 10455 -2
@blueorangutan package
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14723
Signed-off-by: Abhishek Kumar <[email protected]>
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
@blueorangutan test
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian test result (tid-14133)
[LL] Trillian Build Failed (tid-7129)
@blueorangutan package
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15690
@blueorangutan test
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
@blueorangutan test
@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
Some issue with smoke test runs. Will investigate and make the required fixes.
Description
This pull request refactors the TLS framing and buffer management in the `Link` class to improve correctness and maintainability, and updates the SSL context initialization to use TLS 1.3 for enhanced security. CloudStack uses a 4-byte header for TLS packets. Earlier, it was not sent within the TLS application data, which affected maintainability (simply using TLS1.3 without packet changes didn't work, and it resulted in errors like [1]) and the implementation of agent-server communication using a different language. The most important changes are grouped below.
TLS Framing and Buffer Management
* Reworked the TLS buffer handling in `Link.java`, replacing legacy header and packet assembly logic with a more robust system using `netBuffer`, `appBuffer`, and an explicit `headerBuffer` for frame length management. This improves frame parsing and avoids buffer overflows.
* Refactored the read and write logic: the `read` method now correctly assembles frames from TLS streams, handling buffer resizing and edge cases, while the `doWrite` method builds TLS packets with a 4-byte length header and payload, ensuring correct framing and handshake handling.
Security Improvements
* Updated SSL context initialization in `Link.java` to use `SSLUtils.getSSLContextWithLatestVersion()`, ensuring that TLS 1.3 is used for all server, client, and management SSL contexts.
* Added a new method `getSSLContextWithLatestVersion()` in `SSLUtils.java`, which returns an `SSLContext` instance for TLS 1.3.
[1] Error in agent-server connection with TLS1.3 without packet framing changes
2025-08-25 18:41:41,698 INFO [utils.nio.NioClient] (main:[]) (logid:) Connecting to 172.120.0.67:8250
2025-08-25 18:41:41,702 INFO [utils.nio.NioClient] (main:[]) (logid:) Connected to 172.120.0.67:8250
2025-08-25 18:41:41,704 INFO [utils.nio.Link] (main:[]) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2025-08-25 18:41:41,941 INFO [utils.nio.NioClient] (main:[]) (logid:) SSL: Handshake done
2025-08-25 18:41:41,950 DEBUG [utils.nio.NioClient] (Agent-NioConnectionHandler-1:[]) (logid:) Location 1: Socket Socket[addr=/172.120.0.67,port=8250,localport=59004] closed on read. Probably -1 returned: Input record too big: max = 16709 len = 22679
2025-08-25 18:41:41,950 DEBUG [utils.nio.NioClient] (Agent-NioConnectionHandler-1:[]) (logid:) Closing socket Socket[addr=/172.120.0.67,port=8250,localport=59004]
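As a rough illustration of the "Security Improvements" items above, the sketch below shows one way to obtain an `SSLContext` for TLS 1.3 with the standard JSSE API. It is not the body of `SSLUtils.getSSLContextWithLatestVersion()`; the class `Tls13ContextSketch` and its method names are hypothetical, and it assumes a JDK with TLS 1.3 support (Java 11 or later). Pinning the enabled protocols on the engine is an extra step added here as an assumption, since the context name alone may not exclude older protocol versions.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical sketch; not CloudStack's SSLUtils implementation.
public class Tls13ContextSketch {
    // Request a context by the standard JSSE protocol name for TLS 1.3.
    public static SSLContext tls13Context() throws NoSuchAlgorithmException {
        return SSLContext.getInstance("TLSv1.3");
    }

    // Create a client-mode engine restricted to TLS 1.3 only.
    public static SSLEngine clientEngine(SSLContext ctx) {
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(true);
        engine.setEnabledProtocols(new String[] {"TLSv1.3"});
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = tls13Context();
        // Default key and trust managers; a real deployment supplies its own.
        ctx.init(null, null, null);
        SSLEngine engine = clientEngine(ctx);
        System.out.println(ctx.getProtocol());
        System.out.println(String.join(",", engine.getEnabledProtocols()));
    }
}
```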
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
Logs from management server:
Logs from one of the hosts:
Communication with hosts, system VMs and MS seemed fine.
How did you try to break this feature and the system with this change?