Description
Describe the bug
I use Citadel on Linux to manage a large number of virtual machines. When many VMs are started at once (powering on or rebooting), CPU and RAM pressure increases and startup slows down. I connect to these VMs with try await SSHClient.connect and check for timeout errors. When a timeout occurs, I wait and retry. After about 5 or 6 retries, a segmentation fault consistently occurs and the program crashes.
This issue occurs only in the version statically compiled with MUSL.
I'd like to know whether this is a known issue that can be resolved by updating the SwiftNIO libraries that Citadel depends on. To me, it looks like a compatibility issue between SwiftNIO and MUSL.
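For reference, the wait-and-retry behaviour described above can be written as a bounded loop with a growing pause, which makes it easier to tell how many connection attempts precede the crash. This is only a minimal sketch: `ConnectError`, `AttemptCounter`, and `retryOnTimeout` are hypothetical names for illustration and are not part of Citadel or SwiftNIO.

```swift
// Hypothetical error type standing in for the report's SSHContextError.
enum ConnectError: Error, Equatable {
    case timeout
    case refused
}

// Hypothetical helper used only to count attempts safely across tasks.
actor AttemptCounter {
    private(set) var count = 0
    func next() -> Int {
        count += 1
        return count
    }
}

/// Runs `operation`, retrying only when it fails with `ConnectError.timeout`.
/// At most `maxAttempts` attempts are made, and the pause between attempts
/// doubles each time. Any other error, or a timeout on the final attempt,
/// is rethrown to the caller.
func retryOnTimeout<T: Sendable>(
    maxAttempts: Int,
    initialDelay: Duration = .seconds(1),
    operation: @Sendable () async throws -> T
) async throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    var attempt = 1
    var delay = initialDelay
    while true {
        do {
            return try await operation()
        } catch ConnectError.timeout where attempt < maxAttempts {
            // Timed out with attempts remaining: wait, then try again.
            attempt += 1
            try await Task.sleep(for: delay)
            delay = delay * 2
        }
    }
}
```

In the reporter's setup, the closure passed as `operation` would wrap the SSHClient.connect call and map a connect timeout to the retryable case. Capping the attempts does not explain the segmentation fault, but it keeps the reproduction deterministic.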
Reproducer Sample
I can't provide a complete demo, as this is an internal tool. I can only show a basic usage example. Refer to the following code snippet:
struct SSHContext {
    var os: OS
    let sshInfo: InitConfig.VM.Group.SSH
    private let client: SSHClient

    init(
        sshInfo: InitConfig.VM.Group.SSH,
        os: OS,
        timeout: Int64 = 5
    ) async throws(SSHContextError) {
        do {
            // Test function work as expectation
            if !sshInfo.ip.reachable(port: sshInfo.port!) {
                log.debug("\(sshInfo) may not reachable")
            }
            var settings = SSHClientSettings(
                host: sshInfo.ip,
                port: sshInfo.port!,
                authenticationMethod: {
                    .passwordBased(username: sshInfo.user, password: sshInfo.passwd)
                },
                hostKeyValidator: .acceptAnything(),
            )
            settings.connectTimeout = .seconds(timeout)
            client = try await SSHClient.connect(to: settings)
            self.sshInfo = sshInfo
            self.os = os
        } catch ChannelError.connectTimeout(let timeout) {
            throw .timeout(timeout) // FIXME: why is always 10 s?
        } catch {
            throw .canNotConnect(error.localizedDescription)
        }
    }
    ...
}

await withTaskGroup(of: (header: String, output: String).self) { group in
    for vm in vmdb {
        let logHeader = "VM-\(vm.index) \(vm.ssh):"
        group.addTask {
            // Declared inside the task: a child task closure cannot mutate a captured outer var
            var output = ""
            do {
                // vm boot may need some time
                try await Task.sleep(for: .seconds(5))
                var timeoutError = SSHContextError.timeout(.seconds(0))
                while case .timeout(_) = timeoutError {
                    do {
                        let ssh = try await SSHContext(sshInfo: vm.ssh, os: vm.os)
                        let os = await ssh.getRemoteOS()
                        await vmActor.updateOS(os, by: vm.index)
                        output = "OS detected: \(os)"
                        break
                    } catch let error as SSHContextError {
                        log.warning("Error connect \(vm.ssh): \(error)")
                        timeoutError = error
                        try await Task.sleep(for: .seconds(1))
                    }
                }
            } catch {
                output = "not reachable: \(error)".styleError
            }
            return (logHeader, output)
        }
    }
    for await (logHeader, output) in group {
        log.info("\(logHeader) \(output)")
    }
}
Expected behavior
Runs normally without runtime errors.
Client (please complete the following information):
- OS: Ubuntu 22.04
- Client: Citadel
- Version of Citadel, if applicable: 0.11.1
Server (please complete the following information):
- OS: Windows 10 22H2 (VM)
- Server: OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2
- Version of Citadel, if applicable: N/A
Additional context
* thread #24, name = 'NIO-SGLTN-12-#0', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7ee0fb8)
frame #0: 0x0000000001f50416 mtvmm`$s6NIOSSH10SSHMessageOWOc at <compiler-generated>:0
Note: this address is compiler-generated code in function $s6NIOSSH10SSHMessageOWOc that has no source code associated with it.
(lldb) bt
* thread #24, name = 'NIO-SGLTN-12-#0', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7ee0fb8)
* frame #0: 0x0000000001f50416 mtvmm`$s6NIOSSH10SSHMessageOWOc at <compiler-generated>:0
frame #1: 0x000000000213dc0b mtvmm`$s7NIOCore10ByteBufferV6NIOSSHE15writeSSHMessageySiAD0F0OF at SSHMessages.swift:1126:9
frame #2: 0x00000000020eb3a4 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtFSi7NIOCore10ByteBufferVzXEfU_ at SSHKeyExchangeStateMachine.swift:453:20
frame #3: 0x00000000020f1540 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtFSi7NIOCore10ByteBufferVzXEfU_TA at <compiler-generated>:0
frame #4: 0x0000000001f45477 mtvmm`$s7NIOCore10ByteBufferV6NIOSSHE23writeCompositeSSHStringyS2iACzKXEKF at ByteBuffer+SSH.swift:212:33
frame #5: 0x00000000020e7f86 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtF at SSHKeyExchangeStateMachine.swift:452:35
frame #6: 0x00000000020fa078 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV6handle03keyC0AA15SSHMultiMessageVSgAA10SSHMessageO03KeycI0V_tKF at SSHKeyExchangeStateMachine.swift:127:22
frame #7: 0x0000000001f397f0 mtvmm`$s6NIOSSH26AcceptsKeyExchangeMessagesPAAE07receivecD7MessageyAA25SSHConnectionStateMachineV0iJ20InboundProcessResultOAA10SSHMessageO0cdG0VKF at AcceptsKeyExchangeMessages.swift:23:56
frame #8: 0x00000000020c68f0 mtvmm`$s6NIOSSH25SSHConnectionStateMachineV21processInboundMessage9allocator4loopAC0cdF13ProcessResultOSg7NIOCore19ByteBufferAllocatorV_AJ9EventLoop_ptKF at SSHConnectionStateMachine.swift:172:40
frame #9: 0x0000000001fb9568 mtvmm`$s6NIOSSH13NIOSSHHandlerC11channelRead7context4datay7NIOCore21ChannelHandlerContextC_AG6NIOAnyVtF at NIOSSHHandler.swift:165:54
frame #10: 0x0000000001fb2449 mtvmm`$s6NIOSSH13NIOSSHHandlerC7NIOCore22_ChannelInboundHandlerAadEP11channelRead7context4datayAD0dF7ContextC_AD6NIOAnyVtFTW at <compiler-generated>:0
frame #11: 0x0000000001d07882 mtvmm`$s7NIOCore21ChannelHandlerContextC06invokeB4Read33_F5AC316541457BD146E3694279514AA3LLyyAA6NIOAnyVF at ChannelPipeline.swift:2089:28
frame #12: 0x0000000001d130ca mtvmm`$s7NIOCore15ChannelPipelineC05_fireB5Read0yyAA6NIOAnyVF at ChannelPipeline.swift:1012:29
frame #13: 0x0000000001d168ce mtvmm`$s7NIOCore15ChannelPipelineC21SynchronousOperationsV04fireB4ReadyyAA6NIOAnyVF at ChannelPipeline.swift:1476:28
frame #14: 0x0000000001e470cf mtvmm`$s8NIOPosix23BaseStreamSocketChannelC08readFromD0AA0bdE0C10ReadResultOyx_GyKF at BaseStreamSocketChannel.swift:135:50
frame #15: 0x0000000001e31225 mtvmm`$s8NIOPosix17BaseSocketChannelC9readable033_7F4F544BB68CD2CFABA0C7990D6EB2C6LLAC15ReadStreamStateAELLOyx_GyF at BaseSocketChannel.swift:1137:35
frame #16: 0x0000000001e42061 mtvmm`$s8NIOPosix17BaseSocketChannelC8readableyyF at BaseSocketChannel.swift:1121:14
frame #17: 0x0000000001e31b19 mtvmm`$s8NIOPosix17BaseSocketChannelCyxGAA010SelectableD0A2aEP8readableyyFTW at <compiler-generated>:0
frame #18: 0x0000000001ef149b mtvmm`$s8NIOPosix19SelectableEventLoopC06handleC0_7channelyAA08SelectorC3SetV_xtAA0B7ChannelRzlF at SelectableEventLoop.swift:563:25
frame #19: 0x0000000001ee8251 mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_yAA08SelectorC0VyAA15NIORegistrationVGXEfU_ at SelectableEventLoop.swift:862:30
frame #20: 0x0000000001ee7383 mtvmm`$s8NIOPosix13SelectorEventVyAA15NIORegistrationVGs5Error_pIggzo_AFsAG_pIegnzo_TR at <compiler-generated>:0
frame #21: 0x0000000001eeb254 mtvmm`$s8NIOPosix13SelectorEventVyAA15NIORegistrationVGs5Error_pIggzo_AFsAG_pIegnzo_TRTA at <compiler-generated>:0
frame #22: 0x0000000001ef7b36 mtvmm`$s8NIOPosix8SelectorC10whenReady08strategy11onLoopBegin_yAA0B8StrategyO_yyXEyAA0B5EventVyxGKXEtKF at SelectorEpoll.swift:300:25
frame #23: 0x0000000001efe9c4 mtvmm`$s8NIOPosix8SelectorC9whenReady8strategy11onLoopBegin_yAA0B8StrategyO_yyXEyAA0B5EventVyxGKXEtKF at SelectorGeneric.swift:368:18
frame #24: 0x0000000001ef293b mtvmm`$s8NIOPosix19SelectableEventLoopC20_blockingWaitForWork17nextReadyDeadline_y7NIOCore11NIODeadlineVSg_yAA08SelectorC0VyAA15NIORegistrationVGXEtKF at SelectableEventLoop.swift:784:28
frame #25: 0x0000000001ee7fed mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_ at SelectableEventLoop.swift:857:26
frame #26: 0x0000000001eeb294 mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_TA at <compiler-generated>:0
frame #27: 0x0000000001eedcb2 mtvmm`$s8NIOPosix19withAutoReleasePoolyxxyKXEKlF at SelectableEventLoop.swift:46:16
frame #28: 0x0000000001ef2c2d mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKF at SelectableEventLoop.swift:856:17
frame #29: 0x0000000001e93c9a mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC06runTheE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL6thread06parentF003candE22BeShutdownIndividually15selectorFactory11initializer15metricsDelegate_yAA9NIOThreadC_ACSgSbAA8SelectorCyAA15NIORegistrationVGyKcyAMcAA08NIOEventE15MetricsDelegate_pSgyAA010SelectabledE0CctFZ at MultiThreadedEventLoopGroup.swift:105:22
frame #30: 0x0000000001e8d9bb mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC014setupThreadAnddE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL4name06parentF015selectorFactory11initializer15metricsDelegateAA010SelectabledE0CSS_AcA8SelectorCyAA15NIORegistrationVGyKcyAA9NIOThreadCcAA08NIOEvente7MetricsX0_pSgtFZyAScfU_ at MultiThreadedEventLoopGroup.swift:126:41
frame #31: 0x0000000001e90943 mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC014setupThreadAnddE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL4name06parentF015selectorFactory11initializer15metricsDelegateAA010SelectabledE0CSS_AcA8SelectorCyAA15NIORegistrationVGyKcyAA9NIOThreadCcAA08NIOEvente7MetricsX0_pSgtFZyAScfU_TA at <compiler-generated>:0
frame #32: 0x0000000001f22f6f mtvmm`$s8NIOPosix9NIOThreadCIegg_ACytIegnr_TR at <compiler-generated>:0
frame #33: 0x0000000001f259d1 mtvmm`$s8NIOPosix14ThreadOpsPosixO3run6handle4args06detachB0yAA14PthreadWrapperVSgz_AA3BoxCyyAA9NIOThreadCc4body_SSSg4nametGSbtFZs5Int32VSpys13OpaquePointerVSgGXEfU_SvSgAYcfU_ at ThreadPosix.swift:153:21
frame #34: 0x0000000001f25a99 mtvmm`$s8NIOPosix14ThreadOpsPosixO3run6handle4args06detachB0yAA14PthreadWrapperVSgz_AA3BoxCyyAA9NIOThreadCc4body_SSSg4nametGSbtFZs5Int32VSpys13OpaquePointerVSgGXEfU_SvSgAYcfU_To at <compiler-generated>:0
frame #35: 0x00000000037410db mtvmm`start + 123
frame #36: 0x0000000003742f13 mtvmm`__clone + 47
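One hypothesis worth checking, and not confirmed by anything in this report: the fault is reported as "address access protected" on a NIO event-loop thread, which would be consistent with that thread running off the end of its stack into a guard page. MUSL's default thread stack is much smaller than what glibc typically provides (128 KiB in recent MUSL releases, versus a size usually derived from the stack rlimit on glibc), so a deep call chain can fit on one and not the other. The sketch below only prints the libc default so the glibc and MUSL builds can be compared. `defaultThreadStackSize` is a hypothetical helper name, and the sketch assumes the threads in question are created with default pthread attributes.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Musl)
import Musl
#elseif canImport(Darwin)
import Darwin
#endif

/// Returns the stack size in bytes that a freshly initialised
/// `pthread_attr_t` reports, i.e. the libc default for new threads.
/// Returns -1 if either pthread call fails.
func defaultThreadStackSize() -> Int {
    var attr = pthread_attr_t()
    guard pthread_attr_init(&attr) == 0 else { return -1 }
    defer { _ = pthread_attr_destroy(&attr) }

    var size = 0
    guard pthread_attr_getstacksize(&attr, &size) == 0 else { return -1 }
    return size
}

print("default pthread stack size: \(defaultThreadStackSize() / 1024) KiB")
```

If the statically linked MUSL build reports a far smaller value than the glibc build, stack exhaustion becomes a plausible explanation to investigate further. If the two values are similar, this hypothesis can be ruled out.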