
The program encounters a segmentation fault at runtime. #118

@newamber

Describe the bug
I use Citadel on Linux to manage a large number of virtual machines. When many VMs are started at once (powered on or rebooted), CPU and RAM pressure increases and startup slows down. I use try await SSHClient.connect to connect to these VMs and check for timeout errors. If a timeout occurs, I wait and retry in a loop. However, after about 5 or 6 retries, a segmentation fault consistently occurs and the program crashes.

This issue occurs only in the version statically compiled with MUSL.

I'd like to know whether this is a known issue that can be resolved by updating the Swift NIO libraries that Citadel depends on. To me, this appears to be a compatibility issue with the Swift NIO library when used with MUSL.
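
For reference, the wait-and-retry pattern described above can be sketched in isolation, without Citadel. This is only a minimal illustration: `AttemptTimedOut` and `retryOnTimeout` are hypothetical names that are not part of Citadel or Swift NIO, and the closure stands in for the `SSHClient.connect` call.

```swift
/// Hypothetical error standing in for a connection timeout.
struct AttemptTimedOut: Error {}

/// Runs `operation` up to `maxAttempts` times. After an attempt fails with
/// `AttemptTimedOut`, waits `delay` and tries again. Any other error is
/// rethrown immediately, and the last timeout is rethrown once attempts run out.
func retryOnTimeout<T: Sendable>(
    maxAttempts: Int,
    delay: Duration = .seconds(1),
    operation: @Sendable () async throws -> T
) async throws -> T {
    precondition(maxAttempts > 0, "maxAttempts must be positive")
    for attempt in 1...maxAttempts {
        do {
            return try await operation()
        } catch is AttemptTimedOut {
            // Give up after the final attempt instead of sleeping again.
            if attempt == maxAttempts { throw AttemptTimedOut() }
            try await Task.sleep(for: delay)
        }
    }
    // Not reached: the loop either returns or throws on its last iteration.
    throw AttemptTimedOut()
}
```

Capping the number of attempts makes it easier to state exactly how many retries precede the crash.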

Reproducer Sample
I can't provide a complete demo, as this is an internal tool. I can only show a basic usage example. Refer to the following code snippet:

struct SSHContext {
    var os: OS
    let sshInfo: InitConfig.VM.Group.SSH

    private let client: SSHClient

    init(
        sshInfo: InitConfig.VM.Group.SSH,
        os: OS,
        timeout: Int64 = 5
    ) async throws(SSHContextError) {
        do {
            // This reachability check works as expected
            if !sshInfo.ip.reachable(port: sshInfo.port!) {
                log.debug("\(sshInfo) may not be reachable")
            }

            var settings = SSHClientSettings(
                host: sshInfo.ip,
                port: sshInfo.port!,
                authenticationMethod: {
                    .passwordBased(username: sshInfo.user, password: sshInfo.passwd)
                },
                hostKeyValidator: .acceptAnything(),
            )
            settings.connectTimeout = .seconds(timeout)

            client = try await SSHClient.connect(to: settings)
            self.sshInfo = sshInfo
            self.os = os
        } catch ChannelError.connectTimeout(let timeout) {
            throw .timeout(timeout)  // FIXME: why is this always 10 s?
        } catch {
            throw .canNotConnect(error.localizedDescription)
        }
    }
    ...
}
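        await withTaskGroup(of: (header: String, output: String).self) { group in
            for vm in vmdb {
                let logHeader = "VM-\(vm.index) \(vm.ssh):"

                group.addTask {
                    // Declared inside the task: a `var` captured from outside
                    // cannot be mutated in a concurrently executing closure
                    var output = ""
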
                    do {
                        // vm boot may need some time
                        try await Task.sleep(for: .seconds(5))

                        var timeoutError = SSHContextError.timeout(.seconds(0))

                        while case .timeout(_) = timeoutError {
                            do {
                                let ssh = try await SSHContext(sshInfo: vm.ssh, os: vm.os)
                                let os = await ssh.getRemoteOS()
                                await vmActor.updateOS(os, by: vm.index)

                                output = "OS detected: \(os)"
                                break
                            } catch let error as SSHContextError {
                                log.warning("Error connecting to \(vm.ssh): \(error)")
                                timeoutError = error
                                try await Task.sleep(for: .seconds(1))
                            }
                        }
                    } catch {
                        output = "not reachable: \(error)".styleError
                    }

                    return (logHeader, output)
                }
            }

            for await (logHeader, output) in group {
                log.info("\(logHeader) \(output)")
            }
        }
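
To check whether the crash depends on how many connection attempts run at the same time, the task group above could be given a concurrency cap. The following is a sketch under assumptions, not part of the original tool: `runLimited` and its `probe` closure are hypothetical, with `probe` standing in for the `SSHContext` call, and results come back in completion order.

```swift
/// Runs `probe` for every item while keeping at most `limit` tasks in flight.
func runLimited<Item: Sendable, Output: Sendable>(
    _ items: [Item],
    limit: Int,
    probe: @escaping @Sendable (Item) async -> Output
) async -> [Output] {
    precondition(limit > 0, "limit must be positive")
    return await withTaskGroup(of: Output.self) { group in
        var results: [Output] = []
        var pending = items.makeIterator()

        // Start the first `limit` tasks.
        for _ in 0..<limit {
            guard let item = pending.next() else { break }
            group.addTask { await probe(item) }
        }

        // Each time a task finishes, start the next pending item.
        while let output = await group.next() {
            results.append(output)
            if let item = pending.next() {
                group.addTask { await probe(item) }
            }
        }
        return results
    }
}
```

Running the same VM list with a small `limit` and then a large one would show whether the fault tracks the number of simultaneous connections.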

Expected behavior
Runs normally without runtime errors.

Client (please complete the following information):

  • OS: Ubuntu 22.04
  • Client: Citadel
  • Version of Citadel, if applicable: 0.11.1

Server (please complete the following information):

  • OS: Windows 10 22H2 (VM)
  • Server: OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2
  • Version of Citadel, if applicable: N/A

Additional context

* thread #24, name = 'NIO-SGLTN-12-#0', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7ee0fb8)
    frame #0: 0x0000000001f50416 mtvmm`$s6NIOSSH10SSHMessageOWOc at <compiler-generated>:0
Note: this address is compiler-generated code in function $s6NIOSSH10SSHMessageOWOc that has no source code associated with it.
(lldb) bt
* thread #24, name = 'NIO-SGLTN-12-#0', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7ee0fb8)
  * frame #0: 0x0000000001f50416 mtvmm`$s6NIOSSH10SSHMessageOWOc at <compiler-generated>:0
    frame #1: 0x000000000213dc0b mtvmm`$s7NIOCore10ByteBufferV6NIOSSHE15writeSSHMessageySiAD0F0OF at SSHMessages.swift:1126:9
    frame #2: 0x00000000020eb3a4 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtFSi7NIOCore10ByteBufferVzXEfU_ at SSHKeyExchangeStateMachine.swift:453:20
    frame #3: 0x00000000020f1540 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtFSi7NIOCore10ByteBufferVzXEfU_TA at <compiler-generated>:0
    frame #4: 0x0000000001f45477 mtvmm`$s7NIOCore10ByteBufferV6NIOSSHE23writeCompositeSSHStringyS2iACzKXEKF at ByteBuffer+SSH.swift:212:33
    frame #5: 0x00000000020e7f86 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV06addKeyc14InitMessagesToC5Bytes33_953599FB70E36C5BC98CFF48C957590DLL14clientsMessage07serversS0yAA10SSHMessageO0gcS0V_AKtF at SSHKeyExchangeStateMachine.swift:452:35
    frame #6: 0x00000000020fa078 mtvmm`$s6NIOSSH26SSHKeyExchangeStateMachineV6handle03keyC0AA15SSHMultiMessageVSgAA10SSHMessageO03KeycI0V_tKF at SSHKeyExchangeStateMachine.swift:127:22
    frame #7: 0x0000000001f397f0 mtvmm`$s6NIOSSH26AcceptsKeyExchangeMessagesPAAE07receivecD7MessageyAA25SSHConnectionStateMachineV0iJ20InboundProcessResultOAA10SSHMessageO0cdG0VKF at AcceptsKeyExchangeMessages.swift:23:56
    frame #8: 0x00000000020c68f0 mtvmm`$s6NIOSSH25SSHConnectionStateMachineV21processInboundMessage9allocator4loopAC0cdF13ProcessResultOSg7NIOCore19ByteBufferAllocatorV_AJ9EventLoop_ptKF at SSHConnectionStateMachine.swift:172:40
    frame #9: 0x0000000001fb9568 mtvmm`$s6NIOSSH13NIOSSHHandlerC11channelRead7context4datay7NIOCore21ChannelHandlerContextC_AG6NIOAnyVtF at NIOSSHHandler.swift:165:54
    frame #10: 0x0000000001fb2449 mtvmm`$s6NIOSSH13NIOSSHHandlerC7NIOCore22_ChannelInboundHandlerAadEP11channelRead7context4datayAD0dF7ContextC_AD6NIOAnyVtFTW at <compiler-generated>:0
    frame #11: 0x0000000001d07882 mtvmm`$s7NIOCore21ChannelHandlerContextC06invokeB4Read33_F5AC316541457BD146E3694279514AA3LLyyAA6NIOAnyVF at ChannelPipeline.swift:2089:28
    frame #12: 0x0000000001d130ca mtvmm`$s7NIOCore15ChannelPipelineC05_fireB5Read0yyAA6NIOAnyVF at ChannelPipeline.swift:1012:29
    frame #13: 0x0000000001d168ce mtvmm`$s7NIOCore15ChannelPipelineC21SynchronousOperationsV04fireB4ReadyyAA6NIOAnyVF at ChannelPipeline.swift:1476:28
    frame #14: 0x0000000001e470cf mtvmm`$s8NIOPosix23BaseStreamSocketChannelC08readFromD0AA0bdE0C10ReadResultOyx_GyKF at BaseStreamSocketChannel.swift:135:50
    frame #15: 0x0000000001e31225 mtvmm`$s8NIOPosix17BaseSocketChannelC9readable033_7F4F544BB68CD2CFABA0C7990D6EB2C6LLAC15ReadStreamStateAELLOyx_GyF at BaseSocketChannel.swift:1137:35
    frame #16: 0x0000000001e42061 mtvmm`$s8NIOPosix17BaseSocketChannelC8readableyyF at BaseSocketChannel.swift:1121:14
    frame #17: 0x0000000001e31b19 mtvmm`$s8NIOPosix17BaseSocketChannelCyxGAA010SelectableD0A2aEP8readableyyFTW at <compiler-generated>:0
    frame #18: 0x0000000001ef149b mtvmm`$s8NIOPosix19SelectableEventLoopC06handleC0_7channelyAA08SelectorC3SetV_xtAA0B7ChannelRzlF at SelectableEventLoop.swift:563:25
    frame #19: 0x0000000001ee8251 mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_yAA08SelectorC0VyAA15NIORegistrationVGXEfU_ at SelectableEventLoop.swift:862:30
    frame #20: 0x0000000001ee7383 mtvmm`$s8NIOPosix13SelectorEventVyAA15NIORegistrationVGs5Error_pIggzo_AFsAG_pIegnzo_TR at <compiler-generated>:0
    frame #21: 0x0000000001eeb254 mtvmm`$s8NIOPosix13SelectorEventVyAA15NIORegistrationVGs5Error_pIggzo_AFsAG_pIegnzo_TRTA at <compiler-generated>:0
    frame #22: 0x0000000001ef7b36 mtvmm`$s8NIOPosix8SelectorC10whenReady08strategy11onLoopBegin_yAA0B8StrategyO_yyXEyAA0B5EventVyxGKXEtKF at SelectorEpoll.swift:300:25
    frame #23: 0x0000000001efe9c4 mtvmm`$s8NIOPosix8SelectorC9whenReady8strategy11onLoopBegin_yAA0B8StrategyO_yyXEyAA0B5EventVyxGKXEtKF at SelectorGeneric.swift:368:18
    frame #24: 0x0000000001ef293b mtvmm`$s8NIOPosix19SelectableEventLoopC20_blockingWaitForWork17nextReadyDeadline_y7NIOCore11NIODeadlineVSg_yAA08SelectorC0VyAA15NIORegistrationVGXEtKF at SelectableEventLoop.swift:784:28
    frame #25: 0x0000000001ee7fed mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_ at SelectableEventLoop.swift:857:26
    frame #26: 0x0000000001eeb294 mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKFyyKXEfU0_TA at <compiler-generated>:0
    frame #27: 0x0000000001eedcb2 mtvmm`$s8NIOPosix19withAutoReleasePoolyxxyKXEKlF at SelectableEventLoop.swift:46:16
    frame #28: 0x0000000001ef2c2d mtvmm`$s8NIOPosix19SelectableEventLoopC3runyyKF at SelectableEventLoop.swift:856:17
    frame #29: 0x0000000001e93c9a mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC06runTheE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL6thread06parentF003candE22BeShutdownIndividually15selectorFactory11initializer15metricsDelegate_yAA9NIOThreadC_ACSgSbAA8SelectorCyAA15NIORegistrationVGyKcyAMcAA08NIOEventE15MetricsDelegate_pSgyAA010SelectabledE0CctFZ at MultiThreadedEventLoopGroup.swift:105:22
    frame #30: 0x0000000001e8d9bb mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC014setupThreadAnddE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL4name06parentF015selectorFactory11initializer15metricsDelegateAA010SelectabledE0CSS_AcA8SelectorCyAA15NIORegistrationVGyKcyAA9NIOThreadCcAA08NIOEvente7MetricsX0_pSgtFZyAScfU_ at MultiThreadedEventLoopGroup.swift:126:41
    frame #31: 0x0000000001e90943 mtvmm`$s8NIOPosix27MultiThreadedEventLoopGroupC014setupThreadAnddE033_C2B1528F4FBA68A3DBFA89DBAEBE9D4DLL4name06parentF015selectorFactory11initializer15metricsDelegateAA010SelectabledE0CSS_AcA8SelectorCyAA15NIORegistrationVGyKcyAA9NIOThreadCcAA08NIOEvente7MetricsX0_pSgtFZyAScfU_TA at <compiler-generated>:0
    frame #32: 0x0000000001f22f6f mtvmm`$s8NIOPosix9NIOThreadCIegg_ACytIegnr_TR at <compiler-generated>:0
    frame #33: 0x0000000001f259d1 mtvmm`$s8NIOPosix14ThreadOpsPosixO3run6handle4args06detachB0yAA14PthreadWrapperVSgz_AA3BoxCyyAA9NIOThreadCc4body_SSSg4nametGSbtFZs5Int32VSpys13OpaquePointerVSgGXEfU_SvSgAYcfU_ at ThreadPosix.swift:153:21
    frame #34: 0x0000000001f25a99 mtvmm`$s8NIOPosix14ThreadOpsPosixO3run6handle4args06detachB0yAA14PthreadWrapperVSgz_AA3BoxCyyAA9NIOThreadCc4body_SSSg4nametGSbtFZs5Int32VSpys13OpaquePointerVSgGXEfU_SvSgAYcfU_To at <compiler-generated>:0
    frame #35: 0x00000000037410db mtvmm`start + 123
    frame #36: 0x0000000003742f13 mtvmm`__clone + 47
