
Race condition with many SSH channels open, unhelpful errors #2

@sbrl


Hello,

I am developing an application that needs to hold a very large number of concurrent SSH connections (the end goal is regularly 180+ individual connections), but I am having difficulty scaling it.

When I run it with only ~2 connections, it runs fine. However, with more than 2-3 connections and concurrent async interaction across them, I get crashes like this one:

Ssh channel closed, check why the socket was closed or lost connection
Error: Ssh channel closed, check why the socket was closed or lost connection
    at Channel.<anonymous> (file:///home/sbrl/Documents/repos/PROJECT_NAME/node_modules/hivessh/dist/SshExec.js:190:41)
    at Object.onceWrapper (node:events:628:26)
    at Channel.emit (node:events:525:35)
    at Channel.doClose (/home/sbrl/Documents/repos/PROJECT_NAME/node_modules/ssh2/lib/utils.js:101:21)
    at Object.onceWrapper (node:events:627:28)
    at Channel.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21)

Details so far:

...but it fails to tell me what the error actually is, and given that the stack trace is cut off at an async boundary, I cannot tell which connection has the problem or which call to a hivessh method triggered it.

This error is thrown at seemingly random points across my entire codebase, making debugging impractical. In essence, I'm doing something like this (greatly simplified):

const sshcs = [ do_ssh_connect(...), do_ssh_connect(...) ];
await Promise.all(sshcs.map(sshc => sshc.exec(...)));

....just with lots of .exec calls.
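
One way to at least see which connection a failure belongs to is to label every call before handing it to Promise.all. The following is only a minimal sketch: `labelled`, `fakeConnection` and `runAll` are hypothetical names, and the fake `exec` stands in for a real connection so the example runs on its own. It also assumes the error surfaces as a rejection of the promise returned by the call, which I have not confirmed is the case for hivessh.

```javascript
// Hypothetical helper: wraps a promise-returning call so that a rejection
// carries a label identifying the connection and command that caused it.
async function labelled(label, fn) {
  try {
    return await fn();
  } catch (err) {
    // The `cause` option keeps the original error (and its stack) attached.
    throw new Error(`[${label}] ${err.message}`, { cause: err });
  }
}

// Stand-in for a real connection; this `exec` is a fake, not the hivessh API.
function fakeConnection(name, shouldFail) {
  return {
    name,
    exec: async (cmd) => {
      if (shouldFail) throw new Error("Ssh channel closed");
      return `${name}: ran ${cmd}`;
    },
  };
}

async function runAll(connections, cmd) {
  // allSettled reports every outcome instead of rejecting on the first failure.
  const results = await Promise.allSettled(
    connections.map((c) => labelled(`${c.name} exec ${cmd}`, () => c.exec(cmd)))
  );
  return results.map((r) => (r.status === "fulfilled" ? r.value : r.reason.message));
}
```

With this in place, a failing call reports something like `[host-b exec uptime] Ssh channel closed` rather than an anonymous error, and the other connections still report their results.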

Empirical evidence suggests that this is a bug with .exec(), and NOT with sshc.sftp.*, as I have yet to see a crash from the SFTP subsystem.

In other words, I suspect that hivessh has a race condition when you execute multiple commands at the same time asynchronously across multiple SshHosts.
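
A possible way to probe this suspicion is to cap how many calls are in flight at once and check whether the crashes disappear at a low limit. This is a rough sketch only: `runLimited` is a hypothetical helper, not something hivessh provides, and each task is assumed to be a function that returns a promise (for example `() => sshc.exec(...)`).

```javascript
// Runs `tasks` (functions returning promises) with at most `limit` in flight.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker pulls the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results; // results keep the order of `tasks`
}
```

If the crashes stop with a limit of 1 and return as the limit grows, that would support the race condition theory, although it would not prove where the race is.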


To this end, I suggest that the above error message be updated to include the reason why the socket was closed or lost connection.

Inspecting an SshHost instance reveals this might be referring to SshHost.closeErr, but the message is unclear as to precisely where one should go looking.

For example, the error could instead read:

Error: Ssh channel closed. Reason: <reason here>

...for example, making something up:

Error: Ssh channel closed. Reason: Connection closed by server
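
To illustrate the suggestion, here is a minimal sketch of how such a message could be built. `channelClosedError` is a hypothetical helper of my own and not part of hivessh, and I am assuming the close reason is available as either a string or an Error at the point where the current error is created.

```javascript
// Hypothetical helper, not part of hivessh: builds the proposed error message.
function channelClosedError(reason) {
  // Accept either a plain string or an Error as the close reason.
  const text = reason instanceof Error ? reason.message : reason;
  const message = text
    ? `Ssh channel closed. Reason: ${text}`
    : "Ssh channel closed. Reason: unknown";
  // When the reason is an Error, attach it as `cause` so its stack is preserved.
  return new Error(message, reason instanceof Error ? { cause: reason } : undefined);
}
```

Attaching the underlying error as `cause` means the original stack is not lost, which would also help with working out which connection closed.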

Labels: bug, enhancement, help wanted
