-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fix: Explicitly exit CLI to prevent hanging subprocesses #3083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
2fba752
to
3577bf2
Compare
90a62c6
to
fa73183
Compare
Rebased to remove unintended extra commit. |
state().queued.delete(input.sessionID) | ||
SessionCompaction.prune(input) | ||
log.info("waiting for session title to be generated") | ||
await ensureTitlePromise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure we wanna await here, doesnt that make title generation blocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. The promise will be running either way, and if we don't await it here, it will hang around until either it resolves or the process exits.
I think it makes sense to await it here, as it is the end of the LLM's response to the user's input, meaning it can run concurrently with the LLM's response.
Also, since I do process.exit()
at the end of index.ts
to work around those dockerized MCP servers, if I don't await this here, the process.exit()
will be reached before the ensureTitlePromise resolves, so it will be interrupted and the title won't be saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rekram1-node Tweaked slightly to address your concern. See 1e0846f.
1e0846f
to
7da8a66
Compare
// Most notably, some docker-container-based MCP servers don't handle such signals unless | ||
// run using `docker run --init`. | ||
// Explicitly exit to avoid any hanging subprocesses. | ||
process.exit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any other cases besides the docker mcps?
Can't we just track the pid from transports and send a SIGKILL if they are still running post close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a shell script wrapping an stdio MCP that has the same behavior. I can send it to you later today. I can imagine there can be other cases where an MCP misbehaves in a similar way. I have seen a few MCP server implementations and the general vibe is not high quality or well thought out.
I would like to avoid using SIGKILL. Fact of the matter is, the actual MCP server process receives SIGTERM when process.exit() is called. I confirmed that by running strace
on the container's PID. But from the POV of OC, the child process is not the MCP server, but rather the docker client managing the MCP server.
The PID from the transports is an implementation detail - I would like to avoid using that too.
Any subprocess not under our control can cause the hang. The process.exit is a panacea, and as long as we do proper awaits, it won't cause any issues. Which we do for the most part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shell script + how I use it in opencode.json
, as promised: https://gist.github.com/veracioux/3a9fe2ea06fca0d16852481030d9297e
|
||
return { | ||
queued, | ||
ensureTitlePromise: undefined as Promise<void> | undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if I have multiple calls to prompt at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep in mind that opencode server isn't tied just to the normal server + tui, it can also have other use cases ex:
https://github.com/sst/opencode/blob/dev/packages/sdk/js/example/example.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I'll make some tweaks. Thanks for pointing out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See newest commits. They should address both these concerns.
- Track cleanup promises (LSP clients, MCP clients, file watchers) to prevent premature disposal - Add timeout warnings to alert users when disposal is taking longer than expected - Use allSettled instead of Promise.all to handle disposal errors gracefully - Move session title generation to tracked promises, let State.dispose handle it in unified way
Reproduced using MCP config from: #1717
The easiest way to reproduce (and verify the fix) is to run
opencode run hi
.Investigation:
docker run
ai
library sends SIGTERM to all child MCP processes that use stdio transportsdocker run
client, andSIGTERM
isn't properly delivered, unlessdocker run --init
was used. So the subprocesses stick around, preventing opencode from exiting. I verified this with a minimalistic subprocess-spawning bun program andstrace
process.exit
, in which caseSIGTERM
is delivered correctlyprocess.exit()
when the CLI is done with everything.NOTE: Issue also happens in (some, maybe all) MCP servers wrapped by shell scripts, which this also fixes.
Edit:
I also found some places that don't properly await Promises and are liable to cause hangs before process exit. I changed those. I also replaced some
for () { await ... }
withawait Promise.all(...)
to potentially speed up cleanups of unrelated resources (which hence can be performed concurrently).Improved INFO logs for ensureTitle - it's now awaited and logged so it can be more obvious from
--print-logs
that that is the reason why OC hangs before exiting.Closes #854.
Closes #1001.
Closes #1195.
Closes #1431.
Closes #1717.
Closes #1810.
Might fix:
opencode run
and TUI exits hang on v0.15+ #3213Possibly related, but I'm not very confident this is fixed by the PR: