Fix black screen when login(1) calls vhangup()#1529
Open
sdirkwinkel wants to merge 1 commit intotsl0922:mainfrom
Open
Fix black screen when login(1) calls vhangup()#1529sdirkwinkel wants to merge 1 commit intotsl0922:mainfrom
sdirkwinkel wants to merge 1 commit intotsl0922:mainfrom
Conversation
Symptom
-------
When ttyd's command is `login` (or any program that revokes its
controlling terminal during startup), the browser-side terminal
intermittently comes up as a black screen with the keyboard appearing
dead. The bug is timing-dependent: it reproduces reliably on hosts
where deeper CPU C-states are disabled, and disappears (or only happens
sporadically) when the kernel is allowed to enter C3/C6.
Root cause
----------
login(1) calls vhangup(2) to revoke the controlling terminal before
re-attaching to it via open()/TIOCSCTTY. vhangup() sends SIGHUP to the
foreground process group (login ignores it) and forces subsequent I/O
on the existing slave fd to fail. On the master side, this surfaces
as a transient `read() == -1, errno == EIO` for as long as no slave
fd is open. Once login reopens the slave by device path, the master
recovers and reads succeed again.
The window between vhangup() and the slave being reopened is short
(microseconds when the CPU is hot, much longer when waking from a deep
C-state). ttyd's libuv read loop only loses the race when it manages
to dispatch a read inside that window:
1. uv__read() in libuv's src/unix/stream.c calls read(), which
returns -1 / EIO.
2. libuv invokes our read_cb with `nread == UV_EIO` AND, critically,
clears `UV_HANDLE_READABLE | UV_HANDLE_WRITABLE` on the stream and
stops its io_watcher (stream.c, the !EAGAIN branch around the
`stream->flags &= ~(UV_HANDLE_READABLE | UV_HANDLE_WRITABLE)`
line).
3. Any subsequent uv_read_start() on the same uv_pipe_t is a no-op
- the flags are gone and the io_watcher will not be re-armed.
The handle is permanently poisoned even though the underlying
kernel master fd recovers a millisecond later.
Result: process->out is dead, no further output reaches the websocket,
the browser shows a black screen. process->in is a *separate*
uv_pipe_t over its own dup() of the master fd, so writes (keystrokes)
typically still work because users only type after login has finished
and the slave is back - the input side is rarely exercised inside the
EIO window. This is why the symptom looks like "no output, blind
input" rather than a fully dead session.
Three sub-bugs in the existing code conspired here:
1. process_read_cb (protocol.c) stored NULL in pss->pty_buf on a
transient EIO and asked LWS for a writable callback that produced
nothing, so even a working pipe would have been left paused.
2. pty_pause() / pty_resume() and read_cb() never updated
process->paused, so the `if (!paused) return` guards in those
functions made flow control depend on stale state. After a
uv_read_stop in read_cb, paused was still false, so subsequent
pty_resume() short-circuited.
3. Even with (1) and (2) fixed, the poisoned uv_pipe_t cannot be
re-armed - libuv has cleared the flags and a fresh handle is
required.
Fix
---
src/pty.c read_cb: when read fails with UV_EIO and the child is still
running, treat it as transient and schedule a reconnect of the read
side instead of propagating EOF upstream. Other read errors fall
through to the existing EOF path so we don't paper over real failures.
src/pty.c pty_reconnect_out: close the poisoned uv_pipe_t, dup() the
still-valid master fd into a fresh handle, and re-arm reading. Retry
every 20ms up to 20 times (400ms total), which is more than enough
for login to finish reopening the slave. On the last attempt the
upper layer is notified with EOF so the websocket closes cleanly.
src/pty.c pty_pause / pty_resume / read_cb: keep process->paused in
sync with the actual io_watcher state so pty_resume() will re-arm
reading as designed.
src/pty.h: add reconnect_timer + reconnect_attempts to pty_process.
The timer is allocated in pty_spawn and freed in process_free after
the wait thread is joined, so there is no UAF window. The reconnect
helper is file-static; there is no public API change.
Notes / known limitations
-------------------------
- Only process->out is reconnected. process->in is a separate
handle and was not observed to break in practice (writes happen
after the user starts typing, by which time login is done). If a
future workload triggers EIO on the write side, the symmetric fix
would be to attempt reconnect from pty_write() on EPIPE.
- process_running() uses uv_kill(pid, 0), which returns success for
zombies. In the narrow window between child exit and wait_cb
reaping it, we may schedule one or two pointless reconnects before
the next dup()+read() returns EOF and the session closes normally.
Fixes: tsl0922#768
Code was written and tested by claude opus 4.6
Standalone reproducer (compile with -luv, run as root):
/*
* test_ttyd_vhangup_fix.c - integration test for vhangup recovery.
* Forkpty's a child that writes output, calls vhangup(), reopens the
* slave PTY by device path, then writes more output. Parent uses the
* same one-shot read + reconnect pattern as the fix. Passes when
* AFTER output is received through the new pipe.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pty.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <uv.h>
#include <signal.h>
#include <time.h>
#include <errno.h>
static uv_loop_t *loop;
static uv_pipe_t *pty_out;
static uv_pipe_t pty_in;
static uv_timer_t timeout_timer;
static uv_timer_t reconnect_timer;
static uv_idle_t writable_idle;
static int master_fd = -1;
static int child_pid = 0;
static bool paused = true;
static int reconnect_attempts = 0;
#define MAX_RECONNECT_ATTEMPTS 10
#define RECONNECT_BASE_DELAY_MS 100
static char *pending_buf = NULL;
static size_t pending_len = 0;
static bool pending_close = false;
static int total_reads = 0;
static int total_eios = 0;
static bool saw_before = false;
static bool saw_after = false;
static struct timespec start_time;
static double elapsed(void) {
struct timespec now;
clock_gettime(CLOCK_MONOTONIC, &now);
return (now.tv_sec - start_time.tv_sec) +
(now.tv_nsec - start_time.tv_nsec) / 1e9;
}
static bool process_running(void) {
return child_pid > 0 && kill(child_pid, 0) == 0;
}
static void alloc_cb(uv_handle_t *h, size_t s, uv_buf_t *b) {
(void)h; b->base = malloc(s); b->len = s;
}
static void close_cb(uv_handle_t *h) { free(h); }
static bool fd_duplicate(int fd, uv_pipe_t *pipe) {
int d = dup(fd);
if (d < 0) return false;
fcntl(d, F_SETFD, FD_CLOEXEC);
if (uv_pipe_open(pipe, d)) { close(d); return false; }
return true;
}
static void read_cb(uv_stream_t *stream, ssize_t n, const uv_buf_t *buf);
static void writable_cb(uv_idle_t *idle);
static void pty_resume(void) {
if (!paused || !pty_out) return;
paused = false;
pty_out->data = NULL;
if (uv_read_start((uv_stream_t *)pty_out, alloc_cb, read_cb))
paused = true;
}
static void reconnect_timer_cb(uv_timer_t *t) {
(void)t;
if (pty_out) {
uv_read_stop((uv_stream_t *)pty_out);
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
}
if (!process_running()) return;
pty_out = malloc(sizeof(uv_pipe_t));
uv_pipe_init(loop, pty_out, 0);
if (!fd_duplicate(master_fd, pty_out)) {
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
return;
}
paused = true;
pty_resume();
}
static void pty_reconnect_out(void) {
reconnect_attempts++;
if (reconnect_attempts > MAX_RECONNECT_ATTEMPTS) {
pending_close = true;
uv_idle_start(&writable_idle, writable_cb);
return;
}
uint64_t delay = RECONNECT_BASE_DELAY_MS *
(1u << (reconnect_attempts - 1));
uv_timer_start(&reconnect_timer, reconnect_timer_cb, delay, 0);
}
static void read_cb(uv_stream_t *stream, ssize_t n,
const uv_buf_t *buf) {
uv_read_stop(stream);
paused = true;
if (n <= 0) {
if (n == UV_ENOBUFS || n == 0) { free(buf->base); return; }
total_eios++;
free(buf->base);
if (process_running()) pty_reconnect_out();
else {
pending_close = true;
uv_idle_start(&writable_idle, writable_cb);
}
return;
}
total_reads++;
reconnect_attempts = 0;
pending_buf = malloc(n);
memcpy(pending_buf, buf->base, n);
pending_len = n;
free(buf->base);
uv_idle_start(&writable_idle, writable_cb);
}
static void writable_cb(uv_idle_t *idle) {
uv_idle_stop(idle);
if (pending_close) return;
if (pending_buf) {
if (strstr(pending_buf, "BEFORE")) saw_before = true;
if (strstr(pending_buf, "AFTER")) saw_after = true;
free(pending_buf);
pending_buf = NULL;
pending_len = 0;
pty_resume();
}
}
static void on_close_noop(uv_handle_t *h) { (void)h; }
static void timeout_cb(uv_timer_t *t) {
(void)t;
printf("saw_before=%d saw_after=%d reads=%d eios=%d\n",
saw_before, saw_after, total_reads, total_eios);
if (saw_before && saw_after)
printf("*** FIX VERIFIED ***\n");
else
printf("*** FIX FAILED ***\n");
if (child_pid > 0) kill(child_pid, SIGTERM);
if (pty_out) {
uv_read_stop((uv_stream_t *)pty_out);
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
}
uv_close((uv_handle_t *)&pty_in, on_close_noop);
uv_close((uv_handle_t *)&timeout_timer, on_close_noop);
uv_close((uv_handle_t *)&reconnect_timer, on_close_noop);
uv_close((uv_handle_t *)&writable_idle, on_close_noop);
}
int main(void) {
if (geteuid() != 0) {
fprintf(stderr, "Must run as root\n");
return 1;
}
clock_gettime(CLOCK_MONOTONIC, &start_time);
struct winsize ws = { .ws_row = 24, .ws_col = 80 };
char slave_name[256];
child_pid = forkpty(&master_fd, slave_name, NULL, &ws);
if (child_pid < 0) { perror("forkpty"); return 1; }
if (child_pid == 0) {
signal(SIGHUP, SIG_IGN);
printf("BEFORE\n"); fflush(stdout); usleep(200000);
close(0); close(1); close(2);
vhangup();
signal(SIGHUP, SIG_DFL);
int fd = open(slave_name, O_RDWR);
if (fd < 0) _exit(99);
dup2(fd, 0); dup2(fd, 1); dup2(fd, 2);
if (fd > 2) close(fd);
ioctl(0, TIOCSCTTY, 1);
usleep(500000);
printf("AFTER\n"); fflush(stdout);
sleep(3);
_exit(0);
}
fcntl(master_fd, F_SETFL,
fcntl(master_fd, F_GETFL) | O_NONBLOCK);
fcntl(master_fd, F_SETFD, FD_CLOEXEC);
loop = uv_default_loop();
uv_pipe_init(loop, &pty_in, 0);
pty_out = malloc(sizeof(uv_pipe_t));
uv_pipe_init(loop, pty_out, 0);
int fd_in = dup(master_fd);
fcntl(fd_in, F_SETFD, FD_CLOEXEC);
uv_pipe_open(&pty_in, fd_in);
if (!fd_duplicate(master_fd, pty_out)) return 1;
uv_idle_init(loop, &writable_idle);
uv_timer_init(loop, &reconnect_timer);
pty_resume();
uv_timer_init(loop, &timeout_timer);
uv_timer_start(&timeout_timer, timeout_cb, 8000, 0);
uv_run(loop, UV_RUN_DEFAULT);
int status;
waitpid(child_pid, &status, WNOHANG);
uv_loop_close(loop);
return (saw_before && saw_after) ? 0 : 1;
}
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related to #768
This was mostly investigated and written by claude, I only tested it and did minor cleanups.
Feel free to close / do this differently, I'm not an expert in ttyd or libuv
The C-State behaviour was on my machines with a rt kernel, It seems from the issue this is also more common without
limiting C-States.
Symptom
When ttyd's command is
login(or any program that revokes its controlling terminal during startup), the browser-side terminal intermittently comes up as a black screen with the keyboard appearing dead. The bug is timing-dependent: it reproduces reliably on hosts where deeper CPU C-states are disabled, and disappears (or only happens sporadically) when the kernel is allowed to enter C3/C6.Root cause
login(1) calls vhangup(2) to revoke the controlling terminal before re-attaching to it via open()/TIOCSCTTY. vhangup() sends SIGHUP to the foreground process group (login ignores it) and forces subsequent I/O on the existing slave fd to fail. On the master side, this surfaces as a transient
read() == -1, errno == EIOfor as long as no slave fd is open. Once login reopens the slave by device path, the master recovers and reads succeed again.The window between vhangup() and the slave being reopened is short (microseconds when the CPU is hot, much longer when waking from a deep C-state). ttyd's libuv read loop only loses the race when it manages to dispatch a read inside that window:
nread == UV_EIOAND, critically, clearsUV_HANDLE_READABLE | UV_HANDLE_WRITABLEon the stream and stops its io_watcher (stream.c, the !EAGAIN branch around thestream->flags &= ~(UV_HANDLE_READABLE | UV_HANDLE_WRITABLE)line).Result: process->out is dead, no further output reaches the websocket, the browser shows a black screen. process->in is a separate uv_pipe_t over its own dup() of the master fd, so writes (keystrokes) typically still work because users only type after login has finished and the slave is back - the input side is rarely exercised inside the EIO window. This is why the symptom looks like "no output, blind input" rather than a fully dead session.
Three sub-bugs in the existing code conspired here:
if (!paused) returnguards in those functions made flow control depend on stale state. After a uv_read_stop in read_cb, paused was still false, so subsequent pty_resume() short-circuited.Fix
src/pty.c read_cb: when read fails with UV_EIO and the child is still running, treat it as transient and schedule a reconnect of the read side instead of propagating EOF upstream. Other read errors fall through to the existing EOF path so we don't paper over real failures.
src/pty.c pty_reconnect_out: close the poisoned uv_pipe_t, dup() the still-valid master fd into a fresh handle, and re-arm reading. Retry every 20ms up to 20 times (400ms total), which is more than enough for login to finish reopening the slave. On the last attempt the upper layer is notified with EOF so the websocket closes cleanly.
src/pty.c pty_pause / pty_resume / read_cb: keep process->paused in sync with the actual io_watcher state so pty_resume() will re-arm reading as designed.
src/pty.h: add reconnect_timer + reconnect_attempts to pty_process. The timer is allocated in pty_spawn and freed in process_free after the wait thread is joined, so there is no UAF window. The reconnect helper is file-static; there is no public API change.
Notes / known limitations
Fixes: #768
Code was written and tested by claude opus 4.6
Standalone reproducer (compile with -luv, run as root):
/*
static uv_loop_t *loop;
static uv_pipe_t *pty_out;
static uv_pipe_t pty_in;
static uv_timer_t timeout_timer;
static uv_timer_t reconnect_timer;
static uv_idle_t writable_idle;
static int master_fd = -1;
static int child_pid = 0;
static bool paused = true;
static int reconnect_attempts = 0;
#define MAX_RECONNECT_ATTEMPTS 10
#define RECONNECT_BASE_DELAY_MS 100
static char *pending_buf = NULL;
static size_t pending_len = 0;
static bool pending_close = false;
static int total_reads = 0;
static int total_eios = 0;
static bool saw_before = false;
static bool saw_after = false;
static struct timespec start_time;
static double elapsed(void) {
struct timespec now;
clock_gettime(CLOCK_MONOTONIC, &now);
return (now.tv_sec - start_time.tv_sec) +
(now.tv_nsec - start_time.tv_nsec) / 1e9;
}
static bool process_running(void) {
return child_pid > 0 && kill(child_pid, 0) == 0;
}
static void alloc_cb(uv_handle_t *h, size_t s, uv_buf_t *b) {
(void)h; b->base = malloc(s); b->len = s;
}
static void close_cb(uv_handle_t *h) { free(h); }
static bool fd_duplicate(int fd, uv_pipe_t *pipe) {
int d = dup(fd);
if (d < 0) return false;
fcntl(d, F_SETFD, FD_CLOEXEC);
if (uv_pipe_open(pipe, d)) { close(d); return false; }
return true;
}
static void read_cb(uv_stream_t *stream, ssize_t n, const uv_buf_t *buf);
static void writable_cb(uv_idle_t *idle);
static void pty_resume(void) {
if (!paused || !pty_out) return;
paused = false;
pty_out->data = NULL;
if (uv_read_start((uv_stream_t *)pty_out, alloc_cb, read_cb))
paused = true;
}
static void reconnect_timer_cb(uv_timer_t *t) {
(void)t;
if (pty_out) {
uv_read_stop((uv_stream_t *)pty_out);
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
}
if (!process_running()) return;
pty_out = malloc(sizeof(uv_pipe_t));
uv_pipe_init(loop, pty_out, 0);
if (!fd_duplicate(master_fd, pty_out)) {
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
return;
}
paused = true;
pty_resume();
}
static void pty_reconnect_out(void) {
reconnect_attempts++;
if (reconnect_attempts > MAX_RECONNECT_ATTEMPTS) {
pending_close = true;
uv_idle_start(&writable_idle, writable_cb);
return;
}
uint64_t delay = RECONNECT_BASE_DELAY_MS *
(1u << (reconnect_attempts - 1));
uv_timer_start(&reconnect_timer, reconnect_timer_cb, delay, 0);
}
static void read_cb(uv_stream_t *stream, ssize_t n,
const uv_buf_t *buf) {
uv_read_stop(stream);
paused = true;
if (n <= 0) {
if (n == UV_ENOBUFS || n == 0) { free(buf->base); return; }
total_eios++;
free(buf->base);
if (process_running()) pty_reconnect_out();
else {
pending_close = true;
uv_idle_start(&writable_idle, writable_cb);
}
return;
}
total_reads++;
reconnect_attempts = 0;
pending_buf = malloc(n);
memcpy(pending_buf, buf->base, n);
pending_len = n;
free(buf->base);
uv_idle_start(&writable_idle, writable_cb);
}
static void writable_cb(uv_idle_t idle) {
uv_idle_stop(idle);
if (pending_close) return;
if (pending_buf) {
if (strstr(pending_buf, "BEFORE")) saw_before = true;
if (strstr(pending_buf, "AFTER")) saw_after = true;
free(pending_buf);
pending_buf = NULL;
pending_len = 0;
pty_resume();
}
}
static void on_close_noop(uv_handle_t h) { (void)h; }
static void timeout_cb(uv_timer_t t) {
(void)t;
printf("saw_before=%d saw_after=%d reads=%d eios=%d\n",
saw_before, saw_after, total_reads, total_eios);
if (saw_before && saw_after)
printf(" FIX VERIFIED \n");
else
printf(" FIX FAILED ***\n");
if (child_pid > 0) kill(child_pid, SIGTERM);
if (pty_out) {
uv_read_stop((uv_stream_t *)pty_out);
uv_close((uv_handle_t *)pty_out, close_cb);
pty_out = NULL;
}
uv_close((uv_handle_t *)&pty_in, on_close_noop);
uv_close((uv_handle_t *)&timeout_timer, on_close_noop);
uv_close((uv_handle_t *)&reconnect_timer, on_close_noop);
uv_close((uv_handle_t *)&writable_idle, on_close_noop);
}
int main(void) {
if (geteuid() != 0) {
fprintf(stderr, "Must run as root\n");
return 1;
}
clock_gettime(CLOCK_MONOTONIC, &start_time);
struct winsize ws = { .ws_row = 24, .ws_col = 80 };
char slave_name[256];
child_pid = forkpty(&master_fd, slave_name, NULL, &ws);
if (child_pid < 0) { perror("forkpty"); return 1; }
if (child_pid == 0) {
signal(SIGHUP, SIG_IGN);
printf("BEFORE\n"); fflush(stdout); usleep(200000);
close(0); close(1); close(2);
vhangup();
signal(SIGHUP, SIG_DFL);
int fd = open(slave_name, O_RDWR);
if (fd < 0) _exit(99);
dup2(fd, 0); dup2(fd, 1); dup2(fd, 2);
if (fd > 2) close(fd);
ioctl(0, TIOCSCTTY, 1);
usleep(500000);
printf("AFTER\n"); fflush(stdout);
sleep(3);
_exit(0);
}
fcntl(master_fd, F_SETFL,
fcntl(master_fd, F_GETFL) | O_NONBLOCK);
fcntl(master_fd, F_SETFD, FD_CLOEXEC);
loop = uv_default_loop();
uv_pipe_init(loop, &pty_in, 0);
pty_out = malloc(sizeof(uv_pipe_t));
uv_pipe_init(loop, pty_out, 0);
int fd_in = dup(master_fd);
fcntl(fd_in, F_SETFD, FD_CLOEXEC);
uv_pipe_open(&pty_in, fd_in);
if (!fd_duplicate(master_fd, pty_out)) return 1;
uv_idle_init(loop, &writable_idle);
uv_timer_init(loop, &reconnect_timer);
pty_resume();
uv_timer_init(loop, &timeout_timer);
uv_timer_start(&timeout_timer, timeout_cb, 8000, 0);
uv_run(loop, UV_RUN_DEFAULT);
int status;
waitpid(child_pid, &status, WNOHANG);
uv_loop_close(loop);
return (saw_before && saw_after) ? 0 : 1;
}