Move task execution into a daemon#263

Open
arcanis wants to merge 10 commits into main from mael/daemon-tasks

Conversation

@arcanis (Member) commented Mar 4, 2026

This PR adds support for long-running tasks, currently annotated with a @long-running attribute.

To support that, task execution has been moved inside a daemon process managed by Yarn Switch. The core logic still lives inside Yarn (not Yarn Switch), with Yarn Switch merely responsible for keeping records of which daemons are in use in which projects.

The daemons are currently accessible through unauthenticated WebSockets listening on localhost. This is somewhat insecure in a multi-user context; authentication should be implemented in a follow-up.

@greptile-apps bot commented Mar 4, 2026

Confidence Score: 1/5

  • Not safe to merge — two Windows build-breaking issues and a daemon-kill orphan bug need to be fixed first.
  • The PR introduces substantial new functionality and the overall daemon architecture is sound, but there are two concrete compilation errors (an unconditional Unix-only import in coordinator.rs, and a missing winapi dependency for the Windows cfg blocks in daemons.rs) that will prevent builds on Windows entirely.
  • There is also a behavioral bug where killing the daemon leaves spawned task children running as orphans, which directly contradicts the intent of the switch daemon --kill command, plus unbounded memory growth in the output buffer for long-running daemons.
  • The security concern (unauthenticated WebSocket) is acknowledged and intentionally deferred, which is acceptable for an initial implementation, but the build blockers and orphan-process bug are showstoppers.
  • packages/zpm/src/daemon/coordinator.rs (Unix-only import), packages/zpm-switch/src/daemons.rs (missing winapi dependency), packages/zpm-switch/src/commands/switch/daemon_kill.rs (orphaned task children on kill), packages/zpm/src/daemon/client.rs (SIGKILL on process group), taskfile (debug entries)

Important Files Changed

  • packages/zpm/src/daemon/coordinator.rs: New coordinator file that manages daemon execution. Contains two critical issues: (1) the unconditional import of std::os::unix::fs::MetadataExt at line 3 will fail to compile on Windows, and (2) unbounded memory growth in the output_buffer HashMap, which accumulates one entry per task ID for the lifetime of the daemon without ever removing entries.
  • packages/zpm-switch/src/daemons.rs: Daemon registry with process lifecycle management. The Windows code (lines 100-153) references winapi crate types, but winapi is not declared as a dependency in Cargo.toml, causing a build failure on Windows.
  • packages/zpm/src/daemon/client.rs: Daemon client with WebSocket-based IPC. StandaloneDaemonHandle::kill() (lines 41-49) uses kill -9 -{pid}, which sends SIGKILL to the entire process group, preventing graceful cleanup of the daemon and its task children.
  • packages/zpm-switch/src/commands/switch/daemon_kill.rs: Daemon kill command. Calls daemons::kill_process, which sends SIGTERM only to the daemon process itself. Task subprocesses spawned by the daemon are not signaled and continue running as orphans, defeating the purpose of the kill command.
  • packages/zpm/src/commands/tasks/push.rs: Task push command reworked to use the daemon WebSocket client. Contains a confusing 0 << 8 expression at line 66 in the success exit-code path that should be simplified to 0.
  • taskfile: Root project taskfile. Contains several debug/test tasks (bar, bar2, x, producer, foo) that appear to be development scratch entries added while testing the new daemon functionality. They serve no project-level purpose and should be removed before merging, or moved to test fixtures if needed by acceptance tests.

Last reviewed commit: 39ef683

@@ -0,0 +1,344 @@
use std::collections::{HashMap, HashSet};
use std::io::Write;
use std::os::unix::fs::MetadataExt;

Unix-only import used unconditionally

std::os::unix::fs::MetadataExt is a Unix-only trait imported unconditionally. This will cause a compile error on Windows. The import and all code that calls .ino() (lines ~45–47) must be gated behind #[cfg(unix)], with the inode-watching block either disabled or replaced with a no-op on non-Unix platforms.
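The requested gating could be sketched roughly like this (the `file_identity` helper name is hypothetical; the actual coordinator wraps `.ino()` in its own watching logic):

```rust
use std::fs;
use std::path::Path;

// Unix-only: MetadataExt provides .ino(); gate both the import and the
// call site behind #[cfg(unix)] so non-Unix targets still compile.
#[cfg(unix)]
fn file_identity(path: &Path) -> Option<u64> {
    use std::os::unix::fs::MetadataExt;
    fs::metadata(path).ok().map(|m| m.ino())
}

// No-op fallback: without inodes there is nothing to compare, so the
// inode-watching block is effectively disabled on these platforms.
#[cfg(not(unix))]
fn file_identity(_path: &Path) -> Option<u64> {
    None
}

fn main() {
    let path = std::env::temp_dir().join("identity-demo.txt");
    fs::write(&path, "contents").unwrap();
    // Prints Some(<inode>) on Unix, None elsewhere.
    println!("{:?}", file_identity(&path));
    let _ = fs::remove_file(&path);
}
```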


Comment on lines +58 to +107
let output_buffer: OutputBuffer
= Arc::new(RwLock::new(HashMap::new()));

let subscription_registry
= Arc::new(SubscriptionRegistry::new());

let long_lived_registry
= Arc::new(LongLivedRegistry::new());

let scheduler_for_loop
= scheduler.clone();

let (loop_event_tx, mut loop_event_rx)
= mpsc::unbounded_channel::<ExecutorEvent>();

let subscription_registry_for_loop
= subscription_registry.clone();

let subscription_registry_for_events
= subscription_registry.clone();

let output_buffer_for_events
= output_buffer.clone();

let long_lived_registry_for_events
= long_lived_registry.clone();

let scheduler_for_events
= scheduler.clone();

tokio::spawn(async move {
while let Some(event) = loop_event_rx.recv().await {
if let ExecutorEvent::Output { task_id, line, stream } = &event {
if let Ok(mut buffer) = output_buffer_for_events.write() {
let lines: &mut Vec<BufferedOutputLine>
= buffer
.entry(task_id.to_string())
.or_insert_with(Vec::new);

lines.push(BufferedOutputLine {
line: line.to_string(),
stream: stream.as_str().to_string(),
});

if lines.len() > OUTPUT_BUFFER_MAX_LINES {
let excess
= lines.len() - OUTPUT_BUFFER_MAX_LINES;

lines.drain(0..excess);
}

Unbounded memory growth in output buffer

The output_buffer HashMap (created at line 58–59) accumulates an entry for every task ID that ever runs in this daemon session. While the per-task line count is capped at OUTPUT_BUFFER_MAX_LINES (1000 lines), the number of task entries in the HashMap is never pruned. For a long-running daemon that processes thousands of short-lived tasks, this will steadily grow the resident memory of the daemon process. Entries for completed tasks (particularly non-long-lived ones whose output has already been retrieved by the client) should be removed once they are no longer needed.
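A minimal sketch of the pruning strategy, using hypothetical names (`OutputBuffer`, `take_completed`) rather than the PR's actual types: cap each task's line list as the PR already does, and additionally remove a task's entry once its output has been delivered.

```rust
use std::collections::HashMap;

const OUTPUT_BUFFER_MAX_LINES: usize = 1000;

// Two bounds: each task's line list is capped, and a completed task's
// entry is dropped so the map does not grow for the daemon's lifetime.
struct OutputBuffer {
    lines: HashMap<String, Vec<String>>,
}

impl OutputBuffer {
    fn new() -> Self {
        Self { lines: HashMap::new() }
    }

    fn push(&mut self, task_id: &str, line: String) {
        let buf = self.lines.entry(task_id.to_string()).or_default();
        buf.push(line);
        if buf.len() > OUTPUT_BUFFER_MAX_LINES {
            let excess = buf.len() - OUTPUT_BUFFER_MAX_LINES;
            buf.drain(0..excess);
        }
    }

    // Called when a non-long-lived task completes and its output has
    // been retrieved by the client: the entry is removed entirely.
    fn take_completed(&mut self, task_id: &str) -> Vec<String> {
        self.lines.remove(task_id).unwrap_or_default()
    }
}

fn main() {
    let mut buf = OutputBuffer::new();
    buf.push("t1", "hello".to_string());
    let out = buf.take_completed("t1");
    println!("drained {} lines, {} entries left", out.len(), buf.lines.len());
}
```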


Comment on lines +41 to +49
pub fn kill(&self) {
#[cfg(unix)]
{
let _ = std::process::Command::new("kill")
.arg("-9")
.arg(format!("-{}", self.pid))
.status();
}
}

SIGKILL on entire process group prevents graceful cleanup

kill -9 -{pid} sends SIGKILL to every process in the daemon's process group. Because SIGKILL cannot be caught or ignored, neither the daemon nor any of its running task children will have a chance to flush buffers, clean up temporary files, or release resources. In addition, if any task subprocess moves itself to a different process group, it will survive this kill.

For the standalone case it may be acceptable to be forceful, but using SIGTERM first (with a timeout and SIGKILL as a fallback) would be safer and more consistent with the SIGTERM used elsewhere in the codebase.
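The TERM-then-KILL escalation could look roughly like this, shelling out to `kill` as the existing handle does (the `graceful_kill` name, grace period, and polling interval are illustrative, not from the PR):

```rust
use std::process::Command;
use std::time::{Duration, Instant};

// Hypothetical escalation helper: SIGTERM first, SIGKILL only if the
// process is still around once the grace period expires.
fn graceful_kill(pid: u32, grace: Duration) {
    let _ = Command::new("kill").args(["-TERM", &pid.to_string()]).status();
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        // `kill -0` probes for existence without delivering a signal.
        // Caveat: a zombie (exited but not yet reaped) still "exists".
        let alive = Command::new("kill")
            .args(["-0", &pid.to_string()])
            .status()
            .map(|s| s.success())
            .unwrap_or(false);
        if !alive {
            return; // exited after SIGTERM; no SIGKILL needed
        }
        std::thread::sleep(Duration::from_millis(50));
    }
    let _ = Command::new("kill").args(["-KILL", &pid.to_string()]).status();
}

fn main() {
    // Demo target: a process that would otherwise run for 30 seconds.
    let mut child = Command::new("sleep").arg("30").spawn().unwrap();
    graceful_kill(child.id(), Duration::from_millis(500));
    let status = child.wait().unwrap();
    println!("terminated, success = {}", status.success());
}
```

The same escalation could also target the process group (negative PID) once group-aware shutdown is in place.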


Comment on lines +100 to +153
#[cfg(windows)]
{
use std::ptr::null_mut;
unsafe {
let handle = winapi::um::processthreadsapi::OpenProcess(
winapi::um::winnt::PROCESS_QUERY_LIMITED_INFORMATION,
0,
pid,
);
if handle.is_null() {
false
} else {
winapi::um::handleapi::CloseHandle(handle);
true
}
}
}

#[cfg(not(any(unix, windows)))]
{
true
}
}

pub fn kill_process(pid: u32) -> bool {
#[cfg(unix)]
{
unsafe { libc::kill(pid as i32, libc::SIGTERM) == 0 }
}

#[cfg(windows)]
{
use std::ptr::null_mut;
unsafe {
let handle = winapi::um::processthreadsapi::OpenProcess(
winapi::um::winnt::PROCESS_TERMINATE,
0,
pid,
);
if handle.is_null() {
false
} else {
let result = winapi::um::processthreadsapi::TerminateProcess(handle, 1) != 0;
winapi::um::handleapi::CloseHandle(handle);
result
}
}
}

#[cfg(not(any(unix, windows)))]
{
false
}
}

winapi crate referenced but not declared as a dependency

Both is_process_alive and kill_process have #[cfg(windows)] branches that reference winapi::um::processthreadsapi, winapi::um::winnt, and winapi::um::handleapi. However, winapi does not appear in the Cargo.toml for zpm-switch (nor in the workspace Cargo.toml changes in this PR). This will produce a compile error on Windows. You need to add:

```toml
[target.'cfg(windows)'.dependencies]
winapi = { version = "0.3", features = ["processthreadsapi", "winnt", "handleapi"] }
```

client.push_tasks(task_subscriptions, parent_task_id, None, None).await?;

Ok(ExitStatus::from_raw(0))
Ok(ExitStatus::from_raw(0 << 8))

0 << 8 is always 0 — confusing no-op

ExitStatus::from_raw(0 << 8) evaluates identically to ExitStatus::from_raw(0). The << 8 shift pattern is used elsewhere to encode non-zero exit codes in the wait-status format, but shifting zero is a no-op and reads as if the author forgot to put a real value in.

Suggested change
Ok(ExitStatus::from_raw(0 << 8))
Ok(ExitStatus::from_raw(0))

Comment on lines +1 to +22
bar:
sleep 5

bar2:
sleep 10

x:
python3 -c "import time; print(f'ts:{int(time.time()*1000)}:line1')"
sleep 1
python3 -c "import time; print(f'ts:{int(time.time()*1000)}:line2')"
sleep 1
python3 -c "import time; print(f'ts:{int(time.time()*1000)}:line3')"

producer:
for x in {1..10}; do
echo "producer: $x"
sleep 1
done

foo: bar& bar2&
echo "foo"


Debug/test tasks left in the repository root taskfile

The tasks bar, bar2, x, producer, and foo appear to be development scratch entries added to test the new daemon functionality. They don't appear to serve any project-level purpose and should be removed before merging, or moved to a test fixture if they are needed for acceptance tests.


Comment on lines +21 to +54
pub async fn execute(&self) -> Result<(), Error> {
let project_cwd = get_final_cwd()?;

let find_result = find_closest_package_manager(&project_cwd)?;

let detected_root = find_result
.detected_root_path
.ok_or(Error::NoProjectFound)?;

let Some(daemon) = daemons::get_daemon(&detected_root)? else {
println!(
"{} No daemon registered for this project",
DataType::Info.colorize("ℹ")
);
return Ok(());
};

if !daemons::is_process_alive(daemon.pid) {
daemons::unregister_daemon(&detected_root)?;
println!(
"{} Daemon was not running (cleaned up stale entry)",
DataType::Info.colorize("ℹ")
);
return Ok(());
}

if daemons::kill_process(daemon.pid) {
daemons::unregister_daemon(&detected_root)?;
println!(
"{} Stopped daemon for {} (PID: {})",
DataType::Success.colorize("✓"),
detected_root.to_print_string(),
daemon.pid
);

Killing the daemon does not terminate its running task children

daemons::kill_process sends SIGTERM only to the daemon process itself (the yarn debug daemon binary). All task subprocesses that the daemon has spawned are in the same session but may be in their own process groups. When the daemon receives SIGTERM it will exit — but because nothing in the daemon's signal handling path terminates the child processes, those tasks continue running as orphans.

This means switch daemon --kill can leave long-running tasks (e.g. @long-lived dev servers) silently running in the background after the user believes they have been stopped. The daemon should either propagate the signal to its children on shutdown, or the kill command should enumerate and terminate task children before sending SIGTERM to the daemon.
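One way to sketch the group-signal approach (the `kill_daemon_group` helper is hypothetical; this assumes the daemon is started in its own process group so task children inherit it, which the PR does not currently do):

```rust
use std::os::unix::process::CommandExt; // for process_group (Rust 1.64+)
use std::process::Command;

// Signal the whole process group: a negative PID after `--` tells
// `kill` to deliver SIGTERM to every member of the group, so the
// daemon and its spawned task children are terminated together.
fn kill_daemon_group(pgid: u32) -> bool {
    Command::new("kill")
        .args(["-s", "TERM", "--", &format!("-{pgid}")])
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

fn main() {
    // Stand-in for the daemon: a shell that would run for 30 seconds.
    // process_group(0) puts it in a fresh group whose pgid == its pid,
    // and any tasks it spawns would inherit that group.
    let mut daemon = Command::new("sh")
        .args(["-c", "sleep 30"])
        .process_group(0)
        .spawn()
        .unwrap();

    // What `switch daemon --kill` could do instead of kill_process(pid):
    let signalled = kill_daemon_group(daemon.id());
    let status = daemon.wait().unwrap();
    println!("signalled group: {signalled}, daemon exited cleanly: {}", status.success());
}
```

The alternative mentioned above (enumerating children before killing the daemon) avoids changing the daemon's process group but is more involved and racy.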

