Fix/windows process tree and timeout fix #1635

Merged
yottahmd merged 5 commits into dagu-org:main from prods:fix/windows-process-tree-and-timeout-fix
Feb 6, 2026

prods (Contributor) commented Feb 6, 2026

Fix: Enforce timeoutSec on Windows and prevent process hangs

Problem Description

When timeoutSec is set on a DAG step, processes that exceed the timeout should be forcibly terminated. However, on Windows, some processes would continue running for hours even when timeoutSec was set (e.g., to 900 seconds).

Root Cause

The issue had two components:

  1. Blocking cmd.Wait(): The commandExecutor.Run() method blocked indefinitely on cmd.Wait() without listening to context cancellation. When the timeout expired, the process would not be killed because the code was stuck waiting for Wait() to return.

  2. Incomplete process termination on Windows: The Windows KillProcessGroup() function only killed the immediate process, leaving child processes (subprocess tree) orphaned and running.

Changes Made

1. internal/runtime/builtin/command/command.go

Modified the Run() method to:

  • Run cmd.Wait() in a goroutine with a result channel
  • Use select to watch for both context cancellation and command completion
  • Forcibly kill the process when context is cancelled (timeout)
  • Return exit code 124 on timeout (the code GNU timeout reports when a command times out)
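
The pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: waitOrKill, demoTimeout, and exitCodeTimeout are hypothetical names, the generic exit code 1 for non-timeout failures is a simplification, and the hung process is simulated with a channel so the sketch runs on any platform. With os/exec, wait would be cmd.Wait and kill would be a process-tree kill such as KillProcessGroup.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// exitCodeTimeout is the code GNU timeout reports when a command times out.
const exitCodeTimeout = 124

// waitOrKill (hypothetical helper) runs wait in a goroutine and returns as
// soon as either wait finishes or ctx is cancelled. On cancellation it calls
// kill and reports exit code 124.
func waitOrKill(ctx context.Context, wait func() error, kill func() error) (int, error) {
	done := make(chan error, 1) // buffered, so the goroutine never blocks on send
	go func() { done <- wait() }()

	select {
	case err := <-done:
		if err != nil {
			// Simplification: a real executor would read the code from *exec.ExitError.
			return 1, err
		}
		return 0, nil
	case <-ctx.Done():
		killErr := kill()
		// Give the killed process a bounded window to be reaped.
		select {
		case <-done:
		case <-time.After(2 * time.Second):
		}
		return exitCodeTimeout, errors.Join(fmt.Errorf("timed out: %w", ctx.Err()), killErr)
	}
}

// demoTimeout simulates a hung process: wait blocks until kill is called.
func demoTimeout() (int, error) {
	killed := make(chan struct{})
	wait := func() error { <-killed; return errors.New("signal: killed") }
	kill := func() error { close(killed); return nil }

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return waitOrKill(ctx, wait, kill)
}

func main() {
	code, err := demoTimeout()
	fmt.Println(code, err)
}
```

The result channel is buffered so that the waiting goroutine can always deliver its result and exit, even when the select has already taken the cancellation branch.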

2. internal/cmn/cmdutil/cmd_windows.go

Updated KillProcessGroup() to:

  • Use killProcessTree() instead of just cmd.Process.Kill()
  • Ensure child processes are also terminated on Windows
  • Match the behavior used in KillMultipleProcessGroups() and Unix implementation
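
This PR description does not show how killProcessTree() is implemented. One common way to terminate a whole process tree on Windows is the built-in taskkill command with /T (include child processes) and /F (force); the sketch below assumes that approach purely for illustration. taskkillArgs and killTree are hypothetical names, and killTree only works on a Windows host, so main prints the command line instead of running it.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// taskkillArgs (hypothetical helper) builds the arguments for a forced
// tree kill: /T includes child processes, /F forces termination.
func taskkillArgs(pid int) []string {
	return []string{"/T", "/F", "/PID", strconv.Itoa(pid)}
}

// killTree (hypothetical helper) runs taskkill for pid and includes the
// tool's output in the error. It is only meaningful on Windows.
func killTree(pid int) error {
	out, err := exec.Command("taskkill", taskkillArgs(pid)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("taskkill pid %d: %w: %s", pid, err, out)
	}
	return nil
}

func main() {
	// Show the command that would be executed, without killing anything.
	fmt.Println("taskkill", strings.Join(taskkillArgs(1234), " "))
}
```

Calling cmd.Process.Kill() alone terminates only the direct child, which is why descendants of that child could keep running after the step was reported as stopped.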

Impact

  • timeoutSec now works reliably on all platforms, especially Windows
  • Processes that hang due to I/O blocking or zombie states are properly terminated
  • Child processes on Windows are no longer orphaned when the parent is killed
  • Exit code 124 is returned for timeout scenarios (consistent with the GNU timeout command)

Files Changed

  1. internal/runtime/builtin/command/command.go - Core timeout enforcement fix
  2. internal/cmn/cmdutil/cmd_windows.go - Windows process tree termination fix

Summary by CodeRabbit

Bug Fixes

  • Improved Windows subprocess cleanup to properly terminate the complete process tree instead of only the immediate target process, ensuring full resource cleanup.
  • Enhanced command execution with context-aware timeout handling to prevent hung processes, explicit termination when context is cancelled, and improved error reporting that includes recent stderr information for better diagnostics.

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Two changes improve command execution and process termination. Windows process termination now kills entire subprocess trees instead of single processes. Command execution adds context-aware timeout handling with asynchronous waiting, error context inclusion, and automatic process termination on cancellation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Windows Process Termination (internal/cmn/cmdutil/cmd_windows.go) | Modified KillProcessGroup to terminate entire subprocess tree via killProcessTree() instead of only the target process. |
| Command Execution Timeout Handling (internal/runtime/builtin/command/command.go) | Refactored Run() method to use context-aware asynchronous waiting with goroutine and select statement. Now handles context cancellation with process termination (exit code 124), appends stderr context to errors, and supports timeout-driven termination. |
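
The summary mentions appending recent stderr output to errors but does not show how. A minimal sketch of one way to do this, assuming a bounded tail buffer, is below; tailBuffer, withStderr, and demoStderr are hypothetical names and not taken from the PR. In a real executor, cmd.Stderr would be pointed at the buffer (possibly through io.MultiWriter so that normal logging continues).

```go
package main

import (
	"errors"
	"fmt"
)

// tailBuffer (hypothetical type) is an io.Writer that keeps only the
// last max bytes written to it.
type tailBuffer struct {
	max int
	buf []byte
}

// Write appends p and trims the buffer to its most recent max bytes.
func (t *tailBuffer) Write(p []byte) (int, error) {
	t.buf = append(t.buf, p...)
	if len(t.buf) > t.max {
		t.buf = t.buf[len(t.buf)-t.max:]
	}
	return len(p), nil
}

// withStderr wraps err with the retained stderr tail, if there is any.
func withStderr(err error, tail *tailBuffer) error {
	if err == nil || len(tail.buf) == 0 {
		return err
	}
	return fmt.Errorf("%w\nrecent stderr: %s", err, tail.buf)
}

// demoStderr writes two chunks; only the last 9 bytes are retained.
func demoStderr() string {
	tail := &tailBuffer{max: 9}
	fmt.Fprint(tail, "warming up\n")
	fmt.Fprint(tail, "disk full")
	return withStderr(errors.New("exit status 1"), tail).Error()
}

func main() {
	fmt.Println(demoStderr())
}
```

Wrapping with %w keeps the original error available to errors.Is and errors.As while the message gains the diagnostic tail.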

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix/windows process tree and timeout fix' addresses both main changes: Windows process tree termination and timeout handling. It accurately reflects the core issues and solutions in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


yottahmd (Collaborator) left a comment

LGTM, thank you so much for the improvement!

yottahmd merged commit a83d1ea into dagu-org:main on Feb 6, 2026
5 checks passed

codecov bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.72%. Comparing base (6cf71a3) to head (dca10c1).
⚠️ Report is 4 commits behind head on main.

@@            Coverage Diff             @@
##             main    #1635      +/-   ##
==========================================
+ Coverage   69.67%   69.72%   +0.05%     
==========================================
  Files         327      327              
  Lines       37071    37082      +11     
==========================================
+ Hits        25829    25857      +28     
+ Misses       9181     9171      -10     
+ Partials     2061     2054       -7     
| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/runtime/builtin/command/command.go | 92.89% <100.00%> (+0.42%) ⬆️ |

... and 9 files with indirect coverage changes

