Conversation

@MasterPtato
Contributor

No description provided.

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

MasterPtato force-pushed the 01-07-fix_add_tokio_worker_threads_subtract_break_earlier_when_allocating_pending_actors branch from dce9507 to d1b9f5c on January 7, 2026 at 20:29
MasterPtato force-pushed the 01-06-chore_remove_method_from_key_for_consistently branch from 39a5303 to 33639ab on January 7, 2026 at 20:29
@claude

claude bot commented Jan 7, 2026

PR Review: Add TOKIO_WORKER_THREADS_SUBTRACT and Fix Loop Breaking

Summary

This PR makes two improvements:

  1. Adds TOKIO_WORKER_THREADS_SUBTRACT env var to dynamically set Tokio worker threads based on available parallelism
  2. Fixes loop breaking logic in allocate_pending_actors to properly exit the outer loop

Code Quality ✅

engine/packages/engine/src/main.rs

  • Good adherence to CLAUDE.md guidelines by avoiding glob imports from anyhow
  • Changed from use anyhow::* to use anyhow::Result

engine/packages/pegboard/src/workflows/runner2.rs

  • Correctly uses labeled break statements (break 'queue_loop) to exit the outer loop
  • Previously, break statements on lines 715, 725, and 733 would only break the inner loop, not the outer 'queue_loop, which was likely a bug
  • This fix ensures that when runners run out or version mismatches occur, the entire queue processing stops as intended ✓
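For readers less familiar with Rust loop labels, a minimal generic sketch (not the workflow's actual code) shows why the label matters:

```rust
// Generic sketch of labeled vs. plain break; names are illustrative only.
fn visit_with_labeled_break() -> Vec<(i32, i32)> {
    let mut visited = Vec::new();
    'outer: for q in 0..3 {
        for r in 0..3 {
            visited.push((q, r));
            if r == 1 {
                // A plain `break` here would only exit the inner loop and the
                // outer loop would continue with the next `q`.
                break 'outer; // exits both loops at once
            }
        }
    }
    visited
}

fn main() {
    // With the labeled break, iteration stops right after (0, 1).
    assert_eq!(visit_with_labeled_break(), vec![(0, 0), (0, 1)]);
}
```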

engine/packages/runtime/src/lib.rs

  • New TOKIO_WORKER_THREADS_SUBTRACT feature is well-implemented with proper safeguards
  • Uses saturating_sub() to prevent underflow
  • Ensures minimum of 1 thread with .max(1)
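The described clamping can be sketched as a standalone expression (a hypothetical helper; the actual code inlines this when building the runtime):

```rust
// Hypothetical helper mirroring the described clamp:
// available parallelism minus the configured subtract, floored at 1.
fn compute_worker_threads(available: usize, subtract: usize) -> usize {
    available.saturating_sub(subtract).max(1)
}

fn main() {
    assert_eq!(compute_worker_threads(8, 2), 6);
    // saturating_sub floors at 0 instead of underflowing; max(1) keeps the
    // result valid for Tokio, which rejects 0 worker threads.
    assert_eq!(compute_worker_threads(2, 10), 1);
}
```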

Potential Issues & Suggestions

1. Error Handling - Missing Parse Error Handling ⚠️

engine/packages/runtime/src/lib.rs:132

The code uses .parse().unwrap() which will panic if the env var contains invalid input:

.saturating_sub(subtract_worker_threads.parse().unwrap())

Recommendation: Add error handling consistent with other env var parsing in the file, or at minimum add a descriptive expect message:

.saturating_sub(
    subtract_worker_threads
        .parse()
        .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid integer")
)

Better yet, handle the error gracefully with a warning and fallback to default behavior.
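A sketch of that graceful fallback (illustrative names; `eprintln!` stands in for the crate's tracing setup):

```rust
// Sketch: fall back to 0 (no subtraction) on bad input instead of panicking.
fn parse_subtract(raw: &str) -> usize {
    raw.parse().unwrap_or_else(|_| {
        eprintln!(
            "warning: TOKIO_WORKER_THREADS_SUBTRACT={:?} is not a valid integer; ignoring",
            raw
        );
        0
    })
}

fn main() {
    assert_eq!(parse_subtract("3"), 3);
    // Invalid input produces a warning and the no-op default.
    assert_eq!(parse_subtract("not-a-number"), 0);
}
```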

2. Thread Count Edge Case 🤔

engine/packages/runtime/src/lib.rs:128-135

When TOKIO_WORKER_THREADS_SUBTRACT is set to a value >= available parallelism, the result will be 1 thread. This might be intentional but could cause performance issues if misconfigured.

Recommendation: Consider adding a warning log if the resulting thread count is significantly reduced:

let available = std::thread::available_parallelism()
    .map_or(1, std::num::NonZeroUsize::get);
let subtract = subtract_worker_threads.parse().unwrap();
let threads = available.saturating_sub(subtract).max(1);

if threads == 1 && available > 2 {
    tracing::warn!(
        available,
        subtract,
        "TOKIO_WORKER_THREADS_SUBTRACT resulted in only 1 worker thread"
    );
}

rt_builder.worker_threads(threads);

3. Missing Documentation 📝

The new TOKIO_WORKER_THREADS_SUBTRACT env var isn't documented. Consider:

  • Adding a comment explaining the use case (e.g., reserving CPU cores for non-Tokio workloads)
  • Documenting valid values and behavior

Performance Considerations ✅

The loop fix in allocate_pending_actors should improve performance by:

  • Avoiding unnecessary iteration when runners are exhausted
  • Properly respecting the early timeout check by breaking completely out of nested loops

The thread count calculation is efficient and should have negligible overhead.

Security Concerns ✅

No security issues identified. The changes are internal configuration and control flow improvements.

Test Coverage 🧪

Missing: No tests for the new TOKIO_WORKER_THREADS_SUBTRACT functionality.

Recommendation: Consider adding unit tests for:

  • Valid subtract values
  • Edge cases (subtract >= available parallelism)
  • Invalid input handling (once error handling is added)
  • Zero/negative values

For the loop fix, integration tests covering the allocate_pending_actors workflow would be valuable if they don't already exist.

Additional Notes

The labeled break fix suggests this was a subtle bug where the inner loop would break but processing would continue with the next queue entry. This could have caused:

  • Wasted work attempting allocations when no runners are available
  • Incorrect handling of version mismatches
  • Potential for exceeding the early timeout

Good catch on this issue! 👍


Overall Assessment: The changes are sound and fix real issues. Main concern is the missing error handling for the env var parsing. With that addressed, this PR is good to merge.

@claude

claude bot commented Jan 7, 2026

PR Review

This PR introduces two improvements: a new environment variable for controlling Tokio worker thread counts and a fix for loop control flow in the actor allocation logic.

✅ Positive Points

  1. Good import hygiene (engine/packages/engine/src/main.rs:1): The change from glob import to specific import follows the CLAUDE.md guideline: "Do not glob import (::*) from anyhow. Instead, import individual types and traits"

  2. Clear intent (engine/packages/runtime/src/lib.rs:127-135): The new TOKIO_WORKER_THREADS_SUBTRACT feature provides a useful way to reserve CPU cores for non-Tokio work, which is helpful in multi-runtime environments

  3. Correct loop control (engine/packages/pegboard/src/workflows/runner2.rs:715,725,733): The fix from break to break 'queue_loop ensures the outer loop terminates correctly when conditions are met

⚠️ Issues to Address

1. Missing error handling (engine/packages/runtime/src/lib.rs:132)

Severity: High

The .unwrap() call on the parsed environment variable will panic if the value is not a valid integer:

.saturating_sub(subtract_worker_threads.parse().unwrap())

Recommendation: Use proper error handling with either expect() with a clear message, or unwrap_or_else() with a warning and default value.

2. Potential logical issue (engine/packages/runtime/src/lib.rs:133)

Severity: Medium

The .max(1) ensures we never set worker threads to 0 (which would be invalid for Tokio), but the logic could be clearer with intermediate variables and debug logging.
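One possible shape for that (a sketch with hypothetical names; `eprintln!` stands in for the crate's debug logging):

```rust
// Sketch: intermediate variables make each step of the calculation visible.
fn worker_threads_with_logging(available: usize, subtract: usize) -> usize {
    let after_subtract = available.saturating_sub(subtract);
    let threads = after_subtract.max(1); // Tokio rejects 0 worker threads
    eprintln!("worker threads: available={available} subtract={subtract} threads={threads}");
    threads
}

fn main() {
    assert_eq!(worker_threads_with_logging(8, 3), 5);
    assert_eq!(worker_threads_with_logging(2, 9), 1);
}
```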

3. Missing documentation

Severity: Low

The new environment variable should be documented somewhere to help users understand what TOKIO_WORKER_THREADS_SUBTRACT does and when to use it.

📋 Additional Observations

  1. Loop labels are correct: The use of labeled breaks in allocate_pending_actors properly exits the outer loop
  2. No test coverage visible: Consider adding tests for edge cases with the new env var

📊 Summary

Code Quality: Good overall structure with one critical issue
Bugs: Error handling missing for env var parsing
Performance: No concerns
Security: No concerns
Test Coverage: Not visible in this diff

🎯 Recommendation

Request changes - Please address the error handling issue before merging. The other suggestions are optional improvements.

@claude

claude bot commented Jan 7, 2026

PR Review: Fix add TOKIO_WORKER_THREADS_SUBTRACT, break earlier when allocating pending actors

Summary

This PR makes two distinct improvements:

  1. Adds a new TOKIO_WORKER_THREADS_SUBTRACT environment variable to configure Tokio worker threads relative to available CPU cores
  2. Fixes loop breaking behavior in allocate_pending_actors to properly exit the queue loop

Code Quality & Best Practices

✅ Positive Observations

  1. Proper anyhow import fix (engine/packages/engine/src/main.rs:1):

    • Correctly follows CLAUDE.md guidance: "Do not glob import (::*) from anyhow. Instead, import individual types and traits"
    • Changes use anyhow::*; to use anyhow::Result;
  2. Loop label fix (engine/packages/pegboard/src/workflows/runner2.rs:714-733):

    • The changes from break; to break 'queue_loop; are correct and important
    • The inner loop previously broke only itself, not the outer 'queue_loop, which could cause incorrect behavior
    • Now properly exits both loops when:
      • Stream is exhausted
      • Version is lower than highest version
      • Runner has no remaining millislots

⚠️ Issues & Concerns

1. Error Handling - Critical Issue (engine/packages/runtime/src/lib.rs:128-135)

Problem: The .parse().unwrap() call on line 132 will panic if TOKIO_WORKER_THREADS_SUBTRACT contains an invalid value.

.saturating_sub(subtract_worker_threads.parse().unwrap())

Impact: Production crashes if the environment variable is misconfigured.

Recommendation:

if let Ok(subtract_worker_threads) = env::var("TOKIO_WORKER_THREADS_SUBTRACT") {
    let subtract: usize = subtract_worker_threads
        .parse()
        .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid number");
    rt_builder.worker_threads(
        std::thread::available_parallelism()
            .map_or(1, std::num::NonZeroUsize::get)
            .saturating_sub(subtract)
            .max(1),
    );
}

Using .expect() with a descriptive message is better than .unwrap() as it provides context when the panic occurs.

2. Potential Edge Case - Subtract Value Too Large

Scenario: If TOKIO_WORKER_THREADS_SUBTRACT is set to a value >= available cores, the calculation becomes:

available_parallelism().saturating_sub(subtract_value) = 0
0.max(1) = 1

Question: Is 1 thread the desired fallback behavior, or should there be validation/logging when the subtract value is too large?

Recommendation: Consider adding a warning:

let available = std::thread::available_parallelism()
    .map_or(1, std::num::NonZeroUsize::get);
let subtract: usize = subtract_worker_threads
    .parse()
    .expect("TOKIO_WORKER_THREADS_SUBTRACT must be a valid number");

if subtract >= available {
    tracing::warn!(
        available,
        subtract,
        "TOKIO_WORKER_THREADS_SUBTRACT is >= available cores, using 1 thread"
    );
}

rt_builder.worker_threads(available.saturating_sub(subtract).max(1));

3. Missing TOKIO_WORKER_THREADS Handling

Observation: The diff shows TOKIO_WORKER_THREADS_SUBTRACT being added but the original TOKIO_WORKER_THREADS logic appears to be removed (based on the diff context).

Question: Was this intentional? Should both be supported, with TOKIO_WORKER_THREADS_SUBTRACT taking precedence if both are set?

Current behavior: If both env vars are set, only TOKIO_WORKER_THREADS_SUBTRACT will be used.

Recommendation: Either:

  • Document that TOKIO_WORKER_THREADS_SUBTRACT replaces TOKIO_WORKER_THREADS, OR
  • Support both with a precedence order and mutual exclusivity check
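A sketch of one possible precedence scheme (hypothetical; the PR itself only reads the new variable):

```rust
use std::env;

// Hypothetical: prefer an explicit TOKIO_WORKER_THREADS if set, otherwise
// derive the count from available parallelism minus the subtract value.
fn resolve_worker_threads(available: usize) -> Option<usize> {
    if let Ok(explicit) = env::var("TOKIO_WORKER_THREADS") {
        return explicit.parse().ok();
    }
    if let Ok(subtract) = env::var("TOKIO_WORKER_THREADS_SUBTRACT") {
        let subtract: usize = subtract.parse().ok()?;
        return Some(available.saturating_sub(subtract).max(1));
    }
    None // neither set: let Tokio pick its default
}

fn main() {
    // Result depends on the process environment, so just exercise the call.
    let _ = resolve_worker_threads(8);
}
```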

Performance Considerations

✅ Good Performance Decisions

  1. Configurable Thread Count: The new env var allows tuning worker threads based on workload, which is useful for:

    • Reserving cores for other processes
    • Testing with reduced parallelism
    • Working around Tokio scheduling issues
  2. Correct Loop Breaking: The break 'queue_loop fixes prevent unnecessary iterations through the outer queue loop when the inner runner allocation loop determines no more work can be done.

Security Concerns

⚠️ Low Priority

  1. DoS via Environment Variables: Setting TOKIO_WORKER_THREADS_SUBTRACT to a very large value could throttle the runtime down to a single worker thread, but this risk is comparable to the existing TOKIO_WORKER_THREADS and requires environment access.

Test Coverage

❌ Missing Test Coverage

No tests were added for:

  1. TOKIO_WORKER_THREADS_SUBTRACT parsing and calculation logic
  2. Edge cases (subtract value too large, invalid values, etc.)
  3. The loop breaking behavior fix in allocate_pending_actors

Recommendations:

  1. Add unit tests for the worker thread calculation with various inputs:

    • Normal case: subtract < available_parallelism
    • Edge case: subtract >= available_parallelism
    • Invalid: non-numeric strings (should panic with clear message)
  2. Consider integration tests for allocate_pending_actors to verify:

    • Proper early exit when version is too low
    • Proper exit when runners have no millislots
    • Timeout behavior
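A minimal unit-test sketch for the thread-count calculation (assumes the logic is factored into a testable helper, which the current code does not do):

```rust
// Hypothetical helper; the PR currently inlines this expression, so factoring
// it out like this would be a prerequisite for unit testing it.
pub fn worker_threads(available: usize, subtract: usize) -> usize {
    available.saturating_sub(subtract).max(1)
}

#[cfg(test)]
mod tests {
    use super::worker_threads;

    #[test]
    fn normal_case() {
        assert_eq!(worker_threads(8, 2), 6);
    }

    #[test]
    fn subtract_exceeds_available() {
        // saturating_sub floors at 0; max(1) lifts the result back to 1.
        assert_eq!(worker_threads(4, 16), 1);
    }

    #[test]
    fn zero_subtract_is_a_no_op() {
        assert_eq!(worker_threads(8, 0), 8);
    }
}
```

Run with `cargo test`; invalid-input cases would belong in whichever parsing layer replaces the current `.unwrap()`.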

Additional Observations

Documentation

Missing: No comments or documentation for the new TOKIO_WORKER_THREADS_SUBTRACT functionality. Consider adding:

  • A comment explaining the calculation: // Sets worker threads to (available_cores - N), with minimum of 1
  • Environment variable documentation in a README or config guide

Code Style

✅ Follows repository conventions:

  • Hard tabs (as per rustfmt.toml)
  • Structured logging would apply if there were log statements (there aren't any added)
  • Imports follow the pattern of not glob-importing

Summary of Recommendations

Must Fix (Blocking):

  1. Replace .unwrap() with .expect("descriptive message") on line 132

Should Consider:

  1. Add warning when subtract value >= available cores
  2. Clarify relationship with removed TOKIO_WORKER_THREADS env var
  3. Add unit tests for the new worker thread calculation
  4. Add documentation/comments for the new env var

Nice to Have:

  1. Integration tests for the loop breaking behavior
  2. Validation that prevents setting both TOKIO_WORKER_THREADS and TOKIO_WORKER_THREADS_SUBTRACT

Overall Assessment

The PR addresses two legitimate issues:

  • Proper anyhow import hygiene (follows CLAUDE.md)
  • Critical bug fix in loop breaking logic

However, the new TOKIO_WORKER_THREADS_SUBTRACT feature needs better error handling before merging to production. The .unwrap() is a potential crash point that should be addressed.

Rating: ⚠️ Needs minor fixes before merge (primarily the error handling)

@graphite-app
Contributor

graphite-app bot commented Jan 7, 2026

Merge activity

  • Jan 7, 8:57 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Jan 7, 8:57 PM UTC: CI is running for this pull request on a draft pull request (#3779) due to your merge queue CI optimization settings.
  • Jan 7, 8:58 PM UTC: Merged by the Graphite merge queue via draft PR: #3779.

graphite-app bot pushed a commit that referenced this pull request Jan 7, 2026
@graphite-app graphite-app bot closed this Jan 7, 2026
@graphite-app graphite-app bot deleted the 01-07-fix_add_tokio_worker_threads_subtract_break_earlier_when_allocating_pending_actors branch January 7, 2026 20:58