Skip to content

Conversation

@DarkSharpness
Copy link
Collaborator

Previously, each function call of MultiThreadCompileGrammar created its own thread pool. This could be misleading, as the total number of active worker threads might significantly exceed the configured max_threads — potentially reaching up to $n \times \text{max-threads}$ for $n$ concurrent compilation tasks.

This PR changes the implementation to use a shared global thread pool across all compilation tasks in one compiler, ensuring that the number of worker threads stays within the specified limit. For different grammar compilers, they still have their own thread pool.

Note: This change may introduce performance regressions in scenarios where the old behavior implicitly allowed over-subscription of threads, as thread usage is now strictly bounded.

Copilot AI review requested due to automatic review settings November 7, 2025 08:04
@DarkSharpness
Copy link
Collaborator Author

Updated cc @Ubospica @Seven-Streams . The rate limit policy should be refined later to achieve a balance between fairness(FIFO) and shortest-first (greedy-execution). FIFO may cause head-of-line blocking, while shortest-first may lead to starvation (worse average latency), deteriorating grammars that needs longer compilation (worse tail latency).

The old implementation creates a new thread pool per-compilation, which is closer to the latter I guess.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant