feat: enable async token counting with piscina workers #465
Port the tokenizer logic into a Piscina worker pool so token counting no longer blocks the main thread, and expose async token-counting APIs across IPC. Render live token estimates in the chat input behind a Suspense fallback, and tighten defensive assertions while adapting tests to the new async flow.
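A minimal sketch of what the pooled, async API could look like, assuming a hypothetical `tokenizer.worker.ts` whose default export does the actual encoding (file names, option values, and the `countTokens` signature are illustrative, not the PR's exact code):

```ts
// tokenizer.ts (main-thread side) — illustrative sketch only
import Piscina from "piscina";
import { resolve } from "node:path";

// One worker is enough for bursty token counting and keeps memory bounded.
const pool = new Piscina({
  filename: resolve(__dirname, "tokenizer.worker.js"),
  maxThreads: 1,
});

// Callers await the estimate instead of blocking the main thread on encoding.
export async function countTokens(model: string, text: string): Promise<number> {
  return pool.run({ model, text });
}
```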
Detect the Bun runtime and bypass Piscina so token counting works in tests. Assert encoding responses and reuse the worker logic when pooling is off. Disable Git signing in tests and align fixtures with the current adapters.

Change-Id: I12fc5053de8f2fb906a99e14b97689a6f74d0d7f
Signed-off-by: Thomas Kosiewski <[email protected]>
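The Bun bypass described above might look roughly like this; the detection check and the in-process fallback path are assumptions:

```ts
// tokenizer.ts — illustrative Bun detection and pool bypass
import Piscina from "piscina";
import { resolve } from "node:path";

const isBun = typeof process.versions.bun === "string";

// Only build the Piscina pool when worker pooling is actually usable.
const pool = isBun
  ? null
  : new Piscina({ filename: resolve(__dirname, "tokenizer.worker.js") });

export async function countTokens(model: string, text: string): Promise<number> {
  if (pool === null) {
    // Under Bun (tests), reuse the worker's encoding logic in-process.
    const { default: encode } = await import("./tokenizer.worker.js");
    return encode({ model, text });
  }
  return pool.run({ model, text });
}
```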
Introduce a bounded tokenizer fallback for mock streams to avoid hangs. Add assertions and debug logging to surface invalid tokenizer results. Switch mock scenarios to openai:gpt-5 to align with the tokenizer choice. Expose PLAYWRIGHT_ARGS in the test-e2e make target to ease overrides.
Standardize test runner to jest across CI and local development:
- Update CI workflow to use jest for coverage reporting
- Update Makefile test targets to use jest
- Update tokenizer worker to use node:assert (jest-compatible)
- Remove async/await from tokenizer.loadTokenizerModules (Promise.allSettled already returns Promise)
- Change test model to gpt-5 for future-proofing

This ensures consistent test execution between local and CI environments and prepares for better coverage reporting.
Removes the piscina dependency in favor of a custom worker pool implementation using Node.js built-in worker_threads. This reduces external dependencies while maintaining the same async tokenization functionality.

Changes:
- Added workerPool.ts to manage worker thread lifecycle and message passing
- Updated tokenizer.worker.ts to handle parentPort messages directly
- Modified tokenizer.ts to use the new run() function instead of Piscina
- Migrated unit tests from Jest to bun test for consistency
- Updated CI and the Makefile to use bun test for coverage

The custom pool creates a single persistent worker at module load time and handles request/response matching via message IDs.
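A sketch of that message-ID scheme, using the file names from the bullets above but otherwise illustrative code (the PR's actual error handling and types will differ):

```ts
// tokenizer.worker.ts — worker side, handling parentPort messages directly
import { parentPort } from "node:worker_threads";

// Placeholder for the real encoding logic (e.g. an ai-tokenizer call).
async function encode(task: { model: string; text: string }): Promise<number> {
  return Math.ceil(task.text.length / 4);
}

parentPort?.on("message", async ({ id, task }) => {
  try {
    parentPort?.postMessage({ id, result: await encode(task) });
  } catch (err) {
    parentPort?.postMessage({ id, error: (err as Error).message });
  }
});
```

```ts
// workerPool.ts — main-thread side, matching responses to requests by id
import { Worker } from "node:worker_threads";
import { resolve } from "node:path";

type Pending = { resolve: (v: unknown) => void; reject: (e: Error) => void };

let nextId = 0;
const pending = new Map<number, Pending>();

function createWorker(): Worker {
  const w = new Worker(resolve(__dirname, "tokenizer.worker.js"));
  w.on("message", (msg: { id: number; result?: unknown; error?: string }) => {
    const entry = pending.get(msg.id);
    if (!entry) return;
    pending.delete(msg.id);
    if (msg.error !== undefined) entry.reject(new Error(msg.error));
    else entry.resolve(msg.result);
  });
  w.on("error", (err) => {
    // Fail all in-flight requests if the worker itself crashes.
    for (const entry of pending.values()) entry.reject(err);
    pending.clear();
  });
  w.unref(); // don't keep the process alive just for the pool
  return w;
}

// Single persistent worker, created at module load (made lazy in a later commit).
const worker = createWorker();

export function run(task: unknown): Promise<unknown> {
  const id = nextId++;
  return new Promise((resolve, reject) => {
    pending.set(id, { resolve, reject });
    worker.postMessage({ id, task });
  });
}
```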
Increased expect timeout from 5s to 15s to accommodate worker thread encoding imports (~10s). This prevents test timeouts during startup.
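If the timeout in question is Playwright's `expect` timeout (the e2e suite appears to use Playwright per the PLAYWRIGHT_ARGS note above, but this is an assumption), the setting would live in the Playwright config along these lines:

```ts
// playwright.config.ts — illustrative; assumes the "expect timeout" is Playwright's
import { defineConfig } from "@playwright/test";

export default defineConfig({
  expect: {
    // Worker-thread encoding imports can take ~10s, so give assertions headroom.
    timeout: 15_000,
  },
});
```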
Increase delays between stream deltas to better simulate realistic streaming behavior. Spacing changes from 2x+300ms, 2x+400ms to 3x+50ms, 3x+150ms, and the final delay extends to 3x+500ms for more natural pacing in mock scenarios.
Map claude-haiku-4-5 to claude-3.5-haiku tokenizer as temporary workaround until ai-tokenizer adds native support for the newer model.
Changes the tokenizer to gracefully handle unknown model names by falling back to a similar model's tokenizer with a warning, instead of throwing an error. This prevents crashes when new models are used before tokenizer support is added. Also fixes runtime tests on macOS by resolving symlinks in temp paths (/tmp -> /private/tmp) so they match git worktree paths.
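The alias and fallback behaviour from the last two commits could be sketched like this; the alias table contents and the fallback choice are assumptions, and only the claude-haiku-4-5 → claude-3.5-haiku mapping is stated above:

```ts
// Illustrative model-to-tokenizer resolution with a warning-based fallback
const TOKENIZER_ALIASES: Record<string, string> = {
  // ai-tokenizer has no native entry for claude-haiku-4-5 yet
  "claude-haiku-4-5": "claude-3.5-haiku",
};

function resolveTokenizerModel(model: string, supported: ReadonlySet<string>): string {
  const candidate = TOKENIZER_ALIASES[model] ?? model;
  if (supported.has(candidate)) return candidate;
  // Unknown model: warn and fall back to a similar tokenizer instead of throwing,
  // so new models don't crash token counting before support lands.
  const fallback = "claude-3.5-haiku"; // illustrative default
  console.warn(`No tokenizer for ${model}; falling back to ${fallback}`);
  return fallback;
}
```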
Defer worker thread creation until first use. Previously the worker was initialized at module load time, causing unnecessary overhead. Now createWorker() is called lazily from run(), reducing startup cost.
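Relative to the workerPool.ts sketch above, the lazy-initialization change amounts to something like this (again illustrative):

```ts
// workerPool.ts — worker is now created on first run() instead of at module load
let worker: Worker | null = null;

export function run(task: unknown): Promise<unknown> {
  // Spawning the worker (and its slow encoding imports) is deferred until
  // a token count is actually requested.
  const w = (worker ??= createWorker());
  const id = nextId++;
  return new Promise((resolve, reject) => {
    pending.set(id, { resolve, reject });
    w.postMessage({ id, task });
  });
}
```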
Reduces Jest maxWorkers from 100% to 50% in CI integration tests to prevent overwhelming the CI environment with excessive parallelism. Updates comment to reflect 16 workers instead of 32 on typical runners.
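The setting itself is a standard Jest option; where it lives in this repo is an assumption, but a config-level version would look like:

```ts
// jest.config.ts — illustrative placement of the CI worker cap
export default {
  // 50% of available cores, roughly 16 workers on typical CI runners,
  // instead of 100% (32), which overwhelmed the CI environment.
  maxWorkers: process.env.CI ? "50%" : "100%",
};
```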