fix: prevent infinite recursion in embedSingle() for CJK text (replaces PR #215) #238
Conversation
When a large CJK text (a 14KB+ Chinese .md file) is processed by auto-recall, embedSingle() enters an infinite recursion loop because:

1. smartChunk() treats token limits as character limits, but CJK characters use 2-3x more tokens than ASCII characters
2. Chunks of 5740 chars (70% of the 8192-token limit) still exceed the model's token context for CJK text
3. smartChunk() returns 1 chunk identical to its input → embedSingle() recurses with the same text → infinite loop

This produced ~50,000 embedding errors in 12 minutes, blocking the entire Node.js event loop and making all agents unresponsive.

Fixes:
- Add a recursion depth limit (max 3) to embedSingle(), with forced truncation as a fallback
- Detect single-chunk output (same size as input) and truncate instead of recursing
- Add CJK-aware chunk sizing in smartChunk() (divide the char limit by 2.5 when the CJK ratio > 30%)
- Truncate the auto-recall query to 1000 chars before embedding
- Add a 10s global timeout on embedPassage()/embedQuery()

Closes CortexReach#214

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
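The CJK-aware sizing in the fix list above can be sketched roughly as follows. The constant names, the helper functions, and the Unicode ranges are illustrative assumptions, not the PR's actual identifiers:

```typescript
// Illustrative sketch of CJK-aware chunk sizing (names are assumptions).
const CJK_RATIO_THRESHOLD = 0.3; // ">30% CJK" from the commit message
const CJK_CHAR_FACTOR = 2.5;     // "divide the char limit by 2.5"

// Rough CJK detection via common Unicode ranges (Hiragana/Katakana,
// CJK Extension A, CJK Unified Ideographs, Hangul syllables).
function cjkRatio(text: string): number {
  if (text.length === 0) return 0;
  const cjk = text.match(/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g);
  return (cjk?.length ?? 0) / text.length;
}

// Shrink the character budget when the text is token-dense CJK,
// since CJK characters consume 2-3x more tokens than ASCII.
function effectiveCharLimit(text: string, baseLimit: number): number {
  return cjkRatio(text) > CJK_RATIO_THRESHOLD
    ? Math.floor(baseLimit / CJK_CHAR_FACTOR)
    : baseLimit;
}
```

With the 5740-char budget from the incident description, a mostly-Chinese input would get roughly a 2296-char budget under this sketch, which is much closer to what an 8192-token context can actually hold.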
…e (PR CortexReach#215 follow-up)

This commit addresses the two blocking issues raised in PR CortexReach#215:

1. Timeout now uses AbortController for TRUE request cancellation
   - Timer is properly cleaned up in .finally()
   - AbortSignal is passed through to embedWithRetry
2. Recursion now guarantees monotonic convergence
   - Introduced STRICT_REDUCTION_FACTOR = 0.5
   - Each recursion level must reduce the input by 50%
   - Works regardless of model context size

Modified by AI assistant (not human code) based on PR CortexReach#215. Thanks to the original author and maintainers.

Co-authored-by: Hi-Jiajun <Hi-Jiajun@users.noreply.github.com>
AliceLJY
left a comment
There was a problem hiding this comment.
Core logic is sound — the convergence math is correct (halving per recursion + depth cap = guaranteed termination), and the AbortController timeout is a good addition. A few things to address before merge:
Must fix
1. Timer leak in withTimeout()
The setTimeout is never cleared when the embedding promise resolves successfully. Under normal load every successful call leaves a dangling timer. Fix with .finally(() => clearTimeout(timeoutId)).
Simpler alternative — drop the separate timeoutPromise + abort event listener entirely:
```typescript
private withTimeout<T>(promiseFactory: (signal: AbortSignal) => Promise<T>, label: string): Promise<T> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), EMBED_TIMEOUT_MS);
  return promiseFactory(controller.signal).finally(() => clearTimeout(timeoutId));
}
```

Let the AbortError propagate naturally from the SDK call (it is already caught by embedWithRetry). The current dual rejection path (abort event listener + SDK AbortError) can produce different error messages for the same timeout.
2. Dead code — remove unused definitions
SAFE_CHAR_LIMITS, getSafeCharLimit(), and DEFAULT_SAFE_CHAR_LIMIT are defined but never referenced anywhere. The forced truncation uses text.length * STRICT_REDUCTION_FACTOR instead. Please remove them to avoid confusion.
3. Clarify relationship with PR #215
This PR's first commit is a full copy of #215 — structurally this is a replacement, not a follow-up. Please state explicitly in the PR description that merging #238 should close #215 without merging it.
Should fix
4. Add regression tests
This fixes a production incident. At minimum, add tests for:
- Single-chunk detection path (chunking returns 1 chunk ≥ 90% of original → force reduce)
- Depth limit termination (depth 3 → throw instead of recurse)
- CJK-aware chunk sizing (>30% CJK text → smaller chunks)
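The depth-limit case could take roughly this shape; every name here (embedSingleDepthGuard, MAX_EMBED_DEPTH) is a hypothetical stand-in for the PR's identifiers, and the body only simulates the pathological "chunking fails to shrink the input" path:

```typescript
// Hypothetical stand-in for the PR's depth-limited recursion, used to
// illustrate the termination test requested above.
const MAX_EMBED_DEPTH = 3;

function embedSingleDepthGuard(text: string, depth = 0): string {
  if (depth >= MAX_EMBED_DEPTH) {
    throw new Error(`embed recursion exceeded depth ${MAX_EMBED_DEPTH}`);
  }
  // Simulate the worst case: chunking made no progress, so recurse
  // with a strictly halved input.
  return embedSingleDepthGuard(text.slice(0, Math.floor(text.length / 2)), depth + 1);
}

// The regression test should assert that pathological input terminates
// with an error instead of hanging the event loop.
function depthLimitTerminates(): boolean {
  try {
    embedSingleDepthGuard("口".repeat(14000));
    return false; // unreachable in this sketch: the guard always fires
  } catch {
    return true;
  }
}
```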
5. Document batch timeout asymmetry
embedQuery / embedPassage are wrapped with withTimeout, but embedBatchQuery / embedBatchPassage are not. Add a comment explaining why, or wrap them too.
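If wrapping is preferred over a comment, the batch entry points could reuse the same helper. A self-contained sketch follows; EMBED_TIMEOUT_MS and withTimeout restate point 1, while embedBatchQuery and the fake client call are placeholders for the real SDK path:

```typescript
// Sketch: wrap the batch path with the same AbortController-based helper.
// All names here are assumptions based on this review, not the actual code.
const EMBED_TIMEOUT_MS = 10_000;

function withTimeout<T>(factory: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), EMBED_TIMEOUT_MS);
  return factory(controller.signal).finally(() => clearTimeout(timeoutId));
}

async function embedBatchQuery(texts: string[]): Promise<number[][]> {
  // One timeout budget for the whole batch; the signal reaches the HTTP layer.
  return withTimeout((signal) => fakeClientEmbed(texts, signal));
}

// Placeholder for the real SDK call, so the sketch runs on its own.
async function fakeClientEmbed(texts: string[], _signal: AbortSignal): Promise<number[][]> {
  return texts.map(() => [0, 0, 0]);
}
```

One design question to settle in a comment either way: a single timeout for the whole batch penalizes large batches, which may be exactly why the batch paths were left unwrapped.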
- Remove unused SAFE_CHAR_LIMITS, getSafeCharLimit, DEFAULT_SAFE_CHAR_LIMIT
- Add comment explaining batch timeout asymmetry (embedBatchQuery/embedBatchPassage not wrapped)
- Note: withTimeout already has .finally() cleanup, no change needed
…ortexReach#238)

- Test single-chunk detection (force-reduce when chunk >= 90% of original)
- Test depth limit termination (depth >= MAX_EMBED_DEPTH throws)
- Test CJK-aware chunk sizing (>30% CJK -> smaller chunks)
- Test strict reduction factor (50% per recursion level)
- Test batch embedding works correctly
Update

The latest changes have been pushed to this PR:

The code has been tested locally.

My suggestion would be:
- Preserve and surface chunkError instead of hiding it behind the original error
- Remove the 1000-char hard floor in smartChunk for small-context models (now 200)
- Add regression test for small-context model chunking (all-MiniLM-L6-v2)
- Add regression test for chunkError preservation
- Wire cjk-recursion-regression.test.mjs into the main test suite (CI)
rwmjhb review comments addressed

1. chunkError is now preserved and surfaced
2. Removed the 1000-char hard floor for small-context models
3. Regression test integrated into CI
Follow-up update (implemented and tested by gpt-5.4)

I pushed an additional follow-up commit to fully address the remaining issues in this PR.

What was fixed

Local verification

Ran locally after the changes:

Both passed locally after the latest commit. This follow-up was modified and tested using gpt-5.4 on the local plugin/fork setup before pushing to this PR branch.
Latest follow-up pushed

I pushed one more follow-up commit to further strengthen the regression coverage for this PR:

What was added in this latest update

The regression test now explicitly covers both the original reviewer-requested cases and the later follow-up concerns:

Local verification

Re-ran locally after the latest test update:

Both passed locally. This latest follow-up was also implemented and tested using gpt-5.4 before pushing to the PR branch.
rwmjhb
left a comment
There was a problem hiding this comment.
LGTM! The multi-layered defense design is sound and the test coverage is thorough.

Two minor suggestions (not blocking the merge):

- The _label parameter of withTimeout is unused; either remove it or add log output that uses it.
- The name MAX_EMBED_DEPTH is a bit misleading: reaching that depth doesn't stop the recursion, it only switches to force-truncation mode; the actual termination relies on the safeLimit < 100 floor check. Consider renaming it to FORCE_TRUNCATE_DEPTH or adding a clarifying comment.
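The second suggestion could look like this. The constant names and the floor value come from this review thread; the surrounding decision logic is an illustrative assumption:

```typescript
// Illustrative rename per the review: this depth switches the mode,
// it does not stop the recursion by itself.
const FORCE_TRUNCATE_DEPTH = 3; // was MAX_EMBED_DEPTH

// Hypothetical decision helper showing where termination actually comes from.
function nextAction(depth: number, safeLimit: number): "recurse" | "force-truncate" | "stop" {
  if (safeLimit < 100) return "stop";                       // the real termination condition
  if (depth >= FORCE_TRUNCATE_DEPTH) return "force-truncate"; // mode switch, not a stop
  return "recurse";
}
```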
fix: prevent infinite recursion in embedSingle() for CJK text (replaces PR CortexReach#215)
Summary
This PR addresses the two blocking issues raised in PR #215:
Issue 1: Timeout not truly canceling requests
The original PR used Promise.race() + setTimeout(), which only rejects the promise but doesn't cancel the underlying HTTP request.
Fix:
Issue 2: Recursion not guaranteeing convergence
The original PR added depth limits but didn't guarantee monotonic convergence for all models (especially small-context models like all-MiniLM-L6-v2 with a 512-token context).
Fix:
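Based on the commit notes earlier in this thread (STRICT_REDUCTION_FACTOR = 0.5, depth cap of 3), the convergence guarantee can be sketched as follows; the function and the exact branch structure are assumptions for illustration, not the PR's code:

```typescript
// Each level halves the input, so the recursion depth is bounded
// regardless of the model's context size. Constant names mirror the
// commit message; the logic is illustrative.
const STRICT_REDUCTION_FACTOR = 0.5;
const MAX_EMBED_DEPTH = 3;

// Count how many recursion levels an input of the given length can trigger.
function recursionLevels(length: number, depth = 0): number {
  if (depth >= MAX_EMBED_DEPTH) return depth;                // cap reached: truncate instead
  const reduced = Math.floor(length * STRICT_REDUCTION_FACTOR);
  if (reduced === length || reduced < 100) return depth + 1; // floor reached: one last call
  return recursionLevels(reduced, depth + 1);
}
```

Even the 14KB incident input bottoms out within the depth cap under this scheme, which is the monotonic-convergence property the review asked for.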
Changes Made
Testing
Note: This PR replaces PR #215
This is a replacement for PR #215, not a follow-up. The first commit in this PR contains all changes from PR #215. When PR #238 is merged, PR #215 should be closed without merging.
Attribution