
Long-context chunking can recurse indefinitely on CJK-heavy input after 400 maximum context length #234

@wllbll

Description


Plugin Version

1.1.0-beta.9

OpenClaw Version

3.13

Bug Description

Summary

When embedding very long CJK-heavy text, memory-lancedb-pro can enter an infinite chunking/retry loop after the provider returns:

400 Invalid 'input': maximum context length is 8192 tokens.

This can quickly cause a request storm.

Environment

  • Repo: CortexReach/memory-lancedb-pro
  • Version tested: 1.1.0-beta.9
  • Branch/commit tested: master / 2ebba8e6b7b65bf38336199384d5ec8690701f6e
  • Embedding model: text-embedding-3-small
  • Provider: OpenAI-compatible API
  • Config:
    {
      "embedding": {
        "model": "text-embedding-3-small",
        "chunking": true
      }
    }
    

What happens

For some CJK-heavy inputs, the first embedding request correctly fails with a context-length error, and the plugin starts chunking.

However, a generated chunk can itself still exceed the provider's token limit, and calling smartChunk() on that chunk returns exactly 1 chunk of the same size.

That leads to unbounded recursion:

  • embed oversized chunk
  • get 400 maximum context length
  • chunk again
  • get back exactly 1 chunk of the same size
  • recurse again
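
Under the stated assumptions, the loop can be reproduced with a minimal standalone simulation (all names and thresholds here are hypothetical stand-ins, not the plugin's actual code):

```typescript
// Hypothetical stand-in for smartChunk(): splits on a fixed character
// budget, so a 5734-char input comes back unchanged as a single chunk.
function fakeSmartChunk(text: string): string[] {
  const CHUNK_CHARS = 5734; // mirrors the observed chunk size
  if (text.length <= CHUNK_CHARS) return [text];
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += CHUNK_CHARS) {
    chunks.push(text.slice(i, i + CHUNK_CHARS));
  }
  return chunks;
}

// Pretend provider: for CJK-heavy text, assume anything over ~4000 chars
// already exceeds the 8192-token limit (an illustrative assumption).
function providerAccepts(text: string): boolean {
  return text.length <= 4000;
}

// Unguarded recursive retry, as suspected in src/embedder.ts: it re-chunks
// on failure with no check that chunking made progress. The depth cap is
// only here so the demo throws instead of overflowing the stack.
function embedWithRetry(text: string, depth = 0): number {
  if (depth > 10) throw new Error("runaway recursion");
  if (providerAccepts(text)) return 1; // one successful embedding request
  let requests = 0;
  for (const chunk of fakeSmartChunk(text)) {
    requests += embedWithRetry(chunk, depth + 1);
  }
  return requests;
}
```

With `"啊".repeat(12000)` the 5734-char chunks never shrink, so the recursion only stops at the demo's depth cap; in the real plugin nothing stops it at all.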

Expected behavior

If chunking does not actually reduce the failing chunk, the plugin should stop and return an error instead of recursively retrying.

Suspected root cause

src/embedder.ts retries chunking on context errors, but there is no guard for:

  • chunkResult.chunks.length === 1
  • and the chunk is effectively unchanged from the original failing input

This seems especially easy to trigger with CJK-heavy text because character-count heuristics are not a safe proxy for token count.

Suggested fix

Before recursively retrying chunk embeddings, add a guard like:

  • if chunk count is 1
  • and the chunk length is unchanged (or not meaningfully reduced)

then abort and throw the original context-length error instead of retrying again.
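
A minimal sketch of that guard (hypothetical helper name, not the plugin's actual API; the 0.9 progress threshold is an arbitrary assumption):

```typescript
// Returns true only if re-chunking actually made progress on the failing
// input, i.e. it is safe to recurse. Otherwise the embedder should rethrow
// the original 400 context-length error instead of retrying.
function chunkingMadeProgress(input: string, chunks: string[]): boolean {
  if (chunks.length === 0) return false;
  if (chunks.length > 1) return true;
  // A single chunk only counts as progress if it is meaningfully smaller
  // than the input it was produced from (threshold is an assumption).
  return chunks[0].length < input.length * 0.9;
}
```

In the retry path this becomes: if `chunkingMadeProgress(text, chunkResult.chunks)` is false, abort and rethrow the provider's original error.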

Additional note

This is not just a "quota exceeded" symptom. In our testing, the initial trigger is a real provider 400 maximum context length, and the request storm is the secondary effect.

Expected Behavior

See "Expected behavior" above: if chunking does not actually reduce the failing chunk, stop and return an error instead of recursively retrying.

Steps to Reproduce

Reproduction notes

Input that reproduced reliably:

  • "啊".repeat(12000)

Observed chunking behavior from smartChunk(text, "text-embedding-3-small"):

{"len":12000,"chunkCount":3,"chunkLengths":[5734,5734,1350]}
{"len":5734,"chunkCount":1,"chunkLengths":[5734]}

This appears to be the core issue:

  • 5734 chars may still exceed the provider's token limit for CJK-heavy text
  • but chunking 5734 chars returns the same single 5734-char chunk
  • so the embedder retries recursively forever
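
This is consistent with CJK tokenization behavior: a CJK character can encode to more than one token, so a character-based splitter needs a much more conservative budget for CJK text. A hedged pre-estimate could look like the sketch below (the per-character ratios are illustrative assumptions, not tokenizer facts):

```typescript
// Rough, conservative token estimate for mixed CJK/ASCII text.
// Assumed ratios: ~1.8 tokens per CJK char, ~0.25 tokens per other char
// (the common "4 chars ≈ 1 token" heuristic for English).
function estimateTokens(text: string): number {
  let tokens = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    // Common CJK ranges (non-exhaustive): punctuation/kana + unified ideographs.
    const isCJK =
      (cp >= 0x3000 && cp <= 0x30ff) ||
      (cp >= 0x3400 && cp <= 0x4dbf) ||
      (cp >= 0x4e00 && cp <= 0x9fff);
    tokens += isCJK ? 1.8 : 0.25;
  }
  return Math.ceil(tokens);
}
```

Under these assumed ratios, 5734 "啊" characters estimate to well over 8192 tokens while 5734 ASCII characters stay far under it, which matches the observed pattern of failures.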

Error Logs / Screenshots

Minimal observed behavior

  Real provider logs:

  Document exceeded context limit (400 Invalid 'input': maximum context length is 8192 tokens.), attempting chunking...
  Split document into 3 chunks for embedding
  Document exceeded context limit (400 Invalid 'input': maximum context length is 8192 tokens.), attempting chunking...
  Split document into 1 chunks for embedding
  Document exceeded context limit (400 Invalid 'input': maximum context length is 8192 tokens.), attempting chunking...
  Split document into 1 chunks for embedding

  In a 5-second controlled test, this produced:

  - 75 embedding requests
  - repeated request lengths only among:
      - 12000
      - 5734
      - 1350

Embedding Provider

OpenAI

OS / Platform

Ubuntu

Labels

bug