fix(translate): chunk large pages to avoid GPT-5.2 structured-output token limit
Pages like "Creating a New Observation" (~486K tokens) exceed OpenAI's
272K token limit for json_schema strict mode, causing translation to fail
for both pt-BR and es.
Fix: translateText now splits oversized markdown into chunks before
calling the API, then reassembles the translated pieces transparently.
Callers and function signatures are unchanged.
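The chunk-then-reassemble flow can be sketched roughly as below. This is a minimal illustration, not the real implementation: `translateChunk` stands in for the actual single OpenAI call, and the naive character slicing stands in for the fence-aware splitter described under Key details.

```typescript
// Sketch of the transparent wrapper: the signature callers see is unchanged,
// and chunking only kicks in when the input is oversized.
const TRANSLATION_CHUNK_MAX_CHARS = 500_000;

type TranslateFn = (text: string, locale: string) => Promise<string>;

async function translateText(
  markdown: string,
  locale: string,
  translateChunk: TranslateFn, // stand-in for the real single-call API request
): Promise<string> {
  // Small enough: one API call, exactly as before the fix.
  if (markdown.length <= TRANSLATION_CHUNK_MAX_CHARS) {
    return translateChunk(markdown, locale);
  }
  // Oversized: split, translate each piece, and reassemble transparently.
  // (Naive slicing here; the real splitter prefers heading boundaries.)
  const chunks: string[] = [];
  for (let i = 0; i < markdown.length; i += TRANSLATION_CHUNK_MAX_CHARS) {
    chunks.push(markdown.slice(i, i + TRANSLATION_CHUNK_MAX_CHARS));
  }
  const parts = await Promise.all(chunks.map((c) => translateChunk(c, locale)));
  return parts.join("");
}
```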
Key details:
- TRANSLATION_CHUNK_MAX_CHARS = 500_000 (~143K tokens, conservative buffer)
- Fence-aware section splitter: # inside code blocks is never a boundary
- 3-level fallback below headings: paragraphs -> lines -> character slicing
- Leading oversized tokens are correctly split even when no content has
been accumulated yet in the current chunk
- token_overflow error code (non-critical) enables targeted recovery
- Adaptive fallback halves any chunk that still overflows after splitting
- 11 tests including lossless round-trip and leading-token edge cases
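The fence-aware heading split can be sketched as follows. This is a simplified stand-in for the real splitter (function name and exact boundary rules are illustrative): a `#` line starts a new chunk only when we are outside a ``` fence, and splitting newline-preservingly keeps the round trip lossless.

```typescript
// Simplified fence-aware section splitter: headings inside fenced code
// blocks are never treated as chunk boundaries, and joining the returned
// chunks reproduces the input exactly.
function splitByHeadings(markdown: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  let inFence = false;
  // Split on newlines but keep them attached, so join("") is lossless.
  for (const line of markdown.split(/(?<=\n)/)) {
    if (line.trimStart().startsWith("```")) inFence = !inFence;
    const isBoundary = !inFence && line.startsWith("#");
    // Start a new chunk at a heading only once the current chunk is full.
    if (isBoundary && current !== "" && current.length + line.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += line;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Joining the chunks back together reproduces the original markdown byte for byte, which is what the lossless round-trip tests assert.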