Open
Changes from 11 commits
Commits
21 commits
33c1581
fix(translate): prevent content loss in long-form translation (#166)
luandro Mar 19, 2026
fd3b4d2
fix(translate): address Codex review — reachable 8k floor and correct…
luandro Mar 19, 2026
c36dfe9
fix(translate): count setext headings in structure metrics
luandro Mar 19, 2026
4459a15
fix(translate): restrict setext heading detection to H1 (===) only
luandro Mar 19, 2026
ecdf122
fix(translate): add setext H2 and admonition tracking to completeness…
luandro Mar 19, 2026
c67c047
fix(translate): strip fenced content before metrics and fix table det…
luandro Mar 19, 2026
ece8b76
fix(scripts): resolve typescript compilation and markdown parsing bugs
luandro Mar 19, 2026
0c418f1
fix(translate): exclude YAML frontmatter from structure metrics
luandro Mar 20, 2026
d1b5ff2
fix(translate): tolerate one missing heading and restore unclosed fen…
luandro Mar 20, 2026
a5db86f
docs: add initial CHANGELOG.md file
luandro Mar 20, 2026
b040df4
fix(translate): flag any heading loss as incomplete translation
luandro Mar 25, 2026
014bd81
fix(translate): detect finish_reason:length as token_overflow
luandro Mar 25, 2026
7168988
fix(notion-translate): harden translation integrity checks
luandro Mar 26, 2026
3252c66
fix(translate): handle indented fenced code blocks
luandro Mar 26, 2026
93ba455
fix(i18n): restore translation strings and changelog formatting
luandro Mar 26, 2026
2a5bb87
revert(i18n): remove locale files from issue-166
luandro Mar 26, 2026
23e0754
fix(translate): track CommonMark fence length to prevent nested-fence…
luandro Mar 26, 2026
d15da43
fix(translate): wire frontmatter integrity failures into chunk-halvin…
luandro Mar 26, 2026
58e0a00
fix(translate): force chunked retries after incomplete responses
luandro Mar 27, 2026
1d624e0
test(translate): add efficiency eval coverage
luandro Mar 27, 2026
8638989
perf(translate): bound custom backend output budgets
luandro Mar 27, 2026
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,28 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

### Added
- **Targeted Notion Fetching:** Added the ability to fetch data for a single Notion page.
- **Pre-Release Safety:** Added validation checks to ensure all translations (locales) are complete.

### Changed
- **Simplified Data Fetching:** Cleaned up and simplified the logic for fetching all pages from Notion.
- **Docker Tests:** Updated the Docker integration tests to work correctly with the newly added fetch-job types.

### Removed
- **Code Cleanup:** Removed redundant code from the API schemas to make the codebase cleaner.

### Fixed
- **Translation Completeness:** Fixed several issues in how the system measures whether a page is fully translated, including the handling of setext headings, admonitions, fenced code blocks, and YAML frontmatter in the structure metrics.
- **Long-form Content Translation:** Prevented content loss when translating very long pages.
- **Language Switcher (Locale Dropdown):**
  - Fixed a bug where the language switcher would sometimes point to the wrong page.
  - Corrected an issue that caused "double" language codes in URLs.
  - Fixed navigation issues when switching languages on category index pages.
  - Fixed a display issue where the language dropdown might be hidden behind other menu items.
- **Build Scripts:** Resolved bugs in the TypeScript compilation and Markdown parsing scripts.
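
The completeness fixes listed above, read alongside the commit titles, suggest a heading count that ignores YAML frontmatter and fenced code and treats any lost heading as an incomplete translation. The following is only a minimal sketch of that idea, not the code from this PR: `countHeadings` and `hasHeadingLoss` are hypothetical names, and the setext detection is deliberately simplified.

```typescript
// Hypothetical helper: count ATX and setext headings outside YAML
// frontmatter and fenced code blocks. Simplified; not the PR's code.
function countHeadings(markdown: string): number {
  const lines = markdown.split(/\r?\n/);
  let i = 0;

  // Skip a leading YAML frontmatter block delimited by `---` lines.
  if (lines[0].trim() === "---") {
    const end = lines.indexOf("---", 1);
    if (end !== -1) i = end + 1;
  }

  let fence: { char: string; length: number } | null = null;
  let prevIsText = false; // was the previous line paragraph text?
  let count = 0;

  for (; i < lines.length; i++) {
    const line = lines[i];

    if (fence) {
      // A closing fence uses the same character and is at least as long
      // as the opening fence, so shorter nested fences do not close it.
      const close = /^ {0,3}(`{3,}|~{3,})\s*$/.exec(line);
      if (close && close[1][0] === fence.char && close[1].length >= fence.length) {
        fence = null;
      }
      continue;
    }

    const open = /^ {0,3}(`{3,}|~{3,})/.exec(line);
    if (open) {
      fence = { char: open[1][0], length: open[1].length };
      prevIsText = false;
      continue;
    }

    if (/^ {0,3}#{1,6}(\s|$)/.test(line)) {
      count++; // ATX heading: `# Title`
      prevIsText = false;
      continue;
    }

    if (prevIsText && /^ {0,3}(=+|-+)\s*$/.test(line)) {
      count++; // setext underline (`===` or `---`) directly below text
      prevIsText = false;
      continue;
    }

    prevIsText = line.trim() !== "";
  }
  return count;
}

// Hypothetical check: any drop in heading count marks the translation incomplete.
function hasHeadingLoss(source: string, translated: string): boolean {
  return countHeadings(translated) < countHeadings(source);
}
```

A caller could run `hasHeadingLoss(original, translation)` after each model response and retry with smaller chunks when it returns `true`.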
20 changes: 16 additions & 4 deletions scripts/constants.ts
@@ -182,10 +182,22 @@ export const ENGLISH_DIR_SAVE_ERROR =
 // Translation retry configuration
 export const TRANSLATION_MAX_RETRIES = 3;
 export const TRANSLATION_RETRY_BASE_DELAY_MS = 750;
-/** Max characters per translation chunk.
- * Targets ~143K tokens (500K chars / 3.5 chars per token).
- * Leaves generous buffer within OpenAI's 272K structured-output limit. */
-export const TRANSLATION_CHUNK_MAX_CHARS = 500_000;
+/**
+ * Reliability-oriented cap for proactive markdown translation chunking.
+ * This keeps long-form docs away from the model's theoretical context ceiling,
+ * even when the model advertises a much larger maximum context window.
+ */
+export const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
+/** Smallest total-budget chunk size used when retrying incomplete translations. */
+export const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
+/**
+ * Maximum times to retry with smaller chunks after completeness checks fail.
+ * Each retry halves the chunk limit. Starting from 120 K chars:
+ * 120k → 60k → 30k → 15k → 8k (floor)
+ * Four halvings are needed to descend from the default cap to the 8k floor,
+ * so this must be at least 4.
+ */
+export const TRANSLATION_COMPLETENESS_MAX_RETRIES = 4;
 
 // URL handling
 export const INVALID_URL_PLACEHOLDER =
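
The halving schedule described in the `TRANSLATION_COMPLETENESS_MAX_RETRIES` comment can be checked with a short sketch. The three constants are copied from the hunk above; `nextChunkLimit` and `chunkLimitSchedule` are hypothetical helpers introduced only for illustration, and the actual retry loop in the PR may be structured differently.

```typescript
// Values copied from the scripts/constants.ts hunk.
const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
const TRANSLATION_COMPLETENESS_MAX_RETRIES = 4;

// Hypothetical helper: halve the limit, but never go below the floor.
function nextChunkLimit(current: number): number {
  return Math.max(TRANSLATION_MIN_CHUNK_MAX_CHARS, Math.floor(current / 2));
}

// Hypothetical helper: every chunk limit that would be tried,
// starting with the initial attempt at the default cap.
function chunkLimitSchedule(): number[] {
  const limits = [TRANSLATION_CHUNK_MAX_CHARS];
  for (let retry = 0; retry < TRANSLATION_COMPLETENESS_MAX_RETRIES; retry++) {
    const current = limits[limits.length - 1];
    if (current <= TRANSLATION_MIN_CHUNK_MAX_CHARS) break; // already at the floor
    limits.push(nextChunkLimit(current));
  }
  return limits;
}

console.log(chunkLimitSchedule()); // [ 120000, 60000, 30000, 15000, 8000 ]
```

The last step is a clamp, not an exact halving: 15,000 / 2 = 7,500, which is raised to the 8,000 floor. With a retry budget of 3, the schedule would stop at 15,000 and never reach the floor, which is why the comment requires at least 4.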