Commit 45ab75e

Enhance document translation script to consistently generate valid markdown text (#394)
1 parent 1e6e79c commit 45ab75e

1 file changed (+72, -4 lines)

docs/src/scripts/translate.ts

Lines changed: 72 additions & 4 deletions
```diff
@@ -4,7 +4,12 @@
 import fs from 'fs/promises';
 import path from 'path';
 import { fileURLToPath } from 'url';
-import { Agent, Runner, setDefaultOpenAIKey } from '@openai/agents';
+import {
+  Agent,
+  getDefaultModelSettings,
+  Runner,
+  setDefaultOpenAIKey,
+} from '@openai/agents';
 
 const __filename = fileURLToPath(import.meta.url);
 const __dirname = path.dirname(__filename);
```

```diff
@@ -226,12 +231,33 @@ You must return **only** the translated markdown. Do not include any commentary,
 - Treat the **Do‑Not‑Translate list** and **Term‑Specific list** as case‑insensitive; preserve the original casing you see.
 - No markdown tags.
 
+#########################
+## HARD CONSTRAINTS ##
+#########################
+- Never insert spaces immediately inside emphasis markers. Use \`**bold**\`, not \`** bold **\`.
+- Preserve the number of emphasis markers from the source: if the source uses \`**\` or \`__\`, keep the same pair count.
+- Ensure one space after heading markers: \`##Heading\` -> \`## Heading\`.
+- Ensure one space after list markers: \`-Item\` -> \`- Item\`, \`*Item\` -> \`* Item\` (does not apply to \`**\`).
+- Trim spaces inside link/image labels: \`[ Label ](url)\` -> \`[Label](url)\`.
+
+###########################
+## GOOD / BAD EXAMPLES ##
+###########################
+- Good: This is **bold** text.
+- Bad: This is ** bold ** text.
+- Good: ## Heading
+- Bad: ##Heading
+- Good: - Item
+- Bad: -Item
+- Good: [Label](https://example.com)
+- Bad: [ Label ](https://example.com)
+
 #########################
 ## LANGUAGE‑SPECIFIC ##
 #########################
 *(applies only when ${targetLanguage} = Japanese)*
 - Insert a half‑width space before and after all alphanumeric terms.
-- Add a half‑width space just outside markdown emphasis markers: \` **太字** \` (good) vs \`** 太字 **\` (bad).
+- Add a half‑width space just outside markdown emphasis markers: \` **bold** \` (good) vs \`** bold **\` (bad).
 
 #########################
 ## DO NOT TRANSLATE ##
```

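These HARD CONSTRAINTS are phrased as before/after rewrites, and the commit enforces them only through the prompt. For comparison, the same rewrites could be applied deterministically after translation. The sketch below is hypothetical (`fixMarkdownSpacing` is not part of `translate.ts`) and assumes fenced code blocks have already been set aside, since line-level rules like these would otherwise touch code such as `#include`.

```typescript
// Hypothetical post-processor mirroring the prompt's HARD CONSTRAINTS.
// Assumes fenced code blocks were removed from the input beforehand.
function fixMarkdownSpacing(markdown: string): string {
  return markdown
    .split('\n')
    .map((line) =>
      line
        // `##Heading` -> `## Heading`: 1-6 '#' directly followed by text
        .replace(/^(\s*#{1,6})([^#\s])/, '$1 $2')
        // `-Item` -> `- Item`, `+Item` -> `+ Item`; `---` and `+++` are left alone
        .replace(/^(\s*[-+])([^\s\-+])/, '$1 $2')
        // `** bold **` -> `**bold**` (same for `__ bold __`), only when the
        // opening marker starts the line or follows whitespace
        .replace(/(^|\s)(\*\*|__)\s+([^*_\n]*?)\s+\2/g, '$1$2$3$2')
        // `[ Label ](url)` -> `[Label](url)`; a leading `!` for images is kept
        .replace(/\[\s*([^\]]*?)\s*\]\(/g, '[$1]('),
    )
    .join('\n');
}
```

The `*Item` case from the prompt is left out on purpose: at the start of a line, `*text*` is usually italics, which a line-level regex cannot reliably distinguish from an unspaced list item. The emphasis rewrite only fires when the opening marker begins the line or follows whitespace, so correct text such as `**a** and **b**` is left alone; the trade-off is that an unspaced span inside Japanese text, such as `これは** 太字 **です`, is not repaired.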
```diff
@@ -251,6 +277,15 @@ ${specificTerms}
 ${specificInstructions}
 - When translating Markdown tables, preserve the exact table structure, including all delimiters (|), header separators (---), and row/column counts. Only translate the cell contents. Do not add, remove, or reorder columns or rows.
 
+#########################
+## VALIDATION STEPS ##
+#########################
+Before returning the final title, run this mental checklist and fix issues if any:
+- No occurrences of: \`**\\s+[^*]*\\s+**\`, \`__\\s+[^_]*\\s+__\`.
+- No heading without a space: lines starting with \`#{1,6}\` must be followed by a space.
+- No list marker without a space: lines starting with \`-\`, \`+\`, or a single \`*\` must be followed by a space.
+- No spaces just inside \`[ ... ]\` or \`![ ... ]\` labels.
+
 #########################
 ## IF UNSURE ##
 #########################
```

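The VALIDATION STEPS hand the model regex-style patterns to check mentally. Once the template literal turns `\\s` into `\s`, the model sees `**\s+[^*]*\s+**`, which is not a usable JavaScript regular expression as written: passing it to `new RegExp` throws a `SyntaxError`, because the leading `*` is a quantifier with nothing to repeat. A minimal sketch of the four checks with the markers escaped, using a hypothetical `findMarkdownIssues` helper and assuming fenced code blocks are excluded from the input:

```typescript
// Hypothetical checker for the four VALIDATION STEPS patterns.
// Returns one message per failed check, with 1-based line numbers.
function findMarkdownIssues(markdown: string): string[] {
  const checks: Array<[string, RegExp]> = [
    // Spaces just inside ** ... ** or __ ... __ (markers escaped for JS)
    ['space inside emphasis', /\*\*\s+[^*]*\s+\*\*|__\s+[^_]*\s+__/],
    // 1-6 '#' at line start followed by something other than a space
    ['heading without space', /^#{1,6}[^#\s]/],
    // '-', '+', or a single '*' at line start followed directly by text
    ['list marker without space', /^(?:[-+]|\*(?!\*))[^\s\-+*]/],
    // Whitespace right after '[' or right before ']' in a [label](url)
    ['space inside link label', /!?\[(?:\s[^\]]*|[^\]]*\s)\]\(/],
  ];
  const issues: string[] = [];
  markdown.split('\n').forEach((line, i) => {
    for (const [name, pattern] of checks) {
      if (pattern.test(line)) issues.push(`line ${i + 1}: ${name}`);
    }
  });
  return issues;
}
```

Applied literally, the emphasis pattern also matches the stretch between two correctly formatted bold spans (in `**a** and **b**`, the substring `** and **` fits it), so a hit from this check is best treated as a cue for review rather than proof of an error.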
```diff
@@ -324,12 +359,33 @@ You must return **only** the translated markdown. Do not include any commentary,
 - Link URLs inside \`[label](URL)\` – translate the label, never the URL.
 - The internal links like [{label here}](path here) must be kept as-is.
 
+#########################
+## HARD CONSTRAINTS ##
+#########################
+- Never insert spaces immediately inside emphasis markers. Use \`**bold**\`, not \`** bold **\`.
+- Preserve the number of emphasis markers from the source: if the source uses \`**\` or \`__\`, keep the same pair count.
+- Ensure one space after heading markers: \`##Heading\` -> \`## Heading\`.
+- Ensure one space after list markers: \`-Item\` -> \`- Item\`, \`*Item\` -> \`* Item\` (does not apply to \`**\`).
+- Trim spaces inside link/image labels: \`[ Label ](url)\` -> \`[Label](url)\`.
+
+###########################
+## GOOD / BAD EXAMPLES ##
+###########################
+- Good: This is **bold** text.
+- Bad: This is ** bold ** text.
+- Good: ## Heading
+- Bad: ##Heading
+- Good: - Item
+- Bad: -Item
+- Good: [Label](https://example.com)
+- Bad: [ Label ](https://example.com)
+
 #########################
 ## LANGUAGE‑SPECIFIC ##
 #########################
 *(applies only when ${targetLanguage} = Japanese)*
 - Insert a half‑width space before and after all alphanumeric terms.
-- Add a half‑width space just outside markdown emphasis markers: \` **太字** \` (good) vs \`** 太字 **\` (bad). Review this rule again before returning the translated text.
+- Add a half‑width space just outside markdown emphasis markers: \` **bold** \` (good) vs \`** bold **\` (bad). Review this rule again before returning the translated text.
 
 #########################
 ## DO NOT TRANSLATE ##
```

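The Japanese-only rule asks for a half-width space just outside emphasis markers (`太字` in the removed lines is Japanese for "bold"; the commit replaces it with the English word in the example). As an illustration of what the rule means mechanically, not code from the script, a hypothetical `padEmphasisForJapanese` could add that space only where the adjacent character is not already whitespace. It assumes spaces just inside the markers (`** 太字 **`) were already removed, because such spans are deliberately not matched.

```typescript
// Hypothetical helper for the Japanese-only rule: ensure a half-width space
// just outside **...** when the adjacent character is not whitespace.
function padEmphasisForJapanese(text: string): string {
  // Matches **inner** where inner neither starts nor ends with whitespace.
  const emphasis = /\*\*([^*\s](?:[^*]*[^*\s])?)\*\*/g;
  return text.replace(
    emphasis,
    (match: string, inner: string, offset: number, whole: string): string => {
      const before = offset > 0 ? whole.charAt(offset - 1) : '';
      const after = whole.charAt(offset + match.length); // '' past the end
      const lead = before !== '' && !/\s/.test(before) ? ' ' : '';
      const trail = after !== '' && !/\s/.test(after) ? ' ' : '';
      return `${lead}**${inner}**${trail}`;
    },
  );
}
```

Because the helper checks the adjacent character before adding a space, running it a second time leaves already padded text unchanged.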
```diff
@@ -373,6 +429,11 @@ Follow the following workflow to translate the given markdown text data:
 - any errors or rooms for improvements in terms of Markdown text format -- A common error is to have spaces within special syntax like * or _. You must have spaces after special syntax like * or _, but it's NOT the same for the parts inside special syntax (e.g., ** bold ** must be **bold**)
 - you should not have any unnecessary spaces outside of tags; especially for the ones you replace with the "TERM-SPECIFIC" list
 - any parts that are not compatible with *.mdx files -- In the past, you've generated an expression with acorn like {#title-here} in h2 (##) level but it was neither necessary nor valid
+- Run a final regex check in your head and fix if any of these patterns appear in your output:
+- \`**\\s+[^*]*\\s+**\` or \`__\\s+[^_]*\\s+__\` (spaces inside emphasis)
+- Lines starting with \`#{1,6}\` not followed by a space
+- Lines starting with \`-\`, \`+\`, or a single \`*\` not followed by a space
+- Avoid spaces directly inside link or image labels: use \`[Label](url)\`, not \`[ Label ](url)\` or \`![ Label ](url)\`.
 4. If improvements are necessary, refine the content without changing the original meaning.
 5. Continue improving the translation until you are fully satisfied with the result.
 6. Once the final output is ready, return **only** the translated markdown text. No extra commentary.
```

```diff
@@ -386,7 +447,13 @@ async function callAgent(
   instructions: string,
   model: string = OPENAI_MODEL,
 ): Promise<string> {
-  const agent = new Agent({ name: 'translator', instructions, model });
+  const modelSettings = getDefaultModelSettings(model);
+  const agent = new Agent({
+    name: 'translator',
+    instructions,
+    model,
+    modelSettings,
+  });
   const result = await runner.run(agent, content);
   const output = result.finalOutput;
   if (!output) {
```

```diff
@@ -537,6 +604,7 @@ async function translateFile(
     const translated = await callAgent(chunk, instructions);
     translatedContent.push(translated);
   }
+  // Join translated chunks back together; formatting is guided by prompt constraints
   let translatedText = translatedContent.join('\n');
   for (let idx = 0; idx < codeBlocks.length; ++idx) {
     translatedText = translatedText.replace(
```

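The end of this hunk shows the joined translation being patched by index over `codeBlocks`, which suggests code blocks are swapped out before translation and restored afterwards. The placeholder format is cut off in this view, so the following is only a sketch of that protect-and-restore pattern, with an assumed `@@CODE_BLOCK_n@@` placeholder and hypothetical helper names:

```typescript
// Hypothetical protect/restore helpers; the `@@CODE_BLOCK_n@@` placeholder
// format is an assumption, not necessarily the one used by translate.ts.
// Simple matcher: from one run of three backticks to the next.
const FENCED_BLOCK = /`{3}[\s\S]*?`{3}/g;

function protectCodeBlocks(markdown: string): { text: string; codeBlocks: string[] } {
  const codeBlocks: string[] = [];
  const text = markdown.replace(FENCED_BLOCK, (block: string) => {
    codeBlocks.push(block);
    return `@@CODE_BLOCK_${codeBlocks.length - 1}@@`;
  });
  return { text, codeBlocks };
}

function restoreCodeBlocks(translated: string, codeBlocks: string[]): string {
  let restored = translated;
  for (let idx = 0; idx < codeBlocks.length; ++idx) {
    // split/join rather than replace() with a string replacement, so `$1`
    // or `$&` inside the code is copied verbatim instead of being expanded
    restored = restored.split(`@@CODE_BLOCK_${idx}@@`).join(codeBlocks[idx]);
  }
  return restored;
}
```

Each translated chunk can be joined with `'\n'`, as in the hunk, before a single `restoreCodeBlocks` call on the combined text. The closing `@@` keeps `@@CODE_BLOCK_1@@` from matching inside `@@CODE_BLOCK_10@@`.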
