Skip to content

Commit 023dad4

Browse files
luandroopenhands-agentclaude
authored
docs: add translation image stabilization specification for issue #137 (#140)
* docs: add translation image stabilization specification for issue #137 This specification document addresses the problem of expiring Notion/S3 image URLs in translated documentation. It provides: - Technical analysis of the current translation pipeline - Proposed solution integrating existing image processing infrastructure - Detailed 5-phase implementation plan with time estimates - Testing requirements and acceptance criteria - Risk analysis and rollback strategy The solution reuses the existing processAndReplaceImages() and validateAndFixRemainingImages() functions from the notion-fetch pipeline to stabilize image URLs before and after translation. Closes #137 Co-authored-by: openhands <openhands@all-hands.dev> * docs: add per-language image overrides and improve spec clarity Updates to TRANSLATION_IMAGE_STABILIZATION_SPEC.md: Per-language image overrides: - Add section explaining default vs override behavior - Document how translators can use localized screenshots - Define precedence rules (language page wins, EN fallback) - Add how-to snippet with before/after example Pipeline improvements: - Add language-aware source selection step in pipeline flow - Update acceptance criteria for localized images Technical clarifications: - Add function contracts section with exact input/output types - Fix line number typo (858-518 → 858-890) - Add invariants section with grep verification commands - Make failure policy explicit (fail page, no placeholders) - Clarify validation catches mutated canonical paths Testing: - Add per-language override test case - Require mocked network/filesystem in tests Cleanup: - Remove time estimates from phases - Reuse existing slug logic (no new implementation) - Expand out-of-scope section (no migration, no new storage) Co-authored-by: openhands <openhands@all-hands.dev> * docs: add review and simplified spec for translation image stabilization - Add detailed review of original spec with 17 issues identified - Add simplified spec that's easier to understand (ELI5 style) - Key changes: removed redundant items, fixed fallback behavior, clarified edge cases, added implementation details 🤖 Generated with [Qoder][https://qoder.com] * docs: update image stabilization spec with implementation clarifications (#141) * docs: fix spec accuracy issues found during code review Address all issues identified in the spec review by verifying against actual codebase: - Fix function name to processSinglePageTranslation (not processPageTranslation) - Fix slug helper to include page ID suffix for deterministic filenames - Replace validateAndFixRemainingImages (warns only) with getImageDiagnostics + throw - Add totalFailures > 0 check to fail on broken placeholder images - Defer per-language image overrides to future enhancement (EN-only initially) - Downgrade prompt enhancement to P2 (already 3 existing instructions) - Fix import list (remove unused validateAndFixRemainingImages) - Remove incorrect metrics.skippedSmallSize reference in logging - Add Prerequisites section for EN fetch recommendation - Document frontmatter images as out of scope - Update review document with resolution status https://claude.ai/code/session_01RMuRBsneMpCL1cBVLuVcmx * docs: add implementation plan for translation image stabilization (#137) Machine-readable implementation plan with atomic tasks, exact code locations, dependency graph, and complete test specifications. Ready for autonomous execution. https://claude.ai/code/session_01RMuRBsneMpCL1cBVLuVcmx * fix: address review feedback inconsistencies - Fix __tests__/ path to flat path in simplified spec - Fix title -> originalTitle variable name consistency - Add note about invoking processSinglePageTranslation in tests 🤖 Generated with [Qoder][https://qoder.com] * fix(spec): align SPEC with PLAN for logging and TASK-3.6 - Make diagnostic logging conditional (only log when successfulImages > 0) - Fix TASK-3.6 label to match PLAN (Test title page bypass) --------- Co-authored-by: Claude <noreply@anthropic.com> * feat(translation): add image stabilization to translation workflow Prevent expiring S3 URLs from being embedded in translated content by processing images before translation. The workflow now: 1. Pre-stabilization: Download S3 images, save to /images/, replace URLs 2. Translate: Send stabilized markdown to translation API 3. Post-validation: Verify no S3 URLs survived translation Key changes: - Reorder pipeline: images → callouts → emojis (was callouts first) - Skip canonical /images/ paths (don't re-process stable references) - Cache stabilized markdown per source page (reuse across languages) - Fail fast on image download failures (no broken placeholders) - Redact sensitive params from error messages (signed URL safety) Tested: 139 tests passed, 4 test files * docs: remove redundant spec docs, keep only SPEC as source of truth * fix: address review comments - deduplicate S3 detection and fix spec - Extract detectNotionS3Urls() helper to eliminate duplicated detection logic - Update acceptance criteria to clarify only Notion-family S3 URLs are blocked - Allow generic (non-Notion) amazonaws URLs through as per implementation * test: add cache edge case and dual detection tests for image stabilization - Add tests for cache edge cases: no images, cache isolation, fallback - Add tests for dual detection interaction scenarios - Total tests increased from 17 to 22 * docs: remove per-language image override tests from spec Per-language image overrides are not needed - images are added manually after translation if needed. This resolves the last remaining review comment. * chore: remove completed translation image stabilization spec The implementation is done, spec document is no longer needed. 🤖 Generated with [Qoder][https://qoder.com] --------- Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Claude <noreply@anthropic.com>
1 parent 2fbc024 commit 023dad4

File tree

8 files changed

+1340
-53
lines changed

8 files changed

+1340
-53
lines changed

scripts/notion-fetch/__tests__/processMarkdownWithRetry.test.ts

Lines changed: 109 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ describe("processMarkdownWithRetry", () => {
5656
let hasS3Urls: Mock;
5757
let getImageDiagnostics: Mock;
5858
let processMarkdownWithRetry: any;
59+
let processMarkdownSinglePass: any;
5960

6061
beforeEach(async () => {
6162
restoreEnv = installTestNotionEnv();
@@ -74,9 +75,12 @@ describe("processMarkdownWithRetry", () => {
7475
const markdownRetryProcessor = await import("../markdownRetryProcessor");
7576
processMarkdownWithRetry =
7677
markdownRetryProcessor.processMarkdownWithRetry;
78+
processMarkdownSinglePass =
79+
markdownRetryProcessor.processMarkdownSinglePass;
7780
} catch (error) {
7881
// Should not fail - function should exist in dedicated module
7982
processMarkdownWithRetry = undefined;
83+
processMarkdownSinglePass = undefined;
8084
}
8185
});
8286

@@ -472,6 +476,106 @@ describe("processMarkdownWithRetry", () => {
472476
expect(result.content).toBeDefined();
473477
expect(result.totalSaved).toBeGreaterThanOrEqual(0);
474478
});
479+
480+
it("should run image stabilization before callout and emoji transforms in retry flow", async () => {
481+
expect(processMarkdownWithRetry).toBeDefined();
482+
483+
const initialContent = "# Test\n\n![image](/images/test.png)";
484+
const pageContext = {
485+
pageId: "ordered-retry-page-id",
486+
pageTitle: "Ordered Retry Page",
487+
safeFilename: "ordered-retry-page",
488+
};
489+
const rawBlocks = [{ type: "callout", callout: { rich_text: [] } }];
490+
const emojiMap = new Map<string, string>([["test-emoji", "😀"]]);
491+
492+
processAndReplaceImages.mockResolvedValue({
493+
markdown: initialContent,
494+
stats: { successfulImages: 1, totalFailures: 0, totalSaved: 64 },
495+
});
496+
validateAndFixRemainingImages.mockResolvedValue(initialContent);
497+
hasS3Urls.mockReturnValue(false);
498+
getImageDiagnostics.mockReturnValue({
499+
totalMatches: 1,
500+
markdownMatches: 1,
501+
htmlMatches: 0,
502+
s3Matches: 0,
503+
s3Samples: [],
504+
});
505+
506+
const markdownTransform = await import("../markdownTransform");
507+
const processCalloutsInMarkdown =
508+
markdownTransform.processCalloutsInMarkdown as Mock;
509+
510+
const emojiProcessor = await import("../emojiProcessor");
511+
const applyEmojiMappings = emojiProcessor.EmojiProcessor
512+
.applyEmojiMappings as Mock;
513+
514+
await processMarkdownWithRetry(
515+
initialContent,
516+
pageContext,
517+
rawBlocks,
518+
emojiMap
519+
);
520+
521+
const imageOrder = processAndReplaceImages.mock.invocationCallOrder[0];
522+
const calloutOrder =
523+
processCalloutsInMarkdown.mock.invocationCallOrder[0];
524+
const emojiOrder = applyEmojiMappings.mock.invocationCallOrder[0];
525+
526+
expect(imageOrder).toBeLessThan(calloutOrder);
527+
expect(calloutOrder).toBeLessThan(emojiOrder);
528+
});
529+
530+
it("should run image stabilization before callout and emoji transforms in single-pass flow", async () => {
531+
expect(processMarkdownSinglePass).toBeDefined();
532+
533+
const initialContent = "# Test\n\n![image](/images/test.png)";
534+
const pageContext = {
535+
pageId: "ordered-single-pass-page-id",
536+
pageTitle: "Ordered Single Pass Page",
537+
safeFilename: "ordered-single-pass-page",
538+
};
539+
const rawBlocks = [{ type: "callout", callout: { rich_text: [] } }];
540+
const emojiMap = new Map<string, string>([["test-emoji", "😀"]]);
541+
542+
processAndReplaceImages.mockResolvedValue({
543+
markdown: initialContent,
544+
stats: { successfulImages: 1, totalFailures: 0, totalSaved: 64 },
545+
});
546+
validateAndFixRemainingImages.mockResolvedValue(initialContent);
547+
hasS3Urls.mockReturnValue(false);
548+
getImageDiagnostics.mockReturnValue({
549+
totalMatches: 1,
550+
markdownMatches: 1,
551+
htmlMatches: 0,
552+
s3Matches: 0,
553+
s3Samples: [],
554+
});
555+
556+
const markdownTransform = await import("../markdownTransform");
557+
const processCalloutsInMarkdown =
558+
markdownTransform.processCalloutsInMarkdown as Mock;
559+
560+
const emojiProcessor = await import("../emojiProcessor");
561+
const applyEmojiMappings = emojiProcessor.EmojiProcessor
562+
.applyEmojiMappings as Mock;
563+
564+
await processMarkdownSinglePass(
565+
initialContent,
566+
pageContext,
567+
rawBlocks,
568+
emojiMap
569+
);
570+
571+
const imageOrder = processAndReplaceImages.mock.invocationCallOrder[0];
572+
const calloutOrder =
573+
processCalloutsInMarkdown.mock.invocationCallOrder[0];
574+
const emojiOrder = applyEmojiMappings.mock.invocationCallOrder[0];
575+
576+
expect(imageOrder).toBeLessThan(calloutOrder);
577+
expect(calloutOrder).toBeLessThan(emojiOrder);
578+
});
475579
});
476580

477581
describe("Configuration Boundary Tests", () => {
@@ -1357,6 +1461,7 @@ describe("processMarkdownWithRetry", () => {
13571461
];
13581462

13591463
processAndReplaceImages.mockImplementation(async () => {
1464+
// eslint-disable-next-line security/detect-object-injection -- attemptCount is loop counter, not user input
13601465
const markdown = results[attemptCount] || results[2];
13611466
attemptCount++;
13621467
return {
@@ -1815,6 +1920,7 @@ describe("processMarkdownWithRetry", () => {
18151920
// All pages should eventually succeed
18161921
results.forEach((result, i) => {
18171922
expect(result.containsS3).toBe(false);
1923+
// eslint-disable-next-line security/detect-object-injection -- i is forEach index, not user input
18181924
expect(result.retryAttempts).toBe(pages[i].retries);
18191925
});
18201926

@@ -1900,6 +2006,7 @@ describe("processMarkdownWithRetry", () => {
19002006

19012007
// Check success/failure as expected
19022008
results.forEach((result, i) => {
2009+
// eslint-disable-next-line security/detect-object-injection -- i is forEach index, not user input
19032010
if (pages[i].shouldSucceed) {
19042011
expect(result.containsS3).toBe(false);
19052012
expect(result.retryAttempts).toBe(0);
@@ -2161,10 +2268,10 @@ describe("processMarkdownWithRetry", () => {
21612268
.applyEmojiMappings as Mock;
21622269
applyEmojiMappings.mockImplementation(
21632270
(content: string, map: Map<string, string>) => {
2164-
// Simulate emoji replacement
2271+
// Simulate emoji replacement using string replaceAll (no regex needed)
21652272
let result = content;
21662273
map.forEach((emoji, key) => {
2167-
result = result.replace(new RegExp(`:${key}:`, "g"), emoji);
2274+
result = result.replaceAll(`:${key}:`, emoji);
21682275
});
21692276
return result;
21702277
}

scripts/notion-fetch/imageReplacer.test.ts

Lines changed: 24 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -304,12 +304,32 @@ Some text
304304
expect(result.stats.totalFailures).toBe(1);
305305
});
306306

307-
it("should skip local images", async () => {
308-
const markdown = "![alt](/local/image.png)";
307+
it("should keep canonical /images paths unchanged", async () => {
308+
const { processImageWithFallbacks } = await import("./imageProcessing");
309+
const markdown = "![alt](/images/already-local.png)";
309310
const result = await processAndReplaceImages(markdown, "test-file");
310311

311-
expect(result.markdown).toContain("Failed image");
312-
expect(result.stats.totalFailures).toBe(1);
312+
expect(result.markdown).toBe(markdown);
313+
expect(result.stats.totalFailures).toBe(0);
314+
expect(result.stats.successfulImages).toBe(0);
315+
expect(processImageWithFallbacks).not.toHaveBeenCalled();
316+
});
317+
318+
it("should log a single aggregate message for canonical /images paths", async () => {
319+
const infoSpy = vi.spyOn(console, "info").mockImplementation(() => {});
320+
const markdown = `
321+
![one](/images/one.png)
322+
![two](/images/two.png)
323+
![three](/images/three.png)
324+
`;
325+
326+
await processAndReplaceImages(markdown, "test-file");
327+
328+
const canonicalLogs = infoSpy.mock.calls.filter((args) =>
329+
String(args[0]).includes("canonical /images/ paths unchanged")
330+
);
331+
expect(canonicalLogs).toHaveLength(1);
332+
expect(canonicalLogs[0][0]).toContain("Kept 3");
313333
});
314334

315335
it("should process multiple images concurrently", async () => {

scripts/notion-fetch/imageReplacer.ts

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,10 @@ const DEBUG_S3_IMAGES =
8181

8282
const LARGE_MARKDOWN_THRESHOLD = 700_000;
8383

84+
function isCanonicalLocalImagePath(url: string): boolean {
85+
return url.startsWith("/images/");
86+
}
87+
8488
function debugS3(message: string): void {
8589
if (DEBUG_S3_IMAGES) {
8690
console.log(chalk.magenta(`[s3-debug] ${message}`));
@@ -441,8 +445,20 @@ export async function processAndReplaceImages(
441445
error: string;
442446
fallbackUsed: true;
443447
}> = [];
448+
let canonicalLocalImagesKept = 0;
444449

445450
for (const match of imageMatches) {
451+
const trimmedUrl = match.url.trim();
452+
if (isCanonicalLocalImagePath(trimmedUrl)) {
453+
canonicalLocalImagesKept++;
454+
if (DEBUG_S3_IMAGES) {
455+
debugS3(
456+
`[${safeFilename}] keeping canonical local image unchanged: ${trimmedUrl}`
457+
);
458+
}
459+
continue;
460+
}
461+
446462
const urlValidation = validateAndSanitizeImageUrl(match.url);
447463

448464
// DEBUG: Log validation result for each image
@@ -526,6 +542,14 @@ export async function processAndReplaceImages(
526542
}
527543
}
528544

545+
if (canonicalLocalImagesKept > 0) {
546+
console.info(
547+
chalk.blue(
548+
`ℹ️ Kept ${canonicalLocalImagesKept} canonical /images/ path${canonicalLocalImagesKept === 1 ? "" : "s"} unchanged`
549+
)
550+
);
551+
}
552+
529553
// DEBUG: Log categorization summary
530554
if (DEBUG_S3_IMAGES) {
531555
const validS3Count = validImages.filter((vi) =>
@@ -680,9 +704,10 @@ export async function processAndReplaceImages(
680704

681705
// Phase 3: Report results
682706
const totalImages = imageMatches.length;
707+
const remoteImagesProcessed = validImages.length;
683708
console.info(
684709
chalk.green(
685-
`📸 Processed ${totalImages} images: ${successfulImages} successful, ${totalFailures} failed`
710+
`📸 Processed ${remoteImagesProcessed}/${totalImages} images: ${successfulImages} successful, ${totalFailures} failed`
686711
)
687712
);
688713
if (totalFailures > 0) {

scripts/notion-fetch/markdownRetryProcessor.ts

Lines changed: 18 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -125,8 +125,8 @@ export interface RetryMetrics {
125125
* when S3 URLs persist after processing.
126126
*
127127
* **Processing Pipeline** (executed on each attempt):
128-
* 1. Process callouts (convert Notion callouts to Docusaurus admonitions)
129-
* 2. Process and replace images (fetch S3 images and save locally)
128+
* 1. Process and replace images (fetch S3 images and save locally)
129+
* 2. Process callouts (convert Notion callouts to Docusaurus admonitions)
130130
* 3. Apply emoji mappings (replace custom emoji references)
131131
* 4. Validate and fix remaining images (final S3 URL cleanup)
132132
*
@@ -200,7 +200,7 @@ export async function processMarkdownWithRetry(
200200

201201
/**
202202
* Run the full content processing pipeline for one attempt.
203-
* Processes calloutsimages → emojis → validation in sequence.
203+
* Processes imagescallouts → emojis → validation in sequence.
204204
*/
205205
const runFullContentPipeline = async (
206206
initialContent: string,
@@ -225,12 +225,12 @@ export async function processMarkdownWithRetry(
225225
let savedDelta = 0;
226226
let fallbackEmojiCount = 0;
227227

228-
// DEBUG: Log image count BEFORE callout processing
228+
// DEBUG: Log image count BEFORE image processing
229229
if (DEBUG_S3_IMAGES) {
230230
const beforeDiagnostics = getImageDiagnostics(workingContent);
231231
console.log(
232232
chalk.magenta(
233-
`[s3-debug] BEFORE callout processing: ${beforeDiagnostics.totalMatches} images (S3: ${beforeDiagnostics.s3Matches})`
233+
`[s3-debug] BEFORE image processing: ${beforeDiagnostics.totalMatches} images (S3: ${beforeDiagnostics.s3Matches})`
234234
)
235235
);
236236

@@ -245,21 +245,6 @@ export async function processMarkdownWithRetry(
245245
}
246246
}
247247

248-
if (rawBlocks && rawBlocks.length > 0) {
249-
workingContent = processCalloutsInMarkdown(workingContent, rawBlocks);
250-
console.log(chalk.blue(` ↳ Processed callouts in markdown content`));
251-
}
252-
253-
// DEBUG: Log image count AFTER callout processing
254-
if (DEBUG_S3_IMAGES) {
255-
const afterDiagnostics = getImageDiagnostics(workingContent);
256-
console.log(
257-
chalk.magenta(
258-
`[s3-debug] AFTER callout processing: ${afterDiagnostics.totalMatches} images (S3: ${afterDiagnostics.s3Matches})`
259-
)
260-
);
261-
}
262-
263248
const imageResult = await processAndReplaceImages(
264249
workingContent,
265250
attemptLabel
@@ -268,6 +253,11 @@ export async function processMarkdownWithRetry(
268253
savedDelta += imageResult.stats.totalSaved;
269254
warnIfS3("Image processing stage", workingContent);
270255

256+
if (rawBlocks && rawBlocks.length > 0) {
257+
workingContent = processCalloutsInMarkdown(workingContent, rawBlocks);
258+
console.log(chalk.blue(` ↳ Processed callouts in markdown content`));
259+
}
260+
271261
if (emojiMap.size > 0) {
272262
workingContent = EmojiProcessor.applyEmojiMappings(
273263
workingContent,
@@ -549,8 +539,8 @@ export async function processMarkdownWithRetry(
549539
* less robust to transient issues like regex bugs or timing problems.
550540
*
551541
* **Processing Pipeline** (single pass):
552-
* 1. Process callouts (convert Notion callouts to Docusaurus admonitions)
553-
* 2. Process and replace images (fetch S3 images and save locally)
542+
* 1. Process and replace images (fetch S3 images and save locally)
543+
* 2. Process callouts (convert Notion callouts to Docusaurus admonitions)
554544
* 3. Apply emoji mappings (replace custom emoji references)
555545
* 4. Validate and fix remaining images (final S3 URL cleanup)
556546
*
@@ -606,12 +596,6 @@ export async function processMarkdownSinglePass(
606596
chalk.gray(` ℹ️ Using single-pass processing (retry disabled)`)
607597
);
608598

609-
// Process callouts
610-
if (rawBlocks && rawBlocks.length > 0) {
611-
workingContent = processCalloutsInMarkdown(workingContent, rawBlocks);
612-
console.log(chalk.blue(` ↳ Processed callouts in markdown content`));
613-
}
614-
615599
// Process and replace images
616600
const imageResult = await processAndReplaceImages(
617601
workingContent,
@@ -620,6 +604,12 @@ export async function processMarkdownSinglePass(
620604
workingContent = imageResult.markdown;
621605
totalSaved += imageResult.stats.totalSaved;
622606

607+
// Process callouts
608+
if (rawBlocks && rawBlocks.length > 0) {
609+
workingContent = processCalloutsInMarkdown(workingContent, rawBlocks);
610+
console.log(chalk.blue(` ↳ Processed callouts in markdown content`));
611+
}
612+
623613
// Apply emoji mappings
624614
if (emojiMap.size > 0) {
625615
workingContent = EmojiProcessor.applyEmojiMappings(

0 commit comments

Comments
 (0)