Skip to content

Refactor translation retrieval logic for improved clarity and efficiency#31

Merged
MrHinsh merged 1 commit intomainfrom
update-language-detector
Jan 22, 2026
Merged

Refactor translation retrieval logic for improved clarity and efficiency#31
MrHinsh merged 1 commit intomainfrom
update-language-detector

Conversation

@MrHinsh
Copy link
Member

@MrHinsh MrHinsh commented Jan 22, 2026

This pull request refactors the logic for determining the best available translation for each language version of a guide. The changes improve how versions are sorted and selected, ensuring that the most usable translation (with actual content) is chosen, even if it's not the newest version. The update also simplifies and clarifies the code for maintainability.

Guide translation selection improvements:

  • Versions are now parsed and sorted in descending order by year and month, ensuring the newest versions are processed first.
  • When multiple versions exist for a language, the logic now prefers the newest version with actual content; if a newer version lacks content but an older one has it, the older version is selected.
  • The process for collecting and sorting version information is streamlined by building a slice with version metadata, replacing the previous map-based approach.

Translation usability check refinement:

  • The content check in get-guide-translations-for-version.html is improved to ensure that only pages with more than 10 words of actual body content (not just frontmatter) are considered usable translations.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced guide content validation to ensure only guides with substantive content are displayed, improving overall content quality.
    • Improved translation selection logic to intelligently prioritise recent guide versions for each language whilst maintaining appropriate fallback options.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 22, 2026 17:49
@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

These template functions tighten content validation by enforcing a meaningful word-count threshold and refactor translation selection from single-pass to two-phase sorting, enabling version-aware fallback logic that prefers newer translations whilst respecting content availability.

Changes

Cohort / File(s) Summary
Content Existence Validation
module/layouts/_partials/functions/get-guide-translations-for-version.html
Raises the bar for content validation: replaces bare existence checks with a word-count threshold (plainWords > 10), ensuring only genuinely substantive content passes. Comments updated to reflect this stricter semantic check.
Translation Selection Algorithm
module/layouts/_partials/functions/get-guide-translations-list.html
Restructures translation selection into a two-phase approach: collects versioned translations with sortable keys (YYYYMM), sorts descending by version, then iterates to build per-language best-candidate map with intelligent fallback (newer preferred, older usable versions accepted when needed). Final consolidation includes Version inference and deterministic sorting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

📖✨ Ten words now matter, not zero—
Two passes sort what once was flat,
Old versions stand ready, heroes,
When newer ones falter at that.
Translation truth, version-aware.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: refactoring translation retrieval logic with improved clarity and efficiency. It directly matches the PR objectives and the substantial changes to both translation-related files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch update-language-detector

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@MrHinsh MrHinsh added this pull request to the merge queue Jan 22, 2026
Merged via the queue into main with commit 55eab59 Jan 22, 2026
9 of 10 checks passed
@MrHinsh MrHinsh deleted the update-language-detector branch January 22, 2026 17:51
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34c4736160

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 94 to +96
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }}
{{ $plainWords := split $plainContent " " | len }}
{{ $hasContent = gt $plainWords 10 }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count words using whitespace-aware split

The new content check counts words with split $plainContent " ", which only splits on literal spaces. If a translation’s body is mostly separated by newlines or tabs (common for Markdown lists/short lines), this will undercount (e.g., word\nword becomes 1 “word”), causing ReadOnline to be false and the translation to be dropped even though it has real content. Consider using a whitespace-aware word count (e.g., Hugo’s countwords or normalizing \s+ to spaces before splitting) so legitimate translations aren’t incorrectly marked unusable.

Useful? React with 👍 / 👎.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the translation retrieval logic to improve how the best available translation is selected for each language version of a guide. The changes streamline version sorting by building a sorted slice upfront and implement a fallback mechanism that prefers newer versions but will use older versions if they have actual online content when newer versions don't.

Changes:

  • Refactored version collection to use a slice with pre-calculated sort keys for descending order sorting
  • Simplified language selection algorithm to process versions newest-to-oldest with ReadOnline content preference
  • Updated content validation to check for a minimum word count threshold in rendered body content

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
module/layouts/_partials/functions/get-guide-translations-list.html Refactored to build and sort versionsWithInfo slice upfront; simplified language selection to prefer newest versions with ReadOnline content
module/layouts/_partials/functions/get-guide-translations-for-version.html Changed content validation from WordCount check to manual word counting with 10-word threshold

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

{{/* Count words in Plain to ensure actual content exists, not just translated frontmatter */}}
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }}
{{ $plainWords := split $plainContent " " | len }}
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word counting logic has been changed from using Hugo's built-in WordCount to manually splitting by spaces. The new approach split $plainContent " " | len is less reliable than the previous $page.WordCount because it doesn't properly handle multiple consecutive spaces, tabs, or other whitespace characters. This could result in inaccurate word counts. Consider using Hugo's WordCount for more reliable word counting: {{ $hasContent = gt $page.WordCount 10 }}, or if you need to count words specifically from Plain content, use a regex-based approach: {{ $plainWords := len (findRE "\\S+" $plainContent) }}.

Suggested change
{{ $plainWords := split $plainContent " " | len }}
{{ $plainWords := len (findRE "\\S+" $plainContent) }}

Copilot uses AI. Check for mistakes.
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }}
{{ $plainWords := split $plainContent " " | len }}
{{ $hasContent = gt $plainWords 10 }}
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number 10 for the minimum word count threshold lacks justification. This makes it difficult for future maintainers to understand why this specific threshold was chosen. Consider adding a comment explaining the rationale, such as why 10 words is considered the minimum for "actual content" versus just translated frontmatter, or consider defining this as a named constant at the top of the file for better maintainability.

Suggested change
{{ $hasContent = gt $plainWords 10 }}
{{/* Minimum word count heuristic: >10 words is treated as real content, not just translated frontmatter or a stub. */}}
{{ $minContentWordCount := 10 }}
{{ $hasContent = gt $plainWords $minContentWordCount }}

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +64
{{/* We already have this language from a newer version */}}
{{/* Only replace if current (older) version has ReadOnline content and existing doesn't */}}
Copy link

Copilot AI Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on lines 63-64 states "We already have this language from a newer version" but doesn't clearly explain the fallback strategy. The logic allows replacing a newer version with an older version if the newer version lacks ReadOnline content. Consider clarifying the comment to explicitly state this is a fallback mechanism: "We already have this language from a newer version, but we may replace it if that newer version lacks ReadOnline content".

Suggested change
{{/* We already have this language from a newer version */}}
{{/* Only replace if current (older) version has ReadOnline content and existing doesn't */}}
{{/* We already have this language from a newer version, but this block implements a fallback */}}
{{/* Fallback: replace the newer version if this older one has ReadOnline content and the existing one doesn't */}}

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
module/layouts/_partials/functions/get-guide-translations-for-version.html (2)

74-78: JavaScript-style comments will render as literal output text.

Lines 74 and 78 use // comment syntax. Hugo templates require {{/* */}} for comments. These strings will appear verbatim in your HTML output.

Proposed fix
-  // Look up PDF by normalized lang
+  {{/* Look up PDF by normalized lang */}}
   {{ $langKey := lower $lang }}
   {{ $pdfResource := index $pdfsByLang $langKey }}

-  // Fallback: check the translation page's own resources
+  {{/* Fallback: check the translation page's own resources */}}

153-153: Sorting by a field that doesn't exist in the dict.

translationInfo (lines 135–149) never sets a "Version" key, yet line 153 attempts sort $translations "Version". This sort is effectively a no-op and may mask ordering bugs downstream.

Either add "Version" $guideVersion to translationInfo, or sort by an existing field like "Weight" or "Date".

Proposed fix — add Version to the dict
   {{ $translationInfo := dict
     "Language" $lang
     "LanguageName" $languageName
     "Weight" ($langConfig.Weight | default 999)
     "Title" $title
     "RelPermalink" $relPermalink
     "Path" $path
     "VersionPath" $guideVersionPage.Path
     "Date" $date
+    "Version" $guideVersion
     "Status" (cond (and $hasContent $hasPDF) "published" (cond $hasContent "online-only" (cond $hasPDF "pdf-only" "site-only")))
     "ReadOnline" (cond $hasContent true false)
     "ReadOnlineStub" (cond $hasPage true false)
     "ReadPDF" (cond $hasPDF true false)
     "PathPdf" $pdfPath
   }}
🤖 Fix all issues with AI agents
In `@module/layouts/_partials/functions/get-guide-translations-for-version.html`:
- Around line 94-96: The current word-count logic using split on $plainContent
into $plainWords then len is brittle; replace that with Hugo's countwords
function: keep trimming ($plainContent := trim $page.Plain " \n\r\t") then
compute the word count using countwords ($wordCount := countwords $plainContent)
and set $hasContent by comparing that ($hasContent = gt $wordCount 10) instead
of using $plainWords; update references to $plainWords accordingly.
🧹 Nitpick comments (3)
module/layouts/_partials/functions/get-guide-translations-list.html (3)

39-73: Solid two-phase logic for version-aware fallback.

Processing newest-to-oldest and replacing only when an older version has ReadOnline content while the newer doesn't is exactly the right approach per the PR objectives. This respects independent version translations per the learnings.

One minor nit: the explicit boolean comparisons (eq .ReadOnline true) are verbose. Hugo treats non-nil/non-false values as truthy, so .ReadOnline suffices.

Optional: simplify boolean checks
-    {{ $isUsable := or (eq .ReadOnline true) (eq .ReadPDF true) }}
+    {{ $isUsable := or .ReadOnline .ReadPDF }}
     ...
-        {{ if and (eq .ReadOnline true) (ne $existing.ReadOnline true) }}
+        {{ if and .ReadOnline (not $existing.ReadOnline) }}

77-92: Defensive fallback for missing Version field.

Given that Version is always merged at lines 60 or 67, this extraction block should never trigger under normal operation. It's harmless as a safety net, but consider whether it masks upstream bugs—if you ever hit this path, something unexpected occurred.

A debug log or comment clarifying "this should never happen" would help future maintainers.


98-98: Single-key sort may produce non-deterministic order for equal weights.

Sorting only by Weight means languages with identical weights will appear in arbitrary order across builds. For reproducible output, chain a secondary sort key (e.g., LanguageName).

Hugo's sort doesn't support multi-key directly, but you can nest sorts or use a composite key.

Example: secondary sort by LanguageName
-{{ $translationsList = sort $translationsList "Weight" "asc" }}
+{{/* Sort by LanguageName first, then by Weight - stable sort preserves secondary order */}}
+{{ $translationsList = sort $translationsList "LanguageName" "asc" }}
+{{ $translationsList = sort $translationsList "Weight" "asc" }}

Comment on lines 94 to +96
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }}
{{ $plainWords := split $plainContent " " | len }}
{{ $hasContent = gt $plainWords 10 }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Hugo countwords function documentation

💡 Result:

The Hugo countwords function (alias: strings.CountWords / countwords) returns an integer approximate word count for a given string.

  • Syntax: strings.CountWords STRING (alias: countwords).
  • Returns: int (approximate number of words).
  • Example: {{ "Hugo is a static site generator." | countwords }} → 6.
  • Notes: It converts the input to a string and counts words; behavior is similar to the .WordCount page variable. [1][2]

Sources: [1] Hugo docs (strings.CountWords / countwords). [2] Hugo tpl strings package (CountWords).


🌐 Web query:

Hugo split function whitespace handling behaviour

💡 Result:

Short answer: Hugo's split (strings.Split) does not trim whitespace — it splits exactly on the delimiter and preserves surrounding spaces. For example, split "a, b" "," → ["a", " b"]. Use strings.Trim, strings.TrimSpace, or map/pipe to trim items after splitting (e.g. range (split . ",") | apply trim) to remove whitespace. [1][2]

Sources:

  • strings.Split (Hugo functions docs). [1]
  • strings.TrimSpace (Hugo string functions reference). [2]

Use countwords instead of naïve split-and-length.

The current approach—splitting on single spaces then counting array length—breaks under real-world data: consecutive spaces produce empty strings (inflating the count), and newlines don't split at all. Hugo provides countwords explicitly for this. Use it.

Fix
    {{ $plainContent := trim $page.Plain " \n\r\t" }}
-    {{ $plainWords := split $plainContent " " | len }}
-    {{ $hasContent = gt $plainWords 10 }}
+    {{ $plainWords := countwords $plainContent }}
+    {{ $hasContent = gt $plainWords 10 }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }}
{{ $plainWords := split $plainContent " " | len }}
{{ $hasContent = gt $plainWords 10 }}
{{ $plainContent := trim $page.Plain " \n\r\t" }}
{{ $plainWords := countwords $plainContent }}
{{ $hasContent = gt $plainWords 10 }}
🤖 Prompt for AI Agents
In `@module/layouts/_partials/functions/get-guide-translations-for-version.html`
around lines 94 - 96, The current word-count logic using split on $plainContent
into $plainWords then len is brittle; replace that with Hugo's countwords
function: keep trimming ($plainContent := trim $page.Plain " \n\r\t") then
compute the word count using countwords ($wordCount := countwords $plainContent)
and set $hasContent by comparing that ($hasContent = gt $wordCount 10) instead
of using $plainWords; update references to $plainWords accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant