Refactor translation retrieval logic for improved clarity and efficiency#31
Refactor translation retrieval logic for improved clarity and efficiency#31
Conversation
WalkthroughThese template functions tighten content validation by enforcing a meaningful word-count threshold and refactor translation selection from single-pass to two-phase sorting, enabling version-aware fallback logic that prefers newer translations whilst respecting content availability. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 34c4736160
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | ||
| {{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }} | ||
| {{ $plainWords := split $plainContent " " | len }} | ||
| {{ $hasContent = gt $plainWords 10 }} |
There was a problem hiding this comment.
Count words using whitespace-aware split
The new content check counts words with split $plainContent " ", which only splits on literal spaces. If a translation’s body is mostly separated by newlines or tabs (common for Markdown lists/short lines), this will undercount (e.g., word\nword becomes 1 “word”), causing ReadOnline to be false and the translation to be dropped even though it has real content. Consider using a whitespace-aware word count (e.g., Hugo’s countwords or normalizing \s+ to spaces before splitting) so legitimate translations aren’t incorrectly marked unusable.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This pull request refactors the translation retrieval logic to improve how the best available translation is selected for each language version of a guide. The changes streamline version sorting by building a sorted slice upfront and implement a fallback mechanism that prefers newer versions but will use older versions if they have actual online content when newer versions don't.
Changes:
- Refactored version collection to use a slice with pre-calculated sort keys for descending order sorting
- Simplified language selection algorithm to process versions newest-to-oldest with ReadOnline content preference
- Updated content validation to check for a minimum word count threshold in rendered body content
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| module/layouts/_partials/functions/get-guide-translations-list.html | Refactored to build and sort versionsWithInfo slice upfront; simplified language selection to prefer newest versions with ReadOnline content |
| module/layouts/_partials/functions/get-guide-translations-for-version.html | Changed content validation from WordCount check to manual word counting with 10-word threshold |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| {{/* Count words in Plain to ensure actual content exists, not just translated frontmatter */}} | ||
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | ||
| {{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }} | ||
| {{ $plainWords := split $plainContent " " | len }} |
There was a problem hiding this comment.
The word counting logic has been changed from using Hugo's built-in WordCount to manually splitting by spaces. The new approach split $plainContent " " | len is less reliable than the previous $page.WordCount because it doesn't properly handle multiple consecutive spaces, tabs, or other whitespace characters. This could result in inaccurate word counts. Consider using Hugo's WordCount for more reliable word counting: {{ $hasContent = gt $page.WordCount 10 }}, or if you need to count words specifically from Plain content, use a regex-based approach: {{ $plainWords := len (findRE "\\S+" $plainContent) }}.
| {{ $plainWords := split $plainContent " " | len }} | |
| {{ $plainWords := len (findRE "\\S+" $plainContent) }} |
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | ||
| {{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }} | ||
| {{ $plainWords := split $plainContent " " | len }} | ||
| {{ $hasContent = gt $plainWords 10 }} |
There was a problem hiding this comment.
The magic number 10 for the minimum word count threshold lacks justification. This makes it difficult for future maintainers to understand why this specific threshold was chosen. Consider adding a comment explaining the rationale, such as why 10 words is considered the minimum for "actual content" versus just translated frontmatter, or consider defining this as a named constant at the top of the file for better maintainability.
| {{ $hasContent = gt $plainWords 10 }} | |
| {{/* Minimum word count heuristic: >10 words is treated as real content, not just translated frontmatter or a stub. */}} | |
| {{ $minContentWordCount := 10 }} | |
| {{ $hasContent = gt $plainWords $minContentWordCount }} |
| {{/* We already have this language from a newer version */}} | ||
| {{/* Only replace if current (older) version has ReadOnline content and existing doesn't */}} |
There was a problem hiding this comment.
The comment on lines 63-64 states "We already have this language from a newer version" but doesn't clearly explain the fallback strategy. The logic allows replacing a newer version with an older version if the newer version lacks ReadOnline content. Consider clarifying the comment to explicitly state this is a fallback mechanism: "We already have this language from a newer version, but we may replace it if that newer version lacks ReadOnline content".
| {{/* We already have this language from a newer version */}} | |
| {{/* Only replace if current (older) version has ReadOnline content and existing doesn't */}} | |
| {{/* We already have this language from a newer version, but this block implements a fallback */}} | |
| {{/* Fallback: replace the newer version if this older one has ReadOnline content and the existing one doesn't */}} |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
module/layouts/_partials/functions/get-guide-translations-for-version.html (2)
74-78: JavaScript-style comments will render as literal output text.Lines 74 and 78 use
//comment syntax. Hugo templates require{{/* */}}for comments. These strings will appear verbatim in your HTML output.Proposed fix
- // Look up PDF by normalized lang + {{/* Look up PDF by normalized lang */}} {{ $langKey := lower $lang }} {{ $pdfResource := index $pdfsByLang $langKey }} - // Fallback: check the translation page's own resources + {{/* Fallback: check the translation page's own resources */}}
153-153: Sorting by a field that doesn't exist in the dict.
translationInfo(lines 135–149) never sets a"Version"key, yet line 153 attemptssort $translations "Version". This sort is effectively a no-op and may mask ordering bugs downstream.Either add
"Version" $guideVersiontotranslationInfo, or sort by an existing field like"Weight"or"Date".Proposed fix — add Version to the dict
{{ $translationInfo := dict "Language" $lang "LanguageName" $languageName "Weight" ($langConfig.Weight | default 999) "Title" $title "RelPermalink" $relPermalink "Path" $path "VersionPath" $guideVersionPage.Path "Date" $date + "Version" $guideVersion "Status" (cond (and $hasContent $hasPDF) "published" (cond $hasContent "online-only" (cond $hasPDF "pdf-only" "site-only"))) "ReadOnline" (cond $hasContent true false) "ReadOnlineStub" (cond $hasPage true false) "ReadPDF" (cond $hasPDF true false) "PathPdf" $pdfPath }}
🤖 Fix all issues with AI agents
In `@module/layouts/_partials/functions/get-guide-translations-for-version.html`:
- Around line 94-96: The current word-count logic using split on $plainContent
into $plainWords then len is brittle; replace that with Hugo's countwords
function: keep trimming ($plainContent := trim $page.Plain " \n\r\t") then
compute the word count using countwords ($wordCount := countwords $plainContent)
and set $hasContent by comparing that ($hasContent = gt $wordCount 10) instead
of using $plainWords; update references to $plainWords accordingly.
🧹 Nitpick comments (3)
module/layouts/_partials/functions/get-guide-translations-list.html (3)
39-73: Solid two-phase logic for version-aware fallback.Processing newest-to-oldest and replacing only when an older version has
ReadOnlinecontent while the newer doesn't is exactly the right approach per the PR objectives. This respects independent version translations per the learnings.One minor nit: the explicit boolean comparisons (
eq .ReadOnline true) are verbose. Hugo treats non-nil/non-false values as truthy, so.ReadOnlinesuffices.Optional: simplify boolean checks
- {{ $isUsable := or (eq .ReadOnline true) (eq .ReadPDF true) }} + {{ $isUsable := or .ReadOnline .ReadPDF }} ... - {{ if and (eq .ReadOnline true) (ne $existing.ReadOnline true) }} + {{ if and .ReadOnline (not $existing.ReadOnline) }}
77-92: Defensive fallback for missing Version field.Given that
Versionis always merged at lines 60 or 67, this extraction block should never trigger under normal operation. It's harmless as a safety net, but consider whether it masks upstream bugs—if you ever hit this path, something unexpected occurred.A debug log or comment clarifying "this should never happen" would help future maintainers.
98-98: Single-key sort may produce non-deterministic order for equal weights.Sorting only by
Weightmeans languages with identical weights will appear in arbitrary order across builds. For reproducible output, chain a secondary sort key (e.g.,LanguageName).Hugo's
sortdoesn't support multi-key directly, but you can nest sorts or use a composite key.Example: secondary sort by LanguageName
-{{ $translationsList = sort $translationsList "Weight" "asc" }} +{{/* Sort by LanguageName first, then by Weight - stable sort preserves secondary order */}} +{{ $translationsList = sort $translationsList "LanguageName" "asc" }} +{{ $translationsList = sort $translationsList "Weight" "asc" }}
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | ||
| {{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }} | ||
| {{ $plainWords := split $plainContent " " | len }} | ||
| {{ $hasContent = gt $plainWords 10 }} |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Hugo countwords function documentation
💡 Result:
The Hugo countwords function (alias: strings.CountWords / countwords) returns an integer approximate word count for a given string.
- Syntax: strings.CountWords STRING (alias: countwords).
- Returns: int (approximate number of words).
- Example: {{ "Hugo is a static site generator." | countwords }} → 6.
- Notes: It converts the input to a string and counts words; behavior is similar to the .WordCount page variable. [1][2]
Sources: [1] Hugo docs (strings.CountWords / countwords). [2] Hugo tpl strings package (CountWords).
🌐 Web query:
Hugo split function whitespace handling behaviour
💡 Result:
Short answer: Hugo's split (strings.Split) does not trim whitespace — it splits exactly on the delimiter and preserves surrounding spaces. For example, split "a, b" "," → ["a", " b"]. Use strings.Trim, strings.TrimSpace, or map/pipe to trim items after splitting (e.g. range (split . ",") | apply trim) to remove whitespace. [1][2]
Sources:
- strings.Split (Hugo functions docs). [1]
- strings.TrimSpace (Hugo string functions reference). [2]
Use countwords instead of naïve split-and-length.
The current approach—splitting on single spaces then counting array length—breaks under real-world data: consecutive spaces produce empty strings (inflating the count), and newlines don't split at all. Hugo provides countwords explicitly for this. Use it.
Fix
{{ $plainContent := trim $page.Plain " \n\r\t" }}
- {{ $plainWords := split $plainContent " " | len }}
- {{ $hasContent = gt $plainWords 10 }}
+ {{ $plainWords := countwords $plainContent }}
+ {{ $hasContent = gt $plainWords 10 }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | |
| {{ $hasContent = and (gt $page.WordCount 0) (ne $plainContent "") }} | |
| {{ $plainWords := split $plainContent " " | len }} | |
| {{ $hasContent = gt $plainWords 10 }} | |
| {{ $plainContent := trim $page.Plain " \n\r\t" }} | |
| {{ $plainWords := countwords $plainContent }} | |
| {{ $hasContent = gt $plainWords 10 }} |
🤖 Prompt for AI Agents
In `@module/layouts/_partials/functions/get-guide-translations-for-version.html`
around lines 94 - 96, The current word-count logic using split on $plainContent
into $plainWords then len is brittle; replace that with Hugo's countwords
function: keep trimming ($plainContent := trim $page.Plain " \n\r\t") then
compute the word count using countwords ($wordCount := countwords $plainContent)
and set $hasContent by comparing that ($hasContent = gt $wordCount 10) instead
of using $plainWords; update references to $plainWords accordingly.
This pull request refactors the logic for determining the best available translation for each language version of a guide. The changes improve how versions are sorted and selected, ensuring that the most usable translation (with actual content) is chosen, even if it's not the newest version. The update also simplifies and clarifies the code for maintainability.
Guide translation selection improvements:
Translation usability check refinement:
get-guide-translations-for-version.htmlis improved to ensure that only pages with more than 10 words of actual body content (not just frontmatter) are considered usable translations.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.