
Commit a3b7cfd

Add in-place substitution option for linkchecker.py (#41983)
* Add in-place replacement option for linkchecker.py
  Add a new flag '-w' to enable an experimental in-place replacement for Markdown links only.
* Apply suggestions from code review
  Use formatted string literals instead of simple concatenation.
  Co-authored-by: Matt Boersma <[email protected]>
* Remove other paths that should not be changed.
* Add more logic to remove paths that start with http or paths that are already linking to the localized page (i.e. start with '/<language-code>').
* Apply suggestions from code review
  Simplify expressions.
  Co-authored-by: Matt Boersma <[email protected]>
* Avoid updating pages in English.
* Fix syntax error in set comprehension
* Expand on documentation for new -w flag
* Update documentation for linkchecker.py in README
* Add a blurb with information about the new -w switch that describes what it does and what is the purpose of adding this behaviour change.
* Update the previously existing description to match the currently available script flags.

---------

Co-authored-by: Matt Boersma <[email protected]>
1 parent 1c28e2d commit a3b7cfd
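
The filtering and substitution steps described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from `linkchecker.py`: the helper names `localizable_targets` and `localize_line`, the sample targets, and the `zh-cn` language code are all assumptions made for the example.

```python
LANG = "zh-cn"  # assumed language code, chosen only for this sketch


def localizable_targets(targets, lang=LANG):
    """Deduplicate link targets and keep only those worth rewriting.

    Targets starting with "http" or already starting with "/<lang>"
    are dropped, mirroring the filter described in the commit message.
    """
    return {t for t in targets
            if not t.startswith("http") and not t.startswith(f"/{lang}")}


def localize_line(line, targets, lang=LANG):
    """Rewrite Markdown link destinations "(target)" to "(/<lang>target)"."""
    for target in targets:
        # Matching the parenthesized form limits the rewrite to
        # Markdown inline links such as [text](/docs/...).
        line = line.replace(f"({target})", f"(/{lang}{target})")
    return line


# Hypothetical targets collected from one localized page.
flagged = ["/docs/concepts/", "https://example.com/",
           "/zh-cn/docs/home/", "/docs/concepts/"]
targets = localizable_targets(flagged)

sample = "See [Concepts](/docs/concepts/) and [Home](/zh-cn/docs/home/).\n"
print(localize_line(sample, targets))
```

Because the replacement matches the literal `(target)` text, a link written in another form (for example inside a shortcode or raw HTML) would be left untouched in this sketch, which is consistent with the commit's note that the feature works for Markdown links only.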

2 files changed: +47 -16 lines changed

scripts/README.md

Lines changed: 17 additions & 14 deletions
@@ -7,12 +7,12 @@
 | `test_examples.sh` | This script tests whether a change affects example files bundled in the website. |
 | `check-headers-file.sh` | This script checks the headers if you are in a production environment. |
 | `diff_l10n_branches.py` | This script generates a report of outdated contents in `content/<l10n-lang>` directory by comparing two l10n team milestone branches. |
-| `hash-files.sh` | This script emits as hash for the files listed in $@ |
-| `linkchecker.py` | This a link checker for Kubernetes documentation website. |
-| `lsync.sh` | This script checks if the English version of a page has changed since a localized page has been committed. |
-| `replace-capture.sh` | This script sets K8S_WEBSITE in your env to your docs website root or rely on this script to determine it automatically |
-| `check-ctrlcode.py` | This script finds control-code(0x00-0x1f) in text files. |
-| `ja/verify-spelling.sh` | This script finds Japanese words that are against the guideline. |
+| `hash-files.sh` | This script emits as hash for the files listed in $@ |
+| `linkchecker.py` | This a link checker for Kubernetes documentation website. |
+| `lsync.sh` | This script checks if the English version of a page has changed since a localized page has been committed. |
+| `replace-capture.sh` | This script sets K8S_WEBSITE in your env to your docs website root or rely on this script to determine it automatically |
+| `check-ctrlcode.py` | This script finds control-code(0x00-0x1f) in text files. |
+| `ja/verify-spelling.sh` | This script finds Japanese words that are against the guideline. |
 
 
 
@@ -104,14 +104,17 @@ This script emits as hash for the files listed in $@.
 ## linkchecker.py
 
 This a link checker for Kubernetes documentation website.
-- We cover the following cases for the language you provide via `-l`, which
-  defaults to 'en'.
-- If the language specified is not English (`en`), we check if you are
-  actually using the localized links. For example, if you specify `zh` as
-  the language, and for link target `/docs/foo/bar`, we check if the English
-  version exists AND if the Chinese version exists as well. A checking record
-  is produced if the link can use the localized version.
-
+- If the language for the files scanned is not English (`en`), we check if you
+  are actually using the localized links. For example, if you specify a filter
+  similar to as `content/zh-cn/docs/**/*.md`, we check if the English version
+  exists AND if the Chinese version exists as well. A checking record is
+  produced if the link can use the localized version.
+- If the language specified is not English (`en`), a checking record is produced,
+  and the `-w` switch is used, the script will perform in-place substitutions
+  for links that have the format `/docs` and currently have a localized version
+  available. This is an experimental feature and aims to reduce the amount of
+  work required to update links to point to localized content. It currently
+  works for Markdown files only.
 ```

 Usage: linkchecker.py -h

scripts/linkchecker.py

Lines changed: 30 additions & 2 deletions
@@ -328,6 +328,7 @@ def check_target(page, anchor, target):
             return None
         msg = ("Localized page detected, please append '/%s' to the target"
                % LANG)
+
         return new_record("ERROR", msg, target)

     # taget might be a redirect entry
@@ -390,7 +391,7 @@ def check_apiref_target(target, anchor):
                       target+"#"+anchor)


-def validate_links(page):
+def validate_links(page, in_place_edit):
     """Find and validate links on a content page.

     The checking records are consolidated into the global variable RESULT.
@@ -410,10 +411,34 @@ def validate_links(page):

     matches = regex.findall(content)
     records = []
+    target_records = []
     for m in matches:
         r = check_target(page, m[0], m[1])
         if r:
             records.append(r)
+            target_records.append(m[1])
+
+    # if multiple records are the same they need not be checked repeatedly
+    # remove paths that are not relative too
+    target_records = {item for item in target_records
+                      if not item.startswith("http") and
+                      not item.startswith(f"/{LANG}")}
+
+    # English-language pages don't have "en" in their path
+    if in_place_edit and target_records and LANG != "en":
+        updated_data = []
+        for line in data:
+            if any(rec in line for rec in target_records):
+                for rec in target_records:
+                    line = line.replace(
+                        f"({rec})",
+                        # assumes unlocalized links are in "/docs/..." format
+                        f"(/{LANG}{rec})")
+            updated_data.append(line)
+
+        with open(page, "w") as f:
+            for line in updated_data:
+                f.write(line)

     # searches for pattern: {{< api-reference page="" anchor=""
     apiref_re = r"{{ *< *api-reference page=\"([^\"]*?)\" *anchor=\"(.*?)\""
@@ -455,6 +480,9 @@ def parse_arguments():
                         metavar="<FILTER>",
                         help=("File pattern to scan. "
                               "(default='content/en/docs/**/*.md')"))
+    PARSER.add_argument("-w", dest="in_place_edit", action="store_true",
+                        help="[EXPERIMENTAL] Turns on in-place replacement "
+                             "for localized content.")

     return PARSER.parse_args()

@@ -500,7 +528,7 @@ def main():

     folders = [f for f in glob.glob(ARGS.filter, recursive=True)]
     for page in folders:
-        validate_links(page)
+        validate_links(page, ARGS.in_place_edit)

     dump_result()


0 commit comments

Comments
 (0)