Commit 448df54

Merge pull request #27315 from s-kawamura-w664/patch-script
Add script for detecting bad characters.
2 parents e29d5bc + 7fde042 commit 448df54

2 files changed: +80 -0 lines changed

scripts/README.md

Lines changed: 26 additions & 0 deletions
@@ -11,6 +11,7 @@
 | `linkchecker.py` | This is a link checker for the Kubernetes documentation website. |
 | `lsync.sh` | This script checks if the English version of a page has changed since a localized page has been committed. |
 | `replace-capture.sh` | This script sets K8S_WEBSITE in your env to your docs website root or rely on this script to determine it automatically |
+| `check-ctrlcode.py` | This script finds control codes (0x00-0x1f) in text files. |

@@ -152,3 +153,28 @@ The following command checks a subdirectory:

 ./scripts/lsync.sh content/zh/docs/concepts/

+## check-ctrlcode.py
+
+This script finds control codes (0x00-0x1f) in text files.
+Such characters are displayed as illegal characters in a browser.
+
+```
+Usage: ./check-ctrlcode.py <dir> <ext>
+
+<dir> Specify the directory to check.
+<ext> Specify the extension to check.
+
+For example, run the script as follows.
+
+./check-ctrlcode.py ../content/en/ .md
+
+The output has the following format.
+
+"{0} <L{1}:{2}:{3}>: {4}"
+
+{0} : The path of the file in which a control code exists.
+{1} : The line number at which the control code exists.
+{2} : The column number at which the control code exists.
+{3} : The control code that was found.
+{4} : The text of the line in the file.
+```
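
The output format described in the README can be illustrated with a short sketch. `format_finding` is a hypothetical helper, not part of this commit, and the path and line content in the sample call are made up for illustration. It fills the five placeholders with the same format string the script uses; because the script passes the control code as UTF-8 bytes, field `{3}` shows up in Python's bytes notation.

```python
def format_finding(path, line_no, column, ctrl, text):
    """Fill the README placeholders {0}..{4} (hypothetical helper)."""
    return "{0} <L{1}:{2}:{3}>: {4}".format(path, line_no, column, ctrl, text)


# Hypothetical values: a backspace (0x08) reported at line 12, column 5.
print(format_finding("content/en/docs/example.md", 12, 5, b"\x08", "some text"))
```

A consumer of the output could split each report on the first `" <L"` and the following `">: "` to recover the path, the position fields, and the line content.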

scripts/check-ctrlcode.py

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
#!/usr/bin/env python3

import os
import sys
import re

def main():
    args = sys.argv
    if (len(args) != 3):
        print("Usage: ./check-ctrlcode.py <dir> <ext>")
        sys.exit(1)

    dirpath = args[1]
    ext = args[2]

    fullpath = os.path.abspath(dirpath)
    if (os.path.isdir(fullpath) is not True):
        print("Directory not found.")
        sys.exit(1)

    check_dir(fullpath, ext)

def check_dir(path, ext):
    for f in os.listdir(path):
        if(f[0] == "."):
            continue
        fullpath = os.path.join(path, f)
        if(os.path.isdir(fullpath)):
            check_dir(fullpath, ext)
            continue
        exts = os.path.splitext(f)
        if(exts[1] != ext):
            continue
        check_ctrlcode(fullpath)

def check_ctrlcode(filepath):
    line = 0
    with open(filepath, encoding='utf-8') as f:
        while True:
            str = f.readline()
            if(str == ""):
                break
            line = line + 1
            # check 0x00-0x1f except 0x09(HT), 0x0a(LF), 0x0d(CR)
            pattern = re.compile('[\u0000-\u0008\u000b\u000c\u000e-\u001f]')
            m = pattern.search(str)
            if(m == None):
                continue
            pos = m.end()
            ctrl = m.group().encode("utf-8")
            print("{0} <L{1}:{2}:{3}>: {4}\n".format(filepath, line, pos, ctrl, str.replace('\n','')))


main()
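
As committed, `check_ctrlcode` calls `pattern.search`, so it reports only the first control code on each line, and its character class skips HT, LF, and CR. The following is a minimal sketch of a variant that lists every match on a line. `CTRL_PATTERN` and `find_ctrlcodes` are hypothetical names introduced here for illustration and are not part of the commit. The sketch keeps the same character class and the same column convention (`m.end()`, which gives a 1-based column for a single-character match).

```python
import re

# Same character class as the committed script:
# 0x00-0x1f except 0x09 (HT), 0x0a (LF), 0x0d (CR).
CTRL_PATTERN = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def find_ctrlcodes(text):
    """Return (column, code point) for every control code in one line.

    Hypothetical helper. Columns are 1-based, like m.end() in the script.
    """
    return [(m.end(), ord(m.group())) for m in CTRL_PATTERN.finditer(text)]


# A backspace at column 2 and a unit separator at column 6; the tab and
# the trailing newline are not reported.
for column, code in find_ctrlcodes("a\x08b\tc\x1f\n"):
    print("column {0}: 0x{1:02x}".format(column, code))
```

Compiling the pattern once at module level also avoids calling `re.compile` for every line, although Python's `re` module caches compiled patterns, so the difference is mostly one of readability.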
