Skip to content

Commit 5779dc4

Browse files
committed
Merge #13041: build: Add linter checking for accidental introduction of locale dependence
698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift) 0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift) Pull request description: This linter will check for code accidentally introducing locale dependencies. Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible. Context: bitcoin/bitcoin#12881 (comment) Example output: ``` $ contrib/devtools/lint-locale-dependence.sh The locale dependent function tolower(...) appears to be used: src/init.cpp: if (s[0] == '0' && std::tolower(s[1]) == 'x') { Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. Please avoid using locale dependent functions if possible. Advice not applicable in this specific case? Add an exception by updating the ignore list in contrib/devtools/lint-locale-dependence.sh ``` **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed? Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
2 parents e4082d5 + 698cfd0 commit 5779dc4

File tree

2 files changed

+258
-1
lines changed

2 files changed

+258
-1
lines changed

doc/developer-notes.md

Lines changed: 29 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -499,7 +499,35 @@ Strings and formatting
499499
500500
- Use `ParseInt32`, `ParseInt64`, `ParseUInt32`, `ParseUInt64`, `ParseDouble` from `utilstrencodings.h` for number parsing
501501
502-
- *Rationale*: These functions do overflow checking, and avoid pesky locale issues
502+
- *Rationale*: These functions do overflow checking, and avoid pesky locale issues.
503+
504+
- Avoid using locale dependent functions if possible. You can use the provided
505+
[`lint-locale-dependence.sh`](/contrib/devtools/lint-locale-dependence.sh)
506+
to check for accidental use of locale dependent functions.
507+
508+
- *Rationale*: Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix.
509+
510+
- These functions are known to be locale dependent:
511+
`alphasort`, `asctime`, `asprintf`, `atof`, `atoi`, `atol`, `atoll`, `atoq`,
512+
`btowc`, `ctime`, `dprintf`, `fgetwc`, `fgetws`, `fprintf`, `fputwc`,
513+
`fputws`, `fscanf`, `fwprintf`, `getdate`, `getwc`, `getwchar`, `isalnum`,
514+
`isalpha`, `isblank`, `iscntrl`, `isdigit`, `isgraph`, `islower`, `isprint`,
515+
`ispunct`, `isspace`, `isupper`, `iswalnum`, `iswalpha`, `iswblank`,
516+
`iswcntrl`, `iswctype`, `iswdigit`, `iswgraph`, `iswlower`, `iswprint`,
517+
`iswpunct`, `iswspace`, `iswupper`, `iswxdigit`, `isxdigit`, `mblen`,
518+
`mbrlen`, `mbrtowc`, `mbsinit`, `mbsnrtowcs`, `mbsrtowcs`, `mbstowcs`,
519+
`mbtowc`, `mktime`, `putwc`, `putwchar`, `scanf`, `snprintf`, `sprintf`,
520+
`sscanf`, `stoi`, `stol`, `stoll`, `strcasecmp`, `strcasestr`, `strcoll`,
521+
`strfmon`, `strftime`, `strncasecmp`, `strptime`, `strtod`, `strtof`,
522+
`strtoimax`, `strtol`, `strtold`, `strtoll`, `strtoq`, `strtoul`,
523+
`strtoull`, `strtoumax`, `strtouq`, `strxfrm`, `swprintf`, `tolower`,
524+
`toupper`, `towctrans`, `towlower`, `towupper`, `ungetwc`, `vasprintf`,
525+
`vdprintf`, `versionsort`, `vfprintf`, `vfscanf`, `vfwprintf`, `vprintf`,
526+
`vscanf`, `vsnprintf`, `vsprintf`, `vsscanf`, `vswprintf`, `vwprintf`,
527+
`wcrtomb`, `wcscasecmp`, `wcscoll`, `wcsftime`, `wcsncasecmp`, `wcsnrtombs`,
528+
`wcsrtombs`, `wcstod`, `wcstof`, `wcstoimax`, `wcstol`, `wcstold`,
529+
`wcstoll`, `wcstombs`, `wcstoul`, `wcstoull`, `wcstoumax`, `wcswidth`,
530+
`wcsxfrm`, `wctob`, `wctomb`, `wctrans`, `wctype`, `wcwidth`, `wprintf`
503531
504532
- For `strprintf`, `LogPrint`, `LogPrintf` formatting characters don't need size specifiers
505533

test/lint/lint-locale-dependence.sh

Lines changed: 229 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,229 @@
1+
#!/bin/bash
2+
3+
KNOWN_VIOLATIONS=(
4+
"src/base58.cpp:.*isspace"
5+
"src/bitcoin-tx.cpp.*stoul"
6+
"src/bitcoin-tx.cpp.*trim_right"
7+
"src/bitcoin-tx.cpp:.*atoi"
8+
"src/core_read.cpp.*is_digit"
9+
"src/dbwrapper.cpp.*stoul"
10+
"src/dbwrapper.cpp:.*vsnprintf"
11+
"src/httprpc.cpp.*trim"
12+
"src/init.cpp:.*atoi"
13+
"src/netbase.cpp.*to_lower"
14+
"src/qt/rpcconsole.cpp:.*atoi"
15+
"src/qt/rpcconsole.cpp:.*isdigit"
16+
"src/rest.cpp:.*strtol"
17+
"src/rpc/server.cpp.*to_upper"
18+
"src/test/dbwrapper_tests.cpp:.*snprintf"
19+
"src/test/getarg_tests.cpp.*split"
20+
"src/torcontrol.cpp:.*atoi"
21+
"src/torcontrol.cpp:.*strtol"
22+
"src/uint256.cpp:.*isspace"
23+
"src/uint256.cpp:.*tolower"
24+
"src/util.cpp:.*atoi"
25+
"src/util.cpp:.*fprintf"
26+
"src/util.cpp:.*tolower"
27+
"src/utilmoneystr.cpp:.*isdigit"
28+
"src/utilmoneystr.cpp:.*isspace"
29+
"src/utilstrencodings.cpp:.*atoi"
30+
"src/utilstrencodings.cpp:.*isspace"
31+
"src/utilstrencodings.cpp:.*strtol"
32+
"src/utilstrencodings.cpp:.*strtoll"
33+
"src/utilstrencodings.cpp:.*strtoul"
34+
"src/utilstrencodings.cpp:.*strtoull"
35+
"src/utilstrencodings.h:.*atoi"
36+
)
37+
38+
REGEXP_IGNORE_EXTERNAL_DEPENDENCIES="^src/(crypto/ctaes/|leveldb/|secp256k1/|tinyformat.h|univalue/)"
39+
40+
LOCALE_DEPENDENT_FUNCTIONS=(
41+
alphasort # LC_COLLATE (via strcoll)
42+
asctime # LC_TIME (directly)
43+
asprintf # (via vasprintf)
44+
atof # LC_NUMERIC (via strtod)
45+
atoi # LC_NUMERIC (via strtol)
46+
atol # LC_NUMERIC (via strtol)
47+
atoll # (via strtoll)
48+
atoq
49+
btowc # LC_CTYPE (directly)
50+
ctime # (via asctime or localtime)
51+
dprintf # (via vdprintf)
52+
fgetwc
53+
fgetws
54+
fold_case # boost::locale::fold_case
55+
fprintf # (via vfprintf)
56+
fputwc
57+
fputws
58+
fscanf # (via __vfscanf)
59+
fwprintf # (via __vfwprintf)
60+
getdate # via __getdate_r => isspace // __localtime_r
61+
getwc
62+
getwchar
63+
is_digit # boost::algorithm::is_digit
64+
is_space # boost::algorithm::is_space
65+
isalnum # LC_CTYPE
66+
isalpha # LC_CTYPE
67+
isblank # LC_CTYPE
68+
iscntrl # LC_CTYPE
69+
isctype # LC_CTYPE
70+
isdigit # LC_CTYPE
71+
isgraph # LC_CTYPE
72+
islower # LC_CTYPE
73+
isprint # LC_CTYPE
74+
ispunct # LC_CTYPE
75+
isspace # LC_CTYPE
76+
isupper # LC_CTYPE
77+
iswalnum # LC_CTYPE
78+
iswalpha # LC_CTYPE
79+
iswblank # LC_CTYPE
80+
iswcntrl # LC_CTYPE
81+
iswctype # LC_CTYPE
82+
iswdigit # LC_CTYPE
83+
iswgraph # LC_CTYPE
84+
iswlower # LC_CTYPE
85+
iswprint # LC_CTYPE
86+
iswpunct # LC_CTYPE
87+
iswspace # LC_CTYPE
88+
iswupper # LC_CTYPE
89+
iswxdigit # LC_CTYPE
90+
isxdigit # LC_CTYPE
91+
localeconv # LC_NUMERIC + LC_MONETARY
92+
mblen # LC_CTYPE
93+
mbrlen
94+
mbrtowc
95+
mbsinit
96+
mbsnrtowcs
97+
mbsrtowcs
98+
mbstowcs # LC_CTYPE
99+
mbtowc # LC_CTYPE
100+
mktime
101+
normalize # boost::locale::normalize
102+
# printf # LC_NUMERIC
103+
putwc
104+
putwchar
105+
scanf # LC_NUMERIC
106+
setlocale
107+
snprintf
108+
sprintf
109+
sscanf
110+
stod
111+
stof
112+
stoi
113+
stol
114+
stold
115+
stoll
116+
stoul
117+
stoull
118+
strcasecmp
119+
strcasestr
120+
strcoll # LC_COLLATE
121+
# strerror
122+
strfmon
123+
strftime # LC_TIME
124+
strncasecmp
125+
strptime
126+
strtod # LC_NUMERIC
127+
strtof
128+
strtoimax
129+
strtol # LC_NUMERIC
130+
strtold
131+
strtoll
132+
strtoq
133+
strtoul # LC_NUMERIC
134+
strtoull
135+
strtoumax
136+
strtouq
137+
strxfrm # LC_COLLATE
138+
swprintf
139+
to_lower # boost::locale::to_lower
140+
to_title # boost::locale::to_title
141+
to_upper # boost::locale::to_upper
142+
tolower # LC_CTYPE
143+
toupper # LC_CTYPE
144+
towctrans
145+
towlower # LC_CTYPE
146+
towupper # LC_CTYPE
147+
trim # boost::algorithm::trim
148+
trim_left # boost::algorithm::trim_left
149+
trim_right # boost::algorithm::trim_right
150+
ungetwc
151+
vasprintf
152+
vdprintf
153+
versionsort
154+
vfprintf
155+
vfscanf
156+
vfwprintf
157+
vprintf
158+
vscanf
159+
vsnprintf
160+
vsprintf
161+
vsscanf
162+
vswprintf
163+
vwprintf
164+
wcrtomb
165+
wcscasecmp
166+
wcscoll # LC_COLLATE
167+
wcsftime # LC_TIME
168+
wcsncasecmp
169+
wcsnrtombs
170+
wcsrtombs
171+
wcstod # LC_NUMERIC
172+
wcstof
173+
wcstoimax
174+
wcstol # LC_NUMERIC
175+
wcstold
176+
wcstoll
177+
wcstombs # LC_CTYPE
178+
wcstoul # LC_NUMERIC
179+
wcstoull
180+
wcstoumax
181+
wcswidth
182+
wcsxfrm # LC_COLLATE
183+
wctob
184+
wctomb # LC_CTYPE
185+
wctrans
186+
wctype
187+
wcwidth
188+
wprintf
189+
)
190+
191+
function join_array {
192+
local IFS="$1"
193+
shift
194+
echo "$*"
195+
}
196+
197+
REGEXP_IGNORE_KNOWN_VIOLATIONS=$(join_array "|" "${KNOWN_VIOLATIONS[@]}")
198+
199+
# Invoke "git grep" only once in order to minimize run-time
200+
REGEXP_LOCALE_DEPENDENT_FUNCTIONS=$(join_array "|" "${LOCALE_DEPENDENT_FUNCTIONS[@]}")
201+
GIT_GREP_OUTPUT=$(git grep -E "[^a-zA-Z0-9_\`'\"<>](${REGEXP_LOCALE_DEPENDENT_FUNCTIONS}(|_r|_s))[^a-zA-Z0-9_\`'\"<>]" -- "*.cpp" "*.h")
202+
203+
EXIT_CODE=0
204+
for LOCALE_DEPENDENT_FUNCTION in "${LOCALE_DEPENDENT_FUNCTIONS[@]}"; do
205+
MATCHES=$(grep -E "[^a-zA-Z0-9_\`'\"<>]${LOCALE_DEPENDENT_FUNCTION}(|_r|_s)[^a-zA-Z0-9_\`'\"<>]" <<< "${GIT_GREP_OUTPUT}" | \
206+
grep -vE "\.(c|cpp|h):\s*(//|\*|/\*|\").*${LOCALE_DEPENDENT_FUNCTION}" | \
207+
grep -vE 'fprintf\(.*(stdout|stderr)')
208+
if [[ ${REGEXP_IGNORE_EXTERNAL_DEPENDENCIES} != "" ]]; then
209+
MATCHES=$(grep -vE "${REGEXP_IGNORE_EXTERNAL_DEPENDENCIES}" <<< "${MATCHES}")
210+
fi
211+
if [[ ${REGEXP_IGNORE_KNOWN_VIOLATIONS} != "" ]]; then
212+
MATCHES=$(grep -vE "${REGEXP_IGNORE_KNOWN_VIOLATIONS}" <<< "${MATCHES}")
213+
fi
214+
if [[ ${MATCHES} != "" ]]; then
215+
echo "The locale dependent function ${LOCALE_DEPENDENT_FUNCTION}(...) appears to be used:"
216+
echo "${MATCHES}"
217+
echo
218+
EXIT_CODE=1
219+
fi
220+
done
221+
if [[ ${EXIT_CODE} != 0 ]]; then
222+
echo "Unnecessary locale dependence can cause bugs that are very"
223+
echo "tricky to isolate and fix. Please avoid using locale dependent"
224+
echo "functions if possible."
225+
echo
226+
echo "Advice not applicable in this specific case? Add an exception"
227+
echo "by updating the ignore list in $0"
228+
fi
229+
exit ${EXIT_CODE}

0 commit comments

Comments
 (0)