Commit 4dbe2db

[clang][analyzer] Improved documentation for TaintPropagation Checker
The usage of the taint analysis is described through a command injection attack example. It is explained how to make a variable sanitized through configuration.

Differential Revision: https://reviews.llvm.org/D145229
1 parent 3d83912 commit 4dbe2db

1 file changed

+215
-35
lines changed

clang/docs/analyzer/checkers.rst

Lines changed: 215 additions & 35 deletions
@@ -2359,64 +2359,244 @@ pointer. These functions include: getenv, localeconv, asctime, setlocale, strerr
 alpha.security.taint
 ^^^^^^^^^^^^^^^^^^^^

-Checkers implementing `taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.
+Checkers implementing
+`taint analysis <https://en.wikipedia.org/wiki/Taint_checking>`_.

 .. _alpha-security-taint-TaintPropagation:

 alpha.security.taint.TaintPropagation (C, C++)
 """"""""""""""""""""""""""""""""""""""""""""""

-Taint analysis identifies untrusted sources of information (taint sources), rules as to how the untrusted data flows along the execution path (propagation rules), and points of execution where the use of tainted data is risky (taints sinks).
+Taint analysis identifies potential security vulnerabilities where an
+attacker can inject malicious data into the program to execute an attack
+(privilege escalation, command injection, SQL injection, etc.).
+
+The malicious data is injected at the taint source (e.g. a ``getenv()`` call),
+then propagated through function calls, and finally used as an argument of a
+sensitive operation, also called a taint sink (e.g. a ``system()`` call).
+
+One can defend against this type of vulnerability by always checking and
+sanitizing the potentially malicious, untrusted user input.
+
+The goal of the checker is to discover these potential taint source-sink pairs
+and show them to the user together with the propagation call chain.
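
As a minimal sketch of the source, propagation and sink roles described above, the flow might look like the following. The function names `build_editor_command` and `print_editor_version` and the use of the `EDITOR` environment variable are illustrative assumptions; they are not part of the checker or of the documentation being changed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Propagation step (hypothetical helper): copies the possibly untrusted
   value into a command line. Returns 0 on success, -1 if it does not fit. */
int build_editor_command(const char *editor, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "%s --version", editor);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical caller tying the three roles together. */
int print_editor_version(void) {
    const char *editor = getenv("EDITOR");  /* taint source: attacker-controlled */
    char cmd[256];
    if (editor == NULL || build_editor_command(editor, cmd, sizeof cmd) != 0)
        return -1;
    return system(cmd);                     /* taint sink: runs the string in a shell */
}
```

Whether the analyzer actually reports this path depends on the source, propagation and sink rules that are built in or configured for the checker.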
+
 The most notable examples of taint sources are:

-  - network originating data
+  - data from network
+  - files or standard input
   - environment variables
-  - database originating data
+  - data from databases

-``GenericTaintChecker`` is the main implementation checker for this rule, and it generates taint information used by other checkers.
+Let us examine a practical example of a Command Injection attack.

 .. code-block:: c

-  void test() {
-    char x = getchar(); // 'x' marked as tainted
-    system(&x); // warn: untrusted data is passed to a system call
-  }
+  // Command Injection Vulnerability Example
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf(" %1023[^\n]", filename); // The attacker can inject a shell escape here
+    strcat(cmd, filename);
+    system(cmd); // Warning: Untrusted data is passed to a system call
+  }

-  // note: compiler internally checks if the second param to
-  // sprintf is a string literal or not.
-  // Use -Wno-format-security to suppress compiler warning.
-  void test() {
-    char s[10], buf[10];
-    fscanf(stdin, "%s", s); // 's' marked as tainted
+The program prints the content of any user-specified file.
+Unfortunately, the attacker can execute arbitrary commands
+with shell escapes. For example, with the following input the `ls` command is
+also executed after the contents of `/etc/shadow` are printed.
+`Input: /etc/shadow ; ls /`
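
To see why this input is dangerous, the sketch below reproduces only the string concatenation step of the example and never calls `system()`. The helper name `compose_cat_command` is hypothetical and introduced purely for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds the same command string as the vulnerable example, without
   running it. Returns 0 on success, -1 if the result does not fit. */
int compose_cat_command(char *cmd, size_t cmd_size, const char *filename) {
    int n = snprintf(cmd, cmd_size, "/bin/cat %s", filename);
    return (n < 0 || (size_t)n >= cmd_size) ? -1 : 0;
}
```

With the input `/etc/shadow ; ls /` the resulting string is `/bin/cat /etc/shadow ; ls /`. A shell treats `;` as a command separator, so passing this string to `system()` would run `/bin/cat /etc/shadow` and then `ls /`.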

-    sprintf(buf, s); // warn: untrusted data as a format string
-  }
+The analysis implemented in this checker points out this problem.

-  void test() {
-    size_t ts;
-    scanf("%zd", &ts); // 'ts' marked as tainted
-    int *p = (int *)malloc(ts * sizeof(int));
-      // warn: untrusted data as buffer size
-  }
+One can protect against such an attack by, for example, checking whether the
+provided input refers to a valid file and removing any invalid user input.
+
+.. code-block:: c
+
+  // No vulnerability anymore, but we still get the warning
+  void sanitizeFileName(char* filename) {
+    if (access(filename, F_OK)) { // Verifying user input
+      printf("File does not exist\n");
+      filename[0] = '\0';
+    }
+  }
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf(" %1023[^\n]", filename); // The attacker can inject a shell escape here
+    sanitizeFileName(filename); // filename is safe after this point
+    if (!filename[0])
+      return -1;
+    strcat(cmd, filename);
+    system(cmd); // Superfluous Warning: Untrusted data is passed to a system call
+  }
+
+Unfortunately, the checker cannot discover automatically that the programmer
+has performed data sanitization, so it still emits the warning.

-There are built-in sources, propagations and sinks defined in code inside ``GenericTaintChecker``.
-These operations are handled even if no external taint configuration is provided.
+One can get rid of this superfluous warning by specifying the
+sanitization functions in the taint configuration file (see
+:doc:`user-docs/TaintAnalysisConfiguration`).

-Default sources defined by ``GenericTaintChecker``:
- ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``, ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``, ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``, ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``, ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
+.. code-block:: YAML

-Default propagations defined by ``GenericTaintChecker``:
-``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``, ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``, ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``, ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``, ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``, ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``, ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``, ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``, ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``, ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``, ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``, ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
+  Filters:
+  - Name: sanitizeFileName
+    Args: [0]

-Default sinks defined in ``GenericTaintChecker``:
-``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``, ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``, ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``, ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+The clang invocation to pass the configuration file location:
+
+.. code-block:: bash
+
+  clang --analyze -Xclang -analyzer-config -Xclang alpha.security.taint.TaintPropagation:Config=`pwd`/taint_config.yml ...
+
+If you are validating your inputs instead of sanitizing them, or don't want to
+mention each sanitizing function in your configuration,
+you can use a more generic approach.
+
+Introduce a generic no-op `csa_mark_sanitized(..)` function to
+tell the Clang Static Analyzer
+that the variable is safe to be used on that analysis path.
+
+.. code-block:: c

-The user can configure taint sources, sinks, and propagation rules by providing a configuration file via checker option ``alpha.security.taint.TaintPropagation:Config``.
+  // Marking sanitized variables safe.
+  // No vulnerability anymore, no warning.

-External taint configuration is in `YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The taint-related options defined in the config file extend but do not override the built-in sources, rules, sinks.
-The format of the external taint configuration file is not stable, and could change without any notice even in a non-backward compatible way.
+  // The user-defined csa_mark_sanitized function is for the analyzer only
+  #ifdef __clang_analyzer__
+    void csa_mark_sanitized(const void *);
+  #endif
+
+  int main(int argc, char** argv) {
+    char cmd[2048] = "/bin/cat ";
+    char filename[1024];
+    printf("Filename:");
+    scanf(" %1023[^\n]", filename);
+    if (access(filename, F_OK)) { // Verifying user input
+      printf("File does not exist\n");
+      return -1;
+    }
+    #ifdef __clang_analyzer__
+      csa_mark_sanitized(filename); // Tells CSA that filename is safe to use after this point
+    #endif
+    strcat(cmd, filename);
+    system(cmd); // No warning
+  }
+
+Similarly to the previous example, you need to
+define a `Filter` function in a `YAML` configuration file
+and add the `csa_mark_sanitized` function.
+
+.. code-block:: YAML
+
+  Filters:
+  - Name: csa_mark_sanitized
+    Args: [0]
+
+Then calling `csa_mark_sanitized(X)` will tell the analyzer that `X` is safe to
+be used after this point, because its contents are verified. It is the
+responsibility of the programmer to ensure that this verification was indeed
+correct. Please note that the `csa_mark_sanitized` function is only declared and
+used during Clang Static Analysis and skipped in (production) builds.
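
One way to keep the `#ifdef` noise out of the calling code is a small wrapper macro. The sketch below is only an illustration under stated assumptions: the macro `CSA_MARK_SANITIZED` and the helpers `is_plain_filename` and `validate_filename` are hypothetical names, and the allowlist of characters is an example policy rather than a recommendation from the checker documentation. It also assumes that the `Filters` entry for `csa_mark_sanitized` shown above is present in the configuration file.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: expands to the marker call only under the analyzer,
   and to a no-op in regular builds. */
#ifdef __clang_analyzer__
void csa_mark_sanitized(const void *);
#define CSA_MARK_SANITIZED(p) csa_mark_sanitized(p)
#else
#define CSA_MARK_SANITIZED(p) ((void)(p))
#endif

/* Example policy (an assumption): accept only non-empty names made of
   letters, digits, '.', '_' and '-', and reject a leading '-'. */
int is_plain_filename(const char *name) {
    if (name == NULL || name[0] == '\0' || name[0] == '-')
        return 0;
    for (const char *p = name; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && c != '.' && c != '_' && c != '-')
            return 0;
    }
    return 1;
}

/* Returns 1 and marks the name as sanitized if it passes the policy. */
int validate_filename(const char *name) {
    if (!is_plain_filename(name))
        return 0;
    CSA_MARK_SANITIZED(name);
    return 1;
}
```

Whether the analyzer carries the sanitized state back to the caller depends on how it models the call to `validate_filename`. Using the macro directly after the check in the calling function mirrors the example above most closely.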
+
+Further examples of injection vulnerabilities this checker can find.
+
+.. code-block:: c
+
+  void test() {
+    char x = getchar(); // 'x' marked as tainted
+    system(&x); // warn: untrusted data is passed to a system call
+  }
+
+  // note: compiler internally checks if the second param to
+  // sprintf is a string literal or not.
+  // Use -Wno-format-security to suppress compiler warning.
+  void test() {
+    char s[10], buf[10];
+    fscanf(stdin, "%s", s); // 's' marked as tainted
+
+    sprintf(buf, s); // warn: untrusted data used as a format string
+  }
+
+  void test() {
+    size_t ts;
+    scanf("%zu", &ts); // 'ts' marked as tainted
+    int *p = (int *)malloc(ts * sizeof(int));
+      // warn: untrusted data used as buffer size
+  }
+
+There are built-in sources, propagations and sinks even if no external taint
+configuration is provided.
+
+Default sources:
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
+ ``getch``, ``getchar``, ``getchar_unlocked``, ``getwd``, ``getcwd``,
+ ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
+ ``gets``, ``gets_s``, ``getseuserbyname``, ``readlink``, ``readlinkat``,
+ ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
+
+Default propagation rules:
+ ``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``,
+ ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``,
+ ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``,
+ ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``,
+ ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``,
+ ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``,
+ ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
+ ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
+ ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
+ ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
+ ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
+
+Default sinks:
+ ``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``,
+ ``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``,
+ ``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``,
+ ``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+
+Please note that there are no built-in filter functions.
+
+One can configure their own taint sources, sinks, and propagation rules by
+providing a configuration file via checker option
+``alpha.security.taint.TaintPropagation:Config``. The configuration file is in
+`YAML <http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The
+taint-related options defined in the config file extend but do not override the
+built-in sources, rules, and sinks. The format of the external taint
+configuration file is not stable, and could change without notice, even in a
+non-backward-compatible way.
+
+For a more detailed description of configuration options, please see the
+:doc:`user-docs/TaintAnalysisConfiguration`. For an example see
+:ref:`clangsa-taint-configuration-example`.
+
+**Configuration**
+
+* `Config`  Specifies the name of the YAML configuration file. The user can
+  define their own taint sources and sinks.
+
+**Related Guidelines**
+
+* `CWE Data Neutralization Issues
+  <https://cwe.mitre.org/data/definitions/137.html>`_
+* `SEI Cert STR02-C. Sanitize data passed to complex subsystems
+  <https://wiki.sei.cmu.edu/confluence/display/c/STR02-C.+Sanitize+data+passed+to+complex+subsystems>`_
+* `SEI Cert ENV33-C. Do not call system()
+  <https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177>`_
+* `ENV03-C. Sanitize the environment when invoking external programs
+  <https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs>`_
+
+**Limitations**

-For a more detailed description of configuration options, please see the :doc:`user-docs/TaintAnalysisConfiguration`. For an example see :ref:`clangsa-taint-configuration-example`.
+* The taintedness property is not propagated through function calls which are
+  unknown (or too complex) to the analyzer, unless there is a specific
+  propagation rule built into the checker or given in the YAML configuration
+  file. As a result, potential true positive findings may be lost.
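
As an illustration of this limitation, consider a copy routine whose body the analyzer cannot see. In the sketch below, `plugin_copy` is a hypothetical function. It is defined here only so that the example compiles; in the scenario described it would live in a separately compiled library. The YAML fragment in the comment is an assumed rule following the `Propagations` format (`Name`, `SrcArgs`, `DstArgs`) described in :doc:`user-docs/TaintAnalysisConfiguration`; check that document for the exact syntax supported by your Clang version.

```c
#include <stddef.h>

/* Hypothetical library routine: copies at most n-1 characters from src to
   dst and always NUL-terminates dst when n > 0. If only its declaration were
   visible, taint on 'src' would not reach 'dst' without a propagation rule. */
void plugin_copy(char *dst, const char *src, size_t n) {
    size_t i = 0;
    if (n == 0)
        return;
    for (; i + 1 < n && src[i] != '\0'; ++i)
        dst[i] = src[i];
    dst[i] = '\0';
}

/*
 * Assumed configuration entry that would tell the checker to propagate
 * taint from argument 1 (src) to argument 0 (dst):
 *
 *   Propagations:
 *     - Name: plugin_copy
 *       SrcArgs: [1]
 *       DstArgs: [0]
 */
```

With such a rule in place, a tainted `src` would be expected to make `dst` tainted as well, so a later sink call on `dst` could be reported again.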

 alpha.unix
 ^^^^^^^^^^^
