Merged
5 changes: 4 additions & 1 deletion .github/workflows/build.yml
@@ -358,7 +358,10 @@ jobs:
run: cd build && make -j3

- name: Test
run: cd build && LLVM_PROFILE_FILE="coverage-%m.profraw" ctest -j1 --output-on-failure
run: |
cd build
LLVM_PROFILE_FILE="coverage-%m.profraw" ctest -j1 --output-on-failure
LLVM_PROFILE_FILE="coverage-%m.profraw" srcdir=.. pcre2test=./pcre2test ../RunTest -malloc

- name: Report
run: |
11 changes: 10 additions & 1 deletion RunTest
@@ -208,6 +208,7 @@ arg16=
arg32=
nojit=
bigstack=
malloc=
sim=
skip=
valgrind=
@@ -294,6 +295,7 @@ while [ $# -gt 0 ] ; do
-16) arg16=yes;;
-32) arg32=yes;;
bigstack|-bigstack) bigstack=yes;;
malloc|-malloc) malloc=yes;;
nojit|-nojit) nojit=yes;;
sim|-sim) shift; sim=$1;;
valgrind|-valgrind) valgrind="valgrind --tool=memcheck -q --smc-check=all-non-file --error-exitcode=70";;
@@ -350,6 +352,13 @@ else
setstack=""
fi

# If the malloc option is given, then call pcre2test with -malloc.

if [ "$malloc" != "" ] ; then
# XXX RENAME setstack to something like "extraoptions"
setstack="$setstack -malloc"
fi

# All of 8-bit, 16-bit, and 32-bit character strings may be supported, but only
# one need be.

@@ -558,7 +567,7 @@ for bmode in "$test8" "$test16" "$test32"; do
$sim $valgrind ${opt:+$vjs} $pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry
saverc=$?
if [ $saverc = 0 ] ; then
$sim $valgrind ${opt:+$vjs} $pcre2test -q $bmode $opt -error -80,-62,-2,-1,0,100,101,191,300 >>testtry
$sim $valgrind ${opt:+$vjs} $pcre2test -q $setstack $bmode $opt -error -80,-62,-2,-1,0,100,101,191,300 >>testtry
checkresult $? 2 "$opt"
else
checkresult $saverc 2 "$opt"
6 changes: 6 additions & 0 deletions doc/html/pcre2test.html
@@ -318,6 +318,12 @@ <h1>pcre2test man page</h1>
the total times for all compiles and matches are output.
</p>
<p>
<b>-malloc</b>
Exercise malloc() failures, by first counting the number of calls made to malloc
during pattern compilation and matching, then re-running the compilation and
matching that many times, exercising a failure of each malloc() call.
</p>
<p>
<b>-version</b>
Output the PCRE2 version number and then exit.
</p>
5 changes: 5 additions & 0 deletions doc/pcre2test.1
@@ -265,6 +265,11 @@ compile phase.
These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run,
the total times for all compiles and matches are output.
.TP 10
\fB-malloc\fP
Exercise malloc() failures, by first counting the number of calls made to malloc
during pattern compilation and matching, then re-running the compilation and
matching that many times, exercising a failure of each malloc() call.
.TP 10
\fB-version\fP
Output the PCRE2 version number and then exit.
.
1,129 changes: 567 additions & 562 deletions doc/pcre2test.txt

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions src/pcre2_jit_compile.c
@@ -8943,6 +8943,7 @@ while (1)

has_vreverse = (*ccbegin == OP_VREVERSE);
if (*ccbegin == OP_REVERSE || has_vreverse)
// XXX what if error?
Collaborator

Nice patch!

Did you check this by returning NULL?

The general concept is that if an error occurs during JIT compilation, the error is set in the compiler and all subsequent emit-instruction calls are ignored. This way we don't need to check the return value of each function. If a structure is allocated and NULL is returned, we set a memory error in the compiler and return without accessing the members of the structure. Most structures are only used once.
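
The "sticky error" strategy described in this comment can be sketched roughly as follows. This is a minimal illustration, not sljit's code: every name (demo_compiler, demo_emit, demo_alloc, and so on) is hypothetical, and the simulate_failure parameter exists only so that the failure path can be triggered on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not sljit's real types or functions. */
enum { DEMO_OK = 0, DEMO_ERR_ALLOC = 1 };

struct demo_compiler {
  int error;       /* sticky: once set, it is never cleared */
  size_t emitted;  /* number of instructions actually emitted */
};

/* Emit one instruction; silently does nothing once an error is recorded. */
static void demo_emit(struct demo_compiler *c)
{
  if (c->error != DEMO_OK) return;
  c->emitted++;
}

/* Allocate a helper structure. On failure, record the error and return NULL. */
static void *demo_alloc(struct demo_compiler *c, size_t size, int simulate_failure)
{
  void *p;
  if (c->error != DEMO_OK) return NULL;
  p = simulate_failure ? NULL : malloc(size);
  if (p == NULL) c->error = DEMO_ERR_ALLOC;
  return p;
}

/* A step that needs a structure: on NULL it returns without touching members. */
static void demo_step(struct demo_compiler *c, int simulate_failure)
{
  int *scratch = demo_alloc(c, sizeof *scratch, simulate_failure);
  if (scratch == NULL) return;  /* the error is already recorded */
  *scratch = 1;
  demo_emit(c);
  free(scratch);
}

/* The driver never checks individual calls; it inspects the error once. */
static int demo_compile(struct demo_compiler *c, int simulate_failure)
{
  assert(c != NULL);
  demo_emit(c);
  demo_step(c, simulate_failure);
  demo_emit(c);  /* ignored if demo_step recorded an error */
  demo_emit(c);
  return c->error;
}
```

The design trade-off is the one raised later in this thread: after a failure the driver keeps walking the input and calling emit functions that do nothing, which is wasteful but keeps the call sites free of error checks.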

Member Author

In theory, the tests should have exercised this path (since we fail each malloc systematically).

I wasn't going to examine the JIT code... but then I had to, because of the segfault I triggered.

I realised the general error-handling strategy, which seems sensible. I built a call graph of all the JIT functions. Then I tried to identify all the ones which could "fail" (hard to determine, since the "failing" functions can be void-returning). Then I worked up the call graph to see if I could prove that either: the caller of a "failing" function has an error-check; or alternatively that all the actions it takes are safe even if the previous call failed.

The "XXX" comments I left are for the cases I wasn't able to understand or verify.

For example, here, if compile_reverse_matchingpath fails due to malloc, is it safe to call compile_matchingpath below (with a NULL ccbegin)? I don't understand the code well enough to be confident either way.

Collaborator

@zherczeg zherczeg Feb 14, 2025

Hm, ccbegin cannot be NULL, since it points to something inside the (already alloc'ed) PCRE byte code. In theory we just generate code as usual, but nothing happens. This is very inefficient, but also a highly unlikely case.

Member Author

compile_reverse_matchingpath does return NULL when malloc fails. This would assign NULL to ccbegin, and compile_matchingpath would then dereference it and crash.

In my head, there is a potential problem here, but the tests didn't exercise it, so I don't know whether it's real.

Collaborator

We might talk about something different.

https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_compile.c#L8759

The "return cc" there cannot return NULL. I don't see any other return.

Member Author

PUSH_BACKTRACK is a return.

Collaborator

Oh, that is a misuse of the macro. It should return with cc + 1 + 2 * IMM2_SIZE to simply skip the construct.

Member Author

OK, I have pushed a speculative fix to change the return to cc+1+2*IMM2_SIZE. I can exercise the problem by adding a "return NULL" underneath the problematic PUSH_BACKTRACK: if that specific thing fails then I get a segfault.

I hope it's OK that none of the other uses of PUSH_BACKTRACK do the same (they all return NULL). None of the other return values appear to be used at their call sites though, so I think the others are OK.
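
The hazard discussed in this thread, a macro that hides a return statement, can be illustrated with a small sketch. The macro and functions below are hypothetical and are not the real PUSH_BACKTRACK or compile_reverse_matchingpath; DEMO_CONSTRUCT_LEN merely stands in for a skip length such as 1 + 2 * IMM2_SIZE, and the fail parameter simulates an allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical macro in the spirit of the one discussed: it hides a return. */
#define DEMO_PUSH_OR_RETURN(ptr, fail, retval) \
  do { \
    (ptr) = (fail) ? NULL : malloc(16); \
    if ((ptr) == NULL) return (retval); \
  } while (0)

#define DEMO_CONSTRUCT_LEN 3  /* assumed length of the construct being skipped */

/* Hazardous variant: on allocation failure the caller receives NULL
   instead of a cursor into the byte code, and may dereference it. */
static const unsigned char *demo_parse_unsafe(const unsigned char *cc, int fail)
{
  void *node;
  assert(cc != NULL);
  DEMO_PUSH_OR_RETURN(node, fail, NULL);
  free(node);
  return cc + DEMO_CONSTRUCT_LEN;
}

/* Safer variant: on failure, still return a valid cursor past the construct. */
static const unsigned char *demo_parse_safe(const unsigned char *cc, int fail)
{
  void *node;
  assert(cc != NULL);
  DEMO_PUSH_OR_RETURN(node, fail, cc + DEMO_CONSTRUCT_LEN);
  free(node);
  return cc + DEMO_CONSTRUCT_LEN;
}
```

The safer variant only makes sense when the failure is also recorded somewhere else (in the discussion above, in the compiler's error state), because the returned cursor no longer signals that anything went wrong.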

ccbegin = compile_reverse_matchingpath(common, ccbegin, &altbacktrack);

compile_matchingpath(common, ccbegin, cc, &altbacktrack);
@@ -9714,6 +9715,7 @@ else if (opcode == OP_ASSERTBACK_NA && PRIVATE_DATA(ccbegin + 1))

has_vreverse = (*matchingpath == OP_VREVERSE);
if (*matchingpath == OP_REVERSE || has_vreverse)
// XXX what if error?
matchingpath = compile_reverse_matchingpath(common, matchingpath, backtrack);
}
else if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA || opcode == OP_SCRIPT_RUN || opcode == OP_SBRA || opcode == OP_SCOND)
@@ -9725,6 +9727,7 @@ else if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA || opcode == OP_SC
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), TMP2, 0);

if (*matchingpath == OP_REVERSE)
// XXX what if error?
matchingpath = compile_reverse_matchingpath(common, matchingpath, backtrack);
}
else if (opcode == OP_ASSERT_SCS)
@@ -12694,6 +12697,8 @@ if (current->cc[1] > OP_ASSERTBACK_NOT)
{
/* Manual call of compile_bracket_matchingpath and compile_bracket_backtrackingpath. */
compile_bracket_matchingpath(common, current->cc, current);
if (SLJIT_UNLIKELY(sljit_get_compiler_error(common->compiler)))
return;
Comment on lines +12699 to +12700
Member Author

This is a segfault fix!

If malloc fails inside compile_bracket_matchingpath, then current->top is left as NULL, causing a segfault inside compile_bracket_backtrackingpath when it accesses it.

I have triaged this for security according to this logic: on modern platforms, it's very hard to get malloc to fail (more likely for the OS to allocate memory from swap if needed, and then terminate processes if that runs out). And, if the malloc call does fail, it sets a nice clean NULL which will give a deterministic segfault, rather than some bad access to the heap. This is therefore only a minor, hard-to-exploit, denial of service.

compile_bracket_backtrackingpath(common, current->top);
}
else
@@ -12979,6 +12984,7 @@ while (current)

case OP_BRAMINZERO:
compile_braminzero_backtrackingpath(common, current);
// XXX what if error?
Collaborator

It should not cause any error. An idea:
https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.h#L688

sljit_set_compiler_memory_error() can be used to kill code compilation, and another (global) flag could stop malloc. Then we can see what happens.

Member Author

I see. JIT allocates memory from its own small slab, rather than through PCRE2's malloc callback. So, the changes in pcre2test aren't exercising those allocation failures?

Collaborator

Only partly. The large chunks are allocated with malloc, but the small blocks are simply carved out of a chunk. The whole thing lives until the compiler is deleted. Deleting individual small blocks is not possible.

https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.c#L690
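
The chunk scheme described in this comment can be sketched as a simple arena. This is only an illustration under assumed names and sizes (demo_arena, DEMO_CHUNK_SIZE, 8-byte alignment); it is not how sljit lays out its buffers. The chunk_mallocs counter shows why a malloc-level failure test sees far fewer allocation points than the number of small blocks handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK_SIZE 4096  /* assumed chunk size, not sljit's actual value */

struct demo_chunk {
  struct demo_chunk *next;
  size_t used;
  unsigned char data[DEMO_CHUNK_SIZE];
};

struct demo_arena {
  struct demo_chunk *head;
  size_t chunk_mallocs;  /* how many times malloc() was really called */
};

/* Hand out a small block; call malloc() only when the current chunk is full. */
static void *demo_arena_alloc(struct demo_arena *a, size_t size)
{
  struct demo_chunk *c;
  void *p;

  assert(a != NULL);
  size = (size + 7) & ~(size_t)7;  /* round up to a multiple of 8 */
  if (size == 0 || size > DEMO_CHUNK_SIZE) return NULL;

  c = a->head;
  if (c == NULL || c->used + size > DEMO_CHUNK_SIZE) {
    c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->next = a->head;
    c->used = 0;
    a->head = c;
    a->chunk_mallocs++;
  }
  p = c->data + c->used;
  c->used += size;
  return p;
}

/* Blocks cannot be freed individually; all chunks are released together. */
static void demo_arena_destroy(struct demo_arena *a)
{
  assert(a != NULL);
  while (a->head != NULL) {
    struct demo_chunk *next = a->head->next;
    free(a->head);
    a->head = next;
  }
}
```

With this layout, two 24-byte requests share one chunk and trigger a single malloc() call, so a test that fails "the Nth malloc" cannot target the second request on its own.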

Member Author

Could we add a flag or mode so that all allocations go through malloc (new "fragment") rather than reusing a previous one? Then we'd be able to exercise all the failures more easily. Would that be best with a runtime flag, or a compile flag? It looks like we'd need to make a change in the sljit repo, not just in pcre2.

Collaborator

PTR_FAIL_IF_NULL sets the error code.

Collaborator

We can add an always-alloc flag to sljit_alloc. It should be debug-only and disabled by default. It will affect free as well, so it is not that trivial.

Member Author

"PTR_FAIL_IF_NULL sets the error code."

I think the case of size > 64/128 bytes doesn't do that, and just returns directly without setting an error condition?

Collaborator

That is an API misuse. An assert could be useful.

Member Author

I tried making a change to ensure_buf/ensure_abuf in sljitLir.c, so that they always allocate a fresh fragment.

Unfortunately... they allocate a lot of fragments, and it takes a looooong time for the tests to successively test the behaviour when each one fails.

For the time being, I'm just going to leave this as something which isn't tested automatically.

We can be positive though: the error-handling scheme in the JIT code is very good, and the compiler really does handle malloc failures very well, as far as I can tell.
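
For reference, the count-then-fail-each-call technique that this PR adds to pcre2test can be sketched in isolation. The counter logic mirrors the my_malloc change in this diff, but everything else is hypothetical: demo_operation stands in for a compile or match call, and the extra check that the final run succeeds is an addition of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

static int demo_until_failure = INT_MAX;  /* INT_MAX means "never fail" */
static int demo_calls = 0;

/* Counting allocator: once the countdown reaches zero, every call fails. */
static void *demo_malloc(size_t size)
{
  assert(size > 0);
  demo_calls++;
  if (demo_until_failure != INT_MAX && demo_until_failure-- <= 0) return NULL;
  return malloc(size);
}

/* Hypothetical operation under test: needs three allocations.
   Returns 0 on success and -1 if any allocation failed. */
static int demo_operation(void)
{
  void *a = NULL, *b = NULL, *c = NULL;
  int rc = -1;
  if ((a = demo_malloc(8)) != NULL &&
      (b = demo_malloc(16)) != NULL &&
      (c = demo_malloc(32)) != NULL)
    rc = 0;
  free(c);
  free(b);
  free(a);
  return rc;
}

/* Returns the number of runs that misbehaved, or -1 if the baseline failed. */
static int demo_exercise_failures(void)
{
  int target, i, bad = 0;

  demo_calls = 0;
  if (demo_operation() != 0) return -1;  /* baseline run must succeed */
  target = demo_calls;                   /* allocations in a clean run */

  for (i = 0; i <= target; i++) {
    int rc;
    demo_until_failure = i;              /* allow i allocations, then fail */
    rc = demo_operation();
    demo_until_failure = INT_MAX;
    if (i < target && rc != -1) bad++;   /* should have reported failure */
    if (i == target && rc != 0) bad++;   /* enough allocations: must succeed */
  }
  return bad;
}
```

Because the operation is re-run once per allocation, the total work grows roughly with the square of the allocation count, which is consistent with the slowdown reported above when every fragment gets its own malloc call.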

break;

case OP_MARK:
192 changes: 179 additions & 13 deletions src/pcre2test.c
@@ -1024,6 +1024,7 @@ static BOOL restrict_for_perl_test = FALSE;
static BOOL show_memory = FALSE;
static BOOL preprocess_only = FALSE;
static BOOL inside_if = FALSE;
static BOOL malloc_testing = FALSE;

static int jitrc; /* Return from JIT compile */
static int test_mode = DEFAULT_TEST_MODE;
@@ -2943,12 +2944,60 @@ return sys_errlist[n];
* Local memory functions *
*************************************************/

static int mallocs_until_failure = INT_MAX;
static int mallocs_called = 0;

/* Alternative memory functions, to test functionality. */

static void *my_malloc(size_t size, void *data)
{
void *block = malloc(size);
void *block;

(void)data;


/*
XXX

list of all malloc'ing functions:

FIXED COUNT:
,pcre2_code_copy
,pcre2_code_copy_with_tables
,pcre2_compile_context_copy_8,__libc_start_main,_start,
,pcre2_convert_context_copy_8,__libc_start_main,_start,
,pcre2_general_context_copy_8,__libc_start_main,_start,
,pcre2_match_context_copy_8,__libc_start_main,_start,
,pcre2_compile_context_create_8,__libc_start_main,_start,
,pcre2_convert_context_create_8,__libc_start_main,_start,
,pcre2_general_context_create_8,__libc_start_main,_start,
,pcre2_match_context_create_8,__libc_start_main,_start,
,pcre2_general_context_create_8,__libc_start_main,_start,

,pcre2_maketables_8,__libc_start_main,_start,
,pcre2_match_data_create_8,__libc_start_main,_start,
,pcre2_match_data_create_from_pattern_8,__libc_start_main,_start,
,pcre2_pattern_convert_8,__libc_start_main,_start,

,pcre2_serialize_decode_8,__libc_start_main,_start,
,pcre2_serialize_encode_8,__libc_start_main,_start,
,pcre2_substring_get_bynumber_8,__libc_start_main,_start,
,pcre2_substring_get_byname_8,__libc_start_main,_start,
,pcre2_substring_list_get_8,__libc_start_main,_start,

// XXX VARIABLE:
,pcre2_match_8,__libc_start_main,_start,
,pcre2_dfa_match_8,__libc_start_main,_start,
,pcre2_substitute_8,__libc_start_main,_start,


*/

mallocs_called++;
if (mallocs_until_failure != INT_MAX && mallocs_until_failure-- <= 0)
return NULL;

block = malloc(size);
if (show_memory)
{
if (block == NULL)
@@ -6219,9 +6268,35 @@ if (timeit > 0)

/* A final compile that is used "for real". */

mallocs_called = 0;
PCRE2_COMPILE(compiled_code, use_pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, use_pat_context);

/* For malloc testing, we repeat the compilation. */

if (malloc_testing)
{
for (int i = 0, target_mallocs = mallocs_called; i <= target_mallocs; i++)
{
if (TEST(compiled_code, !=, NULL))
{ SUB1(pcre2_code_free, compiled_code); }

errorcode = erroroffset = 0;
mallocs_until_failure = i;
PCRE2_COMPILE(compiled_code, use_pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, use_pat_context);
mallocs_until_failure = INT_MAX;

if (i < target_mallocs &&
!(TEST(compiled_code, ==, NULL) && errorcode == PCRE2_ERROR_HEAP_FAILED))
{
fprintf(outfile, "** malloc() compile test did not fail as expected (%d)\n",
errorcode);
return PR_ABEND;
}
}
}

/* If valgrind is supported, mark the pbuffer as accessible again. We leave the
pattern in the test-mode's buffer defined because it may be read from a callout
during matching. */
@@ -6262,9 +6337,9 @@ else
#endif
#endif

/* Call the JIT compiler if requested. When timing, we must free and recompile
the pattern each time because that is the only way to free the JIT compiled
code. We know that compilation will always succeed. */
/* Call the JIT compiler if requested. When timing, or testing malloc failures,
we must free and recompile the pattern each time because that is the only way to
free the JIT compiled code. We know that compilation will always succeed. */

if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0)
{
@@ -6275,14 +6350,20 @@ if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0)

for (i = 0; i < timeit; i++)
{
clock_t start_time;
clock_t start_time = clock();
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
time_taken += clock() - start_time;

SUB1(pcre2_code_free, compiled_code);
PCRE2_COMPILE(compiled_code, use_pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
use_pat_context);
start_time = clock();
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
time_taken += clock() - start_time;
if (TEST(compiled_code, ==, NULL))
{
fprintf(outfile, "** Unexpected - pattern compilation not successful\n");
return PR_ABEND;
}

if (jitrc != 0)
{
fprintf(outfile, "JIT compilation was not successful");
@@ -6295,15 +6376,47 @@ if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0)
fprintf(outfile, "JIT compile %8.4f microseconds\n",
((1000000 / CLOCKS_PER_SEC) * (double)time_taken) / timeit);
}
else

mallocs_called = 0;
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);

/* For malloc testing, we repeat the compilation. */

if (malloc_testing)
{
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
if (jitrc != 0 && (pat_patctl.control & CTL_JITVERIFY) != 0)
for (int i = 0, target_mallocs = mallocs_called; i <= target_mallocs; i++)
{
fprintf(outfile, "JIT compilation was not successful");
if (!print_error_message(jitrc, " (", ")\n")) return PR_ABEND;
SUB1(pcre2_code_free, compiled_code);
PCRE2_COMPILE(compiled_code, use_pbuffer, patlen,
pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset,
use_pat_context);
if (TEST(compiled_code, ==, NULL))
{
fprintf(outfile, "** Unexpected - pattern compilation not successful\n");
return PR_ABEND;
}

mallocs_until_failure = i;
PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit);
mallocs_until_failure = INT_MAX;

if (i < target_mallocs && jitrc != PCRE2_ERROR_NOMEMORY)
{
fprintf(outfile, "** malloc() JIT compile test did not fail as expected (%d)\n",
jitrc);
return PR_ABEND;
}
}
}

/* Check whether JIT compilation failed; if so, output an error message but
continue. */

if (jitrc != 0 && (pat_patctl.control & CTL_JITVERIFY) != 0)
{
fprintf(outfile, "JIT compilation was not successful");
if (!print_error_message(jitrc, " (", ")\n")) return PR_ABEND;
}
}

/* Compilation failed; go back for another re, skipping to blank line
@@ -8419,6 +8532,7 @@ for (gmatched = 0;; gmatched++)

/* Run a single DFA or NFA match. */

mallocs_called = 0;
if ((dat_datctl.control & CTL_DFA) != 0)
{
if (dfa_workspace == NULL)
@@ -8448,6 +8562,50 @@ for (gmatched = 0;; gmatched++)
capcount = dat_datctl.oveccount;
}
}

/* For malloc testing, we repeat the matching. */

if (malloc_testing && (dat_datctl.control & CTL_CALLOUT_NONE) != 0)
{
int target_capcount = capcount;

for (int i = 0, target_mallocs = mallocs_called; i <= target_mallocs; i++)
{
mallocs_until_failure = i;

if ((dat_datctl.control & CTL_DFA) != 0)
{
if (dfa_matched++ == 0)
dfa_workspace[0] = -1; /* To catch bad restart */
PCRE2_DFA_MATCH(capcount, compiled_code, pp, arg_ulen,
dat_datctl.offset, dat_datctl.options | g_notempty, match_data,
use_dat_context, dfa_workspace, DFA_WS_DIMENSION);
}
else
{
if ((pat_patctl.control & CTL_JITFAST) != 0)
PCRE2_JIT_MATCH(capcount, compiled_code, pp, arg_ulen, dat_datctl.offset,
dat_datctl.options | g_notempty, match_data, use_dat_context);
else
PCRE2_MATCH(capcount, compiled_code, pp, arg_ulen, dat_datctl.offset,
dat_datctl.options | g_notempty, match_data, use_dat_context);
}

mallocs_until_failure = INT_MAX;

if (capcount == 0)
capcount = dat_datctl.oveccount;

// XXX this is a hack. Clearly some memory is being held-on-to so it's
// not managing to exercise one of the code paths
if (i < target_mallocs && !(capcount == target_capcount || capcount == PCRE2_ERROR_NOMEMORY))
{
fprintf(outfile, "** malloc() match test did not fail as expected (%d)\n",
capcount);
return PR_ABEND;
}
}
}
}

/* The result of the match is now in capcount. First handle a successful
@@ -9077,6 +9235,7 @@ printf(" -t [<n>] time compilation and execution, repeating <n> times\n");
printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n");
printf(" -T same as -t, but show total times at the end\n");
printf(" -TM same as -tm, but show total time at the end\n");
printf(" -malloc exercise malloc() failures\n");
printf(" -v|--version show PCRE2 version and exit\n");
}

@@ -9847,6 +10006,13 @@ while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0)
if (both) timeit = timeitm;
}

/* Set malloc testing */

else if (strcmp(arg, "-malloc") == 0)
{
malloc_testing = TRUE;
}

/* Give help */

else if (strcmp(arg, "-help") == 0 ||