Conversation
|
According to Codecov, the coverage goes up by 0.5% as a result of this. Nice! Apparently 1-in-200 lines of code are error branches handling malloc failures, which were previously untested. |
| if (SLJIT_UNLIKELY(sljit_get_compiler_error(common->compiler))) | ||
| return; |
There was a problem hiding this comment.
This is a segfault fix!
If malloc fails inside compile_bracket_matchingpath, then current->top will be left to NULL, causing a segfault inside compile_bracket_backtrackingpath when it access stuff there.
I have triaged this for security according to this logic: on modern platforms, it's very hard to get malloc to fail (more likely for the OS to allocate memory from swap if needed, and then terminate processes if that runs out). And, if the malloc call does fail, it sets a nice clean NULL which will give a deterministic segfault, rather than some bad access to the heap. This is therefore only a minor, hard-to-exploit, denial of service.
src/pcre2_jit_compile.c
Outdated
|
|
||
| has_vreverse = (*ccbegin == OP_VREVERSE); | ||
| if (*ccbegin == OP_REVERSE || has_vreverse) | ||
| // XXX what if error? |
There was a problem hiding this comment.
Nice patch!
Did you checked this by returning a NULL?
The general concept is, that if an error occurs during JIT compilation, the error is set in the compiler, and all emit instruction commands are ignored. This way we don't need to check the returned value for each function. If a structure is allocated, and NULL is returned, we set memory error in the compiler and return without accessing the members of the structure. Most structures are only used once.
There was a problem hiding this comment.
In theory, the tests should have exercised this path (since we fail each malloc systematically).
I wasn't going to examine the JIT code... but then I had to, because of the segfault I triggered.
I realised the general error-handling strategy, which seems sensible. I built a call graph of all the JIT functions. Then I tried to identify all the ones which could "fail" (hard to determine, since the "failing" functions can be void-returning). Then I worked up the call graph to see if I could prove that either: the caller of a "failing" function has an error-check; or alternatively that all the actions it takes are safe even if the previous call failed.
The "XXX" comments I left are for the cases I wasn't able to understand or verify.
For example, here, if compile_reverse_matchingpath fails due to malloc, is it safe to call compile_matchingpath below (with a NULL ccbegin)? I don't understand the code well enough to be confident either way.
There was a problem hiding this comment.
Hm, ccbegin cannot be NULL, since it points to something inside the (already alloc'ed) PCRE byte code. In theory we just generate code as usual, but nothing happens. This is very inefficient, but also a highly unlikely case.
There was a problem hiding this comment.
compile_reverse_matchingpath does return NULL when malloc fails. This would assign NULL to ccbegin. And compile_matchingpath will derefence that and crash.
In my head, there is a potential problem here, but the tests didn't exercise it, so I don't know whether it's real.
There was a problem hiding this comment.
We might talk about something different.
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_compile.c#L8759
return cc cannot return NULL. I don't see any other return.
There was a problem hiding this comment.
PUSH_BACKTRACK is a return.
There was a problem hiding this comment.
Oh, that is a misuse of the macro. It should return with cc + 1 + 2 * IMM2_SIZE to simply skip the construct.
There was a problem hiding this comment.
OK, I have pushed a speculative fix to change the return to cc+1+2*IMM2_SIZE. I can exercise the problem by adding a "return NULL" underneath the problematic PUSH_BACKTRACK: if that specific thing fails then I get a segfault.
I hope it's OK that none of the other uses of PUSH_BACKTRACK do the same (they all return NULL). None of the other return values appear to be used at their call sites though, so I think the others are OK.
src/pcre2_jit_compile.c
Outdated
|
|
||
| case OP_BRAMINZERO: | ||
| compile_braminzero_backtrackingpath(common, current); | ||
| // XXX what if error? |
There was a problem hiding this comment.
It should not cause any error. An idea:
https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.h#L688
sljit_set_compiler_memory_error() can be used to kill code compilation. Another (global) flag to stop malloc. Then we can see what happens.
There was a problem hiding this comment.
I see. JIT allocates memory from its own small slab, rather than through PCRE2's malloc callback. So, the changes in pcre2test aren't exercising those allocation failures?
There was a problem hiding this comment.
Just partly. The large chunks are allocated with malloc, but the small blocks are simply allocated from the chunk. The whole thing lives until the compiler is deleted. Deleting individual small blocks are not possible.
https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.c#L690
There was a problem hiding this comment.
Note: SLJIT_MALLOC uses the mem controller malloc.
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_compile.c#L66
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_compile.c#L72
There was a problem hiding this comment.
Could we add a flag or mode so that all allocations go through malloc (new "fragment") rather than reusing a previous one? Then we'd be able to exercise all the failures more easily. Would that be best with a runtime flag, or a compile flag? It looks like we'd need to make a change in the sljit repo, not just in pcre2.
There was a problem hiding this comment.
PTR_FAIL_IF_NULL sets the error code.
There was a problem hiding this comment.
We can add an always alloc flag to sljit_alloc. It should be debug only, disabled by default. It will affect free as well, so it is not that trivial.
There was a problem hiding this comment.
PTR_FAIL_IF_NULLsets the error code.
I think the case of size > 64/128 bytes doesn't do that, and just returns directly without setting an error condition?
There was a problem hiding this comment.
That is an api misuse. An assert could be useful.
There was a problem hiding this comment.
I tried making a change to sjlitLir ensure_buf/ensure_abuf. I made it so that they always allocated a fresh fragment.
Unfortunately... they allocate a lot of fragments, and it takes a looooong time for the tests to successively test the behaviour when each one fails.
For the time being, I'm just going to leave this something which isn't tested automatically.
We can be positive though: the error-handling in scheme in the JIT code is very good, and the compiler really does handle malloc failures very well, as far as I can tell.
|
@zherczeg I found and fixed one additional crash that can be exercised by another of the |
src/pcre2_jit_compile.c
Outdated
| common->positive_assertion_quit = save_positive_assertion_quit; | ||
| common->accept = save_accept; | ||
| return NULL; | ||
| goto end; |
There was a problem hiding this comment.
Nice catch! I would use the return cc + 1 + LINK_SIZE; everywhere.
There was a problem hiding this comment.
I am interpreting that as, "don't use the goto".
If you meant, "change the returns in other places as well" let me know where.
There was a problem hiding this comment.
I did yet another attempt at fixing the JIT segfault. I realised, just before merging this, that return cc + 1 + LINK_SIZE is not correct. The return statement is inside a loop which increments cc, so that's not the correct pointer to return!
It basically does have to return NULL in this instance... so I updated the call site with an error check.
|
My latest change here in the JIT file appears very safe. I'll merge this PR, I think I've been messing with it for long enough now. |
Fixes #627