Skip to content

Add testing for malloc() failures#697

Merged
NWilson merged 9 commits intomasterfrom
user/niwilson/malloc-test
Feb 23, 2025
Merged

Add testing for malloc() failures#697
NWilson merged 9 commits intomasterfrom
user/niwilson/malloc-test

Conversation

@NWilson
Copy link
Member

@NWilson NWilson commented Feb 13, 2025

Fixes #627

@NWilson
Copy link
Member Author

NWilson commented Feb 13, 2025

According to Codecov, the coverage goes up by 0.5% as a result of this. Nice! Apparently 1-in-200 lines of code are error branches handling malloc failures, which were previously untested.

Comment on lines +12700 to +12701
if (SLJIT_UNLIKELY(sljit_get_compiler_error(common->compiler)))
return;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a segfault fix!

If malloc fails inside compile_bracket_matchingpath, then current->top will be left to NULL, causing a segfault inside compile_bracket_backtrackingpath when it access stuff there.

I have triaged this for security according to this logic: on modern platforms, it's very hard to get malloc to fail (more likely for the OS to allocate memory from swap if needed, and then terminate processes if that runs out). And, if the malloc call does fail, it sets a nice clean NULL which will give a deterministic segfault, rather than some bad access to the heap. This is therefore only a minor, hard-to-exploit, denial of service.


has_vreverse = (*ccbegin == OP_VREVERSE);
if (*ccbegin == OP_REVERSE || has_vreverse)
// XXX what if error?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice patch!

Did you checked this by returning a NULL?

The general concept is, that if an error occurs during JIT compilation, the error is set in the compiler, and all emit instruction commands are ignored. This way we don't need to check the returned value for each function. If a structure is allocated, and NULL is returned, we set memory error in the compiler and return without accessing the members of the structure. Most structures are only used once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, the tests should have exercised this path (since we fail each malloc systematically).

I wasn't going to examine the JIT code... but then I had to, because of the segfault I triggered.

I realised the general error-handling strategy, which seems sensible. I built a call graph of all the JIT functions. Then I tried to identify all the ones which could "fail" (hard to determine, since the "failing" functions can be void-returning). Then I worked up the call graph to see if I could prove that either: the caller of a "failing" function has an error-check; or alternatively that all the actions it takes are safe even if the previous call failed.

The "XXX" comments I left are for the cases I wasn't able to understand or verify.

For example, here, if compile_reverse_matchingpath fails due to malloc, is it safe to call compile_matchingpath below (with a NULL ccbegin)? I don't understand the code well enough to be confident either way.

Copy link
Collaborator

@zherczeg zherczeg Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ccbegin cannot be NULL, since it points to something inside the (already alloc'ed) PCRE byte code. In theory we just generate code as usual, but nothing happens. This is very inefficient, but also a highly unlikely case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_reverse_matchingpath does return NULL when malloc fails. This would assign NULL to ccbegin. And compile_matchingpath will derefence that and crash.

In my head, there is a potential problem here, but the tests didn't exercise it, so I don't know whether it's real.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might talk about something different.

https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_jit_compile.c#L8759

return cc cannot return NULL. I don't see any other return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PUSH_BACKTRACK is a return.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that is a misuse of the macro. It should return with cc + 1 + 2 * IMM2_SIZE to simply skip the construct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have pushed a speculative fix to change the return to cc+1+2*IMM2_SIZE. I can exercise the problem by adding a "return NULL" underneath the problematic PUSH_BACKTRACK: if that specific thing fails then I get a segfault.

I hope it's OK that none of the other uses of PUSH_BACKTRACK do the same (they all return NULL). None of the other return values appear to be used at their call sites though, so I think the others are OK.


case OP_BRAMINZERO:
compile_braminzero_backtrackingpath(common, current);
// XXX what if error?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not cause any error. An idea:
https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.h#L688

sljit_set_compiler_memory_error() can be used to kill code compilation. Another (global) flag to stop malloc. Then we can see what happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. JIT allocates memory from its own small slab, rather than through PCRE2's malloc callback. So, the changes in pcre2test aren't exercising those allocation failures?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just partly. The large chunks are allocated with malloc, but the small blocks are simply allocated from the chunk. The whole thing lives until the compiler is deleted. Deleting individual small blocks are not possible.

https://github.com/zherczeg/sljit/blob/master/sljit_src/sljitLir.c#L690

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a flag or mode so that all allocations go through malloc (new "fragment") rather than reusing a previous one? Then we'd be able to exercise all the failures more easily. Would that be best with a runtime flag, or a compile flag? It looks like we'd need to make a change in the sljit repo, not just in pcre2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTR_FAIL_IF_NULL sets the error code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add an always alloc flag to sljit_alloc. It should be debug only, disabled by default. It will affect free as well, so it is not that trivial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTR_FAIL_IF_NULL sets the error code.

I think the case of size > 64/128 bytes doesn't do that, and just returns directly without setting an error condition?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an api misuse. An assert could be useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried making a change to sjlitLir ensure_buf/ensure_abuf. I made it so that they always allocated a fresh fragment.

Unfortunately... they allocate a lot of fragments, and it takes a looooong time for the tests to successively test the behaviour when each one fails.

For the time being, I'm just going to leave this something which isn't tested automatically.

We can be positive though: the error-handling in scheme in the JIT code is very good, and the compiler really does handle malloc failures very well, as far as I can tell.

@NWilson NWilson marked this pull request as ready for review February 18, 2025 21:29
@NWilson NWilson requested a review from zherczeg February 20, 2025 08:32
@NWilson
Copy link
Member Author

NWilson commented Feb 20, 2025

@zherczeg I found and fixed one additional crash that can be exercised by another of the return NULL error-handling statements in the JIT.

common->positive_assertion_quit = save_positive_assertion_quit;
common->accept = save_accept;
return NULL;
goto end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I would use the return cc + 1 + LINK_SIZE; everywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interpreting that as, "don't use the goto".

If you meant, "change the returns in other places as well" let me know where.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did yet another attempt at fixing the JIT segfault. I realised, just before merging this, that return cc + 1 + LINK_SIZE is not correct. The return statement is inside a loop which increments cc, so that's not the correct pointer to return!

It basically does have to return NULL in this instance... so I updated the call site with an error check.

@NWilson
Copy link
Member Author

NWilson commented Feb 22, 2025

My latest change here in the JIT file appears very safe. I'll merge this PR, I think I've been messing with it for long enough now.

@NWilson NWilson merged commit 500c68b into master Feb 23, 2025
33 checks passed
@NWilson NWilson deleted the user/niwilson/malloc-test branch February 23, 2025 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Checking allocation failures

2 participants