Skip to content

Conversation

@zherczeg
Copy link
Collaborator

The new code is not necessary, but it will be useful for a future eclass implementation. It is a bit of a slowdown actually, although the code looks nicer.

@NWilson I don't plan any jit support for eclass until the bitset from xclasses moves to a single bitset in eclass, because that changes the bytecode layout.

Ideally, the eclass should only contain an optional bitset for the first 256 characters, and a list from xclass/allany/eclass_operation items. When an xclass is empty (e.g. its bitset handles all of its elements), the empty xclass should be represented by an OP_ALLANY+OP_ECLASS_NOT byte code pair, which can be detected by jit (or even the interpreter).

@zherczeg zherczeg force-pushed the xclass_range branch 2 times, most recently from 8fd3979 to 8694b18 Compare November 16, 2024 10:51
@NWilson
Copy link
Member

NWilson commented Nov 16, 2024

OK, feel free to make whatever changes you like to the JIT code.

I'll need to take a break from PCRE2 for a week, I'm very busy this next week. But then I'll move onto the optimisations PR.

@carenas
Copy link
Contributor

carenas commented Nov 16, 2024

I don't plan any jit support for eclass until the bitset from xclasses moves to a single bitset in eclass

This might be the first release we do where a major feature isn't supported by JIT though.

@PhilipHazel
Copy link
Collaborator

For PCRE2, yes, possibly, but there was over a decade of PCRE1 before JIT was even invented. :-) And the first releases with JIT didn't support everything ... it was gradually extended.

@zherczeg
Copy link
Collaborator Author

Is this the plan? Am I missed something?

@carenas
Copy link
Contributor

carenas commented Nov 17, 2024

at least git will be affected, because it "mistakenly" uses the JIT fast path and therefore will behave erratically.

i can produce a patch to fix it, but the release process takes about 1 month there, so it might be a constrain for how long we have to keep the RC around, or even if we might need to do an RC2

@NWilson
Copy link
Member

NWilson commented Nov 18, 2024

Are you referring to this code in git?

https://github.com/git/git/blob/090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559/grep.c#L356

It seems to ...mandate that PCRE2 regexes are supported by the JIT. It dies if you provide a regex that can't be jitted! Hmm. The error message even says, "Couldn't JIT the PCRE2 pattern '%.*s'%s, got '%d'%s" ... "\nPerhaps prefix (*NO_JIT) to your pattern?". So apparently, if you use a new regex like (?[...]) then git will tell you to add (*NO_JIT), rather than silently falling back to the interpreter.

That's a bit harsh. Anyway, I thought that perhaps Zoltan was intending to add JIT support for ECLASS, but he's waiting until I do the required optimisations. (That seems sensible to me - why write code, when you know you'll need to change it later.)

The ECLASS changes in pcre2_match.c are really very short & sweet.

@zherczeg
Copy link
Collaborator Author

Me thought in the same way. For me, merging the bitsest is an important optimization for both speed and pattern size. It will change the byte-code layout, and OP_CLASS/OP_NCLASS will not possible in eclass anymore. It is pointless to work on jit support before that. The non-recursive parser will not change the byte code layout (hopefully), and less important.

@NWilson plans to do the optimizations as we discussed, so I just wait. I think it is too much to expect him to do the jit support, so I will do it. The new code will likely not be too much, but a lot of knowledge is needed, and it will have some tricky parts.

@zherczeg
Copy link
Collaborator Author

Does anybody plans to review this patch?

@carenas
Copy link
Contributor

carenas commented Nov 19, 2024

Does anybody plans to review this patch?

I didn't see anything that worried me when I did originally, but didn't "approve" to give others a chance.

could take a closer look but might take a while and at best would be about some missing "const", most likely.

The commit message might be better though by explaining it as a preparatory patch to implement jit support in the future.

The new code is useful for a future eclass
implementation in jit.
@zherczeg
Copy link
Collaborator Author

I have improved the commit message.

@zherczeg zherczeg merged commit 793ce6b into PCRE2Project:master Nov 20, 2024
18 checks passed
@zherczeg zherczeg deleted the xclass_range branch November 20, 2024 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants