-
Notifications
You must be signed in to change notification settings - Fork 242
Change xclass character min/max detection #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8fd3979 to
8694b18
Compare
|
OK, feel free to make whatever changes you like to the JIT code. I'll need to take a break from PCRE2 for a week, I'm very busy this next week. But then I'll move onto the optimisations PR. |
This might be the first release we do where a major feature isn't supported by JIT though. |
|
For PCRE2, yes, possibly, but there was over a decade of PCRE1 before JIT was even invented. :-) And the first releases with JIT didn't support everything ... it was gradually extended. |
|
Is this the plan? Am I missed something? |
8694b18 to
609af86
Compare
|
at least git will be affected, because it "mistakenly" uses the JIT fast path and therefore will behave erratically. i can produce a patch to fix it, but the release process takes about 1 month there, so it might be a constrain for how long we have to keep the RC around, or even if we might need to do an RC2 |
|
Are you referring to this code in https://github.com/git/git/blob/090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559/grep.c#L356 It seems to ...mandate that PCRE2 regexes are supported by the JIT. It dies if you provide a regex that can't be jitted! Hmm. The error message even says, That's a bit harsh. Anyway, I thought that perhaps Zoltan was intending to add JIT support for ECLASS, but he's waiting until I do the required optimisations. (That seems sensible to me - why write code, when you know you'll need to change it later.) The ECLASS changes in |
|
Me thought in the same way. For me, merging the bitsest is an important optimization for both speed and pattern size. It will change the byte-code layout, and OP_CLASS/OP_NCLASS will not possible in eclass anymore. It is pointless to work on jit support before that. The non-recursive parser will not change the byte code layout (hopefully), and less important. @NWilson plans to do the optimizations as we discussed, so I just wait. I think it is too much to expect him to do the jit support, so I will do it. The new code will likely not be too much, but a lot of knowledge is needed, and it will have some tricky parts. |
|
Does anybody plans to review this patch? |
I didn't see anything that worried me when I did originally, but didn't "approve" to give others a chance. could take a closer look but might take a while and at best would be about some missing "const", most likely. The commit message might be better though by explaining it as a preparatory patch to implement jit support in the future. |
The new code is useful for a future eclass implementation in jit.
609af86 to
b3f6381
Compare
|
I have improved the commit message. |
The new code is not necessary, but it will be useful for a future eclass implementation. It is a bit of a slowdown actually, although the code looks nicer.
@NWilson I don't plan any jit support for eclass until the bitset from xclasses moves to a single bitset in eclass, because that changes the bytecode layout.
Ideally, the eclass should only contain an optional bitset for the first 256 characters, and a list from xclass/allany/eclass_operation items. When an xclass is empty (e.g. its bitset handles all of its elements), the empty xclass should be represented by an OP_ALLANY+OP_ECLASS_NOT byte code pair, which can be detected by jit (or even the interpreter).