-
Notifications
You must be signed in to change notification settings - Fork 2
Description
I've been working on understanding the fast-path literal optimisation which recently trickled through to the cloudflare fork:
Lines 188 to 200 in 35ab598
| if (likely(here.op == 0)) { | |
| *out++ = (unsigned char)(here.val); | |
| TABLE_LOAD(lcode, hold & lmask); | |
| old = hold; | |
| hold >>= here.bits; | |
| bits -= here32; | |
| if (likely(here.op == 0)) { | |
| *out++ = (unsigned char)(here.val); | |
| TABLE_LOAD(lcode, hold & lmask); | |
| old = hold; | |
| hold >>= here.bits; | |
| bits -= here32; | |
| } |
I'm curious if you tried putting that logic at the end of the handler for the first literal in the main loop (with necessary adjustments):
Lines 204 to 209 in 35ab598
| if (likely(op == 0)) { /* literal */ | |
| Tracevv((stderr, here.val >= 0x20 && here.val < 0x7f ? | |
| "inflate: literal '%c'\n" : | |
| "inflate: literal 0x%02x\n", here.val)); | |
| *out++ = (unsigned char)(here.val); | |
| } |
Because it should mean you can eliminate these refills, right?
Lines 217 to 219 in 35ab598
| if (unlikely((bits & 63) < 15 + 13)) { | |
| REFILL(); | |
| } |
Lines 235 to 238 in 35ab598
| if (unlikely((bits & 63) < 10)) { | |
| REFILL(); | |
| } | |
That is, at the start of the loop you have at least 56 bits available. If you bite off a couple of fast-path literals before getting into the conventional length/distance decode case then you might be down to something like 34 bits. That means you can no longer guarantee 15 + 5 + 15 + 13 bits avialible, which makes extra top-ups necessary.
I might be wrong about that because there are gotos in there to confuse me. But supposing I didn't mess up...
I figure if you try to scoop up extra literals after the first one, then that's up to 15 bits you've spent (because you might have used a trampoline table entry path for the first literal), leaving 41 bits guaranteed so you can keep sipping up literals at least 41/state->lenbits times. I think that's actually three more literals, isn't it?
I think this path is a branch prediction stressor, anyway, so it's a lot of cross-platform research which I'm not able to do to back up my "helpful suggestion".