Unboxed tokens by kornelski · Pull Request #257 · cloudflare/lol-html

kornelski · 2025-01-01T14:49:46Z

This may be easier to review as individual commits, since the big diffs come from splitting and renaming files.

This refactoring removes TokenCapturer and TokenCapturerEvent, which enables Lexeme::to_token to be inlined. This way the Token isn't moved around as much, and it's not necessary to use Box<Token>.
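
To illustrate the difference, here is a minimal Rust sketch. All names in it (`Token`, `Lexeme`, `CaptureEvent`, `emit_boxed`, `emit_unboxed`) are hypothetical simplifications, not lol-html's actual types or API. It contrasts passing each token through an intermediate event enum as a `Box<Token>` with building the token on the stack and lending it to the handler as `&mut Token`.

```rust
/// Hypothetical, simplified token type (not lol-html's real `Token`).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Text(String),
    StartTag { name: String, self_closing: bool },
}

/// Hypothetical lexeme: a token kind plus slices borrowed from the input.
pub enum Lexeme<'a> {
    Text(&'a str),
    StartTag { name: &'a str, self_closing: bool },
}

impl Lexeme<'_> {
    /// Converts the borrowed lexeme into an owned token.
    #[inline]
    pub fn to_token(&self) -> Token {
        match *self {
            Lexeme::Text(text) => Token::Text(text.to_owned()),
            Lexeme::StartTag { name, self_closing } => Token::StartTag {
                name: name.to_owned(),
                self_closing,
            },
        }
    }
}

/// "Before": tokens travel inside an intermediate event, boxed so the
/// event enum stays small while it is passed around.
pub enum CaptureEvent {
    TokenProduced(Box<Token>),
    // Other event variants omitted in this sketch.
}

pub fn emit_boxed(lexeme: &Lexeme<'_>, handler: &mut dyn FnMut(CaptureEvent)) {
    // One extra heap allocation per token, on top of the token's own strings.
    handler(CaptureEvent::TokenProduced(Box::new(lexeme.to_token())));
}

/// "After": the token is built on the stack and lent to the handler,
/// so it is neither boxed nor moved through an intermediate enum.
pub fn emit_unboxed(lexeme: &Lexeme<'_>, handler: &mut impl FnMut(&mut Token)) {
    let mut token = lexeme.to_token();
    handler(&mut token);
}
```

In the sketch, both paths hand the handler the same token, but the unboxed one skips the per-token `Box` allocation and still lets the handler modify the token in place through the `&mut` borrow.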

This allowed TextDecoder to be simplified so that it handles just the decoding, similar to TextEncoder. I've added a fast path for handling text chunks, with zero-copy for UTF-8 and ASCII. It could have been even faster if it didn't have to keep the existing tricky behavior (see #255).
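
The zero-copy idea can be sketched with `std::borrow::Cow`: when the input bytes are already usable as a `&str`, the decoder returns a borrowed slice and only allocates when a real conversion is needed. This is a hypothetical illustration, not lol-html's TextDecoder. `decode_chunk` and the two-variant `Encoding` enum are made up, Latin-1 stands in for any non-UTF-8 encoding, and the sketch does not attempt to model the behavior referenced in #255.

```rust
use std::borrow::Cow;

/// Hypothetical encoding selector; Latin1 stands in for any legacy encoding.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Encoding {
    Utf8,
    Latin1,
}

/// Decodes as much of `input` as possible.
/// Returns the decoded text and the number of input bytes consumed.
pub fn decode_chunk(input: &[u8], encoding: Encoding) -> (Cow<'_, str>, usize) {
    match encoding {
        Encoding::Utf8 => match std::str::from_utf8(input) {
            // Fast path: the whole chunk is valid UTF-8, so borrow it as-is.
            Ok(s) => (Cow::Borrowed(s), input.len()),
            Err(e) => {
                let valid = e.valid_up_to();
                match e.error_len() {
                    // The chunk ends mid-character: borrow the valid prefix and
                    // leave the incomplete tail for the caller to carry over.
                    None => {
                        let s = std::str::from_utf8(&input[..valid]).unwrap_or_default();
                        (Cow::Borrowed(s), valid)
                    }
                    // Genuinely invalid bytes: fall back to a lossy, owned copy
                    // of the whole chunk (invalid bytes become U+FFFD).
                    Some(_) => (String::from_utf8_lossy(input), input.len()),
                }
            }
        },
        Encoding::Latin1 => {
            // ASCII bytes mean the same in Latin-1 and UTF-8, so borrow them.
            if input.is_ascii() {
                if let Ok(s) = std::str::from_utf8(input) {
                    return (Cow::Borrowed(s), input.len());
                }
            }
            // Slow path: each Latin-1 byte maps to the code point of the same value.
            let s: String = input.iter().map(|&b| char::from(b)).collect();
            (Cow::Owned(s), input.len())
        }
    }
}
```

Because a chunk boundary can split a multi-byte UTF-8 character, the sketch reports how many bytes it consumed, so a caller could prepend the unconsumed tail to the next chunk.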

Overall, it makes most benchmarks 5-8% faster, and text-heavy documents 25-40% faster.

kornelski requested review from a team, Noah-Kennedy, jasnell, jongiddy, orium and scotchmist as code owners January 1, 2025 14:49

kornelski force-pushed the unboxed-tokens branch 3 times, most recently from f81165f to ec78225 on January 6, 2025 14:46

kornelski added 10 commits January 7, 2025 14:56

0f8ed21 Avoid boxing every token
193f9a0 Remove unnecessary macro
5232c10 Inline feed()
3de73c4 Remove TokenCapturer
f725c67 Move capture flag check
58a69cf Split out TextEncoder
aac3a90 Move TextDecoder
c594256 Remove token construction from TextDecoder
dce3c57 Avoid copying when decoding UTF-8 or ASCII
d773b99 Avoid panic branches in TextDecoder

kornelski force-pushed the unboxed-tokens branch from ec78225 to d773b99 on January 7, 2025 14:56

orium approved these changes Feb 4, 2025

orium merged commit b5b1a1e into master Feb 4, 2025
5 checks passed

orium deleted the unboxed-tokens branch February 4, 2025 16:45
