LibWeb/CSS: Port the CSS Tokenizer to Rust #8926

Open
AtkinsSJ wants to merge 3 commits into LadybirdBrowser:master from AtkinsSJ:rusty-tokens
@AtkinsSJ
Member

This is an AI translation of the CSS Tokenizer. test-css-tokenizer has been updated to run both Tokenizers and compare the output to make sure it's identical. There isn't yet any support for using it inside LibWeb.

I am very much a Rust novice, and this is very much written by AI. (Codex to be specific.) I have reviewed it myself and as far as I can tell it's correct, and the tests suggest that too, but as I said I am not familiar with Rust. I'm sure the module structure etc is not exactly what it should be.

The code itself is not idiomatic, but that's to be expected from a direct translation, and improving that is a future task.

In particular, the CMake and FFI parts could use some scrutiny from someone who understands those better.

test-css-tokenizer is updated to run both the C++ and Rust tokenizers
and compare their output, to ensure they behave identically. The Parser
still uses the C++ Tokenizer.

The LibWeb crate, FFI layer etc are all based on the existing ones for
other libraries.

This is a direct AI translation to get us started, and not idiomatic
Rust. Future work can be done to make it more sensible.
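
The comparison approach described above is a form of differential testing: feed the same input to both implementations and require identical token streams. The following is a minimal sketch of that idea, not the actual test-css-tokenizer code. The function name `first_mismatch` and the string representation of tokens are assumptions made for illustration.

```python
from itertools import zip_longest


def first_mismatch(reference_tokens, candidate_tokens):
    """Return (index, reference, candidate) for the first differing token.

    Returns None when both streams are identical. A stream that ends early
    is reported with None in place of its missing token.
    """
    missing = object()  # sentinel so a length difference counts as a mismatch
    pairs = zip_longest(reference_tokens, candidate_tokens, fillvalue=missing)
    for index, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return (
                index,
                None if ref is missing else ref,
                None if cand is missing else cand,
            )
    return None


# Hypothetical token dumps standing in for the C++ and Rust tokenizer output.
cpp_tokens = ["ident(a)", "whitespace", "{", "}"]
rust_tokens = ["ident(a)", "whitespace", "{", "}"]
assert first_mismatch(cpp_tokens, rust_tokens) is None
```

Reporting the index of the first divergence, rather than only pass or fail, tends to make a translation bug much easier to locate in the input.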
@jdahlin
Contributor

jdahlin commented Apr 15, 2026

Have you verified speed and/or memory usage just to ensure there is no significant regression?

@AtkinsSJ
Member Author

Good point. Here's a hyperfine run on my MacBook Pro, tokenizing the UA stylesheet, which is over 2000 lines.

❯ hyperfine -L backend cpp,rust 'Build/release/bin/css-tokenizer --backend {backend} Libraries/LibWeb/CSS/Default.css > /dev/null 2>&1'
Benchmark 1: Build/release/bin/css-tokenizer --backend cpp Libraries/LibWeb/CSS/Default.css > /dev/null 2>&1
  Time (mean ± σ):      39.4 ms ±   0.5 ms    [User: 29.8 ms, System: 8.3 ms]
  Range (min … max):    38.9 ms …  41.7 ms    71 runs
 
Benchmark 2: Build/release/bin/css-tokenizer --backend rust Libraries/LibWeb/CSS/Default.css > /dev/null 2>&1
  Time (mean ± σ):      39.5 ms ±   0.3 ms    [User: 29.8 ms, System: 8.5 ms]
  Range (min … max):    38.9 ms …  41.1 ms    71 runs
 
Summary
  Build/release/bin/css-tokenizer --backend cpp Libraries/LibWeb/CSS/Default.css > /dev/null 2>&1 ran
    1.00 ± 0.02 times faster than Build/release/bin/css-tokenizer --backend rust Libraries/LibWeb/CSS/Default.css > /dev/null 2>&1

After doing this a few times, the Rust one is consistently 2-3% slower. So, not significantly different, though it is a regression. I'll keep an eye on it when working on it later.
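
For reference, the relative difference for the single run pasted above can be worked out from the two reported means. This sketch uses only the numbers in that output. The function name is an assumption for illustration, and the 2-3% figure comes from repeated runs that are not shown here.

```python
def relative_slowdown_percent(baseline_ms, candidate_ms):
    """Percentage by which candidate is slower than baseline (negative if faster)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Mean wall-clock times from the hyperfine output above.
cpp_mean_ms = 39.4
rust_mean_ms = 39.5

slowdown = relative_slowdown_percent(cpp_mean_ms, rust_mean_ms)
print(f"{slowdown:.2f}%")  # prints 0.25%
```

In this particular run the 0.1 ms gap is smaller than either reported standard deviation (0.5 ms and 0.3 ms), so a single run like this one cannot separate the two backends.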

I can't seem to get a memory reading, do you have any tips for that?

@nico
Contributor

nico commented Apr 15, 2026

> I can't seem to get a memory reading, do you have any tips for that?

/usr/bin/time -v gives you a max rss reading. Maybe that together with a representative input could give you an idea?
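
Where a GNU `time` binary is not available, the same max RSS figure can be read through `getrusage`. The following is a rough sketch using Python's standard `resource` module on a Unix-like system. The helper name is an assumption, and the value covers every child process the script has waited for, not just the most recent one.

```python
import resource
import subprocess
import sys


def peak_rss_of_children_kib(command):
    """Run command to completion, then return the peak RSS of waited-for children in KiB."""
    subprocess.run(
        command,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak // 1024 if sys.platform == "darwin" else peak
```

It could be called as `peak_rss_of_children_kib(["Build/release/bin/css-tokenizer", "--backend", "rust", "Libraries/LibWeb/CSS/Default.css"])`, once per backend in separate script invocations so the two readings do not mix.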

@AtkinsSJ
Member Author

Thanks, I couldn't get that to work before, but it turns out macOS has a gnu-time package, so it's all good now. 😅

C++:

	Command being timed: "Build/release/bin/css-tokenizer --backend cpp Libraries/LibWeb/CSS/Default.css"
	User time (seconds): 0.04
	System time (seconds): 0.02
	Percent of CPU this job got: 78%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 36048
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 34
	Minor (reclaiming a frame) page faults: 2930
	Voluntary context switches: 567
	Involuntary context switches: 14
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

Rust:

	Command being timed: "Build/release/bin/css-tokenizer --backend rust Libraries/LibWeb/CSS/Default.css"
	User time (seconds): 0.04
	System time (seconds): 0.02
	Percent of CPU this job got: 77%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.08
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 40192
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 33
	Minor (reclaiming a frame) page faults: 3190
	Voluntary context switches: 654
	Involuntary context switches: 24
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0

So that's... 11% more memory used by my reckoning?
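
As a quick check of that estimate, here is the arithmetic on the two "Maximum resident set size" values reported above. The variable names are illustrative only.

```python
# Maximum resident set size values (kbytes) from the two runs above.
cpp_max_rss_kb = 36048
rust_max_rss_kb = 40192

extra_kb = rust_max_rss_kb - cpp_max_rss_kb
increase_percent = extra_kb / cpp_max_rss_kb * 100.0

# Show the relative increase alongside the absolute difference.
print(f"{increase_percent:.1f}% (+{extra_kb / 1024:.2f} MiB)")  # prints 11.5% (+4.05 MiB)
```

So the roughly 11% figure holds for this single pair of runs, and it corresponds to about 4 MiB of additional peak memory.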

3 participants