perf: SIMD scan ASCII runs in input loop #288
dyxushuai wants to merge 7 commits into rockorager:main
Pull request overview
This PR adds a SIMD-accelerated fast path for scanning and processing contiguous runs of printable ASCII characters (0x20-0x7E) in the input loop, bypassing the parser for these common cases to reduce per-byte overhead. The optimization shows a ~4x improvement in the benchmark (22,725 ns/frame → 5,743 ns/frame for mixed input streams).
Key Changes:
- Added asciiPrintableRunLen() function that uses SIMD vector operations to scan for printable ASCII runs
- Integrated ASCII fast path in the non-Windows input loop to emit key_press events directly for ASCII characters
- Added benchmark functions to compare baseline parser performance against the SIMD-optimized path
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Loop.zig | Implements asciiPrintableRunLen() SIMD function and integrates ASCII fast path in the input loop before parser invocation |
| bench/bench.zig | Adds asciiPrintableRunLen() function copy and benchmark harnesses (benchParseStreamBaseline, benchParseStreamSimd) to measure performance improvement |
Problem
Input parsing reads one byte at a time and funnels everything through Parser, even when large ASCII runs are present. This adds avoidable per-byte overhead in common input streams.
Fix
Add a SIMD-assisted scan in the input loop to detect contiguous printable ASCII runs (0x20..0x7E). For these runs we emit key_press events directly. If the next byte begins a combining mark, we leave the last ASCII byte for the parser to avoid breaking combining/keycap sequences.
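A minimal sketch of the scan described above, not the code from this PR. It reuses the name asciiPrintableRunLen from the review summary, but the body, the 16-lane vector width, and the helper directEmitLen are assumptions made for illustration. The hold-back rule is deliberately conservative: it treats any byte >= 0x80 after the run as a possible start of a combining sequence, while the real check in src/Loop.zig may be narrower.

```zig
const std = @import("std");

/// Hypothetical sketch: length of the leading run of printable ASCII
/// (0x20..0x7E) in `buf`. Not the PR's actual implementation.
fn asciiPrintableRunLen(buf: []const u8) usize {
    const lanes = 16; // assumed vector width
    const V = @Vector(lanes, u8);
    const lo: V = @splat(0x20);
    const span: V = @splat(0x7E - 0x20);

    var i: usize = 0;
    // Vector loop: advance while every lane of a full chunk is printable.
    while (i + lanes <= buf.len) : (i += lanes) {
        const chunk: V = buf[i..][0..lanes].*;
        // (b -% 0x20) <= 0x5E holds exactly when 0x20 <= b <= 0x7E,
        // because bytes below 0x20 wrap around to values >= 0xE0.
        const ok = (chunk -% lo) <= span;
        if (!@reduce(.And, ok)) break;
    }
    // Scalar loop: locate the exact end inside a failing chunk or the tail.
    while (i < buf.len and buf[i] >= 0x20 and buf[i] <= 0x7E) : (i += 1) {}
    return i;
}

/// Hypothetical helper: how many bytes of the run can be emitted directly.
/// Assumption: any byte >= 0x80 following the run might begin a combining
/// mark, so the last ASCII byte is left for the parser in that case.
fn directEmitLen(buf: []const u8) usize {
    const run = asciiPrintableRunLen(buf);
    if (run > 0 and run < buf.len and buf[run] >= 0x80) return run - 1;
    return run;
}
```

For the input "hello" followed by ESC [ A, directEmitLen returns 5 and the escape sequence goes to the parser. For "e" followed by the UTF-8 bytes of U+0301 (0xCC 0x81), it returns 0, so the base letter and its combining mark reach the parser together. The sketch does not cover a run that ends exactly at the end of the buffer, where the next read could still begin with a combining mark.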
Bench (local, zig build bench, iterations=200, 80x24)
Mixed stream: ASCII + CSI + UTF-8
Improvement: -18,539 ns/frame (-77.2%), 4.38x speedup.
Tests