Commit 91a11ce

Add readme
1 parent c706ca3 commit 91a11ce

5 files changed, +108 -8 lines changed

Cargo.lock

Lines changed: 1 addition & 1 deletion

Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -3,14 +3,18 @@ members = ["cpu-features"]
 
 [package]
 name = "json-escape-simd"
-version = "0.1.0"
+version = "1.0.0"
 edition = "2024"
 rust-version = "1.89.0"
+include = ["src/**/*.rs"]
 
 [[example]]
 name = "escape"
 path = "examples/escape.rs"
 
+[features]
+force_aarch64_generic = [] # Force use of generic implementation on aarch64
+
 [[bench]]
 name = "escape"
 harness = false

README.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# json-escape-simd

Optimized SIMD routines for escaping JSON strings. This repository contains the `json-escape-simd` crate, comparison fixtures, and Criterion benches against commonly used alternatives.

> [!IMPORTANT]
>
> On aarch64 NEON hosts the available register width is **128** bits, which is narrower than the lookup table this implementation prefers. As a result, the SIMD path may not outperform the generic fallback, as the benchmark numbers below show.
>
> On some modern macOS devices with a larger register count, the SIMD path may outperform the generic fallback; see the [M3 Max benchmark](#apple-m3-max) below.

> [!NOTE]
>
> The `force_aarch64_generic` feature flag forces the generic fallback on aarch64. This is useful for testing the generic fallback on aarch64 devices with a smaller register count.

## Benchmarks

Numbers below come from `cargo bench` runs, on GitHub Actions hardware unless a section names a different machine. Criterion reports are summarized to make relative performance easier to spot. "vs fastest" is each implementation's median time divided by the fastest median in the same table (1.00× means fastest).
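The "vs fastest" column can be reproduced from the median times alone. The following is a minimal sketch, not part of the crate: the helper name `vs_fastest` is hypothetical, and the input values are the x86_64 RxJS medians (in µs) listed in this README.

```rust
/// Ratio of each median time to the fastest median, rounded to two decimals.
/// `vs_fastest` is a hypothetical helper used only for this illustration.
fn vs_fastest(medians: &[f64]) -> Vec<f64> {
    // The fastest entry is the smallest median time.
    let fastest = medians.iter().copied().fold(f64::INFINITY, f64::min);
    medians
        .iter()
        .map(|m| (m / fastest * 100.0).round() / 100.0)
        .collect()
}

fn main() {
    // Median times in µs: simd, v_jsonescape, generic, serde_json, json-escape.
    let medians = [345.06, 576.25, 657.94, 766.72, 782.65];
    for (m, r) in medians.iter().zip(vs_fastest(&medians)) {
        println!("{m:>8.2} µs  {r:.2}×");
    }
}
```

A ratio is only meaningful within one table, since each table has its own fastest entry and its own hardware.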

### GitHub Actions x86_64 (`ubuntu-latest`)

`AVX2` enabled.

**RxJS payload (~10k iterations)**

| Implementation        | Median time   | vs fastest |
| --------------------- | ------------- | ---------- |
| **`escape simd`**     | **345.06 µs** | **1.00×**  |
| `escape v_jsonescape` | 576.25 µs     | 1.67×      |
| `escape generic`      | 657.94 µs     | 1.91×      |
| `serde_json`          | 766.72 µs     | 2.22×      |
| `json-escape`         | 782.65 µs     | 2.27×      |

**Fixtures payload (~300 iterations)**

| Implementation        | Median time  | vs fastest |
| --------------------- | ------------ | ---------- |
| **`escape simd`**     | **12.84 ms** | **1.00×**  |
| `escape v_jsonescape` | 19.66 ms     | 1.53×      |
| `escape generic`      | 22.53 ms     | 1.75×      |
| `serde_json`          | 24.65 ms     | 1.92×      |
| `json-escape`         | 26.64 ms     | 2.07×      |

### GitHub Actions aarch64 (`ubuntu-24.04-arm`)

NEON enabled.

**RxJS payload (~10k iterations)**

| Implementation        | Median time   | vs fastest |
| --------------------- | ------------- | ---------- |
| **`escape generic`**  | **546.89 µs** | **1.00×**  |
| `escape simd`         | 589.29 µs     | 1.08×      |
| `serde_json`          | 612.33 µs     | 1.12×      |
| `json-escape`         | 624.66 µs     | 1.14×      |
| `escape v_jsonescape` | 789.14 µs     | 1.44×      |

**Fixtures payload (~300 iterations)**

| Implementation        | Median time  | vs fastest |
| --------------------- | ------------ | ---------- |
| **`escape generic`**  | **17.81 ms** | **1.00×**  |
| `serde_json`          | 19.77 ms     | 1.11×      |
| `json-escape`         | 20.84 ms     | 1.17×      |
| `escape simd`         | 21.04 ms     | 1.18×      |
| `escape v_jsonescape` | 25.57 ms     | 1.44×      |

### Apple M3 Max

**RxJS payload (~10k iterations)**

| Implementation        | Median time   | vs fastest |
| --------------------- | ------------- | ---------- |
| **`escape simd`**     | **307.20 µs** | **1.00×**  |
| `escape generic`      | 490.00 µs     | 1.60×      |
| `serde_json`          | 570.35 µs     | 1.86×      |
| `escape v_jsonescape` | 599.72 µs     | 1.95×      |
| `json-escape`         | 644.73 µs     | 2.10×      |

**Fixtures payload (~300 iterations)**

| Implementation        | Median time  | vs fastest |
| --------------------- | ------------ | ---------- |
| **`escape generic`**  | **17.89 ms** | **1.00×**  |
| **`escape simd`**     | **17.92 ms** | **1.00×**  |
| `serde_json`          | 19.78 ms     | 1.11×      |
| `escape v_jsonescape` | 21.09 ms     | 1.18×      |
| `json-escape`         | 22.43 ms     | 1.25×      |

src/aarch64.rs

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,8 @@ use std::arch::aarch64::{
 use crate::{ESCAPE, HEX_BYTES, UU};
 
 const CHUNK: usize = 64;
-const PREFETCH_DISTANCE: usize = CHUNK * 4;
+// 128 bytes ahead
+const PREFETCH_DISTANCE: usize = CHUNK * 2;
 const SLASH_SENTINEL: u8 = 0xFF;
 
 #[inline]
@@ -30,9 +31,8 @@ pub fn escape_neon<S: AsRef<str>>(input: S) -> String {
 let ptr = bytes.as_ptr().add(i);
 
 core::arch::asm!(
-    "prfm pldl1keep, [{0}, #{1}]",
-    in(reg) ptr,
-    const PREFETCH_DISTANCE,
+    "prfm pldl1keep, [{0}]",
+    in(reg) ptr.add(PREFETCH_DISTANCE),
 );
 
 let quad = vld1q_u8_x4(ptr);
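One detail of the new prefetch form may deserve a second look. Rust's `pointer::add` requires the result to stay within the same allocation (or one byte past its end), and the diff does not show whether the loop guarantees that for `ptr.add(PREFETCH_DISTANCE)` near the end of the input. The sketch below is an assumption-laden alternative, not the crate's code: the helper `prefetch_ahead` is hypothetical, and it uses `wrapping_add`, which has no in-bounds requirement. `prfm` is only a hint, so an address past the buffer is not dereferenced by the program.

```rust
const CHUNK: usize = 64;
const PREFETCH_DISTANCE: usize = CHUNK * 2; // 128 bytes ahead

/// Hint that the cache line `PREFETCH_DISTANCE` bytes past `ptr` will be read soon.
/// Hypothetical helper for illustration; not part of the crate.
#[inline(always)]
fn prefetch_ahead(ptr: *const u8) {
    // wrapping_add does not require the result to be in bounds, so this is
    // well-defined even when fewer than PREFETCH_DISTANCE bytes remain.
    let target = ptr.wrapping_add(PREFETCH_DISTANCE);

    #[cfg(target_arch = "aarch64")]
    unsafe {
        core::arch::asm!(
            "prfm pldl1keep, [{0}]",
            in(reg) target,
            options(nostack, preserves_flags, readonly),
        );
    }

    // On other architectures this sketch does nothing with the address.
    #[cfg(not(target_arch = "aarch64"))]
    let _ = target;
}

fn main() {
    let data = vec![b'a'; 256];
    // Issue one prefetch hint per 64-byte chunk, as the NEON loop would.
    for i in (0..data.len()).step_by(CHUNK) {
        prefetch_ahead(data.as_ptr().wrapping_add(i));
    }
    println!("issued {} prefetch hints", data.len() / CHUNK);
}
```

Whether this matters in practice depends on loop bounds that this diff does not show.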

src/lib.rs

Lines changed: 9 additions & 2 deletions
@@ -1,7 +1,7 @@
 #[cfg(target_arch = "x86_64")]
 mod x86;
 
-#[cfg(target_arch = "aarch64")]
+#[cfg(all(target_arch = "aarch64", not(feature = "force_aarch64_generic")))]
 mod aarch64;
 
 const BB: u8 = b'b'; // \x08
@@ -150,7 +150,14 @@ pub fn escape<S: AsRef<str>>(input: S) -> String {
 
 #[cfg(target_arch = "aarch64")]
 {
-    return aarch64::escape_neon(input);
+    #[cfg(feature = "force_aarch64_generic")]
+    {
+        return escape_generic(input);
+    }
+    #[cfg(not(feature = "force_aarch64_generic"))]
+    {
+        return aarch64::escape_neon(input);
+    }
 }
 
 #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
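The selection logic in this hunk can also be written at item level, which makes the two outcomes easy to see side by side. This is a simplified sketch under stated assumptions: the function bodies are stand-ins that only tag their input (they do not escape anything), the x86_64 path is left out, and only the `cfg` predicate is taken from the diff.

```rust
// Stand-in for the crate's generic implementation (hypothetical body).
fn escape_generic(input: &str) -> String {
    format!("generic:{input}")
}

// Stand-in for the NEON implementation; compiled only when the real
// `mod aarch64` would be compiled (same predicate as in the diff).
#[cfg(all(target_arch = "aarch64", not(feature = "force_aarch64_generic")))]
fn escape_neon(input: &str) -> String {
    format!("neon:{input}")
}

// aarch64 without the feature flag: use the NEON path.
#[cfg(all(target_arch = "aarch64", not(feature = "force_aarch64_generic")))]
fn escape(input: &str) -> String {
    escape_neon(input)
}

// Everything else in this sketch, including aarch64 with the flag set,
// falls back to the generic path.
#[cfg(not(all(target_arch = "aarch64", not(feature = "force_aarch64_generic"))))]
fn escape(input: &str) -> String {
    escape_generic(input)
}

fn main() {
    println!("{}", escape("hello"));
    println!("{}", escape_generic("hello"));
}
```

Because the choice is made at compile time, exactly one `escape` exists in any given build, and the NEON module is not compiled at all when the flag is enabled.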
