@quodlibetor
Contributor

This adds a pair of new public types: `ReusableMap` and `BorrowedMap`.

`ReusableMap` stores an `ObjectAsVec` and provides a `deserialize` method that returns a `BorrowedMap`. The `BorrowedMap` uses `unsafe` to tie its lifetime to both the input JSON `str` reference and the deserializer.

When the `BorrowedMap` is dropped, it runs `clear()` on the internal `ObjectAsVec`, freeing it to be reused by the next `deserialize` call.
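For readers who want to see the shape of this pattern, here is a minimal sketch. It is not the code from this PR: `ReusablePairs`, `BorrowedPairs`, `parse`, and `demo` are hypothetical names, and a toy `key=value;key=value` format stands in for JSON so the example needs no dependencies. It only illustrates the idea of a long-lived buffer, a guard whose lifetime is tied to both the buffer and the input, and a `clear()` on drop.

```rust
/// Owns a key/value buffer whose allocation survives across parses.
/// `'static` is only a placeholder lifetime: the buffer is emptied before
/// every use, so no stale reference is ever read through this type.
pub struct ReusablePairs {
    buf: Vec<(&'static str, &'static str)>,
}

/// View over one parsed input. Clears the shared buffer when dropped.
pub struct BorrowedPairs<'a> {
    buf: &'a mut Vec<(&'a str, &'a str)>,
}

impl ReusablePairs {
    pub fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Parse `input` into the reused buffer. The returned guard borrows
    /// both `self` and `input` for the same lifetime `'a`.
    pub fn parse<'a>(&'a mut self, input: &'a str) -> BorrowedPairs<'a> {
        // Defensive: if an earlier guard was leaked with `mem::forget`,
        // its `Drop` never ran, so empty the buffer here as well.
        self.buf.clear();
        // SAFETY: the two `Vec` types differ only in lifetime parameters,
        // so their layout is identical. The buffer is empty at this point,
        // and every `&'a str` pushed below is removed (by the guard's
        // `Drop` or by the `clear()` above) before it could be read again.
        let buf: &'a mut Vec<(&'a str, &'a str)> = unsafe {
            &mut *(&mut self.buf as *mut Vec<(&'static str, &'static str)>
                as *mut Vec<(&'a str, &'a str)>)
        };
        // Toy parser: entries without '=' are skipped.
        for field in input.split(';') {
            if let Some((key, value)) = field.split_once('=') {
                buf.push((key, value));
            }
        }
        BorrowedPairs { buf }
    }

    /// Capacity of the underlying allocation (useful to observe reuse).
    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}

impl<'a> BorrowedPairs<'a> {
    /// Linear lookup that returns a slice of the original input.
    pub fn get(&self, key: &str) -> Option<&'a str> {
        self.buf.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }

    pub fn len(&self) -> usize {
        self.buf.len()
    }
}

impl Drop for BorrowedPairs<'_> {
    fn drop(&mut self) {
        // Keeps the allocation, drops the borrowed entries.
        self.buf.clear();
    }
}

/// Parses two inputs with one buffer and returns the entry counts.
pub fn demo() -> (usize, usize) {
    let mut pool = ReusablePairs::new();
    let first = String::from("a=1;b=2;c=3");
    let n1 = pool.parse(&first).len(); // guard dropped at end of statement
    let second = String::from("x=9");
    let n2 = pool.parse(&second).len(); // reuses the same allocation
    (n1, n2)
}
```

The borrow checker rejects a second `parse` call while a guard is still alive, because the guard holds the only mutable borrow of the pool. That exclusivity is what makes clearing on drop sufficient in this sketch.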

Benches show a 4-15% increase in average throughput and a roughly 2-12% increase in median throughput compared to `OwnedValue` on my machine:

parse
simple_json
serde_json parse only                            Avg: 98.270 MB/s Median: 97.457 MB/s     [79.288 MB/s .. 129.84 MB/s]
serde_json_borrow::OwnedValue parse only         Avg: 126.92 MB/s Median: 131.21 MB/s     [64.485 MB/s .. 277.54 MB/s]
serde_json_borrow::ReusableMap parse only        Avg: 145.78 MB/s Median: 147.55 MB/s     [86.206 MB/s .. 202.15 MB/s]
SIMD_json_borrow parse only                      Avg: 82.997 MB/s Median: 84.742 MB/s     [58.033 MB/s .. 136.56 MB/s]
hdfs
serde_json parse only                            Avg: 258.02 MB/s Median: 266.40 MB/s     [186.76 MB/s .. 302.37 MB/s]
serde_json_borrow::OwnedValue parse only         Avg: 366.20 MB/s Median: 375.88 MB/s     [231.04 MB/s .. 496.03 MB/s]
serde_json_borrow::ReusableMap parse only        Avg: 407.99 MB/s Median: 404.68 MB/s     [320.25 MB/s .. 559.69 MB/s]
SIMD_json_borrow parse only                      Avg: 258.31 MB/s Median: 257.01 MB/s     [197.53 MB/s .. 301.19 MB/s]
hdfs_with_array
serde_json parse only                            Avg: 308.14 MB/s Median: 314.23 MB/s     [275.76 MB/s .. 320.35 MB/s]
serde_json_borrow::OwnedValue parse only         Avg: 505.74 MB/s Median: 524.64 MB/s     [356.95 MB/s .. 548.81 MB/s]
serde_json_borrow::ReusableMap parse only        Avg: 540.35 MB/s Median: 544.24 MB/s     [466.89 MB/s .. 578.15 MB/s]
SIMD_json_borrow parse only                      Avg: 310.45 MB/s Median: 312.34 MB/s     [300.17 MB/s .. 316.79 MB/s]
wiki
serde_json parse only                            Avg: 589.71 MB/s Median: 614.37 MB/s     [369.78 MB/s .. 679.87 MB/s]
serde_json_borrow::OwnedValue parse only         Avg: 627.86 MB/s Median: 689.41 MB/s     [215.13 MB/s .. 782.33 MB/s]
serde_json_borrow::ReusableMap parse only        Avg: 680.08 MB/s Median: 706.46 MB/s     [519.18 MB/s .. 799.92 MB/s]
SIMD_json_borrow parse only                      Avg: 552.17 MB/s Median: 613.51 MB/s     [313.16 MB/s .. 698.85 MB/s]
gh-archive
serde_json parse only                            Avg: 336.46 MB/s Median: 337.61 MB/s     [321.00 MB/s .. 344.31 MB/s]
serde_json_borrow::OwnedValue parse only         Avg: 618.58 MB/s Median: 620.74 MB/s     [574.05 MB/s .. 645.41 MB/s]
serde_json_borrow::ReusableMap parse only        Avg: 643.53 MB/s Median: 648.20 MB/s     [556.04 MB/s .. 668.70 MB/s]
SIMD_json_borrow parse only                      Avg: 605.58 MB/s Median: 619.44 MB/s     [502.05 MB/s .. 636.55 MB/s]

The actual improvement will depend heavily on the shape of the data, especially on how many nested objects there are. This also puts a pattern in place: I could imagine storing a freelist of maps and vecs to reuse for nested objects, which could probably squeeze out a bit more performance.
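To make the freelist idea concrete, here is a rough sketch of what such a pool could look like. Nothing like this exists in the PR; `VecFreeList`, `take`, and `give` are hypothetical names. It uses owned element types to stay in safe Rust, whereas a version holding borrowed keys and values would need the same kind of lifetime handling that `ReusableMap` relies on.

```rust
/// Hypothetical pool of emptied vectors whose allocations can be reused,
/// e.g. one per nested object or array encountered during parsing.
pub struct VecFreeList<T> {
    free: Vec<Vec<T>>,
}

impl<T> VecFreeList<T> {
    pub fn new() -> Self {
        Self { free: Vec::new() }
    }

    /// Hand out a recycled, empty vector if one is available,
    /// otherwise a fresh one with no allocation yet.
    pub fn take(&mut self) -> Vec<T> {
        self.free.pop().unwrap_or_default()
    }

    /// Return a vector to the pool. Its contents are dropped,
    /// but its capacity is kept for the next `take`.
    pub fn give(&mut self, mut v: Vec<T>) {
        v.clear();
        self.free.push(v);
    }

    /// Number of vectors currently waiting to be reused.
    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

Whether this pays off would depend on how deeply nested the input is and on the cost of walking a parsed value to return its inner vectors to the pool, so it would need benchmarking like the numbers above.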
