Commit 503dcb6

rewrite json pointer post
1 parent d92b510 commit 503dcb6

4 files changed

+147
-141
lines changed

.jekyll-metadata

2.68 KB
Binary file not shown.
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
---
title: "Better JSON Pointer"
date: 2024-04-30 09:00:00 +1200
tags: [json-pointer, architecture, performance]
toc: true
pin: false
---

This post was going to be something else, and somewhat more boring. Be glad you're not reading that.

But instead of blindly forging on, I stopped to consider whether I actually wanted to push out the changes I had made. In the end, I'm glad I hesitated.

In this post and probably the couple that follow, I will cover my experience trying to squeeze some more performance out of a simple, immutable type.

## Current state (as it was)

The `JsonPointer` class is a typical object-oriented approach to implementing the JSON Pointer specification, RFC 6901.

Syntactically, a JSON Pointer is nothing more than a series of string segments separated by forward slashes. All of the pointer segments follow the same rule: any tildes (`~`) or forward slashes (`/`) need to be escaped (as `~0` and `~1`, respectively); otherwise, just use the string as-is.
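
As a rough illustration of that escaping rule (a sketch only, not the library's code), a single segment could be encoded and decoded like this. `Encode` and `Decode` are hypothetical helper names; the `~0`/`~1` mapping and the replacement order follow RFC 6901.

```c#
using System;

// Hypothetical helpers, not the library's API.
// Encoding: escape '~' first, so the '~' introduced by "~1" is not escaped a second time.
static string Encode(string segment) =>
    segment.Replace("~", "~0").Replace("/", "~1");

// Decoding: reverse the order ("~1" first, then "~0"), so "~01" becomes "~1" rather than "/".
static string Decode(string encoded) =>
    encoded.Replace("~1", "/").Replace("~0", "~");

Console.WriteLine(Encode("a/b~c"));   // a~1b~0c
Console.WriteLine(Decode("a~1b~0c")); // a/b~c
Console.WriteLine(Decode("~01"));     // ~1
```

The ordering matters in both directions: swapping the two `Replace` calls in either helper would corrupt segments that contain a literal `~1` or `~0`.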

Since all of the segments follow a rule, a class is created to model a segment (`PointerSegment`) and then another class is created to house a series of them (`JsonPointer`). Easy.

Tack on some functionality for parsing, evaluation, and maybe some pointer math (combining and building pointers), and you have a full implementation.

## An idea is formed

In thinking about how the model could be better, I realized that the class is immutable, and it doesn't directly hold a lot of data. What if it were a struct? Then it could live on the stack, eliminating a memory allocation.

Then, instead of holding a collection of strings, it could hold just the full string and a collection of `Range` objects could indicate the segments: one string allocation instead of an array of objects that hold strings.

This raises a question of whether the string should hold pointer-encoded segments. If it did, then `.ToString()` could just return the string, eliminating the need to build it, and I could provide new allocation-free string comparison methods that accounted for encoding so that users could still operate on segments.
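
To make the single-string idea concrete, here is a minimal sketch, assuming the pointer keeps its encoded source text and only records where each segment sits. `FindSegments` is a hypothetical name for illustration, not something from the library.

```c#
using System;
using System.Collections.Generic;

// Hypothetical sketch: locate each (still encoded) segment inside the pointer string.
static Range[] FindSegments(string pointer)
{
    var ranges = new List<Range>();
    if (pointer.Length == 0) return ranges.ToArray(); // "" is the root pointer: no segments
    if (pointer[0] != '/') throw new FormatException("A non-empty pointer must start with '/'.");

    var start = 1; // first character after the leading slash
    for (var i = 1; i <= pointer.Length; i++)
    {
        // A segment ends at the next slash or at the end of the string.
        if (i == pointer.Length || pointer[i] == '/')
        {
            ranges.Add(start..i);
            start = i + 1;
        }
    }
    return ranges.ToArray();
}

var source = "/foo/b~1r/5";
var segments = FindSegments(source);

foreach (var range in segments)
    Console.WriteLine(source[range]); // prints foo, b~1r, 5 (still encoded)
```

Indexing the string with a `Range` allocates a substring, so this usage is only for display. Comparing against `source.AsSpan()[range]` would avoid that allocation.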

I implemented all of this, and it worked! It actually worked quite well:

| Version | n   | Mean       | Error     | StdDev     | Gen0     | Allocated |
|-------- |---- |-----------:|----------:|-----------:|---------:|----------:|
| v4.0.1  | 1   | 2.778 us   | 0.0546 us | 0.1025 us  | 4.1962   | 8.57 KB   |
| v5.0.0  | 1   | 1.718 us   | 0.0335 us | 0.0435 us  | 1.4915   | 3.05 KB   |
| v4.0.1  | 10  | 26.749 us  | 0.5000 us | 0.7330 us  | 41.9617  | 85.7 KB   |
| v5.0.0  | 10  | 16.719 us  | 0.3219 us | 0.4186 us  | 14.8926  | 30.47 KB  |
| v4.0.1  | 100 | 286.995 us | 5.6853 us | 12.5983 us | 419.4336 | 857.03 KB |
| v5.0.0  | 100 | 157.159 us | 2.5567 us | 2.1350 us  | 149.1699 | 304.69 KB |

... for parsing. Pointer math was a bit different:

| Version | n   | Mean        | Error       | StdDev      | Gen0     | Allocated |
|-------- |---- |------------:|------------:|------------:|---------:|----------:|
| v4.0.1  | 1   | 661.2 ns    | 12.86 ns    | 11.40 ns    | 1.1473   | 2.34 KB   |
| v5.0.0  | 1   | 916.3 ns    | 17.46 ns    | 15.47 ns    | 1.1120   | 2.27 KB   |
| v4.0.1  | 10  | 6,426.4 ns  | 124.10 ns   | 121.88 ns   | 11.4746  | 23.44 KB  |
| v5.0.0  | 10  | 9,128.2 ns  | 180.82 ns   | 241.39 ns   | 11.1237  | 22.73 KB  |
| v4.0.1  | 100 | 64,469.6 ns | 1,309.01 ns | 1,093.08 ns | 114.7461 | 234.38 KB |
| v5.0.0  | 100 | 92,437.0 ns | 1,766.38 ns | 1,963.33 ns | 111.3281 | 227.34 KB |

While the memory allocation decrease was... fine, the roughly 40% run-time increase was unacceptable. I couldn't figure out what was going on here, so I left it for about a week and started on some updates for _JsonSchema.Net_ (post coming soon).

Initially for the pointer math, I was just creating a new string and then parsing that. The memory usage was a bit higher than what's shown above, but the run-time was almost double. After a bit of thought, I realized I could explicitly build the string _and_ the range array, which cut down on both the run time and the memory, but only to the numbers shown above.

## Eureka!

After a couple days, I finally figured out that by storing each segment separately, the old way could re-use segments between pointers.

For example, let's combine `/foo/bar` and `/baz`. The pointers for those hold the arrays `['foo', 'bar']` and `['baz']`. When combining under the old way, I'd just merge the arrays: `['foo', 'bar', 'baz']`. It's allocating a new array, but not new strings. All of the segment strings stayed the same.

Under the new way, I'd actually build a new string `/foo/bar/baz` and then build a new array of `Range`s to point to the substrings.
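
The difference between the two combine strategies can be sketched with plain arrays and strings. This is an illustration under simplified assumptions (bare `string[]` and `Range[]` values standing in for the pointer types), not the library's implementation.

```c#
using System;
using System.Linq;

// Segment-array model: each pointer holds its decoded segments as separate strings.
string[] left = { "foo", "bar" };
string[] right = { "baz" };

// Combining allocates one new array, but its elements are the same string instances.
string[] combined = left.Concat(right).ToArray();
Console.WriteLine(object.ReferenceEquals(combined[0], left[0])); // True

// Range-based model: combining needs a new backing string plus a new Range[].
string leftSource = "/foo/bar", rightSource = "/baz";
string combinedSource = string.Concat(leftSource, rightSource);
Range[] combinedRanges = { 1..4, 5..8, 9..12 };
Console.WriteLine(combinedSource[combinedRanges[2]]); // baz
```

In the first model the only new allocation is the array of references. In the second, every character of both pointers is copied into the new string before the ranges are rebuilt.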

So this new architecture isn't better after all.

## Deep in thought

I thought some more about the two approaches. The old approach does pointer math really well, but I don't like that I have an object (`JsonPointer`) that contains more objects (`PointerSegment`) that each contain strings. That seems wasteful.

Also, why did I make it a struct? Structs should be a fixed size, and strings are never a fixed size (which is a major reason `string` is a class). Secondly, the memory of a struct should also live on the stack, and strings and arrays (even arrays of structs) are stored on the heap; so really it's only the container that's on the stack. A struct just isn't the right choice for this type, so I changed it back to a class.

What if the pointer just held the strings directly instead of having a secondary `PointerSegment` class? Then all of the decoding/encoding logic would have to live somewhere else, but that's fine. So I don't need a model for the segments; plain strings will do.

Lastly, I could make it implement `IReadOnlyList<string>`. That would give users a `.Count` property, an indexer to access segments, and allow them to iterate over segments directly.

## A new implementation

Taking in all of this analysis, I updated `JsonPointer` again:

- It's a class again.
- It holds an array of (decoded) strings for the segments.
- It will cache its string representation.
  - Parsing a pointer already has the string; just store it.
  - Constructing a pointer and calling `.ToString()` builds on the fly and caches.

`PointerSegment`, which had also been changed to a struct in the first set of changes, remains a struct and acts as an intermediate type so that building pointers in code can mix strings and integer indices. (See the `.Create()` method used in the code samples below.) Keeping this as a struct means no allocations.

I fixed all of my tests and ran the benchmarks again:

| Parsing | Count | Mean       | Error     | StdDev    | Gen0     | Allocated |
|-------- |------ |-----------:|----------:|----------:|---------:|----------:|
| 5.0.0   | 1     | 3.825 us   | 0.0760 us | 0.0961 us | 3.0823   | 6.3 KB    |
| 5.0.0   | 10    | 36.155 us  | 0.6979 us | 0.9074 us | 30.8228  | 62.97 KB  |
| 5.0.0   | 100   | 362.064 us | 6.7056 us | 6.2724 us | 308.1055 | 629.69 KB |

| Math    | Count | Mean        | Error     | StdDev    | Gen0    | Allocated |
|-------- |------ |------------:|----------:|----------:|--------:|----------:|
| 5.0.0   | 1     | 538.2 ns    | 10.12 ns  | 10.83 ns  | 0.9794  | 2 KB      |
| 5.0.0   | 10    | 5,188.1 ns  | 97.80 ns  | 104.65 ns | 9.7885  | 20 KB     |
| 5.0.0   | 100   | 58,245.0 ns | 646.43 ns | 539.80 ns | 97.9004 | 200 KB    |

Compared to v4.0.1, parsing run time is higher, generally by about 30%, but allocations are down 26%.

For pointer math, run time and allocations are both down, about 20% and 15%, respectively.

I'm comfortable with the parsing time being a bit higher since I expect more usage of the pointer math.

## Some new toys

In addition to the simple indexer you get from `IReadOnlyList<string>`, if you're working in .NET 8, you also get a `Range` indexer which allows you to create a pointer using a subset of the segments. This is really handy when you want to get the parent of a pointer

```c#
var pointer = JsonPointer.Create("foo", "bar", 5, "baz");
var parent = pointer[..^1]; // /foo/bar/5
```

or maybe the relative local pointer (i.e. the last segment)

```c#
var pointer = JsonPointer.Create("foo", "bar", 5, "baz");
var local = pointer[^1..]; // /baz
```

These operations are pretty common in _JsonSchema.Net_.

For those of you who haven't made it to .NET 8 just yet, this functionality is also available as methods:

```c#
var pointer = JsonPointer.Create("foo", "bar", 5, "baz");
var parent = pointer.GetAncestor(1); // /foo/bar/5
var local = pointer.GetLocal(1); // /baz
```
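
For readers less familiar with C# ranges, the same slicing semantics can be tried on a plain array. This is only an analogy: the `string[]` below stands in for a pointer's decoded segments and is not the library type.

```c#
using System;

// Plain array standing in for a pointer's decoded segments (hypothetical model).
string[] segments = { "foo", "bar", "5", "baz" };

string[] parent = segments[..^1]; // everything except the last segment
string[] local = segments[^1..];  // only the last segment

Console.WriteLine("/" + string.Join("/", parent)); // /foo/bar/5
Console.WriteLine("/" + string.Join("/", local));  // /baz

// Slicing an array with a Range returns a copy, so the original is unchanged.
Console.WriteLine(segments.Length); // 4
```

Here `^1` counts from the end, so `..^1` drops one trailing segment and `^1..` keeps only the last one, which mirrors the `1` passed to the method forms above.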

Personally, I like the indexer syntax. I was concerned at first that having an indexer return a new object might feel unorthodox to some developers, but that's exactly what `string` does when you index it with a `Range`, so I'm fine with it.

## Wrap up

I like where this landed a lot more than where it was in the middle. Something just felt off with the design, and I was having trouble isolating what the issue was. I like that `PointerSegment` isn't part of the model anymore, and it's just "syntax candy" to help build pointers. I really like the performance.

I learned a lot about memory management, which will be the subject of the next post. But more than that, I learned that sometimes inaction is the right action. I hesitated, and the library is better for it.

_posts/2024/2024-04-17-more-performance-updates.md

Lines changed: 0 additions & 141 deletions
This file was deleted.

assets/css/style.scss

Lines changed: 4 additions & 0 deletions
@@ -59,3 +59,7 @@ img {
code.highlighter-rouge {
  font-size: .85em !important;
}

table {
  width: -webkit-fill-available;
}
