Description
The specification describes how to track string backreferences, but I found the description a little confusing. My understanding is that the pseudocode for reading a property name should look like:
function read_key_name(stream, state):
    if stream.peek_byte() == 0x20:
        stream.advance(1)
        return ""
    else:
        switch parse_key_token(stream):
            case Backref(i) => return state.key_backreferences[i]
            case String(s) =>
                maybe_add_key_backreference(state, s)
                return s

function maybe_add_key_backreference(state, string):
    if string.length_bytes <= 64:
        if state.next_key_backref == 1024:
            state.next_key_backref := 0
            state.key_backreferences.clear()
        state.key_backreferences[state.next_key_backref] := string
        state.next_key_backref := state.next_key_backref + 1
    else:
        # do nothing because the string is not eligible
        return
Indeed, this is how this Rust implementation interpreted the specification.
That is, a backreference of n refers to the nth non-backreference key of <= 64 bytes since the last reset, and the table is reset after every 1024 such keys (non-duplicate property keys of length <= 64 bytes).
However, if one looks at Jackson's generator, key backreferences are saved for all non-empty strings, regardless of length.
So I think that, for property names (i.e. keys), there should be no notion of the 'eligibility' of keys, except for clarifying the eligibility of empty strings: which of 0x20 and 0x34 0xfc¹ should be included in the backreference buffer? I didn't investigate what the behaviour is for shared strings (as opposed to shared property names).
I think it's probably better to modify the spec to match Jackson, but maybe Jackson should change instead. Either way, I think it would be better if the spec were more precise.
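As a rough illustration of the mismatch described above, here is a small Python sketch. The `KeyTable` class, the two rule functions, `indices` and the sample keys are all made up for illustration, and the "all non-empty keys" rule is only my reading of Jackson's generator, not a confirmed description of it. Both tables use the same reset-at-1024 behaviour as the pseudocode and differ only in which keys they store.

```python
MAX_SHARED = 1024  # table size taken from the pseudocode above


class KeyTable:
    """Backreference table for property names with a pluggable eligibility rule."""

    def __init__(self, eligible):
        self.eligible = eligible  # function: key -> bool
        self.entries = []

    def add(self, key):
        """Record a newly seen key; return its index, or None if it is not stored."""
        if not self.eligible(key):
            return None
        if len(self.entries) == MAX_SHARED:
            self.entries.clear()  # table is full: start again from index 0
        self.entries.append(key)
        return len(self.entries) - 1


def byte_len(key):
    return len(key.encode("utf-8"))


def spec_rule(key):
    # Interpretation from the pseudocode: only keys of <= 64 bytes are stored.
    return byte_len(key) <= 64


def all_non_empty_rule(key):
    # Assumed reading of Jackson's generator: every non-empty key is stored.
    return byte_len(key) > 0


def indices(rule, keys):
    """Index assigned to each key in turn (None where the key is not stored)."""
    table = KeyTable(rule)
    return [table.add(k) for k in keys]


keys = ["id", "x" * 70, "name"]  # the middle key is 70 bytes long
print(indices(spec_rule, keys))           # [0, None, 1]
print(indices(all_non_empty_rule, keys))  # [0, 1, 2]
```

Under the first rule "name" is backreference 1, under the second it is backreference 2, so an encoder and a decoder that disagree about eligibility will resolve later backreferences to the wrong key, or to an index that does not exist.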
¹ The specification says that 0x34 is followed by 64 or more bytes of string data. However, I think most parsers accept fewer, and indeed encoding fewer than 64 bytes after a 0x34 is the only reasonable way to encode a Unicode property name of 58-63 bytes.
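The gap in the footnote can be made concrete with a small Python sketch. The length ranges below are my reading of the key token table (short ASCII names of 1-64 bytes, short Unicode names of 2-57 bytes, long names of 64 bytes or more) and should be treated as assumptions; `key_token_class` is a hypothetical helper, not part of any implementation.

```python
def key_token_class(name):
    """Classify a property name by the key token that could encode it.

    The ranges are assumptions taken from my reading of the spec.
    """
    data = name.encode("utf-8")
    n = len(data)
    if n == 0:
        return "empty"          # 0x20
    is_ascii = all(b < 0x80 for b in data)
    if is_ascii and n <= 64:
        return "short_ascii"    # 1-64 bytes
    if not is_ascii and 2 <= n <= 57:
        return "short_unicode"  # 2-57 bytes
    if n >= 64:
        return "long"           # 0x34 ... 0xfc
    # Non-ASCII name of 58-63 bytes: too long for a short Unicode token,
    # too short for 0x34 if the "64 or more bytes" wording is taken literally.
    return "gap"


print(key_token_class("é" * 28))  # 56 bytes -> short_unicode
print(key_token_class("é" * 30))  # 60 bytes -> gap
print(key_token_class("é" * 32))  # 64 bytes -> long
```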