Since we intend to favor streaming parsing, we need to consider a format suited for streaming.
Strings + Lazy Parsing
One of the problems we are going to encounter is the combination of strings and lazy parsing:
- consider two independent lazy functions foo and bar, where bar is somewhere further down the stream from foo;
- assume that foo defines a literal string s that does not show up in our AOT dictionary;
- how should bar refer to s in such a way that we do not first need to parse foo?
One way to do this is the following:
- divide the stream in packets;
- each packet starts with a table of strings, which may then be used by this packet and by every packet further down the line.
If we do so, the packet containing foo will define literal string s. The packet containing bar will either be the same packet or a packet further down the line, and will be able to access s.
As a bonus, this will let us compress these string tables using a well-known algorithm, such as Brotli.
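The packet scheme above can be sketched as follows. This is only an illustration under assumptions: the names `Packet`, `StringTable`, `read_header` and `resolve` are hypothetical, strings are addressed by a global index in order of definition, and compression of the tables is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    # Strings first defined by this packet (hypothetical layout).
    strings: List[str]
    # Opaque payload; lazy function bodies would live here, left unparsed.
    payload: bytes = b""


@dataclass
class StringTable:
    """Cumulative table of every string declared in the packet headers read so far."""
    entries: List[str] = field(default_factory=list)

    def read_header(self, packet: Packet) -> None:
        # Only the header is consumed; the payload is never parsed here.
        self.entries.extend(packet.strings)

    def resolve(self, index: int) -> str:
        return self.entries[index]


# Packet 0 contains foo, which defines "s"; packet 1 contains bar.
packets = [
    Packet(strings=["s"], payload=b"<body of foo, skipped>"),
    Packet(strings=["t"], payload=b"<body of bar>"),
]

table = StringTable()
for packet in packets:
    table.read_header(packet)

# bar refers to "s" by its global index, without foo ever being parsed.
resolved = table.resolve(0)
```

Because the table lives in the packet header rather than inside the body of foo, the decoder can skip the body of foo entirely and still resolve the reference made by bar.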
Model State + Lazy Parsing
We will need to adapt our models to restart from a well-specified state whenever parsing a lazy function.
(TBD)
Offsets + Entropy + Streaming
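We need the ability to tell the decoder where to fetch a lazy function. In non-entropy-coding versions, we could reference the actual offset at which a lazy function was encoded. With entropy coding, such offsets make no sense: an encoded symbol need not start on a byte boundary, and decoding it generally depends on the coder state built up from everything before it.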
A partial solution would be the following:
- each packet may contain a number of (aligned) lazy declarations;
- each packet's header declares the lazy declarations included in this packet, identified by keys (the actual value of a key is an arbitrary string), along with their starting offset within the packet;
- when encoding a [lazy] field, we specify the key at which to find the content of the field;
- note that a lazy declaration could span over several packets.
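A minimal sketch of the key-based lookup described in this list, under stated assumptions: `LazyPacket` and `build_index` are hypothetical names, packet bodies are treated as plain bytes (entropy-coder state is ignored), and each declaration is assumed to fit in one packet, so the multi-packet case is not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LazyPacket:
    # Header: key of each lazy declaration starting in this packet,
    # mapped to its starting offset within the packet body (hypothetical layout).
    lazy_offsets: Dict[str, int]
    body: bytes


def build_index(packets: List[LazyPacket]) -> Dict[str, Tuple[int, int]]:
    """Map each key to (packet number, offset in packet), reading headers only."""
    index: Dict[str, Tuple[int, int]] = {}
    for number, packet in enumerate(packets):
        for key, offset in packet.lazy_offsets.items():
            index[key] = (number, offset)
    return index


packets = [
    LazyPacket(lazy_offsets={"foo": 0}, body=b"FOOFOO"),
    LazyPacket(lazy_offsets={"bar": 2}, body=b"..BARBAR"),
]
index = build_index(packets)

# A [lazy] field stores only the key; the decoder looks up where to start.
packet_number, offset = index["bar"]
start = packets[packet_number].body[offset:offset + 3]
```

The point of the indirection is that the encoded field never carries a stream-wide offset: it carries a key, and only the packet header needs to know where inside the packet the declaration begins.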