Commit a5bdf1e

add support for optional fields in records (#6669)
This commit adds support for optional fields in records, making the fusing of varied JSON data that arises from common sum types much more uniform. The SUP format is updated with a ? following field names to represent optional fields and _ to represent a field value that is not present. While the SuperSQL parser can parse SUP literals with optional fields, record expressions do not yet support optional fields; this support will come in a subsequent PR. The vector representation encodes optional field columns by run-length encoding their presence and generates a slot map on demand when serializing to BSUP or when dereferencing a field, which in turn generates a vector.Dynamic mixed with Missing error values. This commit is not backward compatible with previous BSUP versions, and we will update the BSUP version number in a subsequent PR prior to the RINCON release.
1 parent 6e9da9b commit a5bdf1e
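
The run-length encoding of presence and the on-demand slot map mentioned in the commit message can be pictured with a small sketch. The code below is not the commit's implementation: the `presenceRuns` type, the `slotMap` function, and the convention that runs alternate between present and absent (starting with present) are all assumptions made for illustration.

```go
package main

import "fmt"

// presenceRuns is a hypothetical run-length encoding of field presence.
// Runs alternate between "present" and "absent", starting with "present";
// a leading run of length 0 means the column starts with absent rows.
type presenceRuns []uint32

// slotMap expands the runs into one entry per row: the index of the row's
// value in the dense value vector, or -1 when the field is absent in that row.
func slotMap(runs presenceRuns) []int {
	var slots []int
	next := 0 // next index into the dense value vector
	present := true
	for _, n := range runs {
		for i := uint32(0); i < n; i++ {
			if present {
				slots = append(slots, next)
				next++
			} else {
				slots = append(slots, -1)
			}
		}
		present = !present
	}
	return slots
}

func main() {
	// Rows: present, present, absent, present.
	fmt.Println(slotMap(presenceRuns{2, 1, 1})) // [0 1 -1 2]
}
```

In this sketch the dense value vector stores only the values that are present, so the run lengths stay compact when presence changes rarely and the per-row map is built only when a consumer needs it.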

123 files changed, +3410 -2324 lines changed


api/queryio/jsup_test.go

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -16,7 +16,7 @@ func TestJSUPWriter(t *testing.T) {
1616
const record = `{x:1}`
1717
const expected = `
1818
{"type":"QueryChannelSet","value":{"channel":"main"}}
19-
{"type":{"kind":"record","id":30,"fields":[{"name":"x","type":{"kind":"primitive","name":"int64"}}]},"value":["1"]}
19+
{"type":{"kind":"record","id":30,"fields":[{"name":"x","type":{"kind":"primitive","name":"int64"},"opt":false}]},"value":["1"]}
2020
{"type":"QueryChannelEnd","value":{"channel":"main"}}
2121
{"type":"QueryError","value":{"error":"test.err"}}
2222
`
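
The updated expectation shows that each field of a JSUP record type now carries an `opt` boolean. A consumer could pick it up as sketched below; the struct and function names are hypothetical and are not the types used in the repository, and the sample is the `type` object from the test above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsupField mirrors one entry of the "fields" array in a JSUP record type.
// (Hypothetical type for illustration.)
type jsupField struct {
	Name string          `json:"name"`
	Type json.RawMessage `json:"type"` // left undecoded; may be any type kind
	Opt  bool            `json:"opt"`
}

// jsupRecordType mirrors the "type" object of kind "record".
type jsupRecordType struct {
	Kind   string      `json:"kind"`
	ID     int         `json:"id"`
	Fields []jsupField `json:"fields"`
}

// parseRecordType decodes the "type" object of a JSUP line.
func parseRecordType(data []byte) (jsupRecordType, error) {
	var typ jsupRecordType
	err := json.Unmarshal(data, &typ)
	return typ, err
}

const sample = `{"kind":"record","id":30,"fields":[{"name":"x","type":{"kind":"primitive","name":"int64"},"opt":false}]}`

func main() {
	typ, err := parseRecordType([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, f := range typ.Fields {
		fmt.Printf("%s optional=%t\n", f.Name, f.Opt) // x optional=false
	}
}
```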

book/src/formats/bsup.md

Lines changed: 37 additions & 24 deletions
@@ -186,17 +186,18 @@ Any references to a type ID in the body of a typedef are encoded as a `uvarint`.
186186
A record typedef creates a new type ID equal to the next stream type ID
187187
with the following structure:
188188
```
189-
--------------------------------------------------------
190-
|0x00|<nfields>|<name1><type-id-1><name2><type-id-2>...|
191-
--------------------------------------------------------
189+
----------------------------------------------------------------------
190+
|0x00|<nfields>|<name1><opt-1><type-id-1><name2><opt-2><type-id-2>...|
191+
----------------------------------------------------------------------
192192
```
193193
Record types consist of an ordered set of fields where each field consists of
194194
a name and its type. Unlike JSON, the ordering of the fields is significant
195195
and must be preserved through any APIs that consume, process, and emit BSUP records.
196196

197197
A record type is encoded as a count of fields, i.e., `<nfields>` from above,
198198
followed by the field definitions,
199-
where a field definition is a field name followed by a type ID, i.e.,
199+
where a field definition is a field name, followed by its optionality coded
200+
as a single byte, followed by a type ID, i.e.,
200201
`<name1>` followed by `<type-id-1>` etc. as indicated above.
201202

202203
The field names in a record must be unique.
@@ -215,6 +216,9 @@ string data.
215216
> even if the field names available to the dot operator are restricted
216217
> by language syntax for identifiers.
217218
219+
The optionality byte indicates "optional" for the value 1 and mandatory for
220+
the value 0.
221+
218222
The type ID follows the field name and is encoded as a `uvarint`.
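
As a rough illustration of the new typedef layout (`0x00`, field count, then name, optionality byte, and type ID per field), the sketch below serializes a record typedef. Two details are assumptions not confirmed by this excerpt: that `<nfields>` is a `uvarint`, and that each field name is written as a `uvarint` length followed by its UTF-8 bytes. The `field` type, the `appendRecordTypedef` function, and the type ID value 5 are placeholders.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// field is a hypothetical in-memory description of one record field.
type field struct {
	name   string
	opt    bool
	typeID uint64
}

// appendRecordTypedef appends a record typedef to dst in the layout
// |0x00|<nfields>|<name1><opt-1><type-id-1>...|.
func appendRecordTypedef(dst []byte, fields []field) []byte {
	dst = append(dst, 0x00)                              // record typedef code
	dst = binary.AppendUvarint(dst, uint64(len(fields))) // assumed: uvarint count
	for _, f := range fields {
		// Assumed: name is a uvarint length followed by UTF-8 bytes.
		dst = binary.AppendUvarint(dst, uint64(len(f.name)))
		dst = append(dst, f.name...)
		// Optionality byte: 1 for optional, 0 for mandatory.
		if f.opt {
			dst = append(dst, 1)
		} else {
			dst = append(dst, 0)
		}
		dst = binary.AppendUvarint(dst, f.typeID)
	}
	return dst
}

func main() {
	b := appendRecordTypedef(nil, []field{{"a", false, 5}, {"b", true, 5}})
	fmt.Printf("% x\n", b) // 00 02 01 61 00 05 01 62 01 05
}
```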
219223

220224
#### 2.1.2 Array Typedef
@@ -336,14 +340,10 @@ whereby the inner loop need not consult and interpret the type ID of each elemen
336340

337341
#### 2.2.1 Tag-Encoding of Values
338342

339-
Each value is prefixed with a "tag" that defines:
340-
* whether it is the null value, and
341-
* its encoded length in bytes.
343+
Each value is prefixed with a length "tag" indicating its encoded length in bytes.
342344

343-
The tag is 0 for the null value and `length+1` for non-null values where
344-
`length` is the encoded length of the value. Note that this encoding
345-
differentiates between a null value and a zero-length value. Many data types
346-
have a meaningful interpretation of a zero-length value, for example, an
345+
Zero-length values are possible as many data types
346+
have a meaningful empty interpretation, for example, an
347347
empty array, the empty record, etc.
348348

349349
The tag itself is encoded as a `uvarint`.
@@ -368,18 +368,19 @@ tend to be zero-filled for small integers.
368368

369369
#### 2.2.3 Tag-Encoded Body of Complex Values
370370

371-
The body of a length-N container comprises zero or more tag-encoded values,
371+
The body of a length-N container for a complex value
372+
comprises zero or more tag-encoded values,
372373
where the values are encoded as follows:
373374

374-
| Type | Value |
375-
|----------|-----------------------------------------|
376-
| `array` | concatenation of elements |
377-
| `set` | normalized concatenation of elements |
378-
| `record` | concatenation of elements |
379-
| `map` | concatenation of key and value elements |
380-
| `union` | concatenation of tag and value |
381-
| `enum` | position of enum element |
382-
| `error` | wrapped element |
375+
| Type | Value |
376+
|----------|-------------------------------------------|
377+
| `array` | concatenation of elements |
378+
| `set` | normalized concatenation of elements |
379+
| `record` | option bits and concatenation of elements |
380+
| `map` | concatenation of key and value elements |
381+
| `union` | concatenation of tag and value |
382+
| `enum` | position of enum element |
383+
| `error` | wrapped element |
383384

384385
Since N, the byte length of any of these container values, is known,
385386
there is no need to encode a count of the
@@ -390,6 +391,17 @@ For sets, the concatenation of elements must be normalized so that the
390391
sequence of bytes encoding each element's tag-counted value is
391392
lexicographically greater than that of the preceding element.
392393

394+
For records, when its type has optional fields,
395+
a bit vector of length NF is encoded as a tag-encoded value of
396+
floor((NF+7)/8) bytes to indicate the omission of an optional value
397+
where NF is the number of optional fields in the record.
398+
The field order of optional fields determines their position
399+
in the bit vector with bit numbers 0-7 (least significant to most significant)
400+
in the first byte, number 8-15 in the second byte, and so forth.
401+
Following the option bits is a concatenation
402+
of elements comprising the mandatory values and the optional values that are present
403+
all in field order.
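
The bit layout described above can be sketched as follows. The function packs one flag per optional field, least significant bit first, into `(NF+7)/8` bytes. The excerpt does not say whether a set bit marks an omitted or a present value, so the sketch leaves that choice to the caller; `packBits` is a hypothetical helper, and the length tag that wraps the bit vector is not modeled.

```go
package main

import "fmt"

// packBits packs one flag per optional field into (len(flags)+7)/8 bytes.
// Flag i lands in byte i/8 at bit position i%8, where bit 0 is the least
// significant bit.
func packBits(flags []bool) []byte {
	out := make([]byte, (len(flags)+7)/8)
	for i, set := range flags {
		if set {
			out[i/8] |= 1 << (i % 8)
		}
	}
	return out
}

func main() {
	// Ten optional fields need two bytes; flag the first and the last.
	flags := make([]bool, 10)
	flags[0], flags[9] = true, true
	fmt.Printf("% x\n", packBits(flags)) // 01 02
}
```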
404+
393405
A union value is encoded as a container with two elements. The first
394406
element, called the tag, is the `uvarint` encoding of the
395407
positional index determining the type of the value in reference to the
@@ -546,11 +558,12 @@ complex type it represents as described below.
546558

547559
A record type value has the form:
548560
```
549-
--------------------------------------------------
550-
|30|<nfields>|<name1><typeval><name2><typeval>...|
551-
--------------------------------------------------
561+
---------------------------------------------------------
562+
|30|<nfields>|<opts>|<name1><typeval><name2><typeval>...|
563+
---------------------------------------------------------
552564
```
553565
where `<nfields>` is the number of fields in the record encoded as a `uvarint`,
566+
`<opts>` is a bit vector of length `<nfields>` indicating which fields are optional,
554567
`<name1>` etc. are the field names encoded as in the
555568
record typedef, and each `<typeval>` is a recursive encoding of a type value.
556569

book/src/formats/jsup.md

Lines changed: 22 additions & 11 deletions
@@ -298,7 +298,8 @@ super -f jsup input.sup | jq .
298298
"type": {
299299
"kind": "primitive",
300300
"name": "string"
301-
}
301+
},
302+
"opt": false
302303
},
303304
{
304305
"name": "r",
@@ -311,17 +312,20 @@ super -f jsup input.sup | jq .
311312
"type": {
312313
"kind": "primitive",
313314
"name": "int64"
314-
}
315+
},
316+
"opt": false
315317
},
316318
{
317319
"name": "b",
318320
"type": {
319321
"kind": "primitive",
320322
"name": "int64"
321-
}
323+
},
324+
"opt": false
322325
}
323326
]
324-
}
327+
},
328+
"opt": false
325329
}
326330
]
327331
},
@@ -356,7 +360,8 @@ super -f jsup input.sup | jq .
356360
"type": {
357361
"kind": "primitive",
358362
"name": "string"
359-
}
363+
},
364+
"opt": false
360365
},
361366
{
362367
"name": "r",
@@ -373,10 +378,12 @@ super -f jsup input.sup | jq .
373378
"kind": "primitive",
374379
"name": "int64"
375380
}
376-
}
381+
},
382+
"opt": false
377383
}
378384
]
379-
}
385+
},
386+
"opt": false
380387
}
381388
]
382389
},
@@ -401,7 +408,8 @@ super -f jsup input.sup | jq .
401408
"type": {
402409
"kind": "primitive",
403410
"name": "string"
404-
}
411+
},
412+
"opt": false
405413
},
406414
{
407415
"name": "r",
@@ -430,13 +438,16 @@ super -f jsup input.sup | jq .
430438
"name": "string"
431439
}
432440
]
433-
}
441+
},
442+
"opt": false
434443
}
435444
]
436-
}
445+
},
446+
"opt": false
437447
}
438448
]
439-
}
449+
},
450+
"opt": false
440451
}
441452
]
442453
},

book/src/formats/model.md

Lines changed: 6 additions & 7 deletions
@@ -86,20 +86,19 @@ A field name is any UTF-8 string.
8686

8787
A field value is a value of any type.
8888

89-
In contrast to many schema-oriented data formats, the super data model has no way to specify
90-
a field as "optional" since any field value can be a null value.
91-
92-
If an instance of a record value omits a value
93-
by dropping the field altogether rather than using a null, then that record
94-
value corresponds to a different record type that elides the field in question.
89+
A field is either mandatory or optional as indicated by its optionality.
90+
The optionalities of fields in two records must be the same for those
91+
record types to be equivalent, e.g., type `{a:string,b?:string}` is distinct
92+
from `{a:string,b:string}`.
9593

9694
A record type is uniquely defined by its ordered list of field-type pairs.
9795

9896
The type order of two records is as follows:
9997
* Record with fewer columns than other is ordered before the other.
10098
* Records with the same number of columns are ordered as follows according to:
10199
* the lexicographic order of the field names from left to right,
102-
* or if all the field names are the same, the type order of the field types from left to right.
100+
* secondarily the optionality of a field name for two names that are the same,
101+
* or if all the field names and optionalities are the same, the type order of the field types from left to right.
103102

104103
### 2.2 Array
105104

book/src/super-sql/operators/fuse.md

Lines changed: 4 additions & 4 deletions
@@ -36,8 +36,8 @@ fuse
3636
{a:1}
3737
{b:2}
3838
# expected output
39-
{a:1::(int64|null),b:null::(int64|null)}
40-
{a:null::(int64|null),b:2::(int64|null)}
39+
{a?:1,b?:_::int64}
40+
{a?:_::int64,b?:2}
4141
```
4242

4343
---
@@ -64,6 +64,6 @@ fuse
6464
{a:[1,2]}
6565
{a:["foo","bar"],b:10.0.0.1}
6666
# expected output
67-
{a:[1,2]::[int64|string],b:null::(ip|null)}
68-
{a:["foo","bar"]::[int64|string],b:10.0.0.1::(ip|null)}
67+
{a:[1,2]::[int64|string],b?:_::ip}
68+
{a:["foo","bar"]::[int64|string],b?:10.0.0.1}
6969
```
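
The new expected outputs are consistent with a simple rule: a field of the fused type is optional when at least one input record lacks it. The sketch below applies that rule to field names only. It is inferred from the examples in this diff rather than taken from the `fuse` implementation, it ignores how field types are merged, and `fusedField` and `fuseFields` are hypothetical names.

```go
package main

import "fmt"

// fusedField describes one field of the fused record type.
type fusedField struct {
	Name string
	Opt  bool
}

// fuseFields returns the union of field names in order of first appearance,
// marking a field optional when at least one input record lacks it.
// Each record is given as its list of field names, assumed unique per record.
func fuseFields(records [][]string) []fusedField {
	count := map[string]int{}
	var order []string
	for _, rec := range records {
		for _, name := range rec {
			if count[name] == 0 {
				order = append(order, name)
			}
			count[name]++
		}
	}
	fused := make([]fusedField, 0, len(order))
	for _, name := range order {
		fused = append(fused, fusedField{Name: name, Opt: count[name] < len(records)})
	}
	return fused
}

func main() {
	fmt.Println(fuseFields([][]string{{"a"}, {"b"}}))      // [{a true} {b true}]
	fmt.Println(fuseFields([][]string{{"a"}, {"a", "b"}})) // [{a false} {b true}]
}
```

The two calls correspond to the two examples in this file: `{a:1}` with `{b:2}` yields `a?` and `b?`, while `{a:[1,2]}` with `{a:["foo","bar"],b:10.0.0.1}` keeps `a` mandatory and makes `b` optional.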

book/src/super-sql/sql/set-ops.md

Lines changed: 5 additions & 5 deletions
@@ -183,10 +183,10 @@ fork
183183
# input
184184
185185
# expected output
186-
{x:1::(int64|null),y:2::(int64|null),z:null::(int64|null)}
187-
{x:3::(int64|null),y:4::(int64|null),z:null::(int64|null)}
188-
{x:5::(int64|null),y:6::(int64|null),z:null::(int64|null)}
189-
{x:null::(int64|null),y:null::(int64|null),z:2::(int64|null)}
190-
{x:null::(int64|null),y:null::(int64|null),z:3::(int64|null)}
186+
{x?:1,y?:2,z?:_::int64}
187+
{x?:3,y?:4,z?:_::int64}
188+
{x?:5,y?:6,z?:_::int64}
189+
{x?:_::int64,y?:_::int64,z?:2}
190+
{x?:_::int64,y?:_::int64,z?:3}
191191
```
192192
---

book/src/tutorials/prs.bsup

17 Bytes
Binary file not shown.

book/src/tutorials/shaping.md

Lines changed: 4 additions & 4 deletions
@@ -476,9 +476,9 @@ fuse
476476
{y:"foo"}
477477
{x:2,y:"bar"}
478478
# expected output
479-
{x:1::(int64|null),y:null::(string|null)}
480-
{x:null::(int64|null),y:"foo"::(string|null)}
481-
{x:2::(int64|null),y:"bar"::(string|null)}
479+
{x?:1,y?:_::string}
480+
{x?:_::int64,y?:"foo"}
481+
{x?:2,y?:"bar"}
482482
```
483483

484484
Whereas a type union for field `x` is produced in the following:
@@ -511,7 +511,7 @@ fuse(this)
511511
{x:"foo",y:"foo"}
512512
{x:2,y:"bar"}
513513
# expected output
514-
<{x:int64|string,y:string|null}>
514+
<{x:int64|string,y?:string}>
515515
```
516516

517517
Since the `fuse` here is an aggregate function, it can also be used with

cmd/super/db/manage/ztests/compact-size.yaml

Lines changed: 2 additions & 2 deletions
@@ -14,5 +14,5 @@ script: |
1414
outputs:
1515
- name: stdout
1616
data: |
17-
{min:0,max:150,count:102::uint64,size:600}
18-
{min:200,max:250,count:51::uint64,size:241}
17+
{min:0,max:150,count:102::uint64,size:602}
18+
{min:200,max:250,count:51::uint64,size:243}

cmd/super/db/manage/ztests/compact.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ script: |
1212
outputs:
1313
- name: stdout
1414
data: |
15-
{min:1,max:200,count:2000::uint64,size:1035}
15+
{min:1,max:200,count:2000::uint64,size:1036}
