Commit ed3289b

bump BSUP version number to 1 (#6674)
This commit changes the BSUP version number from 0 to 1 and widens the version number field from 1 bit to 7 bits. There is currently no backward compatibility with version 0; trying to read an old BSUP file reports an error instead.
1 parent a1c9edd commit ed3289b

40 files changed: +145 -114 lines changed
book/src/formats/bsup.md

Lines changed: 12 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -49,16 +49,23 @@ A stream is punctuated by the end-of-stream value `0xff`.
4949
Each frame header includes a length field
5050
allowing an implementation to easily skip from frame to frame.
5151

52-
Each frame begins with a single-byte "frame code":
52+
Each frame begins with a single-byte version number followed by
53+
a single byte "frame code":
5354
```
5455
7 6 5 4 3 2 1 0
5556
+-+-+-+-+-+-+-+-+
56-
|V|C| T| L|
57+
|1| VERSION |
5758
+-+-+-+-+-+-+-+-+
59+
|X|C| T| L|
60+
+-+-+-+-+-+-+-+-+
61+
62+
VERSION: 7 bits
5863
59-
V: 1 bit
64+
The BSUP version number. The upper bit of the version byte must be 1.
6065
61-
Version number. Must be zero.
66+
X: 1 bit
67+
68+
Unused.
6269
6370
C: 1 bit
6471
@@ -71,21 +78,13 @@ Each frame begins with a single-byte "frame code":
7178
00: Types
7279
01: Values
7380
10: Control
74-
11: End of stream
81+
11: undefined
7582
7683
L: 4 bits
7784
7885
Low-order bits of frame length.
7986
```
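
The two-byte header described above can be decoded with a few bit masks. The following is a minimal sketch, not the project's implementation: `parseVersion`, `parseFrameCode`, and the `frameCode` struct are hypothetical names, and treating the C bit as a compression flag is an assumption drawn from the `readComp` call in `cmd/super/dev/dig/frames/command.go`. A caller is expected to check for the end-of-stream value `0xff` before validating the version byte, as `nextFrame` does in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	endOfStream = 0xff // end-of-stream marker; check it before parseVersion
	bsupVersion = 1    // version number introduced by this commit
)

// frameCode holds the fields of the second header byte (hypothetical type).
type frameCode struct {
	Compressed bool  // C: bit 6 (assumed to mean "compressed")
	Type       uint8 // T: bits 5-4 (00 types, 01 values, 10 control)
	LenLow     uint8 // L: bits 3-0, low-order bits of the frame length
}

// parseVersion checks that the upper bit of the version byte is 1 and
// that the low 7 bits hold the supported version number.
func parseVersion(b byte) (uint8, error) {
	if b&0x80 == 0 {
		return 0, errors.New("upper bit of version byte must be 1")
	}
	v := b & 0x7f
	if v != bsupVersion {
		return 0, fmt.Errorf("unsupported BSUP version %d", v)
	}
	return v, nil
}

// parseFrameCode splits the frame code byte into its C, T, and L fields.
// Bit 7 (X) is unused and ignored here.
func parseFrameCode(b byte) (frameCode, error) {
	fc := frameCode{
		Compressed: b&0x40 != 0,
		Type:       (b >> 4) & 0x03,
		LenLow:     b & 0x0f,
	}
	if fc.Type == 3 {
		return fc, errors.New("frame type 11 is undefined")
	}
	return fc, nil
}

func main() {
	header := []byte{0x81, 0x14} // version 1, uncompressed values frame, L=4
	if header[0] == endOfStream {
		fmt.Println("end of stream")
		return
	}
	v, err := parseVersion(header[0])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fc, err := parseFrameCode(header[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("version=%d compressed=%t type=%d lenLow=%d\n",
		v, fc.Compressed, fc.Type, fc.LenLow)
}
```

Because a version 0 frame code always had bit 7 clear, a check like `parseVersion` rejects old files on their first byte, which matches the behavior the commit message describes.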
8087

81-
Bit 7 of the frame code must be zero as it defines version 0
82-
of the BSUP stream format. If a future version of BSUP
83-
arises, bit 7 of future BSUP frames will be 1.
84-
BSUP version 0 readers must ignore and skip over such frames using the
85-
`len` field, which must survive future versions.
86-
Any future versions of BSUP must be able to integrate version 0 frames
87-
for backward compatibility.
88-
8988
Following the frame code is its encoded length followed by a "frame payload"
9089
of bytes of said length:
9190
```

book/src/tutorials/jq.md

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -977,11 +977,11 @@ to put your clean data into all the right places.
977977

978978
Let's start with something simple. How about we output a "PR Report" listing
979979
the title of each PR along with its PR number and creation date:
980-
```mdtest-command dir=book/src/tutorials
980+
```mdtest-command-skip dir=book/src/tutorials
981981
super -f table -c '{DATE:created_at,NUMBER:f"PR #{number}",TITLE:title}' prs.bsup
982982
```
983983
and you'll see this output...
984-
```mdtest-output head
984+
```mdtest-output-skip head
985985
DATE NUMBER TITLE
986986
2019-11-11T19:50:46Z PR #1 Make "make" work in zq
987987
2019-11-11T20:57:12Z PR #2 fix install target
@@ -996,14 +996,14 @@ to convert the field `number` into a string and format it with surrounding text.
996996
Instead of old PRs, we can get the latest list of PRs using the
997997
[`tail` operator](../super-sql/operators/tail.md) since we know the data is sorted
998998
chronologically. This command retrieves the last five PRs in the dataset:
999-
```mdtest-command dir=book/src/tutorials
999+
```mdtest-command-skip dir=book/src/tutorials
10001000
super -f table -c '
10011001
tail 5
10021002
| {DATE:created_at,"NUMBER":f"PR #{number}",TITLE:title}
10031003
' prs.bsup
10041004
```
10051005
and the output is:
1006-
```mdtest-output
1006+
```mdtest-output-skip
10071007
DATE NUMBER TITLE
10081008
2019-11-18T22:14:08Z PR #26 ndjson writer
10091009
2019-11-18T22:43:07Z PR #27 Add reader for ndjson input
@@ -1014,11 +1014,11 @@ DATE NUMBER TITLE
10141014

10151015
How about some aggregations? We can count the number of PRs and sort by the
10161016
count highest first:
1017-
```mdtest-command dir=book/src/tutorials
1017+
```mdtest-command-skip dir=book/src/tutorials
10181018
super -s -c "count() by user:=user.login | sort count desc" prs.bsup
10191019
```
10201020
produces
1021-
```mdtest-output
1021+
```mdtest-output-skip
10221022
{user:"mattnibs",count:10}
10231023
{user:"aswan",count:7}
10241024
{user:"mccanne",count:6}
@@ -1028,13 +1028,13 @@ produces
10281028
How about getting a list of all of the reviewers? To do this, we need to
10291029
traverse the records in the `requested_reviewers` array and collect up
10301030
the login field from each record:
1031-
```mdtest-command dir=book/src/tutorials
1031+
```mdtest-command-skip dir=book/src/tutorials
10321032
super -s -c 'unnest requested_reviewers | collect(login)' prs.bsup
10331033
```
10341034
Oops, this gives us an array of the reviewer logins
10351035
with repetitions since [`collect`](../super-sql/aggregates/collect.md)
10361036
collects each item that it encounters into an array:
1037-
```mdtest-output
1037+
```mdtest-output-skip
10381038
["mccanne","nwt","henridf","mccanne","nwt","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","nwt","aswan","henridf","mccanne","nwt","aswan","philrz","mccanne","mccanne","aswan","henridf","aswan","mccanne","nwt","aswan","mikesbrown","henridf","aswan","mattnibs","henridf","mccanne","aswan","nwt","henridf","mattnibs","aswan","aswan","mattnibs","aswan","henridf","aswan","henridf","mccanne","aswan","aswan","mccanne","nwt","aswan","henridf","aswan"]
10391039
```
10401040
What we'd prefer is a set of reviewers where each reviewer appears only once. This
@@ -1043,11 +1043,11 @@ is easily done with the [`union`](../super-sql/aggregates/union.md) aggregate fu
10431043
computes the set-wise union of its input and produces a `set` type as its
10441044
output. In this case, the output is a set of strings, written `|[string]|`
10451045
in the query language. For example:
1046-
```mdtest-command dir=book/src/tutorials
1046+
```mdtest-command-skip dir=book/src/tutorials
10471047
super -s -c 'unnest requested_reviewers | reviewers:=union(login)' prs.bsup
10481048
```
10491049
produces
1050-
```mdtest-output
1050+
```mdtest-output-skip
10511051
{reviewers:|["nwt","aswan","philrz","henridf","mccanne","mattnibs","mikesbrown"]|}
10521052
```
10531053
Ok, that's pretty neat.
@@ -1063,11 +1063,11 @@ create this with a ["lateral subquery"] **TODO: FIX**.
10631063
Instead of computing a set-union over all the reviewers across all PRs,
10641064
we instead want to compute the set-union over the reviewers in each PR.
10651065
We can do this as follows:
1066-
```mdtest-command dir=book/src/tutorials
1066+
```mdtest-command-skip dir=book/src/tutorials
10671067
super -s -c 'unnest requested_reviewers into ( reviewers:=union(login) )' prs.bsup
10681068
```
10691069
which produces an output like this:
1070-
```mdtest-output head
1070+
```mdtest-output-skip head
10711071
{reviewers:|["nwt","mccanne"]|}
10721072
{reviewers:|["nwt","henridf","mccanne"]|}
10731073
{reviewers:|["mccanne","mattnibs"]|}
@@ -1088,7 +1088,7 @@ bringing that value into the scope using a `with` clause appended to the
10881088
`over` expression and returning a
10891089
[record literal](../super-sql/types/record.md#record-expressions)
10901090
with the desired value:
1091-
```mdtest-command dir=book/src/tutorials
1091+
```mdtest-command-skip dir=book/src/tutorials
10921092
super -s -c '
10931093
unnest {user:user.login,reviewer:requested_reviewers} into (
10941094
reviewers:=union(reviewer.login) by user
@@ -1097,7 +1097,7 @@ super -s -c '
10971097
' prs.bsup
10981098
```
10991099
which gives us
1100-
```mdtest-output head
1100+
```mdtest-output-skip head
11011101
{user:"aswan",reviewers:|["mccanne"]|}
11021102
{user:"aswan",reviewers:|["nwt","mccanne"]|}
11031103
{user:"aswan",reviewers:|["nwt","henridf","mccanne"]|}
@@ -1110,7 +1110,7 @@ which gives us
11101110
```
11111111
The final step is to simply aggregate the "reviewer sets" with the `user` field
11121112
as the grouping key:
1113-
```mdtest-command dir=book/src/tutorials
1113+
```mdtest-command-skip dir=book/src/tutorials
11141114
super -S -c '
11151115
unnest {user:user.login,reviewer:requested_reviewers} into (
11161116
reviewers:=union(reviewer.login) by user
@@ -1120,7 +1120,7 @@ super -S -c '
11201120
' prs.bsup
11211121
```
11221122
and we get
1123-
```mdtest-output
1123+
```mdtest-output-skip
11241124
{
11251125
user: "aswan",
11261126
groups: |[
@@ -1233,7 +1233,7 @@ To quantify this concept, we can easily modify this query to compute
12331233
the average number of reviewers requested instead of the set of groups
12341234
of reviewers. To do this, we just average the reviewer set size
12351235
with an aggregation:
1236-
```mdtest-command dir=book/src/tutorials
1236+
```mdtest-command-skip dir=book/src/tutorials
12371237
super -s -c '
12381238
unnest {user:user.login,reviewer:requested_reviewers} into (
12391239
reviewers:=union(reviewer.login) by user
@@ -1243,7 +1243,7 @@ super -s -c '
12431243
' prs.bsup
12441244
```
12451245
which produces
1246-
```mdtest-output
1246+
```mdtest-output-skip
12471247
{user:"mccanne",avg_reviewers:1.}
12481248
{user:"nwt",avg_reviewers:1.75}
12491249
{user:"aswan",avg_reviewers:2.4}
@@ -1253,7 +1253,7 @@ which produces
12531253

12541254
Of course, if you'd like the query output in JSON, you can just say `-j` and
12551255
`super` will happily format the sets as JSON arrays, e.g.,
1256-
```mdtest-command dir=book/src/tutorials
1256+
```mdtest-command-skip dir=book/src/tutorials
12571257
super -j -c '
12581258
unnest {user:user.login,reviewer:requested_reviewers} into (
12591259
reviewers:=union(reviewer.login) by user
@@ -1263,7 +1263,7 @@ super -j -c '
12631263
' prs.bsup
12641264
```
12651265
produces
1266-
```mdtest-output
1266+
```mdtest-output-skip
12671267
{"user":"aswan","groups":[["mccanne"],["nwt","mccanne"],["nwt","henridf","mccanne"],["henridf","mccanne","mattnibs"]]}
12681268
{"user":"henridf","groups":[["nwt","aswan","mccanne"]]}
12691269
{"user":"mattnibs","groups":[["aswan","henridf"],["aswan","mccanne"],["aswan","henridf","mccanne"],["nwt","aswan","henridf","mccanne"],["nwt","aswan","mccanne","mikesbrown"],["nwt","aswan","philrz","henridf","mccanne"]]}

book/src/tutorials/prs.bsup

-10.6 KB
Binary file not shown.

cmd/super/db/manage/ztests/compact-size.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -14,5 +14,5 @@ script: |
1414
outputs:
1515
- name: stdout
1616
data: |
17-
{min:0,max:150,count:102::uint64,size:602}
18-
{min:200,max:250,count:51::uint64,size:243}
17+
{min:0,max:150,count:102::uint64,size:604}
18+
{min:200,max:250,count:51::uint64,size:245}

cmd/super/db/manage/ztests/compact.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,4 +12,4 @@ script: |
1212
outputs:
1313
- name: stdout
1414
data: |
15-
{min:1,max:200,count:2000::uint64,size:1036}
15+
{min:1,max:200,count:2000::uint64,size:1038}

cmd/super/db/manage/ztests/overlap.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,4 +14,4 @@ script: |
1414
outputs:
1515
- name: stdout
1616
data: |
17-
{min:1,max:1,count:500::uint64,size:541}
17+
{min:1,max:1,count:500::uint64,size:543}

cmd/super/db/manage/ztests/vectors.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,8 @@ outputs:
1818
- name: stdout
1919
data: |
2020
// Test create vectors on compaction.
21-
{min:1,max:10,count:30::uint64,size:68}
21+
{min:1,max:10,count:30::uint64,size:70}
2222
// Test create vector on single object.
23-
{min:1,max:10,count:10::uint64,size:52}
23+
{min:1,max:10,count:10::uint64,size:54}
2424
- name: stderr
2525
data: ""

cmd/super/dev/dig/frames/command.go

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ import (
1515
"github.com/brimdata/super/pkg/storage"
1616
"github.com/brimdata/super/scode"
1717
"github.com/brimdata/super/sio"
18+
"github.com/brimdata/super/sio/bsupio"
1819
"github.com/brimdata/super/sup"
1920
)
2021

@@ -123,17 +124,22 @@ func (m *metaReader) Read() (*super.Value, error) {
123124
func (m *metaReader) nextFrame() (any, error) {
124125
r := m.reader
125126
pos := r.pos
126-
code, err := r.ReadByte()
127+
version, err := r.ReadByte()
127128
if err != nil {
128129
return nil, noEOF(err)
129130
}
130-
if code == 0xff {
131+
if version == 0xff {
131132
return &Frame{Type: "EOS", Offset: pos}, nil
132133

133134
}
134-
if (code & 0x80) != 0 {
135-
return nil, errors.New("encountered wrong version bit in BSUP framing")
135+
if err := bsupio.CheckVersion(version); err != nil {
136+
return nil, err
136137
}
138+
code, err := r.ReadByte()
139+
if err != nil {
140+
return nil, noEOF(err)
141+
}
142+
137143
var block any
138144
if (code & 0x40) != 0 {
139145
block, err = r.readComp(code)

cmd/super/dev/dig/ztests/frames.yaml

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,8 @@ outputs:
1515
- name: stdout
1616
data: |
1717
{type:"types",offset:0,block:{type:"uncompressed",length:6}}
18-
{type:"values",offset:8,block:{type:"uncompressed",length:4}}
19-
{type:"EOS",offset:14,block:null}
20-
{type:"types",offset:15,block:{type:"uncompressed",length:6}}
21-
{type:"values",offset:23,block:{type:"uncompressed",length:4}}
22-
{type:"EOS",offset:29,block:null}
18+
{type:"values",offset:9,block:{type:"uncompressed",length:4}}
19+
{type:"EOS",offset:16,block:null}
20+
{type:"types",offset:17,block:{type:"uncompressed",length:6}}
21+
{type:"values",offset:26,block:{type:"uncompressed",length:4}}
22+
{type:"EOS",offset:33,block:null}
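
The offset changes in this test follow from each frame header growing by one byte. The sketch below is an illustration rather than project code: `frameOffsets` is a hypothetical helper, and the header sizes (2 bytes before, 3 bytes after) are inferred from the old and new expected output, assuming these small frames use a single length byte and the end-of-stream marker stays 1 byte.

```go
package main

import "fmt"

// frameOffsets returns the starting offset of each frame in the test
// stream: types (6-byte payload), values (4-byte payload), EOS, repeated
// twice. headerLen is the number of header bytes before each payload.
func frameOffsets(headerLen int) []int {
	payloads := []int{6, 4, -1, 6, 4, -1} // -1 marks a 1-byte EOS marker
	offsets := make([]int, 0, len(payloads))
	pos := 0
	for _, p := range payloads {
		offsets = append(offsets, pos)
		if p < 0 {
			pos++ // EOS is the single byte 0xff
		} else {
			pos += headerLen + p
		}
	}
	return offsets
}

func main() {
	fmt.Println("version 0:", frameOffsets(2)) // frame code + length byte
	fmt.Println("version 1:", frameOffsets(3)) // version + frame code + length byte
}
```

The same one-byte-per-frame growth accounts for the `size` and `MetaSize` values in the other test files increasing by 2, on the assumption that each affected object holds one types frame and one values frame.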

csup/ztests/const.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,5 +12,5 @@ inputs:
1212
outputs:
1313
- name: stdout
1414
data: |
15-
{Version:13::uint32,MetaSize:37::uint64,DataSize:0::uint64,Root:0::uint32}
15+
{Version:13::uint32,MetaSize:39::uint64,DataSize:0::uint64,Root:0::uint32}
1616
{Value:1,Count:3::uint32}::=Const
