@@ -64,12 +64,14 @@ then write the metadata into the reassembly section along with the trailer
 at the end. This allows a stream to be converted to a Super Columnar file
 in a single pass.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 That said, the layout is
 flexible enough that an implementation may optimize the data layout with
 additional passes or by writing the output to multiple files then
 merging them together (or even leaving the Super Columnar entity as separate files).
-{{< /tip >}}
+
+{{% /tip %}}

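As an illustration of the single-pass pattern, the Go sketch below writes accumulated column segments as the data section, then appends the metadata and a trailer. Everything here — the names, the textual metadata encoding, the trailer layout — is a hypothetical placeholder, not the actual Super Binary format:

```go
import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// writeFile sketches the single-pass layout: emit buffered column segments
// as the data section, then the reassembly metadata, then a trailer that
// points back at the metadata. Encodings are placeholders for illustration.
func writeFile(columns map[string][]byte) []byte {
	var out bytes.Buffer

	names := make([]string, 0, len(columns))
	for name := range columns {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic segment order for the example

	// Data section: segment bytes written back to back, never seeking.
	type segment struct{ off, length int64 }
	segs := make(map[string]segment)
	for _, name := range names {
		segs[name] = segment{int64(out.Len()), int64(len(columns[name]))}
		out.Write(columns[name])
	}

	// Reassembly section: record where each column's segment landed.
	reassemblyOff := int64(out.Len())
	for _, name := range names {
		fmt.Fprintf(&out, "%s %d %d\n", name, segs[name].off, segs[name].length)
	}

	// Trailer: a fixed-size pointer to the reassembly section, so a reader
	// can locate the metadata by reading the end of the file.
	binary.Write(&out, binary.LittleEndian, reassemblyOff)
	return out.Bytes()
}
```

Because the metadata trails the data, nothing written earlier ever needs to be revisited — that is the whole trick behind the one-pass conversion.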
 ### The Data Section

@@ -85,17 +87,20 @@ There is no information in the data section for how segments relate
 to one another or how they are reconstructed into columns. They are just
 blobs of Super Binary data.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Unlike Parquet, there is no explicit arrangement of the column chunks into
 row groups; rather, columns are allowed to grow at different rates, so a
 high-volume column might comprise many segments while a low-volume
 column might be just one or a few. This allows scans of low-volume record types
 (the "mice") to perform well amongst high-volume record types (the "elephants"),
 i.e., there are not a bunch of seeks with tiny reads of mice data interspersed
 throughout the elephants.
-{{< /tip >}}

-{{< tip "TBD" >}}
+{{% /tip %}}
+
+{{% tip "TBD" %}}
+
 The mice/elephants model creates an interesting and challenging layout
 problem. If you let the row indexes get too far apart (call this "skew"), then
 you have to buffer very large amounts of data to keep the column data aligned.
@@ -109,15 +114,17 @@ if you use lots of buffering on ingest, you can write the mice in front of the
 elephants so the read path requires less buffering to align columns. Or you can
 do two passes where you store segments in separate files then merge them at close
 according to an optimization plan.
-{{< /tip >}}
+
+{{% /tip %}}

 ### The Reassembly Section

 The reassembly section provides the information needed to reconstruct
 column streams from segments, and in turn, to reconstruct the original values
 from column streams, i.e., to map columns back to composite values.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Of course, the reassembly section also provides the ability to extract just subsets of columns
 to be read and searched efficiently without ever needing to reconstruct
 the original rows. How well this performs is up to any particular
@@ -127,7 +134,8 @@ Also, the reassembly section is in general vastly smaller than the data section
 so the goal here isn't to express information in cute and obscure compact forms
 but rather to represent data in an easy-to-digest, programmer-friendly form that
 leverages Super Binary.
-{{< /tip >}}
+
+{{% /tip %}}

 The reassembly section is a Super Binary stream. Unlike Parquet,
 which uses an externally described schema
@@ -147,9 +155,11 @@ A super type's integer position in this sequence defines its identifier
 encoded in the [super column](#the-super-column). This identifier is called
 the super ID.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Change the first N values to type values instead of nulls?
-{{< /tip >}}
+
+{{% /tip %}}

 The next N+1 records contain reassembly information for each of the N super types
 where each record defines the column streams needed to reconstruct the original
@@ -171,11 +181,13 @@ type signature:
 In the rest of this document, we will refer to this type as `<segmap>` for
 shorthand and refer to the concept as a "segmap".

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 We use the type name "segmap" to emphasize that this information represents
 a set of byte ranges where data is stored and must be read from *rather than*
 the data itself.
-{{< /tip >}}
+
+{{% /tip %}}

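To make the byte-range idea concrete, the sketch below renders a segmap as a Go type with a reader. The field set is a simplified stand-in for the actual type signature shown above, which may carry more detail (e.g., compressed versus in-memory lengths — an assumption here, not part of this sketch):

```go
import "io"

// Segment is one byte range in the data section. Illustrative only; the
// segmap type signature in the text is the authoritative definition.
type Segment struct {
	Offset int64 // where the segment begins in the data section
	Length int32 // how many bytes to read at Offset
}

// Segmap is a list of ranges that, read in order, yield one column stream.
// It locates data; it does not contain it.
type Segmap []Segment

// Read fetches and concatenates the ranges the segmap points at.
func (s Segmap) Read(r io.ReaderAt) ([]byte, error) {
	var stream []byte
	for _, seg := range s {
		buf := make([]byte, seg.Length)
		if _, err := r.ReadAt(buf, seg.Offset); err != nil {
			return nil, err
		}
		stream = append(stream, buf...)
	}
	return stream, nil
}
```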
 #### The Super Column

@@ -216,11 +228,13 @@ This simple top-down arrangement, along with the definition of the other
 column structures below, is all that is needed to reconstruct all of the
 original data.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Each row reassembly record has its own layout of columnar
 values and there is no attempt made to store like-typed columns from different
 schemas in the same physical column.
-{{< /tip >}}
+
+{{% /tip %}}

 The notation `<any_column>` refers to any instance of the five column types:
 * [`<record_column>`](#record-column),
@@ -296,9 +310,11 @@ in the same column order implied by the union type, and
 * `tags` is a column of `int32` values where each subsequent value encodes
 the tag of the union type indicating which column the value falls within.

-{{< tip "TBD" >}}
+{{% tip "TBD" %}}
+
 Change code to conform to columns array instead of record{c0,c1,...}
-{{< /tip >}}
+
+{{% /tip %}}

 The number of times each value of `tags` appears must equal the number of values
 in each respective column.
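To make the tags mechanism concrete, here is a Go sketch of reassembling a union column by walking the tags stream; plain strings stand in for decoded Super Binary values, and the names are illustrative:

```go
import "fmt"

// reassembleUnion interleaves per-variant columns back into one value
// sequence: each tag picks the column that supplies the next value.
func reassembleUnion(tags []int32, columns [][]string) ([]string, error) {
	next := make([]int, len(columns)) // per-column read cursors
	out := make([]string, 0, len(tags))
	for _, tag := range tags {
		if tag < 0 || int(tag) >= len(columns) {
			return nil, fmt.Errorf("invalid union tag %d", tag)
		}
		if next[tag] >= len(columns[tag]) {
			return nil, fmt.Errorf("column %d has too few values", tag)
		}
		out = append(out, columns[tag][next[tag]])
		next[tag]++
	}
	return out, nil
}
```

For example, tags `[0, 1, 0]` with columns `[["a", "c"], ["b"]]` reassemble to `["a", "b", "c"]`; a mismatch between tag counts and column lengths surfaces as an error.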
@@ -350,14 +366,16 @@ data in the file,
 it will typically fit comfortably in memory and it can be very fast to scan the
 entire reassembly structure for any purpose.

-{{< tip "Example" >}}
+{{% tip "Example" %}}
+
 For a given query, a "scan planner" could traverse all the
 reassembly records to figure out which segments will be needed, then construct
 an intelligent plan for reading the needed segments and attempt to read them
 in mostly sequential order, which could serve as
 an optimizing intermediary between any underlying storage API and the
 Super Columnar decoding logic.
-{{< /tip >}}
+
+{{% /tip %}}

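A minimal version of that planning step might look like this, reusing the illustrative `Segment` and `Segmap` types from the segmap sketch earlier:

```go
import "sort"

// plannedRead pairs one byte range with the column stream it belongs to.
type plannedRead struct {
	column string
	seg    Segment
}

// planScan flattens the segmaps of the needed columns into a single read
// list ordered by file offset, so reads move mostly forward through the
// data section rather than seeking back and forth.
func planScan(needed map[string]Segmap) []plannedRead {
	var plan []plannedRead
	for column, segmap := range needed {
		for _, seg := range segmap {
			plan = append(plan, plannedRead{column, seg})
		}
	}
	sort.Slice(plan, func(i, j int) bool {
		return plan[i].seg.Offset < plan[j].seg.Offset
	})
	return plan
}
```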
 To decode the "next" row, its schema index is read from the root reassembly
 column stream.
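A skeletal decode loop under this scheme might look as follows, where `rowDecoder` is a hypothetical stand-in for an implementation's per-type reassembly logic:

```go
import "fmt"

// rowDecoder stands in for the logic that pulls the next value from each
// of one super type's column streams and rebuilds the original row.
type rowDecoder interface {
	next() (any, error)
}

// readRows drives top-level decoding: the root column stream yields one
// schema index (super ID) per row, selecting that row's decoder.
func readRows(root []int32, decoders []rowDecoder) ([]any, error) {
	var rows []any
	for _, superID := range root {
		if int(superID) >= len(decoders) {
			return nil, fmt.Errorf("invalid super ID %d", superID)
		}
		row, err := decoders[superID].next()
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
	return rows, nil
}
```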