
Commit 71c3668

[Feat] initial draft of structure-of-arrays using object schema
Using a payload-less object `${..}` as the schema following the optimized type marker `$` allows one to efficiently pack object data into an optimized container, ideally suited for table-like data.

```
[$ {<schema>} #<count> <payload>   // row-major (interleaved)
{$ {<schema>} #<count> <payload>   // column-major (columnar)
```
1 parent c39c97a commit 71c3668


1 file changed

+355
-3
lines changed

Binary_JData_Specification.md

Lines changed: 355 additions & 3 deletions
@@ -3,9 +3,9 @@ Binary JData: A portable interchange format for complex binary data
33

44
- **Maintainer**: Qianqian Fang <q.fang at neu.edu>
55
- **License**: Apache License, Version 2.0
6-
- **Version**: 1 (Draft 3)
7-
- **URL**: https://neurojson.org/bjdata/draft3
8-
- **Status**: Frozen on March 23, 2025. For future updates, please see the Development URL below
6+
- **Version**: 1 (Draft 4-preview)
7+
- **URL**: https://neurojson.org/bjdata/
8+
- **Status**: Under development
99
- **Development**: https://github.com/NeuroJSON/bjdata
1010
- **Acknowledgement**: This project is supported by US National Institute of Health (NIH)
1111
grant [U24-NS124027 (NeuroJSON)](https://neurojson.org)
@@ -30,6 +30,13 @@ extended binary data types.
3030
- [Value types](#value_types)
3131
- [Container types](#container_types)
3232
- [Optimized format](#container_optimized)
33+
- [Structure-of-arrays (SoA)](#structure-of-arrays)
34+
- [Row-major SoA](#row-major_column-major)
35+
- [Column-major SoA](#row-major_column-major)
36+
- [Nested Containers in Schema](#nested_container_in_schema)
37+
- [N-Dimensional SoA](#nd_soa)
38+
- [SoA Example](#soa_example)
39+
- [Permitted type markers in SoA schema](#schema_marker_table)
3340
- [Recommended File Specifiers](#recommended-file-specifiers)
3441
- [Acknowledgement](#acknowledgement)
3542

@@ -649,6 +656,351 @@ Optimized with both _type_ and _count_
649656
// No end marker since a count was specified.
650657
```
651658

659+
660+
## <a name="structure-of-arrays"/>Structure-of-Arrays (SoA)
661+
662+
BJData supports **Structure-of-Arrays (SoA)** containers, which store packed object data in either row-major or column-major order.
663+
664+
### SoA Container Syntax
665+
666+
#### Core Syntax
667+
668+
```
669+
[[][$] [{]<schema>[}] [#]<count> <payload> // row-major (interleaved)
670+
[{][$] [{]<schema>[}] [#]<count> <payload> // column-major (columnar)
671+
```
672+
673+
where:
674+
- `[` or `{` - container type (determines memory layout)
675+
- `$` - optimized type marker
676+
- `{<schema>}` - payload-less object defining the record structure
677+
- `#` - count marker
678+
- `<count>` - 1D integer OR ND dimension array
679+
- `<payload>` - tightly packed data
680+
681+
#### Schema Definition
682+
683+
The schema is a **payload-less object**: keys followed by type markers only, no values.
684+
685+
```
686+
schema = '{' 1*(field-def) '}'
687+
field-def = name type-spec
688+
name = int-type length string-bytes
689+
type-spec = fixed-type | bool-type | null-type | fixed-string | fixed-highprec
690+
| nested-schema | fixed-array
691+
fixed-type = 'U' | 'i' | 'u' | 'I' | 'l' | 'm' | 'L' | 'M' | 'h' | 'd' | 'D' | 'C' | 'B'
692+
bool-type = 'T' ; boolean (1 byte: T or F in payload)
693+
null-type = 'Z' ; null (0 bytes in payload)
694+
fixed-string = 'S' int-type length ; fixed-size string
695+
fixed-highprec= 'H' int-type length ; fixed-size high-precision number
696+
nested-schema = '{' 1*(field-def) '}'
697+
fixed-array = '[' 1*(type-spec) ']' ; regular array with explicit types
698+
```
699+
700+
**Key rules:**
701+
1. Fixed-length numeric types: `U i u I l m L M h d D C B`
702+
2. `T` in schema means "boolean type" - each value is 1 byte (`T` or `F` marker) in payload
703+
3. `Z` in schema means "null field" - no bytes in payload (placeholder/reserved field)
704+
4. `S` and `H` require a length specifier, making them fixed-length
705+
5. Nested objects `{...}` are allowed if all fields are fixed-length
706+
6. Fixed arrays use regular syntax `[type type ...]` - no optimized containers inside schema
707+
7. **No `$` or `#` markers allowed anywhere inside the schema**
708+
8. `F` and `N` are not used in schema (use `T` for boolean, `Z` for null)
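
The grammar and key rules above can be sketched as a small recursive walker that computes the fixed record size of a binary schema. This is only an illustrative sketch, not a reference parser: the helper names (`read_int`, `type_size`, `schema_size`) are hypothetical, little-endian integers are assumed, and the sizes are taken from the marker table in this section.

```python
# Payload size in bytes for each single-marker type (from the marker table).
FIXED_SIZES = {
    "U": 1, "i": 1, "u": 2, "I": 2, "l": 4, "m": 4, "L": 8, "M": 8,
    "h": 2, "d": 4, "D": 8, "C": 1, "B": 1,
    "T": 1,  # boolean: one byte, 'T' or 'F'
    "Z": 0,  # null placeholder: no payload bytes
}
# Integer markers usable as a length prefix: (byte width, signed).
INT_FORMATS = {"U": (1, False), "i": (1, True), "u": (2, False), "I": (2, True),
               "l": (4, True), "m": (4, False), "L": (8, True), "M": (8, False)}


def read_int(buf, pos):
    """Read one integer (marker + little-endian value); return (value, new_pos)."""
    size, signed = INT_FORMATS[chr(buf[pos])]
    value = int.from_bytes(buf[pos + 1:pos + 1 + size], "little", signed=signed)
    return value, pos + 1 + size


def type_size(buf, pos):
    """Return (payload_size, new_pos) for the type-spec starting at buf[pos]."""
    marker = chr(buf[pos])
    if marker in FIXED_SIZES:
        return FIXED_SIZES[marker], pos + 1
    if marker in "SH":                      # fixed string / high-precision: length follows
        return read_int(buf, pos + 1)
    if marker == "{":                       # nested schema: sum of its fields
        return schema_size(buf, pos)
    if marker == "[":                       # fixed array: sum of its element types
        total, pos = 0, pos + 1
        while chr(buf[pos]) != "]":
            size, pos = type_size(buf, pos)
            total += size
        return total, pos + 1
    raise ValueError(f"marker {marker!r} is not permitted in an SoA schema")


def schema_size(buf, pos=0):
    """Return (record_size, new_pos) for the schema object starting at buf[pos]."""
    if chr(buf[pos]) != "{":
        raise ValueError("schema must start with '{'")
    total, pos = 0, pos + 1
    while chr(buf[pos]) != "}":
        name_len, pos = read_int(buf, pos)  # field name: integer length + bytes
        pos += name_len                     # skip the name itself
        size, pos = type_size(buf, pos)
        total += size
    return total, pos + 1


# Hypothetical schema: id (uint32), name (16-byte string), pos (3 x float64), active (bool)
schema = (b"{" b"i\x02id" b"m" b"i\x04name" b"Si\x10"
          b"i\x03pos" b"[DDD]" b"i\x06active" b"T" b"}")
record_size, end = schema_size(schema)      # 4 + 16 + 24 + 1 bytes per record
```

Because `$`, `#`, `F` and `N` are absent from the lookup tables, a schema containing them is rejected with a `ValueError`, which mirrors rules 7 and 8.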
709+
710+
#### Fixed-Length Strings (`S`) and High-Precision Numbers (`H`)
711+
712+
In normal BJData, `S` and `H` are variable-length:
713+
```
714+
S i 5 h e l l o ; string value, length 5
715+
H i 3 1 2 3 ; high-precision value "123"
716+
```
717+
718+
In a **schema context**, they define fixed-length types:
719+
```
720+
{ i4 name S i 16 } ; "name" is a 16-byte fixed string
721+
{ i5 value H i 32 } ; "value" is a 32-byte fixed high-precision number
722+
```
723+
724+
In the payload, each record contributes exactly the specified bytes - no length prefix. Strings shorter than the length are right-padded with null bytes (0x00).
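
A minimal sketch of the padding rule, assuming UTF-8 text. The function names are hypothetical, and rejecting over-long input is an assumption of this sketch, since the text above does not say how a writer should handle it.

```python
def pack_fixed_string(text: str, length: int) -> bytes:
    """Encode text into exactly `length` bytes, right-padded with 0x00."""
    data = text.encode("utf-8")
    if len(data) > length:
        # Assumption: over-long values are an error rather than silently truncated.
        raise ValueError(f"{len(data)} bytes do not fit a {length}-byte field")
    return data.ljust(length, b"\x00")


def unpack_fixed_string(field: bytes) -> str:
    """Strip the trailing null padding and decode the remaining bytes."""
    return field.rstrip(b"\x00").decode("utf-8")


field = pack_fixed_string("sensor_01", 16)   # 9 text bytes + 7 padding bytes
```

Note that with this scheme a value that legitimately ends in 0x00 cannot be distinguished from padding when it is read back.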
725+
726+
#### Boolean Type (`T`)
727+
728+
In normal BJData, `T` and `F` are zero-length value markers:
729+
```
730+
T ; true (no payload)
731+
F ; false (no payload)
732+
```
733+
734+
In a **schema context**, `T` means "boolean type" - a 1-byte field:
735+
```
736+
{ i6 active T } ; "active" is a boolean field
737+
```
738+
739+
In the payload, each boolean value is stored as a single byte: `T` (0x54) for true, `F` (0x46) for false.
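
As an illustration of the 1-byte boolean encoding, the following sketch converts between Python booleans and payload bytes. The helper names are hypothetical, and treating any byte other than `T` or `F` as an error is an assumption of this sketch.

```python
def pack_bools(values):
    """Encode each boolean as one byte: 0x54 ('T') for true, 0x46 ('F') for false."""
    return bytes(0x54 if value else 0x46 for value in values)


def unpack_bools(data):
    """Decode a run of 'T'/'F' bytes back into a list of booleans."""
    result = []
    for byte in data:
        if byte == 0x54:
            result.append(True)
        elif byte == 0x46:
            result.append(False)
        else:
            raise ValueError(f"invalid boolean byte 0x{byte:02X}")
    return result


column = pack_bools([True, False, True])   # three bytes: 'T', 'F', 'T'
```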
740+
741+
#### Null Type (`Z`)
742+
743+
In a **schema context**, `Z` means "null/placeholder field" with **zero bytes** in payload:
744+
```
745+
{
746+
i2 id m ; uint32 (4 bytes)
747+
i8 reserved Z ; placeholder (0 bytes)
748+
i4 data D ; float64 (8 bytes)
749+
}
750+
```
751+
752+
This is useful for:
753+
- Reserved fields for future expansion
754+
- Marking fields that exist in the schema but carry no data
755+
- Sparse structures where some fields are always null
756+
757+
---
758+
759+
### <a name="row-major_column-major"/>Row-Major vs Column-Major Layout
760+
761+
Using existing container markers (no new markers needed):
762+
763+
| Syntax | Layout | Description |
764+
|--------|--------|-------------|
765+
| `[$` | **Row-major (Interleaved)** | Array of records - each complete record stored sequentially |
766+
| `{$` | **Column-major (Columnar)** | Object of arrays - all values of each field stored together |
767+
768+
#### Row-Major: `[$`
769+
770+
```
771+
[$ {<schema>} #<count> <interleaved-payload>
772+
```
773+
774+
Payload order: `<record1><record2><record3>...`
775+
776+
**Example:** 3 particles with `{x:float64, y:float64, id:uint32, active:bool}`
777+
```
778+
[ $ { i1 x D i1 y D i2 id m i6 active T } # i 3
779+
<x1:8><y1:8><id1:4><active1:1> <x2:8><y2:8><id2:4><active2:1> ...
780+
```
781+
Payload: 3 × 21 bytes = 63 bytes, interleaved
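
The interleaved layout can be sketched with Python's standard `struct` module. This is an illustration under stated assumptions, not a reference encoder: the particle values are made up, little-endian byte order is assumed, and only the payload (not the schema header) is produced.

```python
import struct

# One record: x (float64), y (float64), id (uint32), active (1 byte 'T'/'F').
# "<" selects little-endian with no alignment padding, so the size is 8+8+4+1.
RECORD = struct.Struct("<ddIc")

particles = [            # hypothetical sample data: (x, y, id, active)
    (1.0, 2.0, 1, True),
    (3.0, 4.0, 2, False),
    (5.0, 6.0, 3, True),
]

# Row-major: each complete record is written before the next one starts.
row_major = b"".join(
    RECORD.pack(x, y, pid, b"T" if active else b"F")
    for x, y, pid, active in particles
)

# Record k starts at byte offset k * RECORD.size.
second = RECORD.unpack_from(row_major, 1 * RECORD.size)
```

Reading one whole record touches a single contiguous 21-byte span, which is what makes this layout convenient for record-at-a-time access.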
782+
783+
#### Column-Major: `{$`
784+
785+
```
786+
{$ {<schema>} #<count> <columnar-payload>
787+
```
788+
789+
Payload order: `<all field1 values><all field2 values>...`
790+
791+
**Example:** Same 3 particles
792+
```
793+
{ $ { i1 x D i1 y D i2 id m i6 active T } # i 3
794+
<x1:8><x2:8><x3:8> <y1:8><y2:8><y3:8> <id1:4><id2:4><id3:4> <T><F><T>
795+
```
796+
Payload: (3×8) + (3×8) + (3×4) + (3×1) = 63 bytes, columnar
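
For comparison, a columnar payload for three particles can be sketched as follows. The values are hypothetical and little-endian byte order is assumed; each field's values are written as one contiguous run, in schema order.

```python
import struct

# Hypothetical sample data, one list per field.
xs = [1.0, 3.0, 5.0]
ys = [2.0, 4.0, 6.0]
ids = [1, 2, 3]
active = [True, False, True]
count = len(xs)

# Column-major: all x values, then all y values, then all ids, then all flags.
column_major = (
    struct.pack(f"<{count}d", *xs)
    + struct.pack(f"<{count}d", *ys)
    + struct.pack(f"<{count}I", *ids)
    + bytes(ord("T") if flag else ord("F") for flag in active)
)

# Each column starts where the previous one ends.
id_offset = count * 8 + count * 8          # skip the x and y columns
id_column = struct.unpack_from(f"<{count}I", column_major, id_offset)
flag_column = column_major[id_offset + count * 4:]
```

A whole column can be read with a single contiguous access, which suits per-field scans over table-like data.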
797+
798+
**Why this design:**
799+
- `[` = "ordered sequence" → sequence of records (row-major)
800+
- `{` = "named fields" → fields as separate arrays (column-major)
801+
- No new markers needed
802+
803+
---
804+
805+
### <a name="nested_container_in_schema"/>Nested Containers in Schema
806+
807+
#### Nested Objects
808+
809+
```
810+
{
811+
i4 name S i 32 ; 32-byte fixed string
812+
i8 position { ; nested object (24 bytes total)
813+
i1 x D
813+
i1 y D
814+
i1 z D
816+
}
817+
i6 active T ; boolean (1 byte)
818+
i5 flags U ; uint8 (1 byte)
819+
}
820+
```
821+
822+
Record size: 32 + 24 + 1 + 1 = 58 bytes
823+
824+
#### Fixed-Length Arrays in Schema
825+
826+
Use regular array syntax with repeated type markers:
827+
828+
```
829+
{
830+
i2 id m ; uint32 (4 bytes)
831+
i3 pos [D D D] ; array of 3 float64 (24 bytes)
832+
i5 color [U U U U] ; array of 4 uint8 (4 bytes)
833+
i5 flags [T T T T] ; array of 4 booleans (4 bytes)
834+
}
835+
```
836+
837+
Record size: 4 + 24 + 4 + 4 = 36 bytes
838+
839+
For longer arrays, repeat the type marker:
840+
```
841+
{
842+
i4 data [D D D D D D D D D D] ; array of 10 float64 (80 bytes)
843+
}
844+
```
845+
846+
#### Nested Arrays with Mixed Types
847+
848+
```
849+
{
850+
i6 vertex [D D D] ; position: 3 float64 (24 bytes)
851+
i6 normal [h h h] ; normal: 3 float16 (6 bytes)
852+
i5 color [U U U U] ; RGBA: 4 uint8 (4 bytes)
853+
i7 visible T ; visibility: boolean (1 byte)
854+
}
855+
```
856+
857+
Record size: 24 + 6 + 4 + 1 = 35 bytes
858+
859+
#### Combined Example
860+
861+
```json
862+
{
863+
"id": 12345,
864+
"name": "sensor_01",
865+
"position": {"x": 1.0, "y": 2.0, "z": 3.0},
866+
"readings": [0.1, 0.2, 0.3, 0.4, 0.5],
867+
"active": true
868+
}
869+
```
870+
871+
**Schema:**
872+
```
873+
{
874+
i2 id m ; uint32 (4 bytes)
875+
i4 name S i 16 ; fixed 16-byte string
876+
i8 position { ; nested object (24 bytes)
877+
i1 x D
877+
i1 y D
878+
i1 z D
879+
}
880+
i8 readings [D D D D D] ; array of 5 float64 (40 bytes)
882+
i6 active T ; boolean (1 byte)
883+
}
884+
```
885+
886+
Record size: 4 + 16 + 24 + 40 + 1 = 85 bytes
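
Since nested objects and fixed arrays contribute only their leaf values to the payload, one record of the combined example can be sketched as a single flat `struct` layout. This assumes little-endian byte order and the leaf order given by the schema; it illustrates the packing and is not a reference encoder.

```python
import struct

# Flattened leaves: id (uint32), name (16 bytes), position x/y/z (3 x float64),
# readings (5 x float64), active (1 byte 'T'/'F').
RECORD = struct.Struct("<I16s3d5dc")

sensor = {
    "id": 12345,
    "name": "sensor_01",
    "position": {"x": 1.0, "y": 2.0, "z": 3.0},
    "readings": [0.1, 0.2, 0.3, 0.4, 0.5],
    "active": True,
}

record = RECORD.pack(
    sensor["id"],
    sensor["name"].encode("utf-8"),      # "16s" pads short values with 0x00
    sensor["position"]["x"],
    sensor["position"]["y"],
    sensor["position"]["z"],
    *sensor["readings"],
    b"T" if sensor["active"] else b"F",
)
```

The container nesting exists only in the schema; the payload for each record is one flat run of bytes.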
887+
888+
---
889+
890+
### <a name="nd_soa"/>N-Dimensional SoA
891+
892+
Both `[$` and `{$` support ND dimensions:
893+
894+
```
895+
[$ {<schema>} #[<dim1> <dim2> ...] <payload>
896+
{$ {<schema>} #[<dim1> <dim2> ...] <payload>
897+
```
898+
899+
**Example:** 4×3 grid of particles (row-major)
900+
```
901+
[ $ { i1 x D i1 y D i6 active T } # [ i 4 i 3 ]
902+
<12 records in row-major order>
903+
```
904+
905+
Total: 12 records × 17 bytes = 204 bytes
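
For the interleaved (`[$`) case, locating a record inside an ND payload amounts to flattening its index. The sketch below assumes that the last dimension varies fastest (row-major element order) and uses a hypothetical helper name.

```python
from math import prod


def record_offset(index, dims, record_size):
    """Byte offset of the record at `index` in a row-major ND interleaved payload."""
    if len(index) != len(dims):
        raise ValueError("index and dims must have the same number of dimensions")
    flat = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        flat = flat * n + i          # the last dimension varies fastest
    return flat * record_size


dims = (4, 3)                        # 4x3 grid of particles
record_size = 8 + 8 + 1              # x (float64) + y (float64) + active (bool)
total_bytes = prod(dims) * record_size
offset = record_offset((1, 0), dims, record_size)   # first record of the second row
```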
906+
907+
### <a name="soa_example"/>SoA Example
908+
909+
**Data:** 2 sensors
910+
911+
```json
912+
[
913+
{"id": 1, "pos": {"x": 1.0, "y": 2.0}, "val": [0.1, 0.2, 0.3], "on": true},
914+
{"id": 2, "pos": {"x": 3.0, "y": 4.0}, "val": [0.4, 0.5, 0.6], "on": false}
915+
]
916+
```
917+
918+
**Row-major encoding:**
919+
```
920+
Byte Hex Meaning
921+
---- ---- -------
922+
0 5B [ (array-style SoA = row-major)
923+
1 24 $
924+
2 7B { (schema start)
925+
3 69 i (int8 key length)
926+
4 02 2
927+
5-6 6964 "id"
928+
7 6D m (uint32)
929+
8 69 i
930+
9 03 3
931+
10-12 706F73 "pos"
932+
13 7B { (nested object start)
933+
14 69 i
934+
15 01 1
935+
16 78 "x"
936+
17 44 D (float64)
937+
18 69 i
938+
19 01 1
939+
20 79 "y"
940+
21 44 D (float64)
941+
22 7D } (nested object end)
942+
23 69 i
943+
24 03 3
944+
25-27 76616C "val"
945+
28 5B [ (array start)
946+
29 44 D (float64)
947+
30 44 D
948+
31 44 D
949+
32 5D ] (array end)
950+
33 69 i
951+
34 02 2
952+
35-36 6F6E "on"
953+
37 54 T (boolean type)
954+
38 7D } (schema end)
955+
39 23 #
956+
40 69 i
957+
41 02 2 (count = 2)
958+
--- PAYLOAD (2 records × 45 bytes) ---
959+
42-45 id1: uint32 = 1
960+
46-53 pos.x1: float64 = 1.0
961+
54-61 pos.y1: float64 = 2.0
962+
62-69 val1[0]: float64 = 0.1
963+
70-77 val1[1]: float64 = 0.2
964+
78-85 val1[2]: float64 = 0.3
965+
86 on1: T (true)
966+
87-90 id2: uint32 = 2
967+
91-98 pos.x2: float64 = 3.0
968+
99-106 pos.y2: float64 = 4.0
969+
107-114 val2[0]: float64 = 0.4
970+
115-122 val2[1]: float64 = 0.5
971+
123-130 val2[2]: float64 = 0.6
972+
131 on2: F (false)
973+
```
974+
975+
Record size: 4 + 8 + 8 + 24 + 1 = 45 bytes
976+
Total: 42 (header) + 90 (payload) = 132 bytes
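
The byte counts above can be cross-checked with a short script that assembles the same two-sensor container. This is a sketch under stated assumptions: little-endian byte order, float64 fields written with the `D` marker as listed in the marker table, and a hypothetical `key` helper that only handles names shorter than 128 bytes.

```python
import struct


def key(name):
    """Field name as written in the schema: 'i' + int8 length + UTF-8 bytes."""
    raw = name.encode("utf-8")
    return b"i" + bytes([len(raw)]) + raw


schema = (
    b"{"
    + key("id") + b"m"                                        # uint32
    + key("pos") + b"{" + key("x") + b"D" + key("y") + b"D" + b"}"
    + key("val") + b"[DDD]"                                   # 3 x float64
    + key("on") + b"T"                                        # boolean
    + b"}"
)
header = b"[$" + schema + b"#" + b"i" + bytes([2])            # row-major, count = 2

# Flattened record: id, pos.x, pos.y, val[0..2], on.
RECORD = struct.Struct("<I2d3dc")
sensors = [
    (1, 1.0, 2.0, (0.1, 0.2, 0.3), True),
    (2, 3.0, 4.0, (0.4, 0.5, 0.6), False),
]
payload = b"".join(
    RECORD.pack(sid, x, y, *val, b"T" if on else b"F")
    for sid, x, y, val, on in sensors
)
encoded = header + payload
```

The header, record and total lengths produced this way match the 42, 45 and 132 bytes stated above.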
977+
978+
979+
### <a name="schema_marker_table"/>Permitted type markers in SoA schema
980+
981+
| Marker | In Schema Means | Payload Size | Notes |
982+
|--------|-----------------|--------------|-------|
983+
| `U` | uint8 | 1 byte | |
984+
| `i` | int8 | 1 byte | |
985+
| `u` | uint16 | 2 bytes | |
986+
| `I` | int16 | 2 bytes | |
987+
| `l` | int32 | 4 bytes | |
988+
| `m` | uint32 | 4 bytes | |
989+
| `L` | int64 | 8 bytes | |
990+
| `M` | uint64 | 8 bytes | |
991+
| `h` | float16 | 2 bytes | |
992+
| `d` | float32 | 4 bytes | |
993+
| `D` | float64 | 8 bytes | |
994+
| `C` | char | 1 byte | |
995+
| `B` | byte | 1 byte | |
996+
| `T` | boolean | 1 byte | Payload: `T` or `F` |
997+
| `Z` | null/placeholder | 0 bytes | No payload |
998+
| `S` + len | fixed string | len bytes | No length prefix in payload |
999+
| `H` + len | fixed high-prec | len bytes | No length prefix in payload |
1000+
| `{...}` | nested object | sum of fields | |
1001+
| `[...]` | fixed array | sum of elements | |
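
For readers working in Python, the single-marker rows of this table map onto format characters of the standard `struct` module. The mapping below is a convenience sketch and not part of the format; note that the letters differ between the two notations (for example, BJData `D` corresponds to `struct`'s `d`), and that mapping `C` to a 1-byte `c` and `B` to an unsigned byte are choices made for this sketch.

```python
import struct

# BJData schema marker -> struct format character (assumed mapping for this sketch).
STRUCT_CODES = {
    "U": "B", "i": "b",      # uint8, int8
    "u": "H", "I": "h",      # uint16, int16
    "m": "I", "l": "i",      # uint32, int32
    "M": "Q", "L": "q",      # uint64, int64
    "h": "e",                # float16
    "d": "f",                # float32
    "D": "d",                # float64
    "C": "c", "B": "B",      # char, byte
    "T": "c",                # boolean stored as a 'T'/'F' byte
}


def struct_format(markers):
    """Build a little-endian, unpadded struct format from a run of schema markers."""
    return "<" + "".join(STRUCT_CODES[marker] for marker in markers)


particle_format = struct_format("DDmT")          # x, y, id, active
particle_size = struct.calcsize(particle_format)
```

`Z` is omitted because it contributes no payload bytes, and `S`/`H` fields would map to `"<len>s"` with the schema-defined length.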
1002+
1003+
6521004
Recommended File Specifiers
6531005
------------------------------
6541006
