11Binary JData: A portable interchange format for complex binary data
22============================================================
33
4- - ** Status of this document** : Request for comments
54- ** Maintainer** : Qianqian Fang <q.fang at neu.edu>
65- ** License** : Apache License, Version 2.0
7- - ** Version** : 1 (Draft 2)
8- - ** Last Stable Release** : [ Version 1 (Draft 2)] ( https://github.com/NeuroJSON/bjdata/blob/Draft_2/Binary_JData_Specification.md )
6+ - ** Version** : 1 (Draft 3)
7+ - ** URL** : https://neurojson.org/bjdata/draft3
8+ - ** Status** : Frozen on March 23, 2025. For future updates, please see the Development URL below
9+ - ** Development** : https://github.com/NeuroJSON/bjdata
10+ - ** Acknowledgement** : This project is supported by US National Institute of Health (NIH)
11+ grant [ U24-NS124027 (NeuroJSON)] ( https://neurojson.org )
912- ** Abstract** :
1013
1114> The Binary JData (BJData) Specification defines an efficient serialization
1215 protocol for unambiguously storing complex and strongly-typed binary data found
1316in diverse applications. The BJData specification is the binary counterpart
1417to the JSON format, both of which are used to serialize complex data structures
15- supported by the JData specification (http ://openjdata .org ). The BJData spec is
16- derived and extended from the Universal Binary JSON (UBJSON, http ://ubjson.org )
18+ supported by the JData specification (https ://neurojson .org/jdata ). The BJData spec is
19+ derived and extended from the Universal Binary JSON (UBJSON, https ://ubjson.org )
1720specification (Draft 12). It adds supports for N-dimensional packed arrays and
1821extended binary data types.
1922
@@ -45,31 +48,32 @@ backends, medical imaging, and scientific data storage.
4548The lack of support for strongly-typed and binary data has been one of the main
4649barriers towards widespread adoption of JSON in these domains. In recent years,
4750efforts to address these limitation have resulted in an array of versatile binary
48- JSON formats, such as BSON (Binary JSON, http ://bson.org ), UBJSON (Universal Binary
49- JSON, http ://ubjson.org ), MessagePack (https://msgpack.org ), CBOR (Concise Binary
51+ JSON formats, such as BSON (Binary JSON, https ://bson.org ), UBJSON (Universal Binary
52+ JSON, https ://ubjson.org ), MessagePack (https://msgpack.org ), CBOR (Concise Binary
5053Object Representation, [ RFC 7049] , https://cbor.io ) etc. These binary JSON
5154counterparts are broadly used in speed-sensitive data processing applications and
5255address various needs from a diverse range of applications.
5356
5457To better champion findable, accessible, interoperable, and reusable
5558([ FAIR principle] ( https://www.nature.com/articles/sdata201618 ) ) data in
56- scientific data storage and management, we have created the ** OpenJData Initiative **
57- (http ://openjdata .org ) to develop a set of open-standards for portable, human-readable
59+ scientific data storage and management, we have created the ** NeuroJSON Project **
60+ (https ://neurojson .org ) to develop a set of open-standards for portable, human-readable
5861and high-performance data annotation and serialization aimed towards enabling
59- scientific researchers, IT engineers, as well as general data users to efficiently
60- annotate and store complex data structures arising from diverse applications.
62+ neuroimaging researchers, scientific researchers, IT engineers, as well as general
63+ data users to efficiently annotate and store complex data structures arising
64+ from diverse applications.
6165
62- The OpenJData framework first converts complex data structures, such as N-D
66+ The NeuroJSON data sharing framework first converts complex data structures, such as N-D
6367arrays, trees, tables and graphs, into easy-to-serialize, portable data annotations
6468via the ** JData Specification** (https://github.com/NeuroJSON/jdata ) and then serializes
6569and stores the annotated JData constructs using widely-supported data formats.
66- To balance data portability, readability and efficiency, OpenJData defines a
70+ To balance data portability, readability and efficiency, NeuroJSON defines a
6771** dual-interface** : a text-based format ** syntactically compatible with JSON** ,
6872and a binary-JSON format to achieve significantly smaller file sizes and faster
6973encoding/decoding.
7074
7175The Binary JData (BJData) format is the ** official binary interface** for the JData
72- specification . It is derived from the widely supported UBJSON Specification
76+ Specification . It is derived from the widely supported UBJSON Specification
7377Draft 12 (https://github.com/ubjson/universal-binary-json ), and adds native
7478support for ** N-dimensional packed arrays** - an essential data structure for
7579scientific applications - as well as extended binary data types, including unsigned
@@ -95,7 +99,7 @@ License
9599------------------------------
96100
97101The Binary JData Specification is licensed under the
98- [ Apache 2.0 License] ( http ://www.apache.org/licenses/LICENSE-2.0.html) .
102+ [ Apache 2.0 License] ( https ://www.apache.org/licenses/LICENSE-2.0.html) .
99103
100104
101105Format Specification
@@ -116,7 +120,7 @@ the data following it.
116120` uint16 ` , ` int16 ` , ` uint32 ` , ` int32 ` , ` uint64 ` or ` int64 ` ) specifying the length
117121of the following data payload.
118122
119- - ** data** (_ optional_ ) - A contiguous byte-stream containing serialized binary
123+ - ** data** (_ optional_ ) - A contiguous byte-stream containing serialized binary
120124data representing the actual binary data for this type of value.
121125
122126### Notes
@@ -170,6 +174,7 @@ Type | Total size | ASCII Marker(s) | Length required | Data (payload)
170174[ float64/double] ( #value_numeric ) | 9 bytes | * D* | No | Yes
171175[ high-precision number] ( #value_numeric ) | 1 byte + int num val + string byte len | * H* | Yes | Yes
172176[ char] ( #value_char ) | 2 bytes | * C* | No | Yes
177+ [ byte] ( #value_byte ) | 2 bytes | * B* | No | Yes
173178[ string] ( #value_string ) | 1 byte + int num val + string byte len | * S* | Yes | Yes (if not empty)
174179[ array] ( #container_array ) | 2+ bytes | * \[ * and * \] * | Optional | Yes (if not empty)
175180[ object] ( #container_object ) | 2+ bytes | * {* and * }* | Optional | Yes (if not empty)
@@ -246,8 +251,8 @@ uint32| Yes | 0 | 4,294,967,295
246251int64 | No | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807
247252uint64| Yes | 0 | 18,446,744,073,709,551,615
248253float16/half | Yes | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-2008_revision ) | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-2008_revision )
249- float32/single | Yes | See [ IEEE 754 Spec] ( http ://en.wikipedia.org/wiki/IEEE_754-1985) | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-1985 )
250- float64/double | Yes | See [ IEEE 754 Spec] ( http ://en.wikipedia.org/wiki/IEEE_754-1985) | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-1985 )
254+ float32/single | Yes | See [ IEEE 754 Spec] ( https ://en.wikipedia.org/wiki/IEEE_754-1985) | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-1985 )
255+ float64/double | Yes | See [ IEEE 754 Spec] ( https ://en.wikipedia.org/wiki/IEEE_754-1985) | See [ IEEE 754 Spec] ( https://en.wikipedia.org/wiki/IEEE_754-1985 )
251256high-precision number | Yes | Infinite | Infinite
252257
253258** Notes** :
@@ -263,7 +268,7 @@ integers are written in Big-Endian order).
263268
264269#### Float
265270All float types (` half ` , ` single ` , ` double ` are written in ** Little-Endian order**
266- (this is different from UBJSON which does not specify the endianness of floats).
271+ (this is different from UBJSON which does not specify the Endianness of floats).
267272
268273- ` float16 ` or half-precision values are written in [ IEEE 754 half precision floating point
269274format] ( https://en.wikipedia.org/wiki/IEEE_754-2008_revision ) , which has the following
@@ -273,14 +278,14 @@ structure:
273278 - Bit 9-0 (10 bits) - fraction (significant)
274279
275280- ` float32 ` or single-precision values are written in [ IEEE 754 single precision floating point
276- format] ( http ://en.wikipedia.org/wiki/IEEE_754-1985) , which has the following
281+ format] ( https ://en.wikipedia.org/wiki/IEEE_754-1985) , which has the following
277282structure:
278283 - Bit 31 (1 bit) - sign
279284 - Bit 30-23 (8 bits) - exponent
280285 - Bit 22-0 (23 bits) - fraction (significant)
281286
282287- ` float64 ` or double-precision values are written in [ IEEE 754 double precision floating point
283- format] ( http ://en.wikipedia.org/wiki/IEEE_754-1985) , which has the following
288+ format] ( https ://en.wikipedia.org/wiki/IEEE_754-1985) , which has the following
284289structure:
285290 - Bit 63 (1 bit) - sign
286291 - Bit 62-52 (11 bits) - exponent
@@ -290,7 +295,7 @@ structure:
290295#### High-Precision
291296These are encoded as a string and thus are only limited by the maximum string
292297size. Values ** must** be written out in accordance with the original [ JSON
293- number type specification] ( http ://json.org) .
298+ number type specification] ( https ://json.org) .
294299
295300#### Examples
296301Numeric values in JSON:
@@ -353,6 +358,33 @@ BJData (using block-notation):
353358[}]
354359```
355360
361+ ---
362+ ### <a name =" value_byte " />Byte
363+ The ` byte ` type in BJData is functionally identical to the ` uint8 ` type,
364+ but semantically is meant to represent a byte and not a numeric value. In
365+ particular, when used as the strong type of an array container it provides
366+ a hint to the parser that an optimized data storage format may be used as
367+ opposed to a generic array of integers.
368+
369+ See also [ optimized format] ( #container_optimized ) below.
370+
371+ #### Example
372+ Byte values in JSON:
373+ ``` json
374+ {
375+ "binary" : [222 , 173 , 190 , 239 ]
376+ "val" : 123 ,
377+ }
378+ ```
379+
380+ BJData (using block-notation):
381+ ```
382+ [{]
383+ [i][6][binary] [[] [$][B] [#][i][4] [222][173][190][239]
384+ [i][3][val][B][123]
385+ [}]
386+ ```
387+
356388---
357389### <a name =" value_string " />String
358390The ` string ` type in BJData is equivalent to the ` string ` type from the JSON
@@ -457,17 +489,17 @@ thought of as providing the ability to create a strongly-typed container in BJDa
457489
458490A major different between BJData and UBJSON is that the _ type_ in a BJData
459491strongly-typed container is limited to ** non-zero-fixed-length data types** , therefore,
460- only integers (` i,U,I,u,l,m,L,M ` ), floating-point numbers (` h,d,D ` ) and char (` C ` )
492+ only integers (` i,U,I,u,l,m,L,M ` ), floating-point numbers (` h,d,D ` ), char (` C ` ) and byte ( ` B ` )
461493are qualified. All zero-length types (` T,F,Z,N ` ), variable-length types(` S, H ` )
462494and container types (` [,{ ` ) shall not be used in an optimized _ type_ header.
463495This restriction is set to reduce the security risks due to potentials of
464496buffer-overflow attacks using [ zero-length markers] ( https://github.com/nlohmann/json/issues/2793 ) ,
465- hampered readability and dimished benefit using variable/container
497+ hampered readability and diminished benefit using variable/container
466498types in an optimized format.
467499
468500The requirements for _ type_ are
469501
470- - If a _ type_ is specified, it ** must** be one of ` i,U,I,u,l,m,L,M,h,d,D,C ` .
502+ - If a _ type_ is specified, it ** must** be one of ` i,U,I,u,l,m,L,M,h,d,D,C,B ` .
471503- If a _ type_ is specified, it ** must** be done so before a _ count_ .
472504- If a _ type_ is specified, a _ count_ ** must** be specified as well. (Otherwise
473505it is impossible to tell when a container is ending, e.g. did you just parse
@@ -493,6 +525,14 @@ bytes while parsing.
493525[#][i][64]
494526```
495527
528+ ### Optimized binary array
529+ When an array of _ type_ ` B ` is specified the parser shall use an optimized data storage
530+ format to represent binary data where applicable, as opposed to a generic array of integers.
531+ Similarly, explicit binary data should be serialized as such to allow for parsers to
532+ make use of the optimization.
533+
534+ If such a data storage format is not available, an array of integers shall be used.
535+
496536### Optimized N-dimensional array of uniform type
497537When both _ type_ and _ count_ are specified and the _ count_ marker ` # ` is followed
498538by ` [ ` , the parser should expect the following sequence to be a 1-D ` array ` with
@@ -514,6 +554,19 @@ all non-negative numbers specifying the dimensions of the N-dimensional array.
514554The binary data of the N-dimensional array is then serialized into a 1-D vector
515555in the ** row-major** element order (similar to C, C++, Javascript or Python) .
516556
557+ To store an N-dimensional array that is serialized using the ** column-major** element
558+ order (as used in MATLAB and FORTRAN), the _ count_ marker ` # ` should be followed by
559+ an array of a single element, which must be a 1-D array of integer type as the
560+ dimensional vector above. Either of the arrays can be in optimized or non-optimized
561+ form. For example, either of the following
562+
563+ ```
564+ [[] [$] [type] [#] [[] [[] [$] [Nx type] [#] [Ndim type] [Ndim] [Nx Ny Nz ...] []] [a11 a21 a31 ... a21 a22 ...]
565+ or
566+ [[] [$] [type] [#] [[] [[] [Nx type] [nx] [Ny type] [Ny] [Nz type] [Nz] ... []] []] [a11 a21 a31 ... a21 a22 ...]
567+ ```
568+ represents the same column-major N-dimensional array of ` type ` and size ` [Nx, Ny, Nz, ...] ` .
569+
517570
518571#### Example (a 2x3x4 ` uint8 ` array):
519572The following 2x3x4 3-D ` uint8 ` array
@@ -531,22 +584,26 @@ The following 2x3x4 3-D `uint8` array
531584 ]
532585]
533586```
534- shall be stored as
587+ shall be stored using ** row-major ** serialized form as
535588```
536589 [[] [$][U] [#][[] [$][U][#][3] [2][3][4]
537590 [1][9][6][0] [2][9][3][1] [8][0][9][6] [6][4][2][7] [8][5][1][2] [3][3][2][6]
538591```
539-
592+ or ** column-major** serialized form as
593+ ```
594+ [[] [$][U] [#][[] [[] [$][U][#][3] [2][3][4] []]
595+ [1][6][2][8] [8][3][9][4] [9][5][0][3] [6][2][3][1] [9][2][0][7] [1][2][6][6]
596+ ```
540597
541598### Additional rules
542599- A _ count_ ** must** be >= 0.
543- - A _ count_ can be specified by itself .
600+ - A _ count_ can be specified alone .
544601- If a _ count_ is specified, the container ** must not** specify an end-marker.
545602- A container that specifies a _ count_ ** must** contain the specified number of
546603child elements.
547604- If a _ type_ is specified, it ** must** be done so before _ count_ .
548605- If a _ type_ is specified, a _ count_ ** must** also be specified. A _ type_
549- cannot be specified by itself .
606+ cannot be specified alone .
550607- A container that specifies a _ type_ ** must not** contain any additional
551608_ type_ markers for any contained value.
552609
@@ -601,13 +658,13 @@ The MIME type for a Binary JData document is **`"application/jdata-binary"`**
601658Acknowledgement
602659------------------------------
603660
604- The BJData spec is derived from the Universal Binary JSON (UBJSON, http ://ubjson.org )
661+ The BJData spec is derived from the Universal Binary JSON (UBJSON, https ://ubjson.org )
605662specification (Draft 12) developed by Riyad Kalla and other UBJSON contributors.
606663
607664The initial version of this MarkDown-formatted specification was derived from the
608665documentation included in the [ Py-UBJSON] ( https://github.com/Iotic-Labs/py-ubjson/blob/dev-contrib/UBJSON-Specification.md )
609666repository (Commit 5ce1fe7).
610667
611- This specification was developed as part of the NeuroJSON project (http ://neurojson.org )
668+ This specification was developed as part of the NeuroJSON project (https ://neurojson.org )
612669with funding support from the US National Institute of Health (NIH) under
613670grant [ U24-NS124027] ( https://reporter.nih.gov/project-details/10308329 ) .
0 commit comments