Commit b1e1480

[doc] add bjdata spec 3, fix python 3.12 recursion test
1 parent 86bfef1 commit b1e1480

4 files changed

+104
-37
lines changed


Binary_JData_Specification.md

Lines changed: 88 additions & 31 deletions
Original file line number | Diff line number | Diff line change
@@ -1,19 +1,22 @@
11
Binary JData: A portable interchange format for complex binary data
22
============================================================
33

4-
- **Status of this document**: Request for comments
54
- **Maintainer**: Qianqian Fang <q.fang at neu.edu>
65
- **License**: Apache License, Version 2.0
7-
- **Version**: 1 (Draft 2)
8-
- **Last Stable Release**: [Version 1 (Draft 2)](https://github.com/NeuroJSON/bjdata/blob/Draft_2/Binary_JData_Specification.md)
6+
- **Version**: 1 (Draft 3)
7+
- **URL**: https://neurojson.org/bjdata/draft3
8+
- **Status**: Frozen on March 23, 2025. For future updates, please see the Development URL below
9+
- **Development**: https://github.com/NeuroJSON/bjdata
10+
- **Acknowledgement**: This project is supported by the US National Institutes of Health (NIH)
11+
grant [U24-NS124027 (NeuroJSON)](https://neurojson.org)
912
- **Abstract**:
1013

1114
> The Binary JData (BJData) Specification defines an efficient serialization
1215
protocol for unambiguously storing complex and strongly-typed binary data found
1316
in diverse applications. The BJData specification is the binary counterpart
1417
to the JSON format, both of which are used to serialize complex data structures
15-
supported by the JData specification (http://openjdata.org). The BJData spec is
16-
derived and extended from the Universal Binary JSON (UBJSON, http://ubjson.org)
18+
supported by the JData specification (https://neurojson.org/jdata). The BJData spec is
19+
derived and extended from the Universal Binary JSON (UBJSON, https://ubjson.org)
1720
specification (Draft 12). It adds support for N-dimensional packed arrays and
1821
extended binary data types.
1922

@@ -45,31 +48,32 @@ backends, medical imaging, and scientific data storage.
4548
The lack of support for strongly-typed and binary data has been one of the main
4649
barriers towards widespread adoption of JSON in these domains. In recent years,
4750
efforts to address these limitations have resulted in an array of versatile binary
48-
JSON formats, such as BSON (Binary JSON, http://bson.org), UBJSON (Universal Binary
49-
JSON, http://ubjson.org), MessagePack (https://msgpack.org), CBOR (Concise Binary
51+
JSON formats, such as BSON (Binary JSON, https://bson.org), UBJSON (Universal Binary
52+
JSON, https://ubjson.org), MessagePack (https://msgpack.org), CBOR (Concise Binary
5053
Object Representation, [RFC 7049], https://cbor.io) etc. These binary JSON
5154
counterparts are broadly used in speed-sensitive data processing applications and
5255
address various needs from a diverse range of applications.
5356

5457
To better champion findable, accessible, interoperable, and reusable
5558
([FAIR principle](https://www.nature.com/articles/sdata201618)) data in
56-
scientific data storage and management, we have created the **OpenJData Initiative**
57-
(http://openjdata.org) to develop a set of open-standards for portable, human-readable
59+
scientific data storage and management, we have created the **NeuroJSON Project**
60+
(https://neurojson.org) to develop a set of open-standards for portable, human-readable
5861
and high-performance data annotation and serialization aimed towards enabling
59-
scientific researchers, IT engineers, as well as general data users to efficiently
60-
annotate and store complex data structures arising from diverse applications.
62+
neuroimaging researchers, scientific researchers, IT engineers, as well as general
63+
data users to efficiently annotate and store complex data structures arising
64+
from diverse applications.
6165

62-
The OpenJData framework first converts complex data structures, such as N-D
66+
The NeuroJSON data sharing framework first converts complex data structures, such as N-D
6367
arrays, trees, tables and graphs, into easy-to-serialize, portable data annotations
6468
via the **JData Specification** (https://github.com/NeuroJSON/jdata) and then serializes
6569
and stores the annotated JData constructs using widely-supported data formats.
66-
To balance data portability, readability and efficiency, OpenJData defines a
70+
To balance data portability, readability and efficiency, NeuroJSON defines a
6771
**dual-interface**: a text-based format **syntactically compatible with JSON**,
6872
and a binary-JSON format to achieve significantly smaller file sizes and faster
6973
encoding/decoding.
7074

7175
The Binary JData (BJData) format is the **official binary interface** for the JData
72-
specification. It is derived from the widely supported UBJSON Specification
76+
Specification. It is derived from the widely supported UBJSON Specification
7377
Draft 12 (https://github.com/ubjson/universal-binary-json), and adds native
7478
support for **N-dimensional packed arrays** - an essential data structure for
7579
scientific applications - as well as extended binary data types, including unsigned
@@ -95,7 +99,7 @@ License
9599
------------------------------
96100

97101
The Binary JData Specification is licensed under the
98-
[Apache 2.0 License](http://www.apache.org/licenses/LICENSE-2.0.html).
102+
[Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.html).
99103

100104

101105
Format Specification
@@ -116,7 +120,7 @@ the data following it.
116120
`uint16`, `int16`, `uint32`, `int32`, `uint64` or `int64`) specifying the length
117121
of the following data payload.
118122

119-
- **data** (_optional_) - A contiguous byte-stream containing serialized binary
123+
- **data** (_optional_) - A contiguous byte-stream containing serialized binary
120124
data representing the actual binary data for this type of value.
121125

122126
### Notes
@@ -170,6 +174,7 @@ Type | Total size | ASCII Marker(s) | Length required | Data (payload)
170174
[float64/double](#value_numeric) | 9 bytes | *D* | No | Yes
171175
[high-precision number](#value_numeric) | 1 byte + int num val + string byte len | *H* | Yes | Yes
172176
[char](#value_char) | 2 bytes | *C* | No | Yes
177+
[byte](#value_byte) | 2 bytes | *B* | No | Yes
173178
[string](#value_string) | 1 byte + int num val + string byte len | *S* | Yes | Yes (if not empty)
174179
[array](#container_array) | 2+ bytes | *\[* and *\]* | Optional | Yes (if not empty)
175180
[object](#container_object) | 2+ bytes | *{* and *}* | Optional | Yes (if not empty)
@@ -246,8 +251,8 @@ uint32| Yes | 0 | 4,294,967,295
246251
int64 | No | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807
247252
uint64| Yes | 0 | 18,446,744,073,709,551,615
248253
float16/half | Yes | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-2008_revision) | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-2008_revision)
249-
float32/single | Yes | See [IEEE 754 Spec](http://en.wikipedia.org/wiki/IEEE_754-1985) | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985)
250-
float64/double | Yes | See [IEEE 754 Spec](http://en.wikipedia.org/wiki/IEEE_754-1985) | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985)
254+
float32/single | Yes | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985) | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985)
255+
float64/double | Yes | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985) | See [IEEE 754 Spec](https://en.wikipedia.org/wiki/IEEE_754-1985)
251256
high-precision number | Yes | Infinite | Infinite
252257

253258
**Notes**:
@@ -263,7 +268,7 @@ integers are written in Big-Endian order).
263268

264269
#### Float
265270
All float types (`half`, `single`, `double`) are written in **Little-Endian order**
266-
(this is different from UBJSON which does not specify the endianness of floats).
271+
(this is different from UBJSON which does not specify the Endianness of floats).
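
As a rough illustration of the little-endian float layout described above, the following sketch packs a value with Python's standard `struct` module. The helper name `encode_float` is hypothetical and not part of any BJData library; the markers `h`, `d` and `D` are the floating-point markers used in this specification.

```python
import struct

# Marker -> struct format; "<" forces little-endian byte order.
_FLOAT_FORMATS = {
    b"h": "<e",  # float16 / half
    b"d": "<f",  # float32 / single
    b"D": "<d",  # float64 / double
}


def encode_float(marker: bytes, value: float) -> bytes:
    """Return the marker byte followed by the little-endian IEEE 754 payload."""
    return marker + struct.pack(_FLOAT_FORMATS[marker], value)


half_one = encode_float(b"h", 1.0)    # marker + 2-byte payload
single_one = encode_float(b"d", 1.0)  # marker + 4-byte payload
double_one = encode_float(b"D", 1.0)  # marker + 8-byte payload
```

In each result the least significant byte of the IEEE 754 bit pattern comes first, so the sign and exponent bits end up in the last byte.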
267272

268273
- `float16` or half-precision values are written in [IEEE 754 half precision floating point
269274
format](https://en.wikipedia.org/wiki/IEEE_754-2008_revision), which has the following
@@ -273,14 +278,14 @@ structure:
273278
- Bit 9-0 (10 bits) - fraction (significand)
274279

275280
- `float32` or single-precision values are written in [IEEE 754 single precision floating point
276-
format](http://en.wikipedia.org/wiki/IEEE_754-1985), which has the following
281+
format](https://en.wikipedia.org/wiki/IEEE_754-1985), which has the following
277282
structure:
278283
- Bit 31 (1 bit) - sign
279284
- Bit 30-23 (8 bits) - exponent
280285
- Bit 22-0 (23 bits) - fraction (significand)
281286

282287
- `float64` or double-precision values are written in [IEEE 754 double precision floating point
283-
format](http://en.wikipedia.org/wiki/IEEE_754-1985), which has the following
288+
format](https://en.wikipedia.org/wiki/IEEE_754-1985), which has the following
284289
structure:
285290
- Bit 63 (1 bit) - sign
286291
- Bit 62-52 (11 bits) - exponent
@@ -290,7 +295,7 @@ structure:
290295
#### High-Precision
291296
These are encoded as a string and thus are only limited by the maximum string
292297
size. Values **must** be written out in accordance with the original [JSON
293-
number type specification](http://json.org).
298+
number type specification](https://json.org).
294299

295300
#### Examples
296301
Numeric values in JSON:
@@ -353,6 +358,33 @@ BJData (using block-notation):
353358
[}]
354359
```
355360

361+
---
362+
### <a name="value_byte"/>Byte
363+
The `byte` type in BJData is functionally identical to the `uint8` type,
364+
but semantically is meant to represent a byte and not a numeric value. In
365+
particular, when used as the strong type of an array container it provides
366+
a hint to the parser that an optimized data storage format may be used as
367+
opposed to a generic array of integers.
368+
369+
See also [optimized format](#container_optimized) below.
370+
371+
#### Example
372+
Byte values in JSON:
373+
```json
374+
{
375+
"binary": [222, 173, 190, 239],
376+
"val": 123
377+
}
378+
```
379+
380+
BJData (using block-notation):
381+
```
382+
[{]
383+
[i][6][binary] [[] [$][B] [#][i][4] [222][173][190][239]
384+
[i][3][val][B][123]
385+
[}]
386+
```
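
The block notation for the `binary` field above can be reproduced with a few lines of Python. This is only a sketch: the helper `encode_byte_array` is hypothetical, and it assumes the element count fits in an int8 (`i`) count, as it does in the example.

```python
def encode_byte_array(data: bytes) -> bytes:
    """Serialize bytes as an optimized BJData array of type B.

    Sketch only: assumes len(data) <= 127 so the count fits in an int8 (`i`).
    """
    if len(data) > 127:
        raise ValueError("this sketch only handles counts that fit in int8")
    # [ = array start, $B = element type, #i<n> = element count as int8
    header = b"[" + b"$" + b"B" + b"#" + b"i" + bytes([len(data)])
    return header + bytes(data)


encoded = encode_byte_array(bytes([222, 173, 190, 239]))
```

Because a count is given, no closing `]` marker is written after the payload.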
387+
356388
---
357389
### <a name="value_string"/>String
358390
The `string` type in BJData is equivalent to the `string` type from the JSON
@@ -457,17 +489,17 @@ thought of as providing the ability to create a strongly-typed container in BJDa
457489

458490
A major difference between BJData and UBJSON is that the _type_ in a BJData
459491
strongly-typed container is limited to **non-zero-fixed-length data types**, therefore,
460-
only integers (`i,U,I,u,l,m,L,M`), floating-point numbers (`h,d,D`) and char (`C`)
492+
only integers (`i,U,I,u,l,m,L,M`), floating-point numbers (`h,d,D`), char (`C`) and byte (`B`)
461493
are qualified. All zero-length types (`T,F,Z,N`), variable-length types (`S,H`)
462494
and container types (`[,{`) shall not be used in an optimized _type_ header.
463495
This restriction is set to reduce the security risks due to the potential for
464496
buffer-overflow attacks using [zero-length markers](https://github.com/nlohmann/json/issues/2793),
465-
hampered readability and dimished benefit using variable/container
497+
hampered readability and diminished benefit using variable/container
466498
types in an optimized format.
467499

468500
The requirements for _type_ are
469501

470-
- If a _type_ is specified, it **must** be one of `i,U,I,u,l,m,L,M,h,d,D,C`.
502+
- If a _type_ is specified, it **must** be one of `i,U,I,u,l,m,L,M,h,d,D,C,B`.
471503
- If a _type_ is specified, it **must** be done so before a _count_.
472504
- If a _type_ is specified, a _count_ **must** be specified as well. (Otherwise
473505
it is impossible to tell when a container is ending, e.g. did you just parse
@@ -493,6 +525,14 @@ bytes while parsing.
493525
[#][i][64]
494526
```
495527

528+
### Optimized binary array
529+
When an array of _type_ `B` is specified, the parser shall use an optimized data storage
530+
format to represent binary data where applicable, as opposed to a generic array of integers.
531+
Similarly, explicit binary data should be serialized as such to allow for parsers to
532+
make use of the optimization.
533+
534+
If such a data storage format is not available, an array of integers shall be used.
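
On the decoding side, a parser might apply this rule roughly as follows. The sketch assumes that Python's `bytes` plays the role of the "optimized data storage format"; the function name `decode_uint8_payload` is hypothetical.

```python
def decode_uint8_payload(type_marker: bytes, payload: bytes):
    """Map a one-byte-per-element payload to a Python value.

    `B` is treated as raw binary (bytes); `U` stays a generic list of integers.
    """
    if type_marker == b"B":
        return bytes(payload)
    if type_marker == b"U":
        return list(payload)
    raise ValueError(f"unsupported type marker: {type_marker!r}")


as_binary = decode_uint8_payload(b"B", b"\xde\xad\xbe\xef")
as_numbers = decode_uint8_payload(b"U", b"\xde\xad\xbe\xef")
```

The payload bytes are identical in both cases; only the in-memory representation chosen by the parser differs.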
535+
496536
### Optimized N-dimensional array of uniform type
497537
When both _type_ and _count_ are specified and the _count_ marker `#` is followed
498538
by `[`, the parser should expect the following sequence to be a 1-D `array` with
@@ -514,6 +554,19 @@ all non-negative numbers specifying the dimensions of the N-dimensional array.
514554
The binary data of the N-dimensional array is then serialized into a 1-D vector
515555
in the **row-major** element order (similar to C, C++, JavaScript or Python).
516556

557+
To store an N-dimensional array that is serialized using the **column-major** element
558+
order (as used in MATLAB and FORTRAN), the _count_ marker `#` should be followed by
559+
an array of a single element, which must be a 1-D array of integer type as the
560+
dimensional vector above. Either of the arrays can be in optimized or non-optimized
561+
form. For example, either of the following
562+
563+
```
564+
[[] [$] [type] [#] [[] [[] [$] [Nx type] [#] [Ndim type] [Ndim] [Nx Ny Nz ...] []] [a11 a21 a31 ... a12 a22 ...]
565+
or
566+
[[] [$] [type] [#] [[] [[] [Nx type] [Nx] [Ny type] [Ny] [Nz type] [Nz] ... []] []] [a11 a21 a31 ... a12 a22 ...]
567+
```
568+
represents the same column-major N-dimensional array of `type` and size `[Nx, Ny, Nz, ...]`.
569+
517570

518571
#### Example (a 2x3x4 `uint8` array):
519572
The following 2x3x4 3-D `uint8` array
@@ -531,22 +584,26 @@ The following 2x3x4 3-D `uint8` array
531584
]
532585
]
533586
```
534-
shall be stored as
587+
shall be stored using **row-major** serialized form as
535588
```
536589
[[] [$][U] [#][[] [$][U][#][3] [2][3][4]
537590
[1][9][6][0] [2][9][3][1] [8][0][9][6] [6][4][2][7] [8][5][1][2] [3][3][2][6]
538591
```
539-
592+
or **column-major** serialized form as
593+
```
594+
[[] [$][U] [#][[] [[] [$][U][#][3] [2][3][4] []]
595+
[1][6][2][8] [8][3][9][4] [9][5][0][3] [6][2][3][1] [9][2][0][7] [1][2][6][6]
596+
```
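
The two element orders can be checked with a small pure-Python sketch. The nested list below is reconstructed from the row-major payload shown above, and the helper `flatten` is a hypothetical name used only for illustration.

```python
from itertools import product


def flatten(array, shape, order="row"):
    """Flatten a nested list with the given shape into a 1-D list.

    order="row": last index varies fastest (C-style).
    order="column": first index varies fastest (MATLAB/FORTRAN-style).
    """
    def element(index):
        item = array
        for i in index:
            item = item[i]
        return item

    if order == "row":
        return [element(idx) for idx in product(*(range(n) for n in shape))]
    if order == "column":
        # Iterate over the reversed dimensions, then flip each index back.
        reversed_ranges = (range(n) for n in reversed(shape))
        return [element(idx[::-1]) for idx in product(*reversed_ranges)]
    raise ValueError("order must be 'row' or 'column'")


array = [
    [[1, 9, 6, 0], [2, 9, 3, 1], [8, 0, 9, 6]],
    [[6, 4, 2, 7], [8, 5, 1, 2], [3, 3, 2, 6]],
]
row_major = flatten(array, (2, 3, 4), order="row")
column_major = flatten(array, (2, 3, 4), order="column")
```

Both lists contain the same 24 values; only the order in which the payload bytes would be written differs.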
540597

541598
### Additional rules
542599
- A _count_ **must** be >= 0.
543-
- A _count_ can be specified by itself.
600+
- A _count_ can be specified alone.
544601
- If a _count_ is specified, the container **must not** specify an end-marker.
545602
- A container that specifies a _count_ **must** contain the specified number of
546603
child elements.
547604
- If a _type_ is specified, it **must** be done so before _count_.
548605
- If a _type_ is specified, a _count_ **must** also be specified. A _type_
549-
cannot be specified by itself.
606+
cannot be specified alone.
550607
- A container that specifies a _type_ **must not** contain any additional
551608
_type_ markers for any contained value.
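
The rules above can be sketched as a small header parser. This is an illustrative sketch, not the reference implementation: `parse_optimized_header` is a hypothetical helper, it expects the buffer position just after the container start marker, and it handles only scalar counts (not the N-dimensional form where `#` is followed by `[`).

```python
import struct

# Fixed-length type markers allowed in an optimized header.
ALLOWED_TYPES = {bytes([c]) for c in b"iUIulmLMhdDCB"}
# Integer markers usable for a count -> little-endian struct format.
COUNT_FORMATS = {
    b"i": "<b", b"U": "<B", b"I": "<h", b"u": "<H",
    b"l": "<i", b"m": "<I", b"L": "<q", b"M": "<Q",
}


def parse_optimized_header(buf: bytes, pos: int = 0):
    """Parse an optional `$type` and `#count` following a container start.

    Returns (type_marker or None, count or None, next_position).
    """
    type_marker = None
    if buf[pos:pos + 1] == b"$":
        type_marker = buf[pos + 1:pos + 2]
        if type_marker not in ALLOWED_TYPES:
            raise ValueError(f"type {type_marker!r} not allowed in optimized header")
        pos += 2
        # A type cannot be specified alone: a count must follow.
        if buf[pos:pos + 1] != b"#":
            raise ValueError("a type must be followed by a count")
    count = None
    if buf[pos:pos + 1] == b"#":
        fmt = COUNT_FORMATS.get(buf[pos + 1:pos + 2])
        if fmt is None:
            raise ValueError("count must use an integer type marker")
        size = struct.calcsize(fmt)
        payload = buf[pos + 2:pos + 2 + size]
        if len(payload) != size:
            raise ValueError("truncated count")
        (count,) = struct.unpack(fmt, payload)
        if count < 0:
            raise ValueError("count must be >= 0")
        pos += 2 + size
    return type_marker, count, pos


typed = parse_optimized_header(b"$B#i\x04\xde\xad\xbe\xef")
count_only = parse_optimized_header(b"#i\x40")
```

A caller would then read exactly `count` elements and would not look for an end marker, since a counted container must not have one.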
552609

@@ -601,13 +658,13 @@ The MIME type for a Binary JData document is **`"application/jdata-binary"`**
601658
Acknowledgement
602659
------------------------------
603660

604-
The BJData spec is derived from the Universal Binary JSON (UBJSON, http://ubjson.org)
661+
The BJData spec is derived from the Universal Binary JSON (UBJSON, https://ubjson.org)
605662
specification (Draft 12) developed by Riyad Kalla and other UBJSON contributors.
606663

607664
The initial version of this Markdown-formatted specification was derived from the
608665
documentation included in the [Py-UBJSON](https://github.com/Iotic-Labs/py-ubjson/blob/dev-contrib/UBJSON-Specification.md)
609666
repository (Commit 5ce1fe7).
610667

611-
This specification was developed as part of the NeuroJSON project (http://neurojson.org)
668+
This specification was developed as part of the NeuroJSON project (https://neurojson.org)
612669
with funding support from the US National Institutes of Health (NIH) under
613670
grant [U24-NS124027](https://reporter.nih.gov/project-details/10308329).

CHANGELOG

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,13 @@
1+
0.5.0
2+
- 2025-07-26 support BJData Spec Draft 3 (PR #5, thanks to Nebojsa Cvetkovic)
3+
- 2025-07-27 compatible with numpy 2.x, fully reformat C and python code with formatter
4+
5+
0.4.1
6+
- 2023-10-25 fix numpy 1d array length encoding bug, fix #4
7+
8+
0.4.0
9+
- 2022-09-04 add numpy ndarray/scalar encoding/decoding, fix pip ext compilation error, thanks to @termanis
10+
111
0.3.0
212
- 2022-04-01 support BJData Spec Draft 2, change to little-endian for numbers
313

setup.py

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
# Copyright (c) 2020-2023 Qianqian Fang <q.fang at neu.edu>. All rights reserved.
1+
# Copyright (c) 2020-2025 Qianqian Fang <q.fang at neu.edu>. All rights reserved.
22
# Copyright (c) 2016-2019 Iotic Labs Ltd. All rights reserved.
33
#
44
# Licensed under the Apache License, Version 2.0 (the "License");

test/test.py

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
# Copyright (c) 2020-2023 Qianqian Fang <q.fang at neu.edu>. All rights reserved.
1+
# Copyright (c) 2020-2025 Qianqian Fang <q.fang at neu.edu>. All rights reserved.
22
# Copyright (c) 2016-2019 Iotic Labs Ltd. All rights reserved.
33
#
44
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -960,15 +960,15 @@ def test_recursion(self):
960960
setrecursionlimit(200)
961961
try:
962962
obj = current = []
963-
for _ in range(getrecursionlimit() * 2):
963+
# Increase the multiplier for Python 3.12+
964+
multiplier = 4 if version_info >= (3, 12) else 2
965+
for _ in range(getrecursionlimit() * multiplier):
964966
new_list = []
965967
current.append(new_list)
966968
current = new_list
967-
968969
with self.assert_raises_regex(RuntimeError, "recursion"):
969970
self.bjddumpb(obj)
970-
971-
raw = ARRAY_START * (getrecursionlimit() * 2)
971+
raw = ARRAY_START * (getrecursionlimit() * multiplier)
972972
with self.assert_raises_regex(RuntimeError, "recursion"):
973973
self.bjdloadb(raw)
974974
finally:
