@@ -43,63 +43,52 @@ extended binary data types.
4343Introduction
4444------------------------------
4545
46- The Javascript Object Notation (JSON) format, formally known as the ECMA-404
47- or ISO21778:2017 standard, is ubiquitously used in today's web and native
48- applications. JSON presents numerous advantages, such as human readability,
49- generality for accommodating complex hierarchical data, self-documentation, and
50- abundant existing free and commercial libraries. However, its utility is largely
51- restricted to the storage of lightweight textural data, and has very limited presence
52- in many data-intensive and performance-demanding applications, such as database
53- backends, medical imaging, and scientific data storage.
54-
55- The lack of support for strongly-typed and binary data has been one of the main
56- barriers towards widespread adoption of JSON in these domains. In recent years,
57- efforts to address these limitation have resulted in an array of versatile binary
58- JSON formats, such as BSON (Binary JSON, https://bson.org ), UBJSON (Universal Binary
59- JSON, https://ubjson.org ), MessagePack (https://msgpack.org ), CBOR (Concise Binary
60- Object Representation, [ RFC 7049] , https://cbor.io ) etc. These binary JSON
61- counterparts are broadly used in speed-sensitive data processing applications and
62- address various needs from a diverse range of applications.
63-
64- To better champion findable, accessible, interoperable, and reusable
65- ([ FAIR principle] ( https://www.nature.com/articles/sdata201618 ) ) data in
66- scientific data storage and management, we have created the ** NeuroJSON Project**
67- (https://neurojson.org ) to develop a set of open-standards for portable, human-readable
68- and high-performance data annotation and serialization aimed towards enabling
69- neuroimaging researchers, scientific researchers, IT engineers, as well as general
70- data users to efficiently annotate and store complex data structures arising
71- from diverse applications.
72-
73- The NeuroJSON data sharing framework first converts complex data structures, such as N-D
74- arrays, trees, tables and graphs, into easy-to-serialize, portable data annotations
75- via the ** JData Specification** (https://github.com/NeuroJSON/jdata ) and then serializes
76- and stores the annotated JData constructs using widely-supported data formats.
77- To balance data portability, readability and efficiency, NeuroJSON defines a
78- ** dual-interface** : a text-based format ** syntactically compatible with JSON** ,
79- and a binary-JSON format to achieve significantly smaller file sizes and faster
80- encoding/decoding.
81-
82- The Binary JData (BJData) format is the ** official binary interface** for the JData
83- Specification. It is derived from the widely supported UBJSON Specification
84- Draft 12 (https://github.com/ubjson/universal-binary-json ), and adds native
85- support for ** N-dimensional packed arrays** - an essential data structure for
86- scientific applications - as well as extended binary data types, including unsigned
87- integer types and half-precision floating-point numbers. The new data constructs
88- also allow a BJData file to store binary arrays larger than 4 GB in size, which
89- is not currently possible with MessagePack (maximum data record size is limited
90- to 4 GB) and BSON (maximum total file size is 4 GB).
91-
92- A key rationale for basing the BJData format upon UBJSON as opposed to
93- other more popular binary JSON-like formats, such as BSON, CBOR and MessagePack,
94- is UBJSON's ** quasi-human-readability** - a unique characteristic that is
95- absent from almost all other binary formats. This is because all data semantic
96- elements in a UBJSON/BJData file, e.g. the "name" fields and data-type markers,
97- are defined in human-readable strings. The resulting binary files are not only
98- capable of storing complex and hierarchical binary data structures, but also
99- directly readable using an editor with minimal or no processing. We anticipate that
100- such a unique capability, in combination with the highly portable JData annotation
101- keywords, makes a data file self-explanatory, easy to reuse, and easy to
102- inter-operate in complex applications.
46+ The Javascript Object Notation (JSON) format is ubiquitously used in today's
47+ web and native applications. JSON offers numerous advantages, including
48+ simplicity, human- and machine-readability, and versatility, with a large
49+ toolchain ecosystem ranging from numerous parsers to highly efficient
50+ document-store databases. However, its plain-text nature and schema-less
51+ design make it difficult to efficiently store and exchange strongly-typed
52+ binary data, including numerical and multi-dimensional arrays and other
53+ complex data structures generated in scientific research and imaging
54+ applications.
55+
56+ In recent years, efforts to address these limitations have resulted in an
57+ array of binary JSON formats, such as BSON (Binary JSON, https://bson.org ),
58+ UBJSON (Universal Binary JSON, https://ubjson.org ), MessagePack
59+ (https://msgpack.org ), and CBOR (Concise Binary Object Representation,
60+ [ RFC 7049] , https://cbor.io ), among others. These binary JSON counterparts
61+ are broadly adopted in speed-sensitive data applications and vary in terms
62+ of supported data types, flexibility of containers (arrays and objects), and
63+ associated libraries.
64+
65+ The Binary JData (BJData) format is a binary JSON format extended from the
66+ UBJSON Specification Draft 12 (https://github.com/ubjson/universal-binary-json )
67+ by adding native support for ** N-dimensional packed arrays** - an essential data
68+ structure for scientific applications - and ** columnar table format** for packed
69+ objects to offer efficient storage of tables and repeating objects, such as
70+ Python Pandas DataFrames and NumPy arrays-of-objects. These optimized
71+ container constructs greatly reduce storage overhead and redundancy,
72+ enhancing data exchange efficiency.
73+
74+ In addition, BJData also extends binary data types that UBJSON failed to
75+ support, including unsigned integer types, half-precision floating-point
76+ numbers, a native byte type, as well as user-defined binary extensions. With
77+ these extensions, a BJData file can store binary arrays larger than 4 GB in
78+ size, which is not possible with MessagePack (maximum data record size is
79+ limited to 4 GB) or BSON (maximum total file size is 4 GB). Despite these
80+ extensions, the BJData format remains simple to parse.
81+
82+ An outstanding benefit of the BJData format (and its predecessor UBJSON) as
83+ opposed to other more popular binary JSON-like formats, such as BSON, CBOR,
84+ and MessagePack, is its ** quasi-human-readability** - a unique characteristic
85+ that is absent from almost all other binary formats. This is because all
86+ data semantic elements in a BJData file, e.g. the "name" fields and
87+ data-type markers, are defined in human-readable strings. The resulting
88+ binary can be partially readable using an editor with minimal or no
89+ processing. We anticipate that such a unique capability makes a data file
90+ self-explanatory and easy to support without compromising computational and
91+ storage efficiency.
10392
10493
10594License
0 commit comments