Skip to content

Commit 7486ab8

Browse files
authored
[doc] update introduction, before adding Extension type
1 parent c2ac37e commit 7486ab8

File tree

1 file changed

+46
-57
lines changed

1 file changed

+46
-57
lines changed

Binary_JData_Specification.md

Lines changed: 46 additions & 57 deletions
Original file line numberDiff line numberDiff line change
@@ -43,63 +43,52 @@ extended binary data types.
4343
Introduction
4444
------------------------------
4545

46-
The Javascript Object Notation (JSON) format, formally known as the ECMA-404
47-
or ISO21778:2017 standard, is ubiquitously used in today's web and native
48-
applications. JSON presents numerous advantages, such as human readability,
49-
generality for accommodating complex hierarchical data, self-documentation, and
50-
abundant existing free and commercial libraries. However, its utility is largely
51-
restricted to the storage of lightweight textural data, and has very limited presence
52-
in many data-intensive and performance-demanding applications, such as database
53-
backends, medical imaging, and scientific data storage.
54-
55-
The lack of support for strongly-typed and binary data has been one of the main
56-
barriers towards widespread adoption of JSON in these domains. In recent years,
57-
efforts to address these limitation have resulted in an array of versatile binary
58-
JSON formats, such as BSON (Binary JSON, https://bson.org), UBJSON (Universal Binary
59-
JSON, https://ubjson.org), MessagePack (https://msgpack.org), CBOR (Concise Binary
60-
Object Representation, [RFC 7049], https://cbor.io) etc. These binary JSON
61-
counterparts are broadly used in speed-sensitive data processing applications and
62-
address various needs from a diverse range of applications.
63-
64-
To better champion findable, accessible, interoperable, and reusable
65-
([FAIR principle](https://www.nature.com/articles/sdata201618)) data in
66-
scientific data storage and management, we have created the **NeuroJSON Project**
67-
(https://neurojson.org) to develop a set of open-standards for portable, human-readable
68-
and high-performance data annotation and serialization aimed towards enabling
69-
neuroimaging researchers, scientific researchers, IT engineers, as well as general
70-
data users to efficiently annotate and store complex data structures arising
71-
from diverse applications.
72-
73-
The NeuroJSON data sharing framework first converts complex data structures, such as N-D
74-
arrays, trees, tables and graphs, into easy-to-serialize, portable data annotations
75-
via the **JData Specification** (https://github.com/NeuroJSON/jdata) and then serializes
76-
and stores the annotated JData constructs using widely-supported data formats.
77-
To balance data portability, readability and efficiency, NeuroJSON defines a
78-
**dual-interface**: a text-based format **syntactically compatible with JSON**,
79-
and a binary-JSON format to achieve significantly smaller file sizes and faster
80-
encoding/decoding.
81-
82-
The Binary JData (BJData) format is the **official binary interface** for the JData
83-
Specification. It is derived from the widely supported UBJSON Specification
84-
Draft 12 (https://github.com/ubjson/universal-binary-json), and adds native
85-
support for **N-dimensional packed arrays** - an essential data structure for
86-
scientific applications - as well as extended binary data types, including unsigned
87-
integer types and half-precision floating-point numbers. The new data constructs
88-
also allow a BJData file to store binary arrays larger than 4 GB in size, which
89-
is not currently possible with MessagePack (maximum data record size is limited
90-
to 4 GB) and BSON (maximum total file size is 4 GB).
91-
92-
A key rationale for basing the BJData format upon UBJSON as opposed to
93-
other more popular binary JSON-like formats, such as BSON, CBOR and MessagePack,
94-
is UBJSON's **quasi-human-readability** - a unique characteristic that is
95-
absent from almost all other binary formats. This is because all data semantic
96-
elements in a UBJSON/BJData file, e.g. the "name" fields and data-type markers,
97-
are defined in human-readable strings. The resulting binary files are not only
98-
capable of storing complex and hierarchical binary data structures, but also
99-
directly readable using an editor with minimal or no processing. We anticipate that
100-
such a unique capability, in combination with the highly portable JData annotation
101-
keywords, makes a data file self-explanatory, easy to reuse, and easy to
102-
inter-operate in complex applications.
46+
The Javascript Object Notation (JSON) format is ubiquitously used in today's
47+
web and native applications. JSON offers numerous advantages, including
48+
simplicity, human- and machine-readability, and versatility, with a large
49+
toolchain ecosystem ranging from numerous parsers to highly efficient
50+
document-store databases. However, its plain-text nature and schema-less
51+
design make it difficult to efficiently store and exchange strongly-typed
52+
binary data, including numerical and multi-dimensional arrays and other
53+
complex data structures generated in scientific research and imaging
54+
applications.
55+
56+
In recent years, efforts to address these limitations have resulted in an
57+
array of binary JSON formats, such as BSON (Binary JSON, https://bson.org),
58+
UBJSON (Universal Binary JSON, https://ubjson.org), MessagePack
59+
(https://msgpack.org), and CBOR (Concise Binary Object Representation,
60+
[RFC 7049], https://cbor.io), among others. These binary JSON counterparts
61+
are broadly adopted in speed-sensitive data applications and vary in terms
62+
of supported data types, flexibility of containers (arrays and objects), and
63+
associated libraries.
64+
65+
The Binary JData (BJData) format is a binary JSON format extended from the
66+
UBJSON Specification Draft 12 (https://github.com/ubjson/universal-binary-json)
67+
by adding native support for **N-dimensional packed arrays** - an essential data
68+
structure for scientific applications - and **columnar table format** for packed
69+
objects to offer efficient storage of tables and repeating objects, such as
70+
Python Pandas DataFrames and NumPy arrays-of-objects. These optimized
71+
container constructs greatly reduce storage overhead and redundancy,
72+
enhancing data exchange efficiency.
73+
74+
In addition, BJData also extends binary data types that UBJSON failed to
75+
support, including unsigned integer types, half-precision floating-point
76+
numbers, a native byte type, as well as user-defined binary extensions. With
77+
these extensions, a BJData file can store binary arrays larger than 4 GB in
78+
size, which is not possible with MessagePack (maximum data record size is
79+
limited to 4 GB) or BSON (maximum total file size is 4 GB). Despite these
80+
extensions, the BJData format remains simple to parse.
81+
82+
An outstanding benefit of the BJData format (and its predecessor UBJSON) as
83+
opposed to other more popular binary JSON-like formats, such as BSON, CBOR,
84+
and MessagePack, is its **quasi-human-readability** - a unique characteristic
85+
that is absent from almost all other binary formats. This is because all
86+
data semantic elements in a BJData file, e.g. the "name" fields and
87+
data-type markers, are defined in human-readable strings. The resulting
88+
binary can be partially readable using an editor with minimal or no
89+
processing. We anticipate that such a unique capability makes a data file
90+
self-explanatory and easy to support without compromising computational and
91+
storage efficiency.
10392

10493

10594
License

0 commit comments

Comments
 (0)