- Copyright: (C) Qianqian Fang (2019-2024) <q.fang at neu.edu>
- License: Apache License, Version 2.0
- Version: 0.6.0
- URL: https://github.com/NeuroJSON/pyjdata
- Acknowledgement: This project is supported by the US National Institute of Health (NIH)
  grant [U24-NS124027](https://reporter.nih.gov/project-details/10308329)

![Build Status](https://github.com/NeuroJSON/pyjdata/actions/workflows/run_test.yml/badge.svg)

The [JData Specification](https://github.com/NeuroJSON/jdata/) defines a lightweight
language-independent data annotation interface enabling easy storing
and sharing of complex data structures across different programming
languages such as MATLAB, JavaScript, Python, etc. Using JData formats, a
complex Python data structure, including numpy objects, can be encoded
as a simple `dict` object that is easily serialized as a JSON/binary JSON
file and shared between programs written in different languages.

Since 2021, the development of the PyJData module and the underlying data format specifications
[JData](https://neurojson.org/jdata/draft3) and [BJData](https://neurojson.org/bjdata/draft2)
has been funded by the US National Institute of Health (NIH) as
part of the NeuroJSON project (https://neurojson.org and https://neurojson.io).

The goal of the NeuroJSON project is to develop scalable, searchable, and
reusable neuroimaging data formats and data sharing platforms. All data
produced from the NeuroJSON project will use JSON/binary JData formats as the
underlying serialization standards and the lightweight JData specification as the
language-independent data annotation standard.

## How to install

```
newdata=jd.load('test.json')
newdata
```

One can use `loadt` or `savet` to read/write JSON-based data files and `loadb` and `saveb` to
read/write binary-JSON based data files. By default, JData annotations are automatically decoded
after loading and encoded before saving. One can set `{'encode': False}` in the save functions
or `{'decode': False}` in the load functions as the `opt` to disable further processing of JData
annotations. We also provide `loadts` and `loadbs` for parsing a string buffer containing a
text-based JSON or binary JSON stream, respectively.

PyJData supports multiple N-D array data compression/decompression methods (i.e. codecs), similar
to HDF5 filters. Currently supported codecs include `zlib`, `gzip`, `lz4`, `lzma`, `base64` and various
`blosc2` compression methods, including `blosc2blosclz`, `blosc2lz4`, `blosc2lz4hc`, `blosc2zlib`,
…
decompress the data based on the `_ArrayZipType_` annotation present in the data. All `blosc2`
compression methods support multi-threading. To set the thread number, one should define an `nthread`
value in the option (`opt`) for both encoding and decoding.

## Reading JSON via REST-API

If a REST-API (URL) is given as the first input of `load`, it reads the JSON data directly
from the URL and parses the content into native Python data structures. To avoid repetitive
downloads, `load` automatically caches the downloaded file so that future calls directly load the
locally cached file. If one prefers to always load from the URL without a local cache, one should
use `loadurl()` instead. Here is an example

```
import jdata as jd
data = jd.load('https://neurojson.io:7777/openneuro/ds000001')
data.keys()
```

## Using JSONPath to access and query complex datasets

Starting from v0.6.0, PyJData provides a lightweight implementation of [JSONPath](https://goessner.net/articles/JsonPath/),
a widely used format for querying and accessing a hierarchical dict/list structure, such as those
parsed by `load` or `loadurl`. Here is an example

```
import jdata as jd

data = jd.loadurl('https://raw.githubusercontent.com/fangq/jsonlab/master/examples/example1.json')
jd.jsonpath(data, '$.age')
jd.jsonpath(data, '$.address.city')
jd.jsonpath(data, '$.phoneNumber')
jd.jsonpath(data, '$.phoneNumber[0]')
jd.jsonpath(data, '$.phoneNumber[0].type')
jd.jsonpath(data, '$.phoneNumber[-1]')
jd.jsonpath(data, '$.phoneNumber..number')
jd.jsonpath(data, '$[phoneNumber][type]')
jd.jsonpath(data, '$[phoneNumber][type][1]')
```

The `jd.jsonpath` function does not support all JSONPath features. If more complex JSONPath
queries are needed, one should install `jsonpath_ng` or another more advanced JSONPath package.
Here is an example using `jsonpath_ng`

```
import jdata as jd
from jsonpath_ng.ext import parse

data = jd.loadurl('https://raw.githubusercontent.com/fangq/jsonlab/master/examples/example1.json')

val = [match.value for match in parse('$.address.city').find(data)]
val = [match.value for match in parse('$.phoneNumber').find(data)]
```

## Downloading and caching `_DataLink_` referenced external data files

Similar to [JSONLab](https://github.com/fangq/jsonlab?tab=readme-ov-file#jsoncachem),
PyJData provides external data file downloading/caching capabilities.

The `_DataLink_` annotation in the JData specification permits linking of external data files
in a JSON file. To make downloading/parsing externally linked data files efficient, such as when
processing large neuroimaging datasets hosted on http://neurojson.io, we have developed a system
to download files on-demand and cache them locally. The cache manager (the counterpart of JSONLab's
`jsoncache.m`) is responsible for searching the local cache folders: if the requested file is found,
it returns the path to the local cache; if not found, it returns a SHA-256 hash of the URL as the
file name, along with the possible cache folders.

When loading a file from a URL, below are the cache file search paths, ranked in search order
```
global-variable NEUROJSON_CACHE | if defined, this path will be searched first
[pwd '/.neurojson']             | on all OSes
/home/USERNAME/.neurojson       | on all OSes (per-user)
/home/USERNAME/.cache/neurojson | if on Linux (per-user)
/var/cache/neurojson            | if on Linux (system wide)
/home/USERNAME/Library/neurojson| if on MacOS (per-user)
/Library/neurojson              | if on MacOS (system wide)
C:\ProgramData\neurojson        | if on Windows (system wide)
```
When saving a file from a URL, subfolders can be created under the root cache folder.
If the URL is one of the standard NeuroJSON.io URLs, as below
```
https://neurojson.org/io/stat.cgi?action=get&db=DBNAME&doc=DOCNAME&file=sub-01/anat/datafile.nii.gz
https://neurojson.io:7777/DBNAME/DOCNAME
https://neurojson.io:7777/DBNAME/DOCNAME/datafile.suffix
```
the file `datafile.nii.gz` will be downloaded to the `/home/USERNAME/.neurojson/io/DBNAME/DOCNAME/sub-01/anat/` folder.
If a URL does not follow the neurojson.io format, the cache folder has the following form
```
CACHEFOLDER{i}/domainname.com/XX/YY/XXYYZZZZ...
```
where `XXYYZZZZ...` is the SHA-256 hash of the full URL, `XX` is its first two hex digits, and `YY` is the 3rd-4th hex digits.

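The hash-based layout can be sketched as follows (an illustrative helper, not part of the PyJData API; the cache-folder list and domain handling are simplified):

```
import hashlib
from pathlib import Path

def cache_path(url, cachefolder='~/.neurojson'):
    # hypothetical helper: map a non-neurojson.io URL to its cache location
    # CACHEFOLDER/domainname.com/XX/YY/XXYYZZZZ...
    domain = url.split('://', 1)[-1].split('/', 1)[0]
    h = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return Path(cachefolder).expanduser() / domain / h[:2] / h[2:4] / h

print(cache_path('https://example.com/data/file.nii.gz'))
```
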
In PyJData, we provide the `jdata.jdlink()` function to dynamically download and locally cache
externally linked data files. `jdata.jdlink()` only parses files with JSON/binary JSON suffixes that
`load` supports. Here is an example

```
import jdata as jd

data = jd.load('https://neurojson.io:7777/openneuro/ds000001')
extlinks = jd.jsonpath(data, '$..anat.._DataLink_')  # deep-scan all anatomical folders and find all linked NIfTI files
jd.jdlink(extlinks, {'regex': 'sub-0[12]_.*nii'})    # download only the nii files for sub-01 and sub-02
jd.jdlink(extlinks)                                  # download all links
```

## Utility
