Commit 96a28af

samuelgarcia, zm711 and h-mayorquin authored
buffer description api in neorawio and xarray reference API bridge (#1513)
* Proof of concept of "buffer_description_api" and xarray reference API bridge
* Implement get_analogsignal_chunk() generically when a rawio class has_buffer_description_api=True
  This should also solve the memmap and memmory leak problem.
* wip
* test on micromed
* rebase on buffer_id
* Implement get_analogsignal_chunk() generically when a rawio class has_buffer_description_api=True
  This should also solve the memmap and memmory leak problem.
* wip
* test on micromed
* some fix
* make strema a slice of buffer and xarray api use buffer_id
* json api : winedr + winwcp
* buffer api : RawBinarySignalRawIO + RawMCSRawIO
* json api : neuroscope + openephysraw
* More reader with buffer description
* wip
* json api start hdf5 on maxwell
* doc for signal_stream signal_buffer
* Merci Zach
  Co-authored-by: Zach McKenzie <[email protected]>
* Use class approach for buffer api : BaseRawWithBufferApiIO
* feedback
* Apply suggestions from code review
  Co-authored-by: Heberto Mayorquin <[email protected]>
  Co-authored-by: Zach McKenzie <[email protected]>
* clean
* oups
* more clean

---------

Co-authored-by: Zach McKenzie <[email protected]>
Co-authored-by: Heberto Mayorquin <[email protected]>
1 parent aa0c7fe commit 96a28af

22 files changed, +861 -368 lines changed

doc/source/rawio.rst

Lines changed: 26 additions & 0 deletions
@@ -281,6 +281,32 @@ Read event timestamps and times
281281
In [42]: print(ev_times)
282282
[ 0.0317]
283283

284+
Signal streams and signal buffers
285+
---------------------------------
286+
287+
For reading analog signals **neo.rawio** has 2 important concepts:
288+
289+
1. The **signal_stream**: a group of channels that can be read together using :func:`get_analogsignal_chunk()`.
290+
This group of channels is guaranteed to have the same sampling rate, and the same duration per segment.
291+
Most of the time, this is a "logical" group of channels: in short, they come from the same headstage
292+
or from the same auxiliary board.
293+
Optionally, depending on the format, a **signal_stream** can be a slice of or an entire **signal_buffer**.
294+
295+
2. The **signal_buffer**: a group of channels that share the same data layout in a file. The simplest example
296+
is a set of channels that can be read with a single :func:`signals = np.memmap(file, shape=..., dtype=... , offset=...)`.
297+
A **signal_buffer** can contain one or several **signal_stream**'s (very often it is only one).
298+
There are two kinds of formats that handle this concept:
299+
300+
* Formats which use :func:`np.memmap()` internally
301+
* Formats based on hdf5
302+
303+
There are many formats that do not handle this concept:
304+
305+
* the ones that use an external python package for reading data (edf, ced, plexon2, ...)
306+
* the ones with a complicated data layout (e.g. those where the data blocks are split without structure)
307+
308+
To check if a format makes use of the buffer api you can call the `has_buffer_description_api()` method of the
309+
rawio class.
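The relation between the two concepts can be illustrated with plain NumPy. The following is a minimal sketch, not neo code: the file name, header size, channel counts and stream names are all made up for the illustration. It writes a small binary file, maps the whole signal block as one **signal_buffer** with `np.memmap`, and then treats two channel slices of that buffer as two **signal_stream**'s.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: a 16-byte header followed by int16 samples,
# stored sample by sample (C order) for 6 channels.
header_size = 16
num_samples, num_channels = 100, 6
data = np.arange(num_samples * num_channels, dtype="int16").reshape(num_samples, num_channels)

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "toy_recording.bin")
    with open(file_path, "wb") as f:
        f.write(b"\x00" * header_size)
        f.write(data.tobytes())

    # The signal_buffer: everything that a single np.memmap call can describe.
    signal_buffer = np.memmap(
        file_path, dtype="int16", mode="r", offset=header_size, shape=(num_samples, num_channels)
    )

    # Two signal_streams defined as channel slices of the same buffer
    # (the names "headstage" and "aux" are illustrative only).
    stream_slices = {"headstage": slice(0, 4), "aux": slice(4, 6)}
    headstage_chunk = np.array(signal_buffer[10:20, stream_slices["headstage"]])
    aux_chunk = np.array(signal_buffer[10:20, stream_slices["aux"]])

    # Release the memmap before the temporary folder is removed.
    del signal_buffer

print(headstage_chunk.shape, aux_chunk.shape)
```

Both chunks come from the same mapped region; only the column selection differs, which is what "a stream can be a slice of a buffer" means in practice.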
284310

285311

286312

neo/rawio/axonrawio.py

Lines changed: 28 additions & 25 deletions
@@ -53,7 +53,7 @@
5353
import numpy as np
5454

5555
from .baserawio import (
56-
BaseRawIO,
56+
BaseRawWithBufferApiIO,
5757
_signal_channel_dtype,
5858
_signal_stream_dtype,
5959
_signal_buffer_dtype,
@@ -63,7 +63,7 @@
6363
from neo.core import NeoReadWriteError
6464

6565

66-
class AxonRawIO(BaseRawIO):
66+
class AxonRawIO(BaseRawWithBufferApiIO):
6767
"""
6868
Class for reading data from pCLAMP and AxoScope files (.abf version 1 and 2)
6969
@@ -92,7 +92,7 @@ class AxonRawIO(BaseRawIO):
9292
rawmode = "one-file"
9393

9494
def __init__(self, filename=""):
95-
BaseRawIO.__init__(self)
95+
BaseRawWithBufferApiIO.__init__(self)
9696
self.filename = filename
9797

9898
def _parse_header(self):
@@ -115,8 +115,6 @@ def _parse_header(self):
115115
head_offset = info["sections"]["DataSection"]["uBlockIndex"] * BLOCKSIZE
116116
totalsize = info["sections"]["DataSection"]["llNumEntries"]
117117

118-
self._raw_data = np.memmap(self.filename, dtype=sig_dtype, mode="r", shape=(totalsize,), offset=head_offset)
119-
120118
# 3 possible modes
121119
if version < 2.0:
122120
mode = info["nOperationMode"]
@@ -142,7 +140,7 @@ def _parse_header(self):
142140
)
143141
else:
144142
episode_array = np.empty(1, [("offset", "i4"), ("len", "i4")])
145-
episode_array[0]["len"] = self._raw_data.size
143+
episode_array[0]["len"] = totalsize
146144
episode_array[0]["offset"] = 0
147145

148146
# sampling_rate
@@ -154,9 +152,14 @@ def _parse_header(self):
154152
# one sweep = one segment
155153
nb_segment = episode_array.size
156154

155+
stream_id = "0"
156+
buffer_id = "0"
157+
157158
# Get raw data by segment
158-
self._raw_signals = {}
159+
# self._raw_signals = {}
159160
self._t_starts = {}
161+
self._buffer_descriptions = {0 :{}}
162+
self._stream_buffer_slice = {stream_id : None}
160163
pos = 0
161164
for seg_index in range(nb_segment):
162165
length = episode_array[seg_index]["len"]
@@ -169,7 +172,15 @@ def _parse_header(self):
169172
if (fSynchTimeUnit != 0) and (mode == 1):
170173
length /= fSynchTimeUnit
171174

172-
self._raw_signals[seg_index] = self._raw_data[pos : pos + length].reshape(-1, nbchannel)
175+
self._buffer_descriptions[0][seg_index] = {}
176+
self._buffer_descriptions[0][seg_index][buffer_id] = {
177+
"type" : "raw",
178+
"file_path" : str(self.filename),
179+
"dtype" : str(sig_dtype),
180+
"order": "C",
181+
"file_offset" : head_offset + pos * sig_dtype.itemsize,
182+
"shape" : (int(length // nbchannel), int(nbchannel)),
183+
}
173184
pos += length
174185

175186
t_start = float(episode_array[seg_index]["offset"])
@@ -227,17 +238,14 @@ def _parse_header(self):
227238
offset -= info["listADCInfo"][chan_id]["fSignalOffset"]
228239
else:
229240
gain, offset = 1.0, 0.0
230-
stream_id = "0"
231-
buffer_id = "0"
232-
signal_channels.append(
233-
(name, str(chan_id), self._sampling_rate, sig_dtype, units, gain, offset, stream_id, buffer_id)
234-
)
241+
242+
signal_channels.append((name, str(chan_id), self._sampling_rate, sig_dtype, units, gain, offset, stream_id, buffer_id))
235243

236244
signal_channels = np.array(signal_channels, dtype=_signal_channel_dtype)
237245

238246
# one unique signal stream and buffer
239-
signal_buffers = np.array([("Signals", "0")], dtype=_signal_buffer_dtype)
240-
signal_streams = np.array([("Signals", "0", "0")], dtype=_signal_stream_dtype)
247+
signal_buffers = np.array([("Signals", buffer_id)], dtype=_signal_buffer_dtype)
248+
signal_streams = np.array([("Signals", stream_id, buffer_id)], dtype=_signal_stream_dtype)
241249

242250
# only one events channel : tag
243251
# In ABF timestamps are not attached to any particular segment
@@ -295,21 +303,16 @@ def _segment_t_start(self, block_index, seg_index):
295303
return self._t_starts[seg_index]
296304

297305
def _segment_t_stop(self, block_index, seg_index):
298-
t_stop = self._t_starts[seg_index] + self._raw_signals[seg_index].shape[0] / self._sampling_rate
306+
sig_size = self.get_signal_size(block_index, seg_index, 0)
307+
t_stop = self._t_starts[seg_index] + sig_size / self._sampling_rate
299308
return t_stop
300309

301-
def _get_signal_size(self, block_index, seg_index, stream_index):
302-
shape = self._raw_signals[seg_index].shape
303-
return shape[0]
304-
305310
def _get_signal_t_start(self, block_index, seg_index, stream_index):
306311
return self._t_starts[seg_index]
307312

308-
def _get_analogsignal_chunk(self, block_index, seg_index, i_start, i_stop, stream_index, channel_indexes):
309-
if channel_indexes is None:
310-
channel_indexes = slice(None)
311-
raw_signals = self._raw_signals[seg_index][slice(i_start, i_stop), channel_indexes]
312-
return raw_signals
313+
def _get_analogsignal_buffer_description(self, block_index, seg_index, buffer_id):
314+
return self._buffer_descriptions[block_index][seg_index][buffer_id]
315+
313316

314317
def _event_count(self, block_index, seg_index, event_channel_index):
315318
return self._raw_ev_timestamps.size
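The per-segment bookkeeping introduced in `_parse_header` above boils down to converting a running sample position into a byte offset. The sketch below restates that arithmetic outside of the reader, under the assumption that segments are stored back to back; the function name `build_buffer_descriptions` and the example numbers are hypothetical and not part of neo.

```python
import json


def build_buffer_descriptions(file_path, head_offset, episode_lengths, num_channels, dtype_name, itemsize):
    """Return {seg_index: {buffer_id: description}} for segments stored back to back.

    episode_lengths counts individual values (samples * channels), as in the diff above.
    """
    buffer_id = "0"
    descriptions = {}
    pos = 0  # running position, in values, from the start of the data section
    for seg_index, length in enumerate(episode_lengths):
        descriptions[seg_index] = {
            buffer_id: {
                "type": "raw",
                "file_path": str(file_path),
                "dtype": dtype_name,
                "order": "C",
                # byte offset = start of data section + values already consumed * bytes per value
                "file_offset": head_offset + pos * itemsize,
                "shape": (length // num_channels, num_channels),
            }
        }
        pos += length
    return descriptions


# Illustrative numbers only: 3 sweeps of a 2-channel int16 recording.
descriptions = build_buffer_descriptions(
    "recording.abf", head_offset=512, episode_lengths=[4000, 4000, 2000],
    num_channels=2, dtype_name="int16", itemsize=2,
)
for seg_index, per_buffer in descriptions.items():
    desc = per_buffer["0"]
    print(seg_index, desc["file_offset"], desc["shape"])

# The description holds only plain types, so it can be serialized.
serialized = json.dumps(descriptions[0]["0"])
```

Because each description is a plain dictionary of strings, integers and tuples, it can be handed to another tool without opening the file, which appears to be the motivation for the JSON and xarray bridge mentioned in the commit message.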

neo/rawio/baserawio.py

Lines changed: 157 additions & 1 deletion
@@ -77,6 +77,8 @@
7777

7878
from neo import logging_handler
7979

80+
from .utils import get_memmap_chunk_from_opened_file
81+
8082

8183
possible_raw_modes = [
8284
"one-file",
@@ -182,6 +184,15 @@ def __init__(self, use_cache: bool = False, cache_path: str = "same_as_resource"
182184
self.header = None
183185
self.is_header_parsed = False
184186

187+
self._has_buffer_description_api = False
188+
189+
def has_buffer_description_api(self) -> bool:
190+
"""
191+
Return whether the reader handles the buffer API.
192+
If True, the reader internally supports `get_analogsignal_buffer_description()`.
193+
"""
194+
return self._has_buffer_description_api
195+
185196
def parse_header(self):
186197
"""
187198
Parses the header of the file(s) to allow for faster computations
@@ -191,6 +202,7 @@ def parse_header(self):
191202
# this must create
192203
# self.header['nb_block']
193204
# self.header['nb_segment']
205+
# self.header['signal_buffers']
194206
# self.header['signal_streams']
195207
# self.header['signal_channels']
196208
# self.header['spike_channels']
@@ -663,6 +675,7 @@ def get_signal_size(self, block_index: int, seg_index: int, stream_index: int |
663675
664676
"""
665677
stream_index = self._get_stream_index_from_arg(stream_index)
678+
666679
return self._get_signal_size(block_index, seg_index, stream_index)
667680

668681
def get_signal_t_start(self, block_index: int, seg_index: int, stream_index: int | None = None):
@@ -1311,7 +1324,6 @@ def _get_analogsignal_chunk(
13111324
-------
13121325
array of samples, with each requested channel in a column
13131326
"""
1314-
13151327
raise (NotImplementedError)
13161328

13171329
###
@@ -1350,6 +1362,150 @@ def _rescale_event_timestamp(self, event_timestamps: np.ndarray, dtype: np.dtype
13501362
def _rescale_epoch_duration(self, raw_duration: np.ndarray, dtype: np.dtype):
13511363
raise (NotImplementedError)
13521364

1365+
###
1366+
# buffer api zone
1367+
# must be implemented if has_buffer_description_api=True
1368+
def get_analogsignal_buffer_description(self, block_index: int = 0, seg_index: int = 0, buffer_id: str = None):
1369+
if not self.has_buffer_description_api():
1370+
raise ValueError("This reader does not support the buffer_description API")
1371+
descr = self._get_analogsignal_buffer_description(block_index, seg_index, buffer_id)
1372+
return descr
1373+
1374+
def _get_analogsignal_buffer_description(self, block_index, seg_index, buffer_id):
1375+
raise (NotImplementedError)
1376+
1377+
1378+
1379+
class BaseRawWithBufferApiIO(BaseRawIO):
1380+
"""
1381+
Generic class for readers that support the "buffer api".
1382+
1383+
In short, readers that are internally based on:
1384+
1385+
* np.memmap
1386+
* hdf5
1387+
1388+
In these cases _get_signal_size and _get_analogsignal_chunk are totally generic and do not need to be implemented in the class.
1389+
1390+
Subclasses of this class must fill these two dicts:
1391+
* self._buffer_descriptions[block_index][seg_index][buffer_id] = buffer_description
1392+
* self._stream_buffer_slice[stream_id] = None, a slice, or channel indices
1393+
1394+
"""
1395+
1396+
def __init__(self, *arg, **kwargs):
1397+
super().__init__(*arg, **kwargs)
1398+
self._has_buffer_description_api = True
1399+
1400+
def _get_signal_size(self, block_index, seg_index, stream_index):
1401+
buffer_id = self.header["signal_streams"][stream_index]["buffer_id"]
1402+
buffer_desc = self.get_analogsignal_buffer_description(block_index, seg_index, buffer_id)
1403+
# some hdf5 files transpose the buffer (time is not always on axis 0)
1404+
time_axis = buffer_desc.get("time_axis", 0)
1405+
return buffer_desc['shape'][time_axis]
1406+
1407+
def _get_analogsignal_chunk(
1408+
self,
1409+
block_index: int,
1410+
seg_index: int,
1411+
i_start: int | None,
1412+
i_stop: int | None,
1413+
stream_index: int,
1414+
channel_indexes: list[int] | None,
1415+
):
1416+
1417+
stream_id = self.header["signal_streams"][stream_index]["id"]
1418+
buffer_id = self.header["signal_streams"][stream_index]["buffer_id"]
1419+
1420+
buffer_slice = self._stream_buffer_slice[stream_id]
1421+
1422+
1423+
buffer_desc = self.get_analogsignal_buffer_description(block_index, seg_index, buffer_id)
1424+
1425+
i_start = i_start or 0
1426+
i_stop = i_stop or buffer_desc['shape'][buffer_desc.get("time_axis", 0)]
1427+
1428+
if buffer_desc['type'] == "raw":
1429+
1430+
# open files on demand and keep reference to opened file
1431+
if not hasattr(self, '_memmap_analogsignal_buffers'):
1432+
self._memmap_analogsignal_buffers = {}
1433+
if block_index not in self._memmap_analogsignal_buffers:
1434+
self._memmap_analogsignal_buffers[block_index] = {}
1435+
if seg_index not in self._memmap_analogsignal_buffers[block_index]:
1436+
self._memmap_analogsignal_buffers[block_index][seg_index] = {}
1437+
if buffer_id not in self._memmap_analogsignal_buffers[block_index][seg_index]:
1438+
fid = open(buffer_desc['file_path'], mode='rb')
1439+
self._memmap_analogsignal_buffers[block_index][seg_index][buffer_id] = fid
1440+
else:
1441+
fid = self._memmap_analogsignal_buffers[block_index][seg_index][buffer_id]
1442+
1443+
num_channels = buffer_desc['shape'][1]
1444+
1445+
raw_sigs = get_memmap_chunk_from_opened_file(fid, num_channels, i_start, i_stop, np.dtype(buffer_desc['dtype']), file_offset=buffer_desc['file_offset'])
1446+
1447+
1448+
elif buffer_desc['type'] == 'hdf5':
1449+
1450+
# open files on demand and keep reference to opened file
1451+
if not hasattr(self, '_hdf5_analogsignal_buffers'):
1452+
self._hdf5_analogsignal_buffers = {}
1453+
if block_index not in self._hdf5_analogsignal_buffers:
1454+
self._hdf5_analogsignal_buffers[block_index] = {}
1455+
if seg_index not in self._hdf5_analogsignal_buffers[block_index]:
1456+
self._hdf5_analogsignal_buffers[block_index][seg_index] = {}
1457+
if buffer_id not in self._hdf5_analogsignal_buffers[block_index][seg_index]:
1458+
import h5py
1459+
h5file = h5py.File(buffer_desc['file_path'], mode="r")
1460+
self._hdf5_analogsignal_buffers[block_index][seg_index][buffer_id] = h5file
1461+
else:
1462+
h5file = self._hdf5_analogsignal_buffers[block_index][seg_index][buffer_id]
1463+
1464+
hdf5_path = buffer_desc["hdf5_path"]
1465+
full_raw_sigs = h5file[hdf5_path]
1466+
1467+
time_axis = buffer_desc.get("time_axis", 0)
1468+
if time_axis == 0:
1469+
raw_sigs = full_raw_sigs[i_start:i_stop, :]
1470+
elif time_axis == 1:
1471+
raw_sigs = full_raw_sigs[:, i_start:i_stop].T
1472+
else:
1473+
raise RuntimeError("Should never happen")
1474+
1475+
# Do not apply buffer_slice here: it is applied once below, after the
1476+
# raw/hdf5 branches (slicing twice would select the wrong channels).
1477+
1478+
1479+
1480+
else:
1481+
raise NotImplementedError()
1482+
1483+
# this is a pre-slicing step for when the stream does not contain all channels (for instance spikeglx when load_sync_channel=False)
1484+
if buffer_slice is not None:
1485+
raw_sigs = raw_sigs[:, buffer_slice]
1486+
1487+
# channel slice requested
1488+
if channel_indexes is not None:
1489+
raw_sigs = raw_sigs[:, channel_indexes]
1490+
1491+
1492+
return raw_sigs
1493+
1494+
def __del__(self):
1495+
if hasattr(self, '_memmap_analogsignal_buffers'):
1496+
for block_index in self._memmap_analogsignal_buffers.keys():
1497+
for seg_index in self._memmap_analogsignal_buffers[block_index].keys():
1498+
for buffer_id, fid in self._memmap_analogsignal_buffers[block_index][seg_index].items():
1499+
fid.close()
1500+
del self._memmap_analogsignal_buffers
1501+
1502+
if hasattr(self, '_hdf5_analogsignal_buffers'):
1503+
for block_index in self._hdf5_analogsignal_buffers.keys():
1504+
for seg_index in self._hdf5_analogsignal_buffers[block_index].keys():
1505+
for buffer_id, h5_file in self._hdf5_analogsignal_buffers[block_index][seg_index].items():
1506+
h5_file.close()
1507+
del self._hdf5_analogsignal_buffers
1508+
13531509

13541510
def pprint_vector(vector, lim: int = 8):
13551511
vector = np.asarray(vector)
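To make the flow of the generic `_get_analogsignal_chunk` easier to follow, here is a standalone sketch of how a "raw" buffer description can be turned into a chunk of samples. It is not the neo implementation: it uses `np.memmap` directly instead of `get_memmap_chunk_from_opened_file`, it does not cache file handles, and the function name `read_raw_chunk` is invented for this example. The order of operations (time slice, then stream slice, then requested channels) follows the diff above.

```python
import os
import tempfile

import numpy as np


def read_raw_chunk(buffer_desc, i_start=None, i_stop=None, buffer_slice=None, channel_indexes=None):
    """Read samples [i_start, i_stop) from a "raw" buffer description (sketch only)."""
    if buffer_desc["type"] != "raw":
        raise NotImplementedError("this sketch only handles 'raw' buffers")
    num_samples, num_channels = buffer_desc["shape"]
    i_start = 0 if i_start is None else i_start
    i_stop = num_samples if i_stop is None else i_stop

    full = np.memmap(
        buffer_desc["file_path"],
        dtype=np.dtype(buffer_desc["dtype"]),
        mode="r",
        offset=buffer_desc["file_offset"],
        shape=(num_samples, num_channels),
        order=buffer_desc.get("order", "C"),
    )
    raw_sigs = full[i_start:i_stop, :]
    # 1) restrict the buffer to the channels that belong to the stream
    if buffer_slice is not None:
        raw_sigs = raw_sigs[:, buffer_slice]
    # 2) then apply the channel selection requested by the caller
    if channel_indexes is not None:
        raw_sigs = raw_sigs[:, channel_indexes]
    # Copy so the result stays valid once the memmap is released.
    return np.array(raw_sigs)


with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "toy.bin")
    data = np.arange(50 * 4, dtype="int16").reshape(50, 4)
    with open(file_path, "wb") as f:
        f.write(b"\x00" * 8 + data.tobytes())  # 8-byte fake header

    desc = {
        "type": "raw", "file_path": file_path, "dtype": "int16",
        "order": "C", "file_offset": 8, "shape": (50, 4),
    }
    # The stream covers buffer channels 1..3; the caller asks for stream channels 0 and 2.
    chunk = read_raw_chunk(desc, i_start=5, i_stop=15, buffer_slice=slice(1, 4), channel_indexes=[0, 2])

print(chunk.shape)
```

Note that `channel_indexes` is relative to the stream, not to the buffer: in the usage above, stream channels 0 and 2 correspond to buffer channels 1 and 3.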

neo/rawio/bci2000rawio.py

Lines changed: 8 additions & 5 deletions
@@ -1,6 +1,9 @@
11
"""
22
BCI2000RawIO is a class to read BCI2000 .dat files.
33
https://www.bci2000.org/mediawiki/index.php/Technical_Reference:BCI2000_File_Format
4+
5+
Note: BCI2000RawIO cannot be implemented using has_buffer_description_api because the buffer
6+
of signals is not compact (it has some interleaved state uint in between channels).
47
"""
58

69
import numpy as np
@@ -50,9 +53,11 @@ def _parse_header(self):
5053
self.header["nb_block"] = 1
5154
self.header["nb_segment"] = [1]
5255

53-
# one unique stream and buffer
54-
signal_buffers = np.array(("Signals", "0"), dtype=_signal_buffer_dtype)
55-
signal_streams = np.array([("Signals", "0", "0")], dtype=_signal_stream_dtype)
56+
# one unique stream but no buffer because channels are not compact
57+
stream_id = "0"
58+
buffer_id = ""
59+
signal_buffers = np.array([], dtype=_signal_buffer_dtype)
60+
signal_streams = np.array([("Signals", stream_id, buffer_id)], dtype=_signal_stream_dtype)
5661
self.header["signal_buffers"] = signal_buffers
5762
self.header["signal_streams"] = signal_streams
5863

@@ -80,8 +85,6 @@ def _parse_header(self):
8085
if isinstance(offset, str):
8186
offset = float(offset)
8287

83-
stream_id = "0"
84-
buffer_id = "0"
8588
sig_channels.append((ch_name, chan_id, sr, dtype, units, gain, offset, stream_id, buffer_id))
8689
self.header["signal_channels"] = np.array(sig_channels, dtype=_signal_channel_dtype)
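The "not compact" remark in the module docstring can be made concrete with a NumPy structured dtype. The sketch below assumes, purely for illustration, that every time sample stores its channel values followed by a couple of state bytes; the sizes and field names are invented and do not come from the BCI2000 specification. It shows that the signal part is then a strided view that a plain (dtype, shape, offset) description cannot express.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
num_channels, state_bytes, num_samples = 3, 2, 5

# One record per time sample: channel values followed by state bytes.
record_dtype = np.dtype([
    ("signals", "int16", (num_channels,)),
    ("state", "uint8", (state_bytes,)),
])

records = np.zeros(num_samples, dtype=record_dtype)
records["signals"] = np.arange(num_samples * num_channels, dtype="int16").reshape(num_samples, num_channels)

signals = records["signals"]
compact_row_bytes = num_channels * np.dtype("int16").itemsize

print(signals.shape)                  # 2D, like any other signal block
print(signals.strides[0], compact_row_bytes)
print(signals.flags["C_CONTIGUOUS"])
```

The row stride is larger than the size of one row of signal values because the state bytes sit between consecutive samples, so the signals cannot be described as one contiguous 2D block starting at a file offset.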
8790
