Description
I am sorry for filing so many issues at once. What I really wanted was to implement the feature described in this issue, but while doing so I ran into the problems/discussion points raised in #34, #35 and #36.
The feature I would like to propose is an extension to load_xdf that is significantly faster but only reads the headers.
My use-case is the following: I am reading and importing all my XDF files into another database/filesystem, and I wanted to save the information present in the headers in a database (hence my discussion in #36). The problem is that I have a lot of files, and some of them are big (about 3 GB, probably recordings that were started and then forgotten, but it could be a long recording session).
The way I plan to implement this is to use the tag that identifies the chunk type and the chunk size in bytes (https://github.com/sccn/xdf/wiki/Specifications#chunk). When the tag is not FileHeader (1) or StreamHeader (2), I move the file pointer to the beginning of the next chunk.
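For reference, my reading of the spec is that every chunk starts with a variable-length size field (one byte giving the number of length bytes, 1/4/8, followed by the length itself as a little-endian unsigned integer), and that length covers the 2-byte tag plus the chunk content, which is what makes the seek possible. A minimal sketch of that framing (the helper name `read_chunk_length` is mine, not from pyxdf):

```python
import struct

def read_chunk_length(f):
    """Read the variable-length chunk size that precedes every XDF chunk.

    The first byte says how many bytes the length occupies (1, 4, or 8);
    the length itself is a little-endian unsigned integer and counts the
    2-byte tag plus the chunk content.
    """
    num_bytes = struct.unpack('B', f.read(1))[0]
    if num_bytes == 1:
        return struct.unpack('B', f.read(1))[0]
    elif num_bytes == 4:
        return struct.unpack('<I', f.read(4))[0]
    elif num_bytes == 8:
        return struct.unpack('<Q', f.read(8))[0]
    raise RuntimeError('Invalid chunk length field')
```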
I managed to achieve this with the following code:
```python
def load_xdf(filename,
             ...,
             headers_only=False):
    ...
    # read [Tag]
    tag = struct.unpack('<H', f.read(2))[0]
    log_str = ' Read tag: {} at {} bytes, length={}'.format(tag, f.tell(), chunklen)
    if tag in [2, 3, 4, 6]:
        StreamId = struct.unpack('<I', f.read(4))[0]
        log_str += ', StreamId={}'.format(StreamId)
    logger.debug(log_str)
    # ^^^ Keeping this code to show a reference of where the modification goes

    # Quick read of headers only: when the chunk is not a header, move
    # the file pointer to the beginning of the next chunk
    if headers_only and tag not in (1, 2):
        offset = 2  # We already read 2 bytes for the tag
        if tag in (2, 3, 4, 6):
            # In these cases, we already read 4 more bytes for the StreamId
            offset += 4
        # Move chunklen - offset bytes forward, relative to the current position (whence=1)
        f.seek(chunklen - offset, 1)
        continue
```

With this modification, I can cut down the read time of a 3 GB file from 2½ minutes to 5 seconds. Considering I have several hundred files (not that big, though), I am saving quite a lot of time.
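For context, this is roughly how I would call it when harvesting metadata. This is just a sketch assuming the return structure stays the same as the existing `load_xdf` (a list of stream dicts plus the file header), and that with `headers_only=True` the stream dicts simply carry no time series:

```python
import pyxdf

# Proposed keyword: skip all non-header chunks, only parse FileHeader/StreamHeader
streams, fileheader = pyxdf.load_xdf('recording.xdf', headers_only=True)

for stream in streams:
    info = stream['info']
    # Only header metadata is populated here; no samples are read
    print(info['name'][0], info['type'][0], info['nominal_srate'][0])
```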
If you guys agree with this modification, I would gladly make a PR for it.