
Feature request: read headers only #37

@dojeda


I am sorry for filing so many issues at once. What I really wanted was to implement the feature explained in this issue, but while doing so I ran into the problems/discussion points raised in #34, #35 and #36.

The feature I would like to propose is an extension to load_xdf that is significantly faster but only reads the headers.

My use-case is the following: I am importing all my XDF files into another database/filesystem, and I wanted to save the information present in the headers to a database (hence my discussion on #36). The problem is that I have a lot of files, and some of them are big (about 3 GB; probably recordings that were started and then forgotten, though they could also be long recording sessions).

The way I plan to implement this is to use the tag that identifies the chunk type together with the chunk size in bytes (https://github.com/sccn/xdf/wiki/Specifications#chunk). When the tag is not FileHeader (1) or StreamHeader (2), I move the file pointer to the beginning of the next chunk.

I managed to achieve this with the following code:

def load_xdf(filename,
             ...,
             headers_only=False):

           ...

            # read [Tag]
            tag = struct.unpack('<H', f.read(2))[0]
            log_str = ' Read tag: {} at {} bytes, length={}'.format(tag, f.tell(), chunklen)
            if tag in [2, 3, 4, 6]:
                StreamId = struct.unpack('<I', f.read(4))[0]
                log_str += ', StreamId={}'.format(StreamId)

            logger.debug(log_str)
            # ^^^ Keeping this code to show a reference of where the modification goes

            # Quick read of headers only: when the chunk is not a header,
            # move the file pointer to the beginning of the next chunk
            if headers_only and tag not in (1, 2):
                offset = 2   # we already read 2 bytes for the tag
                if tag in (2, 3, 4, 6):
                    # in these cases, we also read 4 bytes for the StreamId
                    offset += 4
                # move chunklen-offset bytes forward, relative to the
                # current position (whence=1)
                f.seek(chunklen - offset, 1)
                continue

With this modification, I can cut the read time of a 3 GB file from about two and a half minutes down to 5 seconds. Considering I have several hundred files (not that big, though), I am saving quite a lot of time.
If you agree with this modification, I would gladly make a PR for it.
