
Saving of ChannelSliceRecordings inefficient/basically unusable #2328


Description

@hornauerp

I tried to sort concatenated ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I then tried to save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using sliced_recording.save_to_folder(save_path, n_jobs=-1), which also failed after a few minutes. Importantly, the progress bar did not move and stayed stuck at 0% 0/601 [23:22<?, ?it/s], indicating that writing had not even started. I then increased the number of cores (up to 72) and the amount of RAM available (up to 1 TB), but neither helped. Watching the resource monitor, I saw that no matter how much RAM I provided, it would fill up completely and the job would then crash with an error like this:

Traceback (most recent call last):
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f30c8d7d490 state=cancelled>

A process in the process pool was terminated abruptly while the future was running or pending.
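For reference, this is roughly the workflow that produces the problem. The file path, save path, and channel selection below are placeholders, not my actual values:

    import spikeinterface.full as si

    # read one MaxWell recording and keep only the channels of interest
    full_recording = si.read_maxwell("/path/to/recording.raw.h5")      # placeholder path
    keep_channel_ids = full_recording.channel_ids[:355]                # placeholder selection
    sliced_recording = full_recording.channel_slice(channel_ids=keep_channel_ids)

    # saving the channel-sliced view is what hangs / exhausts RAM
    save_path = "/path/to/output_folder"                               # placeholder path
    sliced_recording.save_to_folder(folder=save_path, n_jobs=-1)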

When trying with n_jobs=1, the progress bar did start filling up, but writing the recording mentioned above would have taken ~77 h. My suspicion is that for every job and every chunk the full recording is loaded into memory, but I had a hard time finding the code responsible for this.
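A quick way to check that suspicion would be to time a single chunk-sized read on the original recording versus the channel-sliced view (variable names as in the sketch above; I am assuming a chunk of roughly 3 s here):

    import time

    start, end = 0, 30_000  # ~3 s at 10 kHz, roughly one writing chunk

    t0 = time.perf_counter()
    _ = full_recording.get_traces(start_frame=start, end_frame=end)
    print(f"full recording chunk read: {time.perf_counter() - t0:.2f} s")

    t0 = time.perf_counter()
    _ = sliced_recording.get_traces(start_frame=start, end_frame=end)
    print(f"channel-sliced chunk read: {time.perf_counter() - t0:.2f} s")

If the second read is dramatically slower, that would point at how the channel slice is applied when reading chunks from the HDF5 file.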

All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.

Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!

EDIT: When saving the full recording or FrameSliceRecordings, performance is fast, as expected, so the problem must be specific to ChannelSliceRecordings.
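Given that, a possible (untested) workaround sketch would be to save the full recording to binary first, since that is fast, and only then apply the channel slice to the saved copy:

    # untested workaround sketch: save the full recording (fast), then slice the binary copy
    saved_full = full_recording.save_to_folder(folder="/path/to/full_binary", n_jobs=-1)
    sliced_saved = saved_full.channel_slice(channel_ids=keep_channel_ids)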

Metadata

Assignees

No one assigned

    Labels

    performance (Performance issues/improvements)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
