I tried to sort a concatenation of ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I then tried to save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using sliced_recording.save_to_folder(save_path, n_jobs=-1), which also failed after a few minutes. Importantly, the progress bar never moved and was stuck at 0% 0/601 [23:22<?, ?it/s], indicating that writing had not even started. Increasing the number of cores (up to 72) and the amount of available RAM (up to 1 TB) did not help. The resource monitor showed that, no matter how much RAM I provided, it would fill up completely and then crash with an error like this:
```
Traceback (most recent call last):
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f30c8d7d490 state=cancelled>
```
A process in the process pool was terminated abruptly while the future was running or pending.
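For reference, here is a minimal sketch of the workflow that triggers this (the file path and channel ids are placeholders for my actual data):

```python
# Minimal sketch of the failing workflow (path and channel ids are
# placeholders for my actual data).
import spikeinterface.extractors as se

recording = se.read_maxwell("/path/to/maxwell_recording.raw.h5")

# Selecting a subset of channels returns a ChannelSliceRecording
sliced_recording = recording.channel_slice(
    channel_ids=recording.get_channel_ids()[:355])

# This call hangs at 0% and eventually exhausts all available RAM
save_path = "/path/to/save_folder"
sliced_recording.save_to_folder(save_path, n_jobs=-1)
```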
When trying with n_jobs=1, the progress bar would start filling up, but writing the recording mentioned above would have taken ~77 hours. My suspicion is that the full recording is loaded into memory for every job and every chunk, but I had a hard time finding the code responsible for this.
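For what it's worth, this is roughly the behaviour I would have expected from the chunked write (a hand-rolled sketch, not the actual spikeinterface internals; `chunk_frames` and the output path are arbitrary, and `sliced_recording` is the ChannelSliceRecording from above):

```python
import numpy as np

# Hand-rolled sketch of the chunked write I expected (not the actual
# spikeinterface internals): each iteration should hold only one chunk
# of traces in memory at a time.
chunk_frames = 30_000  # 3 s at 10 kHz; arbitrary choice
num_frames = sliced_recording.get_num_frames(segment_index=0)
num_channels = sliced_recording.get_num_channels()

out = np.memmap("traces.raw", dtype="uint16", mode="w+",
                shape=(num_frames, num_channels))
for start in range(0, num_frames, chunk_frames):
    end = min(start + chunk_frames, num_frames)
    # If, for a ChannelSliceRecording, this call loads far more than one
    # chunk, that would explain the RAM exhaustion I am seeing.
    out[start:end, :] = sliced_recording.get_traces(
        segment_index=0, start_frame=start, end_frame=end)
out.flush()
```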
All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.
Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!
EDIT: When saving the full recording or FrameSliceRecordings, performance is fast as expected, so the problem must be specific to ChannelSliceRecordings.
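To illustrate the comparison (same placeholder recording as above; the frame range and channel count are arbitrary):

```python
# Saving the full recording or a FrameSliceRecording behaves as expected:
recording.save_to_folder("full_out", n_jobs=-1)            # fast
frame_sliced = recording.frame_slice(start_frame=0, end_frame=9_000_000)
frame_sliced.save_to_folder("frame_slice_out", n_jobs=-1)  # fast

# Saving a ChannelSliceRecording is the problematic case:
channel_sliced = recording.channel_slice(
    channel_ids=recording.get_channel_ids()[:355])
channel_sliced.save_to_folder("channel_slice_out", n_jobs=-1)  # hangs, then OOM
```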