I tried to sort a concatenation of ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I then tried to save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using sliced_recording.save_to_folder(save_path, n_jobs=-1), which also failed after a few minutes. Importantly, the progress bar never moved and was stuck at 0% 0/601 [23:22<?, ?it/s], indicating that writing had not even started. Increasing the number of cores (up to 72) and the amount of available RAM (up to 1 TB) did not help. The resource monitor showed that, no matter how much RAM I provided, it would fill up completely and then crash with an error like this:
```
Traceback (most recent call last):
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f30c8d7d490 state=cancelled>
```
A process in the process pool was terminated abruptly while the future was running or pending.
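For reference, here is a minimal sketch of the workflow that triggers this (the file path and channel ids are placeholders for my actual data):

```python
# Minimal sketch of the failing workflow (path and channel ids are
# placeholders for my actual data).
import spikeinterface.extractors as se

recording = se.read_maxwell("/path/to/maxwell_recording.raw.h5")

# Selecting a subset of channels returns a ChannelSliceRecording
sliced_recording = recording.channel_slice(
    channel_ids=recording.get_channel_ids()[:355])

# This call hangs at 0% and eventually exhausts all available RAM
save_path = "/path/to/save_folder"
sliced_recording.save_to_folder(save_path, n_jobs=-1)
```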
When trying with n_jobs=1, the progress bar would start filling up, but writing the recording mentioned above would have taken ~77 hours. My suspicion is that the full recording is loaded into memory for every job and every chunk, but I had a hard time finding the code responsible for this.
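For what it's worth, this is roughly the behaviour I would have expected from the chunked write (a hand-rolled sketch, not the actual spikeinterface internals; `chunk_frames` and the output path are arbitrary, and `sliced_recording` is the ChannelSliceRecording from above):

```python
import numpy as np

# Hand-rolled sketch of the chunked write I expected (not the actual
# spikeinterface internals): each iteration should hold only one chunk
# of traces in memory at a time.
chunk_frames = 30_000  # 3 s at 10 kHz; arbitrary choice
num_frames = sliced_recording.get_num_frames(segment_index=0)
num_channels = sliced_recording.get_num_channels()

out = np.memmap("traces.raw", dtype="uint16", mode="w+",
                shape=(num_frames, num_channels))
for start in range(0, num_frames, chunk_frames):
    end = min(start + chunk_frames, num_frames)
    # If, for a ChannelSliceRecording, this call loads far more than one
    # chunk, that would explain the RAM exhaustion I am seeing.
    out[start:end, :] = sliced_recording.get_traces(
        segment_index=0, start_frame=start, end_frame=end)
out.flush()
```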
All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.
Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!
EDIT: When saving the full recording or FrameSliceRecordings, performance is fast as expected, so the problem must be specific to ChannelSliceRecordings.
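To illustrate the comparison (same placeholder recording as above; the frame range and channel count are arbitrary):

```python
# Saving the full recording or a FrameSliceRecording behaves as expected:
recording.save_to_folder("full_out", n_jobs=-1)            # fast
frame_sliced = recording.frame_slice(start_frame=0, end_frame=9_000_000)
frame_sliced.save_to_folder("frame_slice_out", n_jobs=-1)  # fast

# Saving a ChannelSliceRecording is the problematic case:
channel_sliced = recording.channel_slice(
    channel_ids=recording.get_channel_ids()[:355])
channel_sliced.save_to_folder("channel_slice_out", n_jobs=-1)  # hangs, then OOM
```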