
Make soft merges/split faster #4310

@chrishalcrow

Description

I've been implementing "live" curations in si-gui (SpikeInterface/spikeinterface-gui#224), and to make it more usable it would be great to speed up soft merges/splits. So this is an issue to try and direct some optimisation work! Just very quick and dirty: I benchmarked a single two-unit merge on a typical 1-hour NP2 recording. Here are the times to re-compute each extension, with metrics further split up. It's not all extensions, just the ones I use in gui.


Times for each extension re-compute in seconds:

| extension | time (s) |
|---|---|
| template_similarity | 3.52 |
| firing_rate | 1.81 |
| synchrony | 1.35 |
| rp_violation | 0.547 |
| waveforms | 0.162 |
| amplitude_cutoff | 0.108 |
| templates | 0.106 |
| correlograms | 0.0616 |
| amplitude_median | 0.0491 |
| spike_locations | 0.0271 |
| snr | 0.0167 |
| num_spikes | 0.0161 |
| amplitude_cv | 0.0128 |
| sliding_rp_violation | 0.0116 |
| unit_locations | 0.0103 |
| spike_amplitudes | 0.00408 |
| firing_range | 0.00121 |
| repolarization_slope | 0.000769 |
| presence_ratio | 0.000406 |
| recovery_slope | 0.000167 |
| isi_violation | 0.000123 |
| random_spikes | 8.0e-05 |
| sd_ratio | 4.8e-05 |
| half_width | 1.9e-05 |
| noise_levels | 1.2e-05 |
| peak_to_valley | 1.1e-05 |
| peak_trough_ratio | 5.8e-06 |

(Total ≈ 7 seconds. Note: this doesn't scale linearly; two merges take much less time than twice one merge.)
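For reference, timings like the table above can be gathered with a small helper. This is just a sketch of the measurement approach, not the actual benchmark script: the zero-argument callables stand in for whatever recomputes each extension (e.g. a closure around the analyzer's compute call).

```python
import time

def benchmark(compute_fns):
    """Time each extension-recompute callable once and return the results
    sorted slowest-first, matching the tables in this issue.

    `compute_fns` maps extension name -> zero-argument callable; the
    callables are stand-ins for whatever actually recomputes the extension.
    """
    times = {}
    for name, fn in compute_fns.items():
        t0 = time.perf_counter()
        fn()
        times[name] = time.perf_counter() - t0
    return dict(sorted(times.items(), key=lambda kv: kv[1], reverse=True))
```

A proper benchmark would repeat each call and take the median, but a single `perf_counter` pass is enough to spot the multi-second outliers.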

Good news: the slow ones seem reasonable to optimize! A soft-merged template is a linear combination of the old templates, so template similarity can be computed from already-computed info. firing_rate and rp_violation are slow because we re-compute them for all units (which we don't need to do). synchrony should be numba-fiable.
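To illustrate the "from already-computed info" idea for cosine similarity: since the soft-merged template is a spike-count-weighted average of the two old templates, its dot product with every other template is the same weighted combination of dot products we can recover from the cached similarity matrix. A minimal sketch (the function name and argument layout are made up for illustration, not SpikeInterface API):

```python
import numpy as np

def merged_similarity_row(templates, similarity, spike_counts, i, j):
    """Cosine similarity of a soft-merged template (units i and j merged)
    against all existing templates, without touching waveform data again.

    Assumes `similarity` is the cached cosine-similarity matrix of
    `templates` (shape: units x samples x channels) and that the merged
    template is the spike-count-weighted average of templates i and j.
    """
    flat = templates.reshape(len(templates), -1)
    norms = np.linalg.norm(flat, axis=1)
    # Recover raw dot products <t_a, t_b> from the cached cosine matrix.
    dots = similarity * norms[:, None] * norms[None, :]

    w_i = spike_counts[i] / (spike_counts[i] + spike_counts[j])
    w_j = 1.0 - w_i
    # <merged, t_k> is linear in the old dot products...
    merged_dots = w_i * dots[i] + w_j * dots[j]
    # ...and ||merged||^2 expands into three cached terms.
    merged_norm_sq = (w_i**2 * dots[i, i] + 2 * w_i * w_j * dots[i, j]
                      + w_j**2 * dots[j, j])
    return merged_dots / (np.sqrt(merged_norm_sq) * norms)
```

This still uses the template norms, so either cache those too or accept one `np.linalg.norm` pass over the templates; either way, no waveform extraction is needed. (The real template_similarity supports other distance measures, where the algebra may differ.)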

Now a single split!


| extension | time (s) |
|---|---|
| template_similarity | 4.08 |
| correlograms | 2.59 |
| firing_rate | 1.89 |
| synchrony | 1.32 |
| templates | 0.445 |
| rp_violation | 0.440 |
| waveforms | 0.124 |
| amplitude_cutoff | 0.121 |
| amplitude_median | 0.0458 |
| amplitude_cv | 0.0223 |
| snr | 0.0170 |
| num_spikes | 0.0117 |
| sliding_rp_violation | 0.0112 |
| unit_locations | 0.00521 |
| firing_range | 0.000748 |
| presence_ratio | 0.000642 |
| repolarization_slope | 0.000449 |
| recovery_slope | 0.000228 |
| isi_violation | 0.000186 |
| random_spikes | 0.000109 |
| sd_ratio | 4.3e-05 |
| half_width | 1.8e-05 |
| peak_to_valley | 1.4e-05 |
| noise_levels | 1.1e-05 |
| spike_amplitudes | 8.7e-06 |
| peak_trough_ratio | 7.1e-06 |
| spike_locations | 5.9e-06 |

Splits are harder to optimize: we do need to re-compute correlograms for the split units (pain), and we do need to re-compute the similarity between the new templates (pain). Maybe the entire template_similarity computation can be optimized better? It could perhaps be numba-fied, or sparsified in some way?
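One possible "sparsified" angle: after a split only a handful of units have new templates, so only the similarity rows/columns involving those units need refreshing, rather than the full all-pairs matrix. A rough sketch for the cosine case (function name and signature are hypothetical, not the existing API):

```python
import numpy as np

def update_similarity_rows(similarity, templates, changed):
    """Refresh only the rows/columns of a cached cosine-similarity matrix
    that involve the units in `changed`, instead of recomputing all pairs.

    `templates` is the post-split template array (units x samples [x channels]);
    entries of `similarity` between two unchanged units are assumed still valid.
    """
    flat = templates.reshape(len(templates), -1)
    norms = np.linalg.norm(flat, axis=1)
    sim = similarity.copy()
    for i in changed:
        # One matrix-vector product per changed unit: O(changed * units)
        # pairs instead of O(units^2).
        row = (flat @ flat[i]) / (norms * norms[i])
        sim[i, :] = row
        sim[:, i] = row
    return sim
```

For a split this would mean dropping the old unit's row/column, appending two, and refreshing just those two, which should make the cost independent of the total unit count.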

@yger @samuelgarcia please share any ideas! I'm happy to try and compete to make things faster. Seems like a fun task for new nerds on the project (@tayheau ;) ).

Let's post here if we decide to work on something. I'll work on easy optimizations in the metrics first.

Labels: curation (Related to curation module), postprocessing (Related to postprocessing module)
