Description
@tomxi's ISMIR paper from last year derives exact versions of several segmentation metrics. The idea, which was noted early on (by @dpwe, as I recall) but which we never quite acted upon, is that all of these sampling-based metrics generally converge, in the limit as the frame rate → ∞, to functions of interval durations. If implemented correctly, the evaluators' time complexity should then scale with the number of boundaries (which dictates the number of piecewise-constant label regions), not with the number of frames (which is proportional to track duration).
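To make the limit concrete for the pairwise metric: overlaying the reference and estimated boundaries gives a common refinement on which both labels are constant. A sampled frame pair becomes a pair of time points, and the count of pairs sharing a label becomes a squared duration. So precision tends to Σ D(r,e)² / Σ D(e)² and recall to Σ D(r,e)² / Σ D(r)², where D(r,e) is the total time labeled r in the reference and e in the estimate. The sketch below is a minimal illustration of that derivation, not @tomxi's code. It assumes both annotations are sorted, contiguous, start at 0 and end at the same time, and all function names (`joint_durations`, `pairwise_exact`, `pairwise_sampled`) are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def _label_at(starts, labels, t):
    # Label of the last interval whose start time is <= t.
    return labels[bisect.bisect_right(starts, t) - 1]


def joint_durations(ref_intervals, ref_labels, est_intervals, est_labels):
    """Total time carrying each (reference label, estimated label) pair.

    Assumes sorted, contiguous (start, end) intervals that start at 0 and
    end at the same time in both annotations.
    """
    cuts = sorted({t for iv in list(ref_intervals) + list(est_intervals) for t in iv})
    ref_starts = [s for s, _ in ref_intervals]
    est_starts = [s for s, _ in est_intervals]
    joint = defaultdict(float)
    # One pass over the common refinement: the loop length depends on the
    # number of boundaries, not on the track duration.
    for a, b in zip(cuts[:-1], cuts[1:]):
        mid = 0.5 * (a + b)
        key = (_label_at(ref_starts, ref_labels, mid), _label_at(est_starts, est_labels, mid))
        joint[key] += b - a
    return joint


def pairwise_exact(ref_intervals, ref_labels, est_intervals, est_labels):
    """Pairwise precision, recall, F1 in the limit frame_size -> 0."""
    joint = joint_durations(ref_intervals, ref_labels, est_intervals, est_labels)
    ref_mass, est_mass = defaultdict(float), defaultdict(float)
    for (r, e), d in joint.items():
        ref_mass[r] += d
        est_mass[e] += d
    # Pairs of time points sharing a label have measure (duration)^2;
    # the diagonal s == t has measure zero, so nothing needs subtracting.
    both = sum(d * d for d in joint.values())
    precision = both / sum(d * d for d in est_mass.values())
    recall = both / sum(d * d for d in ref_mass.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def _same_label_pairs(counts):
    # Ordered pairs (i, j), i != j, of frames sharing a label.
    return sum(c * (c - 1) for c in counts.values())


def pairwise_sampled(ref_intervals, ref_labels, est_intervals, est_labels, frame_size=0.1):
    """Frame-sampled approximation for comparison: counts frame pairs."""
    ref_starts = [s for s, _ in ref_intervals]
    est_starts = [s for s, _ in est_intervals]
    n_frames = int(round(ref_intervals[-1][1] / frame_size))
    frames = [
        (_label_at(ref_starts, ref_labels, k * frame_size),
         _label_at(est_starts, est_labels, k * frame_size))
        for k in range(n_frames)
    ]
    both = _same_label_pairs(Counter(frames))
    precision = both / _same_label_pairs(Counter(e for _, e in frames))
    recall = both / _same_label_pairs(Counter(r for r, _ in frames))
    return precision, recall, 2 * precision * recall / (precision + recall)


ref, est = [(0.0, 5.0), (5.0, 10.0)], [(0.0, 10.0)]
print(pairwise_exact(ref, ["A", "B"], est, ["x"]))    # (0.5, 1.0, 0.6666666666666666)
print(pairwise_sampled(ref, ["A", "B"], est, ["x"]))  # precision ~0.495 at 10 Hz
```

In this toy case the boundaries fall exactly on frames, so the 10 Hz gap (4900/9900 vs. 0.5) comes purely from excluding same-frame pairs; with real annotations, boundary quantization adds to it.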
Tom's implementation (https://github.com/tomxi/frameless-eval) demonstrates this for three representative metrics: pairwise frame classification, V-measure (and NCE), and L-measure. The takeaway is that, compared to sampling at the typical frame rate of 10 Hz, the exact implementations are orders of magnitude more efficient and also more "accurate", in the sense that sampling is always an approximation of the exact method.
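The same refinement covers the entropy-based metrics: once frame counts are replaced by durations, the contingency table between reference and estimated labels is just D(r,e), so the conditional entropies need no sampling. A minimal sketch of V-measure computed this way, using the standard homogeneity/completeness definitions with the reference as classes and the estimate as clusters (how these map onto mir_eval's own precision/recall naming is not asserted here). NCE is built from the same two conditional entropies with a different normalization. Function names are hypothetical, and the contiguous-interval assumptions from the pairwise case apply.

```python
import bisect
import math
from collections import defaultdict


def _label_at(starts, labels, t):
    # Label of the last interval whose start time is <= t.
    return labels[bisect.bisect_right(starts, t) - 1]


def joint_durations(ref_intervals, ref_labels, est_intervals, est_labels):
    """Total time carrying each (reference label, estimated label) pair."""
    cuts = sorted({t for iv in list(ref_intervals) + list(est_intervals) for t in iv})
    ref_starts = [s for s, _ in ref_intervals]
    est_starts = [s for s, _ in est_intervals]
    joint = defaultdict(float)
    for a, b in zip(cuts[:-1], cuts[1:]):
        mid = 0.5 * (a + b)
        key = (_label_at(ref_starts, ref_labels, mid), _label_at(est_starts, est_labels, mid))
        joint[key] += b - a
    return joint


def _entropy(masses, total):
    # Shannon entropy (nats) of a distribution given as unnormalized durations.
    return -sum((m / total) * math.log(m / total) for m in masses if m > 0)


def vmeasure_exact(ref_intervals, ref_labels, est_intervals, est_labels):
    """Homogeneity, completeness and V-measure from duration-weighted labels."""
    joint = joint_durations(ref_intervals, ref_labels, est_intervals, est_labels)
    total = sum(joint.values())
    ref_mass, est_mass = defaultdict(float), defaultdict(float)
    for (r, e), d in joint.items():
        ref_mass[r] += d
        est_mass[e] += d
    h_joint = _entropy(joint.values(), total)
    h_ref = _entropy(ref_mass.values(), total)
    h_est = _entropy(est_mass.values(), total)
    # Chain rule: H(R | E) = H(R, E) - H(E), and symmetrically H(E | R).
    h_ref_given_est = h_joint - h_est
    h_est_given_ref = h_joint - h_ref
    # Convention: a single-label side is trivially homogeneous/complete.
    homogeneity = 1.0 - h_ref_given_est / h_ref if h_ref > 0 else 1.0
    completeness = 1.0 - h_est_given_ref / h_est if h_est > 0 else 1.0
    denom = homogeneity + completeness
    v = 2 * homogeneity * completeness / denom if denom > 0 else 0.0
    return homogeneity, completeness, v
```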
This seems like a big win overall, since these evaluations can get relatively costly (especially for hierarchies) when they are put in the validation loop of model development.
What's the proposal
For backward compatibility, we obviously need to retain the sampling-based implementations and their default frame rates.
I do think we can adopt the exact implementations (selected by frame_size=None) with no appreciable impact on default behavior. That said, it may be worth a deprecation cycle on the default frame rates, so that everyone transitions to the exact implementations by default (with the sampled versions kept as a fallback).
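One way the frame_size=None switch and a deprecation cycle could fit together is a sentinel default: an omitted argument keeps today's 0.1 s sampling but emits a FutureWarning, an explicit float keeps sampling silently, and None selects the exact path. This is a hypothetical sketch of the pattern; `_DEFAULT`, `resolve_frame_size` and `dispatch` are illustrative names, not existing mir_eval API.

```python
import warnings

# Hypothetical sentinel so an omitted argument can be told apart from an
# explicit frame_size=0.1 or frame_size=None.
_DEFAULT = object()
_LEGACY_FRAME_SIZE = 0.1  # today's default: 10 Hz sampling


def resolve_frame_size(frame_size=_DEFAULT):
    """Map a user-supplied frame_size onto the exact (None) or sampled (float) path."""
    if frame_size is _DEFAULT:
        # Deprecation cycle: keep the old behaviour, but warn that it will change.
        warnings.warn(
            "The default frame_size will change from 0.1 to None (exact evaluation) "
            "in a future release; pass frame_size explicitly to keep the current behaviour.",
            FutureWarning,
            stacklevel=3,
        )
        return _LEGACY_FRAME_SIZE
    if frame_size is None:
        return None
    if frame_size <= 0:
        raise ValueError("frame_size must be positive or None")
    return float(frame_size)


def dispatch(exact_fn, sampled_fn, *args, frame_size=_DEFAULT):
    """Call the exact implementation for frame_size=None, else the sampled one."""
    resolved = resolve_frame_size(frame_size)
    if resolved is None:
        return exact_fn(*args)
    return sampled_fn(*args, frame_size=resolved)
```

Under this scheme, callers who already pass frame_size explicitly see no change during the transition; only code relying on the implicit default is warned, and flipping the default later is a one-line change.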
Thoughts?