[WIP] Improve .outputSeek() performance
#25
When starting sample playback, the `.outputSeek()` method computes a bunch of output up-front, so that the first call to `.process()` produces output for the beginning of the sample.

Previously, this was done by `.seek()`ing into the input, producing `.outputLatency()` samples using normal processing. The pre-roll output is "reflected back" (reversed and phase-inverted before adding back in, to avoid the click you'd get from truncation).
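As a minimal sketch of what that fold-back could look like (this is my reading of the description above, with made-up helper names, not the library's actual code):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the library): fold the pre-roll output back
// into the start of the real output. preRoll[0] is the earliest sample
// (t = -preRoll.size()); preRoll.back() is the sample just before t = 0.
void foldBackPreRoll(const std::vector<float> &preRoll, std::vector<float> &output) {
	size_t n = std::min(preRoll.size(), output.size());
	for (size_t i = 0; i < n; ++i) {
		// Reverse in time (mirror around t = 0) and phase-invert (negate),
		// then add into the real output so there's no hard truncation click.
		output[i] += -preRoll[preRoll.size() - 1 - i];
	}
}
```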
Here's a diagram of the analysis (top) and synthesis (bottom) windows for the first 10 output blocks, including those computed during `.outputSeek()`:

The input for `t < 0` is effectively zeroes, and the output for `t < 0` is reflected back. This is with 4x overlap, so there are two output windows which need to be computed up-front before the "actual" output is ready. There are also two analysis windows for each output block, because this is performing a time-stretch.
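To put a rough number on that "two windows up-front": the hop between output blocks is the block length divided by the overlap factor, and `.outputSeek()` has to synthesise roughly `.outputLatency()` samples' worth of blocks before real output starts. The snippet below only illustrates that arithmetic; the block size and latency value are assumed example numbers, not values taken from the library:

```cpp
#include <cstdio>

int main() {
	// Assumed example numbers, just to illustrate the "two windows up-front" point:
	int blockSamples = 4096;                     // STFT block length (assumption)
	int overlap = 4;                             // 4x overlap, as in the diagram above
	int hop = blockSamples / overlap;            // samples between successive output blocks
	int outputLatencySamples = blockSamples / 2; // assumed value for .outputLatency()

	// Output blocks that must be synthesised before the "actual" output is ready:
	int preRollBlocks = (outputLatencySamples + hop - 1) / hop;
	std::printf("pre-roll blocks: %d\n", preRollBlocks); // prints 2 with these numbers
	return 0;
}
```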
## Improvement 1: skip one analysis

The first change in this branch/PR is the internal flag `assumePreviousBlockZero`. This is actually true after a `.reset()`, but we pretend it's true directly after `.outputSeek()`. This means we avoid one input-block analysis when time-stretching:
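As a rough illustration of the flag's role (a hypothetical sketch, not the library's actual internals):

```cpp
// Hypothetical sketch: a flag set by reset()/outputSeek() lets the block
// processing skip analysing the "previous" input block, because that block
// is known to be all zeros.
struct BlockProcessor {
	bool assumePreviousBlockZero = true; // true after reset(); also set by outputSeek()

	void processBlock() {
		if (!assumePreviousBlockZero) {
			// When time-stretching there are normally two analysis windows per
			// output block; this is the one the first block can skip.
			analysePreviousInput();
		}
		analyseCurrentInput();
		synthesiseOutput();
		assumePreviousBlockZero = false; // only the first block gets the shortcut
	}

	void analysePreviousInput() { /* ... */ }
	void analyseCurrentInput() { /* ... */ }
	void synthesiseOutput() { /* ... */ }
};
```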
### Effect on the sound

The first output block has no "previous input" to use for a phase-vocoder prediction. This isn't a problem, since that phase-vocoder prediction is only really needed to stay phase-aligned with a previous block.

If anything, I would expect this first block to actually be clearer on initial transients, but that needs to be backed up by thorough listening tests.
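For context, the prediction in question is roughly the textbook phase-vocoder propagation sketched below (the standard formulation, not necessarily how Stretch implements it). With no previous block there is nothing to propagate from, so the first block can simply use its analysis phases directly:

```cpp
// Textbook-style phase propagation for one spectral bin (illustration only,
// and simplified: real implementations also unwrap the phase advance
// relative to the bin's expected advance per hop).
// prevOutPhase:  synthesis phase of this bin in the previous output block
// prevInPhase:   analysis phase of this bin in the previous input block
// inPhase:       analysis phase of this bin in the current input block
// stretchFactor: synthesis hop / analysis hop
double predictOutputPhase(double prevOutPhase, double prevInPhase,
                          double inPhase, double stretchFactor) {
	double phaseAdvance = inPhase - prevInPhase; // how the input phase moved
	return prevOutPhase + phaseAdvance * stretchFactor;
}
```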
## Improvement 2: different initial window shape

This change alters the window shape (and block centre) at the start of `.outputSeek()`. The previous window shapes are then restored for the next block:

Since the input is padded with zeros, and the output gets folded back, this first block's window doesn't need to extend very far before `t=0`. However, the analysis/synthesis "window offsets" (marked with a dot in these diagrams) are the reference time for the analysis/synthesis, so we also apply an additional phase-shift to compensate for the offsets changing when the original window shapes/offsets are restored.
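Shifting a spectral frame's reference time by some number of samples corresponds to multiplying each bin by a linear phase ramp. The sketch below is my own illustration of that general STFT identity, not the code added in this PR:

```cpp
#include <complex>
#include <vector>

// Apply the phase ramp corresponding to moving a frame's reference time by
// deltaSamples, for a real FFT of size fftSize (bins 0..fftSize/2).
// Generic STFT time-shift identity, shown only to illustrate the kind of
// phase adjustment described above (sign convention may differ in practice).
void shiftReferenceTime(std::vector<std::complex<float>> &bins,
                        int fftSize, float deltaSamples) {
	const float twoPi = 6.283185307179586f;
	for (size_t k = 0; k < bins.size(); ++k) {
		float phase = -twoPi * deltaSamples * float(k) / float(fftSize);
		bins[k] *= std::polar(1.0f, phase);
	}
}
```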
## Limits to the performance improvement

If `splitComputation` is turned off, then Stretch periodically does a big chunk of work as it computes the next output block. In an environment with large buffer sizes (or enough buffering to handle the uneven CPU use), this isn't a problem - here's the computation time if we compute chunks of 2048 samples at a time:

However, for 512 samples we can see that most blocks barely do any work:
### Split-computation
Split-computation mode spreads this work out more evenly (without any threading stuff), at the expense of some extra output latency:
This difference is even more dramatic for smaller buffer sizes. Here's the difference if we use 100-sample buffers:
However! On all of these plots I've added the time of the `.outputSeek()` call, scaled to CPU% as if it's a single 512/2048/... buffer. (The width shows the amount of pre-roll it actually has to compute.) Since this is up-front work to get the Stretch instance ready to produce actual output, split-computation doesn't help.
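For reference, that CPU% scaling works out to something like the snippet below (the sample rate and buffer size are assumed example values; the point is just that a one-off call gets compared against a single buffer's worth of real time):

```cpp
#include <chrono>
#include <cstdio>

int main() {
	double sampleRate = 48000;  // assumed example value
	int bufferSamples = 512;    // the buffer size the plot is scaled against

	auto start = std::chrono::steady_clock::now();
	// The .outputSeek() call would go here; its exact signature isn't shown in
	// this description, so this is just a placeholder for "the up-front work".
	auto end = std::chrono::steady_clock::now();

	double elapsedSeconds = std::chrono::duration<double>(end - start).count();
	double bufferSeconds = bufferSamples / sampleRate;

	// CPU% as if the whole .outputSeek() cost landed inside one buffer:
	double cpuPercent = 100.0 * elapsedSeconds / bufferSeconds;
	std::printf("outputSeek: %.1f%% of one %d-sample buffer\n", cpuPercent, bufferSamples);
	return 0;
}
```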
## Future alternatives

I think I've taken this particular approach as far as it can reasonably go.

The only option I can see for getting the CPU cost of `.outputSeek()` all the way down (to match the `smoothedComputation` case) is to use a much cheaper method to generate that initial output. We would then need to re-analyse that output so we can phase-match it from the next typical Stretch processing-block.

The most general approach would be to allow the user to generate some initial output themselves, and then tell Stretch to continue based on that output.