
New alignment algorithm with char_split hitting n_text_ctx limit #469

@kaijchang


Thank you for implementing the new alignment algorithm; it seems to work well out of the box.

However, when I enable `char_split`, it sometimes fails with `RuntimeError: The size of tensor a (xxx) must match the size of tensor b (448) at non-singleton dimension 1`. Splitting into characters produces many more tokens to align, and the decoder input then exceeds Whisper's text context length (`n_text_ctx`, 448 tokens).
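A pre-flight length check could surface this as a clear error instead of a tensor-shape mismatch. The sketch below is illustrative only: `fits_text_ctx` and `split_for_ctx` are hypothetical helpers (not part of stable-ts or Whisper), and `n_special` is an assumed count of the special tokens placed around the text tokens in the decoder input. Splitting is one possible mitigation, not a confirmed fix; how per-chunk alignments would be merged, and whether alignment quality holds up, is not addressed here.

```python
from typing import List, Sequence

# Text context length (dims.n_text_ctx) of the released Whisper models.
N_TEXT_CTX = 448


def fits_text_ctx(text_tokens: Sequence[int], n_special: int,
                  n_text_ctx: int = N_TEXT_CTX) -> bool:
    """Hypothetical check: do special tokens plus text tokens fit the decoder context?"""
    return n_special + len(text_tokens) <= n_text_ctx


def split_for_ctx(text_tokens: Sequence[int], n_special: int,
                  n_text_ctx: int = N_TEXT_CTX) -> List[List[int]]:
    """Hypothetical mitigation: split text tokens into chunks that each fit the context.

    Each chunk would need its own decoder pass; merging the results is not shown.
    """
    budget = n_text_ctx - n_special
    if budget <= 0:
        raise ValueError("special tokens alone exceed the text context")
    return [list(text_tokens[i:i + budget]) for i in range(0, len(text_tokens), budget)]


# Illustration: 570 decoder positions as in the trace, assuming 4 of them are special tokens.
tokens = list(range(566))
print(fits_text_ctx(tokens, n_special=4))                      # False
print([len(c) for c in split_for_ctx(tokens, n_special=4)])    # [444, 122]
```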

Example of a full trace:

```
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/cog/server/worker.py", line 352, in _predict
    result = predict(**payload)
             ^^^^^^^^^^^^^^^^^^
  File "/src/predict.py", line 106, in predict
    result = self.model.transcribe(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/whisper_word_level/original_whisper.py", line 714, in transcribe_stable
    if inner_transcribe() is not None:
       ^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/whisper_word_level/original_whisper.py", line 636, in inner_transcribe
    add_word_timestamps_stable(
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 494, in add_word_timestamps_stable
    align()
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 459, in align
    alignment = find_alignment_stable(model, tokenizer, text_tokens, mel, num_samples,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 288, in find_alignment_stable
    _compute_jump_indices(cache=cache, new=new, **kwargs)
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 174, in _compute_jump_indices
    weights = _compute_atten_weights_new(model, cache=cache, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 135, in _compute_atten_weights_new
    _compute_qks(model, tokenizer, text_tokens, mel, tokens, cache)
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/stable_whisper/timing.py", line 61, in _compute_qks
    logits = model.decoder(tokens.unsqueeze(0), audio_features)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.14/lib/python3.11/site-packages/whisper/model.py", line 236, in forward
    self.token_embedding(x)
RuntimeError: The size of tensor a (570) must match the size of tensor b (448) at non-singleton dimension 1
```
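The last frame is Whisper's text decoder, where `self.token_embedding(x)` is added to a slice of the positional embedding table, `self.positional_embedding[offset : offset + x.shape[-1]]`. That table has only `n_text_ctx` (448) rows, and slicing past its end truncates rather than raising, so 570 token embeddings meet 448 positional rows. The sketch below mimics the slice with plain Python lists instead of torch tensors; the list is only a stand-in for the embedding table.

```python
N_TEXT_CTX = 448

# Stand-in for positional_embedding: one "row" per position the decoder supports.
positional_rows = list(range(N_TEXT_CTX))

n_tokens = 570  # length of the decoder input in the trace (tensor a)
offset = 0      # no kv-cache offset on a fresh forward pass

# Slicing past the end silently truncates, just like tensor slicing does.
sliced = positional_rows[offset : offset + n_tokens]

print(n_tokens, len(sliced))  # 570 448 -> the two sizes in the RuntimeError
```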

Labels: bug (Something isn't working)
