tensor mismatch error when load testing whisper #2019
gupta9ankit5 asked this question in Q&A
You cannot use a model across multiple threads. See the discussion here.
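One way to apply this advice without restructuring the Flask app is to guard the single shared model with a lock so that only one request decodes at a time. The sketch below uses a `FakeWhisperModel` stand-in (an assumption, not the real whisper API) whose `transcribe` mutates per-call state on the shared object, which is essentially why concurrent calls into the real model corrupt each other's `kv_cache`:

```python
import threading

# Stand-in for the Whisper model: like whisper's DecodingTask, it mutates
# shared per-call state on the model object, so unsynchronized concurrent
# calls can corrupt each other.
class FakeWhisperModel:
    def __init__(self):
        self.kv_cache = []

    def transcribe(self, audio):
        self.kv_cache = []               # per-call state on the shared object
        for chunk in audio:
            self.kv_cache.append(chunk)  # grows as decoding proceeds
        return {"text": " ".join(self.kv_cache),
                "cache_len": len(self.kv_cache)}

model = FakeWhisperModel()          # in the real app: whisper.load_model("small")
model_lock = threading.Lock()       # one lock guarding the single shared model

def transcribe_serialized(audio):
    # Only one request may run the model at a time; the others wait here.
    with model_lock:
        return model.transcribe(audio)

results = []

def worker(words):
    results.append(transcribe_serialized(words))  # list.append is thread-safe in CPython

threads = [threading.Thread(target=worker, args=(["hello", "world"],))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock, every call sees a consistent cache built from its own input.
assert all(r["cache_len"] == 2 for r in results)
```

In the Flask handler this means acquiring the lock around `small.transcribe(...)`. It serializes throughput, but it removes the race; scaling then comes from running more worker processes rather than more threads.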
I am load testing (using Locust) the Whisper model, deployed as a Flask application, and I get the error below. The strange thing is that the error occurs only when the number of users per second is greater than 1; there is no error at all when the number of users is 1.
Error:
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [12, 1198] but got: [12, 598].
The detailed stack trace is as follows:
[2024-02-13 11:12:44,102] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/flask/app.py", line 1463, in wsgi_app
response = self.full_dispatch_request()
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/flask/app.py", line 872, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/flask/app.py", line 870, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/flask/app.py", line 855, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
File "/Users/user101/Documents/whisper_latest_version_LT/whisp.py", line 44, in transcribe
response = get_transcriptions(audios, whisper_variant)
File "/Users/user101/Documents/whisper_latest_version_LT/whisp.py", line 29, in get_transcriptions
transcriptions = [small.transcribe(data, fp16=False)['text'] for data in datas]
File "/Users/user101/Documents/whisper_latest_version_LT/whisp.py", line 29, in <listcomp>
transcriptions = [small.transcribe(data, fp16=False)['text'] for data in datas]
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/transcribe.py", line 240, in transcribe
result: DecodingResult = decode_with_fallback(mel_segment)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/transcribe.py", line 170, in decode_with_fallback
decode_result = model.decode(segment, options)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/decoding.py", line 824, in decode
result = DecodingTask(model, options).run(mel)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/decoding.py", line 737, in run
tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/decoding.py", line 687, in _main_loop
logits = self.inference.logits(tokens, audio_features)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/decoding.py", line 163, in logits
return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/model.py", line 211, in forward
x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/model.py", line 136, in forward
x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)[0]
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/model.py", line 90, in forward
wv, qk = self.qkv_attention(q, k, v, mask)
File "/Users/user101/Documents/whisper_latest_version_LT/lib/python3.9/site-packages/whisper/model.py", line 108, in qkv_attention
return (w @ v).permute(0, 2, 1, 3).flatten(start_dim=2), qk.detach()
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [12, 1198] but got: [12, 598].