Replies: 1 comment 4 replies
-
For inference it's easier to use something like:

```python
from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device="mps",
    chunk_length_s=30,  # if not specified, only generates as much as `max_new_tokens`
    generate_kwargs={"num_beams": 5},  # same as the "openai whisper" default
)

prompt = "YOUR PROMPT"
prompt_ids = pipe.tokenizer.get_prompt_ids(prompt, return_tensors="pt")

result = pipe(
    "audio.mp3",
    generate_kwargs={"language": "zh", "task": "transcribe", "prompt_ids": prompt_ids},
)
print(result["text"])
```

Also, if possible, share your audio so it can be tested.
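For context on what `chunk_length_s=30` does: the pipeline splits long audio into 30-second windows before feeding them to the model. The arithmetic can be sketched in plain Python (the helper `num_chunks` is hypothetical, and the 16 kHz sample rate is an assumption based on Whisper's feature extractor):

```python
import math

def num_chunks(n_samples, sampling_rate=16_000, chunk_length_s=30):
    """Number of fixed-length chunks needed to cover a clip.

    Assumes mono audio at `sampling_rate` Hz (Whisper's feature
    extractor expects 16 kHz) and chunks of `chunk_length_s` seconds.
    """
    return math.ceil(n_samples / (sampling_rate * chunk_length_s))

# A 100-second clip at 16 kHz needs four 30-second chunks.
print(num_chunks(100 * 16_000))  # 4
```

Without `chunk_length_s`, only the first window's worth of audio is transcribed, which is why it matters for files longer than 30 seconds.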
-
I ran the code; my PC is an M2. The error is about `input_features` and computing the input length of the audio in seconds.
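Regarding computing the input length of the audio in seconds: a minimal sketch from raw samples (the helper name is hypothetical; the 16 kHz rate is an assumption matching what Whisper's feature extractor expects):

```python
def audio_length_seconds(samples, sampling_rate=16_000):
    """Length of a mono clip in seconds: sample count divided by rate."""
    return len(samples) / sampling_rate

# 480,000 samples at 16 kHz is 30 seconds of audio.
print(audio_length_seconds([0.0] * 480_000))  # 30.0
```

Sharing the exact traceback would make it clearer which of the two the error actually refers to.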