Replies: 1 comment
I met the issue in the title when trying to use Whisper large for recognition.
Here is my setup:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# assumed setup (these were defined elsewhere in my script)
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"
whisper_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True
)
processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline(
    "automatic-speech-recognition",
    model=whisper_model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    batch_size=8,
    torch_dtype=torch_dtype,
    device=device,
)
result = pipe(audio_data, generate_kwargs={"language": "chinese"}, return_timestamps=True)
I do see some problems. For example, about 20% of the output files don't have timestamps. Sometimes the model also recognizes background music (no lyrics, just melody) as speech, and sometimes it produces weird generations. May I know whether these problems are due to a wrong setting (like the attention mask one), or whether the model just sometimes gets it wrong?
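For reference, one thing I have been considering trying is the hallucination-related generation settings that recent transformers versions expose for Whisper's `generate()` (temperature fallback plus silence and repetition thresholds). This is only a sketch of what I mean, with illustrative threshold values rather than tuned ones:

```python
# Sketch: stricter generate_kwargs that recent transformers versions accept
# for Whisper, aimed at music/silence hallucinations and repetitive output.
# Threshold values below are illustrative defaults, not tuned for my data.
generate_kwargs = {
    "language": "chinese",
    "no_speech_threshold": 0.6,            # treat low-speech-probability chunks as silence
    "logprob_threshold": -1.0,             # retry segments with low average log-probability
    "compression_ratio_threshold": 1.35,   # flag highly repetitive generations
    "condition_on_prev_tokens": False,     # don't carry context between chunks
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback schedule for retries
}
# result = pipe(audio_data, generate_kwargs=generate_kwargs, return_timestamps=True)
```

I am not sure whether these settings would address the missing-timestamp issue, or only the music-as-speech and weird-generation ones.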