First, here is my code, which follows the basic example provided by Hugging Face.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# test: force Korean transcription for a single file
result = pipe('Data.wav', generate_kwargs={"task":"transcribe", "language":"<|ko|>"})
print(result['text'])
The WAV files are provided by AI Hub in South Korea.
There are hundreds of thousands of voice files, and most of them are transcribed correctly.
However, even when I specify {"task":"transcribe", "language":"<|ko|>"}, some specific files (poor audio quality, or unclear pronunciation by the speaker) are recognized as Japanese.
For one such file, Whisper outputs the following result:
グッデナシがちんちゃんの思うとおか지고。
When I test the same file with the <|en|> option, it is successfully transcribed into English.
Whisper outputs the following result:
I was really scared.
In conclusion, the 'ko' language option does not appear to work properly for these files.
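To find which files are affected among the full set, a small check like the one below could flag any transcription that contains Japanese kana. This is only a sketch: the helper name `contains_kana` is hypothetical (it is not part of transformers), and it assumes that a correct Korean transcription never contains Hiragana or Katakana letters. It does not fix the recognition itself; it only helps collect the problem files.

```python
def contains_kana(text: str) -> bool:
    """Return True if text has at least one Hiragana or Katakana letter.

    Hypothetical helper for flagging transcriptions that came out as
    Japanese even though Korean was requested.
    """
    for ch in text:
        code = ord(ch)
        # Hiragana letters: U+3041..U+3096, Katakana letters: U+30A1..U+30FA
        if 0x3041 <= code <= 0x3096 or 0x30A1 <= code <= 0x30FA:
            return True
    return False


# The two outputs quoted in this post
samples = ["グッデナシがちんちゃんの思うとおか지고。", "I was really scared."]
flags = [contains_kana(s) for s in samples]
print(flags)  # [True, False]
```

In the script above, this could be applied as `contains_kana(result['text'])` after each call to `pipe`, so that flagged files can be listed and inspected separately.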
Does anyone know the solution?