Replies: 1 comment 4 replies
-
For inference it's easier to use something like:

```python
from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    device="mps",
    chunk_length_s=30,  # if not specified, only generates as much as `max_new_tokens`
    generate_kwargs={"num_beams": 5},  # same as the "openai whisper" default
)

prompt = "YOUR PROMPT"
prompt_ids = pipe.tokenizer.get_prompt_ids(prompt, return_tensors="pt")

result = pipe(
    "audio.mp3",
    generate_kwargs={"language": "zh", "task": "transcribe", "prompt_ids": prompt_ids},
)
print(result["text"])
```

Also, if possible, share your audio so it can be tested.
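For context on what `chunk_length_s=30` does: the pipeline splits long audio into 30-second windows before feeding them to the model. The arithmetic can be sketched in plain Python (the helper `num_chunks` is hypothetical, and the 16 kHz sample rate is an assumption based on Whisper's feature extractor):

```python
import math

def num_chunks(n_samples, sampling_rate=16_000, chunk_length_s=30):
    """Number of fixed-length chunks needed to cover a clip.

    Assumes mono audio at `sampling_rate` Hz (Whisper's feature
    extractor expects 16 kHz) and chunks of `chunk_length_s` seconds.
    """
    return math.ceil(n_samples / (sampling_rate * chunk_length_s))

# A 100-second clip at 16 kHz needs four 30-second chunks.
print(num_chunks(100 * 16_000))  # 4
```

Without `chunk_length_s`, only the first window's worth of audio is transcribed, which is why it matters for files longer than 30 seconds.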
-
I ran the code; my PC is an M2. The error is about `input_features` and computing the input length of the audio in seconds.
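Regarding computing the input length of the audio in seconds: a minimal sketch from raw samples (the helper name is hypothetical; the 16 kHz rate is an assumption matching what Whisper's feature extractor expects):

```python
def audio_length_seconds(samples, sampling_rate=16_000):
    """Length of a mono clip in seconds: sample count divided by rate."""
    return len(samples) / sampling_rate

# 480,000 samples at 16 kHz is 30 seconds of audio.
print(audio_length_seconds([0.0] * 480_000))  # 30.0
```

Sharing the exact traceback would make it clearer which of the two the error actually refers to.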