I have been using the Whisper-large-v2 model from HuggingFace for local testing, and found that setting the `return_timestamps=True` parameter in the ASR pipeline returns timestamped transcriptions (see the code snippet below, taken from the HuggingFace model page).

I would like access to these segment-level timestamps for an application I am working on, but it seems this parameter is not exposed in the Whisper Triton deployment here. Can anyone guide me on how to set this parameter / access this output?
```python
import torch
from transformers import pipeline
from datasets import load_dataset

# Select GPU if available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,
    device=device,
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Plain transcription (text only)
prediction = pipe(sample.copy(), batch_size=8)["text"]
# " Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."

# With return_timestamps=True, the pipeline also returns segment-level timestamps
prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
# [{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
#   'timestamp': (0.0, 5.44)}]
```
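
For reference, here is a minimal sketch of how the `chunks` output above can be consumed, continuing from the snippet (it assumes `prediction` holds the chunk list returned by the pipeline):

```python
# Minimal sketch: iterate over the chunk list returned above and print
# each segment with its (start, end) timestamps in seconds.
for chunk in prediction:
    start, end = chunk["timestamp"]
    print(f"[{start:.2f}s -> {end:.2f}s] {chunk['text'].strip()}")
```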
Replies: 1 comment 1 reply

Have you figured out how to do it?