Accelerate the Whisper decoding with CTranslate2 #937

guillaumekln · 2023-02-07T15:20:48Z

guillaumekln
Feb 7, 2023

Hello,

We integrated the Whisper model in CTranslate2, which is a fast inference engine for Transformer models. The project implements many useful inference features such as optimized CPU and GPU execution, asynchronous execution, multi-GPU execution, 8-bit quantization, etc.

The library only implements the decoding part (equivalent to model.decode here), but you can find a possible implementation of the full transcription logic in this repository:

https://github.com/guillaumekln/faster-whisper

For example, here's the transcription time of 13 minutes of audio on a V100 for the same accuracy:

Implementation	Time with "small" model	Time with "medium" model
openai/whisper	1m37s	3m16s
CTranslate2	0m25s	0m42s
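
To put the table in relative terms, the timings can be converted to seconds and divided. The sketch below only does that arithmetic on the numbers reported above; the helper name `to_seconds` is made up for illustration.

```python
import re


def to_seconds(text):
    """Convert a duration such as '1m37s' to a number of seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m(\d+)s", text).groups()
    return int(minutes) * 60 + int(seconds)


# Timings copied from the table above (13 minutes of audio on a V100).
timings = {
    "small": {"openai/whisper": "1m37s", "CTranslate2": "0m25s"},
    "medium": {"openai/whisper": "3m16s", "CTranslate2": "0m42s"},
}

speedups = {}
for model, row in timings.items():
    baseline = to_seconds(row["openai/whisper"])
    optimized = to_seconds(row["CTranslate2"])
    speedups[model] = baseline / optimized
    print(f"{model}: {baseline}s -> {optimized}s, {speedups[model]:.1f}x faster")
```

With these numbers the speedup is a bit under 4x for "small" and a bit under 5x for "medium".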

Hopefully this can be useful to some of you!

Feel free to ask questions here.

Best,
Guillaume

lingxiaoxue · 2023-02-10T08:42:55Z

lingxiaoxue
Feb 10, 2023

Why is an error reported when loading the model to the GPU?

Also, why does whisper's model.transcribe have less delay when using the CPU? I use time.perf_counter() to measure the time.

17 replies

lingxiaoxue Feb 16, 2023

I'm using this audio, starting at about 43 seconds: https://www.youtube.com/watch?v=rOeRWRJ16yY

lingxiaoxue Feb 16, 2023

Can you verify you are using the latest version ctranslate2>=3.5.1?
I am using 3.5.0

lingxiaoxue Feb 16, 2023

Can you verify you are using the latest version ctranslate2>=3.5.1?

Solved, thank you very much

lingxiaoxue Feb 21, 2023

Hello, I found some auditory hallucinations. They didn't show up with Whisper.

Whisper also has the following hallucination problem:

The audio content is indeed blank, and no noise can be heard.

Brodski May 7, 2023

Hi Lingxiaoxue, you might find the answer in one of the issues of faster-whisper. I solved my problem with vad_filter=True.

model.transcribe(audio_path, language="en", vad_filter=True)

ItakeLs · 2023-02-10T17:07:04Z

ItakeLs
Feb 10, 2023

Do you have any templates or examples of how you would transcribe audio longer than 30s? Does the generate function still accept decoding options such as beam_size, temperature, etc.? And how is the performance compared to before the CTranslate2 modifications?

14 replies

guillaumekln Feb 14, 2023
Author

There was a small bug in the decoding code. Can you make sure you have the latest version ctranslate2>=3.5.1?

sanjaye218 Feb 14, 2023

I am using ctranslate2 version 3.5.1.

guillaumekln Feb 14, 2023
Author

Is it possible for you to share the input audio file?

sanjaye218 Feb 14, 2023

Sorry, I can't share the audio as it comes from my customers. Let me find some audio that I can share.

guillaumekln Feb 23, 2023
Author

I suggest that you try again with the latest versions of ctranslate2 and the faster-whisper repository. There were several small changes to make the behavior closer to the original Whisper implementation.

Also note that the "large" model in openai/whisper is actually the new "large-v2" model. So you should make sure to use openai/whisper-large-v2 in the conversion command when trying to compare.

bakermanbrian · 2023-02-11T02:30:35Z

bakermanbrian
Feb 11, 2023

Just ran into the following error trying to run this locally:
ct2-transformers-converter --model openai/whisper-tiny --output_dir whisper-tiny-ct2
Traceback (most recent call last):
  File "/usr/local/bin/ct2-transformers-converter", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/ctranslate2/converters/transformers.py", line 718, in main
    converter.convert_from_args(args)
  File "/usr/local/lib/python3.9/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
    return self.convert(
  File "/usr/local/lib/python3.9/site-packages/ctranslate2/converters/converter.py", line 89, in convert
    model_spec = self._load()
  File "/usr/local/lib/python3.9/site-packages/ctranslate2/converters/transformers.py", line 63, in _load
    config = transformers.AutoConfig.from_pretrained(self._model_name_or_path)
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 766, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 473, in __getitem__
    raise KeyError(key)
KeyError: 'whisper'

2 replies

guillaumekln Feb 11, 2023
Author

You probably need to update the transformers package to a more recent version. Can you try that?

bakermanbrian Feb 11, 2023

success, thank you!

ghost · 2023-02-12T11:17:59Z

ghost
Feb 12, 2023

2 replies

guillaumekln Feb 12, 2023
Author

On Windows you currently get an error if you try to use the GPU.

You can still use this project to run transcriptions on CPU which is also heavily optimized. But of course it will be much slower than the GPU.

I will try to address this Windows limitation in the coming days. In the meantime, maybe WSL is a possible solution?

https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl

guillaumekln Feb 16, 2023
Author

Good news! GPU execution on Windows is now working with the latest version of CTranslate2 (pip install "ctranslate2>=3.6").

You should install cuBLAS 11.x and cuDNN 8.x on the system and make sure to update the PATH environment variable accordingly.

huynhthanh98 · 2023-02-13T13:50:12Z

huynhthanh98
Feb 13, 2023

Hi!

Is there a way to convert Whisper weights that I have fine-tuned so that they can be used with your code?

Thanks.

6 replies

huynhthanh98 Feb 14, 2023

Hi,

I did not fine-tune with the Transformers framework, but I have a .pt file. Can I convert from a .pt file in torch?

Thank you.

guillaumekln Feb 14, 2023
Author

Currently there is no direct way to convert a .pt Whisper model that is not coming from Transformers.

huynhthanh98 Feb 27, 2023

Hi @guillaumekln,

Thank you for your reply. I have another question.

Can I run faster-whisper with batch_size > 1 to make use of the VRAM?

guillaumekln Feb 27, 2023
Author

Hi,

Currently faster-whisper implements the same algorithm as openai/whisper so there is no batch processing.

huynhthanh98 Feb 28, 2023

Hi,

Thanks for your reply.

sanjaye218 · 2023-02-15T05:00:09Z

sanjaye218
Feb 15, 2023

Hi,
I am getting the following error when adding the num_workers= parameter.

TypeError: WhisperModel.__init__() got an unexpected keyword argument 'num_workers'

6 replies

sanjaye218 Feb 15, 2023

It's working after taking the latest version.

If num_workers=2 and I am running on a single GPU with 24 GB of memory, will it automatically do parallel processing? How can I initiate parallel processing?

guillaumekln Feb 15, 2023
Author

It is explained in the docstring of this argument. You need to call transcribe from multiple Python threads. So this option is to transcribe multiple files in parallel.

However, this option is more useful for CPU execution than GPU execution. Even though each worker is using a different CUDA stream, you will most likely not see important gains on a single GPU.

sanidhya14 Jun 11, 2023

It is explained in the docstring of this argument. You need to call transcribe from multiple Python threads. So this option is to transcribe multiple files in parallel.

However, this option is more useful for CPU execution than GPU execution. Even though each worker is using a different CUDA stream, you will most likely not see important gains on a single GPU.

I see this mentioned in the docstring, but how is it possible from within the same process? I am trying to do the same on CPU and I don't see any parallelization. Doesn't the Python GIL block the other threads and allow only one thread to run CPU computation at a time?

sanidhya14 Jun 11, 2023

It is explained in the docstring of this argument. You need to call transcribe from multiple Python threads. So this option is to transcribe multiple files in parallel.

However, this option is more useful for CPU execution than GPU execution. Even though each worker is using a different CUDA stream, you will most likely not see important gains on a single GPU.

FYI: I am trying to apply VAD filtering to a single audio file to generate small audio batches, and then run those through the model in parallel to get batch processing for even faster execution.

guillaumekln Jun 11, 2023
Author

The Python GIL is released when calling the model because it is fully running in C++ code. This enables parallelization from multiple Python threads.
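
As an illustration of calling transcribe from multiple Python threads, here is a minimal sketch using a thread pool. `transcribe_file` is a hypothetical stand-in, not a faster-whisper function: with the real library it would wrap `model.transcribe(path)` on a model created with `num_workers` greater than 1. The stand-in uses `time.sleep`, which also releases the GIL, so the threads overlap in a similar way.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def transcribe_file(path):
    """Hypothetical stand-in for a real transcription call.

    time.sleep releases the GIL, which mimics the behavior described above
    for the model call running in C++.
    """
    time.sleep(0.2)
    return f"transcript of {path}"


paths = ["a.wav", "b.wav", "c.wav", "d.wav"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns the results in the same order as the inputs.
    results = list(pool.map(transcribe_file, paths))
elapsed = time.perf_counter() - start

print(results)
print(f"{len(paths)} files in {elapsed:.2f}s (one after another: about 0.8s)")
```

The design point is that one model object is shared and each thread handles a different file, which matches the "transcribe multiple files in parallel" use case rather than speeding up a single file.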

sanjaye218 · 2023-02-15T05:47:29Z

sanjaye218
Feb 15, 2023

I am using the Whisper large model to transcribe English as well as Spanish audio. What are the best beam_size, compute_type and quantization parameters? Currently I am using beam_size=5, compute_type=float16 and quantization=float16.

5 replies

guillaumekln Feb 15, 2023
Author

beam_size: the original Whisper paper is using beam_size=5 so I recommend this value unless you prioritize speed over quality. In this case you can try using 2 or 1.
quantization: I recommend "float16" here. This way the model size is halved and there is still enough precision to choose a different compute_type when loading the model.
compute_type: for GPU execution "float16" is good enough. However, I haven’t tried "int8_float16" with the large model. It may be faster (or not).
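
To make the "model size is halved" remark concrete, here is a rough back-of-the-envelope sketch. It assumes the approximate parameter counts listed in the openai/whisper README and counts only the bytes needed per stored weight; real model files also contain some extra data, so treat the numbers as estimates.

```python
# Approximate parameter counts (in millions), as listed in the openai/whisper README.
PARAMS_MILLIONS = {"small": 244, "medium": 769, "large": 1550}

# Bytes used to store one weight for each storage type.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_gb(model, dtype):
    """Rough size of the stored weights in GB (10**9 bytes)."""
    return PARAMS_MILLIONS[model] * 1e6 * BYTES_PER_WEIGHT[dtype] / 1e9


for dtype in BYTES_PER_WEIGHT:
    print(f"large, {dtype}: ~{weights_size_gb('large', dtype):.2f} GB")
```

Under these assumptions the large model needs roughly 6.2 GB of weights in float32 and about half of that in float16.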

sanjaye218 Feb 15, 2023

Will there be any impact on quality if we use int8_float16 as the compute_type?

guillaumekln Feb 15, 2023
Author

In our experiments with the small model, we found that int8 quantization has almost no impact on quality. The WER score was only 0.1 lower.

sanjaye218 Feb 15, 2023

With float16, some segments where the voice is low are missing. What other quantization or compute_type can give better results?

Also, how can I use float32 quantization with the Whisper large model? I am getting the error: invalid choice float32

guillaumekln Feb 16, 2023
Author

When converting the model, you can remove the --quantization option to save weights in full precision.
When loading the model, you can set compute_type="float" to run in full precision.

guillaumekln · 2023-02-16T16:51:54Z

guillaumekln
Feb 16, 2023
Author

I worked on a new optimization to further reduce the memory usage (20 to 30% reduction) and slightly increase the execution speed.

See the latest results in the Benchmark section of the README.

10 replies

dgoryeo Feb 23, 2023

Thanks for the workaround @guillaumekln . I'll give it a try during the weekend. Will report back.

guillaumekln Mar 6, 2023
Author

With ctranslate2==3.8.0 there should now be no need for a workaround. The memory usage is reduced when converting the model with --quantization float16 so the conversion should now succeed in Colab.

yshalsager Mar 18, 2023

@guillaumekln I am still having the RAM issue under colab but only with large-v2 model, here's my notebook. I am using latest code, with ctranslate2-3.9.1. What do you suggest?

guillaumekln Mar 18, 2023
Author

When creating a TransformersConverter, try setting load_as_float16=True:

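from ctranslate2.converters import TransformersConverter

# `model` is the Whisper size selected in the notebook, e.g. "large-v2".
transforms_converter = TransformersConverter(
    model_name_or_path=f"openai/whisper-{model}",
    copy_files=["tokenizer.json"],
    load_as_float16=True,
)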

This parameter is automatically set when using the conversion script, but not when using the conversion API directly.

yshalsager Mar 18, 2023

@guillaumekln Thanks so much, worked perfectly!

vopbs · 2023-02-28T10:34:32Z

vopbs
Feb 28, 2023

I encountered this error in GPU implementation. I hope you can help me.

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory

14 replies

vopbs Mar 2, 2023

If the voice gets longer, he will lose the marker.

guillaumekln Mar 2, 2023
Author

Hi, if I want to convert the result to json, do I have to write it myself?

Yes.
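
For readers who need to write this themselves, here is a minimal sketch using only the standard library. The `Segment` namedtuple is a stand-in for the objects yielded by transcribe; it assumes only the `start`, `end` and `text` fields, and `segments_to_json` is a name chosen for this example.

```python
import json
from collections import namedtuple

# Stand-in for the segment objects yielded by transcribe(); only the
# start, end and text fields are assumed here.
Segment = namedtuple("Segment", ["start", "end", "text"])


def segments_to_json(segments):
    """Serialize an iterable of segments to a JSON string."""
    rows = [
        {"start": round(s.start, 2), "end": round(s.end, 2), "text": s.text.strip()}
        for s in segments
    ]
    # ensure_ascii=False keeps non-ASCII transcripts readable in the output.
    return json.dumps(rows, ensure_ascii=False, indent=2)


example = [Segment(0.0, 2.5, " Hello there."), Segment(2.5, 4.0, " How are you?")]
print(segments_to_json(example))
```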

vopbs Mar 2, 2023

Thank you so much. I'm going to put a lot of money into running and see the final result, and then I'll give you feedback.

But some of the punctuation marks are missing. I'll try it first. Thank you very much for your help.

vopbs Mar 2, 2023

I also found a serious problem: some words that are not in our audio files appear in the output, all in the last sentence.

guillaumekln Mar 3, 2023
Author

That sounds like an issue with the Whisper model itself, not an issue with the faster-whisper implementation.

jbkkd · 2023-03-01T11:14:36Z

jbkkd
Mar 1, 2023

@guillaumekln First of all, thanks for this amazing library - it's an amazing improvement, and should really be the default implementation.

I have a question on speaker separation. The audio files I'm working with have two speakers - one per stereo channel.

How feasible would it be to process those files using faster-whisper? I can see you're mixing the audio to Mono to do the transcription, but I couldn't figure out from the code how you'd process two separate channels independently.

4 replies

guillaumekln Mar 1, 2023
Author

Can you check if something like this works?

import av
import numpy as np


def decode_audio(input_file, sampling_rate=16000, split_stereo=False):
    fifo = av.audio.fifo.AudioFifo()
    resampler = av.audio.resampler.AudioResampler(
        format="s16",
        layout="stereo" if split_stereo else "mono",
        rate=sampling_rate,
    )

    with av.open(input_file) as container:
        # Decode and resample each audio frame.
        for frame in container.decode(audio=0):
            frame.pts = None
            for new_frame in resampler.resample(frame):
                fifo.write(new_frame)

        # Flush the resampler.
        for new_frame in resampler.resample(None):
            fifo.write(new_frame)

    frame = fifo.read()

    # Convert s16 back to f32.
    array = frame.to_ndarray().flatten().astype(np.float32) / 32768.0

    if split_stereo:
        left_channel = array[0::2]
        right_channel = array[1::2]
        return left_channel, right_channel

    return array


left_audio, right_audio = decode_audio(audio_path, split_stereo=True)

# model.transcribe(left_audio)
# model.transcribe(right_audio)

You need the latest commit of faster-whisper in order to directly pass the decoded audio to the transcribe method.

jbkkd Mar 1, 2023

That works perfectly! Thanks

databill86 Mar 31, 2023

Is there an effective way to reorder the transcriptions from left audio and right audio so that we can reconstruct the full dialogue?
Something like:
[start, end] speaker1: ...
[start, end] speaker2: ...
I'm having trouble getting a decent result because of the large, overlapping timestamps. Also, when we separate the audio into left and right, the text transcriptions are long.

guillaumekln Mar 31, 2023
Author

Did you try using the word-level timestamps returned with word_timestamps=True?
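
One possible way to rebuild the dialogue from word-level timestamps is to tag each word with its channel, sort all words by start time, and group consecutive words from the same speaker into turns. The sketch below assumes each word is available as a plain `(start, end, text)` tuple; `merge_channels` and the speaker labels are names chosen for this example.

```python
def merge_channels(left_words, right_words):
    """Interleave two per-channel word lists into speaker turns.

    Each word is a (start, end, text) tuple with times in seconds.
    """
    tagged = [("speaker1", w) for w in left_words]
    tagged += [("speaker2", w) for w in right_words]
    tagged.sort(key=lambda item: item[1][0])  # order by word start time

    turns = []
    for speaker, (start, end, text) in tagged:
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = end
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": start, "end": end, "text": text})
    return turns


left = [(0.0, 0.4, "Hello"), (0.5, 0.9, "there"), (3.0, 3.4, "Fine")]
right = [(1.2, 1.6, "Hi"), (1.7, 2.5, "how are you")]

turns = merge_channels(left, right)
for turn in turns:
    print(f"[{turn['start']:.2f}, {turn['end']:.2f}] {turn['speaker']}: {turn['text']}")
```

When both speakers talk at the same time, this simple grouping will alternate turns word by word, so some smoothing may be needed for heavily overlapping speech.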

fabriziofeitosa · 2023-03-04T22:54:22Z

fabriziofeitosa
Mar 4, 2023

Hello! Sorry for my bad english, it's not my native language. I'm a beginner in this language (py) and that's why I have some questions that may sound horrible.

Today I already use whisper in a functional way with the command:
whisper audio.mp3 --model small --language pt

How do I use your script in my native language (Portuguese)?

I understood that I must run pip install "faster-whisper @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/master.tar.gz" to install. And then what should I do?

Again I apologize for such a lay question and my bad English.

3 replies

guillaumekln Mar 5, 2023
Author

Hi,

At this time the faster-whisper project is just a Python library. It does not include command line tools.

To run it from the command line you would need to define your own Python script as shown in the README.

mayeaux Mar 31, 2023

This module was created recently which allows CLI access: https://github.com/jordimas/whisper-ctranslate2
I can confirm it works well

turnkit Apr 11, 2023

For command line Jordimas is the way. https://github.com/jordimas/whisper-ctranslate2

(Note that both of these projects use a different default beam size.)

guillaumekln · 2023-03-15T16:11:55Z

guillaumekln
Mar 15, 2023
Author

The repository faster-whisper has been updated to support word-level timestamps!

You can find a basic usage example in the README.

10 replies

sproocht Mar 30, 2023

Sure! Here is a sample transcription. I am attaching the audio file as well. It is in Luxembourgish but setting the language as "German" works as well. Thanks in advance!

Ctranslate2 version:
En ale Mann huet dem hollännesche Schrëftsteller Simon Carmichael eng Kéier verzielt, dass e véierzeg Joer mat senger Fra bestuet wier. An en huet se dunn mat engem Buch verglach, op wéi eng Manéier. Mat der Bibel? Datt een all Dag kënnt dra bliederen. An e mécht ëmmer d' Fangerne aus. Wann een d' Buch fir de Kapp schléit, ass net onbedéngt d' Buch, wat huel ass. Wann d' Fra e Mount laang am Keller bei Waasser a Brout agespaart ass, an de meescht d' Tier routen ass, wär vill méi fein. Wann s de d' Buch méi liess, dann ass et méi frech des. Also ech géif soen, dat wat de Georges an de Marcel gesot hunn, wier richteg. Ech soen näischt Schlechtes iwwer meng Fra, huet de Mann gemengt, mee se ass eppes wéi e schéint Buch, mee leider hunn ech schon eng Zäit laang fäerdeg gelies, an duerfir steet der ëmmer egal.

Whisper library's version:
En ale Mann huet dem hollännesche Schrëftsteller Simon Carmichael eng Kéier verzielt, dass e véierzeg Joer mat senger Fra bestuet wier. An en huet se dunn mat engem Buch verglach, op wéi eng Manéier. Mat der Bibel? **Ah**, datt een all Dag kënnt dra bliederen. **Ho!** An e mécht ëmmer d' Fangernees. Jo! An een, datt een d' Buch fir dee Kapp schléit, ass net onbedéngt d' Buch, wat huel ass. **Ehh?** Wann d' Fra e Mount laang am Keller bei Waasser a Brout agespaart ass, an de meescht d' Tier routen ass, sou huet hie vill méi fein.**Ehh?** Nee, wann s de d' Buch méi lies, dann ass et méi frech des. Also ech géif soen, dat wat de Georges an de Marcel gesot hunn, wier richteg. Ech soen näischt Schleches iwwer meng Fra, wat de Mann gemengt, mee se ass eppes wéi e schéint Buch, mee leider hunn ech schon eng Zäit laang fäerdeg gelies, an duerfir steet der ëmmer egal.

DK_2019_Housen_Fraa_Buch.zip

guillaumekln Mar 31, 2023
Author

Thanks! Can you also specify the model size and transcription options you are using? I tried a few configurations but I was not able to reproduce this output.

sproocht Mar 31, 2023

You can actually use either the medium or the large model. Here is some sample code for testing (the detected language is "de", so the transcription text may differ from the above, but that is OK. You still get some filler words like "Oh!", even when you set the language to "en"):
import whisper
model = whisper.load_model("medium", in_memory=True)
result = model.transcribe(audio_file, verbose=True)

With CTranslate2, I use the following code (same result without beam size or length penalty):
from faster_whisper import WhisperModel
model = WhisperModel(ct_model_path, device="cuda", compute_type="float16")
segments, info = model.transcribe(audio_file, beam_size=5, length_penalty=1.5)

guillaumekln Mar 31, 2023
Author

In openai/whisper, model.transcribe uses a beam size of 1 by default. Can you try using the same value with faster-whisper?

from faster_whisper import WhisperModel 
model = WhisperModel(ct_model_path, device="cuda", compute_type="float16") 
segments, info =  model.transcribe(audio_file, beam_size=1)

sproocht Mar 31, 2023

Nice! That solved the issue. Thanks a lot! I tried other options earlier except beam_size=1. :-)
Here is the output from CTranslate2:

Detected language 'de' with probability 0.462646
31/03/2023 16:38:42 : Getting transcriptions
[0.00s -> 30.00s] En ale Mann huet dem hollännesche Schrëftsteller Simon Carmichael eng Kéier verzielt, dass e véierzeg Joer mat senger Fra bestuet wier. An en huet se dunn mat engem Buch verglach, op wéi eng Manéier. Mat der Bibel? Ah, datt een all Dag kënnt dra bliederen. Ho! An e mécht ëmmer d' Fangernees. Jo! An een, datt een d' Buch fir dee Kapp schléit, ass net onbedéngt d' Buch, wat huel ass. Ehh? Wann d' Fra e Mount laang am Keller bei Waasser a Brout agespaart ass, an de meescht d' Tier routen ass, sou huet hie vill méi fein.
[30.00s -> 48.20s] Ehh? Nee, wann s de d' Buch méi lies, dann ass et méi frech des. Also ech géif soen, dat wat de Georges an de Marcel gesot hunn, wier richteg. Ech soen näischt Schleches iwwer meng Fra, wat de Mann gemengt, mee se ass eppes wéi e schéint Buch, mee leider hunn ech schon eng Zäit laang fäerdeg gelies, an duerfir steet der ëmmer egal.
31/03/2023 16:38:47 : Finished getting transcriptions

mayeaux · 2023-03-29T21:21:32Z

mayeaux
Mar 29, 2023

Works amazing! I will move to use this on freesubtitles.ai , thanks for the great work!

0 replies

vopbs · 2023-04-10T05:32:47Z

vopbs
Apr 10, 2023

I need your help with a problem. I have a dual-track audio file of a phone call between two people. How do I separate the recognition results by orbit?

2 replies

guillaumekln Apr 10, 2023
Author

What do you mean by orbit?

In this release note https://github.com/guillaumekln/faster-whisper/releases/tag/v0.4.0 there is an example to separate stereo audio channels.

vopbs Apr 10, 2023

If I split the two-channel audio into separate mono channels, the recognition accuracy is greatly reduced. If I don't split it, the accuracy is relatively high. Is there a good way to identify the speakers afterwards? That would also speed up recognition, because split audio has to be recognized twice, while unsplit audio only needs one pass.

huynhthanh98 · 2023-04-13T08:14:49Z

huynhthanh98
Apr 13, 2023

Hi @guillaumekln ,

Because I use a custom model, it cannot process audio longer than 25s. Can I convert the weights from the huggingface model to a whisper model and use CTranslate2 so that my model can process audio longer than 25s?

Thank you.

13 replies

phineas-pta Apr 21, 2023

then convert huggingface -> faster whisper ctranslate2 and directly use faster whisper

why openai whisper as an useless intermediary step ?

huynhthanh98 May 15, 2023

Hi @phineas-pta ,

Sorry for the late reply. Since the huggingface-model (the model I have finetuned) can only predict on audio files shorter than 30s, I want to convert the weights of the huggingface-model to the whisper-model to take advantage of whisper's framework to handle audio files longer than 30s. Then i can convert from this whisper-model to faster whisper for faster processing.

phineas-pta May 16, 2023

i think you misunderstand something about the 30s limit

for training/fine-tuning programmers have to manually split audio to 30s chunks

but for inferencing, splitting is automatically done by the backend/implementation you're using (original openai, ctranslate2 faster-whisper, huggingface transformers, ggml whisper.cpp, etc.)

so, except if you want to write a completely new implementation (for e.g. nvidia tensorrt), you don't have to bother about the 30s limit when using it
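
To give an idea of what "splitting is automatically done" means, here is a deliberately simplified sketch that cuts a duration into fixed 30-second windows. This is an assumption made for illustration: actual implementations may advance the window based on the timestamps predicted by the model rather than in fixed steps. `fixed_windows` is a name chosen for this example.

```python
def fixed_windows(duration, window=30.0):
    """Return (start, end) pairs, in seconds, covering `duration` of audio."""
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        bounds.append((start, end))
        start = end
    return bounds


print(fixed_windows(48.2))
```

For a 48.2-second file this gives one full 30-second window followed by a shorter final one.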

huynhthanh98 May 16, 2023

Thank you for your reply, i understood this problem!

When I predict on audio longer than 30s, I get this warning:

WARNING:libav.mp3:Estimating duration from bitrate, this may be inaccurate

Does it affect my results?

phineas-pta May 16, 2023

how did you get that warning ? i never saw that in all 4 backends i mentioned

anyway i think it was trying to calculate audio length so not much a problem

sumitmitra255 · 2023-07-20T07:03:21Z

sumitmitra255
Jul 20, 2023

I am new to this... Please help me with this error. I do not understand what it means.

1 reply

phineas-pta Jul 20, 2023

use faster-whisper not this

arvindmn01 · 2023-11-28T12:16:48Z

arvindmn01
Nov 28, 2023

I have fine-tuned the Whisper small model for a specific use case, and I have also quantized it for faster transcription using faster-whisper.
This is how I am using it for transcription:

model_name = "whisper-small-ct2"
#here device is cpu
model = WhisperModel(model_name, device=dev, compute_type="int8")

segments,info=model.transcribe(audio_buffer)
Text extraction from the segments returned by the transcribe function takes almost 8-9 seconds for a 10-second audio on CPU.

for segment in segments:
      printer=segment.text

Is there any way to enhance the transcription speed for a 10-second audio on CPU?
Any assistance from anyone would be appreciated.
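
A note that may help interpret this measurement: in faster-whisper, transcribe returns a generator, and the README states that the transcription only starts when you iterate over it. The time spent in the `for segment in segments` loop is therefore most likely the transcription itself, not text extraction. The sketch below shows the same effect with `fake_transcribe`, a hypothetical stand-in generator that is not part of any library.

```python
import time


def fake_transcribe(n_segments=3, cost=0.1):
    """Hypothetical stand-in that yields segments lazily, like a generator API."""
    for i in range(n_segments):
        time.sleep(cost)  # pretend this is the expensive decoding step
        yield f"segment {i}"


t0 = time.perf_counter()
segments = fake_transcribe()  # returns immediately, nothing is decoded yet
t_call = time.perf_counter() - t0

t0 = time.perf_counter()
texts = list(segments)  # the work happens during iteration
t_iter = time.perf_counter() - t0

print(f"call: {t_call:.3f}s, iteration: {t_iter:.3f}s")
```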

4 replies

Purfview Nov 28, 2023

Did you quantize model to "int8_float32"?

arvindmn01 Nov 29, 2023

thanks @Purfview
I had quantized the model to "float16"
Now, I have quantized it to 'int8_float32,' and as a result, this code snippet now takes 4-5 seconds for a 10-second audio chunk

for segment in segments:
      printer=segment.text

arvindmn01 Nov 30, 2023

hi @Purfview
Data extraction from the segments is still inconsistent. Sometimes, it takes around 20-30 seconds for a 10-second audio chunk. I have observed that in some cases, if the chunk contains only one or two words, it takes more time than usual.
for this segment it took 4.40 seconds

[Segment(id=1, seek=1000, start=0.0, end=7.0, text=' Fagit xt', tokens=[50363, 376, 363, 270, 220, 742, 50713], temperature=0.0, avg_logprob=-0.844719298183918, compression_ratio=0.5, no_speech_prob=0.14199724793434143, words=None)]

and for this segment it took 30 seconds

[Segment(id=1, seek=1000, start=0.0, end=2.2600000000000002, text=' Fourier', tokens=[50363, 34296, 5277, 50476], temperature=1.0, avg_logprob=-3.514225387573242, compression_ratio=0.4666666666666667, no_speech_prob=0.39546436071395874, words=None)]

and same for this segment (time- 28 seconds)

[Segment(id=1, seek=1000, start=0.0, end=7.66, text=' enjoy', tokens=[50363, 2883, 50746], temperature=1.0, avg_logprob=-2.3765015304088593, compression_ratio=0.782608695652174, no_speech_prob=0.10629443824291229, words=None), Segment(id=2, seek=1000, start=7.66, end=8.8, text=' enzyme enjoy', tokens=[50746, 27679, 2883, 50803], temperature=1.0, avg_logprob=-2.3765015304088593, compression_ratio=0.782608695652174, no_speech_prob=0.10629443824291229, words=None)]

Is there any way to reduce this delay?

Purfview Nov 30, 2023

It's because of fallback, you can disable it with temperature=0.
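
For context, the fallback re-decodes a segment at increasing temperatures when the result looks unreliable. The sketch below is a simplified version that assumes the defaults used by openai/whisper (temperatures from 0.0 to 1.0 in steps of 0.2, a compression ratio threshold of 2.4 and an average log-probability threshold of -1.0). `decode_with_fallback` and `low_confidence_decode` are hypothetical names, not library functions, and the real logic has additional checks.

```python
TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def decode_with_fallback(decode, temperatures=TEMPERATURES,
                         compression_ratio_threshold=2.4,
                         log_prob_threshold=-1.0):
    """Retry decoding at higher temperatures until the result passes both checks."""
    attempts = 0
    result = None
    for temperature in temperatures:
        attempts += 1
        result = decode(temperature)
        too_repetitive = result["compression_ratio"] > compression_ratio_threshold
        too_unlikely = result["avg_logprob"] < log_prob_threshold
        if not (too_repetitive or too_unlikely):
            break
    return result, attempts


def low_confidence_decode(temperature):
    # Stand-in that always returns a low-confidence result, similar to the
    # ' Fourier' segment above (avg_logprob of about -3.5).
    return {"temperature": temperature, "avg_logprob": -3.5, "compression_ratio": 0.47}


result, attempts = decode_with_fallback(low_confidence_decode)
print(attempts, result["temperature"])

_, single_attempts = decode_with_fallback(low_confidence_decode, temperatures=(0.0,))
print(single_attempts)
```

In this sketch a segment that never passes the checks is decoded six times and ends at temperature 1.0, which is consistent with the slow segments above reporting temperature=1.0. With a single temperature of 0 it is decoded only once.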

cofroute · 2024-03-19T06:11:55Z

cofroute
Mar 19, 2024

Can I use it for text2text?

0 replies

Replies: 18 comments · 113 replies
