Commit 9fe28e7

Merge branch 'master' into allow-for-NextSegment-to-be-called-when-Process-is-re-called
2 parents 3fc1c2b + 19ceec8 commit 9fe28e7

329 files changed: +42148 -21367 lines changed

README.md

Lines changed: 12 additions & 11 deletions
@@ -362,6 +362,7 @@ First, check if your Ascend NPU device is supported:
 | Ascend NPU | Status |
 |:-----------------------------:|:-------:|
 | Atlas 300T A2 | Support |
+| Atlas 300I Duo | Support |

 Then, make sure you have installed [`CANN toolkit`](https://www.hiascend.com/en/software/cann/community) . The lasted version of CANN is recommanded.

@@ -755,23 +756,23 @@ written in Python that is fast and accurate.

 Models can be downloaded by running the following command on Linux or MacOS:
 ```console
-$ ./models/download-vad-model.sh silero-v5.1.2
-Downloading ggml model silero-v5.1.2 from 'https://huggingface.co/ggml-org/whisper-vad' ...
-ggml-silero-v5.1.2.bin 100%[==============================================>] 864.35K --.-KB/s in 0.04s
-Done! Model 'silero-v5.1.2' saved in '/path/models/ggml-silero-v5.1.2.bin'
+$ ./models/download-vad-model.sh silero-v6.2.0
+Downloading ggml model silero-v6.2.0 from 'https://huggingface.co/ggml-org/whisper-vad' ...
+ggml-silero-v6.2.0.bin 100%[==============================================>] 864.35K --.-KB/s in 0.04s
+Done! Model 'silero-v6.2.0' saved in '/path/models/ggml-silero-v6.2.0.bin'
 You can now use it like this:

-$ ./build/bin/whisper-cli -vm /path/models/ggml-silero-v5.1.2.bin --vad -f samples/jfk.wav -m models/ggml-base.en.bin
+$ ./build/bin/whisper-cli -vm /path/models/ggml-silero-v6.2.0.bin --vad -f samples/jfk.wav -m models/ggml-base.en.bin

 ```
 And the following command on Windows:
 ```console
-> .\models\download-vad-model.cmd silero-v5.1.2
-Downloading vad model silero-v5.1.2...
-Done! Model silero-v5.1.2 saved in C:\Users\danie\work\ai\whisper.cpp\ggml-silero-v5.1.2.bin
+> .\models\download-vad-model.cmd silero-v6.2.0
+Downloading vad model silero-v6.2.0...
+Done! Model silero-v6.2.0 saved in C:\Users\danie\work\ai\whisper.cpp\ggml-silero-v6.2.0.bin
 You can now use it like this:

-C:\path\build\bin\Release\whisper-cli.exe -vm C:\path\ggml-silero-v5.1.2.bin --vad -m models/ggml-base.en.bin -f samples\jfk.wav
+C:\path\build\bin\Release\whisper-cli.exe -vm C:\path\ggml-silero-v6.2.0.bin --vad -m models/ggml-base.en.bin -f samples\jfk.wav

 ```

@@ -783,15 +784,15 @@ This model can be also be converted manually to ggml using the following command
 $ python3 -m venv venv && source venv/bin/activate
 $ (venv) pip install silero-vad
 $ (venv) $ python models/convert-silero-vad-to-ggml.py --output models/silero.bin
-Saving GGML Silero-VAD model to models/silero-v5.1.2-ggml.bin
+Saving GGML Silero-VAD model to models/silero-v6.2.0-ggml.bin
 ```
 And it can then be used with whisper as follows:
 ```console
 $ ./build/bin/whisper-cli \
   --file ./samples/jfk.wav \
   --model ./models/ggml-base.en.bin \
   --vad \
-  --vad-model ./models/silero-v5.1.2-ggml.bin
+  --vad-model ./models/silero-v6.2.0-ggml.bin
 ```

 ### VAD Options

bindings/ruby/README.md

Lines changed: 20 additions & 4 deletions
@@ -134,20 +134,20 @@ Support for Voice Activity Detection (VAD) can be enabled by setting `Whisper::P
 ```ruby
 Whisper::Params.new(
   vad: true,
-  vad_model_path: "silero-v5.1.2",
+  vad_model_path: "silero-v6.2.0",
   # other arguments...
 )
 ```

-When you pass the model name (`"silero-v5.1.2"`) or URI (`https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v5.1.2.bin`), it will be downloaded automatically.
-Currently, "silero-v5.1.2" is registered as pre-converted model like ASR models. You also specify file path or URI of model.
+When you pass the model name (`"silero-v6.2.0"`) or URI (`https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v6.2.0.bin`), it will be downloaded automatically.
+Currently, "silero-v6.2.0" is registered as pre-converted model like ASR models. You also specify file path or URI of model.

 If you need configure VAD behavior, pass params for that:

 ```ruby
 Whisper::Params.new(
   vad: true,
-  vad_model_path: "silero-v5.1.2",
+  vad_model_path: "silero-v6.2.0",
   vad_params: Whisper::VAD::Params.new(
     threshold: 1.0, # defaults to 0.5
     min_speech_duration_ms: 500, # defaults to 250
@@ -324,6 +324,22 @@ whisper

 The second argument `samples` may be an array, an object with `length` and `each` method, or a MemoryView. If you can prepare audio data as C array and export it as a MemoryView, whispercpp accepts and works with it with zero copy.

+Using VAD separately from ASR
+-----------------------------
+
+VAD feature itself is useful. You can use it separately from ASR:
+
+```ruby
+vad = Whisper::VAD::Context.new("silero-v6.2.0")
+vad
+  .detect("path/to/audio.wav", Whisper::VAD::Params.new)
+  .each_with_index do |segment, index|
+    segment => {start_time: st, end_time: ed} # `Segment` responds to `#deconstruct_keys`
+
+    puts "[%{nth}: %{st} --> %{ed}]" % {nth: index + 1, st:, ed:}
+  end
+```
+
 Development
 -----------

bindings/ruby/ext/ruby_whisper.c

Lines changed: 9 additions & 0 deletions
@@ -6,7 +6,10 @@ VALUE mWhisper;
 VALUE mVAD;
 VALUE cContext;
 VALUE cParams;
+VALUE cVADContext;
 VALUE cVADParams;
+VALUE cVADSegments;
+VALUE cVADSegment;
 VALUE eError;

 VALUE cSegment;
@@ -37,6 +40,9 @@ extern void init_ruby_whisper_error(VALUE *mWhisper);
 extern void init_ruby_whisper_segment(VALUE *mWhisper, VALUE *cSegment);
 extern void init_ruby_whisper_model(VALUE *mWhisper);
 extern void init_ruby_whisper_vad_params(VALUE *mVAD);
+extern void init_ruby_whisper_vad_context(VALUE *mVAD);
+extern void init_ruby_whisper_vad_segment(VALUE *mVAD);
+extern void init_ruby_whisper_vad_segments(VALUE *mVAD);
 extern void register_callbacks(ruby_whisper_params *rwp, VALUE *context);

 /*
@@ -170,6 +176,9 @@ void Init_whisper() {
   init_ruby_whisper_segment(&mWhisper, &cContext);
   init_ruby_whisper_model(&mWhisper);
   init_ruby_whisper_vad_params(&mVAD);
+  init_ruby_whisper_vad_segment(&mVAD);
+  init_ruby_whisper_vad_segments(&mVAD);
+  init_ruby_whisper_vad_context(&mVAD);

   rb_require("whisper/context");
   rb_require("whisper/segment");

bindings/ruby/ext/ruby_whisper.h

Lines changed: 13 additions & 0 deletions
@@ -37,4 +37,17 @@ typedef struct {
   VALUE context;
 } ruby_whisper_model;

+typedef struct {
+  struct whisper_vad_segments *segments;
+} ruby_whisper_vad_segments;
+
+typedef struct {
+  VALUE segments;
+  int index;
+} ruby_whisper_vad_segment;
+
+typedef struct {
+  struct whisper_vad_context *context;
+} ruby_whisper_vad_context;
+
 #endif
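
The `ruby_whisper_vad_segment` struct added in this hunk holds a reference to its parent segments object plus an index, rather than its own copy of the start and end times. A minimal sketch of that index-into-parent pattern in plain C is given here. All `toy_*` names and the array-backed collection are hypothetical stand-ins, not part of whisper.cpp or the Ruby binding, and how the binding actually resolves the times is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the native segment collection. */
typedef struct {
    const float *starts; /* start time of each segment */
    const float *ends;   /* end time of each segment */
    int count;           /* number of segments */
} toy_segments;

/* Same shape as ruby_whisper_vad_segment: no copied times, only a
 * reference to the parent collection and a position inside it. */
typedef struct {
    const toy_segments *segments;
    int index;
} toy_segment_view;

/* Build a view onto segment `index`; the parent must outlive the view. */
static toy_segment_view toy_segment_at(const toy_segments *segments, int index)
{
    assert(segments != NULL);
    assert(index >= 0 && index < segments->count);
    toy_segment_view view = { segments, index };
    return view;
}

/* Times are looked up in the parent on demand. */
static float toy_segment_start(toy_segment_view view)
{
    return view.segments->starts[view.index];
}

static float toy_segment_end(toy_segment_view view)
{
    return view.segments->ends[view.index];
}
```

The trade-off of this design is that a view is cheap to create but is only valid while its parent collection is alive, which is presumably why the real struct keeps the parent as a Ruby `VALUE`.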

bindings/ruby/ext/ruby_whisper_segment.c

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ ruby_whisper_segment_memsize(const void *p)
   if (!rws) {
     return 0;
   }
+  if (rws->index) {
+    size += sizeof(rws->index);
+  }
   return size;
 }

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+#include <ruby.h>
+#include "ruby_whisper.h"
+
+extern ID id_to_s;
+
+extern VALUE cVADContext;
+
+extern VALUE ruby_whisper_vad_detect(VALUE self, VALUE file_path, VALUE params);
+extern VALUE ruby_whisper_normalize_model_path(VALUE model_path);
+
+static size_t
+ruby_whisper_vad_context_memsize(const void *p)
+{
+  const ruby_whisper_vad_context *rwvc = p;
+  size_t size = sizeof(rwvc);
+  if (!rwvc) {
+    return 0;
+  }
+  if (rwvc->context) {
+    size += sizeof(rwvc->context);
+  }
+  return size;
+}
+
+static void
+ruby_whisper_vad_context_free(void *p)
+{
+  ruby_whisper_vad_context *rwvc = (ruby_whisper_vad_context *)p;
+  if (rwvc->context) {
+    whisper_vad_free(rwvc->context);
+    rwvc->context = NULL;
+  }
+  xfree(rwvc);
+}
+
+const rb_data_type_t ruby_whisper_vad_context_type = {
+  "ruby_whisper_vad_context",
+  {0, ruby_whisper_vad_context_free, ruby_whisper_vad_context_memsize,},
+  0, 0,
+  0
+};
+
+static VALUE
+ruby_whisper_vad_context_s_allocate(VALUE klass)
+{
+  ruby_whisper_vad_context *rwvc;
+  VALUE obj = TypedData_Make_Struct(klass, ruby_whisper_vad_context, &ruby_whisper_vad_context_type, rwvc);
+  rwvc->context = NULL;
+  return obj;
+}
+
+static VALUE
+ruby_whisper_vad_context_initialize(VALUE self, VALUE model_path)
+{
+  ruby_whisper_vad_context *rwvc;
+  struct whisper_vad_context *context;
+
+  model_path = ruby_whisper_normalize_model_path(model_path);
+  context = whisper_vad_init_from_file_with_params(StringValueCStr(model_path), whisper_vad_default_context_params());
+  if (context == NULL) {
+    rb_raise(rb_eRuntimeError, "Failed to initialize whisper VAD context");
+  }
+  TypedData_Get_Struct(self, ruby_whisper_vad_context, &ruby_whisper_vad_context_type, rwvc);
+  rwvc->context = context;
+
+  return Qnil;
+}
+
+void init_ruby_whisper_vad_context(VALUE *mVAD)
+{
+  cVADContext = rb_define_class_under(*mVAD, "Context", rb_cObject);
+  rb_define_alloc_func(cVADContext, ruby_whisper_vad_context_s_allocate);
+  rb_define_method(cVADContext, "initialize", ruby_whisper_vad_context_initialize, 1);
+  rb_define_method(cVADContext, "detect", ruby_whisper_vad_detect, 2);
+}
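
A note for reviewers on `ruby_whisper_vad_context_memsize`: `sizeof(rwvc)` is applied to a pointer, so it yields the size of the pointer and not of the struct it points to, and `sizeof(rwvc->context)` likewise adds one more pointer size rather than the native context's footprint. Because `ruby_whisper_vad_context` holds a single pointer field, the first value probably coincides with `sizeof(*rwvc)` on typical platforms, so the reported number is unlikely to be off today. The sketch below shows the distinction with a hypothetical wrapper that has more than one field; the `toy_*` names are illustrative only and are not part of the binding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper with several fields, so that the struct size
 * and the pointer size visibly differ. */
typedef struct {
    void *context;     /* stand-in for a native handle */
    int index;
    double scratch[4];
} toy_wrapper;

/* Reports the size of the wrapper struct itself: note the `*w`. */
static size_t toy_wrapper_memsize(const void *p)
{
    const toy_wrapper *w = p;
    if (!w) {
        return 0;
    }
    return sizeof(*w);
}

/* Reports only the size of a pointer: sizeof is applied to `w`,
 * not to what it points at. */
static size_t toy_pointer_size(const void *p)
{
    const toy_wrapper *w = p;
    return sizeof(w);
}
```

Since `sizeof` does not evaluate its operand, taking `sizeof(*w)` before the null check would also be safe; the check only matters for the early `return 0`.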
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+#include <ruby.h>
+#include "ruby_whisper.h"
+#include "common-whisper.h"
+#include <string>
+#include <vector>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern VALUE cVADSegments;
+
+extern const rb_data_type_t ruby_whisper_vad_context_type;
+extern const rb_data_type_t ruby_whisper_vad_params_type;
+extern const rb_data_type_t ruby_whisper_vad_segments_type;
+
+extern VALUE ruby_whisper_vad_segments_s_init(struct whisper_vad_segments *segments);
+
+VALUE
+ruby_whisper_vad_detect(VALUE self, VALUE file_path, VALUE params) {
+  ruby_whisper_vad_context *rwvc;
+  ruby_whisper_vad_params *rwvp;
+  std::string cpp_file_path;
+  std::vector<float> pcmf32;
+  std::vector<std::vector<float>> pcmf32s;
+  whisper_vad_segments *segments;
+
+  TypedData_Get_Struct(self, ruby_whisper_vad_context, &ruby_whisper_vad_context_type, rwvc);
+  if (rwvc->context == NULL) {
+    rb_raise(rb_eRuntimeError, "Doesn't have referenxe to context internally");
+  }
+  TypedData_Get_Struct(params, ruby_whisper_vad_params, &ruby_whisper_vad_params_type, rwvp);
+
+  cpp_file_path = StringValueCStr(file_path);
+
+  if (!read_audio_data(cpp_file_path, pcmf32, pcmf32s, false)) {
+    rb_raise(rb_eRuntimeError, "Failed to open '%s' as WAV file\n", cpp_file_path.c_str());
+  }
+
+  segments = whisper_vad_segments_from_samples(rwvc->context, rwvp->params, pcmf32.data(), pcmf32.size());
+  if (segments == nullptr) {
+    rb_raise(rb_eRuntimeError, "Failed to process audio\n");
+  }
+
+  return ruby_whisper_vad_segments_s_init(segments);
+}
+
+#ifdef __cplusplus
+}
+#endif
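
`ruby_whisper_vad_detect` follows a samples-in, segments-out flow: it reads PCM samples, hands them to the VAD, and wraps the resulting list of speech segments. To make that output shape concrete, here is a toy amplitude-threshold detector in plain C. It is explicitly not the Silero model and not how whisper.cpp detects speech; `toy_segment` and `toy_detect` are hypothetical names, and segment bounds are expressed as sample indices purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one detected segment, in sample indices. */
typedef struct {
    size_t start; /* first sample of the segment, inclusive */
    size_t end;   /* one past the last sample of the segment */
} toy_segment;

/* Toy detector: a sample is "active" when its magnitude exceeds
 * `threshold`; each maximal run of active samples becomes one segment.
 * Writes at most `max_segments` entries to `out` and returns how many
 * were written. */
static size_t toy_detect(const float *samples, size_t n_samples, float threshold,
                         toy_segment *out, size_t max_segments)
{
    size_t count = 0;
    int in_segment = 0;

    assert(samples != NULL || n_samples == 0);

    for (size_t i = 0; i < n_samples; i++) {
        float magnitude = samples[i] < 0 ? -samples[i] : samples[i];
        int active = magnitude > threshold;

        if (active && !in_segment) {
            if (count == max_segments) {
                break; /* no room left for another segment */
            }
            out[count].start = i;
            in_segment = 1;
        } else if (!active && in_segment) {
            out[count].end = i;
            count++;
            in_segment = 0;
        }
    }
    if (in_segment) { /* input ended while still inside a segment */
        out[count].end = n_samples;
        count++;
    }
    return count;
}
```

A caller would pass a sample buffer and an output array, then iterate over the returned count, much as the Ruby example iterates the result of `detect` with `each_with_index`.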
