
Commit 93cfc46

Refactor inference processes & add new engines (FasterWhisper, vLLM) (#141)
* Group inference processors
* Add requirements
* Remove outdated import
* optional to py:class
* Move EstimateBandwidth to data_to_data.py
* Update link in docs
* Add missing folder level (nlp/nemo)
* Address the reviewer's comments
* Fix docs
* Fix import of RTTM processors

---------

Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com>
1 parent 9081c95 commit 93cfc46


20 files changed (+1126 additions, -104 deletions)


dataset_configs/portuguese/unlabeled/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -75,21 +75,21 @@ processors:
     output_manifest_file: ${manifest_dir}/vad
     input_manifest_arg: "manifest_filepath"
     output_manifest_arg: "output_dir"
-    cmd: 'python sdp/processors/nemo/speech_to_text_with_vad.py audio_type=wav vad_model=vad_multilingual_frame_marblenet vad_config=sdp/processors/nemo/frame_vad_infer_postprocess.yaml'
+    cmd: 'python sdp/processors/inference/asr/nemo/utils/speech_to_text_with_vad.py audio_type=wav vad_model=vad_multilingual_frame_marblenet vad_config=sdp/processors/inference/asr/nemo/utils/frame_vad_infer_postprocess.yaml'

   - _target_: sdp.processors.RenameFields
     input_manifest_file: ${manifest_dir}/vad/temp_manifest_vad_rttm-onset0.3-offset0.3-pad_onset0.2-pad_offset0.2-min_duration_on0.2-min_duration_off0.2-filter_speech_firstTrue.json
     output_manifest_file: ${manifest_dir}/manifest7.json
     rename_fields: {"audio_filepath":"source_filepath"}

-  - _target_: sdp.processors.nemo.rttm.GetRttmSegments
+  - _target_: sdp.processors.GetRttmSegments
     output_manifest_file: ${manifest_dir}/manifest8.json
     rttm_key: rttm_file
     output_file_key: audio_segments
     duration_key: duration
     duration_threshold: 20.0

-  - _target_: sdp.processors.nemo.rttm.SplitAudioFile
+  - _target_: sdp.processors.SplitAudioFile
     output_manifest_file: ${manifest_dir}/manifest9.json
     splited_audio_dir: ${workspace_dir}/splited_wavs/
     segments_key: audio_segments
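The config change above only swaps moved module and script paths. For other configs that still point at the old locations, a plain text rewrite along these lines could apply the same moves. This is a minimal sketch, assuming simple string substitution is enough; `PATH_MOVES` and `migrate_config_text` are hypothetical names, not part of SDP, and the mapping lists only the four moves visible in this diff.

```python
import re

# Hypothetical mapping: old -> new locations, taken from the config diff above.
# Extend it for any other processors moved by the refactor.
PATH_MOVES = {
    "sdp.processors.nemo.rttm.GetRttmSegments": "sdp.processors.GetRttmSegments",
    "sdp.processors.nemo.rttm.SplitAudioFile": "sdp.processors.SplitAudioFile",
    "sdp/processors/nemo/speech_to_text_with_vad.py":
        "sdp/processors/inference/asr/nemo/utils/speech_to_text_with_vad.py",
    "sdp/processors/nemo/frame_vad_infer_postprocess.yaml":
        "sdp/processors/inference/asr/nemo/utils/frame_vad_infer_postprocess.yaml",
}


def migrate_config_text(text: str, moves: dict = PATH_MOVES) -> str:
    """Rewrite old processor targets and script paths in a config string."""
    # Longest keys first so a shorter key never matches inside a longer one;
    # the (?!\w) lookahead avoids rewriting a prefix of a longer identifier.
    keys = sorted(moves, key=len, reverse=True)
    pattern = re.compile("(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)")
    # Single pass: replacements are never re-scanned.
    return pattern.sub(lambda m: moves[m.group(0)], text)
```

Running it twice leaves the text unchanged, since none of the new paths contains one of the old ones.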

docs/src/sdp/api.rst

Lines changed: 24 additions & 3 deletions
@@ -184,9 +184,6 @@ used in the downstream processing for additional enhancement or filtering.
 .. autodata:: sdp.processors.ASRTransformers
    :annotation:

-.. autodata:: sdp.processors.EstimateBandwidth
-   :annotation:
-
 .. autodata:: sdp.processors.tts.pyannote.PyAnnoteDiarizationAndOverlapDetection
    :annotation:

@@ -202,6 +199,15 @@ used in the downstream processing for additional enhancement or filtering.
 .. autodata:: sdp.processors.tts.metrics.BandwidthEstimationProcessor
    :annotation:

+.. autodata:: sdp.processors.FasterWhisperInference
+   :annotation:
+
+.. autodata:: sdp.processors.vLLMInference
+   :annotation:
+
+.. autodata:: sdp.processors.AudioLid
+   :annotation:
+
 Text-only processors
 ####################

@@ -246,6 +252,9 @@ Data modifications
 .. autodata:: sdp.processors.ListToEntries
    :annotation:

+.. autodata:: sdp.processors.EstimateBandwidth
+   :annotation:
+
 Data filtering
 ''''''''''''''

@@ -364,6 +373,18 @@ Data filtering
 .. autodata:: sdp.processors.RejectIfBanned
    :annotation:

+.. autodata:: sdp.processors.DetectWhisperHallucinationFeatures
+   :annotation:
+
+.. autodata:: sdp.processors.CleanQwenGeneration
+   :annotation:
+
+.. autodata:: sdp.processors.GetRttmSegments
+   :annotation:
+
+.. autodata:: sdp.processors.SplitAudioFile
+   :annotation:
+
 Miscellaneous
 #############

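Every entry added to `docs/src/sdp/api.rst` in this commit is documented under the short `sdp.processors.<Name>` path, which only works if `sdp/processors/__init__.py` re-exports that name. A small consistency check could extract each `.. autodata::` target and flag top-level names the package does not expose. This is a minimal sketch, not part of SDP: `autodata_targets` and `missing_top_level` are hypothetical names, and deeper paths such as `sdp.processors.tts.pyannote.PyAnnoteDiarizationAndOverlapDetection` are deliberately skipped.

```python
import re
from types import ModuleType
from typing import List

# Matches the dotted target of every ".. autodata:: <dotted.name>" line.
AUTODATA_RE = re.compile(r"^\.\.\s+autodata::\s+([\w.]+)[ \t]*$", re.MULTILINE)


def autodata_targets(rst_text: str) -> List[str]:
    """Return the dotted names referenced by autodata directives, in order."""
    return AUTODATA_RE.findall(rst_text)


def missing_top_level(rst_text: str, package: ModuleType) -> List[str]:
    """List names documented as <package>.<Name> that the package lacks.

    Targets outside the package, or nested deeper than one level, are skipped.
    """
    prefix = package.__name__ + "."
    missing = []
    for target in autodata_targets(rst_text):
        if not target.startswith(prefix):
            continue
        name = target[len(prefix):]
        if "." in name:  # not a top-level re-export; out of scope here
            continue
        if not hasattr(package, name):
            missing.append(name)
    return missing
```

With SDP installed, passing the contents of `api.rst` and the imported `sdp.processors` module would be expected to return an empty list if every top-level entry, such as `FasterWhisperInference` or `GetRttmSegments`, is re-exported.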
requirements/main.txt

Lines changed: 4 additions & 0 deletions
@@ -25,3 +25,7 @@ datasets>=2.14.0,<3.0.0
 # for some processers, additionally https://github.com/NVIDIA/NeMo is required
 # for some processers, additionally nemo_text_processing is required
 # for mcv: apt-get update && apt-get upgrade -y && apt-get install -y sox libsox-fmt-all
+# for FasterWhisperInference processor is required:
+# pip install pytorch-lightning nvidia-cublas-cu12 nvidia-cudnn-cu12==9.* faster_whisper
+# export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`
+# for vLLMInference processor is required: pip install "optree>=0.13.0" vllm
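The `export LD_LIBRARY_PATH=...` line packs a Python one-liner into backticks. The same lookup written out as a function may be easier to read and reuse, for example when building the environment of a subprocess that runs `FasterWhisperInference`. This is a minimal sketch; `nvidia_lib_path` is a hypothetical name, and it assumes, as the one-liner does, that `nvidia.cublas.lib` and `nvidia.cudnn.lib` are importable packages with an `__file__`.

```python
import importlib
import os
from typing import Optional, Sequence


def nvidia_lib_path(
    module_names: Sequence[str] = ("nvidia.cublas.lib", "nvidia.cudnn.lib"),
) -> Optional[str]:
    """Return the ":"-joined directories of the given packages, mirroring the
    export one-liner, or None if any package cannot be located."""
    dirs = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # e.g. nvidia-cudnn-cu12 is not installed
            return None
        module_file = getattr(module, "__file__", None)
        if module_file is None:  # namespace package: no file to locate
            return None
        dirs.append(os.path.dirname(module_file))
    return ":".join(dirs)
```

Like the one-liner, it replaces rather than appends to any existing `LD_LIBRARY_PATH`. Changing `os.environ` inside an already running interpreter generally does not affect how the dynamic loader resolves libraries for that same process, so the result is better passed to a child process, e.g. `subprocess.run(cmd, env={**os.environ, "LD_LIBRARY_PATH": path})`.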

sdp/processors/__init__.py

Lines changed: 12 additions & 5 deletions
@@ -81,7 +81,6 @@
 from sdp.processors.huggingface.create_initial_manifest import (
     CreateInitialManifestHuggingFace,
 )
-from sdp.processors.huggingface.speech_recognition import ASRTransformers
 from sdp.processors.modify_manifest.common import (
     AddConstantFields,
     ApplyInnerJoin,
@@ -119,6 +118,7 @@
     SubRegex,
     ListToEntries,
     LambdaExpression,
+    EstimateBandwidth,
 )
 from sdp.processors.modify_manifest.data_to_dropbool import (
     DropASRError,
@@ -141,6 +141,16 @@
 from sdp.processors.modify_manifest.make_letters_uppercase_after_period import (
     MakeLettersUppercaseAfterPeriod,
 )
+from sdp.processors.inference.asr.nemo.asr_inference import ASRInference
+from sdp.processors.inference.asr.nemo.lid_inference import AudioLid
+from sdp.processors.inference.asr.faster_whisper.faster_whisper_inference import FasterWhisperInference
+from sdp.processors.inference.asr.transformers.speech_recognition import ASRTransformers
+from sdp.processors.inference.asr.utils.whisper_hallucinations import DetectWhisperHallucinationFeatures
+from sdp.processors.inference.asr.utils.rttm import GetRttmSegments, SplitAudioFile
+from sdp.processors.inference.nlp.nemo.pc_inference import PCInference
+from sdp.processors.inference.llm.vllm.vllm import vLLMInference
+from sdp.processors.inference.llm.utils.qwen_cleaning import CleanQwenGeneration
+
 from sdp.processors.manage_files.convert_audio import (
     FfmpegConvert,
     SoxConvert,
@@ -151,10 +161,7 @@
 from sdp.processors.manage_files.remove import (
     RemoveFiles,
 )
-from sdp.processors.nemo.asr_inference import ASRInference
-from sdp.processors.nemo.estimate_bandwidth import EstimateBandwidth
-from sdp.processors.nemo.lid_inference import AudioLid
-from sdp.processors.nemo.pc_inference import PCInference
+
 from sdp.processors.toloka.accept_if import AcceptIfWERLess
 from sdp.processors.toloka.create_pool import CreateTolokaPool
 from sdp.processors.toloka.create_project import CreateTolokaProject
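Because `GetRttmSegments` and `SplitAudioFile` are now re-exported from `sdp/processors/__init__.py`, a config can name them by the short path `sdp.processors.GetRttmSegments`, independent of which submodule defines them. The sketch below shows, in simplified form, how such a dotted `_target_` string can be turned into an object: import the longest importable module prefix, then read the remaining parts as attributes. It assumes Hydra-style `_target_` handling and is only an illustration, not Hydra's or SDP's actual code; `resolve_target` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_target(dotted: str) -> Any:
    """Resolve a dotted path such as "sdp.processors.GetRttmSegments".

    Simplified: try the longest importable module prefix first, then walk the
    remaining components as attributes (raising AttributeError if one is missing).
    """
    parts = dotted.split(".")
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:  # not a module at this length; try a shorter prefix
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {dotted!r}")
```

For example, `resolve_target("os.path.join")` returns `os.path.join`. With SDP installed, `resolve_target("sdp.processors.GetRttmSegments")` would go through the package's `__init__.py` re-export, while the old `sdp.processors.nemo.rttm.GetRttmSegments` path is expected to fail once the module has moved.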
