dataset_configs/armenian/toloka/pipeline_get_final_res.yaml
1 addition & 0 deletions
@@ -6,6 +6,7 @@ documentation: |
 It processes all accepted results from the Toloka pool and prepares the data for training by refining and resampling audio files and ensuring text formatting consistency.
 
 **Stage Overview**:
+
 This stage includes the following steps:
 1. Downloading all the ACCEPTED results from the Toloka platform.
dataset_configs/armenian/toloka/pipeline_start.yaml
1 addition & 0 deletions
@@ -6,6 +6,7 @@ documentation: |
 It sets up the foundation for creating structured tasks by initializing a new Toloka project, preparing pools, and processing textual data to generate a clean and organized corpus.
 
 **Stage Overview**:
+
 This stage focuses on preparing and refining the dataset through the following steps:
sdp/processors/huggingface/speech_recognition.py
19 additions & 13 deletions
@@ -23,21 +23,27 @@
 from typing import Optional
 
 class ASRTransformers(BaseProcessor):
-    """
-    Processor to transcribe using ASR Transformers model from HuggingFace.
+    """This processor transcribes audio files using HuggingFace ASR Transformer models.
+
+    It processes audio files from the manifest and adds transcriptions using the specified
+    pre-trained model from HuggingFace.
 
     Args:
-        pretrained_model (str): name of pretrained model on HuggingFace.
-        output_text_key (str): Key to save transcription result.
-        input_audio_key (str): Key to read audio file. Defaults to "audio_filepath".
-        input_duration_key (str): Audio duration key. Defaults to "duration".
-        device (str): Inference device.
-        batch_size (int): Inference batch size. Defaults to 1.
-        chunk_length_s (int): Length of the chunks (in seconds) into which the input audio should be divided.
-            Note: Some models perform the chunking on their own (for instance, Whisper chunks into 30s segments also by maintaining the context of the previous chunks).
-        torch_dtype (str): Tensor data type. Default to "float32"
-        max_new_tokens (Optional[int]): The maximum number of new tokens to generate.
-            If not specified, there is no hard limit on the number of tokens generated, other than model-specific constraints.
+        pretrained_model (str): Name of pretrained model on HuggingFace.
+        output_text_key (str): Key to save transcription result in the manifest.
+        input_audio_key (str): Key to read audio file paths from the manifest. Default: "audio_filepath".
+        input_duration_key (str): Key for audio duration in the manifest. Default: "duration".
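
The manifest keys documented above can be illustrated with a small sketch. This is not the code of `ASRTransformers`; it only shows, under stated assumptions, how `input_audio_key`, `output_text_key`, and `batch_size` might relate to manifest entries. The helper `add_transcriptions`, the stand-in `fake_transcribe`, and the key name `pred_text` are hypothetical, and the model call is replaced by a plain function so the example runs without HuggingFace installed.

```python
from typing import Callable, Dict, List


def add_transcriptions(
    entries: List[Dict],
    transcribe_batch: Callable[[List[str]], List[str]],
    output_text_key: str,
    input_audio_key: str = "audio_filepath",
    batch_size: int = 1,
) -> List[Dict]:
    """Return copies of the manifest entries with a transcription added to each.

    Hypothetical helper: it mirrors the documented keys, not the real processor.
    """
    results = []
    for start in range(0, len(entries), batch_size):
        batch = entries[start:start + batch_size]
        # Read the audio path of every entry in the batch from the manifest.
        paths = [entry[input_audio_key] for entry in batch]
        texts = transcribe_batch(paths)
        # Store each transcription under output_text_key, leaving the input untouched.
        for entry, text in zip(batch, texts):
            results.append({**entry, output_text_key: text})
    return results


def fake_transcribe(paths: List[str]) -> List[str]:
    # Stand-in for a model call (assumption): returns one string per audio path.
    return [f"transcript of {path}" for path in paths]


manifest = [
    {"audio_filepath": "a.wav", "duration": 1.5},
    {"audio_filepath": "b.wav", "duration": 2.0},
    {"audio_filepath": "c.wav", "duration": 0.7},
]
updated = add_transcriptions(
    manifest, fake_transcribe, output_text_key="pred_text", batch_size=2
)
```

Passing the transcriber in as a callable keeps the manifest handling separate from the model, so the same loop could be exercised in a unit test without downloading weights.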