Commit ecb8f3c

examples : add stereo to mono conversion in read_audio_data (#3266)
This commit adds a conversion from stereo to mono in the `read_audio_data` function of `common-whisper.cpp`.

The motivation for this change is that prior to commit 7d3da68 ("examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that read stereo int16 data -> pcm16 (448512 samples), then converted it to mono (224256 samples), and then also converted it to stereo in `pcmf32s`. The middle step seems to have been missed when the code was rewritten to use miniaudio, which caused issues when transcribing stereo audio files.

For example, currently using the audio sample in the linked issue the output is:
```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```

And with the change in this commit the output is:
```
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```

Resolves: #3092
1 parent 2f60ebc commit ecb8f3c

1 file changed: +14 -7 lines changed

examples/common-whisper.cpp

Lines changed: 14 additions & 7 deletions
@@ -112,13 +112,20 @@ bool read_audio_data(const std::string & fname, std::vector<float>& pcmf32, std:
     }
 
     if (stereo) {
-        pcmf32s.resize(2);
-        pcmf32s[0].resize(frame_count);
-        pcmf32s[1].resize(frame_count);
-        for (uint64_t i = 0; i < frame_count; i++) {
-            pcmf32s[0][i] = pcmf32[2*i];
-            pcmf32s[1][i] = pcmf32[2*i + 1];
-        }
+        std::vector<float> stereo_data = pcmf32;
+        pcmf32.resize(frame_count);
+
+        for (uint64_t i = 0; i < frame_count; i++) {
+            pcmf32[i] = (stereo_data[2*i] + stereo_data[2*i + 1]);
+        }
+
+        pcmf32s.resize(2);
+        pcmf32s[0].resize(frame_count);
+        pcmf32s[1].resize(frame_count);
+        for (uint64_t i = 0; i < frame_count; i++) {
+            pcmf32s[0][i] = stereo_data[2*i];
+            pcmf32s[1][i] = stereo_data[2*i + 1];
+        }
     }
 
     ma_decoder_uninit(&decoder);
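
For reference, the stereo branch can be read as a small standalone routine. The following is only a sketch of that logic, not code from the repository: `SplitAudio` and `split_interleaved_stereo` are hypothetical names, and the input is assumed to be interleaved float samples (L0 R0 L1 R1 ...). It keeps the plain channel sum used in the diff; scaling the sum by 0.5 to average the channels is a common alternative that this commit does not apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of whisper.cpp: splits interleaved stereo
// samples (L0 R0 L1 R1 ...) into a mono mix and two per-channel buffers.
struct SplitAudio {
    std::vector<float> mono;                 // one sample per frame
    std::vector<std::vector<float>> stereo;  // stereo[0] = left, stereo[1] = right
};

SplitAudio split_interleaved_stereo(const std::vector<float> & interleaved) {
    assert(interleaved.size() % 2 == 0 && "expected an even number of samples");
    const std::size_t frame_count = interleaved.size() / 2;

    SplitAudio out;
    out.mono.resize(frame_count);
    out.stereo.assign(2, std::vector<float>(frame_count));

    for (std::size_t i = 0; i < frame_count; i++) {
        const float left  = interleaved[2*i];
        const float right = interleaved[2*i + 1];
        out.mono[i]      = left + right;  // same sum as the diff, no 0.5 scaling
        out.stereo[0][i] = left;
        out.stereo[1][i] = right;
    }
    return out;
}
```

With the sample counts from the commit message, an input of 448512 interleaved samples would yield 224256 mono samples and 224256 samples per channel.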
