Description
From googleapis/google-cloud-python#13405, the response to streaming_synthesize is headerless LINEAR16 audio with a sample rate of 24000 Hz. The code sample below prints the size of the audio content but does not add the WAV header needed to actually play the audio.
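Those format parameters fully determine a canonical 44-byte WAV header, so its bytes can be derived rather than copied as a magic string. A minimal sketch, assuming mono output (the channel count is an assumption; LINEAR16 is 16-bit signed little-endian PCM). `wav_header` is a hypothetical helper name, not part of the client library:

```python
import struct


def wav_header(data_size: int, sample_rate: int = 24000,
               channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a 44-byte PCM WAV header (hypothetical helper; mono by default)."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second of audio
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",      # little-endian, no padding: 44 bytes total
        b"RIFF", 36 + data_size,   # RIFF chunk size = total file size - 8
        b"WAVE",
        b"fmt ", 16,               # fmt chunk size for plain PCM
        1,                         # audio format 1 = uncompressed PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b"data", data_size,        # size of the raw PCM payload
    )
```

Once streaming has finished and the payload length is known, writing `wav_header(total_length)` at offset 0 fills both size fields with a single seek.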
python-docs-samples/texttospeech/snippets/streaming_tts_quickstart.py
Lines 46 to 48 in 5e8e178
streaming_responses = client.streaming_synthesize(itertools.chain([config_request], request_generator()))
for response in streaming_responses:
    print(f"Audio content size in bytes is: {len(response.audio_content)}")
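Because the stream is raw PCM, the printed byte count maps directly to playback time, which is a quick sanity check when debugging. Assuming mono 16-bit audio at 24000 Hz, one second is 24000 × 2 = 48000 bytes. A sketch; `pcm_duration_seconds` is a hypothetical helper:

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Convert a raw LINEAR16 byte count to seconds of audio (hypothetical helper)."""
    bytes_per_second = sample_rate * channels * bytes_per_sample  # 48000 by default
    return num_bytes / bytes_per_second


# Inside the `for response in streaming_responses` loop one could print:
# print(f"Chunk duration: {pcm_duration_seconds(len(response.audio_content)):.3f} s")
print(pcm_duration_seconds(96000))  # → 2.0
```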
This may not be the purpose of the code sample; however, including this extra information in the sample would help with debugging customer issues such as googleapis/google-cloud-python#13405.
I added code that writes a raw WAV header, but there is likely an easier way to achieve this. We should provide guidance on how to create the audio header.
import struct

# Raw 44-byte WAV header for mono 16-bit PCM (LINEAR16) at 24000 Hz, based on the spec at https://docs.fileformat.com/audio/wav/
# The two 32-bit size fields (positions 4-7 and 40-43) are zero placeholders, patched after streaming.
header = b'RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\xc0]\x00\x00\x80\xbb\x00\x00\x02\x00\x10\x00data\x00\x00\x00\x00'
total_length = 0
with open("output.wav", "wb") as out:
    out.write(header)
    for response in streaming_responses:
        # Track the number of raw PCM bytes written
        total_length += len(response.audio_content)
        out.write(response.audio_content)
    # Position 40-43: size of the data section (little-endian 32-bit integer)
    out.seek(40)
    out.write(struct.pack("<I", total_length))
    # Position 4-7: size of the overall file minus 8 bytes (little-endian 32-bit integer).
    # The file is 44 header bytes plus total_length, so this field is 36 + total_length.
    out.seek(4)
    out.write(struct.pack("<I", 36 + total_length))
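A possibly simpler option is Python's standard-library `wave` module, which writes the RIFF/fmt header itself and patches both size fields as frames are written and when the file is closed. A minimal sketch, assuming mono 16-bit output at 24000 Hz; `save_linear16` is a hypothetical helper, and `chunks` stands in for the `response.audio_content` values returned by `streaming_synthesize`:

```python
import wave
from typing import Iterable


def save_linear16(path: str, chunks: Iterable[bytes], sample_rate: int = 24000) -> None:
    """Write headerless LINEAR16 chunks to a playable WAV file (hypothetical helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumed)
        wav_file.setsampwidth(2)            # LINEAR16 = 2 bytes per sample
        wav_file.setframerate(sample_rate)  # 24000 Hz per the issue
        for chunk in chunks:
            # writeframes appends raw PCM; the header sizes are fixed up automatically
            wav_file.writeframes(chunk)


# Usage with the streaming sample:
# save_linear16("output.wav", (r.audio_content for r in streaming_responses))
```

This avoids hand-maintained byte offsets entirely, at the cost of relying on `wave` only supporting uncompressed PCM, which matches LINEAR16.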