I want to implement an avatar in a streaming setup, where I receive audio in chunks every 0.5 to 0.8 seconds without knowing the total length in advance. Do you have any suggestions on how to implement this? I am referring to stream_pipeline_online.py.
```python
SDK.setup(source_path, output_path, **setup_kwargs)
audio, sr = librosa.core.load(audio_path, sr=16000)
num_f = math.ceil(len(audio) / 16000 * 25)
fade_in = run_kwargs.get("fade_in", -1)
fade_out = run_kwargs.get("fade_out", -1)
ctrl_info = run_kwargs.get("ctrl_info", {})
SDK.setup_Nd(N_d=num_f, fade_in=fade_in, fade_out=fade_out, ctrl_info=ctrl_info)

online_mode = SDK.online_mode
if online_mode:
    chunksize = run_kwargs.get("chunksize", (3, 5, 2))
    audio = np.concatenate([np.zeros((chunksize[0] * 640,), dtype=np.float32), audio], 0)
    split_len = int(sum(chunksize) * 0.04 * 16000) + 80  # 6480
    for i in range(0, len(audio), chunksize[1] * 640):
        audio_chunk = audio[i:i + split_len]
        if len(audio_chunk) < split_len:
            audio_chunk = np.pad(audio_chunk, (0, split_len - len(audio_chunk)), mode="constant")
        SDK.run_chunk(audio_chunk, chunksize)
```
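For reference, here is how the window and hop sizes in that loop work out with the default `chunksize = (3, 5, 2)`, where each unit appears to be one 40 ms frame at 16 kHz (640 samples):

```python
# Window/hop arithmetic for the chunked loop above.
chunksize = (3, 5, 2)                 # (left context, new frames, right context)
frame_samples = int(0.04 * 16000)     # 640 samples per 40 ms frame
split_len = int(sum(chunksize) * 0.04 * 16000) + 80  # window passed to run_chunk
hop = chunksize[1] * frame_samples    # samples the loop advances per iteration
print(split_len, hop)                 # 6480 3200
```

So each `run_chunk` call sees a 6480-sample window, but the loop only advances by 3200 samples (200 ms), meaning consecutive windows overlap by the context frames.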
In the current code, sending audio in chunks requires knowing the total duration up front (`num_f` is computed from the full file before `setup_Nd` is called). In my real use case, audio chunks arrive one by one with no information about the complete audio in advance. How can I implement avatar streaming in this scenario so that generation happens in real time?
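One possible direction (this is only a sketch, not the SDK's documented API): decouple arrival from processing by accumulating incoming samples in a buffer and calling `run_chunk` whenever a full window is available, sliding forward by the hop size exactly as the offline loop does. The `StreamingFeeder` class and `FakeSDK`-style usage below are my own invention; only `run_chunk(audio_chunk, chunksize)` and the 640-samples-per-frame layout are taken from stream_pipeline_online.py, and the real SDK may impose additional constraints.

```python
import numpy as np

class StreamingFeeder:
    """Hypothetical adapter: buffers arbitrarily sized incoming audio chunks
    and emits fixed-size windows to SDK.run_chunk, mirroring the offline loop."""

    def __init__(self, sdk, chunksize=(3, 5, 2), sr=16000):
        self.sdk = sdk
        self.chunksize = chunksize
        frame = int(0.04 * sr)                       # 640 samples per 40 ms frame
        self.split_len = sum(chunksize) * frame + 80 # window size fed to run_chunk
        self.hop = chunksize[1] * frame              # samples consumed per window
        # Pre-pad with silence for the left context, as the offline loop does.
        self.buf = np.zeros((chunksize[0] * frame,), dtype=np.float32)

    def feed(self, samples):
        """Append newly received samples; process every complete window."""
        self.buf = np.concatenate([self.buf, samples.astype(np.float32)])
        while len(self.buf) >= self.split_len:
            self.sdk.run_chunk(self.buf[:self.split_len], self.chunksize)
            self.buf = self.buf[self.hop:]           # slide forward by one hop

    def flush(self):
        """Call when the stream ends: zero-pad the tail and drain the buffer."""
        while len(self.buf) > 0:
            window = self.buf[:self.split_len]
            if len(window) < self.split_len:
                window = np.pad(window, (0, self.split_len - len(window)),
                                mode="constant")
            self.sdk.run_chunk(window, self.chunksize)
            self.buf = self.buf[self.hop:]
```

The open question this does not solve is `setup_Nd(N_d=num_f, ...)`, which still expects a frame count before any audio arrives; whether it can be given a large upper bound, re-invoked per segment, or skipped in a true online mode is something I would need the maintainers to confirm.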