libobs/audio-monitoring: Add audio sync feature for audio-only playback #12974
base: master
Conversation
PatTheMav
left a comment
At first glance I don't see enough meaningful differences in the effective code to justify the existence of two separate functions for this, vs. enhancing the current function to check for an associated video object directly (and then use a different set of calculations for the timestamp difference).
Also what is the root cause for the delay without an associated video source?
Many factors could be causing the delay, but I suspect system lag under heavy workloads and audio driver latency are the primary reasons. Since this issue is difficult to trigger and track, I’ll need more time to investigate it further.
Much appreciated. Apart from the code duplication I mentioned, I can see how the added code would do the right thing, but I'm also trying to understand why the original authors didn't think it was worth adding re-synchronisation to audio sources untethered to a video stream. Maybe they simply didn't consider the situation probable (or possible), in which case it'd be a simple oversight; or the situation is indeed "not supposed to happen" and the fact that it does is indicative of a larger issue.
The current synchronization logic is fundamentally limited by its trigger condition.

High workload: in scenarios like MSFS under heavy load, the audio frames are pushed into the WASAPI endpoint buffer as soon as they arrive.

Manual sync: when adjusting sync manually, "waiting" logs appear first.

The solution? The current dropping mechanism cannot reach frames once they are already in the WASAPI endpoint buffer. To fix this, we need a mechanism that handles synchronization when the bottleneck is the endpoint buffer itself.
Debugging code: I am using the following to track the relationship between our local buffer and the WASAPI padding:

```c
static void debug_buffer_stats(struct audio_monitor *monitor, UINT32 wasapi_pad)
{
	static uint64_t last_log = 0;
	static uint32_t max_pad = 0;
	static size_t max_delay_buf = 0;
	uint64_t now = os_gettime_ns();

	if (wasapi_pad > max_pad)
		max_pad = wasapi_pad;
	if (monitor->delay_buffer.size > max_delay_buf)
		max_delay_buf = monitor->delay_buffer.size;

	/* log at most once every 600 s */
	if (now - last_log >= 600000000000ULL) {
		blog(LOG_INFO,
		     "[AudioMon] delay_buf: %zu bytes (max: %zu), "
		     "WASAPI pad: %u (max: %u), rate: %u",
		     monitor->delay_buffer.size, max_delay_buf,
		     wasapi_pad, max_pad, monitor->sample_rate);
		last_log = now;
		max_pad = 0;
		max_delay_buf = 0;
	}
}
```

Since these solutions involve resetting the audio stream or changing timing, they might be considered breaking changes. How should we proceed with implementing a "pad-aware" sync? I'm looking forward to your opinion, thanks. I think only through those two sync mechanisms can the lag be kept to an acceptable level. For now I set the maximum delay to 10 ms (delay_buffer) + 10 ms (WASAPI) and am testing more extreme cases.
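The "pad-aware" decision itself can be reduced to a small pure function. This is only a sketch of the idea under discussion; `pad_to_ns`, `should_trim_local_buffer`, and the 10 ms threshold are illustrative names and values from this thread, not existing OBS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 10 ms threshold from this PR's tuning discussion. */
#define PAD_SYNC_THRESHOLD_NS 10000000ULL

/* Convert the WASAPI padding (frames still queued in the endpoint
 * buffer) into nanoseconds of queued audio. */
static uint64_t pad_to_ns(uint32_t pad_frames, uint32_t sample_rate)
{
	return (uint64_t)pad_frames * 1000000000ULL / sample_rate;
}

/* Decide whether the monitor should skip data from its local delay
 * buffer: only when the endpoint buffer alone already holds more than
 * the threshold and there is local data left to skip. */
static bool should_trim_local_buffer(uint32_t pad_frames, uint32_t sample_rate,
				     size_t delay_buffer_size)
{
	return pad_to_ns(pad_frames, sample_rate) > PAD_SYNC_THRESHOLD_NS &&
	       delay_buffer_size > 0;
}
```

The real implementation would feed `pad_frames` from `IAudioClient::GetCurrentPadding` on each render pass.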
Force-pushed 41ea4ab to 7447a1b
@norihiro Would like to hear your opinion on this: you've looked into borrowing concepts from signal theory to fix similar stuff in the past, is there something we can do to address this? Given that there is no way to handle this scenario currently, we have a "blank canvas" to come up with a solution, so to speak; if there is a smarter way to solve this (rather than the somewhat brutish way OBS handles this in other places), it'd be a great opportunity to do so.
So the root cause of the problem seems to be that the output device potentially lags behind and is not able to play back audio at the rate that OBS creates new audio samples, and thus the "padding" value in the endpoint buffer increases over time? And if the monitored source has no video component, OBS simply ignores this padding value entirely and pushes more and more data into the endpoint buffer. By your description it seems that we got lucky so far that the endpoint buffer always had enough capacity to accept additional samples despite the growing padding (otherwise OBS would have effectively "reset" the output client).

When a video source is monitored (and audio is not detached), OBS would indeed calculate a theoretical absolute nanosecond timestamp for the "oldest" remaining audio in the endpoint buffer (effectively the number of buffered frames multiplied by the duration of a sample at the current sampling rate) and skip rendering either because the oldest remaining sample is more than 75 ms ahead of the last video frame timestamp, or because that same sample is more than 75 ms behind the last video frame timestamp (and buffered audio data is waiting for playback). Only when the remaining data in the endpoint buffer is within 75 ms of the last video frame timestamp does OBS simply push the audio data into it, but audio data is always buffered internally.

At least from my naive POV I agree that we should always consider the padding in the endpoint buffer (not only when we have a video frame timestamp associated with the audio data) and thus detect whether we produce audio quicker than the audio device seems to be able to consume it. With an associated video source, its frame times are obvious points with which audio data needs to be synchronised; without those, the audio data itself becomes the sync points, and the padding seems to be the best indication of how far "out of sync" the output becomes.
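The decision logic described above can be sketched as a pure function. The function and enum names here are hypothetical, but the ±75 ms window matches the behaviour described:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYNC_WINDOW_NS 75000000LL /* the ~75 ms window described above */

enum monitor_action { MONITOR_PLAY, MONITOR_WAIT, MONITOR_SKIP };

/* Compare the timestamp of the oldest audio still queued for playback
 * against the last video frame timestamp (both in nanoseconds). */
static enum monitor_action classify_audio(int64_t oldest_audio_ts,
					  int64_t last_video_ts,
					  bool have_buffered_audio)
{
	int64_t diff = oldest_audio_ts - last_video_ts;

	if (diff > SYNC_WINDOW_NS)
		return MONITOR_WAIT; /* audio ahead of video: hold it back */
	if (diff < -SYNC_WINDOW_NS && have_buffered_audio)
		return MONITOR_SKIP; /* audio behind video: drop to catch up */
	return MONITOR_PLAY;
}
```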
Looking at the current code, the maximum endpoint buffer length is set to 10,000,000 × 100 ns = 1000 ms = 1 s (for reference on why 100 ns is used as the base unit, see the Microsoft documentation). This is an exceptionally long delay, so much so that I suspect it might be a mistake. In scenarios where I need real-time audio monitoring (such as in-ear monitoring), this accumulated latency becomes quite uncomfortable to work with. Moreover, since we cannot directly control the endpoint buffer, this also means the video sync-related delay could take up to 1 second before a forced reset is triggered.

Based on my perceptual testing, I believe a maximum internal delay of 30 ms (20 ms for the endpoint buffer + 10 ms for OBS's internal buffer) would be a much better choice. Additionally, my tests suggest that simply resetting the buffer when the WASAPI endpoint buffer is relatively small doesn't cause any noticeable perceptual impact, and I've made the syncing also work for audio-only sources for anyone who needs it (I think that is the more appropriate behaviour).

BTW, I greatly appreciate anyone reviewing this PR and giving their opinion, including yourself. I think this PR is good to go now.
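For reference, the 100 ns unit arithmetic above works out as follows. `REFTIMES_PER_SEC` is the conventional name from Microsoft's WASAPI samples; `ms_to_reftime` is an illustrative helper, not OBS code:

```c
#include <assert.h>
#include <stdint.h>

/* 100 ns units per second, as in Microsoft's WASAPI samples. */
#define REFTIMES_PER_SEC 10000000LL

/* Convert a buffer duration in milliseconds into the 100 ns units
 * that IAudioClient::Initialize expects. */
static int64_t ms_to_reftime(int64_t ms)
{
	return ms * REFTIMES_PER_SEC / 1000;
}
```

So the current code requests a full second (10,000,000 units), while a 20 ms endpoint buffer would be 200,000 units.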
Force-pushed 7447a1b to 79194d9
This commit resolves audio lag that occurs when using audio monitor for extended periods on sources without video.
Force-pushed 79194d9 to 66f429f
Considering there may be cases where a video source couldn't feed an audio frame within 10 ms, I increased the WASAPI buffer to 50 ms as a trade-off. I'll continue testing related behaviour: investigating video source feed latency, examining whether video and audio-only sources need to be decoupled (they have really different latencies), and whether to make the WASAPI buffer size dynamic.
I assume WASAPI and OBS have different clocks; hence, in my opinion, we need to resample the audio. For the Decklink output, we had an experiment that dynamically adjusted Decklink's clock based on the offset between OBS and the hardware. Regarding the difficulty of triggering and tracking this, I want to recommend plotting the buffer size and some other metrics on a graph.
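A minimal sketch of that idea, assuming a hypothetical `adjust_resample_ratio` helper: nudge the resampler ratio proportionally to the measured clock offset, spread over a correction period, instead of dropping samples outright.

```c
#include <assert.h>
#include <stdint.h>

/* Nudge the resampler ratio proportionally to the measured offset
 * between OBS's clock and the device clock; offset_ns > 0 means the
 * device is falling behind, so output slightly fewer samples per input
 * sample. The correction is spread over correction_period_ns. */
static double adjust_resample_ratio(double ratio, int64_t offset_ns,
				    int64_t correction_period_ns)
{
	double adjust = (double)offset_ns / (double)correction_period_ns;
	double next = ratio * (1.0 - adjust);

	/* Clamp to +/-0.5% so the resulting pitch shift stays inaudible. */
	if (next > 1.005)
		next = 1.005;
	if (next < 0.995)
		next = 0.995;
	return next;
}
```

The clamp bounds are illustrative; the right limit would need listening tests.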
Yep, I just added the debug plots and outputs as we discussed earlier. In my opinion your ideas are quite good (I hadn't noticed anyone else attempting these fixes before). I actually considered implementing a similar solution earlier but decided against it due to the complexity; for now, I'm using a simpler approach. I will try to test the performance and approach of your PR on this particular scene, and I'm glad to assist if you need to adapt it for audio monitoring.
That PR came to my mind as well, which is why I pinged you. Without having looked at it in too much detail, is there a possibility for OBS to do this work at a higher level than the platform-specific monitoring implementations? It would be great if we could implement this just once in OBS rather than 3 times (for Pulse, WASAPI and CoreAudio). |
pkviet
left a comment
Thanks for your PR. My first reaction to this PR was the same as @norihiro that the better fix would be to resample instead of just dropping samples in order to preserve the audio integrity. Also it feels kind of risky to allow just a 10 ms lag so about 512 samples at 48 kHz. I would really be hesitant to merge the PR without extensive tests across a variety of monitoring devices to ensure audio quality is still fine. The monitoring sub-system is sometimes abused to provide an extra output instead of just monitoring so I'd be extra careful. IMO, I'd really prefer a dynamic resampling as done by FFmpeg aresample filter for instance, which can do a dynamic compensation: https://ffmpeg.org/ffmpeg-filters.html#Examples-16
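For illustration, the gradual compensation that aresample performs can be approximated by bounding how many samples of drift get corrected per output block; the names here are illustrative, not FFmpeg internals.

```c
#include <assert.h>
#include <stdint.h>

/* Bound how many samples of drift are corrected per output block so the
 * effective speed change stays gradual; drift_samples is how far the
 * output clock has run ahead (positive) or behind (negative). */
static int compensation_samples(int64_t drift_samples, int max_comp_per_block)
{
	if (drift_samples > max_comp_per_block)
		return max_comp_per_block;
	if (drift_samples < -max_comp_per_block)
		return -max_comp_per_block;
	return (int)drift_samples;
}
```

Stretching or squeezing each block by the returned amount corrects the offset over many blocks instead of dropping a whole chunk at once.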
```c
#define ACTUALLY_DEFINE_GUID(name, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8) \
	EXTERN_C const GUID DECLSPEC_SELECTANY name = {l, w1, w2, {b1, b2, b3, b4, b5, b6, b7, b8}}
```

clang-format issue? This should not belong to this PR.
```c
#define do_log(level, format, ...) \
	blog(level, "[audio monitoring: '%s'] " format, obs_source_get_name(monitor->source), ##__VA_ARGS__)
```

clang-format issue
```c
blog(LOG_INFO,
     "diff: %lld, delay buffer size: %lu, "
     "v: %llu: a: %llu",
     diff, (int)monitor->delay_buffer.size, last_frame_ts, front_ts);
```

clang-format issue
```c
blog(LOG_INFO,
     "diff: %lld, delay buffer size: %lu, "
     "v: %llu: a: %llu",
     diff, (int)monitor->delay_buffer.size, last_frame_ts, front_ts);
```

clang-format issue
```c
/* ------------------------------------------ *
 * Init device */
```

clang-format issue
```c
/* ------------------------------------------ *
 * Init client */
```

clang-format issue
```c
/* ------------------------------------------ *
 * Init resampler */
```

clang-format issue
```c
/* ------------------------------------------ *
 * Init client */
```

clang-format issue
```c
	} else {
		diff = (int64_t)front_ts - (int64_t)cur_time;
		if (diff > 10000000) {
```

How did you come to that value of 10 ms? (Our audio tick is 1024 samples, so at 48 kHz it is 21.3 ms; I am assuming you picked roughly the middle?) Did you try other values, larger or smaller?
```c
			continue;
		}
	} else {
		if (diff < -10000000 && monitor->delay_buffer.size > 0) {
```

This is where samples are dropped; how did you reach that value of 10 ms? It might make drops more frequent and really affect audio quality for the benefit of better sync. I am not convinced that is a good trade-off (more on that later).
I am pretty sure this was intentional at the time the audio sub-system was implemented and not a mistake. @Lain-B
This commit resolves audio lag that occurs when using audio monitor for extended periods on sources without video.
Description
Fix #12973
Motivation and Context
See issue #12973 for details.
How Has This Been Tested?
Left the monitor running while AFK for an extended period and checked whether audio lag develops.
Types of changes
Bug fix (non-breaking change which fixes an issue)
Checklist: