fix(voice): advance RTP timestamp across silence gaps #1695

Open
Stieneee wants to merge 1 commit into bwmarrin:master from Stieneee:fix/voice-rtp-silence-gap

@Stieneee

Summary

  • Advance the RTP timestamp across silence gaps so it stays aligned with wall-clock time
  • Set the RTP marker bit (RFC 3551) on the first packet after a gap to signal a new talk-spurt

Problem

opusSender blocks on the opus channel while waiting for audio data. During silence the send ticker keeps firing and its ticks are dropped, but the RTP timestamp advances only when a packet is actually sent (by 960 samples per 20 ms frame, i.e. a 48 kHz clock). After a silence gap, the next packet therefore carries a timestamp that is far behind wall-clock time, which makes the receiver treat it as having arrived extremely late.

Over many speaking/silence cycles the receiver's adaptive jitter buffer grows monotonically. In our voice bridging application we observed Mumble-to-Discord latency growing to 2+ seconds after approximately one hour of use with frequent speaking pauses. After this change, audio latency remained consistent with no observable growth over extended sessions.

Changes

When opusSender receives opus data after blocking for longer than 40 ms (2× the 20 ms frame duration):

  1. Compute how many samples' worth of silence elapsed
  2. Round that count to a frame boundary and advance the timestamp by it
  3. Set the marker bit (0x80 in the second byte of the RTP header) to signal the start of a new talk-spurt
  4. Normal (non-gap) packets clear the marker bit

Test plan

  • Verified with Prometheus metrics that RTP timestamp drift stays near zero over multi-hour sessions
  • Confirmed speaking transitions and silence gap durations are tracked correctly
  • Tested with live Mumble-to-Discord bridge — no audible artifacts from timestamp jumps
  • No data races (go test -race)

…er buffer growth

The opusSender loop blocks on the opus channel waiting for data.
During silence, no data arrives and the RTP timestamp freezes because
it only increments when a packet is actually sent. When speech resumes,
the first packet carries a timestamp that implies it was generated
20ms after the last packet — but seconds or minutes may have passed.

Receivers (e.g. Discord) interpret this as an extremely late packet and
may grow their adaptive jitter buffer. Over many speaking/silence
cycles (typical in voice chat), this causes audio playout delay to
accumulate monotonically. In our voice bridging application we observed
Mumble-to-Discord latency growing to 2+ seconds after approximately
one hour of use with frequent speaking pauses. After this change, audio
latency remained consistent with no observable growth over extended
sessions.

Fix: measure how long we blocked waiting for opus data. If it exceeds
40ms (2× the normal 20ms send interval), treat it as a silence gap:

  1. Advance the RTP timestamp by the gap duration (rounded to frame
     boundary) so it stays aligned with wall-clock time.
  2. Set the RTP marker bit (RFC 3551) on the first post-silence
     packet to signal a new "talk-spurt", allowing the receiver to
     reset its jitter buffer timing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>