
Conversation


youngSSS commented Nov 4, 2025

Abstract

  • Detects large RTP sequence jumps (>1000) and timestamp jumps (>30s) to identify stream discontinuities
  • Automatically resets jitter buffer state when discontinuities occur, preventing invalid buffer states
  • Tracks previous timestamp (prevTS) to enable timestamp-based discontinuity detection
  • Logs warnings when discontinuities are detected for operational visibility

Problem

The jitter buffer could get stuck in an invalid state after large gaps (e.g., stream source switches, network reconnections), causing choppy playback and incorrect packet handling.

Solution

Added automatic detection and reset (a minimal sketch follows this list):

  • Detection: Check sequence/timestamp jumps with wraparound handling
  • Reset: Clear buffer, reinitialize state, continue processing new packets
  • Logging: Warn on discontinuities with current/previous values
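
A minimal, self-contained sketch of the detection described above (the >1000-packet and 30 s thresholds come from this description; the helper names and the shortest-circular-distance wraparound handling are illustrative assumptions, not the exact code in jitter/buffer.go):

package main

import "fmt"

const (
	maxSequenceJump  = 1000       // packets, per the description above
	maxTimestampJump = 48000 * 30 // 30 s expressed in RTP ticks at an assumed 48 kHz clock
)

// shortestSeqDistance returns the shorter circular distance between two
// 16-bit RTP sequence numbers, so ordinary wraparound (65535 -> 0) is not
// mistaken for a discontinuity.
func shortestSeqDistance(a, b uint16) uint16 {
	d := a - b // uint16 arithmetic wraps modulo 2^16
	if d > 1<<15 {
		d = b - a
	}
	return d
}

func isLargeSequenceJump(current, prev uint16) bool {
	return shortestSeqDistance(current, prev) > maxSequenceJump
}

func isLargeTimestampJump(current, prev uint32) bool {
	d := current - prev // uint32 arithmetic wraps modulo 2^32
	if d > 1<<31 {
		d = prev - current
	}
	return d > maxTimestampJump
}

func main() {
	fmt.Println(isLargeSequenceJump(5000, 100)) // true: 4900-packet gap, likely a restart
	fmt.Println(isLargeSequenceJump(3, 65533))  // false: normal 16-bit wraparound
	fmt.Println(isLargeTimestampJump(1_990_786_961, 1_990_641_041)) // false: 145920 ticks ≈ 3 s at 48 kHz
}

When either check fires, the reset path clears all buffered packets and reinitializes sequence/timestamp tracking from the incoming packet, as described above.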

Testing

  • TestLargeSequenceJump: Verifies recovery after sequence jump
  • TestLargeTimestampJump: Verifies recovery after timestamp jump
  • TestSequenceWraparound: Ensures normal wraparound still works (a hypothetical test sketch against the detection helpers sketched above follows this list)
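
For illustration, a hypothetical table-driven test against the sketch helpers above (the PR's actual tests exercise the buffer itself, whose setup is not shown in this thread):

package main

import "testing"

func TestSequenceJumpSketch(t *testing.T) {
	cases := []struct {
		name          string
		current, prev uint16
		want          bool
	}{
		{"normal wraparound", 3, 65533, false},
		{"small reorder", 99, 102, false},
		{"large forward jump", 5000, 100, true},
		{"large backward jump", 100, 5000, true},
	}
	for _, c := range cases {
		if got := isLargeSequenceJump(c.current, c.prev); got != c.want {
			t.Errorf("%s: isLargeSequenceJump(%d, %d) = %v, want %v",
				c.name, c.current, c.prev, got, c.want)
		}
	}
}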

Impact

  • ✅ Backward compatible (no API changes)
  • ✅ Minimal performance overhead
  • ✅ Improves stream resilience and recovery

youngSSS requested review from a team as code owners on November 4, 2025, 11:27

CLAassistant commented Nov 4, 2025

CLA assistant check
All committers have signed the CLA.

This commit improves the resilience of the jitter buffer by automatically
detecting and handling large RTP discontinuities in both sequence numbers
and timestamps.

New features:
- Detect large sequence number jumps (>1000) and timestamp jumps (>30s)
- Automatically reset buffer state when discontinuities are detected
- Add warning logs for operational visibility when jumps occur
- Track previous timestamp to enable timestamp discontinuity detection

Bug fixes:
- Prevent buffer from staying in invalid state after large gaps
- Improve packet expiration handling after discontinuities to reduce
  premature drops and delayed emission

The reset mechanism clears all buffered packets and reinitializes the
buffer state, ensuring clean recovery from stream interruptions.

Tests:
- Add TestLargeSequenceJump to verify sequence discontinuity handling
- Add TestLargeTimestampJump to verify timestamp discontinuity handling
- Add TestSequenceWraparound to ensure wraparound boundaries work correctly

alexlivekit (Contributor) left a comment

lgtm!

jitter/buffer.go Outdated
}

func (b *Buffer) isLargeTimestampJump(current, prev uint32) bool {
	const MAX_TIMESTAMP_JUMP = 48000 * 30 // 30 seconds at 48kHz

Contributor:

just a note: the same buffer is also used for video packets - for which we have a clock rate of 90 kHz - so the comment isn't entirely accurate.
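
One way to address this would be to derive the threshold from the stream's clock rate instead of hardcoding 48 kHz. A minimal sketch, assuming the clock rate is available as a parameter (the 30 s window is kept from the PR; the signature is not the buffer's actual API):

func isLargeTimestampJump(current, prev, clockRate uint32) bool {
	maxJump := clockRate * 30 // 30 s expressed in RTP ticks for this stream's clock
	diff := current - prev    // uint32 arithmetic wraps, handling timestamp rollover
	if diff > 1<<31 {
		diff = prev - current
	}
	return diff > maxJump
}

The same tick gap then maps to different wall-clock durations: a gap of 1,800,000 ticks is 20 s of 90 kHz video (no reset) but 37.5 s of 48 kHz audio (reset).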

Contributor:

How do we handle camera mute / no-update screenshare / etc.? IIRC it's entirely possible to go seconds without any RTP packet if the screenshare stays static.

b.logger.Infow("resetting jitter buffer due to RTP discontinuity")

for b.head != nil {
	next := b.head.next

Contributor:

At this point packets are dropped - it might be good to update the PacketsDropped stats field and invoke the packet loss callback, which an API consumer might use to hook up e.g. sending a packet loss indicator.
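
For concreteness, the suggestion might look roughly like this inside the reset loop (a sketch; the stats field and callback names follow the updated excerpt further down in this thread, the rest is assumed):

dropped := 0
for b.head != nil {
	next := b.head.next
	// release b.head here as the buffer normally does when evicting a packet
	b.head = next
	dropped++
}
b.tail = nil // assumption: the buffer also tracks a tail pointer
if dropped > 0 {
	b.stats.PacketsDropped += uint64(dropped)
	if b.onPacketLoss != nil {
		b.onPacketLoss() // lets the consumer react, e.g. send a packet loss indicator
	}
}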

jitter/buffer.go Outdated
}

func (b *Buffer) isLargeSequenceJump(current, prev uint16) bool {
	const MAX_SEQUENCE_JUMP = 1000

Contributor:

High-bitrate video key frames can span hundreds of packets - this specific value would make the buffer too sensitive even for relatively small time gaps.


milos-lk commented Nov 4, 2025

The jitter buffer could get stuck in an invalid state after large gaps (e.g., stream source switches, network reconnections), causing choppy playback and incorrect packet handling.

Could you describe what you mean by invalid state? Packet loss is to be expected under unstable network conditions; based on the description provided I am not entirely sure what the sequence of events is and how exactly it leads to that invalid state - would appreciate a few more details 🙏

milos-lk (Contributor) left a comment

Hardcoded values are too rigid and could cause false-positive state resets (esp. for video).
Would also like to get a better understanding of the issue we are trying to fix.


youngSSS commented Nov 5, 2025

Thank you for the thorough review! Based on your concerns about hardcoded thresholds causing false positives, especially for video streams, I've reconsidered the approach.

Problem Description

When clients experience intermittent network issues, CPU starvation, or SDK problems, packet transmission can be interrupted for several seconds. Upon recovery, sequence numbers and timestamps show large gaps. The jitter buffer, designed for normal packet reordering (withinRange() up to 3000 packets), continues waiting for packets that will never arrive, causing the following issues:

  • The existing withinRange() (3000 packets) is too permissive for detecting stream restarts, leaving the buffer in an indefinite "waiting" state
  • Audio/video freezes during the gap period
  • Valid packets accumulate in the buffer without being played
  • Recovery only occurs after latency timeout, resulting in prolonged silence/freezing

Logs

2025-11-02T01:31:01.383Z WARN large sequence number gap
{  
    "currSN": 17570,  
    "gapSN": 152,  
    "currTS": 1990786961,  
    "gapTS": 145920,  
    "timeSinceHighest": "3.040274159s"
}

Stats over 22 seconds:
{  
    "packetsExpected": 1122,  
    "packetsSeenPrimary": 245,  
    "packetsLost": 877,  
    "packetLostPercentage": 78.163994,  
    "gapHistogram": "[54:1, 101:2]"
}

Solution

Backward compatibility guarantee:

  • Added configuration functions following the WithPacketLossHandler usage pattern (a sketch follows this list)
  • Without options: No new checks performed (maintains existing behavior)
  • With options: Explicitly enables discontinuity detection
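
A sketch of the opt-in shape described above, assuming the package already exposes an Option type consumed by NewBuffer (the option name, parameters, and Buffer fields below are hypothetical; only the functional-option pattern mirroring WithPacketLossHandler is taken from this thread):

// WithDiscontinuityDetection would opt a buffer into the new checks; buffers
// created without it run no extra checks and keep the existing behavior.
func WithDiscontinuityDetection(maxSeqJump uint16, maxGapSeconds uint32) Option {
	return func(b *Buffer) {
		b.discontinuityEnabled = true
		b.maxSeqJump = maxSeqJump
		b.maxGapSeconds = maxGapSeconds // converted to RTP ticks using the buffer's clock rate
	}
}

A caller would then enable detection explicitly, e.g. something like NewBuffer(..., WithDiscontinuityDetection(1000, 30)), while existing call sites remain untouched.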


if dropped > 0 {
	b.stats.PacketsDropped += uint64(dropped)
	if b.onPacketLoss != nil {

milos-lk (Contributor) commented Nov 5, 2025

Thinking about this again - if there was a huge gap in seen timestamps or sequence numbers, it might be good to call the onPacketLoss callback regardless of whether we actually had and dropped packets. It might be more robust to move it outside of the if dropped > 0 block.
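
In code, the suggested change would be roughly (a sketch, following the field names in the excerpt above):

if dropped > 0 {
	b.stats.PacketsDropped += uint64(dropped)
}
// Signal loss on any detected discontinuity, even when nothing was buffered:
// the gap itself means packets were never seen by the buffer.
if b.onPacketLoss != nil {
	b.onPacketLoss()
}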
