Skip to content

Conversation

@tbarbugli
Copy link
Member

@tbarbugli tbarbugli commented Oct 27, 2025

Adds several methods to PcmData util to simplify PCM management

  1. Added copy(), clear(), and modified append() for in-place operations
  2. Added AudioFormat enum (S16, F32) with validation for audio format consistency
  3. Implemented audio manipulation methods: chunks(), sliding_window(), head(), and tail()
  4. Enhanced resampling to support float32 format
  5. Added telemetry span attributes for SDP in peer connection handling
  6. Modified test fixtures to be opt-in rather than auto-used

Summary by CodeRabbit

  • New Features

    • Added AudioFormat (S16, F32) and expanded PcmData with constructors and utilities: from_bytes/from_data/from_response, chunks(), sliding_window(), head(), tail(), copy(), clear(), enhanced format/resampling, metadata handling, and AudioTrackHandler integration.
  • Tests

    • Added extensive unit tests for PcmData: copy/append/clear semantics, chunking/overlap/padding, sliding windows, head/tail, stereo handling, resampling and dtype validation.
  • Refactor

    • Enhanced telemetry/tracing around RTC negotiation flows to capture richer trace attributes.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the PcmData class from a NamedTuple to a regular class, adds new audio processing methods, and introduces the AudioFormat enum for better type safety. Key changes include:

  • Converted PcmData from NamedTuple to a mutable class with copy(), clear(), and modified append() for in-place operations
  • Added AudioFormat enum (S16, F32) with validation for audio format consistency
  • Implemented audio manipulation methods: chunks(), sliding_window(), head(), and tail()
  • Enhanced resampling to support float32 format
  • Added telemetry span attributes for SDP in peer connection handling
  • Modified test fixtures to be opt-in rather than auto-used

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
getstream/video/rtc/track_util.py Core refactoring: converted PcmData to class, added AudioFormat enum, implemented new methods (copy, clear, chunks, sliding_window, head, tail), enhanced resample for f32 support
tests/rtc/test_track_util.py Added comprehensive tests for new PcmData methods; changed monkeypatch fixture to opt-in; updated format parameter usage
tests/rtc/test_pcm_data.py Added tests for copy(), clear(), and append() behavior with the new mutable class design
getstream/video/rtc/peer_connection.py Added telemetry span attributes to track SDP in offer creation and local description setting
getstream/video/rtc/pc.py Wrapped setRemoteDescription with telemetry span including SDP attribute
getstream/video/rtc/connection_manager.py Added span variable binding for telemetry context (no functional change)
Comments suppressed due to low confidence (3)

getstream/video/rtc/track_util.py:259

  • Parameter order is inconsistent with the new constructor signature. The constructor expects (sample_rate, format, samples=None, ...), but this call uses samples= first. While functional with keyword arguments, this inconsistency could cause confusion. Recommend reordering to match the constructor signature: cls(sample_rate=sample_rate, format=format, samples=np.array([], dtype=dtype), channels=channels).
                return cls(
                    samples=np.array([], dtype=dtype),
                    sample_rate=sample_rate,
                    format=format,
                    channels=channels,
                )

getstream/video/rtc/track_util.py:287

  • Parameter order is inconsistent with the new constructor signature. The constructor expects (sample_rate, format, samples=None, ...), but this call uses samples= first. While functional with keyword arguments, this inconsistency could cause confusion. Recommend reordering to match the constructor signature: cls(sample_rate=sample_rate, format=format, samples=arr, channels=channels).
        return cls(
            samples=arr, sample_rate=sample_rate, format=format, channels=channels
        )

tests/rtc/test_pcm_data.py:356

  • Parameter order is inconsistent with the new constructor signature. The constructor expects (sample_rate, format, samples=None, ...), but this call uses samples= first. While functional with keyword arguments, this inconsistency could cause confusion. Recommend reordering to match the constructor signature throughout the test file: PcmData(sample_rate=48000, format=\"s16\", samples=other, channels=2).
    pcm_other = PcmData(samples=other, sample_rate=48000, format="s16", channels=2)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai
Copy link

coderabbitai bot commented Oct 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds telemetry spans around multiple RTC negotiation points and replaces the PcmData NamedTuple with a full PcmData class plus an AudioFormat enum and many audio helpers; extensive unit tests for the new PcmData behavior were added.

Changes

Cohort / File(s) Summary
Telemetry instrumentation
getstream/video/rtc/connection_manager.py, getstream/video/rtc/pc.py, getstream/video/rtc/peer_connection.py
Adds tracing spans around negotiation steps (createOffer, setLocalDescription, setRemoteDescription, publisher answer handling). Spans are captured in variables and SDP strings are recorded as span attributes. No public API signature changes.
PcmData refactor & AudioFormat
getstream/video/rtc/track_util.py
Introduces AudioFormat enum and AudioFormatType; replaces PcmData NamedTuple with a full PcmData class and many methods (__init__, from_bytes, from_data, from_response, chunks, sliding_window, head, tail, copy, clear, _normalize_audio_format, resample/format propagation, multi-channel/timestamp handling). Expands public API surface and internal normalization/resampling logic.
Unit tests for PcmData
tests/rtc/test_pcm_data.py
Adds comprehensive tests covering copy/append semantics, clear(), dtype/resample expectations, constructor defaults, clipping behavior, metadata preservation, and resampling edge cases.
Track util tests & mocks updated
tests/rtc/test_track_util.py
Exposes AudioFormat to tests, updates mock frame helpers to accept AudioFormat, and adds tests for chunking/overlap/padding, sliding windows, head/tail, stereo handling, resampling no-op vs required work, and related edge cases.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant PeerConn as PeerConnection
  participant Tracer as Telemetry
  participant RTC as RTCPeer

  Note over Caller,PeerConn: Publish offer flow with tracing
  Caller->>PeerConn: add_tracks(...)
  PeerConn->>Tracer: start span "publishOffer"
  PeerConn->>RTC: createOffer()
  RTC-->>PeerConn: offer
  PeerConn->>Tracer: record attribute sdp
  PeerConn->>Tracer: start span "publishOffer.set_local_description"
  PeerConn->>RTC: setLocalDescription(offer)
  RTC-->>PeerConn: ack
  Tracer-->>PeerConn: end spans
  PeerConn-->>Caller: offer applied
Loading
sequenceDiagram
  autonumber
  participant Caller
  participant ConnMgr as ConnectionManager
  participant Tracer as Telemetry
  participant RTCsub as SubscriberRTC
  participant RTCpub as PublisherRTC

  Note over Caller,ConnMgr: Subscriber offer and publisher answer handling
  Caller->>ConnMgr: on_subscriber_offer(sdp)
  ConnMgr->>Tracer: start span "subscriber.set_local_description"
  ConnMgr->>RTCsub: setLocalDescription(sdp)
  RTCsub-->>ConnMgr: ack
  Tracer-->>ConnMgr: end span
  ConnMgr->>Tracer: start span "publisher.pc.handle_answer"
  ConnMgr->>RTCpub: setRemoteDescription(answer)
  RTCpub-->>ConnMgr: ack
  Tracer-->>ConnMgr: end span
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Files/areas needing extra attention:
    • getstream/video/rtc/track_util.py: dtype ↔ AudioFormat mapping, resampling correctness, channel layout handling, chunk/window timestamp math, head/tail padding semantics, and the expanded public API surface.
    • Tests: ensure new tests accurately reflect conversion/clipping expectations and multi-channel shapes.
    • Telemetry: verify spans are properly bound, attributes recorded correctly, and that tracing does not alter async behavior or leak SDP.

Poem

🐰 I nibble s16 and f32 with delight,
I hop through chunks and pad by moonlight.
Spans mark each leap, offers and replies,
Samples stitched and clipped, neat and wise.
A rabbit cheers — new audio takes flight!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title Check ❓ Inconclusive The title "Audio utils" is extremely vague and generic. While it relates tangentially to the changeset (which involves audio-related utilities), it fails to convey meaningful information about the specific changes made. A teammate scanning the commit history would not understand that this PR introduces a new AudioFormat enum, adds multiple PcmData manipulation methods (copy, clear, chunks, sliding_window, head, tail), enhances resampling capabilities, and modifies test fixtures. The title is equivalent to "misc updates" in its lack of specificity, making it difficult to discern the actual scope and nature of the changes without reviewing the PR description. Consider using a more descriptive title that captures the main contribution of the PR. For example: "Add AudioFormat enum and PcmData utility methods," "Enhance PcmData with formatting and audio manipulation methods," or "Add PcmData utilities (copy, clear, chunks, sliding_window, head, tail) with AudioFormat enum." This would help reviewers and future developers quickly understand the primary changes without reading the full description.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 91.75% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch audio-util

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9b2738f and 48363c8.

📒 Files selected for processing (1)
  • getstream/video/rtc/connection_manager.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • getstream/video/rtc/connection_manager.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Tests (3.13)
  • GitHub Check: Tests (3.12)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tests/rtc/test_track_util.py (1)

355-357: Consider consistent parameter ordering for clarity.

While keyword arguments make the order flexible, inconsistent ordering with the constructor signature could cause confusion. The constructor expects (sample_rate, format, samples=None, ...), but this call and others throughout the file vary in their parameter order.

For consistency, consider ordering parameters to match the constructor signature:

         pcm = PcmData(
-            sample_rate=16000, format=AudioFormat.S16, samples=samples, channels=1
+            sample_rate=16000, format=AudioFormat.S16, samples=samples, channels=1
         )

Note: Line 386 and other locations use samples= first, which differs from this ordering.

getstream/video/rtc/track_util.py (1)

102-132: Fix default dtype to respect the format parameter.

The default samples array always uses np.int16 regardless of the format parameter. This creates a dtype mismatch when format="f32".

Apply this diff to fix the default dtype:

     def __init__(
         self,
         sample_rate: int,
         format: AudioFormatType,
         samples: NDArray = None,
         pts: Optional[int] = None,
         dts: Optional[int] = None,
         time_base: Optional[float] = None,
         channels: int = 1,
     ):
         """
         Initialize PcmData.
 
         Args:
             sample_rate: The sample rate of the audio data
             format: The format of the audio data (use AudioFormat.S16 or AudioFormat.F32)
             samples: The audio samples as a numpy array (default: empty int16 array)
             pts: The presentation timestamp of the audio data
             dts: The decode timestamp of the audio data
             time_base: The time base for converting timestamps to seconds
             channels: Number of audio channels (1=mono, 2=stereo)
         """
         self.format: AudioFormatType = format
         self.sample_rate: int = sample_rate
-        self.samples: NDArray = (
-            np.array([], dtype=np.int16) if samples is None else samples
-        )
+        if samples is None:
+            dtype = np.float32 if format in ("f32", "float32", AudioFormat.F32) else np.int16
+            self.samples = np.array([], dtype=dtype)
+        else:
+            self.samples = samples
         self.pts: Optional[int] = pts
         self.dts: Optional[int] = dts
         self.time_base: Optional[float] = time_base
         self.channels: int = channels
🧹 Nitpick comments (1)
getstream/video/rtc/track_util.py (1)

1079-1081: Verify timestamp calculation precision.

The chunk timestamp calculation divides by sample_rate and time_base:

chunk_pts = self.pts + int(i / self.sample_rate / self.time_base)

This could lose precision if time_base is very small. Consider reordering the operations:

-                    chunk_pts = self.pts + int(i / self.sample_rate / self.time_base)
+                    chunk_pts = self.pts + int(i / (self.sample_rate * self.time_base))

This groups the divisors together, potentially improving numerical stability.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between eeadab3 and 2151a2a.

📒 Files selected for processing (2)
  • getstream/video/rtc/track_util.py (19 hunks)
  • tests/rtc/test_track_util.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/rtc/test_track_util.py (1)
getstream/video/rtc/track_util.py (9)
  • PcmData (88-1375)
  • AudioFormat (30-80)
  • chunks (1013-1133)
  • sliding_window (1135-1159)
  • resample (364-528)
  • to_float32 (613-676)
  • tail (1161-1267)
  • duration (139-195)
  • head (1269-1375)
🔇 Additional comments (13)
tests/rtc/test_track_util.py (4)

7-11: LGTM!

The imports correctly include the new AudioFormat enum.


75-106: LGTM!

The fixture is now opt-in as intended. The comments clearly explain why some tests need the real PyAV AudioFrame.


347-499: LGTM!

Comprehensive test coverage for the new chunking and sliding window functionality, including edge cases like padding and stereo audio.


501-948: LGTM!

Excellent test coverage for no-op optimizations and head/tail methods. The real-world use case test at line 899 validates the intended behavior against the original implementation.

getstream/video/rtc/track_util.py (9)

30-81: LGTM!

The AudioFormat enum is well-designed with string inheritance for backward compatibility and a helpful validate() method.


364-528: LGTM!

The resample method correctly handles both S16 and F32 formats, detecting input format from dtype and maintaining format consistency in the output.


613-676: LGTM!

The to_float32() method correctly implements the no-op optimization when already in F32 format and properly converts S16 to float32.


678-826: LGTM!

The append() method correctly implements in-place modification and returns self for chaining, as documented. Edge cases are well handled.


827-853: LGTM!

The copy() method correctly creates a deep copy, using ndarray.copy() for numpy arrays and copy.deepcopy() for other types.


855-880: LGTM!

The clear() method correctly determines the appropriate dtype based on format and follows Python collection conventions by returning None implicitly.


1135-1159: LGTM!

The sliding_window() method correctly converts millisecond parameters to sample counts and delegates to chunks().


1161-1375: LGTM!

The head() and tail() methods are well-implemented with proper parameter validation, support for both mono and multi-channel audio, and flexible padding options.


1683-1719: LGTM!

The _normalize_audio_format() helper correctly validates format and performs both sample rate and format conversions as needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
getstream/video/rtc/track_util.py (4)

102-141: Consider adding format validation in the constructor.

The factory methods (from_bytes, from_data, from_response) validate the format parameter using AudioFormat.validate(), but the constructor doesn't. This inconsistency allows invalid formats to be created via direct instantiation.

Apply this diff to add validation:

     def __init__(
         self,
         sample_rate: int,
         format: AudioFormatType,
         samples: NDArray = None,
         pts: Optional[int] = None,
         dts: Optional[int] = None,
         time_base: Optional[float] = None,
         channels: int = 1,
     ):
         """
         Initialize PcmData.
 
         Args:
             sample_rate: The sample rate of the audio data
             format: The format of the audio data (use AudioFormat.S16 or AudioFormat.F32)
             samples: The audio samples as a numpy array (default: empty array with dtype matching format)
             pts: The presentation timestamp of the audio data
             dts: The decode timestamp of the audio data
             time_base: The time base for converting timestamps to seconds
             channels: Number of audio channels (1=mono, 2=stereo)
         """
+        # Validate format
+        AudioFormat.validate(format)
         self.format: AudioFormatType = format
         self.sample_rate: int = sample_rate

397-405: Consider adding a comment to clarify format detection logic.

The condition on lines 399-401 uses OR logic to determine if float format should be used, checking both the actual dtype and the format attribute. This permissive approach handles cases where dtype and format might be mismatched, but could benefit from a comment explaining the intent.

         # Determine the format based on the input dtype
         if isinstance(cmaj, np.ndarray):
+            # Use float format if either dtype is float32 OR format attribute indicates float
+            # This handles both correctly-typed data and data that needs format conversion
             if cmaj.dtype == np.float32 or (
                 self.format and self.format.lower() in ("f32", "float32")
             ):
                 input_format = "fltp"  # planar float32
             else:
                 input_format = "s16p"  # planar int16

577-580: Improve error handling in the fallback case.

The final fallback return bytes(arr) at line 580 could raise an unclear TypeError for unsupported types. Consider adding explicit handling or logging to make debugging easier.

         # Fallback
         if isinstance(arr, (bytes, bytearray)):
             return bytes(arr)
-        return bytes(arr)
+        # Attempt conversion with better error message
+        try:
+            return bytes(arr)
+        except TypeError as e:
+            raise TypeError(
+                f"Cannot convert samples of type {type(arr)} to bytes. "
+                f"Expected numpy array, bytes, or bytearray."
+            ) from e

1694-1730: LGTM! Useful normalization helper with proper clipping.

The function correctly handles format validation, resampling, and format conversion. Notably, it includes proper clipping when converting f32 to s16 (lines 1722). Consider using this helper more broadly in the module to ensure consistent format handling.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2151a2a and 2d93a1e.

📒 Files selected for processing (2)
  • getstream/video/rtc/track_util.py (20 hunks)
  • tests/rtc/test_pcm_data.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/rtc/test_pcm_data.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests (3.10)
  • GitHub Check: Tests (3.13)
  • GitHub Check: Tests (3.11)
  • GitHub Check: Tests (3.12)
🔇 Additional comments (7)
getstream/video/rtc/track_util.py (7)

30-81: LGTM! Well-designed AudioFormat enum.

The enum with string inheritance provides good backward compatibility while adding type safety. The validate() method is a helpful addition for input validation.


102-141: LGTM! Constructor properly handles default dtype based on format.

The default samples array now correctly determines dtype based on the format parameter, addressing the previous review concern. The logic matches the pattern used in clear() for consistency.


689-836: LGTM! Well-implemented in-place append with proper format handling.

The method correctly modifies the object in-place and returns self for chaining. The format conversion logic is thorough, including proper clipping when converting float32 to int16 (lines 739-742).


838-891: LGTM! Standard copy and clear implementations.

Both methods follow Python conventions:

  • copy() performs deep copy with proper handling of numpy arrays vs other types
  • clear() returns None to match standard collection behavior and correctly determines dtype based on format

1024-1144: LGTM! Comprehensive chunking implementation with proper timestamp handling.

The method correctly handles both mono and multi-channel audio, with accurate timestamp calculations for each chunk (lines 1092, 1132). The overlap and padding options provide flexibility for windowing use cases.


1146-1170: LGTM! Clean time-based windowing abstraction.

The method provides a convenient time-based interface (milliseconds) that correctly delegates to the sample-based chunks() method.


1172-1386: LGTM! Symmetric head and tail implementations with proper validation.

Both methods provide clean interfaces for extracting time-based segments with optional padding. The pad_at parameter validation (lines 1206-1207, 1314-1315) prevents invalid values, and both handle mono and multi-channel audio correctly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2d93a1e and 62a20d6.

📒 Files selected for processing (2)
  • getstream/video/rtc/track_util.py (20 hunks)
  • tests/rtc/test_pcm_data.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/rtc/test_pcm_data.py (1)
getstream/video/rtc/track_util.py (8)
  • PcmData (88-1414)
  • copy (866-892)
  • id (1537-1538)
  • append (717-864)
  • clear (894-919)
  • resample (401-571)
  • AudioFormat (30-80)
  • from_data (329-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Tests (3.11)

@tbarbugli tbarbugli merged commit 7648c04 into main Oct 30, 2025
5 checks passed
This was referenced Nov 3, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants