From bee604f30b115ddc227f4566b7d400c1a960fd42 Mon Sep 17 00:00:00 2001
From: Guido Urdaneta
Date: Fri, 7 Feb 2025 16:08:52 +0100
Subject: [PATCH 1/5] Add timestamp explainer

---
 timestamps.md | 302 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 302 insertions(+)
 create mode 100644 timestamps.md

diff --git a/timestamps.md b/timestamps.md
new file mode 100644
index 0000000..20d3ca9
--- /dev/null
+++ b/timestamps.md
@@ -0,0 +1,302 @@
+# Extra Timestamps for encoded RTC media frames
+
+## Authors:
+
+- Guido Urdaneta (Google)
+
+## Participate
+- https://github.com/w3c/webrtc-encoded-transform
+
+
+## Introduction
+
+The [WebRTC Encoded Transform](https://w3c.github.io/webrtc-encoded-transform/)
+API allows applications to access encoded media flowing through a WebRTC
+[RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection).
+Video data is exposed as
+[RTCEncodedVideoFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedvideoframe)s
+and audio data is exposed as
+[RTCEncodedAudioFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedaudioframe)s.
+Both types of frames have a getMetadata() method that returns a number of
+metadata fields containing more information about the frames.
+
+This proposal consists of adding a number of additional metadata fields
+containing timestamps, in line with recent additions to
+[VideoFrameMetadata](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members)
+in [WebCodecs](https://w3c.github.io/webcodecs/) and
+[requestVideoFrameCallback](https://wicg.github.io/video-rvfc/#video-frame-callback-metadata-attributes).
+
+For the purposes of this proposal, we use the following definitions:
+* The *capturer system* is a system that originally captures a media frame,
+  typically from a local camera, microphone or screen-share session. This frame
+  can be relayed through multiple systems before it reaches its final
+  destination.
+* The *receiver system* is the final destination of the captured frames. It
+  receives the data via an
+  [RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection)
+  and it uses the WebRTC Encoded
+  Transform API with the changes included in this proposal.
+* The *sender system* is the system that communicates directly with the
+  *receiver system*. It may be the same as the capturer system, but not
+  necessarily. It is the last hop before the captured frames reach the receiver
+  system.
+
+The proposed new metadata fields are:
+* `receiveTime`: The time when the frame was received from the sender system.
+* `captureTime`: The time when the frame was captured by the capturer system.
+  This timestamp is set by the capturer system.
+* `senderCaptureTimeOffset`: An estimate of the offset between the capturer
+  system clock and the sender system clock. The receiver system can
+  compute the clock offset between the receiver system and the sender system,
+  and these two offsets can be used to adjust the `captureTime` to the
+  receiver system clock.
+
+`captureTime` and `senderCaptureTimeOffset` are provided in WebRTC by the
+["Absolute Capture Time" header extension](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time).
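+
+As a rough sketch (not part of the proposed API surface), the receiver can map
+`captureTime` into its own clock by chaining the two offsets. The helper name
+below is hypothetical; the computation of `senderReceiverClockOffset` is shown
+in the example later in this document:
+
+```js
+// Hypothetical helper: map a frame's captureTime (capturer system clock) to
+// the receiver system clock. senderReceiverClockOffset is estimated by the
+// application (e.g., from WebRTC stats); metadata.senderCaptureTimeOffset
+// comes from the Absolute Capture Time header extension.
+function captureTimeInReceiverClock(metadata, senderReceiverClockOffset) {
+  return metadata.captureTime +
+      metadata.senderCaptureTimeOffset +
+      senderReceiverClockOffset;
+}
+```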
+
+Note that the [RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
+interface also exposes these timestamps
+(see also its [extensions](https://w3c.github.io/webrtc-extensions/#rtcrtpcontributingsource-extensions)),
+but in a way that is not suitable for applications using the WebRTC Encoded
+Transform API. The reason is that encoded transforms operate per frame, while
+the values in [RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
+are the most recent seen by the UA,
+which makes it impossible to know if the values provided by
+[RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
+actually correspond to the frames being processed
+by the application.
+
+
+## User-Facing Problem
+
+This API supports applications where measuring the delay between the reception
+of a media frame and its original capture is useful.
+
+Some example use cases are:
+1. Audio/video synchronization measurements
+2. Performance measurements
+3. Delay measurements
+
+In all of these cases, the application can log the measurements for offline
+analysis or A/B testing, but also adjust application parameters in real time.
+
+
+### Goals [or Motivating Use Cases, or Scenarios]
+
+- Provide Web applications using WebRTC Encoded Transform access to receive and
+  capture timestamps in addition to existing metadata already provided.
+- Align encoded frame metadata with
+  [metadata provided for raw frames](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members).
+
+### Non-goals
+
+- Provide mechanisms to improve WebRTC communication based on the
+information provided by these new metadata fields.
+
+
+### Example
+
+This shows an example of an application that:
+1. Computes the delay between audio and video.
+2. Computes the end-to-end delay and the processing time, and logs and/or
+updates remote parameters based on these measurements.
+
+```js
+// code in a DedicatedWorker
+let lastVideoCaptureTime;
+let lastAudioCaptureTime;
+let lastVideoReceiveTime;
+let lastVideoSenderCaptureTimeOffset;
+let lastVideoProcessingTime;
+let senderReceiverClockOffset = null;
+
+function updateAVSync() {
+  const avSyncDifference = lastVideoCaptureTime - lastAudioCaptureTime;
+  doSomethingWithAVSync(avSyncDifference);
+}
+
+// Measures delay from original capture until reception by this system.
+// Other forms of delay are also possible.
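+// adjustedCaptureTime below maps captureTime (capturer system clock) into
+// this system's clock by adding the two clock offsets, as described in the
+// introduction.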
+function updateEndToEndVideoDelay() {
+  if (senderReceiverClockOffset == null) {
+    return;
+  }
+
+  const adjustedCaptureTime =
+      senderReceiverClockOffset + lastVideoSenderCaptureTimeOffset + lastVideoCaptureTime;
+  const endToEndDelay = lastVideoReceiveTime - adjustedCaptureTime;
+  doSomethingWithEndToEndDelay(endToEndDelay);
+}
+
+function updateVideoProcessingTime() {
+  const processingTime = lastVideoProcessingTime - lastVideoReceiveTime;
+  doSomethingWithProcessingTime(processingTime);
+}
+
+function createReceiverAudioTransform() {
+  return new TransformStream({
+    start() {},
+    flush() {},
+    async transform(encodedFrame, controller) {
+      const metadata = encodedFrame.getMetadata();
+      lastAudioCaptureTime = metadata.captureTime;
+      updateAVSync();
+      controller.enqueue(encodedFrame);
+    }
+  });
+}
+
+function createReceiverVideoTransform() {
+  return new TransformStream({
+    start() {},
+    flush() {},
+    async transform(encodedFrame, controller) {
+      const metadata = encodedFrame.getMetadata();
+      lastVideoCaptureTime = metadata.captureTime;
+      updateAVSync();
+      lastVideoReceiveTime = metadata.receiveTime;
+      lastVideoSenderCaptureTimeOffset = metadata.senderCaptureTimeOffset;
+      updateEndToEndVideoDelay();
+      doSomeEncodedVideoProcessing(encodedFrame.data);
+      lastVideoProcessingTime = performance.now();
+      updateVideoProcessingTime();
+      controller.enqueue(encodedFrame);
+    }
+  });
+}
+
+// Code to instantiate transforms and attach them to sender/receiver pipelines.
+onrtctransform = (event) => {
+  let transform;
+  if (event.transformer.options.name == "receiverAudioTransform")
+    transform = createReceiverAudioTransform();
+  else if (event.transformer.options.name == "receiverVideoTransform")
+    transform = createReceiverVideoTransform();
+  else
+    return;
+  event.transformer.readable
+      .pipeThrough(transform)
+      .pipeTo(event.transformer.writable);
+};
+
+onmessage = (event) => {
+  senderReceiverClockOffset = event.data;
+};
+
+
+// Code running on Window (in a module script, so top-level await is available)
+const worker = new Worker('worker.js');
+const pc = new RTCPeerConnection();
+
+// Do ICE and offer/answer exchange. Removed from this example for clarity.
+
+// Configure transforms in the worker
+pc.ontrack = e => {
+  if (e.track.kind == "video")
+    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverVideoTransform" });
+  else // audio
+    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverAudioTransform" });
+};
+
+// Compute the clock offset between the sender and this system.
+const stats = await pc.getStats();
+const remoteOutboundRtpStats = getRequiredStats(stats, "remote-outbound-rtp");
+const remoteInboundRtpStats = getRequiredStats(stats, "remote-inbound-rtp");
+const senderReceiverTimeOffset =
+    remoteOutboundRtpStats.timestamp -
+    (remoteOutboundRtpStats.remoteTimestamp +
+     remoteInboundRtpStats.roundTripTime / 2);
+
+worker.postMessage(senderReceiverTimeOffset);
+```
+
+
+## Alternatives considered
+
+### [Alternative 1]
+
+Use the values already exposed in `RTCRtpContributingSource`.
+
+`RTCRtpContributingSource` already exposes the same timestamps as in this proposal.
+The problem with using those timestamps is that it is impossible to reliably
+associate them with a specific encoded frame exposed by the WebRTC Encoded
+Transform API.
+
+This makes any of the computations in this proposal unreliable.
+
+### [Alternative 2]
+
+Expose only `captureTime` and `receiveTime`.
+
+`senderCaptureTimeOffset` is a value that is provided by the
+[Absolute Capture Time](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time#absolute-capture-time)
+WebRTC header extension, but that extension updates the value only periodically
+since there is little value in computing the estimate for every packet, so it is
+strictly speaking not a per-frame value. Arguably, an application could use
+the `senderCaptureTimeOffset` already exposed in `RTCRtpContributingSource`.
+
+However, given that this value is coupled with `captureTime` in the header
+extension, it looks appropriate and more ergonomic to expose the pair in the
+frame as well. While clock offsets do not usually change significantly
+in a very short time, there is some extra accuracy in having the estimated
+offset between the capturer system and the sender for that particular frame.
+This could be more visible, for example, if the set of relays that frames
+go through from the capturer system to the sender system changes.
+
+Exposing `senderCaptureTimeOffset` also makes it clearer that the `captureTime`
+comes from the original capturer system, so it needs to be adjusted using the
+corresponding clock offset.
+
+
+### [Alternative 3]
+
+Expose a `captureTime` already adjusted to the receiver system's clock.
+
+The problem with this option is that clock offsets are estimates. Using
+estimates makes computing A/V sync more difficult and less accurate.
+
+For example, if the UA uses a single estimate during the whole session,
+the A/V sync computation will be accurate, but the capture times themselves will
+be inaccurate as the clock offset estimate is never updated. Any other
+computation made with the `captureTime` and other local timestamps will be
+inaccurate.
+
+### [Alternative 4]
+
+Expose a `localClockOffset` instead of a `senderClockOffset`.
+
+This would certainly support the use cases presented here, but it would have the
+following downsides:
+* It would introduce an inconsistency with the values exposed in `RTCRtpContributingSource`.
+  This can lead to confusion, as the `senderClockOffset` is always paired together
+  with the `captureTime` in the header extension and developers expect this association.
+* Applications can compute their own estimate of the offset between sender
+  and receiver using WebRTC Stats and can control how often to update it.
+* Some applications might be interested in computing delays using the sender
+  as reference.
+
+In short, while this would be useful, the additional value is limited compared
+with the clarity, consistency and extra possibilities offered by exposing the
+`senderClockOffset`.
+
+
+
+## Accessibility, Privacy, and Security Considerations
+
+These timestamps are already available in a form less suitable for applications
+using WebRTC Encoded Transform as part of the RTCRtpContributingSource API.
+
+* The `captureTime` field is available via the
+[RTCRtpContributingSource.captureTimestamp](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-capturetimestamp) field.
+
+* The `senderCaptureTimeOffset` field is available via the
+[RTCRtpContributingSource.senderCaptureTimeOffset](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-sendercapturetimeoffset) field.
+
+* The `receiveTime` field is available via the
+[RTCRtpContributingSource.timestamp](https://w3c.github.io/webrtc-pc/#dom-rtcrtpcontributingsource-timestamp) field.
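+
+For reference, a minimal sketch of how an application can read these
+per-source values today (assuming a connected `RTCRtpReceiver` named
+`receiver` and a UA that implements the two webrtc-extensions fields;
+`doSomethingWithPerSourceTimestamps` is a placeholder):
+
+```js
+// Per-source (not per-frame) timestamps from RTCRtpContributingSource.
+// These reflect the most recent packets seen by the UA, so they cannot be
+// reliably matched to a specific encoded frame.
+const [source] = receiver.getSynchronizationSources();
+if (source) {
+  const receiveTime = source.timestamp;          // last packet receive time
+  const captureTime = source.captureTimestamp;   // webrtc-extensions
+  const offset = source.senderCaptureTimeOffset; // webrtc-extensions
+  doSomethingWithPerSourceTimestamps(receiveTime, captureTime, offset);
+}
+```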
+
+While these fields are not 100% equivalent to the fields in this proposal,
+they have the same privacy characteristics. Therefore, we consider that the
+privacy delta of this proposal is zero.
+
+## References & acknowledgements
+
+Many thanks for valuable feedback and advice from:
+- Florent Castelli
+- Harald Alvestrand
+- Henrik Boström

From 80c78306928e71041a4d7294d1e2a4e5d85b4b8b Mon Sep 17 00:00:00 2001
From: Guido Urdaneta
Date: Fri, 7 Feb 2025 16:12:29 +0100
Subject: [PATCH 2/5] Minor fix to timestamp explainer

---
 timestamps.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/timestamps.md b/timestamps.md
index 20d3ca9..ae96888 100644
--- a/timestamps.md
+++ b/timestamps.md
@@ -77,12 +77,13 @@ In all of these cases, the application can log the measurements for offline
 analysis or A/B testing, but also adjust application parameters in real time.
 
 
-### Goals [or Motivating Use Cases, or Scenarios]
+### Goals
 
 - Provide Web applications using WebRTC Encoded Transform access to receive and
   capture timestamps in addition to existing metadata already provided.
 - Align encoded frame metadata with
   [metadata provided for raw frames](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members).
 
+
 ### Non-goals
 
 - Provide mechanisms to improve WebRTC communication based on the

From 89cb317955eb61fa917b4f25959e088dcd767c52 Mon Sep 17 00:00:00 2001
From: Guido Urdaneta
Date: Fri, 7 Feb 2025 17:22:38 +0100
Subject: [PATCH 3/5] Add privacy/security questionnaire for timestamps.

---
 timestamp_sp_questionnaire.md | 76 +++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 timestamp_sp_questionnaire.md

diff --git a/timestamp_sp_questionnaire.md b/timestamp_sp_questionnaire.md
new file mode 100644
index 0000000..d8ab9eb
--- /dev/null
+++ b/timestamp_sp_questionnaire.md
@@ -0,0 +1,76 @@
+# Security and Privacy questionnaire
+
+### 2.1. What information does this feature expose, and for what purposes?
+
+This feature exposes three timestamps associated with encoded audio and video
+frames:
+* Receive Timestamp: time when a media frame was received locally.
+* Capture Timestamp: time when a media frame was originally captured, set by
+the system that captured the frame.
+* Capture Timestamp Sender Offset: clock offset between the system that
+captured the frame and the system that sent the frame to the local system.
+
+### 2.2. Do features in your specification expose the minimum amount of information necessary to implement the intended functionality?
+Yes.
+
+### 2.3. Do the features in your specification expose personal information, personally-identifiable information (PII), or information derived from either?
+No.
+
+### 2.4. How do the features in your specification deal with sensitive information?
+This feature does not deal with sensitive information.
+
+### 2.5. Does data exposed by your specification carry related but distinct information that may not be obvious to users?
+No.
+
+### 2.6. Do the features in your specification introduce state that persists across browsing sessions?
+No.
+
+### 2.7. Do the features in your specification expose information about the underlying platform to origins?
+No.
+
+### 2.8. Does this specification allow an origin to send data to the underlying platform?
+No.
+
+### 2.9. Do features in this specification enable access to device sensors?
+No.
+
+### 2.10. Do features in this specification enable new script execution/loading mechanisms?
+No.
+
+### 2.11. Do features in this specification allow an origin to access other devices?
+No.
+
+### 2.12. Do features in this specification allow an origin some measure of control over a user agent’s native UI?
+No.
+
+### 2.13. What temporary identifiers do the features in this specification create or expose to the web?
+None. It exposes timestamps, but they do not seem very useful as identifiers.
+
+### 2.14. How does this specification distinguish between behavior in first-party and third-party contexts?
+No distinction.
+
+### 2.15. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
+No distinction.
+
+### 2.16. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
+This is a minor addition to an existing specification. The existing
+specification has a "Privacy and security considerations" section.
+
+### 2.17. Do features in your specification enable origins to downgrade default security protections?
+Do features in your spec enable an origin to opt-out of security settings in order to accomplish something? If so, in what situations do these features allow such downgrading, and why?
+No.
+
+### 2.18. What happens when a document that uses your feature is kept alive in BFCache (instead of getting destroyed) after navigation, and potentially gets reused on future navigations back to the document?
+In this case, peer connections are closed, and the feature becomes inaccessible.
+
+### 2.19. What happens when a document that uses your feature gets disconnected?
+In this case, peer connections are closed, and the feature becomes inaccessible.
+
+
+### 2.20. Does your spec define when and how new kinds of errors should be raised?
+This feature does not produce new kinds of errors.
+
+### 2.21. Does your feature allow sites to learn about the user’s use of assistive technology?
+No.
+
+### 2.22. What should this questionnaire have asked?
+The questions seem appropriate.

From 444d627aa9e7f6b926c97e12c0f29a7963dbc5dc Mon Sep 17 00:00:00 2001
From: Guido Urdaneta
Date: Fri, 7 Feb 2025 17:24:44 +0100
Subject: [PATCH 4/5] Some changes to timestamps explainer

---
 timestamps.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/timestamps.md b/timestamps.md
index ae96888..0e4fd06 100644
--- a/timestamps.md
+++ b/timestamps.md
@@ -20,20 +20,20 @@ and audio data is exposed as
 Both types of frames have a getMetadata() method that returns a number of
 metadata fields containing more information about the frames.
 
-This proposal consists of adding a number of additional metadata fields
+This feature consists of adding a number of additional metadata fields
 containing timestamps, in line with recent additions to
 [VideoFrameMetadata](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members)
 in [WebCodecs](https://w3c.github.io/webcodecs/) and
 [requestVideoFrameCallback](https://wicg.github.io/video-rvfc/#video-frame-callback-metadata-attributes).
 
-For the purposes of this proposal, we use the following definitions:
+For the purposes of this feature, we use the following definitions:
 * The *capturer system* is a system that originally captures a media frame,
   typically from a local camera, microphone or screen-share session. This frame
   can be relayed through multiple systems before it reaches its final
   destination.
 * The *receiver system* is the final destination of the captured frames. It
   receives the data via an
   [RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection)
   and it uses the WebRTC Encoded
-  Transform API with the changes included in this proposal.
+  Transform API with the changes proposed by this feature.
 * The *sender system* is the system that communicates directly with the
   *receiver system*. It may be the same as the capturer system, but not
   necessarily. It is the last hop before the captured frames reach the receiver
   system.
@@ -212,12 +212,12 @@ worker.postMessage(senderReceiverTimeOffset);
 
 Use the values already exposed in `RTCRtpContributingSource`.
 
-`RTCRtpContributingSource` already exposes the same timestamps as in this proposal.
+`RTCRtpContributingSource` already exposes the same timestamps as this feature.
 The problem with using those timestamps is that it is impossible to reliably
 associate them with a specific encoded frame exposed by the WebRTC Encoded
 Transform API.
 
-This makes any of the computations in this proposal unreliable.
+This makes any of the computations in this feature unreliable.
 
 ### [Alternative 2]
 
@@ -291,9 +291,9 @@ using WebRTC Encoded Transform as part of the RTCRtpContributingSource API.
 
 * The `receiveTime` field is available via the
 [RTCRtpContributingSource.timestamp](https://w3c.github.io/webrtc-pc/#dom-rtcrtpcontributingsource-timestamp) field.
 
-While these fields are not 100% equivalent to the fields in this proposal,
+While these fields are not 100% equivalent to the fields in this feature,
 they have the same privacy characteristics. Therefore, we consider that the
-privacy delta of this proposal is zero.
+privacy delta of this feature is zero.
 
 ## References & acknowledgements

From 61bf6ef563283786c83b88fcd7a8a628382d6691 Mon Sep 17 00:00:00 2001
From: Guido Urdaneta
Date: Mon, 3 Feb 2025 10:42:26 +0100
Subject: [PATCH 5/5] Add receiveTime field to RTCEncodedVideoFrameMetadata
 and RTCEncodedAudioFrameMetadata

Drive-by: Fix bugs preventing proper translation of the spec.
---
 index.bs                      |  25 +++
 timestamp_sp_questionnaire.md |  76 ---------
 timestamps.md                 | 303 ----------------------------------
 3 files changed, 25 insertions(+), 379 deletions(-)
 delete mode 100644 timestamp_sp_questionnaire.md
 delete mode 100644 timestamps.md

diff --git a/index.bs b/index.bs
index cdd4a0c..64504d5 100644
--- a/index.bs
+++ b/index.bs
@@ -358,6 +358,7 @@ dictionary RTCEncodedVideoFrameMetadata {
     sequence<unsigned long> contributingSources;
     long long timestamp;    // microseconds
     unsigned long rtpTimestamp;
+    DOMHighResTimeStamp receiveTime;
     DOMString mimeType;
 };
 
@@ -431,6 +432,18 @@ dictionary RTCEncodedVideoFrameMetadata {
       that reflects the sampling instant of the first octet in the RTP
       data packet.

+    <dt>
+      receiveTime <span class="idlMemberType">DOMHighResTimeStamp</span>
+    </dt>
+    <dd>
+      <p>
+        For frames coming from an RTCRtpReceiver, represents the timestamp
+        of the last received packet used to produce this video frame. This
+        timestamp is relative to {{Performance}}.{{Performance/timeOrigin}}.
+        Only exists for incoming video frames.
+      </p>
+    </dd>
mimeType DOMString
@@ -614,6 +627,7 @@ dictionary RTCEncodedAudioFrameMetadata { sequence<unsigned long> contributingSources; short sequenceNumber; unsigned long rtpTimestamp; + DOMHighResTimeStamp receiveTime; DOMString mimeType; }; @@ -667,6 +681,17 @@ dictionary RTCEncodedAudioFrameMetadata { that reflects the sampling instant of the first octet in the RTP data packet.

+    <dt>
+      receiveTime <span class="idlMemberType">DOMHighResTimeStamp</span>
+    </dt>
+    <dd>
+      <p>
+        For frames coming from an RTCRtpReceiver, represents the timestamp
+        of the last received packet used to produce this audio frame. This
+        timestamp is relative to {{Performance}}.{{Performance/timeOrigin}}.
+        Only exists for incoming audio frames.
+      </p>
+    </dd>

mimeType DOMString
diff --git a/timestamp_sp_questionnaire.md b/timestamp_sp_questionnaire.md
deleted file mode 100644
index d8ab9eb..0000000
--- a/timestamp_sp_questionnaire.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Security and Privacy questionnaire
-
-### 2.1. What information does this feature expose, and for what purposes?
-
-This feature exposes three timestamps associated with encoded audio and video
-frames:
-* Receive Timestamp: time when a media frame was received locally.
-* Capture Timestamp: time when a media frame was originally captured, set by
-the system that captured the frame.
-* Capture Timestamp Sender Offset: clock offset between the system that
-captured the frame and the system that sent the frame to the local system.
-
-### 2.2. Do features in your specification expose the minimum amount of information necessary to implement the intended functionality?
-Yes.
-
-### 2.3. Do the features in your specification expose personal information, personally-identifiable information (PII), or information derived from either?
-No.
-
-### 2.4. How do the features in your specification deal with sensitive information?
-This feature does not deal with sensitive information.
-
-### 2.5. Does data exposed by your specification carry related but distinct information that may not be obvious to users?
-No.
-
-### 2.6. Do the features in your specification introduce state that persists across browsing sessions?
-No.
-
-### 2.7. Do the features in your specification expose information about the underlying platform to origins?
-No.
-
-### 2.8. Does this specification allow an origin to send data to the underlying platform?
-No.
-
-### 2.9. Do features in this specification enable access to device sensors?
-No.
-
-### 2.10. Do features in this specification enable new script execution/loading mechanisms?
-No.
-
-### 2.11. Do features in this specification allow an origin to access other devices?
-No.
-
-### 2.12. Do features in this specification allow an origin some measure of control over a user agent’s native UI?
-No.
-
-### 2.13. What temporary identifiers do the features in this specification create or expose to the web?
-None. It exposes timestamps, but they do not seem very useful as identifiers.
-
-### 2.14. How does this specification distinguish between behavior in first-party and third-party contexts?
-No distinction.
-
-### 2.15. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
-No distinction.
-
-### 2.16. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
-This is a minor addition to an existing specification. The existing
-specification has a "Privacy and security considerations" section.
-
-### 2.17. Do features in your specification enable origins to downgrade default security protections?
-Do features in your spec enable an origin to opt-out of security settings in order to accomplish something? If so, in what situations do these features allow such downgrading, and why?
-No.
-
-### 2.18. What happens when a document that uses your feature is kept alive in BFCache (instead of getting destroyed) after navigation, and potentially gets reused on future navigations back to the document?
-In this case, peer connections are closed, and the feature becomes inaccessible.
-
-### 2.19. What happens when a document that uses your feature gets disconnected?
-In this case, peer connections are closed, and the feature becomes inaccessible.
-
-
-### 2.20. 
Does your spec define when and how new kinds of errors should be raised?
-This feature does not produce new kinds of errors.
-
-### 2.21. Does your feature allow sites to learn about the user’s use of assistive technology?
-No.
-
-### 2.22. What should this questionnaire have asked?
-The questions seem appropriate.

diff --git a/timestamps.md b/timestamps.md
deleted file mode 100644
index 0e4fd06..0000000
--- a/timestamps.md
+++ /dev/null
@@ -1,303 +0,0 @@
-# Extra Timestamps for encoded RTC media frames
-
-## Authors:
-
-- Guido Urdaneta (Google)
-
-## Participate
-- https://github.com/w3c/webrtc-encoded-transform
-
-
-## Introduction
-
-The [WebRTC Encoded Transform](https://w3c.github.io/webrtc-encoded-transform/)
-API allows applications to access encoded media flowing through a WebRTC
-[RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection).
-Video data is exposed as
-[RTCEncodedVideoFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedvideoframe)s
-and audio data is exposed as
-[RTCEncodedAudioFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedaudioframe)s.
-Both types of frames have a getMetadata() method that returns a number of
-metadata fields containing more information about the frames.
-
-This feature consists of adding a number of additional metadata fields
-containing timestamps, in line with recent additions to
-[VideoFrameMetadata](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members)
-in [WebCodecs](https://w3c.github.io/webcodecs/) and
-[requestVideoFrameCallback](https://wicg.github.io/video-rvfc/#video-frame-callback-metadata-attributes).
-
-For the purposes of this feature, we use the following definitions:
-* The *capturer system* is a system that originally captures a media frame,
-  typically from a local camera, microphone or screen-share session. This frame
-  can be relayed through multiple systems before it reaches its final
-  destination.
-* The *receiver system* is the final destination of the captured frames. It
-  receives the data via an
-  [RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection)
-  and it uses the WebRTC Encoded
-  Transform API with the changes proposed by this feature.
-* The *sender system* is the system that communicates directly with the
-  *receiver system*. It may be the same as the capturer system, but not
-  necessarily. It is the last hop before the captured frames reach the receiver
-  system.
-
-The proposed new metadata fields are:
-* `receiveTime`: The time when the frame was received from the sender system.
-* `captureTime`: The time when the frame was captured by the capturer system.
-  This timestamp is set by the capturer system.
-* `senderCaptureTimeOffset`: An estimate of the offset between the capturer
-  system clock and the sender system clock. The receiver system can
-  compute the clock offset between the receiver system and the sender system,
-  and these two offsets can be used to adjust the `captureTime` to the
-  receiver system clock.
-
-`captureTime` and `senderCaptureTimeOffset` are provided in WebRTC by the
-["Absolute Capture Time" header extension](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time).
-
-Note that the [RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
-interface also exposes these timestamps
-(see also its [extensions](https://w3c.github.io/webrtc-extensions/#rtcrtpcontributingsource-extensions)),
-but in a way that is not suitable for applications using the WebRTC Encoded
-Transform API. The reason is that encoded transforms operate per frame, while
-the values in [RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
-are the most recent seen by the UA,
-which makes it impossible to know if the values provided by
-[RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
-actually correspond to the frames being processed
-by the application.
-
-
-## User-Facing Problem
-
-This API supports applications where measuring the delay between the reception
-of a media frame and its original capture is useful.
-
-Some example use cases are:
-1. Audio/video synchronization measurements
-2. Performance measurements
-3. Delay measurements
-
-In all of these cases, the application can log the measurements for offline
-analysis or A/B testing, but also adjust application parameters in real time.
-
-
-### Goals
-
-- Provide Web applications using WebRTC Encoded Transform access to receive and
-  capture timestamps in addition to existing metadata already provided.
-- Align encoded frame metadata with
-  [metadata provided for raw frames](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members).
-
-
-### Non-goals
-
-- Provide mechanisms to improve WebRTC communication based on the
-information provided by these new metadata fields.
-
-
-### Example
-
-This shows an example of an application that:
-1. Computes the delay between audio and video.
-2. Computes the end-to-end delay and the processing time, and logs and/or
-updates remote parameters based on these measurements.
-
-```js
-// code in a DedicatedWorker
-let lastVideoCaptureTime;
-let lastAudioCaptureTime;
-let lastVideoReceiveTime;
-let lastVideoSenderCaptureTimeOffset;
-let lastVideoProcessingTime;
-let senderReceiverClockOffset = null;
-
-function updateAVSync() {
-  const avSyncDifference = lastVideoCaptureTime - lastAudioCaptureTime;
-  doSomethingWithAVSync(avSyncDifference);
-}
-
-// Measures delay from original capture until reception by this system.
-// Other forms of delay are also possible.
-function updateEndToEndVideoDelay() {
-  if (senderReceiverClockOffset == null) {
-    return;
-  }
-
-  const adjustedCaptureTime =
-      senderReceiverClockOffset + lastVideoSenderCaptureTimeOffset + lastVideoCaptureTime;
-  const endToEndDelay = lastVideoReceiveTime - adjustedCaptureTime;
-  doSomethingWithEndToEndDelay(endToEndDelay);
-}
-
-function updateVideoProcessingTime() {
-  const processingTime = lastVideoProcessingTime - lastVideoReceiveTime;
-  doSomethingWithProcessingTime(processingTime);
-}
-
-function createReceiverAudioTransform() {
-  return new TransformStream({
-    start() {},
-    flush() {},
-    async transform(encodedFrame, controller) {
-      const metadata = encodedFrame.getMetadata();
-      lastAudioCaptureTime = metadata.captureTime;
-      updateAVSync();
-      controller.enqueue(encodedFrame);
-    }
-  });
-}
-
-function createReceiverVideoTransform() {
-  return new TransformStream({
-    start() {},
-    flush() {},
-    async transform(encodedFrame, controller) {
-      const metadata = encodedFrame.getMetadata();
-      lastVideoCaptureTime = metadata.captureTime;
-      updateAVSync();
-      lastVideoReceiveTime = metadata.receiveTime;
-      lastVideoSenderCaptureTimeOffset = metadata.senderCaptureTimeOffset;
-      updateEndToEndVideoDelay();
-      doSomeEncodedVideoProcessing(encodedFrame.data);
-      lastVideoProcessingTime = performance.now();
-      updateVideoProcessingTime();
-      controller.enqueue(encodedFrame);
-    }
-  });
-}
-
-// Code to instantiate transforms and attach them to sender/receiver pipelines.
-onrtctransform = (event) => {
-  let transform;
-  if (event.transformer.options.name == "receiverAudioTransform")
-    transform = createReceiverAudioTransform();
-  else if (event.transformer.options.name == "receiverVideoTransform")
-    transform = createReceiverVideoTransform();
-  else
-    return;
-  event.transformer.readable
-      .pipeThrough(transform)
-      .pipeTo(event.transformer.writable);
-};
-
-onmessage = (event) => {
-  senderReceiverClockOffset = event.data;
-};
-
-
-// Code running on Window (in a module script, so top-level await is available)
-const worker = new Worker('worker.js');
-const pc = new RTCPeerConnection();
-
-// Do ICE and offer/answer exchange. Removed from this example for clarity.
-
-// Configure transforms in the worker
-pc.ontrack = e => {
-  if (e.track.kind == "video")
-    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverVideoTransform" });
-  else // audio
-    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverAudioTransform" });
-};
-
-// Compute the clock offset between the sender and this system.
-const stats = await pc.getStats();
-const remoteOutboundRtpStats = getRequiredStats(stats, "remote-outbound-rtp");
-const remoteInboundRtpStats = getRequiredStats(stats, "remote-inbound-rtp");
-const senderReceiverTimeOffset =
-    remoteOutboundRtpStats.timestamp -
-    (remoteOutboundRtpStats.remoteTimestamp +
-     remoteInboundRtpStats.roundTripTime / 2);
-
-worker.postMessage(senderReceiverTimeOffset);
-```
-
-
-## Alternatives considered
-
-### [Alternative 1]
-
-Use the values already exposed in `RTCRtpContributingSource`.
-
-`RTCRtpContributingSource` already exposes the same timestamps as this feature.
-The problem with using those timestamps is that it is impossible to reliably
-associate them with a specific encoded frame exposed by the WebRTC Encoded
-Transform API.
-
-This makes any of the computations in this feature unreliable.
-
-### [Alternative 2]
-
-Expose only `captureTime` and `receiveTime`.
-
-`senderCaptureTimeOffset` is a value that is provided by the
-[Absolute Capture Time](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time#absolute-capture-time)
-WebRTC header extension, but that extension updates the value only periodically
-since there is little value in computing the estimate for every packet, so it is
-strictly speaking not a per-frame value. Arguably, an application could use
-the `senderCaptureTimeOffset` already exposed in `RTCRtpContributingSource`.
-
-However, given that this value is coupled with `captureTime` in the header
-extension, it looks appropriate and more ergonomic to expose the pair in the
-frame as well. While clock offsets do not usually change significantly
-in a very short time, there is some extra accuracy in having the estimated
-offset between the capturer system and the sender for that particular frame.
-This could be more visible, for example, if the set of relays that frames
-go through from the capturer system to the sender system changes.
-
-Exposing `senderCaptureTimeOffset` also makes it clearer that the `captureTime`
-comes from the original capturer system, so it needs to be adjusted using the
-corresponding clock offset.
-
-
-### [Alternative 3]
-
-Expose a `captureTime` already adjusted to the receiver system's clock.
-
-The problem with this option is that clock offsets are estimates. Using
-estimates makes computing A/V sync more difficult and less accurate.
-
-For example, if the UA uses a single estimate during the whole session,
-the A/V sync computation will be accurate, but the capture times themselves will
-be inaccurate as the clock offset estimate is never updated. Any other
-computation made with the `captureTime` and other local timestamps will be
-inaccurate.
-
-### [Alternative 4]
-
-Expose a `localClockOffset` instead of a `senderClockOffset`.
-
-This would certainly support the use cases presented here, but it would have the
-following downsides:
-* It would introduce an inconsistency with the values exposed in `RTCRtpContributingSource`.
-  This can lead to confusion, as the `senderClockOffset` is always paired together
-  with the `captureTime` in the header extension and developers expect this association.
-* Applications can compute their own estimate of the offset between sender
-  and receiver using WebRTC Stats and can control how often to update it.
-* Some applications might be interested in computing delays using the sender
-  as reference.
-
-In short, while this would be useful, the additional value is limited compared
-with the clarity, consistency and extra possibilities offered by exposing the
-`senderClockOffset`.
-
-
-
-## Accessibility, Privacy, and Security Considerations
-
-These timestamps are already available in a form less suitable for applications
-using WebRTC Encoded Transform as part of the RTCRtpContributingSource API.
-
-* The `captureTime` field is available via the
-[RTCRtpContributingSource.captureTimestamp](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-capturetimestamp) field.
-
-* The `senderCaptureTimeOffset` field is available via the
-[RTCRtpContributingSource.senderCaptureTimeOffset](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-sendercapturetimeoffset) field.
-
-* The `receiveTime` field is available via the
-[RTCRtpContributingSource.timestamp](https://w3c.github.io/webrtc-pc/#dom-rtcrtpcontributingsource-timestamp) field.
-
-While these fields are not 100% equivalent to the fields in this feature,
-they have the same privacy characteristics. Therefore, we consider that the
-privacy delta of this feature is zero.
-
-## References & acknowledgements
-
-Many thanks for valuable feedback and advice from:
-- Florent Castelli
-- Harald Alvestrand
-- Henrik Boström