Commit bee604f: Add timestamp explainer
Author: Guido Urdaneta
1 parent 4b61373

1 file changed: timestamps.md (+302, -0)

# Extra Timestamps for encoded RTC media frames

## Authors:

- Guido Urdaneta (Google)

## Participate
- https://github.com/w3c/webrtc-encoded-transform

## Introduction

The [WebRTC Encoded Transform](https://w3c.github.io/webrtc-encoded-transform/)
API allows applications to access encoded media flowing through a WebRTC
[RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection).
Video data is exposed as
[RTCEncodedVideoFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedvideoframe)s
and audio data is exposed as
[RTCEncodedAudioFrame](https://w3c.github.io/webrtc-encoded-transform/#rtcencodedaudioframe)s.
Both types of frames have a `getMetadata()` method that returns a number of
metadata fields containing more information about the frames.

This proposal consists of adding a number of additional metadata fields
containing timestamps, in line with recent additions to
[VideoFrameMetadata](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members)
in [WebCodecs](https://w3c.github.io/webcodecs/) and
[requestVideoFrameCallback](https://wicg.github.io/video-rvfc/#video-frame-callback-metadata-attributes).


For the purposes of this proposal, we use the following definitions:
* The *capturer system* is a system that originally captures a media frame,
  typically from a local camera, microphone or screen-share session. This frame
  can be relayed through multiple systems before it reaches its final
  destination.
* The *receiver system* is the final destination of the captured frames. It
  receives the data via an
  [RTCPeerConnection](https://w3c.github.io/webrtc-pc/#dom-rtcpeerconnection)
  and it uses the WebRTC Encoded Transform API with the changes included in
  this proposal.
* The *sender system* is the system that communicates directly with the
  *receiver system*. It may be the same as the capturer system, but not
  necessarily. It is the last hop before the captured frames reach the receiver
  system.


The proposed new metadata fields are:
* `receiveTime`: The time when the frame was received from the sender system.
* `captureTime`: The time when the frame was captured by the capturer system.
  This timestamp is set by the capturer system.
* `senderCaptureTimeOffset`: An estimate of the offset between the capturer
  system clock and the sender system clock. The receiver system can compute
  the clock offset between the receiver system and the sender system, and
  these two offsets can be used to adjust the `captureTime` to the receiver
  system clock (see the sketch below).

`captureTime` and `senderCaptureTimeOffset` are provided in WebRTC by the
[Absolute Capture Time header extension](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time).
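
As a minimal sketch of this adjustment (illustration only, not part of the
proposed API), assuming the proposed field names above and a hypothetical
`senderReceiverClockOffset` that the application estimates itself (for example
from WebRTC stats, as in the full example later in this explainer), and using
the sign conventions of that example:

```js
// Illustrative sketch only. senderReceiverClockOffset is an app-computed
// estimate of the offset between the receiver and sender clocks;
// senderCaptureTimeOffset is the proposed per-frame estimate of the offset
// between the sender and capturer clocks.
function adjustCaptureTimeToReceiverClock(metadata, senderReceiverClockOffset) {
  return metadata.captureTime +
      metadata.senderCaptureTimeOffset +
      senderReceiverClockOffset;
}

// End-to-end delay for a frame, measured on the receiver system's clock:
// const endToEndDelay =
//     metadata.receiveTime -
//     adjustCaptureTimeToReceiverClock(metadata, senderReceiverClockOffset);
```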

Note that the [RTCRtpContributingSource](https://www.w3.org/TR/webrtc/#dom-rtcrtpcontributingsource)
interface also exposes these timestamps
(see also the [extensions](https://w3c.github.io/webrtc-extensions/#rtcrtpcontributingsource-extensions)),
but in a way that is not suitable for applications using the WebRTC Encoded
Transform API. The reason is that encoded transforms operate per frame, while
the values in `RTCRtpContributingSource` are the most recent seen by the UA,
which makes it impossible to know whether the values provided by
`RTCRtpContributingSource` actually correspond to the frames being processed
by the application.

## User-Facing Problem

This API supports applications where measuring the delay between the reception
of a media frame and its original capture is useful.

Some example use cases are:
1. Audio/video synchronization measurements
2. Performance measurements
3. Delay measurements

In all of these cases, the application can log the measurements for offline
analysis or A/B testing, but can also adjust application parameters in real time.


### Goals

- Provide Web applications using WebRTC Encoded Transform access to receive and
  capture timestamps, in addition to the existing metadata already provided.
- Align encoded frame metadata with the
  [metadata provided for raw frames](https://w3c.github.io/webcodecs/video_frame_metadata_registry.html#videoframemetadata-members).

### Non-goals

- Provide mechanisms to improve WebRTC communication based on the information
  provided by these new metadata fields.


### Example

This shows an example of an application that:
1. Computes the delay between audio and video.
2. Computes the processing time, and logs and/or updates remote parameters
   based on these delays.

```js
// code in a DedicatedWorker
let lastVideoCaptureTime;
let lastAudioCaptureTime;
let lastVideoReceiveTime;
let lastVideoSenderCaptureTimeOffset;
let lastVideoProcessingTime;
let senderReceiverClockOffset = null;

function updateAVSync() {
  const avSyncDifference = lastVideoCaptureTime - lastAudioCaptureTime;
  doSomethingWithAVSync(avSyncDifference);
}

// Measures delay from original capture until reception by this system.
// Other forms of delay are also possible.
function updateEndToEndVideoDelay() {
  if (senderReceiverClockOffset == null) {
    return;
  }
  const adjustedCaptureTime =
      senderReceiverClockOffset + lastVideoSenderCaptureTimeOffset + lastVideoCaptureTime;
  const endToEndDelay = lastVideoReceiveTime - adjustedCaptureTime;
  doSomethingWithEndToEndDelay(endToEndDelay);
}

function updateVideoProcessingTime() {
  const processingTime = lastVideoProcessingTime - lastVideoReceiveTime;
  doSomethingWithProcessingTime(processingTime);
}

function createReceiverAudioTransform() {
  return new TransformStream({
    start() {},
    flush() {},
    async transform(encodedFrame, controller) {
      let metadata = encodedFrame.getMetadata();
      lastAudioCaptureTime = metadata.captureTime;
      updateAVSync();
      controller.enqueue(encodedFrame);
    }
  });
}

function createReceiverVideoTransform() {
  return new TransformStream({
    start() {},
    flush() {},
    async transform(encodedFrame, controller) {
      let metadata = encodedFrame.getMetadata();
      lastVideoCaptureTime = metadata.captureTime;
      updateAVSync();
      lastVideoReceiveTime = metadata.receiveTime;
      lastVideoSenderCaptureTimeOffset = metadata.senderCaptureTimeOffset;
      updateEndToEndVideoDelay();
      doSomeEncodedVideoProcessing(encodedFrame.data);
      lastVideoProcessingTime = performance.now();
      updateVideoProcessingTime();
      controller.enqueue(encodedFrame);
    }
  });
}

// Code to instantiate transforms and attach them to sender/receiver pipelines.
onrtctransform = (event) => {
  let transform;
  if (event.transformer.options.name == "receiverAudioTransform")
    transform = createReceiverAudioTransform();
  else if (event.transformer.options.name == "receiverVideoTransform")
    transform = createReceiverVideoTransform();
  else
    return;
  event.transformer.readable
      .pipeThrough(transform)
      .pipeTo(event.transformer.writable);
};

onmessage = (event) => {
  senderReceiverClockOffset = event.data;
};


// Code running on Window
const worker = new Worker('worker.js');
const pc = new RTCPeerConnection();

// Do ICE and offer/answer exchange. Removed from this example for clarity.

// Configure transforms in the worker
pc.ontrack = e => {
  if (e.track.kind == "video")
    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverVideoTransform" });
  else // audio
    e.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverAudioTransform" });
};

// Compute the clock offset between the sender and this system.
// getStats() returns a promise, so this is assumed to run in an async context.
const stats = await pc.getStats();
const remoteOutboundRtpStats = getRequiredStats(stats, "remote-outbound-rtp");
const remoteInboundRtpStats = getRequiredStats(stats, "remote-inbound-rtp");
// Stats timestamps are in milliseconds; roundTripTime is in seconds.
const senderReceiverTimeOffset =
    remoteOutboundRtpStats.timestamp -
    (remoteOutboundRtpStats.remoteTimestamp +
     remoteInboundRtpStats.roundTripTime * 1000 / 2);

worker.postMessage(senderReceiverTimeOffset);
```
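
The example above calls a `getRequiredStats()` helper that is not defined in
the snippet. A minimal sketch of such a helper, assuming it simply returns the
first entry of the requested type from the `RTCStatsReport`, could look like
this:

```js
// Sketch of the getRequiredStats() helper used above: return the first stats
// entry of the given type from an RTCStatsReport, or throw if none exists.
function getRequiredStats(report, type) {
  for (const stats of report.values()) {
    if (stats.type === type) {
      return stats;
    }
  }
  throw new Error(`Missing required stats of type ${type}`);
}
```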
## Alternatives considered

### Alternative 1: Use the values already exposed in `RTCRtpContributingSource`

`RTCRtpContributingSource` already exposes the same timestamps as in this proposal.
The problem with using those timestamps is that it is impossible to reliably
associate them with a specific encoded frame exposed by the WebRTC Encoded
Transform API.

This makes any of the computations in this proposal unreliable.

### Alternative 2: Expose only `captureTime` and `receiveTime`

`senderCaptureTimeOffset` is a value that is provided by the
[Absolute Capture Time](https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/rtp-hdrext/abs-capture-time#absolute-capture-time)
WebRTC header extension, but that extension updates the value only periodically,
since there is little value in computing the estimate for every packet, so it is
strictly speaking not a per-frame value. Arguably, an application could use
the `senderCaptureTimeOffset` already exposed in `RTCRtpContributingSource`.

However, given that this value is coupled with `captureTime` in the header
extension, it looks appropriate and more ergonomic to expose the pair in the
frame as well. While clock offsets do not usually change significantly
in a very short time, there is some extra accuracy in having the estimated
offset between the capturer system and the sender for that particular frame.
This could be more visible, for example, if the set of relays that frames
go through from the capturer system to the sender system changes.

Exposing `senderCaptureTimeOffset` also makes it clearer that the `captureTime`
comes from the original capturer system, so it needs to be adjusted using the
corresponding clock offset.

### Alternative 3: Expose a `captureTime` already adjusted to the receiver system's clock

The problem with this option is that clock offsets are estimates. Using
estimates makes computing A/V sync more difficult and less accurate.

For example, if the UA uses a single estimate during the whole session,
the A/V sync computation will be accurate, but the capture times themselves will
be inaccurate, as the clock offset estimate is never updated. Any other
computation made with the `captureTime` and other local timestamps will be
inaccurate. Conversely, if the UA keeps updating its estimate, changes in the
estimate between an audio frame and a video frame show up as spurious
differences in the adjusted capture times.

### Alternative 4: Expose a `localClockOffset` instead of a `senderCaptureTimeOffset`

This would certainly support the use cases presented here, but it would have the
following downsides:
* It would introduce an inconsistency with the values exposed in `RTCRtpContributingSource`.
  This can lead to confusion, as the `senderCaptureTimeOffset` is always paired together
  with the `captureTime` in the header extension and developers expect this association.
* Applications can compute their own estimate of the offset between sender
  and receiver using WebRTC Stats and can control how often to update it.
* Some applications might be interested in computing delays using the sender
  as a reference.

In short, while this would be useful, the additional value is limited compared
with the clarity, consistency and extra possibilities offered by exposing the
`senderCaptureTimeOffset`.


## Accessibility, Privacy, and Security Considerations

These timestamps are already available in a form less suitable for applications
using WebRTC Encoded Transform as part of the `RTCRtpContributingSource` API:

* The `captureTime` field is available via the
  [RTCRtpContributingSource.captureTimestamp](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-capturetimestamp) field.
* The `senderCaptureTimeOffset` field is available via the
  [RTCRtpContributingSource.senderCaptureTimeOffset](https://w3c.github.io/webrtc-extensions/#dom-rtcrtpcontributingsource-sendercapturetimeoffset) field.
* The `receiveTime` field is available via the
  [RTCRtpContributingSource.timestamp](https://w3c.github.io/webrtc-pc/#dom-rtcrtpcontributingsource-timestamp) field.
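
For reference, a minimal sketch of how an application can read these existing
values today (assuming a connected `RTCPeerConnection`; `captureTimestamp` and
`senderCaptureTimeOffset` are the webrtc-extensions additions and may not be
available in every browser):

```js
// Illustrative sketch only: read the most recent per-source values.
// These values are not tied to a specific encoded frame, which is the
// limitation this proposal addresses.
function logMostRecentTimestamps(receiver) {
  for (const source of receiver.getSynchronizationSources()) {
    console.log({
      receiveTime: source.timestamp,
      captureTime: source.captureTimestamp,
      senderCaptureTimeOffset: source.senderCaptureTimeOffset,
    });
  }
}
```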

While these fields are not 100% equivalent to the fields in this proposal,
they have the same privacy characteristics. Therefore, we consider that the
privacy delta of this proposal is zero.

## References & acknowledgements

Many thanks for valuable feedback and advice from:
- Florent Castelli
- Harald Alvestrand
- Henrik Boström
