Commit 1f74f3f

Authored by guest271314
Merge pull request #1 from w3c/master: Update
2 parents 92579b7 + 2130e95, commit 1f74f3f

File tree: 10 files changed (+1980, -286 lines)

.pr-preview.json (8 additions, 0 deletions)

@@ -0,0 +1,8 @@
+{
+  "src_file": "index.bs",
+  "type": "bikeshed",
+  "params": {
+    "force": 1
+  }
+}
+
CODE_OF_CONDUCT.md (3 additions, 0 deletions)

@@ -0,0 +1,3 @@
+# Code of Conduct
+
+All documentation, code and communication under this repository are covered by the [W3C Code of Ethics and Professional Conduct](https://www.w3.org/Consortium/cepc/).

CONTRIBUTING.md (24 additions, 0 deletions)

@@ -0,0 +1,24 @@
+# Web Real-Time Communications Working Group
+
+Contributions to this repository are intended to become part of Recommendation-track documents governed by the
+[W3C Patent Policy](https://www.w3.org/Consortium/Patent-Policy-20040205/) and
+[Software and Document License](https://www.w3.org/Consortium/Legal/copyright-software). To make substantive contributions to specifications, you must either participate
+in the relevant W3C Working Group or make a non-member patent licensing commitment.
+
+If you are not the sole contributor to a contribution (pull request), please identify all
+contributors in the pull request comment.
+
+To add a contributor (other than yourself, that's automatic), mark them one per line as follows:
+
+```
++@github_username
+```
+
+If you added a contributor by mistake, you can remove them in a comment with:
+
+```
+-@github_username
+```
+
+If you are making a pull request on behalf of someone else but you had no part in designing the
+feature, you can remove yourself with the above syntax.

LICENSE.md (4 additions, 0 deletions)

@@ -0,0 +1,4 @@
+All documents in this Repository are licensed by contributors
+under the
+[W3C Software and Document License](https://www.w3.org/Consortium/Legal/copyright-software).
+
README.md (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
-# WebRTC API for insertable Streams of Media
+# WebRTC API for Insertable Streams of Media
 
 (not to be confused with the MediaStreams API)
 
explainer.md (103 additions, 46 deletions)

@@ -23,7 +23,7 @@ RTCPeerConnection API in order to build an API that is:
 * Able to support user defined component wrapping and replacement
 
 The central idea is to expose components in an RTCPeerConnection as a collection of
-streams (as defined by the [WHATWG Streams API] (https://streams.spec.whatwg.org/)),
+streams (as defined by the [WHATWG Streams API](https://streams.spec.whatwg.org/)),
 which can be manipulated to introduce new components, or to wrap or replace existing
 components.

@@ -43,6 +43,12 @@ iterations to support additional use cases such as:
 * Custom codecs for special purposes (in combination with WebCodecs)
 
 ## Code Examples
+0. Feature detection can be done as follows:
+
+<pre>
+const supportsInsertableStreams = window.RTCRtpSender &&
+    !!RTCRtpSender.prototype.createEncodedStreams;
+</pre>
 
 1. Let a PeerConnection know that it should allow exposing the data flowing through it
 as streams.

@@ -54,8 +60,7 @@ streams. For example:
 
 <pre>
 let pc = new RTCPeerConnection({
-    forceEncodedVideoInsertableStreams: true,
-    forceEncodedAudioInsertableStreams: true
+    encodedInsertableStreams: true,
 });
 </pre>
 

@@ -70,25 +75,25 @@ of an encoded frame and adds 4 bytes of padding.
     // Called on startup.
   },
 
-  async transform(chunk, controller) {
-    let view = new DataView(chunk.data);
+  async transform(encodedFrame, controller) {
+    let view = new DataView(encodedFrame.data);
     // Create a new buffer with 4 additional bytes.
-    let newData = new ArrayBuffer(chunk.data.byteLength + 4);
+    let newData = new ArrayBuffer(encodedFrame.data.byteLength + 4);
     let newView = new DataView(newData);
 
     // Fill the new buffer with a negated version of all
     // the bits in the original frame.
-    for (let i = 0; i < chunk.data.byteLength; ++i)
+    for (let i = 0; i < encodedFrame.data.byteLength; ++i)
       newView.setInt8(i, ~view.getInt8(i));
     // Set the padding bytes to zero.
     for (let i = 0; i < 4; ++i)
-      newView.setInt8(chunk.data.byteLength + i, 0);
+      newView.setInt8(encodedFrame.data.byteLength + i, 0);
 
     // Replace the frame's data with the new buffer.
-    chunk.data = newData;
+    encodedFrame.data = newData;
 
     // Send it to the output stream.
-    controller.enqueue(chunk);
+    controller.enqueue(encodedFrame);
   },
 
   flush() {

@@ -104,7 +109,7 @@ Transform stream to the track's sender.
 let stream = await navigator.mediaDevices.getUserMedia({video:true});
 let [track] = stream.getTracks();
 let videoSender = pc.addTrack(track, stream)
-let senderStreams = videoSender.getEncodedVideoStreams();
+let senderStreams = videoSender.createEncodedStreams();
 
 // Do ICE and offer/answer exchange.
 

@@ -116,39 +121,30 @@ senderStreams.readable
 4. Do the corresponding operations on the receiver side.
 
 <pre>
-let pc = new RTCPeerConnection({forceEncodedVideoInsertableStreams: true});
+let pc = new RTCPeerConnection({encodedInsertableStreams: true});
 pc.ontrack = e => {
-  let receivers = pc.getReceivers();
-  let videoReceiver = null;
-  for (const r of receivers) {
-    if (r.track.kind == 'video')
-      videoReceiver = r;
-  }
-  if (!videoReceiver)
-    return;
-
   let receiverTransform = new TransformStream({
     start() {},
     flush() {},
-    async transform(chunk, controller) {
+    async transform(encodedFrame, controller) {
       // Reconstruct the original frame.
-      let view = new DataView(chunk.data);
+      let view = new DataView(encodedFrame.data);
 
       // Ignore the last 4 bytes
-      let newData = new ArrayBuffer(chunk.data.byteLength - 4);
+      let newData = new ArrayBuffer(encodedFrame.data.byteLength - 4);
       let newView = new DataView(newData);
 
       // Negate all bits in the incoming frame, ignoring the
       // last 4 bytes
-      for (let i = 0; i < chunk.data.byteLength - 4; ++i)
+      for (let i = 0; i < encodedFrame.data.byteLength - 4; ++i)
         newView.setInt8(i, ~view.getInt8(i));
 
-      chunk.data = newData;
-      controller.enqueue(chunk);
+      encodedFrame.data = newData;
+      controller.enqueue(encodedFrame);
     },
   });
 
-  let receiverStreams = videoReceiver.createEncodedVideoStreams();
+  let receiverStreams = e.receiver.createEncodedStreams();
   receiverStreams.readable
     .pipeThrough(receiverTransform)
     .pipeTo(receiverStreams.writable);
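The sender and receiver transforms in these hunks are browser code, but the byte-level logic is plain Streams API. Below is a minimal round-trip sketch of the negate-and-pad transform and its inverse, assuming a runtime with the WHATWG Streams globals (e.g. Node.js 18+) and mocking each RTCEncodedVideoFrame as a plain `{ data: ArrayBuffer }` object; the `makeSenderTransform`/`makeReceiverTransform`/`roundTrip` names are illustrative, not part of the proposed API.

```javascript
// Sketch of the negate-and-pad sender transform and its receiver-side
// inverse, exercised against mock frames. Assumes WHATWG Streams globals
// (TransformStream, ReadableStream, WritableStream), as in Node.js 18+.
// RTCEncodedVideoFrame is mocked as a plain { data: ArrayBuffer } object.

function makeSenderTransform() {
  return new TransformStream({
    transform(encodedFrame, controller) {
      const view = new DataView(encodedFrame.data);
      const newData = new ArrayBuffer(encodedFrame.data.byteLength + 4);
      const newView = new DataView(newData);
      // Negate every bit of the payload, then zero the 4 padding bytes.
      for (let i = 0; i < encodedFrame.data.byteLength; ++i)
        newView.setInt8(i, ~view.getInt8(i));
      for (let i = 0; i < 4; ++i)
        newView.setInt8(encodedFrame.data.byteLength + i, 0);
      encodedFrame.data = newData;
      controller.enqueue(encodedFrame);
    },
  });
}

function makeReceiverTransform() {
  return new TransformStream({
    transform(encodedFrame, controller) {
      const view = new DataView(encodedFrame.data);
      // Drop the 4 padding bytes and undo the negation.
      const newData = new ArrayBuffer(encodedFrame.data.byteLength - 4);
      const newView = new DataView(newData);
      for (let i = 0; i < newData.byteLength; ++i)
        newView.setInt8(i, ~view.getInt8(i));
      encodedFrame.data = newData;
      controller.enqueue(encodedFrame);
    },
  });
}

// Pipe one mock frame sender -> receiver and return the recovered bytes.
async function roundTrip(bytes) {
  const frames = [];
  await new ReadableStream({
    start(controller) {
      controller.enqueue({ data: Uint8Array.from(bytes).buffer });
      controller.close();
    },
  })
    .pipeThrough(makeSenderTransform())
    .pipeThrough(makeReceiverTransform())
    .pipeTo(new WritableStream({
      write(frame) { frames.push([...new Uint8Array(frame.data)]); },
    }));
  return frames[0];
}
```

In the browser, the same two TransformStreams would sit between the `readable` and `writable` ends returned by `createEncodedStreams()` on the sender and receiver, as in the examples above.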
@@ -158,7 +154,7 @@ pc.ontrack = e => {
 ## API
 
 The following are the IDL modifications proposed by this API.
-Future iterations will add additional operations following a similar pattern.
+Future iterations may add additional operations following a similar pattern.
 
 <pre>
 // New dictionary.

@@ -175,42 +171,103 @@ enum RTCEncodedVideoFrameType {
   "delta",
 };
 
+// New dictionaries for video and audio metadata.
+dictionary RTCEncodedVideoFrameMetadata {
+  long long frameId;
+  sequence&lt;long long&gt; dependencies;
+  unsigned short width;
+  unsigned short height;
+  long spatialIndex;
+  long temporalIndex;
+  long synchronizationSource;
+  sequence&lt;long&gt; contributingSources;
+};
+
+dictionary RTCEncodedAudioFrameMetadata {
+  long synchronizationSource;
+  sequence&lt;long&gt; contributingSources;
+};
+
 // New interfaces to define encoded video and audio frames. Will eventually
 // re-use or extend the equivalent defined in WebCodecs.
-// The additionalData fields contain metadata about the frame and might be
+// The additionalData fields contain metadata about the frame and will
 // eventually be exposed differently.
 interface RTCEncodedVideoFrame {
   readonly attribute RTCEncodedVideoFrameType type;
   readonly attribute unsigned long long timestamp;
   attribute ArrayBuffer data;
-  readonly attribute ArrayBuffer additionalData;
+  RTCVideoFrameMetadata getMetadata();
 };
 
 interface RTCEncodedAudioFrame {
   readonly attribute unsigned long long timestamp;
   attribute ArrayBuffer data;
-  readonly attribute ArrayBuffer additionalData;
+  RTCAudioFrameMetadata getMetadata();
 };
 
-
-// New fields in RTCConfiguration
-dictionary RTCConfiguration {
-  ...
-  boolean forceEncodedVideoInsertableStreams = false;
-  boolean forceEncodedAudioInsertableStreams = false;
+// New field in RTCConfiguration
+partial dictionary RTCConfiguration {
+  boolean encodedInsertableStreams = false;
 };
 
 // New methods for RTCRtpSender and RTCRtpReceiver
-interface RTCRtpSender {
-  // ...
-  RTCInsertableStreams createEncodedVideoStreams();
-  RTCInsertableStreams createEncodedAudioStreams();
+partial interface RTCRtpSender {
+  RTCInsertableStreams createEncodedStreams();
 };
 
-interface RTCRtpReceiver {
-  // ...
-  RTCInsertableStreams createEncodedVideoStreams();
-  RTCInsertableStreams createEncodedAudioStreams();
+partial interface RTCRtpReceiver {
+  RTCInsertableStreams createEncodedStreams();
 };
 
 </pre>
+
+## Design considerations ##
+
+This design is built upon the Streams API. This is a natural interface
+for anything that can be considered a "sequence of objects", and it has an ecosystem
+around it that allows some concerns to be handed off easily.
+
+In particular:
+
+* Sequencing comes naturally; streams are in-order entities.
+* With the Transferable Streams paradigm, changing which thread does
+  the processing can be done in a manner that has been tested by others.
+* Since other users of Streams interfaces are going to deal with issues
+  like efficient handover and WASM interaction, we can expect to leverage
+  common solutions for these problems.
+
+There are some challenges with the Streams interface:
+
+* Queueing in response to backpressure isn't an appropriate reaction in a
+  real-time environment. This can be mitigated at the sender by not queueing,
+  preferring instead to discard frames or not generate them.
+* How to interface with congestion control signals, which travel in the
+  opposite direction from the streams flow.
+* How to integrate error signalling and recovery, given that most of the
+  time, breaking the pipeline is not an appropriate action.
+
+These things may be solved by use of non-data "frames" (in the forward direction),
+by reverse streams of non-data "frames" (in the reverse direction), or by defining
+new interfaces based on events, promises or callbacks.
+
+Experimentation with the prototype API seems to show that performance is
+adequate for real-time processing; the streaming part is not contributing
+very much to slowing down the pipelines.
+
+## Alternatives to Streams ##
+One set of alternatives involves callback-based or event-based interfaces; those
+would require developing new interfaces that allow the relevant WebRTC
+objects to be visible in the worker context in order to do processing off
+the main thread. This would seem to be a significantly bigger specification
+and implementation effort.
+
+Another path would involve specifying a worklet API, similar to the AudioWorklet,
+and specifying new APIs for connecting encoders and decoders to such worklets.
+This also seemed to involve a significantly larger set of new interfaces, with a
+correspondingly larger implementation effort, and would offer less flexibility
+in how the processing elements could be implemented.
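The backpressure point in the explainer's design-considerations section can be made concrete: rather than letting frames queue, a transform can discard them while the pipeline is congested. The sketch below is illustrative only; the `isCongested` predicate is invented for the example (a real implementation would derive it from congestion-control signals), and it runs anywhere the WHATWG Streams globals exist, such as Node.js 18+.

```javascript
// Sketch: mitigate backpressure by dropping frames instead of queueing.
// The isCongested() predicate is hypothetical; in practice it would be fed
// by congestion-control signals travelling opposite to the stream flow.
function makeDroppingTransform(isCongested) {
  return new TransformStream({
    transform(encodedFrame, controller) {
      if (isCongested()) return; // discard rather than buffer
      controller.enqueue(encodedFrame);
    },
  });
}

// Collect the ids of mock frames that survive the dropping transform.
async function survivors(frames, isCongested) {
  const out = [];
  await new ReadableStream({
    start(controller) {
      for (const frame of frames) controller.enqueue(frame);
      controller.close();
    },
  })
    .pipeThrough(makeDroppingTransform(isCongested))
    .pipeTo(new WritableStream({ write(frame) { out.push(frame.id); } }));
  return out;
}
```

Dropping inside `transform()` keeps the readable side drained, so upstream never sees a growing queue; that matches the explainer's suggestion of "not queueing, preferring to discard frames".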