Commit 92579b7

Merge pull request #2 from guidou/master

Update explainer to reflect API implemented in Chromium

2 parents: 6d66b0f + 85515d5

1 file changed (+173, -70 lines)


explainer.md

@@ -1,4 +1,4 @@
-# Explainer - WebRTC Insertable Stream Processing of Media
+# Explainer - WebRTC Insertable Streams

 ## Problem to be solved

@@ -10,104 +10,207 @@ We need an API for processing media that:
 * Allows the use of techniques like Workers to avoid blocking on the main thread
 * Does not negatively impact security or privacy of current communications

+
 ## Approach

-This document builds on [WebCodecs](https://github.com/pthatcherg/web-codecs/), and tries to unify the concepts from there with the existing RTCPeerConnection API in order to build an API that is:
+This document builds on concepts previously proposed by
+[WebCodecs](https://github.com/pthatcherg/web-codecs/), and applies them to the existing
+RTCPeerConnection API in order to build an API that is:

 * Familiar to existing PeerConnection users
-* Able to support user defined component wrapping and replacement
+* Able to support insertion of user-defined components
 * Able to support high performance user-specified transformations
+* Able to support user-defined component wrapping and replacement

-The central component of the API is the concept (inherited from WebCodecs) of a component’s main role being a TransformStream (part of the WHATWG Streams spec).
-
-A PeerConnection in this model is a bunch of TransformStreams, connected together into a network that provides the functions expected of it. In particular:
-
-* MediaStreamTrack contains a TransformStream (input & output: Media samples)
-* RTPSender contains a TransformStream (input: Media samples, output: RTP packets)
-* RTPReceiver contains a TransformStream (input: RTP packets, output: Media samples)
+The central idea is to expose components in an RTCPeerConnection as a collection of
+streams (as defined by the [WHATWG Streams API](https://streams.spec.whatwg.org/)),
+which can be manipulated to introduce new components, or to wrap or replace existing
+components.

-RTPSender and RTPReceiver are composable objects - a sender has an encoder and a
-RTP packetizer, which pipe into each other; a receiver has an RTP depacketizer
-and a decoder.

+## Use cases

-The encoder is an object that takes a Stream(raw frames) and emits a Stream(encoded frames). It will also have API surface for non-data interfaces like asking the encoder to produce a keyframe, or setting the normal keyframe interval, target bitrate and so on.
+The first use case to be supported by the API is the processing of encoded media, with
+end-to-end encryption intended as the motivating application. As such, the first version
+of the API will focus on this use case. However, the same approach can be used in future
+iterations to support additional use cases such as:

-## Use cases
-The use cases for this API include the following cases from the [WebRTC NV use cases](https://www.w3.org/TR/webrtc-nv-use-cases/) document:
-* Funny Hats (pre-processing inserted before codec)
+* Funny Hats (processing inserted before encoding or after decoding)
 * Background removal
 * Voice processing
-* Secure Web conferencing with trusted Javascript (from [the pull request](https://github.com/w3c/webrtc-nv-use-cases/pull/49))
-
-In addition, the following use cases can be addressed because the codec's dynamic parameters are exposed to the application):
 * Dynamic control of codec parameters
 * App-defined bandwidth distribution between tracks
+* Custom codecs for special purposes (in combination with WebCodecs)

-When it's possible to replace the returned codec with a completely custom codec, we can address:
-* Custom codec for special purposes
+## Code Examples

+1. Let the RTCPeerConnection know that it should allow exposing the data flowing through it
+as streams.

-## Code examples
+To ensure backwards compatibility, if the Insertable Streams API is not used, an
+RTCPeerConnection should work exactly as it did before the introduction of this API.
+Therefore, we explicitly let the RTCPeerConnection know that we want to use insertable
+streams. For example:

-In order to insert your own processing in the media pipeline, do the following:
-
-1. Declare a function that does what you want to a single frame.
-<pre>
-function mungeFunction(frame) { … }
-</pre>
-2. Set up a transform stream that will apply this function to all frames passed to it.
-<pre>
-var munger = new TransformStream({transformer: mungeFunction});
-</pre>
-3. Create a function that will take the original encoder, connect it to the transformStream in an appropriate way, and return an object that can be treated by the rest of the system as if it is an encoder:
 <pre>
-function installMunger(encoder, context) {
-  encoder.readable.pipeTo(munger.writable);
-  var wrappedEncoder = { readable: munger.readable,
-                         writable: encoder.writable };
-  return wrappedEncoder;
-}
+let pc = new RTCPeerConnection({
+  forceEncodedVideoInsertableStreams: true,
+  forceEncodedAudioInsertableStreams: true
+});
 </pre>
-4. Tell the PeerConnection to call this function whenever an encoder is created:
+
+2. Set up transform streams that perform some processing on data.
+
+The following example negates every bit in the original data payload
+of an encoded frame and adds 4 bytes of padding.
+
 <pre>
-pc = new RTCPeerConnection ({
-  encoderFactory: installMunger;
-});
+let senderTransform = new TransformStream({
+  start() {
+    // Called on startup.
+  },
+
+  async transform(chunk, controller) {
+    let view = new DataView(chunk.data);
+    // Create a new buffer with 4 additional bytes.
+    let newData = new ArrayBuffer(chunk.data.byteLength + 4);
+    let newView = new DataView(newData);
+
+    // Fill the new buffer with a negated version of all
+    // the bits in the original frame.
+    for (let i = 0; i < chunk.data.byteLength; ++i)
+      newView.setInt8(i, ~view.getInt8(i));
+    // Set the padding bytes to zero.
+    for (let i = 0; i < 4; ++i)
+      newView.setInt8(chunk.data.byteLength + i, 0);
+
+    // Replace the frame's data with the new buffer.
+    chunk.data = newData;
+
+    // Send it to the output stream.
+    controller.enqueue(chunk);
+  },
+
+  flush() {
+    // Called when the stream is about to be closed.
+  }
+});
 </pre>
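The payload munging in the added transform above is ordinary ArrayBuffer/DataView code and can be exercised outside the streams pipeline. A minimal sketch of the sender-side step (the `negateAndPad` helper name is hypothetical, not part of the proposed API):

```javascript
// Hypothetical helper mirroring the sender transform: negate every bit
// of the payload and append 4 zero bytes of padding.
function negateAndPad(data) {
  const view = new DataView(data);
  const newData = new ArrayBuffer(data.byteLength + 4);
  const newView = new DataView(newData);
  // Negate all bits of the original payload.
  for (let i = 0; i < data.byteLength; ++i)
    newView.setInt8(i, ~view.getInt8(i));
  // Zero out the 4 padding bytes.
  for (let i = 0; i < 4; ++i)
    newView.setInt8(data.byteLength + i, 0);
  return newData;
}

// A 3-byte payload grows to 7 bytes, with each original byte negated.
const input = new Uint8Array([0x00, 0xff, 0x0f]).buffer;
const output = new Uint8Array(negateAndPad(input));
console.log(Array.from(output)); // [255, 0, 240, 0, 0, 0, 0]
```

Note that `setInt8` stores only the low byte of its argument, so negating the negated bytes on the receiver restores the original values exactly.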

-Or do it all using a deeply nested set of parentheses:
+3. Create a MediaStreamTrack, add it to the RTCPeerConnection and connect the
+transform stream to the track's sender.

 <pre>
-pc = new RTCPeerConnection( {
-  encoderFactory: (encoder) => {
-    var munger = new TransformStream({
-      transformer: munge
-    });
-    var wrapped = { readable: munger.readable,
-                    writable: encoder.writable };
-    encoder.readable.pipeTo(munger.writable);
-    return wrappedEncoder;
-  }
-});
+let stream = await navigator.mediaDevices.getUserMedia({video: true});
+let [track] = stream.getTracks();
+let videoSender = pc.addTrack(track, stream);
+let senderStreams = videoSender.createEncodedVideoStreams();
+
+// Do ICE and offer/answer exchange.
+
+senderStreams.readable
+    .pipeThrough(senderTransform)
+    .pipeTo(senderStreams.writable);
 </pre>

-The PC will then connect the returned object’s “writable” to the media input, and the returned object’s “readable” to the RTP packetizer’s input.
+4. Do the corresponding operations on the receiver side.

-When the processing is to be done in a worker, we let the factory method pass the pipes to the worker:
 <pre>
-pc = new RTCPeerConnection({
-  encoderFactory: (encoder) => {
-    var munger = new TransformStream({ transformer: munge });
-    output = encoder.readable.pipeThrough(munger.writable);
-    worker.postMessage([‘munge this’, munger], [munger]);
-    Return { readable: output, writable: encoder.writable };
-  }
-})});
+let pc = new RTCPeerConnection({forceEncodedVideoInsertableStreams: true});
+pc.ontrack = e => {
+  let receivers = pc.getReceivers();
+  let videoReceiver = null;
+  for (const r of receivers) {
+    if (r.track.kind == 'video')
+      videoReceiver = r;
+  }
+  if (!videoReceiver)
+    return;
+
+  let receiverTransform = new TransformStream({
+    start() {},
+    flush() {},
+    async transform(chunk, controller) {
+      // Reconstruct the original frame.
+      let view = new DataView(chunk.data);
+
+      // Ignore the last 4 bytes.
+      let newData = new ArrayBuffer(chunk.data.byteLength - 4);
+      let newView = new DataView(newData);
+
+      // Negate all bits in the incoming frame, ignoring the
+      // last 4 bytes.
+      for (let i = 0; i < chunk.data.byteLength - 4; ++i)
+        newView.setInt8(i, ~view.getInt8(i));
+
+      chunk.data = newData;
+      controller.enqueue(chunk);
+    },
+  });
+
+  let receiverStreams = videoReceiver.createEncodedVideoStreams();
+  receiverStreams.readable
+      .pipeThrough(receiverTransform)
+      .pipeTo(receiverStreams.writable);
+};
 </pre>
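The receiver transform added above is the exact inverse of the sender transform, so a frame should survive the round trip unchanged. A small self-contained sketch checking that property (the `encodePayload`/`decodePayload` helper names are hypothetical; no WebRTC objects are involved):

```javascript
// Hypothetical helpers mirroring the two transforms' payload logic.

// Sender side: negate every bit, append 4 zero bytes of padding.
function encodePayload(data) {
  const view = new DataView(data);
  const out = new DataView(new ArrayBuffer(data.byteLength + 4));
  for (let i = 0; i < data.byteLength; ++i)
    out.setInt8(i, ~view.getInt8(i));
  for (let i = 0; i < 4; ++i)
    out.setInt8(data.byteLength + i, 0);
  return out.buffer;
}

// Receiver side: negate every bit, dropping the last 4 padding bytes.
function decodePayload(data) {
  const view = new DataView(data);
  const out = new DataView(new ArrayBuffer(data.byteLength - 4));
  for (let i = 0; i < data.byteLength - 4; ++i)
    out.setInt8(i, ~view.getInt8(i));
  return out.buffer;
}

// Round trip: decode(encode(x)) reproduces x byte for byte.
const original = new Uint8Array([1, 2, 3, 250]);
const roundTripped =
    new Uint8Array(decodePayload(encodePayload(original.buffer)));
console.log(Array.from(roundTripped)); // [1, 2, 3, 250]
```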

-## Implementation efficiency opportunities
-The API outlined here gives the implementation lots of opportunity to optimize. For instance, when the UA discovers that it has been asked to run a pipe from an internal encoder to an internal RTP sender, it has no need to convert the data into the Javascript format, since it is never going to be exposed to Javascript, and does not need to switch to the thread on which Javascript is running.
+## API
+
+The following are the IDL modifications proposed by this API.
+Future iterations will add additional operations following a similar pattern.

-Similarly, piping from a MediaStreamTrack created on the main thread to a processing step that is executing in a worker has no need to touch the main thread; the media buffers can be piped directly to the worker.
+<pre>
+// New dictionary.
+dictionary RTCInsertableStreams {
+  ReadableStream readable;
+  WritableStream writable;
+};
+
+// New enum for video frame types. Will eventually re-use the equivalent defined
+// by WebCodecs.
+enum RTCEncodedVideoFrameType {
+  "empty",
+  "key",
+  "delta",
+};
+
+// New interfaces to define encoded video and audio frames. Will eventually
+// re-use or extend the equivalent defined in WebCodecs.
+// The additionalData fields contain metadata about the frame and might
+// eventually be exposed differently.
+interface RTCEncodedVideoFrame {
+  readonly attribute RTCEncodedVideoFrameType type;
+  readonly attribute unsigned long long timestamp;
+  attribute ArrayBuffer data;
+  readonly attribute ArrayBuffer additionalData;
+};
+
+interface RTCEncodedAudioFrame {
+  readonly attribute unsigned long long timestamp;
+  attribute ArrayBuffer data;
+  readonly attribute ArrayBuffer additionalData;
+};
+
+
+// New fields in RTCConfiguration
+dictionary RTCConfiguration {
+  ...
+  boolean forceEncodedVideoInsertableStreams = false;
+  boolean forceEncodedAudioInsertableStreams = false;
+};
+
+// New methods for RTCRtpSender and RTCRtpReceiver
+interface RTCRtpSender {
+  // ...
+  RTCInsertableStreams createEncodedVideoStreams();
+  RTCInsertableStreams createEncodedAudioStreams();
+};
+
+interface RTCRtpReceiver {
+  // ...
+  RTCInsertableStreams createEncodedVideoStreams();
+  RTCInsertableStreams createEncodedAudioStreams();
+};
+
+</pre>
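Since the methods above exist only in implementations of this proposal, and the surface may change in later iterations, an application would likely feature-detect before wiring up streams. A sketch, assuming the method name proposed above (`supportsEncodedVideoStreams` is a hypothetical helper):

```javascript
// Returns true only if the environment exposes the proposed
// createEncodedVideoStreams() method on RTCRtpSender.
// The method name follows the IDL above and may change.
function supportsEncodedVideoStreams() {
  return typeof RTCRtpSender !== 'undefined' &&
         typeof RTCRtpSender.prototype.createEncodedVideoStreams === 'function';
}

// Outside a browser that implements the proposal (e.g. in Node.js),
// RTCRtpSender is undefined and this reports false.
console.log(supportsEncodedVideoStreams());
```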
