# Explainer - WebRTC Insertable Streams
## Problem to be solved
We need an API for processing media that:
* Allows the use of techniques like Workers to avoid blocking on the main thread
* Does not negatively impact security or privacy of current communications
## Approach
This document builds on concepts previously proposed by
[WebCodecs](https://github.com/pthatcherg/web-codecs/), and applies them to the existing
RTCPeerConnection API in order to build an API that is:
* Familiar to existing PeerConnection users
* Able to support insertion of user-defined components
* Able to support high performance user-specified transformations
* Able to support user-defined component wrapping and replacement
The central idea is to expose components in an RTCPeerConnection as a collection of
streams (as defined by the [WHATWG Streams API](https://streams.spec.whatwg.org/)),
which can be manipulated to introduce new components, or to wrap or replace existing
components.
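
Concretely, each exposed component takes the shape of a readable/writable pair of streams. As a minimal sketch (`getEncodedVideoStreams()` is the sender-side accessor proposed later in this explainer, and `myTransform` is a placeholder TransformStream):

<pre>
// Minimal sketch: splice an application-defined TransformStream into a
// sender's encoded output. getEncodedVideoStreams() is proposed later
// in this explainer; myTransform is a placeholder.
const {readable, writable} = videoSender.getEncodedVideoStreams();
readable.pipeThrough(myTransform).pipeTo(writable);
</pre>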
## Use cases
The first use case to be supported by the API is the processing of encoded media, with
end-to-end encryption intended as the motivating application. As such, the first version
of the API will focus on this use case. However, the same approach can be used in future
iterations to support additional use cases such as:
* Funny Hats (processing inserted before encoding or after decoding)
* Background removal
* Voice processing
* Dynamic control of codec parameters
* App-defined bandwidth distribution between tracks
* Custom codecs for special purposes (in combination with WebCodecs)
## Code Examples
1. Let an RTCPeerConnection know that it should allow exposing the data flowing through it
as streams.
To ensure backwards compatibility, if the Insertable Streams API is not used, an
RTCPeerConnection should work exactly as it did before the introduction of this API.
Therefore, we explicitly let the RTCPeerConnection know that we want to use insertable
streams. For example:
<pre>
let pc = new RTCPeerConnection({
  forceEncodedVideoInsertableStreams: true,
  forceEncodedAudioInsertableStreams: true
});
</pre>
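
These constructor options are part of this proposal and may be absent in a given browser. A cautious application might feature-detect before relying on them; a sketch, assuming the sender-side accessor proposed in step 3:

<pre>
// Hedged feature check: getEncodedVideoStreams() is the accessor this
// explainer proposes on RTCRtpSender; it may be absent or renamed.
const supportsInsertableStreams =
    'getEncodedVideoStreams' in RTCRtpSender.prototype;
if (!supportsInsertableStreams) {
  // Fall back to a plain RTCPeerConnection without the force* options.
}
</pre>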
2. Set up transform streams that perform some processing on data.
The following example negates every bit in the original data payload
of an encoded frame and adds 4 bytes of padding.
<pre>
let senderTransform = new TransformStream({
  start() {
    // Called on startup.
  },

  async transform(chunk, controller) {
    let view = new DataView(chunk.data);
    // Create a new buffer with 4 additional bytes.
    let newData = new ArrayBuffer(chunk.data.byteLength + 4);
    let newView = new DataView(newData);

    // Fill the new buffer with a negated version of all
    // the bits in the original frame.
    for (let i = 0; i < chunk.data.byteLength; ++i)
      newView.setInt8(i, ~view.getInt8(i));
    // Set the padding bytes to zero.
    for (let i = 0; i < 4; ++i)
      newView.setInt8(chunk.data.byteLength + i, 0);

    // Replace the frame's data with the new buffer.
    chunk.data = newData;

    // Send it to the output stream.
    controller.enqueue(chunk);
  },

  flush() {
    // Called when the stream is about to be closed.
  }
});
</pre>
3. Create a MediaStreamTrack, add it to the RTCPeerConnection and connect the
transform stream to the track's sender.
<pre>
let stream = await navigator.mediaDevices.getUserMedia({video: true});
let [track] = stream.getTracks();
let videoSender = pc.addTrack(track, stream);
let senderStreams = videoSender.getEncodedVideoStreams();

// Do ICE and offer/answer exchange.

senderStreams.readable
  .pipeThrough(senderTransform)
  .pipeTo(senderStreams.writable);
</pre>
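
The transform above runs on the main thread. Since the problem statement calls for Worker support, and WHATWG streams are transferable in some engines, one possible arrangement is to hand the stream pair to a worker (a sketch; `transform_worker.js` is hypothetical and transferable-stream support varies):

<pre>
// Sketch: move the per-frame transform off the main thread.
// Assumes transferable streams; transform_worker.js is hypothetical.
const worker = new Worker('transform_worker.js');
worker.postMessage(
    {readable: senderStreams.readable, writable: senderStreams.writable},
    [senderStreams.readable, senderStreams.writable]);

// In transform_worker.js, something like:
//   onmessage = ({data}) => {
//     data.readable.pipeThrough(senderTransform).pipeTo(data.writable);
//   };
</pre>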
4. Do the corresponding operations on the receiver side.
<pre>
let pc = new RTCPeerConnection({forceEncodedVideoInsertableStreams: true});
pc.ontrack = e => {
  let receivers = pc.getReceivers();
  let videoReceiver = null;
  for (const r of receivers) {
    if (r.track.kind == 'video')
      videoReceiver = r;
  }
  if (!videoReceiver)
    return;

  let receiverTransform = new TransformStream({
    start() {},
    flush() {},
    async transform(chunk, controller) {
      // Reconstruct the original frame.
      let view = new DataView(chunk.data);

      // Ignore the last 4 bytes of padding.
      let newData = new ArrayBuffer(chunk.data.byteLength - 4);
      let newView = new DataView(newData);

      // Negate all bits in the incoming frame, ignoring the
      // last 4 bytes of padding.
      for (let i = 0; i < chunk.data.byteLength - 4; ++i)
        newView.setInt8(i, ~view.getInt8(i));

      chunk.data = newData;
      controller.enqueue(chunk);
    },
  });

  let receiverStreams = videoReceiver.createEncodedVideoStreams();
  receiverStreams.readable
    .pipeThrough(receiverTransform)
    .pipeTo(receiverStreams.writable);
};
</pre>
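
Since end-to-end encryption is the motivating application, note that an encrypting transform has the same shape as the examples above. A sketch using WebCrypto AES-GCM (assumptions: `key` is a CryptoKey negotiated out of band; a real design also needs IV management and must keep any data the packetizer or an SFU relies on readable):

<pre>
// E2EE sketch only: encrypt each encoded frame's payload with AES-GCM.
// `key` is an assumed out-of-band CryptoKey; not a complete design.
let encryptTransform = new TransformStream({
  async transform(chunk, controller) {
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const ciphertext = await crypto.subtle.encrypt(
        {name: 'AES-GCM', iv: iv}, key, chunk.data);
    // Prepend the IV so the receiver can decrypt.
    const out = new Uint8Array(iv.byteLength + ciphertext.byteLength);
    out.set(iv);
    out.set(new Uint8Array(ciphertext), iv.byteLength);
    chunk.data = out.buffer;
    controller.enqueue(chunk);
  }
});
</pre>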
## API
The following are the IDL modifications proposed by this API.
Future iterations will add additional operations following a similar pattern.
<pre>
// New dictionary.
dictionary RTCInsertableStreams {
  ReadableStream readable;
  WritableStream writable;
};

// New enum for video frame types. Will eventually re-use the equivalent defined
// by WebCodecs.
enum RTCEncodedVideoFrameType {
  "empty",
  "key",
  "delta",
};

// New interfaces to define encoded video and audio frames. Will eventually
// re-use or extend the equivalent defined in WebCodecs.
// The additionalData fields contain metadata about the frame and might be