
Commit a24f157

Merge branch 'master' of github.com:alvestrand/webrtc-media-streams
2 parents 695c034 + 673ef2a

4 files changed: +1904 -11 lines changed

README.md

Lines changed: 3 additions & 3 deletions

@@ -2,12 +2,12 @@

 (not to be confused with the MediaStreams API)

-This repository contains a '''proposal''' for an API that allows the
+This repository contains a **proposal** for an API that allows the
 insertion of user-defined processing steps in the pipeline that
 handles media in a WebRTC context.

 In order to allow such processing, it defines a number of extensions
-to the objects defined in WEBRTC-PC and MEDIACAPTURE-MAIN, and also
-draws upon definitions from WEBRTC-CODEC.
+to the objects defined in [WEBRTC-PC](https://w3c.github.io/webrtc-pc/) and [MEDIACAPTURE-MAIN](https://w3c.github.io/mediacapture-main/), and also
+draws upon definitions from [WEBRTC-CODEC](https://github.com/WICG/web-codecs).

explainer.md

Lines changed: 112 additions & 2 deletions

@@ -1,3 +1,113 @@
-# Explainer
+# Explainer - WebRTC Insertable Stream Processing of Media

-TBD

The rest of the hunk is new content, reproduced below without diff markers:

## Problem to be solved

We need an API for processing media that:
* Allows the processing to be specified by the user, not the browser
* Allows the processed data to be handled by the browser as if it came through
  the normal pipeline
* Allows the use of techniques like WASM to achieve effective processing
* Allows the use of techniques like Workers to avoid blocking on the main thread
* Does not negatively impact security or privacy of current communications

## Approach

This document builds on [WebCodecs](https://github.com/pthatcherg/web-codecs/), and tries to unify the concepts from there with the existing RTCPeerConnection API in order to build an API that is:

* Familiar to existing PeerConnection users
* Able to support user defined component wrapping and replacement
* Able to support high performance user-specified transformations

The central concept of the API (inherited from WebCodecs) is that a component's main role is to be a TransformStream (part of the WHATWG Streams spec).

A PeerConnection in this model is a bunch of TransformStreams, connected together into a network that provides the functions expected of it. In particular:

* MediaStreamTrack contains a TransformStream (input & output: Media samples)
* RTPSender contains a TransformStream (input: Media samples, output: RTP packets)
* RTPReceiver contains a TransformStream (input: RTP packets, output: Media samples)

RTPSender and RTPReceiver are composable objects - a sender has an encoder and an RTP packetizer, which pipe into each other; a receiver has an RTP depacketizer and a decoder.

The encoder is an object that takes a Stream(raw frames) and emits a Stream(encoded frames). It will also have API surface for non-data interfaces like asking the encoder to produce a keyframe, or setting the normal keyframe interval, target bitrate and so on.
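
As a rough sketch of this composition (not API surface defined by this proposal - the identity TransformStreams below merely stand in for the browser-internal encoder and packetizer, and the name senderLike is illustrative), a sender could be wired up like this:

<pre>
// Illustrative sketch only: identity TransformStreams stand in for the
// browser-internal components.
const encoder = new TransformStream();     // in: raw frames, out: encoded frames
const packetizer = new TransformStream();  // in: encoded frames, out: RTP packets

// The encoder pipes into the packetizer; the composite exposes the encoder's
// writable end (media samples in) and the packetizer's readable end (RTP packets out).
encoder.readable.pipeTo(packetizer.writable);
const senderLike = { writable: encoder.writable,
                     readable: packetizer.readable };
</pre>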
## Use cases
The use cases for this API include the following cases from the [WebRTC NV use cases](https://www.w3.org/TR/webrtc-nv-use-cases/) document:
* Funny Hats (pre-processing inserted before codec)
* Background removal
* Voice processing
* Secure Web conferencing with trusted Javascript (from [the pull request](https://github.com/w3c/webrtc-nv-use-cases/pull/49))

In addition, the following use cases can be addressed because the codec's dynamic parameters are exposed to the application:
* Dynamic control of codec parameters
* App-defined bandwidth distribution between tracks

When it's possible to replace the returned codec with a completely custom codec, we can address:
* Custom codec for special purposes

## Code examples

In order to insert your own processing in the media pipeline, do the following:

1. Declare a function that does what you want to a single frame.
<pre>
// Called once per frame; the processed frame is enqueued on the controller.
function mungeFunction(frame, controller) { … }
</pre>
2. Set up a transform stream that will apply this function to all frames passed to it.
<pre>
var munger = new TransformStream({transform: mungeFunction});
</pre>
3. Create a function that will take the original encoder, connect it to the transform stream in an appropriate way, and return an object that can be treated by the rest of the system as if it were an encoder:
<pre>
function installMunger(encoder, context) {
  encoder.readable.pipeTo(munger.writable);
  var wrappedEncoder = { readable: munger.readable,
                         writable: encoder.writable };
  return wrappedEncoder;
}
</pre>
4. Tell the PeerConnection to call this function whenever an encoder is created:
<pre>
pc = new RTCPeerConnection({
  encoderFactory: installMunger
});
</pre>

Or do it all using a deeply nested set of parentheses:

<pre>
pc = new RTCPeerConnection({
  encoderFactory: (encoder) => {
    var munger = new TransformStream({
      transform: mungeFunction
    });
    var wrapped = { readable: munger.readable,
                    writable: encoder.writable };
    encoder.readable.pipeTo(munger.writable);
    return wrapped;
  }
});
</pre>

The PC will then connect the returned object’s “writable” to the media input, and the returned object’s “readable” to the RTP packetizer’s input.

When the processing is to be done in a worker, we let the factory method pass the pipes to the worker:
<pre>
pc = new RTCPeerConnection({
  encoderFactory: (encoder) => {
    // Hand the encoder's output and the writable end of a fresh stream pair
    // to the worker (assuming streams are transferable); the worker munges
    // each frame and writes the result into that writable end.
    var munger = new TransformStream();
    worker.postMessage(['munge this', encoder.readable, munger.writable],
                       [encoder.readable, munger.writable]);
    return { readable: munger.readable, writable: encoder.writable };
  }
});
</pre>
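
The worker side of this hand-off is not spelled out here. A minimal sketch of what it might look like, assuming streams are transferable and reusing mungeFunction from step 1 (the file name worker.js and the 'munge this' tag are just the illustrative values used above):

<pre>
// worker.js - illustrative sketch, not part of the proposal
self.onmessage = (event) => {
  const [tag, encodedFrames, mungedFrames] = event.data;
  if (tag !== 'munge this') return;
  // Do the per-frame processing off the main thread: pipe the encoder's
  // output through a TransformStream running mungeFunction, and write the
  // result into the writable end handed over by the page.
  encodedFrames
    .pipeThrough(new TransformStream({ transform: mungeFunction }))
    .pipeTo(mungedFrames);
};
</pre>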
## Implementation efficiency opportunities
The API outlined here gives the implementation lots of opportunity to optimize. For instance, when the UA discovers that it has been asked to run a pipe from an internal encoder to an internal RTP sender, it has no need to convert the data into the Javascript format, since it is never going to be exposed to Javascript, and does not need to switch to the thread on which Javascript is running.

Similarly, piping from a MediaStreamTrack created on the main thread to a processing step that is executing in a worker has no need to touch the main thread; the media buffers can be piped directly to the worker.

index.bs

Lines changed: 7 additions & 2 deletions

@@ -14,8 +14,7 @@ Markup Shorthands: css no
 <pre class='anchors'>
 spec: WEBRTC; urlPrefix: https://w3c.github.io/webrtc-pc/
 type: interface
-for: RTCRtpEncodingParameters; text: RTCRtpEncodingParameters; url: #dom
--rtcrtpencodingparameters
+for: RTCRtpEncodingParameters; text: RTCRtpEncodingParameters; url: #dom-rtcrtpencodingparameters
 type: enum
 text: RTCPriorityType; url: #dom-rtcprioritytype
 type: attribute
@@ -30,5 +29,11 @@ The Streams definition doesn't use WebIDL much, but the WebRTC spec does.
 This specification shows the IDL extensions for WebRTC.

 <pre class='idl'>
+callback EncoderDecorator = Encoder(Encoder encoder, Config config);
+callback DecoderDecorator = Decoder(Decoder decoder, Config config);

+partial dictionary RTCConfiguration {
+  EncoderDecorator encoderFactory;
+  DecoderDecorator decoderFactory;
+};
 </pre>
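
For illustration only (myEncoderDecorator is a hypothetical callback, and the contents of the Config argument are not defined in this sketch), an application could supply these RTCConfiguration members like this:

<pre>
// Hypothetical decorator matching the EncoderDecorator callback shape:
// it receives the browser's encoder and a config object, and must return
// something the rest of the pipeline can treat as an encoder.
function myEncoderDecorator(encoder, config) {
  // Pass the encoder through unchanged; a real decorator would wrap
  // encoder.readable / encoder.writable as the explainer shows.
  return encoder;
}

const pc = new RTCPeerConnection({
  encoderFactory: myEncoderDecorator
});
</pre>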
