# Explainer - WebRTC Insertable Stream Processing of Media

## Problem to be solved

We need an API for processing media that:
* Allows the processing to be specified by the user, not the browser
* Allows the processed data to be handled by the browser as if it came through
  the normal pipeline
* Allows the use of techniques like WASM to achieve effective processing
* Allows the use of techniques like Workers to avoid blocking the main thread
* Does not negatively impact the security or privacy of current communications

## Approach

This document builds on [WebCodecs](https://github.com/pthatcherg/web-codecs/), and tries to unify the concepts from there with the existing RTCPeerConnection API in order to build an API that is:

* Familiar to existing PeerConnection users
* Able to support user-defined component wrapping and replacement
* Able to support high-performance user-specified transformations

The central concept of the API (inherited from WebCodecs) is that each component's main role is to be a TransformStream (part of the WHATWG Streams spec).

A PeerConnection in this model is a set of TransformStreams, connected together into a network that provides the functions expected of it. In particular:

* MediaStreamTrack contains a TransformStream (input & output: Media samples)
* RTPSender contains a TransformStream (input: Media samples, output: RTP packets)
* RTPReceiver contains a TransformStream (input: RTP packets, output: Media samples)
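The pipeline model above can be illustrated with a plain WHATWG TransformStream (available in modern browsers and in Node.js 18+). This is only a sketch: the "frames" here are plain JavaScript objects standing in for media samples, and nothing about it is specific to the proposed PeerConnection API.

```javascript
// Sketch: one pipeline stage as a TransformStream. A real stage would
// carry media samples or RTP packets instead of plain objects.
const stage = new TransformStream({
  transform(frame, controller) {
    // Tag each frame as having passed through this stage.
    controller.enqueue({ ...frame, processed: true });
  }
});

// Feed two mock frames in and collect what comes out.
async function run() {
  const writer = stage.writable.getWriter();
  writer.write({ id: 1 });
  writer.write({ id: 2 });
  writer.close();

  const out = [];
  const reader = stage.readable.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    out.push(value);
  }
  return out;
}
```

Because each stage exposes the standard `readable`/`writable` pair, stages compose with `pipeThrough` exactly as the bullet list above describes.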
| 28 | + |
| 29 | + |
| 30 | +RTPSender and RTPReceiver are composable objects - a sender has an encoder and a |
| 31 | +RTP packetizer, which pipe into each other; a receiver has an RTP depacketizer |
| 32 | +and a decoder. |
| 33 | + |
| 34 | + |
| 35 | +The encoder is an object that takes a Stream(raw frames) and emits a Stream(encoded frames). It will also have API surface for non-data interfaces like asking the encoder to produce a keyframe, or setting the normal keyframe interval, target bitrate and so on. |
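To make the encoder's dual nature concrete, here is a hedged sketch of such an object in plain JavaScript. `MockEncoder`, `requestKeyFrame`, and `setTargetBitrate` are illustrative names only, not part of any specified API, and the "encoding" is a trivial wrapper.

```javascript
// Sketch (hypothetical names): an "encoder" is a TransformStream plus a
// control surface for non-data operations.
class MockEncoder {
  constructor() {
    this.targetBitrate = 500000;
    this.keyFrameRequested = false;
    const ts = new TransformStream({
      transform: (rawFrame, controller) => {
        // "Encode" by wrapping the raw frame; honor a pending keyframe request.
        controller.enqueue({
          data: rawFrame,
          keyFrame: this.keyFrameRequested,
          bitrate: this.targetBitrate
        });
        this.keyFrameRequested = false;
      }
    });
    this.readable = ts.readable;  // Stream(encoded frames)
    this.writable = ts.writable;  // Stream(raw frames)
  }
  requestKeyFrame() { this.keyFrameRequested = true; }
  setTargetBitrate(bitsPerSecond) { this.targetBitrate = bitsPerSecond; }
}
```

A wrapping step like the ones shown below can treat such an object exactly like a built-in encoder, since it exposes the same `readable`/`writable` pair.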

## Use cases
The use cases for this API include the following cases from the [WebRTC NV use cases](https://www.w3.org/TR/webrtc-nv-use-cases/) document:
* Funny Hats (pre-processing inserted before codec)
* Background removal
* Voice processing
* Secure Web conferencing with trusted Javascript (from [the pull request](https://github.com/w3c/webrtc-nv-use-cases/pull/49))

In addition, the following use cases can be addressed because the codec's dynamic parameters are exposed to the application:
* Dynamic control of codec parameters
* App-defined bandwidth distribution between tracks

When it's possible to replace the returned codec with a completely custom codec, we can address:
* Custom codec for special purposes


## Code examples

In order to insert your own processing in the media pipeline, do the following:

1. Declare a function that does what you want to a single frame and enqueues the result.
<pre>
function mungeFunction(frame, controller) { … }
</pre>
2. Set up a transform stream that will apply this function to all frames passed to it.
<pre>
var munger = new TransformStream({ transform: mungeFunction });
</pre>
3. Create a function that will take the original encoder, connect it to the transform stream in an appropriate way, and return an object that can be treated by the rest of the system as if it is an encoder:
<pre>
function installMunger(encoder, context) {
  encoder.readable.pipeTo(munger.writable);
  var wrappedEncoder = { readable: munger.readable,
                         writable: encoder.writable };
  return wrappedEncoder;
}
</pre>
4. Tell the PeerConnection to call this function whenever an encoder is created:
<pre>
pc = new RTCPeerConnection({
  encoderFactory: installMunger
});
</pre>

Or do it all inline, using a deeply nested set of parentheses:

<pre>
pc = new RTCPeerConnection({
  encoderFactory: (encoder) => {
    var munger = new TransformStream({
      transform: munge
    });
    var wrapped = { readable: munger.readable,
                    writable: encoder.writable };
    encoder.readable.pipeTo(munger.writable);
    return wrapped;
  }
});
</pre>

The PC will then connect the returned object’s “writable” to the media input, and the returned object’s “readable” to the RTP packetizer’s input.

When the processing is to be done in a worker, we let the factory method pass the pipes to the worker:
<pre>
pc = new RTCPeerConnection({
  encoderFactory: (encoder) => {
    var munger = new TransformStream({ transform: munge });
    var output = encoder.readable.pipeThrough(munger);
    worker.postMessage(['munge this', munger], [munger]);
    return { readable: output, writable: encoder.writable };
  }
});
</pre>
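The worker side of this arrangement is not shown above. One possible counterpart, sketched under the assumption that the worker is handed a `{readable, writable}` pair of transferable stream endpoints and runs the munge function between them; `handleMessage`, the message shape, and the `'munge this'` tag are illustrative, not part of any spec:

```javascript
// Sketch of the worker side: pipe the received readable through a
// TransformStream running munge, into the received writable, entirely
// off the main thread.
function munge(frame, controller) {
  // Example processing: annotate the frame. A real transform might
  // encrypt the payload or rewrite headers.
  controller.enqueue({ ...frame, munged: true });
}

function handleMessage({ data }) {
  const [tag, { readable, writable }] = data;
  if (tag !== 'munge this') return Promise.resolve();
  return readable
    .pipeThrough(new TransformStream({ transform: munge }))
    .pipeTo(writable);
}

// In a real worker this would be wired up as:
// onmessage = handleMessage;
```

Because the whole pipe runs inside the worker, the individual frames never surface on the main thread.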

## Implementation efficiency opportunities
The API outlined here gives the implementation plenty of opportunity to optimize. For instance, when the UA discovers that it has been asked to run a pipe from an internal encoder to an internal RTP sender, it has no need to convert the data into the Javascript format, since the data is never exposed to Javascript, and it does not need to switch to the thread on which Javascript is running.

Similarly, piping from a MediaStreamTrack created on the main thread to a processing step that is executing in a worker has no need to touch the main thread; the media buffers can be piped directly to the worker.