---
title: Introduction
---

PulseBeam provides high-level, opinionated real-time media servers that let
you ship video, audio, and even generic real-time data (faster than
WebSockets) to production as fast as possible with minimal friction.

We use WebRTC as the underlying transport protocol. Unlike traditional WebRTC
deployments, we made conscious choices to reduce complexity and aim for system
stability.

Here are a few notable choices (if you have feedback on them, you're very
welcome to reach out to us—we want to hear from you!):

1. SFU only
2. No WebSockets
3. No STUN & TURN Servers
4. No WebRTC Port Range
5. H264 and Opus codecs

## SFU Only

There are typically 3 foundational network architectures for WebRTC:

- **P2P:** Direct client-to-client. Simple, but bandwidth and the number of
  connections explode with more participants. Not ideal for group calls.
- **MCU:** Server mixes everyone’s streams. Easier for clients, but adds
  latency, heavy server load, and reduces flexibility.
- **SFU:** Server forwards streams without heavy processing. Clients handle
  rendering, so latency stays low, quality stays high, and it scales well with
  many participants.

PulseBeam chooses SFU only because it hits the sweet spot: scalable,
low-latency, high-quality real-time media, and low CPU usage.
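
To make the scaling difference concrete, here is a rough, illustrative sketch
(not a PulseBeam benchmark): in a full-mesh P2P call every client uploads a
copy of its stream to every other participant, while with an SFU each client
uploads exactly once and the server handles the fan-out.

```ts
// Illustrative math only: uplink streams a client must send in an
// n-participant call, assuming one outgoing media stream per client.
function uplinkStreamsPerClient(participants: number, topology: "p2p" | "sfu"): number {
  // Full-mesh P2P sends a copy to every other participant; an SFU receives
  // a single copy and forwards it to everyone else.
  return topology === "p2p" ? participants - 1 : 1;
}

for (const n of [2, 4, 8, 16]) {
  console.log(`${n} participants: p2p=${uplinkStreamsPerClient(n, "p2p")}, sfu=${uplinkStreamsPerClient(n, "sfu")}`);
}
```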

## No WebSockets

We don't use WebSockets. Instead, we split WebRTC signaling into 2 separate
categories:

- **Connection (low-frequency):** join, leave, reconnect
- **Media (high-frequency):** media subscription, layout changes, audio level,
  stream quality

We use HTTP for connection and data channels for media. The rationale here is to
minimize the number of persistent connections from the client to the server.
After the WebRTC connection is established through HTTP, the client has a
persistent bidirectional connection directly to the server—we might as well use
it for handling media.

Aside from reducing the DevOps burden of configuring infra properly, data
channels allow much higher signaling frequency and lower per-message latency.
This also means a more responsive UI for the end-users.
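
To illustrate the pattern, here is a minimal client-side sketch. The endpoint
path, request format, and message shapes are assumptions for the example, not
PulseBeam's actual API: one HTTP round trip sets up the peer connection, and a
data channel carries all high-frequency signaling afterwards.

```ts
// Sketch only: "https://sfu.example.com/connect" and the JSON messages are
// hypothetical, not PulseBeam's real API.
async function connect(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  // Create the data channel up front so it is negotiated in the first offer.
  const signaling = pc.createDataChannel("signaling");

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Low-frequency "connection" signaling (join/leave/reconnect) over plain HTTP.
  const res = await fetch("https://sfu.example.com/connect", {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });

  // High-frequency "media" signaling (subscriptions, layout, quality) over the
  // already-established data channel.
  signaling.onopen = () => {
    signaling.send(JSON.stringify({ type: "subscribe", track: "example-track" }));
  };
  return pc;
}
```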

## No STUN & TURN Servers

STUN servers are used to discover the public IP and port of clients. TURN
servers relay traffic as a fallback in case no direct connection can be made.

We don't use STUN servers because PulseBeam is SFU only, and static
configuration at server startup is preferred to avoid potential network
failures. No TURN servers are needed because PulseBeam requires every deployment
to have a fixed, addressable IPv4 or IPv6 address. We also support direct TCP
connections.
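
On the client side, this means the ICE server list can simply be empty. A
minimal sketch using the standard browser API (PulseBeam SDKs may configure
this for you):

```ts
// No STUN or TURN entries: the SFU's fixed public address arrives in its SDP
// answer as a host candidate, which is enough to establish connectivity.
const pc = new RTCPeerConnection({ iceServers: [] });
```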

While we'd like to say this is a novel approach, it is very similar to what
Google Meet has been doing in production for years. With no STUN and TURN
servers, air-gapped deployment is straightforward: just run the service binary.
No external service is needed.

## No WebRTC Port Range

There's no 1:1 mapping between an OS UDP/TCP port and a WebRTC connection.
Instead, we use a combination of the 5-tuple (src_ip, src_port, dst_ip,
dst_port, proto) and WebRTC ICE metadata to multiplex many connections over a
single port. Thus, we only require 2 ports for WebRTC: **udp/3478** and
**tcp/443**.
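
For intuition, here is a sketch of how a single listening port can be
demultiplexed. The key format and lookup structures are illustrative, not
PulseBeam internals:

```ts
// Each packet is attributed to a session by its 5-tuple. On first contact,
// ICE metadata (the username fragment carried in STUN binding requests) binds
// that 5-tuple to the right WebRTC session.
type FiveTuple = string; // e.g. "srcIp:srcPort->dstIp:dstPort/udp"

const sessionByTuple = new Map<FiveTuple, string>(); // 5-tuple -> session id
const sessionByUfrag = new Map<string, string>();    // ICE ufrag -> session id

function routePacket(tuple: FiveTuple, iceUfrag?: string): string | undefined {
  const known = sessionByTuple.get(tuple);
  if (known) return known;
  // First packet on this 5-tuple: resolve the session via the ICE ufrag and
  // remember the mapping for subsequent packets.
  if (iceUfrag !== undefined) {
    const session = sessionByUfrag.get(iceUfrag);
    if (session !== undefined) sessionByTuple.set(tuple, session);
    return session;
  }
  return undefined;
}
```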

In theory, the server is able to serve much more than ~16k connections (the
typical limit from ephemeral UDP port exhaustion). This is useful for packing
more low-load connections like audio-only and/or data-channel-only. We haven't
done any benchmarks for this particular scenario yet—stay tuned!

## H264 and Opus Codecs

To reduce the "signaling dance," which can disrupt the end-user experience, we
want to avoid reconstructing media streams to different codecs in-flight. Thus,
we allow only 1 codec per media type per room. This trades off some bandwidth
efficiency compared to modern codecs in ideal cases, but we argue that most
production systems live in a less-than-ideal world where we have to support
clients with limited hardware capability.

For audio, Opus won easily. It is prevalent and robust.

For video, the answer is more complicated. There are H264, VP8, VP9, and AV1
(and, of course, H265, H266, and AV2 too). But the decision comes down to
compatibility. H264 is chosen for now because it’s the only codec where you can
actually count on hardware acceleration being present on almost any device—from
security cameras to cheap TVs. By letting the hardware do the work, the device
stays cool and the battery lasts longer. This ensures a smooth experience even
on weaker devices, rather than forcing a low-end CPU to struggle with software
decoding.
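
On a browser client, pinning video to H264 and audio to Opus could look roughly
like the sketch below. It uses the standard WebRTC codec-preference API; the
actual negotiation is handled by the PulseBeam SDK and server, so treat this
purely as an illustration:

```ts
// Sketch: restrict a transceiver's codec list to one media codec (plus the
// RTX/RED/FEC resiliency entries), using standard browser APIs.
function restrictCodec(transceiver: RTCRtpTransceiver, kind: "audio" | "video", mimeType: string) {
  const caps = RTCRtpReceiver.getCapabilities(kind);
  if (!caps) return; // capability query not supported in this browser
  const keep = caps.codecs.filter((c) => {
    const mt = c.mimeType.toLowerCase();
    return mt === mimeType.toLowerCase() || mt.endsWith("/rtx") || mt.endsWith("/red") || mt.endsWith("/ulpfec");
  });
  if (keep.length > 0) transceiver.setCodecPreferences(keep);
}

const pc = new RTCPeerConnection();
restrictCodec(pc.addTransceiver("video"), "video", "video/H264");
restrictCodec(pc.addTransceiver("audio"), "audio", "audio/opus");
```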