WebRTC - A Guide to Understanding How it Works

WebRTC is a free, open-source project that enables real-time communication of audio, video, and data in web browsers without plugins. Browsers expose it to web applications through JavaScript application programming interfaces (APIs).

So how does WebRTC work? WebRTC protocols are designed to move data in real time with low latency and low overhead. They also adapt automatically to changing network conditions, so developers do not need to integrate separate quality-control tools.

Media Streaming

Media streaming is the process of sending audio or video to a user over the Internet as a continuous flow of data. Compared with downloading media files, streaming lets users start playing content immediately, without waiting for a full download or storing the file on their computer’s hard drive. It also lets users watch or listen to many kinds of media, including sports, movies, and music.

Streaming services can also offer interactive features, such as personalized playlists or video search, to help users find what they want to watch. Many streaming systems also track the types of content visitors watch and use that information to recommend more, improving the user experience.

The quality of a media stream depends on the network connection, which affects the overall user experience. For example, a slow or congested Internet connection can cause audio or video packets to be delayed or lost in transit, producing stutters and dropouts. Streaming therefore works best over a fast, stable connection.

WebRTC carries media using the Real-time Transport Protocol (RTP), an Internet standard packet format for delivering audio and video over IP networks. WebRTC requires the secure variant, SRTP (Secure Real-time Transport Protocol), which encrypts the media sent between WebRTC peers. The SRTP encryption keys are derived from a Datagram Transport Layer Security (DTLS) handshake between the peers.
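To make the packet format concrete, here is a minimal TypeScript sketch that decodes the 12-byte fixed RTP header defined in RFC 3550. SRTP encrypts the payload but leaves this header readable, so the same layout applies to WebRTC media packets. The names RtpHeader and parseRtpHeader are our own, and the sketch does not handle header extensions or padding removal.

```typescript
// Fields of the fixed RTP header (RFC 3550, section 5.1).
interface RtpHeader {
  version: number;
  padding: boolean;
  extension: boolean;
  marker: boolean;
  payloadType: number;
  sequenceNumber: number;
  timestamp: number;
  ssrc: number;
  csrcs: number[];
}

// Decodes the fixed header plus the CSRC list; DataView reads big-endian
// (network byte order) by default, which is what RTP uses.
function parseRtpHeader(packet: Uint8Array): RtpHeader {
  if (packet.length < 12) {
    throw new Error("packet is shorter than the 12-byte fixed RTP header");
  }
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const b0 = view.getUint8(0);
  const b1 = view.getUint8(1);
  const version = b0 >> 6; // top 2 bits
  if (version !== 2) {
    throw new Error(`unsupported RTP version ${version}`);
  }
  const csrcCount = b0 & 0x0f; // low 4 bits
  if (packet.length < 12 + 4 * csrcCount) {
    throw new Error("packet is too short for its CSRC list");
  }
  const csrcs: number[] = [];
  for (let i = 0; i < csrcCount; i++) {
    csrcs.push(view.getUint32(12 + 4 * i));
  }
  return {
    version,
    padding: (b0 & 0x20) !== 0,
    extension: (b0 & 0x10) !== 0,
    marker: (b1 & 0x80) !== 0,
    payloadType: b1 & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
    csrcs,
  };
}
```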

To deliver media over a WebRTC connection, the browser’s WebRTC engine manages the packets it sends and receives. This work includes optimizing encoding, handling packet loss and jitter, recovering from errors, and running flow and congestion control algorithms.
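One quantity such an engine tracks is jitter, the variation in packet transit time. The TypeScript sketch below implements the interarrival jitter estimator from RFC 3550 (section 6.4.1). The class name is hypothetical, and the sketch assumes arrival times have already been converted to the RTP clock rate and ignores timestamp wraparound.

```typescript
// Interarrival jitter estimator from RFC 3550, section 6.4.1.
// Both arguments to update() must use the same clock units (the RTP clock rate).
class JitterEstimator {
  private jitter = 0;
  private prevTransit: number | null = null;

  // rtpTimestamp: timestamp from the RTP header.
  // arrival: local arrival time converted to the same clock units.
  update(rtpTimestamp: number, arrival: number): number {
    const transit = arrival - rtpTimestamp;
    if (this.prevTransit !== null) {
      const d = Math.abs(transit - this.prevTransit);
      // Exponential smoothing with a gain of 1/16, as the RFC specifies.
      this.jitter += (d - this.jitter) / 16;
    }
    this.prevTransit = transit;
    return this.jitter;
  }
}
```

A packet that arrives exactly on schedule leaves the estimate unchanged; each late or early packet nudges it by one sixteenth of the deviation, so a single outlier does not dominate.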

Signaling

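Signaling is the process by which WebRTC peers exchange the control messages needed to set up a connection. These include Session Description Protocol (SDP) offer and answer messages, which describe media formats and codecs, and ICE candidates, which describe the network addresses at which each peer might be reachable.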
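WebRTC deliberately leaves the signaling message format up to the application. As an illustration only, the TypeScript sketch below defines a hypothetical JSON message type for offers, answers, and ICE candidates and validates incoming messages before use. The kind field and the encodeSignal and decodeSignal names are assumptions, not part of any standard.

```typescript
// Hypothetical wire format: WebRTC does not standardize signaling messages,
// so this JSON shape is an assumption chosen for illustration.
type SignalingMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string; sdpMid: string | null };

function encodeSignal(msg: SignalingMessage): string {
  return JSON.stringify(msg);
}

// Messages arrive from the network, so check their shape before trusting them.
function decodeSignal(raw: string): SignalingMessage {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("signaling message must be a JSON object");
  }
  const data = parsed as { [key: string]: unknown };
  const kind = data["kind"];
  const sdp = data["sdp"];
  const candidate = data["candidate"];
  const sdpMid = data["sdpMid"];
  if (kind === "offer" && typeof sdp === "string") return { kind: "offer", sdp };
  if (kind === "answer" && typeof sdp === "string") return { kind: "answer", sdp };
  if (kind === "candidate" && typeof candidate === "string") {
    return { kind: "candidate", candidate, sdpMid: typeof sdpMid === "string" ? sdpMid : null };
  }
  throw new Error("unrecognized signaling message");
}
```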

When a user or application initiates a WebRTC peer connection, it relies on an intermediary signaling server to handle the initial call setup. The server relays each peer’s session description and list of potential network candidates to the other side; the browsers then use that information to check whether they can reach each other and to agree on the protocols and codecs for the call.
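The server’s role in this exchange is mostly forwarding. Below is a minimal in-memory TypeScript sketch of that relay role. The SignalingRelay class and its methods are hypothetical, and a real deployment would typically carry these messages over WebSockets or HTTP rather than in-process callbacks.

```typescript
// Callback the relay uses to hand a message to a connected peer.
type DeliverFn = (fromPeerId: string, payload: string) => void;

class SignalingRelay {
  private peers = new Map<string, DeliverFn>();

  register(peerId: string, deliver: DeliverFn): void {
    if (this.peers.has(peerId)) {
      throw new Error(`peer ${peerId} is already registered`);
    }
    this.peers.set(peerId, deliver);
  }

  unregister(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forwards an opaque payload (an offer, answer, or candidate) without
  // inspecting it. Returns false if the target peer is not connected.
  send(fromPeerId: string, toPeerId: string, payload: string): boolean {
    const deliver = this.peers.get(toPeerId);
    if (!deliver) return false;
    deliver(fromPeerId, payload);
    return true;
  }
}
```

Keeping the relay ignorant of message contents means codec and candidate decisions stay in the browsers, where the RTCPeerConnection logic lives.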

Once negotiation finishes, which usually takes a few moments, the call is established and peers can begin streaming audio and video or sharing their screens. The RTCPeerConnection interface enables clients to connect to a peer, maintain the connection, and close it when necessary.

Beyond giving WebRTC clients a simple way to connect directly to each other, the RTCPeerConnection API handles SDP negotiation, codecs, Network Address Translation (NAT) traversal, packet loss, bandwidth management, and media transfer. It does not, however, define how signaling messages travel between peers, so you must provide that channel yourself; depending on how you build your WebRTC app, a third-party signaling service can take care of it.

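Signaling should not be confused with NAT traversal, which relies on STUN and TURN servers. STUN (Session Traversal Utilities for NAT) is the most commonly used: it lets a device behind a NAT discover the public IP address and port that the NAT maps it to, so WebRTC can offer that address to the other peer as a connection candidate. TURN (Traversal Using Relays around NAT) relays media when no direct path can be found.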
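To see what a STUN exchange looks like on the wire, the following TypeScript sketch builds a Binding request using the 20-byte header layout from RFC 5389: a message type, a length, the fixed magic cookie 0x2112A442, and a 96-bit transaction ID. The function name is our own, and the transaction ID is passed in to keep the example deterministic; real clients generate it randomly.

```typescript
// STUN header constants from RFC 5389, section 6.
const STUN_BINDING_REQUEST = 0x0001;
const STUN_MAGIC_COOKIE = 0x2112a442;

// Builds a 20-byte Binding request with no attributes.
// transactionId must be exactly 12 bytes.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const msg = new Uint8Array(20);
  const view = new DataView(msg.buffer);
  view.setUint16(0, STUN_BINDING_REQUEST); // message type
  view.setUint16(2, 0); // length of attributes after the header: none here
  view.setUint32(4, STUN_MAGIC_COOKIE); // fixed value identifying STUN
  msg.set(transactionId, 8); // bytes 8..19
  return msg;
}
```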

When using this method, a client sends a Binding request to a STUN server. The server replies with the public IP address and port from which the request arrived, and the client then shares that address with the other WebRTC peer through signaling.
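In a Binding success response, the STUN server reports the observed address in an XOR-MAPPED-ADDRESS attribute: the port is XORed with the top 16 bits of the magic cookie and an IPv4 address with the full cookie (RFC 5389, section 15.2). This TypeScript sketch walks the response’s attributes and decodes that one. It handles IPv4 only, and the function and interface names are hypothetical.

```typescript
// Constants from RFC 5389.
const MAGIC_COOKIE = 0x2112a442;
const BINDING_SUCCESS_RESPONSE = 0x0101;
const ATTR_XOR_MAPPED_ADDRESS = 0x0020;

interface MappedAddress {
  ip: string;
  port: number;
}

// Returns the IPv4 reflexive address carried in a Binding success response.
function readXorMappedAddress(msg: Uint8Array): MappedAddress {
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  if (
    msg.length < 20 ||
    view.getUint16(0) !== BINDING_SUCCESS_RESPONSE ||
    view.getUint32(4) !== MAGIC_COOKIE
  ) {
    throw new Error("not a STUN Binding success response");
  }
  // The length field counts the attribute bytes after the 20-byte header.
  const end = 20 + view.getUint16(2);
  if (end > msg.length) throw new Error("truncated STUN message");

  let offset = 20;
  while (offset + 4 <= end) {
    const attrType = view.getUint16(offset);
    const attrLength = view.getUint16(offset + 2);
    const value = offset + 4;
    if (value + attrLength > end) throw new Error("truncated STUN attribute");
    // Value layout: reserved byte, family (0x01 = IPv4), X-Port, X-Address.
    if (attrType === ATTR_XOR_MAPPED_ADDRESS && attrLength >= 8 && view.getUint8(value + 1) === 0x01) {
      const port = view.getUint16(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const addr = (view.getUint32(value + 4) ^ MAGIC_COOKIE) >>> 0; // back to unsigned
      const ip = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join(".");
      return { ip, port };
    }
    // Attribute values are padded to a multiple of 4 bytes.
    offset = value + Math.ceil(attrLength / 4) * 4;
  }
  throw new Error("no IPv4 XOR-MAPPED-ADDRESS attribute found");
}
```

The XOR step exists because some NATs rewrite any bytes that look like the client’s own address; obfuscating the address keeps it intact in transit.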

Once a peer is connected, it receives incoming media streams and, if the application uses a media server (common in group calls), sends its own stream to that server. The media server then forwards the stream over an encrypted connection to the other peers, at a resolution suited to each one.
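One common way for a media server to pick a suitable resolution per receiver is simulcast: the sender encodes several versions of its video, and the server forwards whichever fits each receiver’s bandwidth. The TypeScript sketch below shows one simple selection rule under stated assumptions: the layer names, resolutions, and bitrates are hypothetical, and the bandwidth estimate is taken as given.

```typescript
// Hypothetical description of one simulcast encoding.
interface SimulcastLayer {
  rid: string; // stream identifier for this encoding
  height: number; // vertical resolution in pixels
  bitrateKbps: number; // approximate bitrate of this encoding
}

// Picks the highest-bitrate layer that fits the receiver's estimated
// bandwidth, falling back to the lowest layer when nothing fits.
function chooseLayer(layers: SimulcastLayer[], availableKbps: number): SimulcastLayer {
  if (layers.length === 0) throw new Error("no simulcast layers available");
  // Sort cheapest first without mutating the caller's array.
  const byBitrate = layers.slice().sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  let chosen = byBitrate[0];
  for (const layer of byBitrate) {
    if (layer.bitrateKbps <= availableKbps) chosen = layer;
  }
  return chosen;
}
```

Falling back to the lowest layer, rather than sending nothing, keeps a poorly connected receiver in the call at reduced quality.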

A signaling server can also handle peer discovery, identifying which devices are available to connect. It can carry session metadata as well as call-setup messages, and it is especially useful for one-to-many broadcasts and group sessions, where peers may not know in advance who they will be talking to.
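For group sessions, peer discovery can be as simple as tracking who is in each room. This TypeScript sketch of a hypothetical RoomDirectory returns the peers already present when a new one joins, so the newcomer knows whom to start signaling with.

```typescript
// Hypothetical room registry kept by a signaling server.
class RoomDirectory {
  private rooms = new Map<string, Set<string>>();

  // Adds peerId to the room and returns the peers that were already there.
  join(roomId: string, peerId: string): string[] {
    let members = this.rooms.get(roomId);
    if (!members) {
      members = new Set<string>();
      this.rooms.set(roomId, members);
    }
    const existing = Array.from(members); // snapshot before adding the newcomer
    members.add(peerId);
    return existing;
  }

  leave(roomId: string, peerId: string): void {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(peerId);
    if (members.size === 0) this.rooms.delete(roomId); // drop empty rooms
  }

  list(roomId: string): string[] {
    const members = this.rooms.get(roomId);
    return members ? Array.from(members) : [];
  }
}
```

In a full-mesh call the newcomer would then exchange offers with each returned peer; with a media server, it would signal with the server instead.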

Peer Connection

Peer connection is the key to establishing and maintaining real-time communications with WebRTC. Without it, your browser cannot transmit audio and video between peers in real time.

To establish a peer connection, the two peers must first discover one another, either through their own signaling or through a server that handles signaling on their behalf. The process depends on supporting infrastructure, such as signaling, STUN, and TURN servers, and can be a complex engineering challenge.

First, each RTCPeerConnection object gathers metadata about its peer’s media capabilities and possible network addresses. For the initial setup, this metadata travels to the other browser through the application’s signaling server (sometimes called a “signaling broker”), because no peer-to-peer link exists yet; once the connection is up, later renegotiation messages can also be sent directly over the peer-to-peer link.
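Each possible network address is exchanged as an ICE candidate line in the SDP candidate attribute format: foundation, component, transport, priority, address, port, and typ, followed by optional name/value pairs such as raddr and rport. The TypeScript sketch below parses one such line; the interface and function names are our own.

```typescript
// Fields of an ICE candidate line (hypothetical interface name).
interface IceCandidateInfo {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string; // host, srflx, prflx, or relay
  relatedAddress?: string;
  relatedPort?: number;
}

// Accepts either "candidate:..." or the SDP form "a=candidate:...".
function parseIceCandidate(line: string): IceCandidateInfo {
  const text = line.startsWith("a=") ? line.slice(2) : line;
  if (!text.startsWith("candidate:")) throw new Error("not an ICE candidate line");
  const parts = text.slice("candidate:".length).trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") throw new Error("malformed candidate");
  const info: IceCandidateInfo = {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7],
  };
  // Remaining tokens are name/value pairs such as raddr, rport, generation.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") info.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") info.relatedPort = Number(parts[i + 1]);
  }
  return info;
}
```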

Next, each peer adds the media it wants to send to its RTCPeerConnection: audio and video tracks, plus data channels for application data, e.g., file transfer, text chat, and game updates.
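Application data such as a file usually has to be split into smaller messages before it goes over a data channel. The TypeScript sketch below assumes a 16 KiB chunk size, a conservative value often recommended for interoperability between browsers, and assumes the data channel’s default ordered, reliable mode so chunks can be concatenated in arrival order. Both function names are hypothetical.

```typescript
// Assumed chunk size: 16 KiB, a conservative cross-browser message size.
const CHUNK_SIZE = 16 * 1024;

// Sender side: splits a payload into messages of at most chunkSize bytes.
function chunkPayload(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  if (chunkSize <= 0) throw new Error("chunk size must be positive");
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view on the same buffer instead of copying.
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Receiver side: concatenates chunks, relying on ordered delivery.
function reassemble(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```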

Audio and video travel over SRTP, while data channels (RTCDataChannel) use SCTP running over DTLS. The underlying transport runs flow and congestion control algorithms that probe the available bandwidth and adjust the quality of each stream.
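As one concrete example of such an algorithm, the loss-based rule from the Google Congestion Control draft (draft-ietf-rmcat-gcc) raises the sending rate by 5% when packet loss is under 2%, cuts it in proportion to the loss when loss exceeds 10%, and holds it in between. The TypeScript sketch below implements only that rule; the function name is our own, and practical controllers combine it with a delay-based estimate.

```typescript
// Loss-based rate update in the style of the Google Congestion Control draft.
// lossFraction is the fraction of packets lost in the last report (0 to 1).
function updateRateForLoss(currentKbps: number, lossFraction: number): number {
  if (lossFraction < 0 || lossFraction > 1) {
    throw new Error("loss fraction must be between 0 and 1");
  }
  if (lossFraction > 0.1) {
    // Heavy loss: back off in proportion to how much was lost.
    return currentKbps * (1 - 0.5 * lossFraction);
  }
  if (lossFraction < 0.02) {
    // Little or no loss: probe for more bandwidth.
    return currentKbps * 1.05;
  }
  // Moderate loss: hold the current rate.
  return currentKbps;
}
```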

Once local media has been added, the peer can send an offer to the remote peer, asking it to accept the connection. The offer is an SDP session description: a block of text that describes the media streams, the codecs and options the browser supports, any ICE candidates already gathered, and other details about the connection between the two agents.
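SDP is line-oriented, so the key parts of an offer are easy to pick out. The TypeScript sketch below extracts the media sections (m= lines) and their codec mappings (a=rtpmap lines, of the form payload type, then encoding name/clock rate[/channels]). It ignores every other attribute, and the interface and function names are hypothetical.

```typescript
// One media section of an SDP description (hypothetical interface name).
interface MediaSection {
  kind: string; // "audio", "video", or "application"
  protocol: string; // e.g. UDP/TLS/RTP/SAVPF
  formats: string[]; // payload types listed on the m= line
  codecs: Map<string, string>; // payload type -> "name/clockrate[/channels]"
}

function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  let current: MediaSection | null = null;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <fmt> ...
      const [kind, , protocol, ...formats] = line.slice(2).split(/\s+/);
      current = { kind, protocol, formats, codecs: new Map<string, string>() };
      sections.push(current);
    } else if (current && line.startsWith("a=rtpmap:")) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const match = /^a=rtpmap:(\d+) (\S+)$/.exec(line);
      if (match) current.codecs.set(match[1], match[2]);
    }
  }
  return sections;
}
```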

Mara Bragdon