What is the Real-time Transport Protocol (RTP)?
The Real-time Transport Protocol (RTP) is a network standard for transmitting audio and video data, optimized for the consistent delivery of live media. It is used in internet telephony, voice over IP (VoIP) and video telecommunication. It can carry one-on-one calls (unicast) or one-to-many conferences (multicast).
RTP was standardized by the Internet Engineering Task Force (IETF) in 1996 with Request for Comments (RFC) 1889. It was updated in 2003 by RFC 3550.
The IETF designed RTP for sending live or real-time audio and video over the internet. All network data is sent in discrete units called packets. Because of the distributed nature of the internet, some packets are expected to arrive with uneven time spacing (jitter), in the wrong order (out-of-order delivery) or not at all (packet loss).
RTP gives receivers the information they need to compensate for these issues without severely degrading call quality. It favors the quick delivery of packets over ensuring that all data is received. This keeps the stream playing consistently instead of pausing to buffer.
To illustrate this difference, imagine a user wants to watch a video on the internet. The video streaming service would use RTP to send the video data to the user's computer. If some of the data packets were lost, the player would carry on without them, and the video might lose a few frames or a fraction of a second of audio. The gap could be so brief as to be unnoticeable to the viewer.
If instead the user wanted to save an exact copy of the video, another protocol, such as HTTP, would download it exactly. If any packets were lost, the underlying transport would have them re-sent, making the download slower but fully accurate.
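As a rough illustration of how a receiver might cope with out-of-order delivery and packet loss, the sketch below sorts the sequence numbers it has received and reports the gaps. The function name `playout_order` is hypothetical, and the sketch assumes the sequence numbers do not wrap around the 16-bit limit; a real receiver would also bound how long it waits for late packets.

```python
from typing import Iterable, List, Tuple


def playout_order(received: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (packets in playback order, sequence numbers that never arrived).

    Assumption: no 16-bit sequence-number wraparound within the window.
    """
    seen = set(received)  # duplicates are dropped
    if not seen:
        return [], []
    ordered = sorted(seen)
    # Any number between the lowest and highest seen that is absent was lost
    # (or has not arrived yet).
    missing = [s for s in range(ordered[0], ordered[-1] + 1) if s not in seen]
    return ordered, missing


# Packets 101 and 102 arrive swapped, 104 and 105 arrive swapped, 103 is lost.
ordered, missing = playout_order([100, 102, 101, 105, 104])
```

The receiver plays `ordered` and conceals the frames listed in `missing` rather than stalling to ask for them again.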
RTP Control Protocol (RTCP) is used in conjunction with RTP to send information about the media stream back to the sender. Clients primarily use RTCP to report quality of service (QoS) data, such as jitter, packet loss and round-trip time (RTT). The server may use this information to switch to a different codec or stream quality. This data can also be used for control signaling or to collect information about the participants when many are connected to the stream.
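One of the QoS figures a receiver reports is interarrival jitter. The sketch below follows the running estimate described in RFC 3550, where each new packet moves the estimate one-sixteenth of the way toward the latest change in transit time. The function name and the sample values are illustrative, and the sketch assumes arrival times have already been converted to the same clock units as the RTP timestamps.

```python
from typing import Sequence


def interarrival_jitter(arrivals: Sequence[int], timestamps: Sequence[int]) -> float:
    """Running jitter estimate, in RTP timestamp units.

    arrivals[i]   -- local arrival time of packet i (assumed same units as timestamps)
    timestamps[i] -- RTP timestamp carried in packet i
    """
    jitter = 0.0
    prev_transit = None
    for arrival, ts in zip(arrivals, timestamps):
        transit = arrival - ts  # relative transit time of this packet
        if prev_transit is not None:
            d = abs(transit - prev_transit)  # change in transit time
            jitter += (d - jitter) / 16.0    # smooth with gain 1/16
        prev_transit = transit
    return jitter


# Three packets sent 160 units apart; the third arrives 16 units late.
steady = interarrival_jitter([0, 160, 320], [0, 160, 320])
late = interarrival_jitter([0, 160, 336], [0, 160, 320])
```

With evenly spaced arrivals the estimate stays at zero; a single late packet nudges it up only slightly, which keeps one outlier from dominating the report.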
RTP does not define specific codecs or signaling; it relies on other standards for both. It can be used with several signaling protocols, such as Session Initiation Protocol (SIP), H.323 or XMPP. The media can be encoded with almost any codec, including G.711, MP3, H.264 or MPEG-2.
What applications use the Real-time Transport Protocol?
RTP is typically used when a media stream needs to be delivered live or received by many users simultaneously.
Voice over IP (VoIP) relies on RTP for media transmission. VoIP systems often use SIP to initiate and control the call and Secure RTP (SRTP) to encrypt it. Examples of VoIP servers that use RTP include Asterisk, 3CX and other private branch exchange (PBX) software.
Most internet-based audio and video conferencing services use RTP. These services often use RTP as the underlying media transmission method and add convenience features and standards on top. Some examples include Microsoft Teams, Apple FaceTime, Cisco WebEx and WhatsApp. Zoom conferences use a close derivative of RTP.
Real-Time Streaming Protocol (RTSP) is a control protocol that typically relies on RTP to carry the media, and can be used to stream video between a server and a client. VideoLAN's VLC is a popular program that can act as an RTSP server. Many security cameras also support streaming video over RTSP to be saved by a video security server. Some live TV or streaming services implement RTSP for its ease of broadcasting to many viewers.
Most modern on-demand video streaming services have transitioned away from using RTP in favor of precaching and using dynamic adaptive streaming over HTTP (DASH).
What are the Real-time Transport Protocol's technical details?
RTP most often runs over UDP, because UDP is designed for quick and simple data transmission without guaranteed delivery. RTP can be used over TCP, but this is not recommended: TCP's emphasis on reliability at the expense of speed conflicts with the time-sensitive nature of RTP.
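A minimal sketch of what sending RTP over UDP can look like is shown below. It prefixes a payload with a 12-byte RTP fixed header and hands the datagram to the operating system without waiting for any acknowledgement. The helper names (`build_rtp_packet`, `send_rtp`, `loopback_demo`), the SSRC value and the 160-byte silent payload are all assumptions made for illustration, not part of any particular library.

```python
import socket
import struct
from typing import Tuple


def build_rtp_packet(seq: int, timestamp: int, ssrc: int,
                     payload: bytes, payload_type: int = 0) -> bytes:
    """Prefix payload with an RTP fixed header (version 2, no padding,
    no extension, no CSRC entries, marker bit clear)."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # version = 2 in the top two bits
        payload_type & 0x7F,     # 7-bit payload type
        seq & 0xFFFF,            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,       # 32-bit source identifier
    )
    return header + payload


def send_rtp(sock: socket.socket, dest: Tuple[str, int], packet: bytes) -> int:
    # sendto() returns once the datagram is queued; UDP never confirms
    # delivery and never retransmits.
    return sock.sendto(packet, dest)


def loopback_demo() -> bytes:
    """Send one packet to a local UDP socket and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
        receiver.settimeout(1.0)
        packet = build_rtp_packet(1, 160, 0x1234ABCD, b"\x00" * 160)
        send_rtp(sender, receiver.getsockname(), packet)
        data, _ = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


packet = build_rtp_packet(1, 160, 0x1234ABCD, b"abc")
```

If a datagram is dropped on the way, nothing in this code notices; that is the trade-off the paragraph above describes.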
Any port number can be used with RTP. Generally, it will be in the range of 1024 to 65535. By convention, RTP uses an even-numbered port and RTCP uses the next higher odd-numbered port. The Internet Assigned Numbers Authority (IANA) has registered port 5004 for RTP and 5005 for RTCP, and many applications use these as defaults.
RTP packets include: a sequence number, which is used to detect lost packets; payload identification, which describes the specific media codec; frame indication, which marks the beginning and end of each media frame; source identification, which identifies the originator of the frame; and intramedia synchronization, which uses timestamps to detect delay jitter within a single stream and compensate for it.
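The even/odd pairing can be sketched as a small helper. The function name `rtp_rtcp_ports` is hypothetical; rounding an odd port down to the next lower even number follows the recommendation in RFC 3550, and the range check simply reflects the port range mentioned above.

```python
from typing import Tuple


def rtp_rtcp_ports(base_port: int) -> Tuple[int, int]:
    """Return an (RTP, RTCP) port pair: even port for RTP, next odd for RTCP."""
    if not 1024 <= base_port <= 65535:
        raise ValueError("expected a port between 1024 and 65535")
    rtp_port = base_port - (base_port % 2)  # round an odd port down to even
    return rtp_port, rtp_port + 1


default_pair = rtp_rtcp_ports(5004)
rounded_pair = rtp_rtcp_ports(5007)
```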
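The fields listed above live in the 12-byte fixed header defined by RFC 3550. The sketch below unpacks that header from raw bytes; the `RtpHeader` type and `parse_rtp_header` function are illustrative names, and the sketch ignores any CSRC list or header extension that may follow the fixed part.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool          # frame indication
    payload_type: int     # payload identification
    sequence_number: int  # detects loss and reordering
    timestamp: int        # intramedia synchronization
    ssrc: int             # source identification


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte RTP fixed header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hand-built sample: version 2, marker set, payload type 0 (G.711 mu-law),
# sequence number 4660, timestamp 160, made-up SSRC.
sample = struct.pack("!BBHII", 0x80, 0x80, 4660, 160, 0xDEADBEEF)
header = parse_rtp_header(sample)
```

Reading `header.sequence_number` and `header.timestamp` from each packet is what lets a receiver detect loss and smooth out jitter.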
Security vulnerabilities exist in improperly implemented RTP servers. RTP is not inherently encrypted or authenticated. If encryption and authentication are not enabled, the media stream is left open to recording, spoofing or man-in-the-middle attacks. It is therefore very important that VoIP systems that use RTP be properly configured and secured.
RTP is also vulnerable to distributed denial-of-service (DDoS) attacks that can bring down a media stream or the clients connecting to one. Specific services that use RTP may have their own vulnerabilities in the software.