Real-time Transport Protocol

Communication protocol
Abbreviation: RTP
Purpose: Delivering audio and video
Developer(s): Audio-Video Transport Working Group of the IETF
Introduction: January 1996
Based on: Network Voice Protocol[1]
RFC(s): RFC 1889, 3550, 3551

The Real-time Transport Protocol (RTP) is a network protocol for delivering audio and video over IP networks. RTP is used in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications including WebRTC, television services and web-based push-to-talk features.

RTP typically runs over User Datagram Protocol (UDP). RTP is used in conjunction with the RTP Control Protocol (RTCP). While RTP carries the media streams (e.g., audio and video), RTCP is used to monitor transmission statistics and quality of service (QoS) and aids synchronization of multiple streams. RTP is one of the technical foundations of Voice over IP and in this context is often used in conjunction with a signaling protocol such as the Session Initiation Protocol (SIP), which establishes connections across the network.

RTP was developed by the Audio-Video Transport Working Group of the Internet Engineering Task Force (IETF) and first published in 1996 as RFC 1889, which was superseded by RFC 3550 in 2003.[2]

Overview

Research on audio and video over packet-switched networks dates back to the early 1970s. The Internet Engineering Task Force (IETF) published RFC 741 in 1977 and began developing RTP in 1992,[1] and would go on to develop Session Announcement Protocol (SAP), the Session Description Protocol (SDP), and the Session Initiation Protocol (SIP).

RTP is designed for end-to-end, real-time transfer of streaming media. The protocol provides facilities for jitter compensation and detection of packet loss and out-of-order delivery, which are common, especially during UDP transmissions on an IP network. RTP allows data transfer to multiple destinations through IP multicast.[3] RTP is regarded as the primary standard for audio/video transport in IP networks and is used with an associated profile and payload format.[4] The design of RTP is based on the architectural principle known as application-layer framing where protocol functions are implemented in the application as opposed to the operating system's protocol stack.

Real-time multimedia streaming applications require timely delivery of information and often can tolerate some packet loss to achieve this goal. For example, loss of a packet in an audio application may result in loss of a fraction of a second of audio data, which can be made unnoticeable with suitable error concealment algorithms.[5] The Transmission Control Protocol (TCP), although standardized for RTP use,[6] is not normally used in RTP applications because TCP favors reliability over timeliness. Instead, most RTP implementations are built on the User Datagram Protocol (UDP).[5] Other transport protocols specifically designed for multimedia sessions are SCTP[7] and DCCP,[8] although, as of 2012, they were not in widespread use.[9]

RTP was developed by the Audio/Video Transport working group of the IETF standards organization. RTP is used in conjunction with other protocols such as H.323 and RTSP.[4] The RTP specification describes two protocols: RTP and RTCP. RTP is used for the transfer of multimedia data, and RTCP is used to periodically send control information and QoS parameters.[10]

The data transfer protocol, RTP, carries real-time data. Information provided by this protocol includes timestamps (for synchronization), sequence numbers (for packet loss and reordering detection) and the payload format which indicates the encoded format of the data.[11] The control protocol, RTCP, is used for quality of service (QoS) feedback and synchronization between the media streams. The bandwidth of RTCP traffic compared to RTP is small, typically around 5%.[11][12]
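The role of sequence numbers in detecting loss and reordering can be illustrated with a minimal Python sketch. It is not taken from any RTP library: the names `seq_delta` and `classify` are hypothetical, and the logic assumes only that sequence numbers are 16 bits wide and wrap around to 0 after 65535.

```python
from __future__ import annotations

SEQ_MOD = 1 << 16  # sequence numbers are 16 bits wide and wrap around


def seq_delta(current: int, previous: int) -> int:
    """Signed distance from previous to current, accounting for wraparound."""
    diff = (current - previous) % SEQ_MOD
    return diff - SEQ_MOD if diff >= SEQ_MOD // 2 else diff


def classify(received: list[int]) -> dict[str, list[int]]:
    """Report sequence numbers that are still missing and those that arrived late."""
    missing: set[int] = set()
    late: list[int] = []
    highest = received[0]
    for seq in received[1:]:
        delta = seq_delta(seq, highest)
        if delta > 0:
            # Every number skipped between `highest` and `seq` is provisionally lost.
            for k in range(1, delta):
                missing.add((highest + k) % SEQ_MOD)
            highest = seq
        elif delta < 0:
            # An older packet arrived after a newer one: out-of-order delivery.
            late.append(seq)
            missing.discard(seq)
        # delta == 0 is a duplicate and is ignored in this sketch.
    return {"missing": sorted(missing), "late": late}
```

For the arrival order 65534, 65535, 1, 0, 3, the sketch reports packet 0 as late (it arrived after packet 1) and packet 2 as missing. A real receiver would also bound how long it waits before declaring a packet lost.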

RTP sessions are typically initiated between communicating peers using a signaling protocol, such as H.323, the Session Initiation Protocol (SIP), RTSP, or Jingle (XMPP). These protocols may use the Session Description Protocol to specify the parameters for the sessions.[13]

An RTP session is established for each multimedia stream. Audio and video streams may use separate RTP sessions, enabling a receiver to selectively receive components of a particular stream.[14] The RTP and RTCP design is independent of the transport protocol. Applications most typically use UDP with port numbers in the unprivileged range (1024 to 65535).[15] The Stream Control Transmission Protocol (SCTP) and the Datagram Congestion Control Protocol (DCCP) may be used when a reliable transport protocol is desired. The RTP specification recommends even port numbers for RTP and the use of the next odd port number for the associated RTCP session.[16]: 68  A single port can be used for RTP and RTCP in applications that multiplex the protocols.[17]
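The even/odd port convention can be sketched as follows. The helper name `rtp_rtcp_ports` is hypothetical, and rounding an odd base port down to the next lower even number reflects the recommendation in RFC 3550 as understood here; applications that multiplex RTP and RTCP on one port would not need this step.

```python
def rtp_rtcp_ports(base_port: int) -> tuple[int, int]:
    """Return (rtp_port, rtcp_port) following the even/odd convention."""
    if not 1024 <= base_port <= 65535:
        raise ValueError("port outside the unprivileged range 1024-65535")
    rtp_port = base_port & ~1  # clear the lowest bit: odd ports round down to even
    return rtp_port, rtp_port + 1
```

For example, both `rtp_rtcp_ports(5004)` and `rtp_rtcp_ports(5005)` yield the pair 5004 (RTP) and 5005 (RTCP).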

RTP is used by real-time multimedia applications such as voice over IP, audio over IP, WebRTC, Internet Protocol television, and professional video over IP including SMPTE 2022 and SMPTE 2110.

Profiles and payload formats

RTP is designed to carry a multitude of multimedia formats, which permits the development of new formats without revising the RTP standard. To this end, the information required by a specific application of the protocol is not included in the generic RTP header. For each class of application (e.g., audio, video), RTP defines a profile and associated payload formats.[10] Every instantiation of RTP in a particular application requires a profile and payload format specifications.[16]: 71 

The profile defines the codecs used to encode the payload data and their mapping to payload format codes in the protocol field Payload Type (PT) of the RTP header. Each profile is accompanied by several payload format specifications, each of which describes the transport of particular encoded data.[4] Examples of audio payload formats are G.711, G.723, G.726, G.729, GSM, QCELP, MP3, and DTMF, and examples of video payloads are H.261, H.263, H.264, H.265 and MPEG-1/MPEG-2.[18] The mapping of MPEG-4 audio/video streams to RTP packets is specified in RFC 3016, and H.263 video payloads are described in RFC 2429.[19]
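As an illustration of how a profile maps payload type codes to encodings, the following Python sketch holds a small excerpt of static assignments as recalled from the audio/video profile (RFC 3551); the values should be checked against that document before use. The function name `describe_payload_type` and the `dynamic_map` argument (standing in for a mapping negotiated during session setup, for example via SDP) are assumptions made for this example.

```python
# Excerpt of static payload types: code -> (encoding name, clock rate in Hz).
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),    # G.711 mu-law
    3: ("GSM", 8000),
    8: ("PCMA", 8000),    # G.711 A-law
    18: ("G729", 8000),
    31: ("H261", 90000),
    34: ("H263", 90000),
}


def describe_payload_type(pt, dynamic_map=None):
    """Return 'NAME/clock-rate' for a 7-bit payload type code."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if pt in STATIC_PAYLOAD_TYPES:
        name, rate = STATIC_PAYLOAD_TYPES[pt]
        return f"{name}/{rate}"
    if 96 <= pt <= 127:
        # Dynamic range: the meaning is only known from session negotiation.
        if dynamic_map and pt in dynamic_map:
            name, rate = dynamic_map[pt]
            return f"{name}/{rate}"
        return "dynamic (unmapped)"
    return "not in this excerpt"
```

A call such as `describe_payload_type(96, {96: ("H264", 90000)})` shows why the generic header needs no codec-specific fields: the receiver resolves the code through the profile and the negotiated mapping.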

One example of an RTP profile is the RTP Profile for Audio and Video Conferences with Minimal Control, specified in RFC 3551.

Packet header

RTP packets are created at the application layer and handed to the transport layer for delivery. Each unit of RTP media data created by an application begins with the RTP packet header.

RTP packet header
Octet offset   Bit offset [a]   Fields (bit positions within the 32-bit word)
0              0                Version (0–1), P (2), X (3), CC (4–7), M (8), PT (9–15), Sequence number (16–31)
4              32               Timestamp (0–31)
8              64               SSRC identifier (0–31)
12             96               CSRC identifiers (32 bits each, CC entries)
...
12+4×CC        96+32×CC         Profile-specific extension header ID (0–15), Extension header length (16–31)
16+4×CC        128+32×CC        Extension header
...

The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application.[22] The fields in the header are as follows:

  • Version: (2 bits) Indicates the version of the protocol. Current version is 2.[23]
  • P (Padding): (1 bit) Used to indicate if there are extra padding bytes at the end of the RTP packet. Padding may be used to fill up a block of certain size, for example as required by an encryption algorithm. The last byte of the padding contains the number of padding bytes that were added (including itself).[16]: 12 [23]
  • X (Extension): (1 bit) Indicates presence of an extension header between the header and payload data. The extension header is application or profile specific.[23]
  • CC (CSRC count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the SSRC (also defined below).[16]: 12 
  • M (Marker): (1 bit) Signaling used at the application level in a profile-specific manner. If it is set, it means that the current data has some special relevance for the application.[16]: 13 
  • PT (Payload type): (7 bits) Indicates the format of the payload and thus determines its interpretation by the application. Values are profile specific and may be dynamically assigned.[24]
  • Sequence number: (16 bits) The sequence number is incremented for each RTP data packet sent and is to be used by the receiver to detect packet loss[3] and to accommodate out-of-order delivery. The initial value of the sequence number should be randomized to make known-plaintext attacks on Secure Real-time Transport Protocol more difficult.[16]: 13 
  • Timestamp: (32 bits) Used by the receiver to play back the received samples at appropriate time and interval. When several media streams are present, the timestamps may be independent in each stream.[b] The granularity of the timing is application specific. For example, an audio application that samples data once every 125 μs (8 kHz, a common sample rate in digital telephony) would use that value as its clock resolution. Video streams typically use a 90 kHz clock. The clock granularity is one of the details that is specified in the RTP profile for an application.[25]
  • SSRC: (32 bits) Synchronization source identifier uniquely identifies the source of a stream. The synchronization sources within the same RTP session will be unique.[16]: 15 
  • CSRC: (32 bits each, the number of entries is indicated by the CSRC count field) Contributing source IDs enumerate contributing sources to a stream that has been generated from multiple sources.[16]: 15 
  • Header extension: (optional, presence indicated by Extension field) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that indicates the length of the extension in 32-bit units, excluding the 32 bits of the extension header. The extension header data follows.[16]: 18 
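The field layout described above can be exercised with a minimal Python parser. This is a sketch under the assumptions stated in the field list (version 2, 12-byte fixed header, optional CSRC list, optional extension, optional trailing padding); the names `RtpPacket` and `parse_rtp` are hypothetical and do not belong to any particular library.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtpPacket:
    padding: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]
    extension_id: Optional[int]
    extension_data: bytes
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    if len(data) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    # Network byte order: two flag octets, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    padding, extension, cc = bool(b0 & 0x20), bool(b0 & 0x10), b0 & 0x0F
    marker, pt = bool(b1 & 0x80), b1 & 0x7F

    offset = 12
    if len(data) < offset + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", data[offset:offset + 4 * cc]))
    offset += 4 * cc

    ext_id, ext_data = None, b""
    if extension:
        if len(data) < offset + 4:
            raise ValueError("truncated extension header")
        # The length counts 32-bit words and excludes this 4-byte extension header.
        ext_id, ext_words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4
        if len(data) < offset + 4 * ext_words:
            raise ValueError("truncated extension data")
        ext_data = data[offset:offset + 4 * ext_words]
        offset += 4 * ext_words

    end = len(data)
    if padding:
        # The last octet holds the padding length, including itself.
        pad = data[-1] if end > offset else 0
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    return RtpPacket(padding, marker, pt, seq, ts, ssrc, csrcs,
                     ext_id, ext_data, data[offset:end])
```

Interpreting the payload is deliberately left out: as noted above, that depends on the profile and payload format selected for the session.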

Application design

A functional multimedia application requires other protocols and standards used in conjunction with RTP. Protocols such as SIP, Jingle, RTSP, H.225 and H.245 are used for session initiation, control and termination. Other standards, such as H.264, MPEG and H.263, are used for encoding the payload data as specified by the applicable RTP profile.[26]

An RTP sender captures the multimedia data, then encodes, frames and transmits it as RTP packets with increasing timestamps and sequence numbers. The sender sets the payload type field in accordance with connection negotiation and the RTP profile in use. The RTP receiver detects missing packets and may reorder packets. It decodes the media data in the packets according to the payload type and presents the stream to its user.[26]
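The sender side of this process can be sketched in Python as a small packetizer that prepends a 12-byte header to already-encoded media and advances the sequence number and timestamp. The class `RtpSender` and its parameters are hypothetical; capture, encoding and network transmission are out of scope, and header extensions, CSRCs and padding are omitted.

```python
import secrets
import struct


class RtpSender:
    """Builds RTP packets for one stream (fixed header only, no CSRCs)."""

    def __init__(self, payload_type, ssrc=None, first_seq=None, first_ts=None):
        self.payload_type = payload_type & 0x7F
        # Unspecified initial values are chosen at random.
        self.ssrc = secrets.randbits(32) if ssrc is None else ssrc
        self.seq = secrets.randbits(16) if first_seq is None else first_seq
        self.ts = secrets.randbits(32) if first_ts is None else first_ts

    def packetize(self, payload: bytes, ticks: int, marker: bool = False) -> bytes:
        """Wrap `payload`, which covers `ticks` units of the media clock."""
        header = struct.pack(
            "!BBHII",
            2 << 6,                                   # version 2, P=0, X=0, CC=0
            (int(marker) << 7) | self.payload_type,   # M bit and 7-bit PT
            self.seq,
            self.ts,
            self.ssrc,
        )
        self.seq = (self.seq + 1) & 0xFFFF            # 16-bit wraparound
        self.ts = (self.ts + ticks) & 0xFFFFFFFF      # 32-bit wraparound
        return header + payload
```

With an 8 kHz audio clock, a 20 ms frame covers 160 ticks, so consecutive packets carry timestamps 160 apart while the sequence number advances by one per packet.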

Standards documents

  • RFC 3550, Standard 64, RTP: A Transport Protocol for Real-Time Applications
  • RFC 3551, Standard 65, RTP Profile for Audio and Video Conferences with Minimal Control
  • RFC 4855, Media Type Registration of RTP Payload Formats
  • RFC 4856, Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences
  • RFC 7656, A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources
  • RFC 3190, RTP Payload Format for 12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio
  • RFC 6184, RTP Payload Format for H.264 Video
  • RFC 3640, RTP Payload Format for Transport of MPEG-4 Elementary Streams
  • RFC 6416, RTP Payload Format for MPEG-4 Audio/Visual Streams
  • RFC 2250, RTP Payload Format for MPEG1/MPEG2 Video

Notes

  a. ^ Bits are ordered most significant to least significant; bit offset 0 is the most significant bit of the first octet. Octets are transmitted in network order. Bit transmission order is medium dependent.
  b. ^ RFC 7273 provides a means for signalling the relationship between media clocks of different streams.

References

  1. ^ a b Perkins 2003, p. 6.
  2. ^ Wright, Gavin. "What is the Real-time Transport Protocol (RTP)?". TechTarget. Retrieved 2022-11-10.
  3. ^ a b Daniel Hardy (2002). Network. De Boeck Université. p. 298.
  4. ^ a b c Perkins 2003, p. 55
  5. ^ a b Perkins 2003, p. 46
  6. ^ RFC 4571
  7. ^ Farrel, Adrian (2004). The Internet and its protocols. Morgan Kaufmann. p. 363. ISBN 978-1-55860-913-6.
  8. ^ Ozaktas, Haldun M.; Levent Onural (2007). THREE-DIMENSIONAL TELEVISION. Springer. p. 356. ISBN 978-3-540-72531-2.
  9. ^ Hogg, Scott. "What About Stream Control Transmission Protocol (SCTP)?". Network World. Archived from the original on August 30, 2014. Retrieved 2017-10-04.
  10. ^ a b Larry L. Peterson (2007). Computer Networks. Morgan Kaufmann. p. 430. ISBN 978-1-55860-832-0.
  11. ^ a b Perkins 2003, p. 56
  12. ^ Peterson & Davie 2007, p. 435
  13. ^ RFC 4566: SDP: Session Description Protocol, M. Handley, V. Jacobson, C. Perkins, IETF (July 2006)
  14. ^ Zurawski, Richard (2004). "RTP, RTCP and RTSP protocols". The industrial information technology handbook. CRC Press. pp. 28–7. ISBN 978-0-8493-1985-3.
  15. ^ Collins, Daniel (2002). "Transporting Voice by using IP". Carrier grade voice over IP. McGraw-Hill Professional. pp. 47. ISBN 978-0-07-136326-6.
  16. ^ a b c d e f g h i RFC 3550
  17. ^ Multiplexing RTP Data and Control Packets on a Single Port. IETF. April 2010. doi:10.17487/RFC5761. RFC 5761. Retrieved November 21, 2015.
  18. ^ Perkins 2003, p. 60
  19. ^ Chou, Philip A.; Mihaela van der Schaar (2007). Multimedia over IP and wireless networks. Academic Press. pp. 514. ISBN 978-0-12-088480-3.
  20. ^ Perkins 2003, p. 367
  21. ^ Breese, Finley (2010). Serial Communication over RTP/CDP. BoD - Books on Demand. pp. [1]. ISBN 978-3-8391-8460-8.
  22. ^ Peterson & Davie 2007, p. 430
  23. ^ a b c Peterson & Davie 2007, p. 431
  24. ^ Perkins 2003, p. 59
  25. ^ Peterson & Davie 2007, p. 432
  26. ^ a b Perkins 2003, pp. 11–13
