The increasing popularity of multimedia streaming applications introduces new challenges in content distribution. Web-initiated multimedia streams typically experience high start-up delay, due to large protocol overheads and the poor delay, throughput, and loss properties of the Internet. Internet service providers can improve performance by caching the initial segment (the prefix) of popular streams at proxies near the requesting clients. The proxy can initiate transmission to the client while simultaneously requesting the remainder of the stream from the server. This paper analyzes the challenges of realizing a prefix-caching service in the context of the IETF's Real-Time Streaming Protocol (RTSP), a multimedia streaming protocol that derives from HTTP. We describe how to exploit existing RTSP features, such as the Range header, and how to avoid several round-trip delays by caching protocol information at the proxy. Based on our experiences, we propose extensions to RTSP that would ease the development of new multimedia proxy services. In addition, we discuss how caching the partial contents of multimedia streams introduces new challenges in cache coherency and feedback control. Then, we briefly present our preliminary implementation of prefix caching on a Linux-based PC, and describe how the proxy interoperates with the RealNetworks server and client.
The popularity of Internet multimedia applications has grown dramatically in the past several years, spurred by the penetration of the Web and the increasing capacity of backbone and local-access networks. Although HTTP can be used to transfer audio and video content, most multimedia transmissions are simply initiated on the Web. Then, the client's player contacts the multimedia server using a different set of protocols that are better suited to streaming applications. For example, the RealNetworks client and server communicate using the Real-Time Streaming Protocol (RTSP) , an IETF draft standard that derives from HTTP. RTSP is used in a number of commercial streaming media applications, including the RealNetworks client and server.
Despite the advent of new multimedia protocols, today's audio and video applications typically experience high start-up delay, due to protocol overheads and the poor delay, throughput, and loss properties of the Internet. The route from the server to the client often traverses multiple ISP domains, giving the client's service provider little control over the end-to-end performance. Instead, response time can be reduced by caching popular streams at proxies near the requesting clients (e.g., at the head-end of the access network). The ISP can carefully provision the proxy-client path for good quality-of-service, and protect the client from the vagaries of the server-proxy path. Although proxy caching of text and image data has been widely explored in the past several years, proxy services for multimedia streams introduce new challenges - very large objects, strict timing requirements, and new protocols.
The large size of most multimedia streams makes conventional Web caching techniques inappropriate. Rather than storing the entire contents of a large audio or video clip, the proxy could cache a set of frames from the beginning of the stream (i.e., the prefix of the clip) . This allows the proxy to initiate transmission to the client while simultaneously requesting the remainder of the stream (the suffix) from the server. The prefix could be cached while satisfying a client request, prefetched from the server upon access to the enclosing Web page, or pushed explicitly by the server. The prefix should be large enough to hide the round-trip delays, jitter, and small time-scale fluctuations in server-proxy throughput (e.g., 5-second prefix). Caching a larger prefix reduces the amount of traffic between the server and the proxy on subsequent client requests, and allows the proxy to support additional functions, such as transcoding or workahead transmission, without introducing client delay.
For ease of deployment, multimedia proxy services, such as prefix caching, should not require changes to existing client/server software or network mechanisms and should operate in the context of standard protocols. This paper investigates the challenges of realizing the prefix-caching service in the context of the RTP, RTSP, and RTCP family of IETF protocols. Section 2 briefly reviews the RTP, RTCP, and RTSP protocols. In Section 3, we describe how to exploit existing RTSP features, such as the Range header, and how to avoid several round-trip delays by caching protocol information at the proxy. Then, in Section 4 we discuss how caching partial contents at the proxy introduces new challenges in cache coherency and feedback control. Drawing on the protocol analysis, Section 5 briefly describes the architecture and preliminary implementation of our prefix-caching proxy on a Linux-based PC. We used the RealNetworks G2 server and player as a source and sink for testing the proxy. To aid other researchers in experimenting with multimedia proxy services, we describe how to coax the RealNetworks implementation of RTSP, including how to coax the RealServer to stream with RTP and RTCP, rather than the proprietary RDT protocol. Section 6 concludes the paper with a discussion of future research directions.
The paper complements recent work on multimedia caching and proxy services. Earlier research on multimedia caching proposed storing a sliding interval of successive frames to satisfy client requests that arrive close together in time [3,4,5]. Other recent work proposed having the proxy cache a fixed subset of frames, such as the prefix of the stream or a subset of frames in high-bandwidth regions of the stream, to reduce the overhead of transmitting to the client [6,7,2,8,9]. Our paper complements these studies by focusing on the protocol issues that arise in caching partial contents. Other recent work has introduced multimedia proxy services that operate within existing streaming protocols, including systems that perform transcoding, retransmission, workahead smoothing, and ad insertion [10,11,12,13]. We complement this work by focusing on the unique challenges of performing prefix caching at a proxy.
The RTP, RTCP, and RTSP suite of protocols supports multimedia streaming in the Internet, and serves as the starting point for our work on prefix caching.
RTP  provides the basic functionality for transferring real-time data over packet networks. Since RTP does not include mechanisms for reliable delivery or flow control, transport of RTP packets must rely on underlying protocols such as UDP and TCP. RTP typically runs over UDP, though TCP is sometimes used for reliable transport or to stream across firewalls that discard UDP packets. RTP provides source identification (randomly chosen SSRC identifier), payload type identification (to signal the appropriate decoding and playback information to the client), sequence numbering (for ordering packets and detecting losses), and timestamping (to control the playback time and measure jitter). Data packets contain a generic RTP header with these fields, as well as payload-specific information to improve the quality of delivery for specific media (e.g., MPEG). Interpretation of some header fields, such as the timestamp, is payload dependent. The RTP header may also identify contributing sources (CSRCs) for the payload carried in the packet; such a list is typically inserted by mixers or translators.
RTCP  monitors the delivery of RTP packets. The protocol provides feedback on the quality of data distribution, establishes an identification (CNAME) for each participant, scales the control packet transmission with the number of participants (to avoid generating excessive feedback traffic in large multicast groups), and provides minimal session control information. RTCP packets consist of source descriptors, sender reports, receiver reports, BYE packets (signifying the end of participation in a session), and APP packets (for application-specific functions). Receiver reports include the stream's SSRC, the fraction of RTP packets lost, the sequence number of the last RTP packet received, and the packet interarrival jitter. Senders can use this information to modify their transmission rates or to switch to a different encoder. The sender reports include the stream's SSRC, the sequence number of the last RTP packet sent, the wallclock time of the last transmission, and the number of RTP packets and bytes sent. The client can use the RTP timestamp and wallclock time information for media synchronization.
RTSP  coordinates the delivery of multimedia files, acting as a ``VCR remote control.'' RTSP typically runs over TCP, though UDP can also be used. Though conceptually similar to HTTP/1.1 , RTSP is stateful and has several different methods. The OPTIONS method inquires about server capabilities (e.g., RTSP version number, supported methods, etc.), and the DESCRIBE method inquires about the properties of a particular file (e.g., Last-Modified time and session description information, typically using SDP ). Client and server state machines are created with the SETUP method, which also negotiates transport parameters (e.g., RTP over unicast UDP on a particular port). The client sends a SETUP message for each stream (e.g., audio and video) in the file. Streaming of the file is initiated with the PLAY method, which can include a Range header to control the playback point. The TEARDOWN method terminates the session, releasing the resources at the server and client sites. The protocol also includes a number of recommended or optional methods.
In this section, we describe how to reduce client delay by caching protocol information at the proxy, and how to use the RTSP Range header to fetch the suffix of the stream. The discussion draws on Figure 1, which illustrates the handling of RTSP messages when the proxy has cached the prefix of each stream (e.g., audio and video), the description of the presentation, and the options supported by the server.
The handling of client requests depends on whether or not the proxy has cached the prefix and the related RTSP information. The client uses the OPTIONS message to learn which methods are supported by the server. The server (and therefore the proxy) should support the basic methods such as OPTIONS, DESCRIBE, SETUP, PLAY, and TEARDOWN. The proxy can respond directly to the client's OPTIONS request, avoiding the delay in contacting the server, as shown in Figure 1. In particular, the proxy could respond with a list of methods that were cached during a previous interaction with the server. Alternatively, the proxy could respond with a default list of basic methods. In either case, the proxy may occasionally send an incorrect list of methods (e.g., if the server ceases to support some method). If the client later attempts to invoke the unsupported method, the server would simply respond with an error message (response code 405 ``Method not allowed'') that includes the current list of methods, which the proxy can cache and forward to the client. Note that this error would not cause streaming to fail.
The DESCRIBE message is used to retrieve information about the particular presentation, which consists of one or more underlying streams (e.g., audio, video). Like the OPTIONS message, the DESCRIBE message is optional and does not affect the RTSP state machine at the client or the server. If the description resides in the cache, the proxy can respond directly to the client; otherwise, the proxy must contact the server. However, note that the cached DESCRIBE information may not be up-to-date. The proxy can employ a variety of techniques to reduce the likelihood of stale information, as discussed in Section 4.1.
The SETUP request is used to negotiate the transport parameters, including the transport protocol (e.g., RTP) and the TCP/UDP port numbers. Upon receiving the client SETUP message, port numbers are generated for the proxy's end of the RTP and RTCP connections, and a session identifier is selected. The port numbers and session identifier are sent to the client. In the meantime, the proxy generates a separate SETUP message to the server. Each stream in the multimedia presentation results in separate connections on both the proxy-client and server-proxy paths, and consist of TCP or UDP connections for RTP and RTCP. The proxy could request that the server stream the RTP packets over TCP (or a related semi-reliable transport protocol), since the cached prefix could hide transient periods of TCP congestion where the server does not send packets fast enough for continuous playback. To coordinate the transfer of RTP and RTCP messages, the proxy must maintain a mapping table to direct messages to the appropriate outgoing connection (and with the appropriate session identifier), as discussed in Section 5.1.
The client can send the PLAY message after receiving the proxy response to the SETUP request. If the prefix is cached, the proxy can respond immediately to the PLAY request and initiate the streaming of RTP and RTCP messages to the client. The client can begin playback, without waiting for the proxy to communicate with the server. Note that the proxy may or may not have received the SETUP response from the server, depending on the delay on the server-proxy path. Once the proxy has received a SETUP response from the server, indicating the server's session identifier, the proxy can send the PLAY request to the server, with the appropriate Range header, as discussed in more detail in Section 3.2. After transmitting the prefix of the stream to the client, the proxy can start sending the RTP packets in the suffix retrieved from the server.
Depending on the size of the prefix, the proxy may decide to wait longer before asking the server to transmit the suffix. Fetching the suffix too early would require extra storage resources at the proxy, and may result in extra server load and network traffic if the client eventually pauses or stops the transfer of the presentation. Yet, the suffix should be requested sufficiently ahead of when the proxy would need to start transmitting this data to the client. Requesting the suffix a bit early allows the proxy to hide the server-proxy round-trip delay (which can be estimated from RTCP reports), or additional delays for other functions such as smoothing or transcoding.
When the entire stream resides in the cache, the proxy acts as a server in responding to the client PLAY request. When the cache does not contain any portion of the requested resource, the proxy forwards the PLAY request to the server and simply shuttles messages back and forth between the server and the client, acting as an application-level router. The operation of the proxy becomes more interesting when the cache stores only the prefix of the stream. In this case, the proxy can reply to the client PLAY request and initiate transmission of RTP and RTCP messages to the client, while requesting the suffix from the server. Fetching the suffix requires the proxy to initiate a Range request for the appropriate portion of the stream. As part of linking the prefix and suffix, the proxy must specify the appropriate Range and handle inaccuracies in the server response.
As part of caching multimedia content, the proxy keeps track of the size of the prefix in terms of the timestamps in the RTP packets. The RTSP Range request is defined for both SMPTE Relative timestamps and NPT (Normal Play Time). SMPTE has the format hours:minutes:seconds:frames.subframes with the origin at the beginning of the clip. In contrast, NPT time consists of a decimal fraction, where the left part of the decimal may be expressed in either hours, minutes or seconds, and the right part of the decimal point measures fractions of a seconds. These formats require the proxy to convert the RTP timestamps into time, by applying the payload-specific clock rate . According to the protocol specification, servers supporting the Range header must understand the NPT range format and should understand the SMPTE range format. In most cases, the SMPTE format is preferable, since it supports more indexing into individual packets within a frame. But, the proxy could easily avoid the need for such fine-grain indexing by ending the prefix at frame boundaries.
The proxy does not know in advance whether or not the server supports Range requests for a particular presentation (some streams do not allow seek operations), and whether or not the SMPTE format is supported. The proxy can learn, and cache, this information for each presentation by sending Range requests to the server and noting the response (e.g., the server sends a 426 ``Header Field Not Valid for Resource'' response when Range is not supported). The proxy could avoid polling the server for this information if the RTSP SETUP or PLAY response included information about support for Range requests. To avoid changing existing RTSP implementations, providing the information about Range support could be optional, since the proxy could always infer the information and/or decline to cache partial contents when the information is not provided.
In addition, RTSP does not require the server to handle the Range request precisely. For example, the request may identify a starting time that does not correspond to any particular frame, forcing the server to initiate transmission from the previous or subsequent frame. In addition, the server may not support arbitrary indexing to individual frames. (For example, an MPEG stream typically consists of I, P, and B frames within a group of pictures (GOP). The server may not precisely satisfy a Range request that starts in the middle of a GOP. In fact, allowing clients to index to any arbitrary frame could introduce substantial overhead at the server (to maintain fine-grain indices, or to parse the content to sequence to the appropriate frame), particularly for variable-bit-rate streams, or streams without a fixed GOP structure.) The RTSP protocol does not dictate how accurately the server must handle the Range header, and does not provide a way for the server to indicate in advance how much inaccuracy could be introduced. As such, the proxy should be conservative, and ensure that the transmission to the client does not have duplicate or missing packets. Fortunately, the server reply includes a Range response header that indicates what range of time is currently being played. If the beginning of the suffix contains frames that are already at the end of the prefix, the proxy discards the overlapping packets. To avoid a gap between the prefix and the suffix, the proxy could initiate a conservative Range request, or issue a second request if a gap arises.
In order to link the prefix and the suffix, all RTP headers must be consistent, otherwise the client will not associate the two parts to the same stream. The sensitive fields are sequence numbers, timestamps, and source identifier (SSRC), which have been selected separately by the proxy (for the prefix) and the server (for the suffix). Therefore, the proxy will have to change the RTP header fields of the suffix to match the SSRC it chose for the prefix, and the timestamps and sequence numbers to indicate that the suffix must be played after the prefix. In streaming the suffix, the proxy overwrites the SSRC field in each RTP packet with the value it selected as part of initiating transmission of the prefix. The proxy knows the timestamp and sequence number used in transmitting the last RTP packet of the prefix. The base timestamp and sequence number for the server's transmission of the suffix are provided in the RTP-Info header in the PLAY response. The proxy can then add/subtract the appropriate constant for the timestamp and sequence number fields of each packet in the suffix.
Prefix caching decouples the server transmission from client reception. Caching partial contents of the stream, and overlapping the prefix transmission on the proxy-client path with the suffix on the server-proxy path, introduces new challenges in cache coherency and feedback control. This section identifies the key issues and presents several possible solutions.
Caching RTSP messages and multimedia content at a proxy introduces potential coherency problems when the server changes this information. Cache coherency is a critical issue in the Web, where many resources change very frequently. The problem is arguably less important for multimedia presentations, which may change less frequently - the server could simply assign a new URL when new content is created. Still, the paradigms for authoring and updating multimedia content are not well understood, and the use of proxies should not place additional restrictions on how servers are managed. As such, the proxy should incorporate mechanisms to prevent the transmission of a suffix that does not come from the same version of the presentation as the cached prefix.
The proxy could reduce the likelihood of introducing an inconsistent stream by sending a DESCRIBE message to the server for every client request, and checking that the Last-Modified time matches the cached prefix. However, the additional exchange with the server would increase client start-up latency, which would partially defeat the purpose of prefix caching. Instead, the proxy could apply a timeout (time-to-live) to the cached information, revalidating (with an ``If-Modified-Since'' request) only if the timeout has expired. To avoid forcing the client to wait for the proxy to check with the server, the proxy could periodically validate the cached information, even if no client has requested the stream. To reduce the overhead of creating a TCP connection to the server, the proxy could freshen the DESCRIBE information during other communication with this server (e.g., while coordinating the transmission of another stream from the same server). Such piggyback cache validation has proved to be a very effective technique for controlling the consistency of Web resources .
To ensure that a client never receives a suffix that does not match with the prefix, the proxy could send a DESCRIBE request to the server just after requesting the suffix. This would allow the proxy to verify that the suffix comes from the same presentation as the cached prefix. In the rare event that the suffix comes from a more recent version of the presentation, the proxy could terminate the transmission to the client rather than send inconsistent information. Note that this approach does not delay transmission to the client in the common case when the presentation has not changed. From the client's point of view, the disruption of the transmission would appear as a transient server error. For example, the proxy could send a ``500 Internal Server Error,'' ``502 Bad Gateway,'' or ``503 Service Unavailable'' message. Note that the proxy could also send such an error message if the server does not respond to the proxy's request for the suffix, or is unable to devote resources to satisfying the proxy's request. After receiving the error message from the proxy, the client could optionally initiate the request a second time, and receive a fresh copy of the presentation from the server.
Finally, cache inconsistency can be avoided with additional support from the server. The simplest approach would be to have the server indicate an expiration time for the presentation, and ensure that the contents will not change in the meantime. An alternative solution would be to allow additional communication between the server and the proxy. For example, the server could push the prefix of each popular multimedia presentation to interested proxies, perhaps using multicast to distribute the content. In this case, the proxy acts as a (partial) replica of the server content, rather than as a conventional cache. The server could indicate the expiration time for the presentation, or promise to send updates as part of changing the presentation. The proxies could subscribe to receive updates for particular content of interest.
Prefix caching introduces additional challenges in generating accurate and timely reports of server and client performance. If the entire stream resides in the cache, the proxy acts as a server in generating sender RTCP reports and responding to the client's receiver reports. On a cache miss, the proxy acts as an application-level router and simply forwards the reports generated by the server and the client. Yet, the appropriate behavior is less clear when the proxy sends the prefix of the stream and fetches the suffix from the server. On the one hand, the proxy could view the server-proxy and proxy-client paths as distinct, and have two unrelated RTCP sessions with the server and the client, respectively. Despite the conceptual appeal of this approach, the proxy runs the risk of hiding important information about performance problems on the server-proxy or proxy-client paths. Instead, the proxy could generate and receive reports while transmitting the prefix, and switch to forwarding reports while transmitting the suffix. In forwarding the server-generated reports, the proxy would need to map the key fields, such as the SSRC, timestamp, and sequence number, that have a different base in the suffix than in the prefix.
Forwarding server and client RTCP reports during transmission of the suffix has the advantage of informing the server about the performance experienced by the client. However, the information could be rather out of date, since the suffix is not sent to the client until transmission of the prefix is complete. The proxy has a number of possible options to control the RTCP reports sent to the server. Reporting transient performance problems (e.g., burst of loss) on the proxy-client path may not be particularly relevant, especially if this information is out-of-date. Instead, the proxy could choose to report only persistent problems that require a response by the server (e.g., to switch to a lower bandwidth version of the stream). In generating these reports, the proxy must be careful to reflect the delay that has been introduced in sending the RTP packets in the suffix to the client. Otherwise, the server may think that a large number of packets have not been received. The proxy could achieve these goals by basing the receiver reports on the packets that have been received by the proxy (even if they have not all been sent to the client), while using the loss rate experienced by the client to compute the number of lost packets to report to the server.
Correctly modifying the RTCP receiver reports is a delicate process. Instead, the proxy could take advantage of the cached prefix to change the properties of the server-proxy and proxy-client paths. The proxy could take advantage of the prefix cache to hide the delay for explicit retransmission of lost packets on the server-proxy path. For example, the proxy could issue a Range request (perhaps on a separate connection) to retrieve a previously-requested interval of frames from the server. This would isolate the client from the loss on the server-proxy path. Rather than issuing RTSP Range requests (which is arguably a rather coarse-grain way to effect packet retransmissions), the proxy could have the server transmit RTP packets over a TCP connection (by requesting TCP in the initial SETUP message). The TCP connection would ensure a reliable transmission of the RTP packets, and would obviate the need for an RTCP session between the server and the proxy. The proxy could still use UDP for transferring RTP packets to the client.
To address potential performance problems on the proxy-client path, the proxy could perform additional functions, such as transcoding and workahead smoothing. Transcoding  would reduce the bandwidth requirements of the stream by lowering the quality. The proxy could perform transcoding either by discarding a subset of the incoming packets, or by reencoding the stream (e.g., by discarding high-frequency components in each MPEG slice). In contrast, workahead smoothing would involve transmitting frames into the client playback buffer ahead of the playout time . The cached prefix allows the proxy to transmit frames ahead of time without incurring additional client playback latency. The transmission of additional frames would allow time for slower transmission of large frames (smoothing out a burst) or retransmission of lost packets (based on the receiver reports). The proxy could perform these functions on the proxy-client path without involving the server. If the proxy sends RTCP receiver reports to the server, these messages should reflect the improvements made in the proxy-client path, to avoid triggering conflicting recovery operations at the server.
This section describes the architecture of the proxy and identifies the key implementation issues. We describe the session data structure, the operation of the cache, and the scheduling of processor and link resources. A preliminary implementation of the prefix-caching proxy has been implemented in C on a Linux-based PC, and successfully tested using the RealServer and RealPlayer (from www.real.com) as a source and sink.
The client sends RTSP messages directly to the proxy by configuring the player with the proxy's IP address. The client has a TCP connection to/from the proxy for RTSP operations, a UDP connection from the proxy to receive RTP packets, and a UDP connection to/from the proxy for RTCP. Having the client contact the proxy explicitly has advantages over transparent caching, where the proxy must lie in the path of client messages that are directed to the server. In addition, transparent caching would also introduce extra complexity and overhead for the proxy to intercept and modify all of the client and server packets, including reconstruction of the TCP stream for the RTSP messages. These challenges, while important, are beyond the scope of our initial implementation effort.
The proxy coordinates communication with the client and the server using a data structure that stores key information about the ongoing session. The fields in this data structure are populated as the session progresses. A session can include multiple streams, simultaneously or back-to-back; for example, a presentation often includes both an audio and a video stream, and perhaps a third stream for subtitles. The reply to the OPTIONS message contains cacheable information about the server, but does not include any session-level information. The reply to the DESCRIBE message includes cacheable information about a resource, such as its URI and Last-Modified Time. For illustration, we describe how the session data structure is populated on a cache miss:
The session structure also includes a number of flags that indicate whether or not the requested streams are in the cache, and which packet should be sent next. The session data structure is populated in a similar manner when the proxy cache contains the prefix, except that the session identifier, base timestamp, and base sequence number are chosen by the proxy.
The desire to efficiently transmit cached multimedia content to a large collection of clients drives most of the design decisions for the caching architecture. For example, the proxy caches RTP packets, rather than raw multimedia content. This obviates the need to parse the body of the RTP packets and repacketize the content, at the expense of additional storage overhead for RTP headers. Each item in the cache represents a particular multimedia resource, including the DESCRIBE information and a linked list of RTP packets, ordered by sequence number. In the simplest case, the incoming RTP packets are appended to the linked list as they arrive. Since packets sometimes arrive out of order, the proxy could insert each packet in the appropriate place in the list based on the sequence number field in the RTP packet header.
Each time a stream is transmitted from the cache, the proxy must select an SSRC and a base timestamp and sequence number (at random). As such, the proxy must alter these header fields in the cached RTP packets before transmitting to the client. To avoid extra data copying, the proxy can send the packet using a gather function (e.g., writev()) that can combine buffers from multiple locations. However, not all systems have efficient support for the gathering function. In this case, the proxy can overwrite the RTP header fields directly in the cache, and initiate a regular socket send operation for the entire packet. To avoid alteration or deletion of the packet during the copy into the socket buffer, each packet includes a lock bit to indicate that a (non-blocking) transmission is in progress; the lock bit is cleared after completion of the socket call. Also, since the proxy overwrites certain RTP header fields, the cache stores additional information to allow computation of the sequence number and timestamp fields for subsequent transmissions of the packet.
To handle requests from multiple clients, the proxy must have an effective way to allocate memory, processing, and I/O resources between different sessions. Our prototype implementation has an event-driven architecture, where the proxy coordinates all activity between client-server pairs in a single process. This avoids the overheads (and jitter) introduced by context switching between the processes, and simplifies the sharing of the cache amongst all of the sessions. In addition, the event-driven architecture enables the proxy to have greater control over resource allocation, by sending and receiving RTP data for each session in proportion to the bit-rate requirements. Instead of having a process scheduler implicitly allocate resources to each session, the event-driven approach shares resources by scheduling socket operations. The proxy can timeshare the system by performing accounting of the number of bytes handled for each stream.
The proxy, like a server, must carefully schedule transmission of RTP packets to avoid underflow and overflow at the client site. If the proxy sends packets to the client over a TCP connection, the TCP flow control could implicitly pace the transmission. However, sending UDP packets requires greater care. The proxy schedules packets based on the RTP timestamps and packet sizes. The timestamp clock rate is payload dependent , requiring the proxy to maintain a mapping between payload types and clock rates to determine how many packets to transmit during a given time interval.
Ideally, the proxy could schedule the transmission of each packet at the appropriate departure time. However, such fine-grain scheduling is typically not possible, and is arguably not necessary. Instead, the proxy should pick an appropriate time interval (say, one second) for coordinating packet transmissions for each session. When streaming multimedia content to multiple clients, the proxy proxy can sequence through the set of sessions, transmitting the appropriate number of packets for each active session. A system with kernel support for multimedia streaming could explicitly support packet scheduling . Ideally, in addition to allowing per-session scheduling and variable transmission rates, the kernel could conceivably compute the appropriate rate directly from the RTP timestamps, payload-specific clock rate, and packet sizes. Such system support would aid in scaling the proxy to support a larger number of sessions.
Testing the prototype proxy requires multimedia clients and servers that implement RTP, RTCP, and RTSP. Currently, reference RTSP software is available for a source and a sink. However, the reference source cannot support multiple simultaneous clients, and the reference client does not display the multimedia stream; in addition, the reference code implements an earlier version of RTSP. Rather than extending the reference code, we decided to experiment with the RealNetworks server and player, which both support RTSP/1.0. This provided an opportunity to test the interoperability of our proxy with the dominant Internet multimedia components, and to study the RealNetworks RTSP implementation. Our testbed consists of a Windows NT 4.0 PC running a trial version of RealServer G2 (version 220.127.116.113), and a Windows 95 PC running RealPlayer G2 (version 18.104.22.168).
The RealPlayer is configured with the IP address and port number of the proxy. The decision to experiment with a local copy of the G2 server, rather than with existing multimedia servers in the Internet, was motivated by several issues. First, during the development and testing of our proxy software, we did not want to risk generating incorrect RTSP messages that might crash an operational G2 server. The G2 server in our testbed did crash a few times during our initial development of the proxy. Second, AT&T Research has a firewall that filters UDP packets. This would force RTP and RTCP transfers to use TCP, whereas we wish to experiment with UDP; later, we could conceivably put the proxy outside of the firewall, or experiment with RTP and RTCP transfers over TCP. Third, our initial implementation operates on a private Ethernet, to prevent accidental generation of excess traffic on the local area network; as such, the proxy does not have direct access to the larger Internet.
Although the RealServer can generate RTP and RTCP packets, the server typically employs a proprietary transport protocol called RDT. The selection of the transport protocol is initiated by the client SETUP message. For example, the SETUP message
SETUP rtsp://22.214.171.124:554/g2video.rm/streamid=0 RTSP/1.0 CSeq: 3 RealChallenge2: 7c7687849ca62b5a063fd037f9d13d7a01d0a8e3, sd=78920df3 Transport: x-real-rdt/udp;client_port=6970;mode=play, x-pn-tng/udp;client_port=6970;mode=play, rtp/avp;unicast;client_port=6970-6971;mode=play If-Match: 31278-2shows three candidate transport protocols, starting with RDT. Upon receiving this message, the server selects RDT as the transport protocol. To convince the server to send RTP and RTCP packets, the proxy modifies the Transport header to include only the RTP protocol. Then, the server selects RTP as the transport protocol, and transmits RTP and RTCP messages after receiving the PLAY request. Modifying the SETUP message is sufficient to convince the server to use RTP for most encoding formats (e.g., .wav, .au, and .avi files). However, the RealServer will not send RealNetworks files (e.g., .ra or .rm) using RTP. We discuss our experiences with the RealNetworks components and present a complete trace of the RTSP messages between the client and server in .
The initial prototype of proxy prefix caching implements a subset of the architecture, along with the modifications necessary for interacting with the RealNetworks server and player. Using the basic implementation, we can successfully process client RTSP messages, requesting the full contents from the server on a cache miss, or partial contents when a portion of the stream resides in the cache. The playback at the client is seamless in both cases. The proxy can sustain continuous transmission to multiple clients. In addition to viewing the stream at the client, we also verified the continuity of the RTP stream by packet inspection. The prototype is sufficient to test our uses of the RTP and RTSP protocols. However, supporting the full use of the protocols and scaling to a large number of simultaneous sessions would require enhancements to our implementation. The current prototype transfers RTSP messages over TCP, and RTP and RTCP messages over UDP, and assumes that a session has at most two simultaneous streams. The proxy forwards RTCP messages between the client and server, but does not generate reports itself. For simplicity, transfers of RTP packets to the client use a blocking send call. Future work can focus on extending the implementation and evaluating the performance.
Prefix caching offers an effective way for Internet service providers to improve user performance and network efficiency, without requiring additional support from the client, the server, or the network. The service can operate within existing standard protocols. Still, the service would benefit from improvements in the capabilities exchange in RTSP to provide more up-front information about the supported submethods, such as the Range header and acceptable time representations. In addition, prior knowledge of a bound on the server's inaccuracy in satisfying Range requests would be useful. These features could be added to RTSP, or supported in an alternate protocol with similar functionality.
As part of our ongoing work, we are pursuing these possible extensions to the RTSP standard, and are studying the cache coherency and RTCP feedback issues in greater detail. These issues become more complicated when the server can dynamically adjust the quality of the media stream in response to network congestion or server load. In addition, we are investigating how to combine prefix caching with other multimedia proxy services, such as workahead smoothing, retransmission, and transcoding.
We would like to thank Ethendranath Bommaiah and Kobus Van Der Merwe for their help in working with the RealNetworks RTSP messages, and Henning Schulzerinne for discussions about the current version of RTSP and possible extensions. Thanks also to Shubho Sen for his comments on an earlier version of the paper.