"Packet Video 2000", May, 2000

Design and Implementation of DV based video over RTP

Akimichi Ogawa, Katsushi Kobayashi*, Kazunori Sugiura, Osamu Nakamura, Jun Murai
Keio University, Japan,
Communication Research Laboratory, Japan*

This paper discusses the semantics of sending high-quality, high-bandwidth video and audio streams using the Internet as a transport medium. We focus on the Digital Video (DV) format for video and audio media. DV is a popular consumer video format that uses the IEEE1394 interface for exchanging digital video streams. Video compression in the DV format is similar to Motion JPEG: it uses DCT (Discrete Cosine Transform) and VLC (Variable Length Coding), and omits the inter-frame compression used in MPEG. We implemented an Internet video transmission system, DVTS (Digital Video Transport System), using DV. Both IPv4 and IPv6 are supported at the network layer, and the Real-time Transport Protocol (RTP) is implemented to ensure interoperability. The DV/RTP payload format is under discussion in the IETF (Internet Engineering Task Force) AVT (Audio/Video Transport) working group. The system has been tested and demonstrated in several network configurations, using both the commodity Internet and private networks. DV/RTP generates a maximum of 33Mbps of streaming packets. Reducing the frame rate by discarding video frames allows DV/RTP streams to be fitted to a variety of network bandwidths; reducing the frame rate to 1/10 enables transmission of DV/RTP streams over a 10Base-T Ethernet connection.

1. Introduction

    With the massive growth of IP-based systems, today's campus LAN infrastructure commonly provides 100Mbps-class access links and a 1Gbps-class backbone. Such broadband infrastructure makes largely congestion-free networks possible, and high-performance switching technology can already carry IP multicast streams at 100Mbps-class bandwidth without critical packet loss. In such LAN-based networks, enough bandwidth is available for sending high-quality digitized video and audio streams. We have focused on DV (Digital Video)[1] for sending video and audio media over networks. DV is a packet-based video/audio format. Each DV video format specifies a standard digital interface for exchanging stream data, i.e. IEEE1394[2] for consumer equipment and SDTI for professional equipment. Both interfaces impose strict restrictions on cable length and network configuration, and expensive special equipment is required to lift those restrictions. A single consumer SD format DV stream consumes 25Mbps of bandwidth on the IEEE1394 bus. It is not difficult to ensure end-to-end throughput for a DV stream in a LAN environment, for both unicast and multicast connections. This LAN infrastructure enables remote video editing systems spanning broadcast stations, and campus-wide video distribution systems based on DV.
    If the global Internet infrastructure provides enough bandwidth to carry DV-class streams, the style of broadcasting and communication will change, e.g. live spots in TV news programs and distance learning systems. On the global IP network, backbone bandwidth is increasing massively, even compared with LAN infrastructure. WDM (Wavelength Division Multiplexing) and high-performance switching technology are used for broadband backbone connections. Moreover, QoS support for the Internet is being developed using the Intserv (Integrated Services) and Diffserv (Differentiated Services) approaches. Networks combining high bandwidth with QoS support will have enough capacity to carry a 100Mbps-class stream per connection. Tests of broadband network interconnection have begun on advanced network testbeds focusing on the next generation Internet (NGI), e.g. Internet2 and APAN (Asia Pacific Advanced Network). Such NGI technologies will eventually become available to ordinary Internet users who connect via dial-up today.
    In this paper, we present a real-time, NTSC-quality communication tool. We have been proposing NTSC-quality video communication using consumer products, and we previously demonstrated a preliminary implementation of IP-based DV communication at SC98 (Supercomputing Conference 98) to show its effectiveness. Here, we introduce the RTP (Real-time Transport Protocol) based real-time DV streaming tool we developed for both IPv4 and IPv6. We also describe the rate control function implemented in our DV tool using picture frame discarding. Finally, we discuss our experience and application trials of the DV tools on the live Internet, for both unicast and multicast.

2. Digital Video Encoding Format

    The DV format is designed for magnetic tape media and is specifically optimized for recording digital video and audio data on helical-scan magnetic systems. Several minor formats are defined for various purposes, for both consumer and professional use[1]. DV is the most popular format for both consumer and professional use because of its small tape media (6.3 mm, 120 min/cassette), fully digital recording capability, reasonable cost compared with 8mm camcorders, and the ease of configuring non-linear editing systems linked to a PC.
    The DV format is built on a frame-based abstraction, which addresses synchronization issues such as lip synchronization: all data, including video, audio and system data, is managed in units of video picture frames. The DV digital data stream has a three-level hierarchical structure. A single video frame in the DV format stream is divided into several "DIF (Digital Interface Format) sequences". A DIF sequence consists of 150 DIF blocks of 80 bytes each. The DIF block is the primitive unit of every DV stream and is common to every member of the DV specification family. Each 80-byte DIF block begins with a 3-byte ID header specifying the type of the DIF block and its position in the DIF sequence. Five types of DIF block are defined in the ID header: DIF sequence header, Subcode, Video Auxiliary information (VAUX), Audio and Video. In this paper, we refer to the DIF sequence header, Subcode and VAUX collectively as system data. Audio, video and system data can therefore be separated at DIF block granularity. Audio DIF blocks in turn contain audio auxiliary information and audio data.
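Because the ID header identifies each block, separating audio, video and system data is a matter of reading three bytes. The following Python sketch decodes the header; the bit positions and the section type codes (0 header, 1 subcode, 2 VAUX, 3 audio, 4 video) are our reading of the DV specification as summarized in the DV/RTP drafts, and the names `DifId` and `parse_dif_id` are hypothetical.

```python
from dataclasses import dataclass

# Section type (SCT) codes in the first ID byte (assumed DV code points).
SECTION_TYPES = {0: "header", 1: "subcode", 2: "vaux", 3: "audio", 4: "video"}

DIF_BLOCK_SIZE = 80   # bytes per DIF block, 3 of which are the ID header


@dataclass
class DifId:
    sct: int    # section type (3 bits)
    seq: int    # 4-bit sequence number
    dseq: int   # DIF sequence number within the video frame
    fsc: int    # FSC bit
    dbn: int    # DIF block number within its section

    @property
    def kind(self) -> str:
        return SECTION_TYPES.get(self.sct, "reserved")


def parse_dif_id(block: bytes) -> DifId:
    """Decode the 3-byte ID header of one 80-byte DIF block."""
    if len(block) != DIF_BLOCK_SIZE:
        raise ValueError("a DIF block is exactly 80 bytes")
    id0, id1, id2 = block[0], block[1], block[2]
    return DifId(
        sct=(id0 >> 5) & 0x07,
        seq=id0 & 0x0F,
        dseq=(id1 >> 4) & 0x0F,
        fsc=(id1 >> 3) & 0x01,
        dbn=id2,
    )


# Example: an audio block (SCT=3) in DIF sequence 2, block number 5.
blk = bytes([3 << 5, 2 << 4, 5]) + bytes(77)
info = parse_dif_id(blk)
print(info.kind, info.dseq, info.dbn)   # audio 2 5
```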
    For video compression, the DV format uses only intra-frame DCT (Discrete Cosine Transform) and VLC (Variable Length Coding) at a fixed compression ratio. Unlike MPEG1 and MPEG2, the DV format does not use inter-frame compression. A video picture frame is divided into rectangular or clipped-rectangular DCT super blocks, and each DIF sequence of a DV stream corresponds to an integral number of DCT super blocks. A DCT super block is divided into 27 rectangular or square DCT macro blocks, and each DCT macro block is further divided into 4 or 6 DCT blocks.
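The block hierarchy fixes the size of every frame, so the raw DIF data rate follows from simple arithmetic. This sketch assumes 10 DIF sequences per frame at 29.97 frames/s for 525-60 and 12 at 25 frames/s for 625-50, and that 135 of the 150 blocks in a sequence are video blocks each holding one compressed macro block; these parameters and the function name `frame_layout` are our assumptions, not taken from the text.

```python
from fractions import Fraction

DIF_BLOCK_BYTES = 80
BLOCKS_PER_SEQUENCE = 150        # from the text
MACROBLOCKS_PER_SUPERBLOCK = 27  # from the text
# Assumption: 1 header + 2 subcode + 3 VAUX + 9 audio blocks per sequence,
# leaving 135 video blocks, each carrying one compressed macro block.
VIDEO_BLOCKS_PER_SEQUENCE = 135

# Assumed consumer SD parameters: (DIF sequences per frame, frames/second).
SYSTEMS = {
    "525-60": (10, Fraction(30000, 1001)),
    "625-50": (12, Fraction(25)),
}


def frame_layout(system: str) -> dict:
    """Sizes of the DIF hierarchy for one frame, plus the raw DIF bit rate."""
    n_seq, fps = SYSTEMS[system]
    frame_bytes = n_seq * BLOCKS_PER_SEQUENCE * DIF_BLOCK_BYTES
    superblocks_per_seq = VIDEO_BLOCKS_PER_SEQUENCE // MACROBLOCKS_PER_SUPERBLOCK
    return {
        "frame_bytes": frame_bytes,
        "superblocks": n_seq * superblocks_per_seq,
        "macroblocks": n_seq * VIDEO_BLOCKS_PER_SEQUENCE,
        "dif_mbps": float(frame_bytes * 8 * fps) / 1e6,
    }


for name in SYSTEMS:
    print(name, frame_layout(name))
```

Under these assumptions a 525-60 frame is 120000 bytes (about 28.8 Mbps of DIF data), and its 1350 macro blocks match a 720x480 picture split into 256-sample macro blocks; packet headers on IEEE1394 or RTP/IP add to this figure.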
    The audio is sampled and encoded in video frame units: the sampling frequency is 32 kHz, 44.1 kHz or 48 kHz, and quantization is 16, 12 or 20 bits.
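Because audio is carried per video frame, the number of samples in a 525-60 frame is not an integer; at 48 kHz the average is 48000 x 1001 / 30000 = 1601.6 samples. The sketch below, with the hypothetical function `samples_per_frame`, uses a simple even-spreading rule (not the exact per-frame pattern defined by the DV specification) to show how integer per-frame counts still add up exactly over a cycle of frames.

```python
import math
from fractions import Fraction

FRAME_RATE_525_60 = Fraction(30000, 1001)   # 29.97 frames per second


def samples_per_frame(fs_hz: int, frames: int) -> list:
    """Integer audio sample counts for consecutive 525-60 frames.

    Hypothetical even spreading: frame n gets floor((n+1)*r) - floor(n*r)
    samples, where r = fs / 29.97 is the exact fractional average.
    """
    r = Fraction(fs_hz) / FRAME_RATE_525_60
    return [math.floor((n + 1) * r) - math.floor(n * r) for n in range(frames)]


print(samples_per_frame(48000, 5))   # [1601, 1602, 1601, 1602, 1602]
```

Five frames at 48 kHz hold exactly 8008 samples, so a receiver cannot assume a fixed audio payload size per frame.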

3. RTP Payload for DV format Video

    RTP is designed to provide real-time stream transport over the Internet, offering functions for real-time packet-based communication. The RTP protocol itself is independent of the encoding formats carried over it. However, because RTP is designed around the concept of ALF (Application Layer Framing), it relies on a separate specification of the encapsulation format and protocol behavior for each encoding format, such as H.261, M-JPEG and MPEG. We implemented RTP as the underlying protocol of our DV system. A standard DV/RTP encapsulation format has been proposed in the IETF and is under discussion[3][4].
    Every DV stream is constructed from 80-byte DIF blocks, each including a 3-byte ID header. The DV over RTP encoding uses only the RTP fixed header and does not use the RTP header extension. Any integral number of DIF blocks may be packed into one RTP packet, directly concatenated after the RTP fixed header (Fig. 1), with one restriction: all DIF blocks in one RTP packet must belong to the same video frame. DIF blocks from the next video frame are not packed into the same RTP packet even if payload space remains. The transition from one video frame to the next is indicated by a change in the RTP timestamp, so DV/RTP does not rely on any particular packet to signal a frame transition.
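The packing rule can be sketched in Python as follows. Several parameters are our assumptions rather than part of the text: a 90 kHz timestamp clock, dynamic payload type 96, and at most 18 DIF blocks per packet (1440 bytes, which fits a 1500-byte Ethernet MTU after 40 bytes of IPv4/UDP/RTP headers). The marker bit is left at zero, since frame boundaries are signalled by the timestamp change; `rtp_header` and `packetize_frame` are hypothetical names.

```python
import struct

RTP_VERSION = 2
DIF_BLOCK_SIZE = 80


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
               marker: bool = False) -> bytes:
    """12-byte RTP fixed header (RFC 1889): no CSRC list, no extension."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packetize_frame(frame: bytes, seq: int, timestamp: int, ssrc: int,
                    payload_type: int = 96, max_blocks: int = 18):
    """Pack one video frame into RTP packets of whole DIF blocks.

    All packets of the frame share one timestamp, and a packet never spans
    two frames. Returns (packets, next_sequence_number).
    """
    if len(frame) % DIF_BLOCK_SIZE:
        raise ValueError("frame must be an integral number of DIF blocks")
    step = max_blocks * DIF_BLOCK_SIZE
    packets = []
    for off in range(0, len(frame), step):
        header = rtp_header(seq, timestamp, ssrc, payload_type)
        packets.append(header + frame[off:off + step])
        seq = (seq + 1) & 0xFFFF
    return packets, seq


# Usage: two consecutive 525-60 frames of dummy data; with a 90 kHz clock
# the timestamp advances by 90000 * 1001 / 30000 = 3003 per frame.
frame = bytes(120000)
pkts0, seq = packetize_frame(frame, seq=0, timestamp=0, ssrc=0x1234)
pkts1, seq = packetize_frame(frame, seq=seq, timestamp=3003, ssrc=0x1234)
print(len(pkts0), len(pkts0[-1]), seq)   # 84 492 168
```

The last packet of each frame is short (6 DIF blocks here) rather than being filled with blocks from the next frame, as the payload format requires.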

Figure 1. RTP Packet Format

The proposal defines two kinds of DV stream system: audio and video data are transmitted either in a single bundled stream or in separate streams. There are two strategies for sending unbundled DV/RTP streams: 1) send a DV/RTP stream without video data as the audio stream, or 2) convert the DV audio data to a common PCM audio format and send the converted audio stream. When DV video and audio are sent in different RTP streams, the proposal recommends sending the audio in a common PCM audio format. With method 1), video and audio use the same RTP timestamp granularity, so lip synchronization can be obtained from the RTP timestamps. With method 2), the RTP timestamp granularity differs between video and audio. The RTP clocks can be related to absolute time using RTCP (RTP Control Protocol), but perfect lip synchronization is not guaranteed. Perfect lip synchronization requires a bundled DV video and audio stream.
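For method 2), a receiver can place both streams on a common timeline by mapping each stream's RTP timestamps to the sender's wallclock through the (NTP time, RTP timestamp) pair in that stream's latest RTCP Sender Report. This sketch assumes a 90 kHz video clock and a 48 kHz PCM audio clock; `SenderReport` and `rtp_to_wallclock` are hypothetical names. The result is only as accurate as the sender's clocks and reports, which is one reason perfect lip synchronization is not guaranteed.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    """Timing pair from the latest RTCP Sender Report of one stream."""
    wallclock: float   # sender's NTP timestamp, converted to seconds
    rtp_ts: int        # RTP timestamp corresponding to the same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_hz: int) -> float:
    """Map an RTP timestamp onto the sender's wallclock."""
    # Signed 32-bit difference, so timestamps on either side of the report,
    # including across timestamp wraparound, map correctly.
    diff = ((rtp_ts - sr.rtp_ts + 2**31) % 2**32) - 2**31
    return sr.wallclock + diff / clock_hz


# Assumed clocks: 90 kHz for DV video, 48 kHz for the PCM audio stream.
video_sr = SenderReport(wallclock=1000.0, rtp_ts=5000)
audio_sr = SenderReport(wallclock=1000.0, rtp_ts=7000)
t_video = rtp_to_wallclock(5000 + 3003, video_sr, 90000)
t_audio = rtp_to_wallclock(7000 + 1601, audio_sr, 48000)
print(round((t_video - t_audio) * 1e6, 1))   # skew in microseconds: 12.5
```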

4. Implementation of DV over RTP on FreeBSD

We implemented an IP-based DV video transmission system called DV Transport System (DVTS) using DV/RTP[5][6]. An overview of DVTS is shown in Fig. 2. The system consists of a Pentium-based PC running FreeBSD, an IEEE1394 device driver and interface[7], and the DV/RTP sender and receiver applications. Both the sender and receiver PCs have an IEEE1394 interface on the PCI bus. On the sender side (left side of Fig. 2), the connected camcorder produces an IEEE1394-encapsulated DV packet stream. The sender application receives this DV stream through the PCI IEEE1394 interface card, encapsulates the DV data from the IEEE1394 packets into RTP, and transmits it to the IP network. The receiver application reconstructs IEEE1394 DV packets from the DV data received over RTP: an IEEE1394 header is attached to each reconstructed DV/IEEE1394 packet, which is transferred to the DV recorder deck through the PCI IEEE1394 interface card (right side of Fig. 2). The DV recorder deck shows the DV data on the connected display. An advantage of our system is that it can be configured entirely from widely available standard PCI-based PC compatibles and consumer DV camcorders and DV VCRs equipped with IEEE1394 interfaces.

Figure 2. System Architecture

4.1 IEEE1394 device driver for FreeBSD

    To use consumer DV devices equipped with an IEEE1394 interface, we designed and implemented an IEEE1394 device driver for FreeBSD 3.3[7]. The IEEE1394 high-speed serial bus is designed as a packet-based, shared-media computer bus. Its bandwidth is logically specified from 100Mbps to 3.2Gbps. The goal of IEEE1394 is to integrate various interface and cable specifications into a single bus (cable) system, i.e. a storage device interface replacing SCSI and IDE, peripheral connections replacing parallel and serial ports, networking replacing Ethernet, processor interconnects replacing VME, and also the RCA cables of audio and visual equipment. Devices of different speeds can be connected within a single IEEE1394 physical network, which allows each device to be built at an appropriate cost.
    IEEE1394 provides three transmission modes: 1) isochronous stream mode for QoS, which provides particularly strict packet jitter bounds and guaranteed bandwidth but no reliable delivery, 2) asynchronous stream mode for best-effort delivery without reliability, and 3) asynchronous request mode for reliable communication.

Figure 3. Isochronous packet timing on IEEE1394

Data timing in IEEE1394 is shown in Fig. 3. All packet transmission is organized into 8 kHz time slices (125 microsecond cycles), which are the unit of fairness in the IEEE1394 system. Each 8 kHz time slice is further divided into 6144 time slots for bandwidth management. An isochronous sender first reserves the number of time slots it requires, and then sends one packet no larger than its reservation in every 8 kHz fairness unit. Therefore, the packet jitter of the isochronous stream mode is bounded on the order of one cycle (125 microseconds), which should be sufficient for any jitter-sensitive, high-quality packet video system. It is not easy for legacy packet-based shared network systems to satisfy such conditions. However, every IEEE1394 LSI chip already supports the isochronous stream mode in hardware, at a cost of less than $20 per chip. Consumer DV adopts IEEE1394 as its digital interface standard, even though the isochronous stream mode does not ensure reliable communication.
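The time-slot reservation can be estimated from the payload size and bus speed. This Python sketch assumes, based on our understanding of IEEE1394, that one of the 6144 slots is the time to send one quadlet (4 bytes) at S1600, so a quadlet costs 16, 8 or 4 slots at S100, S200 or S400; it ignores packet headers, CRCs and gaps, and `units_for_payload` is a hypothetical name.

```python
import math

CYCLE_NS = 125_000        # one 8 kHz isochronous cycle
UNITS_PER_CYCLE = 6144    # bandwidth allocation units (time slots) per cycle
# Assumption: one unit is the time to send one quadlet (4 bytes) at S1600,
# so a quadlet costs 16, 8 or 4 units at S100, S200 or S400.
UNITS_PER_QUADLET = {100: 16, 200: 8, 400: 4}


def units_for_payload(payload_bytes: int, speed: int) -> int:
    """Units needed per cycle for one payload; header, CRC and gaps ignored."""
    quadlets = math.ceil(payload_bytes / 4)
    return quadlets * UNITS_PER_QUADLET[speed]


unit_ns = CYCLE_NS / UNITS_PER_CYCLE    # about 20.35 ns per unit
dv_payload = 8 + 6 * 80                 # 8-byte CIP header + 6 DIF blocks
for speed in (100, 200, 400):
    u = units_for_payload(dv_payload, speed)
    print(speed, u, f"{u / UNITS_PER_CYCLE:.1%}")
```

Under these assumptions a consumer SD DV packet occupies about a third of each cycle at S100, leaving room for other isochronous and asynchronous traffic.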
    When a DV stream is sent over IEEE1394, the 80-byte DIF blocks are aggregated into packets of an appropriate size; for example, each IEEE1394 packet of a consumer SD DV stream carries 6 DIF blocks. An 8-byte Common Isochronous Packet (CIP) header is prepended to each aggregated DV packet. The CIP
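With 6 DIF blocks per packet and one packet per cycle, a frame needs fewer packets than there are cycles in a frame period. The arithmetic below assumes a 525-60 frame of 120000 bytes (10 DIF sequences of 150 blocks) and the 29.97 frames/s rate; it is an illustration, not code from DVTS.

```python
from fractions import Fraction

CYCLES_PER_SECOND = 8000      # 8 kHz isochronous cycles
DIF_BLOCK_BYTES = 80
BLOCKS_PER_PACKET = 6         # consumer SD DV, from the text
CIP_HEADER_BYTES = 8


def dv_packets_per_frame(frame_bytes: int) -> int:
    """Number of 6-block DV packets needed to carry one frame."""
    return frame_bytes // (DIF_BLOCK_BYTES * BLOCKS_PER_PACKET)


frame_bytes = 120000                    # assumed 525-60 frame: 10 x 150 x 80
fps = Fraction(30000, 1001)
data_packets = dv_packets_per_frame(frame_bytes)          # 250
cycles_per_frame = CYCLES_PER_SECOND / fps                 # 266.93...
packet_bytes = CIP_HEADER_BYTES + BLOCKS_PER_PACKET * DIF_BLOCK_BYTES  # 488
print(data_packets, float(cycles_per_frame), packet_bytes)
```

Data packets therefore fill 250 of roughly 267 cycles per frame, so about 17 cycles per frame carry no DV data.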

4.2 Network utilization and frame discard

A full DV stream consumes over 30Mbps for standard NTSC-quality video (525 lines, 29.97 picture frames per second). As utilization of commodity networks increases, the bandwidth needed to send a full-rate DV stream may become unavailable. Likewise, if the infrastructure itself offers less bandwidth, DVTS needs to adjust its bandwidth usage.
In many cases, full-rate transmission is not required and reduced-frame-rate video is acceptable. Audio data, by contrast, uses much less bandwidth than picture data but requires stable, continuous transmission. Therefore, discarding picture frames while preserving audio reduces DV streams effectively without critical communication failures. We implemented this reduction of DV streams by discarding picture frames. The quality of DV/RTP images at half the picture frame rate (15 frames per second in 525-60) is close to that of common animation (12 frames per second) and is acceptable for communication. This reduction adds no cost to the system; we did not implement additional, more complicated compression techniques, which would increase the cost of the entire system.
In full-rate transmission, when enough bandwidth is available for the DV stream, the sender application simply forwards every DIF block to the receiver PC via IP. When enough bandwidth is not available, the sender application reduces the output rate by discarding appropriate data of the DV stream. In our implementation, the sender extracts the audio data from each discarded frame and sends that audio data to the receiver via IP.
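The sender-side decision can be sketched as a filter applied to each frame. This Python sketch assumes a fixed pattern that keeps one frame in every N (the divisor) and, from each discarded frame, forwards only the DIF blocks whose section type is audio (assumed code 3); whether DVTS also forwards the system data of discarded frames is not stated here, so this filter and the name `filter_frame` are hypothetical.

```python
DIF_BLOCK_SIZE = 80
AUDIO_SCT = 3   # section type of audio DIF blocks (assumed DV code point)


def section_type(block: bytes) -> int:
    """Section type from the top 3 bits of the first ID byte."""
    return (block[0] >> 5) & 0x07


def filter_frame(frame: bytes, frame_index: int, divisor: int) -> bytes:
    """Send 1 of every `divisor` frames whole; keep only audio from the rest.

    divisor=1 is full rate, divisor=2 half rate (about 15 fps in 525-60).
    """
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    if frame_index % divisor == 0:
        return frame
    blocks = (frame[i:i + DIF_BLOCK_SIZE]
              for i in range(0, len(frame), DIF_BLOCK_SIZE))
    return b"".join(b for b in blocks if section_type(b) == AUDIO_SCT)


# Usage: a toy "frame" of one video block followed by one audio block.
video = bytes([4 << 5]) + bytes(79)
audio = bytes([3 << 5]) + bytes(79)
frame = video + audio
print(len(filter_frame(frame, 0, 2)), len(filter_frame(frame, 1, 2)))  # 160 80
```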
In the DV format, DV/IEEE1394 packets must be sent to the IEEE1394 interface continuously. To achieve this, the DVTS receiver application consists of two processes, one of which continuously outputs DV/IEEE1394 data. There are two error concealment strategies for packet loss: 1) when a packet loss is detected, display the previous complete frame, or 2) when a packet loss is detected, substitute the corresponding data from the previous frame. DVTS uses the latter strategy. Since all data in the DV format is carried in 80-byte DIF blocks, it is easy to locate the 8x8 DCT data for a particular position. When a packet is dropped, the receiver uses the corresponding DIF blocks from the previous frame in place of the DIF blocks the dropped packet contained. Because the DCT blocks are distributed across the DV frame, a small amount of packet loss does not critically degrade video quality. When the sender application discards a frame, the receiver application simply sends the previous frame to the IEEE1394 interface. If a received packet consists only of audio data, the receiver application displays the previous picture frame together with the incoming audio. The frame rate of DVTS and the corresponding bandwidth consumption are shown in Fig. 4.
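Because each DIF block identifies its own position, concealment with the previous frame reduces to overwriting a copy of that frame with whatever blocks arrived. The sketch assumes the standard 525-60 DIF sequence order (1 header, 2 subcode and 3 VAUX blocks, then 9 groups of 1 audio block followed by 15 video blocks); `block_slot` and `conceal` are hypothetical names, and DVTS may locate blocks differently.

```python
DIF_BLOCK_SIZE = 80
BLOCKS_PER_SEQUENCE = 150


def block_slot(block: bytes) -> int:
    """Index of a DIF block within its video frame, from its ID header."""
    sct = (block[0] >> 5) & 0x07    # section type
    dseq = (block[1] >> 4) & 0x0F   # DIF sequence number
    dbn = block[2]                  # DIF block number within the section
    if sct == 0:                    # DIF sequence header
        pos = 0
    elif sct == 1:                  # subcode, 2 blocks
        pos = 1 + dbn
    elif sct == 2:                  # VAUX, 3 blocks
        pos = 3 + dbn
    elif sct == 3:                  # audio: first block of each 16-block group
        pos = 6 + 16 * dbn
    elif sct == 4:                  # video: 15 blocks after each audio block
        pos = 7 + 16 * (dbn // 15) + dbn % 15
    else:
        raise ValueError("reserved section type")
    return dseq * BLOCKS_PER_SEQUENCE + pos


def conceal(previous_frame: bytes, received_blocks) -> bytes:
    """Start from the previous frame and overwrite each block that arrived.

    Blocks lost in transit keep the previous frame's data at their position.
    """
    frame = bytearray(previous_frame)
    for blk in received_blocks:
        off = block_slot(blk) * DIF_BLOCK_SIZE
        frame[off:off + DIF_BLOCK_SIZE] = blk
    return bytes(frame)
```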

Figure 4. Frame Discarding

RTP does not guarantee delivery of packets to the destination host, so tolerance to packet loss and jitter is required. RTP is also unaware of congestion along the intermediate path, so a mechanism to reduce the data rate of the DV/RTP stream is also required. Our receiver application performs frame buffering to absorb jitter. Frame buffering in the receiver application is shown in Fig. 5.

Figure 5. Frame Buffering

The receiver application consists of two processes that share frame buffers in shared memory. Although a large frame buffer suppresses the effect of jitter, it also causes a large playout delay. The number of frame buffers is therefore chosen when the receiver application starts, considering the network situation and the requirements of the application. 1) One process is the receiver process, which decapsulates each RTP packet into DV data and writes it to the shared frame buffer. The receiver process updates the shared frame buffer simply by overwriting it with newly received data, and ignores discontinuities in the DV stream, whether caused by unexpected packet loss or by senders not sending packets continuously. Any field of the shared frame buffer for which no new DV data arrives is not updated, so the previous frame's data remains in that field. When the receiver process finishes writing a frame, it notifies the display process by setting a flag in the shared frame buffer.
2) The other process is the display process, which sends the shared frame buffer data to the IEEE1394 interface. The display process examines the flag of each frame buffer in shared memory and sends a buffer when its flag is set. If the next frame buffer is not ready, the previous frame buffer is sent again.
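The flag handshake between the two processes can be modelled as below. This is a single-process Python sketch under our own assumptions, not the DVTS code: a real implementation keeps the buffers and flags in shared memory and runs the two roles concurrently, and `FrameRing`, `receiver_store`, `display_next` and `playout_delay_ms` are hypothetical names. The helper at the end shows the trade-off described above, since each extra buffer adds one frame period of playout delay.

```python
from fractions import Fraction

FRAME_PERIOD_MS = Fraction(1001, 30)   # 525-60: about 33.37 ms per frame


class FrameRing:
    """N frame buffers with per-buffer ready flags (single-process model)."""

    def __init__(self, n_buffers: int, blank_frame: bytes):
        self.buffers = [blank_frame] * n_buffers
        self.ready = [False] * n_buffers
        self.next_index = 0            # frame number the display expects next
        self.last_sent = blank_frame

    def receiver_store(self, frame_number: int, frame: bytes) -> None:
        """Receiver role: store a completed frame and set its ready flag."""
        slot = frame_number % len(self.buffers)
        self.buffers[slot] = frame
        self.ready[slot] = True

    def display_next(self) -> bytes:
        """Display role: output the earliest ready buffer, else repeat."""
        n = len(self.buffers)
        for k in range(n):
            slot = (self.next_index + k) % n
            if self.ready[slot]:
                self.last_sent = self.buffers[slot]
                self.ready[slot] = False
                self.next_index += k + 1
                return self.last_sent
        # No buffer ready: resend the previous frame so output never stalls.
        return self.last_sent


def playout_delay_ms(n_buffers: int) -> float:
    """Delay added by holding n frames before display."""
    return float(n_buffers * FRAME_PERIOD_MS)
```

For example, three buffers add about 100 ms of delay in 525-60, which may be acceptable for distribution but noticeable in a two-way conversation.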

4.3 IPv6 Support and multicast

For compatibility with the next-generation Internet, both IPv4 and IPv6 are implemented and supported in DVTS. For IPv6 support, the KAME[8] implementation for FreeBSD 3.3 is used; only minimal modification was needed to support IPv6.
We measured the bandwidth consumed by DVTS traffic over IPv4 and IPv6. The network topology for the measurement is shown in Fig. 6: a single DV sender sends a DV stream to a DV receiver through a PC router, and the measurement was performed at the PC router. No commodity traffic was present in the network during the measurement. The measured bandwidth at each frame rate is shown in Table 1 for IPv4 and IPv6. Since the only difference between the IPv4 and IPv6 traffic was the IP header, there was no significant difference between the two.
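Since only the IP header differs, the gap between the IPv4 and IPv6 columns should scale with the packet rate. This sketch estimates it, assuming 20-byte IPv4 and 40-byte IPv6 headers without options and a hypothetical packetization of 6 DIF blocks (480 bytes) per RTP packet at full 525-60 rate; the packet size actually used in the measurement is not stated, and `ipv6_extra_mbps` is a hypothetical name.

```python
IPV4_HEADER = 20   # bytes, without options
IPV6_HEADER = 40   # bytes, base header only


def ipv6_extra_mbps(packets_per_second: float) -> float:
    """Extra bandwidth caused by the larger IPv6 header at a given packet rate."""
    return packets_per_second * (IPV6_HEADER - IPV4_HEADER) * 8 / 1e6


# Hypothetical packet rate: 120000-byte frames in 480-byte packets at 29.97 fps.
pps = 120000 / 480 * 30000 / 1001
print(round(pps, 1), round(ipv6_extra_mbps(pps), 2))
```

Under that assumption the estimate is about 1.2 Mbps at full rate, the same order as the 1.23 Mbps difference between the full-rate entries of Table 1; a different packet size would change the estimate proportionally.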

Figure 6.

Table 1. Traffic of DV Stream on IPv4 and IPv6

Frame rate (fraction of full)   IPv4 bandwidth (Mbps)   IPv6 bandwidth (Mbps)
1/1                             30.47                   31.70
1/2                             15.72                   16.83
1/3                             11.48                   11.84
1/4                              9.01                    9.33
1/5                              7.54                    7.83
1/10                             4.74                    4.87
1/20                             3.26                    3.39
1/30                             2.79                    2.90
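Table 1 can serve directly as a lookup when an operator chooses a static frame rate. This Python sketch, with the hypothetical function `choose_divisor`, selects the highest frame rate whose measured bandwidth fits a given budget; it ignores competing traffic and headroom, which a practical policy would leave below the nominal link rate.

```python
# Measured DVTS bandwidth (Mbps) from Table 1, keyed by frame-rate divisor N
# (frame rate 1/N of full rate): (IPv4, IPv6).
TABLE1 = {
    1: (30.47, 31.70), 2: (15.72, 16.83), 3: (11.48, 11.84), 4: (9.01, 9.33),
    5: (7.54, 7.83), 10: (4.74, 4.87), 20: (3.26, 3.39), 30: (2.79, 2.90),
}


def choose_divisor(available_mbps: float, ipv6: bool = False):
    """Smallest divisor (highest frame rate) whose measured rate fits.

    Returns None if even 1/30 of the full frame rate does not fit.
    """
    col = 1 if ipv6 else 0
    for n in sorted(TABLE1):
        if TABLE1[n][col] <= available_mbps:
            return n
    return None


print(choose_divisor(10.0))   # 4: 9.01 Mbps fits under 10 Mbps
```

With no headroom the lookup picks 1/4 rate for a 10 Mbps budget; the 1/10 rate used for 10Base-T leaves roughly half of the link free.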

4.4 DV/RTP application on global Internet

We have demonstrated long-distance communication and conferencing using DV/RTP.
In November 1998 (JST), we demonstrated DV communication between the USA and Japan over the APAN trans-Pacific link to show its effectiveness over the world-scale Internet (Fig. 7). The UDP-encapsulated packets from the USA to Japan were also forwarded to Korea.
The demonstration was a 90-minute lecture given from the USA by J. Murai, one of the co-authors. The intercontinental lecture was bidirectional: the Japanese students' responses and questions were brought back via the same system. The lecture was held at Keio University Shonan Fujisawa Campus (SFC), Japan, and used half the full frame rate. No packet drops occurred at half rate during the lecture. We also changed the rate during the lecture to demonstrate and explain frame discarding.

Figure 7. Network Topology of DV Communication at SC98

The network bandwidth used on TransPAC for this lecture is shown in Fig. 8; the graph was created by MRTG (Multi Router Traffic Grapher)[9]. The grey area shows the five-minute exponentially decaying moving average of input bits per second, and the solid line the corresponding average of output bits per second, at the Tokyo exchange point of the USA-to-Japan link.

Figure 8. MRTG graph

In November 1999, we multicast a real-time DV/RTP stream from Kurashiki University of Science and the Arts to 10 organizations widely distributed across Japan. The network topology is shown in Fig. 9. The backbone network included JGN (Japan Gigabit Network), the TTNet (Tokyo Telecommunication Network Co., Inc. Laboratory) experimental network, and the CRL (Communication Research Laboratory) experimental network. OKIX (Okayama Internet Exchange, http://www.okix.or.jp) also provided special technical support for the demonstration. The WIDE project workshop held at Kurashiki University of Science and the Arts was an interactive, multimedia, distributed remote conference. The workshop was multicast to the following sites.

The demonstration system used IPv6 and the PIM-SM (Protocol Independent Multicast - Sparse Mode) routing protocol, both core technologies of the next-generation Internet under discussion at the IETF (Internet Engineering Task Force).

Figure 9. JB Network

4.5 Interoperability with Other Implementations

The Comet router, developed by the Comet project at Fujitsu Laboratory, also provides DV/RTP functionality. The Comet box is a prototype system for the next-generation Internet; it has an IEEE1394 interface and offers DV/RTP forwarding. We have verified interoperability between the Comet box and our system. Other DV over RTP development efforts are also ongoing, and these efforts are expected to accelerate once the DV over RTP standard is finalized.

5. Conclusion and Future Work

In this paper, we focused on the Digital Video format as a high-bandwidth, high-quality video and audio medium. We implemented DVTS for transmitting DV streams over the Internet, using IPv4 and IPv6 as the network layer protocols. For interoperability with other implementations, the DV/RTP format is being discussed at the IETF. DVTS can decrease the bandwidth usage of a DV/RTP stream by discarding DV picture frames, which greatly reduces bandwidth while still providing video and audio of good quality for communication. DVTS has been demonstrated in a variety of network configurations. Our current sender application uses a static frame rate chosen by the operator on the sender side. However, the effective bandwidth of an Internet path may change from moment to moment. In DV/RTP, RTCP can be used to feed back the network situation; automatic adaptation to the effective bandwidth using RTCP is an open issue that will be addressed in future work.
We also need to test interoperability with other DV/RTP systems. This paper describes only the implementation for the 525-60 consumer DV system. Another DV/RTP system has been implemented for the 625-50 system, and a 625-50 version of our system is under development to communicate with it.
We would also like to extend the system beyond IEEE1394 to SDTI, and to implement it for professional DV and DV HD. When DV products without IEEE1394 interfaces are used, a mechanism to display the DV image and play the audio is required. We would also like to build a system that does not require an IEEE1394 interface.


[1] "Specifications Consumer-Use Digital VCR's using 6.3mm magnetic tape", HD Digital VCR Conference, 1994 society, 1995

[2] "IEEE Standard for a High Performance Serial Bus", IEEE Computer Society, 1995

[3] K. Kobayashi, A. Ogawa, S. Casner, C. Bormann, "RTP Payload Format for DV Video", Internet Draft, 2000

[4] K. Kobayashi, A. Ogawa, S. Casner, C. Bormann, "RTP Payload Format for 12-, 20- and 24-bit DV Audio", Internet Draft, 2000

[5] A. Ogawa, K. Kobayashi, K. Sugiura, O. Nakamura, J. Murai, "Design and Implementation of DV Stream over Internet", IWS99, 1999

[6] A. Ogawa, "DVTS (Digital Video Transport System)", http://www.sfc.wide.ad.jp/DVTS/, as of 1999

[7] K. Kobayashi, "Design and Implementation of Firewire device driver on FreeBSD", Proc. FREENIX, USENIX 1999, pp. 41-51, 1999

[8] "The KAME Project", http://www.kame.net/, as of 1999

[9] T. Oetiker, "Multi Router Traffic Grapher", http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html, as of 1999

[10] W. B. Pennebaker, J. L. Mitchell, "JPEG Still Image Data Compression Standard", Van Nostrand Reinhold, 1993

[11] ISO/IEC JTC1/SC29/WG11, "Short-MPEG1 description, Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s", ISO, 1996

[12] "School of Internet", http://www.sfc.wide.ad.jp/soi/, 1999

[13] S. Jacobs, A. Eleftheriadis, "Providing video services over networks without quality of service guarantees", RTMW'96, Sophia Antipolis, France, 1996

[14] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, 1996

[15] S. Floyd, K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, 1998

[16] J. Mahdavi, S. Floyd, "TCP-friendly unicast rate-based flow control", http://ftp.ee.lbl.gov/floyd/papers.html, as of 1997

[17] D. Sisalem, H. Schulzrinne, "The Loss-Delay Based Adjustment Algorithm: A TCP-Friendly Adaptation Scheme", Network and Operating System Support for Digital Audio and Video (NOSSDAV), 1998

[18] K. Cho, "A Framework for Alternate Queueing: Towards Traffic Management by PC-UNIX Based Routers", Proceedings of USENIX, 1999