"Packet Video 2000", May, 2000
Design and Implementation of DV based video over RTP
Akimichi Ogawa, Katsushi Kobayashi*, Kazunori Sugiura, Osamu Nakamura,
Jun Murai
Keio University, Japan,
Communication Research Laboratory, Japan*
Abstract
This paper discusses how to send high-quality, high-bandwidth video and
audio streams using the Internet as the transport medium. We focus on the
Digital Video (DV) format for video and audio media. DV is a popular
consumer video format that uses the IEEE1394 interface for exchanging
digital video streams. Video compression in the DV format is similar to
Motion JPEG: it uses DCT (Discrete Cosine Transform) and VLC (Variable
Length Coding), and omits the inter-frame compression used in MPEG. We
implemented an Internet video transmission system using DV, called DVTS
(Digital Video Transport System). Both IPv4 and IPv6 are supported at the
network layer. The Real-time Transport Protocol (RTP) is used to ensure
interoperability. The DV/RTP payload format is under discussion in the
IETF (Internet Engineering Task Force) AVT (Audio/Video Transport) working
group. Operation has been demonstrated in several network configurations,
using both the commodity Internet and private networks. DV/RTP generates
a maximum of 33Mbps of traffic. Reducing the traffic by lowering the video
frame rate allows a DV/RTP stream to be adapted to a variety of network
bandwidths. Reducing the video frame rate to 1/10 allows a DV/RTP stream
to be sent over a 10Base-T Ethernet connection.
1. Introduction
With the massive growth of IP-based networks, campus LAN infrastructure
with 100Mbps-class access links and a 1Gbps-class backbone has become
common. Such broadband infrastructure enables congestion-free networks.
High-performance switching technology can already carry 100Mbps-class IP
multicast streams without critical packet loss. On such LANs, the
available bandwidth is sufficient for sending high-quality digitized video
and audio streams. We have focused on DV (Digital Video)[1], a
packet-based video/audio format, for sending video and audio media over
networks. Each DV format specifies a standard digital interface for
exchanging the digital stream, i.e. IEEE1394[2] for consumer equipment and
SDTI for professional equipment. Both interfaces impose strong
restrictions on cable length and network configuration, and expensive
special equipment is required to remove those restrictions. A single
consumer SD format DV stream consumes 25Mbps of bandwidth on the IEEE1394
bus. It is not difficult to ensure end-to-end throughput for a DV stream
in a LAN environment, over both unicast and multicast connections. Such
LAN infrastructure enables remote video editing systems across broadcast
stations and campus-wide video distribution systems using DV.
If the global Internet infrastructure provides enough bandwidth to send
DV-class streams, the style of broadcasting and communication will change,
e.g. live on-the-spot coverage in TV news programs and distance learning
systems. On the global IP network, backbone bandwidth is increasing
massively compared to the LAN infrastructure. WDM (Wavelength Division
Multiplexing) and high-performance switching technology are used for
broadband backbone connections. Moreover, QoS support on the Internet is
being developed using the Intserv (Integrated Services) and Diffserv
(Differentiated Services) approaches. Network implementations with high
bandwidth and QoS support have enough performance to transmit a
100Mbps-class stream for each connection. Tests of broadband network
interconnection have begun on advanced network test beds focusing on the
next generation Internet (NGI), e.g. Internet2 and APAN (Asia Pacific
Advanced Network). Such NGI technologies will become available to ordinary
Internet users who today connect to the Internet by dial-up.
In this paper, we present a real-time, NTSC-quality communication tool. We
previously presented a preliminary implementation of IP-based DV
communication and demonstrated its effectiveness at SC98 (Supercomputer
Conference 98). Here we introduce the DV real-time streaming tool we
developed, which is based on RTP (Real-time Transport Protocol) and
supports both IPv4 and IPv6. Our aim is NTSC-quality video communication
using consumer products. We also describe the rate control function
implemented in our DV tool, which works by discarding picture frames.
Finally, we discuss our experience and application trials of the DV tool
on the live Internet, over both unicast and multicast.
2. Digital Video Encoding Format
The DV format is designed for magnetic tape media and is optimized for
recording digital video and audio data with helical scan magnetic
recording systems. Several variants are defined for various purposes in
both consumer and professional use[1]. DV is the most popular format for
both consumer and professional use because of its small tape size (6.3 mm,
120 min/cassette), fully digital recording, reasonable cost compared with
8mm camcorders, and the ease of building a non-linear editing system by
connecting to a PC.
The DV format is organized around the video frame. This framing takes care
of synchronization issues such as lip synchronization: all data, including
video, audio and system data, is managed in units of one video picture
frame. The DV digital data stream has a three-level hierarchical
structure. The data of a single video frame is divided into several DIF
(Digital Interface Format) sequences. A DIF sequence is composed of 150
DIF blocks of 80 bytes each. The DIF block is the primitive unit of every
DV stream and is common to the whole family of DV specifications. Each
80-byte DIF block contains a 3-byte ID header specifying the type of the
DIF block and its position in the DIF sequence. Five types of DIF blocks
are defined in the ID header: DIF sequence header, Subcode, Video
Auxiliary information (VAUX), Audio data and Video data. In this paper, we
refer to the DIF sequence header, Subcode and VAUX blocks as system data.
Audio, video and system data can be separated at the DIF block level. An
audio DIF block consists of audio auxiliary information and audio data.
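As a rough check on the sizes given above, the following Python sketch
computes the size of one video frame and the resulting DIF payload rate.
It assumes the 525-60 consumer SD format, with 10 DIF sequences per frame
and a frame rate of 30000/1001 (about 29.97) frames per second. The count
of 10 DIF sequences is not stated in this section and is an assumption of
the sketch, as are all function and constant names.

```python
from fractions import Fraction

DIF_BLOCK_BYTES = 80                # from the text: every DIF block is 80 bytes
BLOCKS_PER_SEQUENCE = 150           # from the text: 150 DIF blocks per DIF sequence
SEQUENCES_PER_FRAME = 10            # assumption: 525-60 consumer SD DV
FRAME_RATE = Fraction(30000, 1001)  # assumption: NTSC rate, about 29.97 fps


def frame_bytes(sequences=SEQUENCES_PER_FRAME):
    """Size in bytes of one DV frame when carried as DIF blocks."""
    return sequences * BLOCKS_PER_SEQUENCE * DIF_BLOCK_BYTES


def payload_mbps(sequences=SEQUENCES_PER_FRAME, frame_rate=FRAME_RATE):
    """DIF payload rate in Mbps, excluding any IEEE1394 or IP overhead."""
    return float(frame_bytes(sequences) * 8 * frame_rate) / 1e6


if __name__ == "__main__":
    print(frame_bytes())             # 120000
    print(round(payload_mbps(), 2))  # 28.77
```

Under these assumptions one frame is 120000 bytes and the DIF data alone
amounts to about 28.8 Mbps, before any packet headers are added.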
For video compression, the DV format uses only intra-frame DCT (Discrete
Cosine Transform) and VLC (Variable Length Coding) at a fixed compression
ratio. Unlike MPEG1 and MPEG2, the DV format does not use inter-frame
compression. A video picture frame is divided into DCT super blocks shaped
as rectangles or clipped rectangles. A DIF sequence of the DV stream
corresponds to an integral number of DCT super blocks. Each DCT super
block is divided into 27 rectangular or square DCT macro blocks, and each
DCT macro block is in turn divided into 4 or 6 DCT blocks.
The audio part is encoded as sampled data in units of video frames. The
sampling frequency is 32 kHz, 44.1 kHz or 48 kHz, and the quantization is
12, 16 or 20 bits.
3. RTP Payload for DV format Video
RTP is designed for real-time stream transport over the Internet and
provides functions for real-time packet-based communication. The RTP
protocol itself is independent of the encoding formats carried above it.
However, RTP follows the concept of ALF (Application Layer Framing), so it
relies on a separate specification of the encapsulation format and
protocol behavior for each encoding format, such as H.261, M-JPEG and
MPEG. We use RTP as the underlying protocol in our DV system.
Standardization of the DV/RTP encapsulation format has been proposed in
the IETF and is under discussion[3][4].
All DV stream data consists of 80-byte DIF blocks, each including a 3-byte
ID header. The DV over RTP encoding uses only the RTP fixed header and
does not use an RTP extension header. Any integral number of DIF blocks
may be packed into one RTP packet, directly concatenated after the RTP
fixed header (Fig. 1), except that all DIF blocks in one RTP packet must
be from the same video frame. DIF blocks from the next video frame are
not packed into the same RTP packet even if payload space remains. The
transition from one video frame to the next is indicated by a change in
the RTP timestamp. Thus, DV/RTP does not rely on any particular packet to
signal a video frame transition.
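The packing rule can be illustrated with a minimal Python sketch. It is
not the DVTS code: the function names, the dynamic payload type 96, the
marker bit left at 0 and the choice of 17 DIF blocks per packet (which
keeps the packet under a 1500-byte Ethernet MTU) are all assumptions made
for illustration. Only the 12-byte RTP fixed header, the 80-byte DIF block
and the one-frame-per-packet rule come from the text.

```python
import struct

DIF_BLOCK_BYTES = 80
RTP_VERSION = 2


def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Build the 12-byte RTP fixed header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def packetize_frame(dif_blocks, timestamp, first_seq, ssrc,
                    blocks_per_packet=17):
    """Split the DIF blocks of ONE video frame into RTP packets.

    Every packet of the frame carries the same timestamp, so a receiver
    sees the frame boundary as a change of timestamp. Blocks of different
    frames are never mixed because each call handles a single frame.
    """
    packets = []
    seq = first_seq
    for i in range(0, len(dif_blocks), blocks_per_packet):
        chunk = dif_blocks[i:i + blocks_per_packet]
        if any(len(b) != DIF_BLOCK_BYTES for b in chunk):
            raise ValueError("every DIF block must be exactly 80 bytes")
        packets.append(rtp_header(seq, timestamp, ssrc) + b"".join(chunk))
        seq += 1
    return packets


if __name__ == "__main__":
    frame = [bytes(DIF_BLOCK_BYTES)] * 1500   # placeholder blocks of one frame
    pkts = packetize_frame(frame, timestamp=3003, first_seq=0, ssrc=0x1234)
    print(len(pkts), len(pkts[0]), len(pkts[-1]))
```

The last packet of a frame is simply shorter than the others; the sketch
never pads it with blocks from the following frame.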
Figure 1. RTP Packet Format
The proposal defines two ways of carrying a DV stream: audio and video
data are transmitted either in a single bundled stream or in separate
streams. There are two strategies for sending unbundled DV/RTP streams:
1) send a DV/RTP stream without video data as the audio stream; 2) convert
the DV audio data to a common PCM audio format and send the converted
audio stream. The proposal recommends that, when DV video and audio are
sent in different RTP streams, the audio be sent in a common PCM audio
format. With method 1), video and audio use the same granularity for the
RTP timestamp, so lip synchronization can be obtained from the RTP
timestamp. With method 2), the RTP timestamp granularity differs between
video and audio. The RTP clock can be related to absolute time using RTCP
(RTP Control Protocol), but perfect lip synchronization is not obtained.
To obtain perfect lip synchronization, a bundled DV video and audio
stream is required.
4. Implementation of DV over RTP on FreeBSD
We implemented an IP-based DV transmission system called the DV Transport
System (DVTS) using DV/RTP[5][6]. An overview of DVTS is shown in Fig. 2.
The system consists of Pentium-based PCs running FreeBSD, an IEEE1394
device driver and interface[7], and DV/RTP sender and receiver
applications. Both the sender and the receiver PC have an IEEE1394
interface on the PCI bus. The camcorder connected to the DV/RTP sender
(shown on the left side of Fig. 2) produces a stream of
IEEE1394-encapsulated DV packets. The sender application receives the DV
stream through the PCI IEEE1394 interface card, re-encapsulates the DV
data from the IEEE1394 packets into RTP, and transmits it to the IP
network. The receiver application reconstructs the DV data received over
RTP, attaches an IEEE1394 header to each reconstructed DV packet, and
transfers it to the DV recorder deck through the PCI IEEE1394 interface
card (shown on the right side of Fig. 2). The DV recorder deck displays
the DV data on the connected display. An advantage of our system is that
it can be built entirely from widely available equipment: standard
PCI-based PC compatibles and consumer DV camcorders and DV VCRs with
IEEE1394 interfaces.
Figure 2. System Architecture
4.1 IEEE1394 device driver for FreeBSD
In order to use consumer DV devices equipped with an IEEE1394 interface,
we designed and implemented an IEEE1394 device driver on FreeBSD 3.3[7].
The IEEE1394 high-speed serial bus is a packet-based, shared-media
computer bus. Its bandwidth is specified from 100Mbps to 3.2Gbps. The goal
of IEEE1394 is to integrate various interface and cable specifications
into a single bus (cable) system: storage device interfaces such as SCSI
and IDE, parallel and serial peripheral interfaces, networks such as
Ethernet, processor interconnects such as VME, and the RCA cables of audio
and visual equipment. Devices of different speeds can be connected within
a single IEEE1394 physical network, which allows devices to be built at
an appropriate cost.
IEEE1394 provides three transmission modes: 1) isochronous stream mode
for QoS, which provides strictly bounded packet jitter and guaranteed
bandwidth but no reliable delivery; 2) asynchronous stream mode for best
effort delivery without reliability; and 3) asynchronous request mode for
reliable communication.
Figure 3. Isochronous packet timing on IEEE1394
Data timing in IEEE1394 is shown in Fig. 3. Every packet transmission
takes place within an 8 kHz time slice, which is the fairness unit of the
IEEE1394 system. Each time slice is further divided into 6144 time slots
for bandwidth management. For isochronous stream transmission, the sender
first reserves the number of time slots it requires, and then sends one
packet no larger than the reserved time slots in every 8 kHz time slice.
The packet jitter of the isochronous stream mode is therefore kept on the
order of one time slice (125 microseconds), which should be sufficient for
any jitter-sensitive high-quality packet video system. It is not easy for
legacy packet-based shared network systems to satisfy such conditions.
However, every IEEE1394 LSI chip already supports the isochronous stream
mode in hardware, and the cost of a chip is less than $20. Consumer DV
adopts IEEE1394 as its digital interface standard, even though the
isochronous stream mode does not ensure reliable communication.
When a DV stream is sent on IEEE1394, the 80-byte DIF blocks are
aggregated to an appropriate size, e.g. an IEEE1394 packet of a consumer
SD DV stream carries 6 DIF blocks. An 8-byte common isochronous packet
(CIP) header is prepended to the aggregated DV packet. The CIP
4.2 Network utilization and frame discard
A full DV stream consumes over 30Mbps for standard NTSC-quality video with
525 lines and 29.97 picture frames per second. When bandwidth utilization
on commodity networks rises, the bandwidth needed for a full-rate DV
stream becomes unavailable. If the infrastructure itself has less
bandwidth available, DVTS needs to adjust its bandwidth usage.
In many cases, full-rate transmission is not required and video at a
reduced frame rate is acceptable. Audio data, in contrast, does not use as
much bandwidth as picture data but requires stable and continuous
transmission. Therefore, discarding picture frames while preserving audio
frames reduces the bandwidth of a DV stream effectively without critical
communication failures. We implemented bandwidth reduction of the DV
stream by discarding picture frames. The quality of the DV/RTP image at
half the picture frame rate (15 frames per second in 525-60) is close to
that of common animation (12 frames per second) and is acceptable for
communication. This reduction does not add any cost to the system. We did
not implement additional, more complicated compression techniques, which
would increase the cost of the whole system.
In full-rate transmission, when enough bandwidth for the DV stream is
available, the sender application simply forwards every DV DIF block to
the receiver PC via IP. When enough bandwidth is not available, the sender
application reduces its output rate by discarding part of the DV stream.
In our implementation, the sender extracts the audio data from each
discarded frame and still sends that audio data to the receiver via IP.
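The sender-side selection can be sketched in Python as follows. This is
an illustration under stated assumptions, not the DVTS source: the
function names and the `keep_every` parameter are hypothetical, and the
way the audio blocks are recognized (section type in the top three bits of
the first ID byte, with the value 3 meaning audio) is an assumption about
the DIF ID header layout that should be checked against the DV
specification[1].

```python
SCT_AUDIO = 3   # assumption: section-type code of audio DIF blocks


def section_type(dif_block):
    """Section type taken from the top three bits of the first ID byte
    (assumed layout of the 3-byte DIF ID header)."""
    return dif_block[0] >> 5


def blocks_to_send(frame_blocks, frame_index, keep_every):
    """Return the DIF blocks of one frame that the sender transmits.

    keep_every=1 forwards every frame in full (full rate). keep_every=N
    forwards one full frame out of N; for the other frames only the audio
    DIF blocks are kept, so that audio stays continuous.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    if frame_index % keep_every == 0:
        return list(frame_blocks)
    return [b for b in frame_blocks if section_type(b) == SCT_AUDIO]
```

With `keep_every=2` the sketch produces the half-rate case described
above: every second frame is reduced to its audio blocks.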
DV/IEEE1394 packets must be sent to the IEEE1394 interface continuously.
To achieve this, the receiver application of DVTS consists of two
processes, one of which continuously outputs the DV/IEEE1394 data. There
are two error concealment strategies for packet loss: 1) if a packet loss
is detected, display the previous complete frame; 2) if a packet loss is
detected, use the corresponding data from the previous frame. DVTS uses
the latter strategy. Since all data in the DV format consists of 80-byte
DIF blocks, it is very easy to find the 8x8 DCT data at a particular
position. When a packet is dropped, the receiver uses the corresponding
DIF blocks from the previous frame in place of the DIF blocks the dropped
packet contained. In the DV format, the DCT blocks are distributed, so a
small amount of packet loss does not lead to a critical loss of video
quality. When the sender application discards a frame, the receiver
application simply sends the previous frame to the IEEE1394 interface. If
the received packets contain only audio data, the receiver application
displays the previous picture frame together with the incoming audio. The
frame rate of DVTS and the bandwidth consumed are shown in Fig. 4.
Figure 4. Frame Discarding
RTP does not guarantee that a packet reaches the destination host, so
tolerance to packet loss and jitter is required. RTP is also unaware of
congestion along the path, so a mechanism to reduce the data rate of the
DV/RTP stream is required as well. Our receiver application buffers frames
to absorb jitter. Frame buffering in the receiver application is shown in
Fig. 5.
Figure 5. Frame Buffering
The receiver application consists of two processes that use shared memory
for the frame buffers. A large number of frame buffers suppresses the
effect of jitter, but at the price of a large playout delay. The number of
frame buffers is set when the receiver application is started, taking into
account the network conditions and the requirements of the application.
1) One process is the receiver process, which decapsulates each RTP packet
into DV data and writes it to the shared frame buffer. The receiver
process updates the shared frame buffer simply by overwriting it with
newly received data. It ignores discontinuities in the DV stream, whether
caused by unexpected packet loss or by a sender that does not send packets
continuously. A field in the shared frame buffer for which no new DV data
arrives is not updated, so the data of the previous frame remains in that
field. When the receiver process finishes writing a frame, it notifies
the display process using a flag in the shared frame buffer.
2) The other process is the display process, which sends the contents of
the shared frame buffer to the IEEE1394 interface. The display process
examines the flag of each frame buffer in shared memory and sends the
buffer when the flag is set. The previous frame buffer is used when the
next frame buffer is not ready.
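The interaction of the two processes can be modelled in a few lines of
Python. The sketch below is a single-process model with hypothetical
names, not the shared-memory implementation of DVTS. It also makes one
assumption that the text does not state: when the receiver moves on to
the next buffer, that buffer is first seeded with a copy of the frame
just completed, so that blocks which never arrive fall back to the
immediately preceding frame. Handling of the case where the receiver
overtakes the display side is omitted.

```python
class FrameBufferRing:
    """Single-process model of the shared frame buffers and their flags."""

    def __init__(self, num_buffers, blocks_per_frame):
        self.num_buffers = num_buffers
        self.frames = [[bytes(80)] * blocks_per_frame
                       for _ in range(num_buffers)]
        self.ready = [False] * num_buffers   # one flag per frame buffer
        self.write_idx = 0                   # buffer the receiver is filling
        self.read_idx = 0                    # buffer the display side waits on
        self.last_sent = list(self.frames[0])

    # --- receiver process side -------------------------------------
    def write_block(self, position, dif_block):
        """Overwrite one DIF block position; other positions keep old data."""
        self.frames[self.write_idx][position] = dif_block

    def finish_frame(self):
        """Set the flag of the finished buffer and move to the next one."""
        done = self.write_idx
        self.ready[done] = True
        nxt = (done + 1) % self.num_buffers
        # Assumption: seed the next buffer with the frame just completed.
        self.frames[nxt] = list(self.frames[done])
        self.ready[nxt] = False
        self.write_idx = nxt

    # --- display process side --------------------------------------
    def next_frame_to_display(self):
        """Return the next ready frame, or repeat the previous one."""
        if self.ready[self.read_idx]:
            self.last_sent = list(self.frames[self.read_idx])
            self.ready[self.read_idx] = False
            self.read_idx = (self.read_idx + 1) % self.num_buffers
        return self.last_sent
```

If a frame arrives with one block missing, the frame handed to the
display side contains the new blocks plus the old block at the missing
position; if no new frame is ready, the last frame is returned again.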
4.3 IPv6 Support and multicast
For compatibility with the next generation Internet, DVTS supports both
IPv4 and IPv6. For IPv6 support, the KAME[8] implementation for FreeBSD 3.3
is used. The modifications needed to support IPv6 were minimal.
We measured the bandwidth consumed by DVTS traffic over IPv4 and IPv6.
The network topology for the measurement is shown in Fig. 6. A single DV
sender sends a DV stream to a DV receiver through a PC router, and the
measurement was performed on the PC router. There was no other traffic on
the network during the measurement. The measured bandwidth at each frame
rate for IPv4 and IPv6 is shown in Table 1. Since the only difference
between IPv4 and IPv6 was the IP header, there was no significant
difference between the two.
Figure 6.
Table 1. Traffic of DV Stream on IPv4 and IPv6
| frame rate | bandwidth v4 (Mbps) | bandwidth v6 (Mbps) |
| 1/1        | 30.47               | 31.70               |
| 1/2        | 15.72               | 16.83               |
| 1/3        | 11.48               | 11.84               |
| 1/4        | 9.01                | 9.33                |
| 1/5        | 7.54                | 7.83                |
| 1/10       | 4.74                | 4.87                |
| 1/20       | 3.26                | 3.39                |
| 1/30       | 2.79                | 2.90                |
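The effect of the IP header on the consumed bandwidth can be estimated
with a short Python sketch. It uses the standard header sizes (IPv4 20
bytes without options, IPv6 40 bytes without extension headers, UDP 8
bytes, RTP fixed header 12 bytes) and assumes a 525-60 stream of 1500 DIF
blocks per frame at 30000/1001 frames per second. The number of DIF
blocks per RTP packet is a free parameter: the paper does not state the
value used by DVTS, and the value 6 in the example is hypothetical.

```python
import math
from fractions import Fraction

IPV4_HEADER = 20      # bytes, without options
IPV6_HEADER = 40      # bytes, without extension headers
UDP_HEADER = 8
RTP_HEADER = 12       # RTP fixed header
DIF_BLOCK_BYTES = 80


def ip_rate_mbps(ip_header, blocks_per_packet, blocks_per_frame=1500,
                 frame_rate=Fraction(30000, 1001)):
    """Estimated IP-layer rate in Mbps of a full-rate DV/RTP stream.

    blocks_per_frame and frame_rate default to assumed 525-60 values.
    """
    packets_per_frame = math.ceil(blocks_per_frame / blocks_per_packet)
    header_bytes = packets_per_frame * (ip_header + UDP_HEADER + RTP_HEADER)
    payload_bytes = blocks_per_frame * DIF_BLOCK_BYTES
    return float((header_bytes + payload_bytes) * 8 * frame_rate) / 1e6


if __name__ == "__main__":
    v4 = ip_rate_mbps(IPV4_HEADER, blocks_per_packet=6)
    v6 = ip_rate_mbps(IPV6_HEADER, blocks_per_packet=6)
    print(round(v6 - v4, 2))   # 1.2
```

With 6 DIF blocks per packet the estimated difference is about 1.2 Mbps,
the same order as the 1.23 Mbps difference in the 1/1 row of Table 1. The
absolute rates the sketch produces need not match the measured ones,
since neither the packet size nor the layer at which the bytes were
counted is given.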
4.4 DV/RTP application on global Internet
We have demonstrated long-distance communication and conferencing using
DV/RTP.
We demonstrated DV communication between the USA and Japan over the APAN
trans-Pacific link in November 1998 (JST) to show its effectiveness over
the worldwide Internet (Fig. 7). The UDP-encapsulated packets sent from
the USA to Japan were also forwarded to Korea.
The demonstration was a 90-minute lecture given from the USA by J. Murai,
one of the co-authors. The intercontinental lecture was bi-directional:
the responses and questions of the Japanese students were carried over the
same system. The lecture was held at the Keio University Shonan Fujisawa
Campus (SFC), Japan. The lecture was sent at half frame rate, and there
were no packet drops at half rate during the lecture. We changed the rate
during the lecture to show and explain frame discarding.
Figure 7. Network Topology of DV Communication at SC98
The network bandwidth used on TransPAC for this lecture is shown in
Fig. 8. The graph was created by MRTG (Multi Router Traffic Grapher)[9].
The grey area is a five-minute exponentially decaying moving average of
input bits per second at the USA to Japan Exchange Point. The solid line
is a five-minute exponentially decaying moving average of output bits per
second at the USA to Japan Tokyo Exchange Point.
Figure 8. MRTG graph
In November 1999, we multicast a real-time DV/RTP stream from Kurashiki
University of Science and the Arts to 10 organizations widely distributed
across Japan. The network topology is shown in Fig. 9. The backbone
network included the JGN (Japan Gigabit Network), the TTNet (Tokyo
Telecommunication Network Co., Inc. Laboratory) experimental network, and
the CRL (Communication Research Laboratory) experimental network. OKIX
(Okayama Internet Exchange, http://www.okix.or.jp) also provided special
technical support for the demonstration. The WIDE project workshop held at
Kurashiki University of Science and the Arts was run as an interactive,
distributed multimedia remote conference. The workshop was multicast to
the following sites.
- Kyushu University (http://www.kyushu-u.ac.jp),
- Kyushu Institute of Technology (http://www.kyutech.ac.jp),
- Hiroshima University (http://www.hiroshima-u.ac.jp),
- Osaka University (http://www.osaka-u.ac.jp),
- NAIST (Nara Institute of Science and Technology, http://nara.aist-nara.ac.jp),
- JAIST (Japan Advanced Institute of Science and Technology, http://www.jaist.ac.jp),
- Kyoto University (http://www.kyoto-u.ac.jp),
- The University of Tokyo (http://www.u-tokyo.ac.jp),
- Keio University (http://www.keio.ac.jp),
- KAME Project (http://www.kame.net),
- TTNet (http://www.ttnet.co.jp).
The demonstration system used IPv6 and the PIM-SM (Protocol Independent
Multicast - Sparse Mode) routing protocol, which are core technologies of
the next generation Internet discussed at the IETF (Internet Engineering
Task Force).
Figure 9. JB Network
4.5 Interoperability with Other Implementations
DV/RTP is also provided by the Comet router made by the Comet project
at Fujitsu Laboratory. The Comet box is a prototype system for the next
generation Internet; it has an IEEE1394 interface and offers DV/RTP
forwarding. We have verified interoperability between Comet and our
system. Other DV over RTP development efforts are also ongoing, and this
activity is expected to accelerate once the DV over RTP standard is
fixed.
5. Conclusion and Future Work
In this paper, we focused on the Digital Video format as a high-bandwidth,
high-quality video and audio medium. We implemented DVTS, a system for
transmitting DV streams over the Internet. Both IPv4 and IPv6 are
supported as network layer protocols. For interoperability with other
implementations, the DV/RTP format is being discussed at the IETF. DVTS
can decrease the bandwidth usage of a DV/RTP stream by discarding DV
picture frames. Discarding picture frames reduces bandwidth usage by a
large amount while still providing video and audio of good quality for
communication. DVTS has been demonstrated in a variety of network
configurations. Our current sender application uses a static frame rate
chosen by the operator on the sender side. On the Internet, however, the
effective network bandwidth is likely to change from moment to moment. In
DV/RTP, RTCP can be used for feedback about the network conditions.
Automatic adaptation to the effective bandwidth using RTCP is an open
issue and will be addressed in future work.
We also need to carry out interoperability tests with other DV/RTP
systems. In this paper, we described only the implementation for the
consumer DV system in 525-60. There is a DV/RTP system implemented for
625-50; to communicate with that implementation, support for 625-50 in
our DV/RTP system is under development.
We would also like to extend the system to SDTI in addition to IEEE1394,
and to implement support for professional DV and DV HD. When DV products
without IEEE1394 interfaces are used, a mechanism to display DV images and
play audio is required. We would also like to create a system that does
not require an IEEE1394 interface.
Bibliography
[1] "Specifications of Consumer-Use Digital VCRs using 6.3mm magnetic tape", HD Digital VCR Conference, 1994
[2] "IEEE Standard for a High Performance Serial Bus", IEEE computer society, 1995
[3] K.Kobayashi A.Ogawa S.Casner C.Bormann, "RTP Payload Format for DV Video", Internet Draft, 2000
[4] K.Kobayashi A.Ogawa S.Casner C.Bormann, "RTP Payload Format for 12-, 20- and 24-bit DV Audio", Internet Draft, 2000
[5] A.Ogawa K.Kobayashi K.Sugiura O.Nakamura J.Murai, "Design and Implementation of DV Stream over Internet", IWS99, 1999
[6] A.Ogawa, "DVTS(Digital Video Transport System)", http://www.sfc.wide.ad.jp/DVTS/, as of 1999
[7] K.Kobayashi, "Design and Implementation of Firewire device driver on FreeBSD", pp 41-51 Proc. FREENIX, USENIX 1999, 1999
[8] "The KAME Project", http://www.kame.net/, as of 1999
[9] T.Oetiker, "Multi Router Traffic Grapher", http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html, as of 1999
[10] W.B.Pennebaker J.L.Mitchell, "JPEG Still Image Data Compression Standard", Van Nostrand Reinhold, 1993
[11] ISO/IEC JTC1/SC29/WG11, "Short-MPEG1 description, Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s", ISO, 1996
[12] "School of Internet", http://www.sfc.wide.ad.jp/soi/, 1999
[13] S.Jacobs A.Eleftheriadis, "Providing video services over networks without quality of service guarantees", RTMW'96, Sophia Antipolis, France, 1996
[14] H.Schulzrinne S.Casner R.Frederick V.Jacobson,"RTP: A Transport Protocol for Real-Time Applications",RFC1889, 1996
[15] S.Floyd K.Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, 1998
[16] J.Mahdavi S.Floyd, "TCP-friendly unicast rate-based flow control", http://ftp.ee.lbl.gov/floyd/papers.html, as of 1997
[17] D.Sisalem H.Schulzrinne, "The Loss-Delay Based Adjustment Algorithm: A TCP-Friendly Adaptation Scheme", Network and Operating System Support for Digital Audio and Video (NOSSDAV), 1998
[18] K.Cho, "A Framework for Alternate Queueing: Towards Traffic Management by PC-UNIX Based Routers." In Proceedings of USENIX, 1999