Keio University, Graduate School of Media and Governance
MAUI Project
Ph.D. Dissertation

TITLE Low-latency Distributed Architecture using IP (Internet Protocol)

The migration of services to the cloud, the automatic updating of screens through Ajax and WebSockets, and the demand for faster applications all require higher network throughput and lower-latency communication in the data center. For example, distributed memory cache servers such as memcached, which is used by Facebook and Twitter, and storage services such as iSCSI need low-latency communication.

IP network technology, which is used in data centers, has proven long-lived. However, compared with InfiniBand or special-purpose networks, an IP network has higher communication delay, which degrades the performance of services that depend on low-latency communication. Even when RDMA is used for low-latency communication, the DMA read mechanism adds overhead.

In this research, we propose IP-NUMA, which reduces communication delay compared with RDMA by removing the DMA read. IP-NUMA is a technique that takes the place of Berkeley sockets and reduces communication delay on PCs by sharing parts of memory among nodes through a NUMA architecture and by using a flexible IP protocol as the interconnect. A ping-pong program that writes to memory using IP-NUMA (implemented on a PC plus FPGA with Linux) reduced communication delay by 90% compared with Berkeley sockets. Compared with RDMA, one-way communication delay was reduced by 11% or more (1.081 μs).
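To make the Berkeley sockets baseline concrete, the following is a minimal sketch of a ping-pong delay measurement over UDP on the loopback interface. It is not the dissertation's benchmark and does not use IP-NUMA; the function names, the 8-byte payload, and the number of rounds are assumptions chosen for illustration. Halving the round-trip time gives only a rough one-way estimate.

```python
import socket
import threading
import time


def echo_server(sock: socket.socket, rounds: int) -> None:
    """Echo each datagram back to its sender (the 'pong' side)."""
    for _ in range(rounds):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def ping_pong_rtt(rounds: int = 1000) -> float:
    """Return the mean round-trip time in microseconds over loopback UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(5.0)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)

    worker = threading.Thread(target=echo_server, args=(server, rounds))
    worker.start()

    payload = b"x" * 8  # hypothetical payload size
    target = server.getsockname()
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendto(payload, target)  # ping
        client.recvfrom(64)             # wait for the pong
    elapsed = time.perf_counter() - start

    worker.join()
    client.close()
    server.close()
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    rtt = ping_pong_rtt()
    print(f"mean RTT: {rtt:.1f} us, one-way estimate: {rtt / 2:.1f} us")
```

Every round in this sketch crosses the kernel socket layer twice on each side, which is the kind of software overhead a shared-memory approach aims to avoid.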

We also clarified the processing steps and the minimum forwarding delay of IP packet forwarding in an L3 switch. To reduce the forwarding delay, FIBNIC, an IP packet forwarding method that does not use packet buffers, is pipelined in the minimum number of stages. When we implemented this method on a commercial FPGA board, we confirmed that the forwarding delay remains constant even with 410,000 IPv4 routes. The measured latency is 20 to 25% of that of a commercial Gigabit L3 switch forwarding 64-byte frames.
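The core step of IP forwarding in an L3 switch is a longest-prefix-match lookup in the forwarding information base (FIB). The sketch below shows one simple software version, assuming IPv4 only and one hash table per prefix length. It is not the FIBNIC hardware design, and the class and method names are hypothetical. It illustrates why lookup cost can be bounded independently of the number of routes: at most 33 table probes are made regardless of table size.

```python
import ipaddress


class Fib:
    """Toy IPv4 forwarding table: one dict per prefix length (0..32)."""

    def __init__(self) -> None:
        # tables[n] maps the top n bits of an address to a next hop
        self.tables = [dict() for _ in range(33)]

    def add_route(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        key = int(net.network_address) >> (32 - net.prefixlen)
        self.tables[net.prefixlen][key] = next_hop

    def lookup(self, addr: str):
        """Return the next hop of the longest matching prefix, or None."""
        value = int(ipaddress.ip_address(addr))
        # Probe from the most specific (/32) to the least specific (/0)
        for length in range(32, -1, -1):
            hop = self.tables[length].get(value >> (32 - length))
            if hop is not None:
                return hop
        return None


if __name__ == "__main__":
    fib = Fib()
    fib.add_route("0.0.0.0/0", "gateway")
    fib.add_route("10.0.0.0/8", "port-a")
    fib.add_route("10.1.0.0/16", "port-b")
    print(fib.lookup("10.1.2.3"))
```

In hardware, probes of this kind can run in parallel or as pipeline stages, which is one way a lookup delay can stay fixed as the route count grows.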

When we applied FIBNIC and IP-NUMA to memcached, the transaction rate on a local IP network was 4 times that achieved with Berkeley sockets. We expect that the use of the IP protocol as an interconnect over commodity networks can extend to areas such as the cloud and HPC.

Keywords: Data-center, Low-latency, RDMA, IP, memcached, FPGA.

CONTACT To obtain the dissertation, please contact:
MATSUYA, Takeshi ( macchan at )
