KNOW MORE ABOUT RDMA OVER TCP, SOCKET DIRECT PROTOCOL (SDP) AND iSCSI EXTENSIONS FOR RDMA (iSER)
It has already been proven that RDMA can improve the performance of commercial applications, yet none of these RDMA-enabled applications has been commercially successful. This is mostly because today (2003) RDMA-capable network cards are not interoperable, which adds to the cost of owning and managing RDMA-enabled applications. Therefore, in May 2002 several companies founded the RDMA Consortium to standardize the RDMA protocol suite. The consortium has standardized all interfaces required to implement the software and the hardware for RDMA over TCP. In addition, it has defined two upper-layer protocols – the Socket Direct Protocol (SDP) and the iSCSI Extensions for RDMA (iSER) – which exploit RDMA for fast, CPU-light communication. The consortium has forwarded all specifications to the IETF and intends to conclude its activity once these standards have been ratified as Internet standards.
RDMA over TCP offloads much of the TCP protocol processing overhead from the CPU to the Ethernet network card. Furthermore, each incoming network packet carries enough information that its payload can be placed directly into the proper destination memory location, even when packets arrive out of order. RDMA over TCP thus gains the benefits of the Virtual Interface Architecture while using the existing TCP/IP/Ethernet network infrastructure. It is layered on top of TCP, requires no modification of the TCP/IP protocol suite and can therefore benefit from underlying protocols such as IPsec.
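The following much-simplified sketch in C illustrates the direct data placement idea: every segment carries a steering tag, an offset and a length that identify exactly where its payload belongs, so segments can be written straight into the destination buffer in whatever order they arrive. The struct layout, field names and helper function are illustrative assumptions, not the on-wire format defined by the RDMA Consortium.

/*
 * Sketch of direct data placement: each segment names a registered
 * ("tagged") buffer via a steering tag and says where inside that
 * buffer its payload belongs, so no reordering queue is required.
 * Illustration only - not the real iWARP/DDP wire format.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define APP_BUF_SIZE 64

/* A registered destination buffer, identified by a steering tag. */
struct tagged_buffer {
    uint32_t stag;                 /* steering tag handed out at registration */
    uint8_t  data[APP_BUF_SIZE];   /* application memory receiving payload    */
};

/* Placement information carried by each (hypothetical) segment. */
struct segment_header {
    uint32_t stag;                 /* which registered buffer to target       */
    uint32_t offset;               /* byte offset inside that buffer          */
    uint32_t length;               /* payload length                          */
};

/* Place one segment directly into its final location. */
static int place_segment(struct tagged_buffer *buf,
                         const struct segment_header *hdr,
                         const uint8_t *payload)
{
    if (hdr->stag != buf->stag ||
        hdr->offset > APP_BUF_SIZE ||
        hdr->length > APP_BUF_SIZE - hdr->offset)
        return -1;                 /* unknown buffer or out-of-range placement */
    memcpy(buf->data + hdr->offset, payload, hdr->length);
    return 0;
}

int main(void)
{
    struct tagged_buffer buf = { .stag = 0x1234 };

    struct segment_header second = { .stag = 0x1234, .offset = 7, .length = 9 };
    struct segment_header first  = { .stag = 0x1234, .offset = 0, .length = 7 };

    /* The later part of the message arrives first ... */
    place_segment(&buf, &second, (const uint8_t *)"placement");
    /* ... and the earlier part afterwards; both land directly in place. */
    place_segment(&buf, &first,  (const uint8_t *)"direct ");

    printf("reassembled in place: %s\n", (const char *)buf.data);
    return 0;
}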
RDMA over TCP also has some advantages in comparison with TCP/IP Offload Engines (TOEs). TOEs move the load of TCP protocol processing from the CPU to the network card, but zero-copy handling of incoming data streams remains highly dependent on the proprietary design of the TOE, the operating system interfaces and the communication model of the applications. In many cases, therefore, TOEs do not support a zero-copy model for incoming data. RDMA over TCP, being an openly specified standard, does not share this drawback, so a combination of TCP Offload Engines and RDMA provides the optimal architecture for high-speed networking by reducing the CPU load and avoiding the need to copy data buffers. The RDMA Consortium expects that the first RDMA-enabled network interface controllers (RNICs) will enter the market in 2004. RDMA over TCP, VI Architecture and InfiniBand each specify a form of RDMA, but these are not exactly the same. The aim of VI Architecture is to specify a form of RDMA
without specifying the underlying transport protocol. InfiniBand, on the other hand, specifies an underlying transmission technique that is optimized to support RDMA semantics. Finally, RDMA over TCP specifies a layer that interoperates with the standard TCP/IP protocol stack. As a result, the protocol verbs of each RDMA variant differ slightly, so these RDMA variants are not interoperable. However, the RDMA Consortium has specified two upper-layer protocols that utilize RDMA over TCP. The Socket Direct Protocol (SDP) represents an approach to accelerate TCP/IP communication. SDP maps the socket API of TCP/IP onto RDMA over TCP so that protocols based upon TCP/IP, such as NFS and CIFS, can benefit from RDMA without being modified. SDP benefits from offloading much of the TCP/IP protocol processing burden from the CPU and from its ability to avoid copying packets from buffer to buffer. It is interesting to observe that applications using SDP believe they are using native TCP, while the actual transport of the data is performed by an integration of RDMA and TCP.
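The transparency of SDP can be pictured with a small sketch: the client below is written purely against the ordinary sockets API, and an SDP implementation could slide in underneath exactly this code, for example through a preloaded library that intercepts the socket calls, without the application changing at all. The address, port and the interception mechanism mentioned in the comments are illustrative assumptions rather than part of the SDP specification.

/*
 * An ordinary TCP client: the only interface the application ever sees is
 * the sockets API.  Whether the bytes travel over the kernel TCP stack or,
 * via SDP, over an RDMA-capable NIC is invisible at this level.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Plain TCP stream socket. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = { 0 };
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(7000);                   /* placeholder port    */
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr); /* placeholder address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        const char msg[] = "hello over what looks like plain TCP\n";
        /* The application is unaware of the transport actually used. */
        write(fd, msg, sizeof(msg) - 1);
    } else {
        perror("connect");
    }
    close(fd);
    return 0;
}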
iSCSI Extensions for RDMA (iSER) is the second upper-layer protocol specified by the RDMA Consortium. It is an extension of the iSCSI protocol (Section 3.5.1) that enables iSCSI to benefit from RDMA, eliminating TCP/IP processing overhead on generic RNICs. This is important because Ethernet, and therefore iSCSI, will approach 10 GBit/s in 2004. iSER is not a replacement for iSCSI; it is complementary. iSER requires iSCSI components such as login negotiation, discovery, security and boot; it only changes the data mover model of iSCSI. It is expected that iSCSI end nodes and iSCSI/iSER end nodes will be interoperable: during iSCSI login both end nodes exchange their characteristics, so each node is clearly aware of the other node's transport capabilities (see the sketch below). RDMA is not yet widespread in current applications; however, it opens up new possibilities for the implementation of distributed synchronization mechanisms for caching and locking in databases and file systems. We expect that the completed standardization of RDMA over TCP will boost the adoption of RDMA-enabled applications in the coming years. All distributed applications will benefit from RDMA-enabled transport via SDP and iSER while the applications themselves remain unchanged. Furthermore, communication-intensive applications – for instance file systems, databases and applications in parallel computing – will be adapted to utilize native RDMA communication. Section 4.2.5 uses the example of the Direct Access File System (DAFS) to show how RDMA changes the design of network file systems.
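As a rough illustration of the capability exchange during iSCSI login mentioned above, the sketch below models the negotiation as the text key=value pairs that iSCSI login uses. The key name RDMAExtensions follows the later iSER specification; the framing, helper names and replies shown are simplified assumptions, not the actual login PDU exchange.

/*
 * Sketch of how an iSER-capable initiator and a target discover each
 * other's transport capabilities via iSCSI login key=value negotiation.
 * Simplified illustration only, not the real login PDU exchange.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Both sides must agree on the extension for the RDMA data mover to be used. */
static bool use_rdma_data_mover(const char *our_offer, const char *peer_reply)
{
    return strcmp(our_offer, "RDMAExtensions=Yes") == 0 &&
           strcmp(peer_reply, "RDMAExtensions=Yes") == 0;
}

int main(void)
{
    /* An iSER-capable initiator offers the extension during login ... */
    const char *initiator_key = "RDMAExtensions=Yes";

    /* ... a plain iSCSI target declines it, an iSER target accepts it. */
    const char *plain_target_reply = "RDMAExtensions=No";
    const char *iser_target_reply  = "RDMAExtensions=Yes";

    printf("plain iSCSI target -> RDMA data mover: %s\n",
           use_rdma_data_mover(initiator_key, plain_target_reply) ? "yes" : "no");
    printf("iSER target        -> RDMA data mover: %s\n",
           use_rdma_data_mover(initiator_key, iser_target_reply) ? "yes" : "no");

    /* Either way the nodes interoperate: without the extension, data simply
     * keeps flowing over the ordinary iSCSI/TCP data mover. */
    return 0;
}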
I/O techniques can be subdivided into system buses, host I/O buses and I/O buses. The most important I/O buses for servers are SCSI, Fibre Channel and the family of IP storage protocols. SCSI makes it possible to address storage devices in a block-oriented manner via targets and LUNs. The SCSI protocol is also encountered in Fibre Channel and IP storage: these two newer transmission technologies replace the SCSI cable with a serial network and continue to use the SCSI protocol over this network. Fibre Channel is a new transmission technology that is particularly well suited to storage networks. With point-to-point, arbitrated loop and fabric it defines three different network topologies, of which the fabric can connect together up to 15.5 million servers and storage devices. IP storage takes a similar
approach to Fibre Channel. However, in contrast to Fibre Channel it is based upon the tried and tested TCP/IP, and thus mainly upon Ethernet. Anyone today (2003) who wants to implement block-oriented storage networks must take Fibre Channel as the basis. In the near future, IP storage will probably establish itself as an alternative to Fibre Channel. The most important host I/O bus technology today is the PCI bus. However, PCI is slowly coming up against its physical limits, which means that it can no longer keep up with the throughput of networks such as Fibre Channel and Ethernet. InfiniBand, which will probably replace the PCI bus in high-end servers with a serial network, can help here. The Virtual Interface Architecture (VIA) represents a technology that allows distributed
applications to exchange data quickly and in a manner that lessens the load on the CPU by bypassing the operating systems of the computers. Finally, the standardization of RDMA over TCP and its application protocols SDP and iSER will adapt the TCP protocol and the iSCSI protocol to the requirements of 10 GBit/s network technology. With the disk subsystems discussed in the previous chapter and the Fibre Channel and IP storage I/O techniques discussed in this chapter, we have introduced the technologies that are required to build storage-centric IT systems. However, intelligent disk subsystems
and storage networks represent only the physical basis for storage-centric IT systems. Ultimately, software that exploits the new storage-centric infrastructure, and thus realizes its full potential, will also be required. Therefore, in the next chapter we show how intelligent disk subsystems and storage networks can change the architecture of file systems.