
reassembly buffer requirements for TCP depending on
framing solutions adopted. The draft presents analysis
demonstrating that MPA allows reassembly buffer size to
scale as a function of maximum segment size rather than as
a function of the bandwidth-delay product. Another important
benefit of DDP/MPA is that it ensures the presence of an
iSCSI PDU (or more generally, any upper layer protocol
PDU) at a fixed offset in the first DDP segment of a DDP
Message. If the location of the iSCSI PDU had no
relationship to the TCP segment boundaries, receiver
complexity would increase substantially. The “Analysis of MPA
over TCP Operations” [12] also illustrates how header
alignment streamlines the DMA operations and reduces the
computational load of the receiver.
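The contrast between the two scaling regimes can be made concrete with a short calculation. The sketch below is illustrative only: the link speed, round-trip time, and segment size are assumed values, not figures from [12], and the function name is hypothetical. It computes the two quantities the text compares, not the actual buffer sizes derived in the draft.

```python
def bandwidth_delay_product_bytes(link_bits_per_s: int, rtt_us: int) -> int:
    """Bytes that can be in flight on a fully utilized path."""
    # bits/s * microseconds / (8 bits per byte * 1e6 us per s)
    return link_bits_per_s * rtt_us // (8 * 1_000_000)


# Assumed example values (not taken from the paper or from [12]):
LINK_BITS_PER_S = 10_000_000_000  # 10 Gb/s link
RTT_US = 1_000                    # 1 ms round-trip time
MSS_BYTES = 1_460                 # typical Ethernet TCP maximum segment size

bdp_bytes = bandwidth_delay_product_bytes(LINK_BITS_PER_S, RTT_US)
ratio = bdp_bytes / MSS_BYTES

print(f"bandwidth-delay product: {bdp_bytes} bytes")
print(f"maximum segment size:    {MSS_BYTES} bytes")
print(f"BDP is roughly {ratio:.0f} times the MSS")
```

Under these assumptions the bandwidth-delay product is several hundred times the maximum segment size, and the gap widens as link speed or round-trip time grows, while the segment size stays fixed.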
The iWARP protocol suite is implemented in an RNIC
which is capable of providing the direct data placement
function without requiring firmware or hardware
customizations in the RNIC for each ULP. The iWARP
protocol suite is depicted in Figure 1.
[Figure 1: protocol stack, from top to bottom: RDMA Protocol (RDMAP), Direct Data Placement Protocol (DDP), Markers with PDU Alignment (MPA), Transmission Control Protocol (TCP), Internet Protocol (IP). RDMAP, DDP, and MPA together form iWARP.]
Figure 1 iWARP protocol suite over TCP/IP
The primary unit of information exchange on an iWARP
connection is a DDP Segment that is self-describing from a
data placement perspective. The MPA framing support is
used by the RNIC to locate the DDP control information of
the current DDP segment in the TCP stream, whether or not
the TCP packets arrive in order. The DDP control
information supports the direct data placement and enables
the RNIC to steer the payload to its final memory location.
The RDMA support is provided in the RDMAP control
information and it enables the ULP to specify RDMA Read,
RDMA Write, or Send semantics to transfer the data.
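The idea of a self-describing segment can be sketched in a few lines. This is a minimal model under stated assumptions, not the DDP wire format: the field names `buffer_tag` and `offset` and the function `place` are hypothetical, chosen only to show that each segment carries enough control information to be placed on its own, so arrival order does not matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DdpSegment:
    buffer_tag: int  # hypothetical field: identifies the destination ULP buffer
    offset: int      # hypothetical field: byte offset within that buffer
    payload: bytes


def place(segment: DdpSegment, buffers: dict) -> None:
    """Copy the payload straight to its final location, with a bounds check."""
    target = buffers[segment.buffer_tag]
    end = segment.offset + len(segment.payload)
    if segment.offset < 0 or end > len(target):
        raise ValueError("segment falls outside the destination buffer")
    target[segment.offset:end] = segment.payload


# One 12-byte ULP buffer, registered under the assumed tag 7.
buffers = {7: bytearray(12)}

# Segments arrive out of order; each one is placed independently.
arrivals = [
    DdpSegment(buffer_tag=7, offset=8, payload=b"IJKL"),
    DdpSegment(buffer_tag=7, offset=0, payload=b"ABCD"),
    DdpSegment(buffer_tag=7, offset=4, payload=b"EFGH"),
]
for seg in arrivals:
    place(seg, buffers)

placed = bytes(buffers[7])
print(placed)
```

Because no segment depends on its predecessors for placement, the receiver in this model needs no intermediate reassembly copy; a real RNIC would additionally validate access rights to the buffer, which the sketch omits.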
A ULP interacts with an RNIC via the RNIC Interface (RI)
using a set of functional interfaces called “Verbs” [9].
From this perspective, the RI is also called a “Verbs
provider” and the ULP a “Verbs consumer”. Verbs,
covering various functional aspects of the RI, are defined in
the iWARP architecture. The Verbs consumer accesses the
RI by creating one or more Queue Pairs (QPs), each of
which consists of two Work Queues (WQs): a Send Queue
(SQ) and a Receive Queue (RQ), as shown in Figure 2. The
Verbs consumer associates each QP with a TCP connection
(one-to-one mapping) for carrying out the send
and receive operations. Each request to the RI by the
consumer takes the form of a Work Request (WR) that the
consumer posts to the SQ or RQ as appropriate by invoking
Verbs in order to convey the request to the RI. All
outbound RDMA operations such as RDMA Read, RDMA
Write, and Send are initiated via WRs posted to the SQ.
All inbound solicited data carried by the RDMA Write and
RDMA Read Response messages is stored by the RNIC
directly into the ULP buffer(s). Control and unsolicited
data operations, such as receiving incoming Send Messages,
are satisfied via Work Requests posted to the RQ. Each
WQ (SQ or RQ) is associated with a Completion Queue
(CQ) that notifies the Consumer of the completion of the
requested operation via a Work Completion (WC).
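The queue relationships described above can be modeled as a small data structure. The following is a toy sketch, not the Verbs interface of [9]: the class names, method names, and opcode strings are all hypothetical, and the RNIC's work is reduced to moving a request from a Work Queue to its Completion Queue.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumed opcode labels for operations initiated through the SQ.
SQ_OPCODES = {"SEND", "RDMA_WRITE", "RDMA_READ"}


@dataclass
class WorkRequest:
    wr_id: int
    opcode: str


@dataclass
class WorkCompletion:
    wr_id: int
    opcode: str
    status: str = "SUCCESS"


@dataclass
class QueuePair:
    sq: deque = field(default_factory=deque)       # Send Queue
    rq: deque = field(default_factory=deque)       # Receive Queue
    send_cq: deque = field(default_factory=deque)  # CQ associated with the SQ
    recv_cq: deque = field(default_factory=deque)  # CQ associated with the RQ

    def post_send(self, wr: WorkRequest) -> None:
        if wr.opcode not in SQ_OPCODES:
            raise ValueError("outbound operations only on the SQ")
        self.sq.append(wr)

    def post_recv(self, wr: WorkRequest) -> None:
        if wr.opcode != "RECV":
            raise ValueError("only receive requests on the RQ")
        self.rq.append(wr)

    def process_send(self) -> None:
        """Stand-in for the RNIC finishing the oldest SQ request."""
        wr = self.sq.popleft()
        self.send_cq.append(WorkCompletion(wr.wr_id, wr.opcode))

    def deliver_incoming_send(self) -> None:
        """Stand-in for an incoming Send Message consuming an RQ request."""
        wr = self.rq.popleft()
        self.recv_cq.append(WorkCompletion(wr.wr_id, wr.opcode))


qp = QueuePair()
qp.post_recv(WorkRequest(wr_id=1, opcode="RECV"))
qp.post_send(WorkRequest(wr_id=2, opcode="RDMA_WRITE"))
qp.process_send()
qp.deliver_incoming_send()

send_done = [(wc.wr_id, wc.opcode) for wc in qp.send_cq]
recv_done = [(wc.wr_id, wc.opcode) for wc in qp.recv_cq]
print(send_done, recv_done)
```

The sketch gives each Work Queue its own Completion Queue for clarity; the text only requires that each WQ be associated with a CQ, so other arrangements are possible.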
[Figure 2: the Verbs consumer posts WRs through the RI to the SQ and RQ of a QP; the Verbs provider returns WCs through the CQ associated with each queue. Legend: WR = Work Request, WC = Work Completion, RI = RNIC Interface, SQ = Send Queue, RQ = Receive Queue, CQ = Completion Queue, QP = Queue Pair.]
Figure 2 iWARP functional interface
Because RNICs support a diverse set of applications, their
use promotes the convergence of server hardware and
networking requirements. This creates a stronger economic
incentive for a single, larger RNIC market than for
several application-specific markets for iSCSI NICs, IPC
NICs, TOE NICs, etc. An RNIC employs the iWARP
protocol suite defined by the RDMA Consortium. The suite's
specifications have been accepted as drafts by the IETF
Remote Direct Data Placement Working Group (RDDP WG)
and are expected to be supported by a broad list of
hardware and software vendors.
Proceedings of the ACM SIGCOMM 2003 Workshops, p. 211, August 2003