CS 838: NetFPGA Tutorial
Theophilus Benson
Outline
Background: What is the NetFPGA?
Life cycle of a packet through a NetFPGA
Demo
What is the NetFPGA?
Networking software running on a standard PC (CPU, memory, PCI)
A hardware accelerator built with a Field Programmable Gate Array (FPGA, memory) driving Gigabit network links (4 x 1GE)
NetFPGA Router Function
4 Gigabit Ethernet ports
Fully programmable FPGA hardware
Open-source FPGA hardware: Verilog base design
Open-source software: Linux user-level drivers in C and C++
NetFPGA Platform
Major Components
Interfaces
4 Gigabit Ethernet Ports
PCI Host Interface
Memories
36Mbits Static RAM
512Mbits DDR2 Dynamic RAM
FPGA Resources
Block RAMs
Configurable Logic Block (CLBs)
Memory Mapped Registers
NetFPGA: Router Design
Pipeline of modules
FIFO queues between each module
Inter module communication
CTRL: Sent on ctrl bus (8 bits)
Metadata about the data being sent
DATA: Sent on data bus (64 bits)
RDY: Signifies ready to receive packet (1 bit)
WR: Signifies packet being sent (1 bit)
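The handshake above can be modelled in software. The following is a minimal C sketch, not taken from the NetFPGA Verilog sources: the names `word_t`, `fifo_t`, `fifo_rdy`, `fifo_write`, and `fifo_read` are hypothetical, and the FIFO depth is an arbitrary assumption. It only illustrates that the upstream module may write a (ctrl, data) pair when the downstream FIFO reports RDY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4u  /* assumed depth, for illustration only */

/* One bus transfer: an 8-bit ctrl word alongside a 64-bit data word. */
typedef struct {
    uint8_t  ctrl;
    uint64_t data;
} word_t;

/* Hypothetical FIFO sitting between two pipeline modules. */
typedef struct {
    word_t   slots[FIFO_DEPTH];
    unsigned head;   /* index of the oldest stored word */
    unsigned count;  /* number of words currently stored */
} fifo_t;

/* RDY: the downstream side can accept another word. */
bool fifo_rdy(const fifo_t *f) {
    assert(f != NULL);
    return f->count < FIFO_DEPTH;
}

/* WR: the upstream side pushes one word; refused while RDY is low. */
bool fifo_write(fifo_t *f, uint8_t ctrl, uint64_t data) {
    if (!fifo_rdy(f))
        return false;
    unsigned tail = (f->head + f->count) % FIFO_DEPTH;
    f->slots[tail].ctrl = ctrl;
    f->slots[tail].data = data;
    f->count++;
    return true;
}

/* The downstream module drains one word, which frees a slot. */
bool fifo_read(fifo_t *f, word_t *out) {
    assert(f != NULL && out != NULL);
    if (f->count == 0)
        return false;
    *out = f->slots[f->head];
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

In this model a full FIFO deasserts RDY, and the writer has to hold its word until a read frees a slot.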
NetFPGA
Software: Linux user-level processes
Hardware: Verilog on the NetFPGA PCI board (FPGA Modules 1, FPGA Modules 2)
Example: An IP Router on NetFPGA
Software (Linux user-level processes): Management & CLI, Routing Protocols, Routing Table, Exception Processing
Hardware (Verilog on NetFPGA PCI board): Forwarding Table, Switching
Life of a Packet through the hardware
192.168.1.x 192.168.2.y
port0 port2
Router Stages
Rx queues: MAC RxQ and CPU RxQ, alternating (4 of each)
Input Arbiter
Output Port Lookup
Output Queues
Tx queues: MAC TxQ and CPU TxQ, alternating (4 of each)
Inter-module Communication
Using Module Headers:
Ctrl Word (8 bits)    Data Word (64 bits)
x                     Module Hdr
...                   ...
y                     Last Module Hdr
0                     Eth Hdr
0                     IP Hdr
0                     ...
0x10                  Last word of packet
Module headers contain information such as packet length, input port, output port, ...
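The layout above can be walked in software. The following is a minimal C sketch under the slide's simplified convention: module headers carry a nonzero ctrl word, packet words carry ctrl 0, and the final packet word carries a nonzero ctrl again. The type `bus_word_t` and both function names are hypothetical, and a packet that fits in a single word is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (ctrl, data) pair as it appears on the inter-module bus. */
typedef struct {
    uint8_t  ctrl;
    uint64_t data;
} bus_word_t;

/* Index of the first packet word (ctrl == 0), skipping the leading
 * module headers. Returns n when no packet word is found. */
size_t first_packet_word(const bus_word_t *w, size_t n) {
    assert(w != NULL || n == 0);
    size_t i = 0;
    while (i < n && w[i].ctrl != 0)
        i++;
    return i;
}

/* Number of packet words, including the final word whose nonzero ctrl
 * marks the end of the packet. Returns 0 when the end marker is missing. */
size_t packet_word_count(const bus_word_t *w, size_t n) {
    size_t start = first_packet_word(w, n);
    size_t i = start;
    while (i < n && w[i].ctrl == 0)
        i++;
    if (i == n)
        return 0;  /* stream ended before the last-word marker */
    return i - start + 1;
}
```

A stream with two module headers, three ctrl-0 words, and one word marked 0x10 yields a first packet word at index 2 and a packet length of 4 words.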
Inter-module Communication
Signals between adjacent modules: data, ctrl, wr, rdy
MAC Rx Queue
Ctrl    Data
0xff    Pkt length, input port = 0
0       Eth Hdr: Dst MAC = port 0, Ethertype = IP
0       IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum: 0x3ab4
0       Data
Input Arbiter
[Figure: packets (Pkt) waiting at the arbiter inputs]
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum
3- Lookup next hop IP & output port (LPM)
4- Lookup next hop MAC address (ARP)
5- Add output port module header
6- Modify MAC Dst and Src addresses
7- Decrement TTL and update checksum
Ctrl    Data
0x04    output port = 4
0xff    Pkt length, input port = 0
0       EthHdr: Dst MAC = 0 -> nextHop, Src MAC = x -> port 4, Ethertype = IP
0       IP Hdr: IP Dst: 192.168.2.3, TTL: 64 -> 63, Csum: 0x3ab4 -> 0x3ac2
0       Data
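Step 3 above relies on longest prefix match (LPM). The following is a minimal C sketch of that rule as a linear scan over a small table. It is a software illustration only and says nothing about how the hardware performs the lookup. The `route_t` layout, the `IPV4` macro, and the queue numbers in the usage note are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical forwarding-table entry. */
typedef struct {
    uint32_t prefix;      /* network address, host byte order */
    uint8_t  prefix_len;  /* number of significant bits, 0..32 */
    uint32_t next_hop;    /* next hop IP address */
    uint8_t  out_queue;   /* output queue index */
} route_t;

/* Netmask for a prefix length; a shift by 32 is avoided for len == 0. */
static uint32_t prefix_mask(uint8_t len) {
    assert(len <= 32);
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Return the matching entry with the longest prefix, or NULL if no
 * entry matches the destination address. */
const route_t *lpm_lookup(const route_t *table, size_t n, uint32_t dst) {
    const route_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].prefix_len);
        if ((dst & mask) != (table[i].prefix & mask))
            continue;  /* destination is outside this prefix */
        if (best == NULL || table[i].prefix_len > best->prefix_len)
            best = &table[i];
    }
    return best;
}
```

With entries for 0.0.0.0/0, 192.168.0.0/16, and 192.168.2.0/24, a lookup of 192.168.2.3 selects the /24 entry because it is the most specific match.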
Output Queues
OQ0 ... OQ4 ... OQ7
MAC Tx Queue
Ctrl    Data
0x04    output port = 4
0xff    Pkt length, input port = 0
0       EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
0       IP Hdr: IP Dst: 192.168.2.3, TTL: 64 -> 63, Csum: 0x3ab4 -> 0x3ac2
0       Data
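The TTL decrement and checksum update can be sketched in C. The code below is an illustration written for this tutorial, not the NetFPGA implementation: it computes the standard IPv4 header checksum and patches it incrementally after the TTL change, using the one's-complement update rule from RFC 1624. The function names are hypothetical, and the header bytes in the usage note are made-up sample data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over a header of 16-bit big-endian words.
 * Returns 0 when run over a header whose checksum field is correct. */
uint16_t ipv4_checksum(const uint8_t *hdr, size_t len) {
    assert(hdr != NULL && len % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and update the checksum (bytes 10-11)
 * without re-summing the header: HC' = ~(~HC + ~m + m').
 * Returns false and leaves the header untouched when TTL is 0 or 1. */
bool ipv4_decrement_ttl(uint8_t *hdr) {
    assert(hdr != NULL);
    if (hdr[8] <= 1)
        return false;
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);  /* TTL, protocol */
    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t old_csum = (uint16_t)((hdr[10] << 8) | hdr[11]);

    uint32_t sum = (uint16_t)~old_csum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    uint16_t new_csum = (uint16_t)~sum;

    hdr[10] = (uint8_t)(new_csum >> 8);
    hdr[11] = (uint8_t)(new_csum & 0xFFu);
    return true;
}
```

After a successful call, running `ipv4_checksum` over the whole 20-byte header should again return 0, which is how a receiver validates the header.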
NetFPGA-Host Interaction
Linux driver interfaces with hardware
Packet interface via standard Linux network stack
Register reads/writes via ioctl system call (with convenience wrapper functions)
readReg(nf2device *dev, int address, unsigned *rd_data)
writeReg(nf2device *dev, int address, unsigned *wr_data)
e.g.:
readReg(&nf2, OQ_NUM_PKTS_STORED_0, &val);
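The calling pattern of the wrappers can be tried without hardware. The following C sketch uses a stand-in register file in place of the NetFPGA device. It does not use the NetFPGA library or issue any ioctl: `mock_dev_t`, `mock_read_reg`, `mock_write_reg`, and the register index `MOCK_OQ_NUM_PKTS_STORED_0` are all hypothetical, and real register addresses come from the project's generated headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_REGS 16u
#define MOCK_OQ_NUM_PKTS_STORED_0 3u  /* hypothetical register index */

/* Stand-in for the device: a small array of 32-bit registers. */
typedef struct {
    uint32_t regs[MOCK_NUM_REGS];
} mock_dev_t;

/* Read one register into *rd_data. Returns 0 on success, -1 if the
 * register index is out of range (in which case *rd_data is untouched). */
int mock_read_reg(const mock_dev_t *dev, unsigned reg, uint32_t *rd_data) {
    assert(dev != NULL && rd_data != NULL);
    if (reg >= MOCK_NUM_REGS)
        return -1;
    *rd_data = dev->regs[reg];
    return 0;
}

/* Write one register. Returns 0 on success, -1 if out of range. */
int mock_write_reg(mock_dev_t *dev, unsigned reg, uint32_t wr_data) {
    assert(dev != NULL);
    if (reg >= MOCK_NUM_REGS)
        return -1;
    dev->regs[reg] = wr_data;
    return 0;
}
```

As with the real wrappers, the caller passes the device, a register identifier, and a location for the result, and should check the return code before using the value.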
NetFPGA-Host Interaction
Register access
1. Software makes ioctl call on network socket. ioctl passed to driver.
2. Driver performs PCI memory read/write (over the PCI Bus)
NetFPGA-Host Interaction
Packet transfers shown using DMA interface
Alternative: use programmed IO to transfer packets via register reads/writes
Slower, but eliminates the need to deal with network sockets
DEMO: Life of a Packet through the hardware
192.168.1.x 192.168.2.y
port0 port2
Programming the FPGA with your code
nf2_download NF2/bitfiles/reference_router.bit
Mirror the Linux ARP table
./NF2/projects/router_kit/sw/rkd
Helpful tool
./NFlib/C/router/cli
Shows forwarding tables {arp table, ip table}
Allows modifying the tables
Useful Links
NetFPGA Website
NetFPGA Wiki
NetFPGA Guide
Walkthrough the Reference Designs
The Verilog Golden Reference Guide
Questions
Verilog
Hardware Description Languages
Concurrent
By default, Verilog statements are evaluated concurrently
Express fine-grain parallelism
Allows gate-level parallelism
Provide precise description
Eliminates ambiguity about operation
Synthesizable
Generates hardware from description
Verilog Data Types
reg [7:0] A; // 8-bit register, MSB to LSB
// (Preferred bit order for NetFPGA)
reg [0:15] B; // 16-bit register, LSB to MSB
B = {A[7:0],A[7:0]}; // Assignment of bits (part-selects must follow the declared bit order)
reg [31:0] Mem [0:1023]; // 1K Word Memory
integer Count; // simple signed 32-bit integer
integer K[1:64]; // an array of 64 integers
time Start, Stop; // Two 64-bit time variables
From: CSCI 320 Computer Architecture
Handbook on Verilog HDL, by Dr. Daniel C. Hyde :
http://eesun.free.fr/DOC/VERILOG/verilog-manual.html
Signal Multiplexers
Two input multiplexer (using if / else)
reg y;
always @*
if (select)
y = a;
else
y = b;
Two input multiplexer (using ternary operator ?:)
wire t = (select ? a : b);
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
Larger Multiplexers
Three input multiplexer
reg s;
always @*
begin
case (select2)
2'b00: s = a;
2'b01: s = b;
default: s = c;
endcase
end
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
Synchronous Storage Elements
Values change at times governed by clock
Clock
Input to circuit
Clock transitions between 0 and 1 over time (t=0, t=1, t=2)
Clock Event
Example: Rising edge
Flip/Flop
Transfers value from Din to Dout on clock event
Timing example: Din = A, B, C on successive clock events; Dout = S0, A, B (Dout follows Din one clock event later)
Copyright 2001, John W. Lockwood, All Rights Reserved
Finite State Machines
Inputs (X) and current state S(t) feed the combinational logic
Next state: S(t+1) = f(X, S(t))
Outputs (Z): g(X, S(t)) [Mealy] -or- g(S(t)) [Moore]
State storage: D flip-flops hold S(t) and capture the next state on the clock event
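The Mealy/Moore distinction can be shown with a small software model. The C sketch below is an assumed example, not part of the original slides: a three-state machine that detects two or more consecutive 1 inputs, with one next-state function and both styles of output function. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { S_IDLE, S_ONE, S_TWO } state_t;

/* Next-state logic: S(t+1) depends on the input X and the state S(t). */
state_t next_state(state_t s, bool x) {
    if (!x)
        return S_IDLE;             /* a 0 input restarts the count */
    switch (s) {
    case S_IDLE: return S_ONE;     /* first 1 seen */
    case S_ONE:  return S_TWO;     /* second consecutive 1 */
    case S_TWO:  return S_TWO;     /* stay while 1s continue */
    }
    assert(0 && "unknown state");
    return S_IDLE;
}

/* Moore output: a function of the stored state only. */
bool moore_out(state_t s) {
    return s == S_TWO;
}

/* Mealy output: a function of the current input and the stored state. */
bool mealy_out(state_t s, bool x) {
    return x && (s == S_ONE || s == S_TWO);
}

/* One clock event: the state register captures the next state. */
void clock_edge(state_t *s, bool x) {
    assert(s != NULL);
    *s = next_state(*s, x);
}
```

In this model the Mealy output rises in the same cycle as the second 1 arrives, while the Moore output rises only after the following clock event has stored S_TWO.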
Synthesizable Verilog : Delay Flip/Flops
D-type flip flop
reg q;
always @ (posedge clk)
q <= d;
D type flip flop with data enable
reg q;
always @ (posedge clk)
if (enable)
q <= d;
From: http://eesun.free.fr/DOC/VERILOG/synvlg.html
More on NetFPGA System
NetFPGA System
User space software: CAD Tools, Monitor Software, Web & Video Server, Browser & Video Client
Linux Kernel: Packet Forwarding Table, VI (x4)
Host buses: PCI, PCI-e
Hardware: NetFPGA Router with 4 GE ports (nf2c0 .. 3); NIC with 2 GE ports (eth1 .. 2)
NetFPGA System Implementation
NetFPGA Blocks
Virtex-2 Pro FPGA
4.5MB ZBT SRAM
64MB DDR2 DRAM
PCI Host Interface
4 Gigabit Ethernet ports
Intranet Test Ports
Dual or Quad Gigabit Ethernets on PCI-e
Internet
Gigabit Ethernet on Motherboard
Processor
Dual-Core CPU
Operating System
Linux CentOS 4.4
NetFPGA Lab Setup
Host (CPU x2) software: Client, Server, NetFPGA Control SW, CAD Tools
Dual NIC on PCI-e (eth1 .. 2), GE links
Eth2 : Server
Eth1 : Local host
NetFPGA Router Hardware on PCI, GE links
Nf2c3 : Adj. Server
Nf2c2 : Local Host
Nf2c1 : Adjacent
Nf2c0 : Adjacent
Internet
Exception Path
Exception Packet
Example: TTL = 0 or TTL = 1
Packet has to be sent to the CPU, which will generate an ICMP packet as a response
Difference starts at the Output Port Lookup stage
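The exception decision can be sketched in C. The queue numbering below is an assumption inferred from the slides, where MAC and CPU queues alternate (MAC port p uses queue 2p, CPU port p uses queue 2p+1), so a normal packet for port 2 goes to queue 4 and an exception packet from port 0 goes to queue 1. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 4u

/* Queue index of the MAC queue for a physical port (assumed numbering). */
unsigned mac_queue(unsigned port) {
    assert(port < NUM_PORTS);
    return 2u * port;
}

/* Queue index of the CPU queue paired with a physical port. */
unsigned cpu_queue(unsigned port) {
    assert(port < NUM_PORTS);
    return 2u * port + 1u;
}

/* Choose the output queue for a packet that arrived on MAC port in_port.
 * lookup_port is the port chosen by the forwarding lookup. A TTL of 0 or 1
 * is an exception: the packet goes to the CPU queue of its input port. */
unsigned select_output_queue(unsigned in_port, uint8_t ttl,
                             unsigned lookup_port) {
    if (ttl <= 1u)
        return cpu_queue(in_port);
    return mac_queue(lookup_port);
}
```

For example, a packet from port 0 with TTL 64 and a lookup result of port 2 is sent to queue 4, while the same packet with TTL 1 is sent to queue 1.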
Exception Packet Path
Software: nf2c0, nf2c1, nf2c2, nf2c3, ioctl
PCI Bus
nf2_reg_grp
CPU RxQ and CPU TxQ (x4)
NetFPGA user data path
MAC TxQ and MAC RxQ (x4)
Ethernet
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum: EXCEPTION!
3- Add output port module header
Ctrl    Data
0x04    output port = 1
0xff    Pkt length, input port = 0
0       EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
0       IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
0       Data
Output Queues
OQ0, OQ1, OQ2 ... OQ7
CPU Tx Queue
Ctrl    Data
0x04    output port = 1
0xff    Pkt length, input port = 0
0       EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
0       IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
0       Data
ICMP Packet
For the ICMP packet, the packet arrives at the CPU Rx Queue from the PCI Bus.
It follows the same path as a packet from the MAC until the Output Port Lookup.
The OPL module, seeing that the packet is from CPU Rx Queue 1, sets the output port directly to 0.
The packet then continues on the same path as the non-exception packet to the Output Queues and then MAC Tx Queue 0.
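The shortcut for CPU-originated packets can be sketched in C. As an assumption read off the numbers above, queues are indexed with MAC and CPU queues alternating, so odd indices are CPU queues and each CPU queue is paired with the MAC queue one index below it. The names `is_cpu_queue` and `output_queue_for` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QUEUES 8u

/* Odd queue indices are CPU queues under the assumed numbering. */
bool is_cpu_queue(unsigned q) {
    assert(q < NUM_QUEUES);
    return (q & 1u) != 0u;
}

/* Output queue for a packet that entered on in_queue. Packets from a CPU
 * queue bypass the lookup and leave on the paired MAC queue; all others
 * use the queue chosen by the forwarding lookup. */
unsigned output_queue_for(unsigned in_queue, unsigned lookup_queue) {
    if (is_cpu_queue(in_queue))
        return in_queue - 1u;
    return lookup_queue;
}
```

A packet entering on CPU queue 1 leaves on queue 0 regardless of the lookup result, which matches the ICMP reply leaving through MAC Tx Queue 0.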
ICMP Packet Path
Software: nf2c0, nf2c1, nf2c2, nf2c3, ioctl
PCI Bus
nf2_reg_grp
CPU RxQ and CPU TxQ (x4)
NetFPGA user data path
MAC TxQ and MAC RxQ (x4)
Ethernet
NetFPGA-Host Interaction
NetFPGA to host packet transfer
1. Packet arrives; forwarding table sends it to the CPU queue
2. Interrupt notifies driver of packet arrival (over the PCI Bus)
3. Driver sets up and initiates DMA transfer
NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont)
4. NetFPGA transfers packet via DMA (over the PCI Bus)
5. Interrupt signals completion of DMA
6. Driver passes packet to network stack
NetFPGA-Host Interaction
Host to NetFPGA packet transfers
1. Software sends packet via network sockets. Packet delivered to driver.
2. Driver sets up and initiates DMA transfer (over the PCI Bus)
3. Interrupt signals completion of DMA