Netdev 0x13
table ip filter {
    ct timeout test-tcp {
        protocol tcp;
        l3proto ip;
        policy = { established: 180, close_wait: 10, close: 10 }
    }
    chain output {
        type filter hook output priority 0;
        tcp dport 8000 ct timeout set "test-tcp"
    }
}
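(The snippet above defines a custom conntrack timeout policy for TCP and applies it to outgoing connections to TCP port 8000.)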
nftlb: Load balancing
● Load balancer userspace daemon: nftlb (Version 0.4)
– https://github.com/zevenet/nftlb
● 4 modes:
– SNAT: actually masquerade + dnat (emulates proxy behaviour)
– DNAT: just dnat
– Direct Server Return (DSR) [IMPROVED]
– Stateless DNAT [NEW]
● Schedulers: weight, round-robin, hash, symhash.
● IPv4/IPv6 support.
● REST API.
● JSON configuration file.
● Automated testbed.
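A minimal sketch of that JSON configuration, with field names as in the nftlb README (addresses, ports and weights are made up):

{
    "farms": [{
        "name": "web", "family": "ipv4", "protocol": "tcp",
        "virtual-addr": "192.168.0.100", "virtual-ports": "80",
        "mode": "snat", "scheduler": "weight", "state": "up",
        "backends": [
            { "name": "bck0", "ip-addr": "192.168.1.10", "weight": "5", "state": "up" },
            { "name": "bck1", "ip-addr": "192.168.1.11", "weight": "5", "state": "up" }
        ]
    }]
}

The same object can be pushed at runtime through the REST API, e.g. curl -H "Key: <secret>" -X POST http://localhost:5555/farms -d @web.json (port and authentication header taken from the nftlb documentation defaults).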
nftlb: Updates
● Automatic DSR configuration.
– Fetch MAC and interfaces from netlink.
● New Stateless DNAT support.
● Support for load balancing with helpers, e.g. FTP.
● Support for packet marking (per farm/backend).
● Blacklists and whitelists.
● Userspace queueing via nfqueue.
● Established-connections rate limiting (connlimit).
● Configurable tuple in hash-based load balancing (a sketch of these last two follows).
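Under the hood these two map to plain nftables expressions; a hand-written sketch of the kind of rules involved (table and chain names, addresses, ports and the over-20 limit are all made up; nftlb's generated rules will differ):

table ip lb {
    chain prerouting {
        type nat hook prerouting priority -100;
        # hash-based load balancing, here over the saddr . sport tuple
        tcp dport 80 dnat to jhash ip saddr . tcp sport mod 2 map { 0 : 192.168.1.10, 1 : 192.168.1.11 }
    }
    chain input {
        type filter hook input priority 0;
        # connlimit: reject once more than 20 established connections to port 80 exist
        tcp dport 80 ct count over 20 reject
    }
}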
Flowtable bypass
Flowtable bypass (2)
● For each packet, extract the tuple and perform a lookup in the flowtable.
● Miss: Let the packet follow the classic forwarding path.
● Hit:
– Attach route from flowtable entry (the flowtable is acting as a cache).
– NAT mangling, if any.
– Decrement TTL.
– Send packet via neigh_xmit(...).
● Exceptions (any of them forces the slow path):
– If the packet is over the MTU, pass it up to the classic forwarding path.
– Secpath (IPsec) info is available.
– IP options are present.
● Garbage collector:
– Expire flows if we see no more packets after N seconds.
– TCP RST and FIN packets are passed up to the slow path.
Flowtable bypass (3)
● Configure the flow bypass with a single rule:
table ip x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
    }
    chain y {
        type filter hook forward priority 0;
        ip protocol tcp flow add @f
    }
}
● Conntrack entries are owned by the flowtable:
# cat /proc/net/nf_conntrack
ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392
dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443
dport=36392 [OFFLOAD] mark=0 zone=0 use=2
Flowtable bypass (4)
● The flow offload forwarding PoC (from 2018) is ~2.75x faster in software:
● pktgen_bench_xmit_mode_netif_receive.sh towards a dummy device to exercise the forwarding path:
– One single CPU
– Smallest packet size (worst case)
● Performance numbers:
– Classic forwarding path (baseline): 1848888 pps
– Flow offload forwarding: 5155382 pps
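A rough reconstruction of that benchmark using the script shipped in the kernel tree under samples/pktgen (the dummy device setup and flag values are assumptions; -t selects the thread count, -s the packet size):

ip link add dummy0 type dummy
ip link set dummy0 up
./samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh -i dummy0 -t 1 -s 64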
Flowtable bypass (5)
● Upstream since 4.16 (January 2018).
● Recent patches:
– Tear down feature: send flows back to slow path
● RST and FIN packets.
● Limited pickup time.
● Only for TCP and UDP for now.
– Fix offloading of SNAT+DNAT flows
– Fix: don’t remove the offload when another netns’s interface goes down.
– Fix interaction with VRF.
– Fix interaction with helpers and /proc/sys/net/netfilter/nf_conntrack_helper set to 1.
– Attach dst to skbuff.
Flowtable bypass (6)
● Hardware offload infrastructure (~200 LOC) available at a public git branch:
– https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/log/?h=flow-offload-hw-v3
● Not yet upstream, waiting for a driver :-(
● The user explicitly sets the “offload” flag to enable hardware offload (sketched below).
● New ndo hook for offloads, or generalise an existing ndo for this purpose.
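From the ruleset side this is one extra flag on the flowtable; a sketch using the flag spelling that later landed upstream (whether the branch above uses exactly this syntax is an assumption):

table ip x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
        flags offload;
    }
}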
Flowtable bypass (7): IPsec
● Patch to add IPsec support (not tested):
– https://patchwork.ozlabs.org/patch/982747/
● Set up the flowtable entry from the first packet.
– Needs explicit configuration from the user.
Policy HW offload
https://lwn.net/Articles/766695/
Policy HW offload (2)
● Top-level flow_rule:
– flow_match
● flow_dissector (enum FLOW_DISSECTOR_KEY_*)
● mask (opaque)
● key (opaque)
– flow_action
● num_actions
● actions[]
Policy HW offload (3)
● Supported actions (from what drivers can do; see the tc flower example after this list):
– Accept, drop, trap, goto
– Redirect and mirred
– VLAN: push, pop, mangle
– Tunnel: Encapsulation, decapsulation
– Packet: mangle, add
– Checksum
– Mark
– Wake-on-LAN (ethtool)
– Queue (ethtool: packet steering)
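As an illustration, a tc flower rule exercising a few of these actions (interface names assumed): the kernel translates the flower match into flow_match dissector keys and the action list into a flow_action array before handing the resulting flow_rule to the driver.

tc filter add dev eth0 ingress protocol ip flower \
    ip_proto tcp dst_port 80 \
    action vlan push id 10 \
    action mirred egress redirect dev eth1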
Policy HW offload (4)
● Drivers using this infrastructure:
– Broadcom: bnxt (flower), bcm_sf2 (ethtool)
– Chelsio: cxgb4 (flower)
– Intel: i40e, iavf, igb (flower)
– Mellanox: mlx5, mlxsw (flower)
– Netronome: nfp (flower)
– Qlogic: qede (ethtool + flower)