
Netdev 0x13

The document discusses updates to various networking tools including iptables, ipset, connection tracking, NAT, and nftables. It also covers the nftlb load balancing daemon and ongoing work on flowtable bypassing and hardware offloading of networking policies.


Netfilter mini-workshop

Pablo Neira Ayuso


NetDev 0x13
Prague, Czechia
Index
● Upstream updates since last NetDev
– iptables + ipset
– connection tracking
– NAT
– nftables
● nftlb: userspace daemon for load balancing
● Flowtable updates
● Policy HW offload
iptables + ipset: updates
● iptables:
– Connlimit fixes after summer updates.
– netns removal fixes: RATEEST and CLUSTERIP.
● ipset:
– New netlink commands: deprecate the set/getsockopt() interface. Bump protocol version to 7.
– Match on input MAC address.
conntrack: updates
● Honor CTA_MARK_MASK from ctnetlink update path.
● UDP tracker timeout updates:
– Stream timeout after 2s: DNS parallel IPv4/IPv6 request => non-evictable entries
– Shrink default timeout from 180 to 120 seconds.
● Enable conntrack hooks by default via modparam
– modprobe nf_conntrack enable_hooks=1
● Helpers:
– SIP: Support for a few new scenarios / corner cases.
– Amanda: support for protocol v3.4
● Remove helper hook
● Remove indirections
NAT: updates
● Use random for port allocation offset.
● Limit attempts to find spare port (soft lockup):
– 128 attempts
– Divide by 2 attempts if initial offset is busy.
● Remove indirections.
● Merge IPv4/IPv6.
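The bounded port search above can be sketched in userspace as follows. This is only an illustration of the idea (random starting offset, at most 128 probes, halve the budget and retry from a new random offset when the probes fail), not the kernel code. The function name, the port range, and the choice to keep halving until the budget reaches zero are assumptions made for the example.

```python
import random

MAX_ATTEMPTS = 128  # cap on probes per round, as stated on the slide


def find_spare_port(used, lo=1024, hi=65535, rng=random):
    """Return a free port in [lo, hi], or None if the bounded search gives up.

    `used` is a set of ports already taken (hypothetical stand-in for the
    "is this tuple already in use" check).
    """
    range_size = hi - lo + 1
    attempts = min(range_size, MAX_ATTEMPTS)
    while attempts > 0:
        offset = rng.randrange(range_size)  # random starting offset
        for i in range(attempts):
            port = lo + (offset + i) % range_size
            if port not in used:
                return port
        if attempts >= range_size:
            return None  # the whole range was scanned, nothing is free
        attempts //= 2  # halve the budget and retry from a new offset
    return None
```

Because the budget shrinks on every round, the total number of probes stays bounded even when the range is nearly full, which is what avoids the soft lockup.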
nftables: updates
● Fixes for rule replacement:
– Duplicate rule handles fix.
– Refcount leak to chain.
● Fixes for nft_compat:
– Remove list to reuse extensions.
– Fix module refcount leaks: Use .release_ops
– Fix use-after-free from rule destroy path.
● Fixes for optional chain counter
– RCU splats: suspicious usage.
– Unsafe counter replacement.
● Fix fixed size hashtable lookup on big endian w/32-bit keys
● Bogus ENOENT and EBUSY in transactions
nftables: updates (2)
● Selective rule netlink dump per table
● Faster stateful object lookups by handle (rhashtable)
● Direct calls for built-in extensions from core.
● Rule insertion at relative position from batch
– NFTA_RULE_POSITION_ID
● Match on the “interface kind” string
● Match on tunnel mode: RX or TX.
nftables: userspace
● 236 commits since July 2018: many fixes
● Dynamic set flag:
– nft add set t s1 { type ipv4_addr ; flags dynamic ; }
● Add --literal to print hostnames
● tproxy support
● OS fingerprint support (osf)
● ipsec (xfrm) support
● Priorities as literals
– add chain ip x r { type filter hook prerouting \
priority filter + 10; }
– Display priorities numerically via -y option
nftables: userspace (2)
● honor /etc/services
● json fixes
● Misspelling suggestions:
nft add chain filtre test
Error: No such file or directory; did you mean
table ‘filter’ in family ip?
add chain filtre test
^^^^^^
● igmp support
● iifkind and oifkind support
nftables: userspace (3)
● Native set syntax to replace meters:
table ip x {
set xyz {
type ipv4_addr;
size 65535;
flags dynamic,timeout;
timeout 1h;
}
chain y {
type filter hook output priority filter; policy accept;
update @xyz { ip daddr counter } counter
}
}
nftables: userspace (4)
● Custom conntrack timeout support:

table ip filter {
ct timeout test-tcp {
protocol tcp;
l3proto ip policy = {
established: 180,
close_wait: 10,
close: 10
}
}

chain output {
ip protocol tcp dport 8000 ct timeout set "test-tcp"
}
}
nftlb: Load balancing
● Load balancer userspace daemon: nftlb (Version 0.4)
– https://github.com/zevenet/nftlb
● 4 modes:
– SNAT: actually masquerade + dnat (emulates proxy behaviour)
– DNAT: just dnat
– Direct Server Return (DSR) [IMPROVED]
– Stateless DNAT [NEW]
● Schedulers: Weight, RR, Hash, symhash.
● IPv4/IPv6 support.
● REST API.
● JSON configuration file.
● Automated testbed.
nftlb: Updates
● Automatic DSR configuration.
– Fetch MAC and interfaces from netlink.
● New Stateless DNAT support.
● Support for load balancing with helpers, e.g. FTP.
● Support for packet marking (per farm/backend).
● Blacklists and whitelists.
● Userspace queueing via nfqueue.
● Established connection ratelimiting (connlimit)
● Configurable tuple in hash-based load balancing.
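A minimal sketch of hash-based load balancing with a configurable tuple and weighted backends. It only illustrates the concept and is not how nftlb implements it: the function name, the packet field names, and the use of CRC32 as the hash are all assumptions for the example.

```python
import zlib


def pick_backend(packet, backends, fields=("saddr", "sport")):
    """Pick a backend by hashing the selected packet fields.

    packet   -- dict of header fields (hypothetical names)
    backends -- list of (name, weight) pairs; a higher weight gets more slots
    fields   -- the tuple of fields that feeds the hash
    """
    key = "|".join(str(packet[f]) for f in fields).encode()
    # Expand weights into slots so heavier backends are chosen more often.
    slots = [name for name, weight in backends for _ in range(weight)]
    # CRC32 is a stand-in hash chosen only because it is in the stdlib.
    return slots[zlib.crc32(key) % len(slots)]
```

Packets that agree on the configured fields always map to the same backend, so changing the tuple changes which traffic sticks together.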
Flowtable bypass
Flowtable bypass (2)

For each packet, extract the tuple and perform a lookup in the flowtable.
● Miss: Let the packet follow the classic forwarding path.
● Hit:
– Attach route from flowtable entry (… flowtable is acting as a cache).
– NAT mangling, if any.
– Decrement TTL.
– Send packet via neigh_xmit(...).
● Exceptions (any of them forces the slow path):
– If packet is over MTU, pass it up to classic forwarding path.
– Secpath info is available.
– IP options are present.
● Garbage collector:
– Expire flows if we see no more packets after N seconds.
– TCP reset and fin packets are passed up to slow path.
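The per-packet logic above can be sketched as a small userspace model. This is only an illustration of the decision flow, not the kernel implementation: the class and field names and the 30-second value for N are assumptions, and NAT mangling and the actual transmit via neigh_xmit() are left out.

```python
from dataclasses import dataclass

FLOW_TIMEOUT = 30.0  # hypothetical value for "N seconds"


@dataclass
class Packet:
    tuple5: tuple  # (src, dst, sport, dport, proto)
    length: int
    ttl: int
    tcp_fin_or_rst: bool = False
    has_ip_options: bool = False
    has_secpath: bool = False


@dataclass
class FlowEntry:
    out_dev: str  # cached route, reduced to an output device name
    mtu: int
    last_seen: float


class Flowtable:
    def __init__(self):
        self.flows = {}

    def add(self, tuple5, out_dev, mtu, now):
        self.flows[tuple5] = FlowEntry(out_dev, mtu, now)

    def process(self, pkt, now):
        """Return ("fast", device) on a hit, else ("slow", reason)."""
        entry = self.flows.get(pkt.tuple5)
        if entry is None:
            return ("slow", "miss")
        # Exceptions: any of these forces the classic forwarding path.
        if pkt.length > entry.mtu:
            return ("slow", "over-mtu")
        if pkt.has_secpath:
            return ("slow", "secpath")
        if pkt.has_ip_options:
            return ("slow", "ip-options")
        if pkt.tcp_fin_or_rst:
            return ("slow", "fin-or-rst")
        pkt.ttl -= 1  # fast path: decrement TTL, then transmit
        entry.last_seen = now
        return ("fast", entry.out_dev)

    def gc(self, now):
        """Expire flows idle for more than FLOW_TIMEOUT; return the count."""
        expired = [k for k, e in self.flows.items()
                   if now - e.last_seen > FLOW_TIMEOUT]
        for k in expired:
            del self.flows[k]
        return len(expired)
```

In this model a flow only stays on the fast path while packets keep arriving; once the garbage collector removes the entry, the next packet is a miss and takes the classic path again.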
Flowtable bypass (3)
● Configure flow bypass through one single rule:
table ip x {
flowtable f {
hook ingress priority 0; devices = { eth0, eth1};
}
chain y {
type filter hook forward priority 0;
ip protocol tcp flow add @f
}
}
● Conntrack entries are owned by the flowtable:
# cat /proc/net/nf_conntrack
ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392
dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443
dport=36392 [OFFLOAD] mark=0 zone=0 use=2
Flowtable bypass (4)
● Flow offload forward PoC (from 2018) in software is ~2.75x faster:
● pktgen_bench_xmit_mode_netif_receive.sh to dummy device to exercise the forwarding path
– One single CPU
– Smallest packet size (worst case)

● Performance numbers:
– Classic forwarding path (baseline): 1848888 pps
– Flow offload forwarding: 5155382 pps
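As a quick check of the numbers above, the speedup and the implied per-packet cost can be derived directly from the two packet rates. The variable names are only for illustration. The computed ratio comes out at roughly 2.8, close to the approximate figure quoted on the slide.

```python
# Packet rates taken from the slide (single CPU, smallest packet size).
baseline_pps = 1_848_888
offload_pps = 5_155_382

# Ratio between the two forwarding paths.
speedup = offload_pps / baseline_pps

# Average time spent per packet, in nanoseconds.
ns_per_packet_baseline = 1e9 / baseline_pps
ns_per_packet_offload = 1e9 / offload_pps

print(f"speedup: {speedup:.2f}x")
print(f"baseline: {ns_per_packet_baseline:.0f} ns/packet")
print(f"offload:  {ns_per_packet_offload:.0f} ns/packet")
```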
Flowtable bypass (5)

● Upstream since 4.16 (January 2018).
● Recent patches:
– Tear down feature: send flows back to slow path
● RST and FIN packets.
● Limited pickup time.
● Only for TCP and UDP for now.
– Fix offloading of SNAT+DNAT flows
– Fix: Don’t remove offload when other netns's interface is down.
– Fix interaction with VRF.
– Fix interaction with helpers and /proc/sys/net/netfilter/nf_conntrack_helper set to 1.
– Attach dst to skbuff.
Flowtable bypass (6)
● Hardware offload infrastructure (~200 LOC) available at public git branch:
– https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git/log/?h=flow-offload-hw-v3
● Not yet upstream, waiting for a driver :-(
● User must explicitly set the “offload” flag to enable hardware offload.
● New ndo hook for offloads, or generalise an existing ndo for this purpose.
Flowtable bypass (7): IPsec
● Patch to add IPsec support (not tested):
– https://patchwork.ozlabs.org/patch/982747/
● Setup entry in flowtable from first packet.
– Needs explicit configuration from user.
Policy HW offload

https://lwn.net/Articles/766695/
Policy HW offload (2)
● Top-level flow_rule:
– flow_match
● flow_dissector (enum FLOW_DISSECTOR_KEY_*)
● mask (opaque)
● key (opaque)
– flow_action
● num_actions
● actions[]
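The layout above can be modelled in a few lines to show how a match and its actions fit together. This is a simplified userspace sketch, not the kernel structures: mask and key are opaque blobs in the real API, whereas here they are plain dicts keyed by hypothetical field names, and actions are plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FlowMatch:
    dissector: Set[str]   # which keys are in use (stand-in for FLOW_DISSECTOR_KEY_*)
    mask: Dict[str, int]  # bits that matter for each key
    key: Dict[str, int]   # expected value for each key

    def matches(self, pkt: Dict[str, int]) -> bool:
        """True if every used key matches under its mask."""
        return all(
            (pkt.get(k, 0) & self.mask[k]) == (self.key[k] & self.mask[k])
            for k in self.dissector
        )


@dataclass
class FlowAction:
    actions: List[str] = field(default_factory=list)

    @property
    def num_actions(self) -> int:
        return len(self.actions)


@dataclass
class FlowRule:
    match: FlowMatch
    action: FlowAction
```

For example, a rule with mask 0xFF000000 and key 0x0A000000 on a hypothetical "ipv4_dst" field would match any destination in 10.0.0.0/8, and its action list could hold a single "drop".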
Policy HW offload (3)
● Supported actions (from what drivers can do):
– Accept, drop, trap, goto
– Redirect and mirred
– VLAN: push, pop, mangle
– Tunnel: Encapsulation, decapsulation
– Packet: mangle, add
– Checksum
– Mark
– Wake-on-lan (ethtool)
– Queue (ethtool: packet steering)
Policy HW offload (4)
● Drivers using this infrastructure:
– Broadcom: bnxt (flower), bcm_sf2 (ethtool)
– Chelsio: cxgb4 (flower)
– Intel: i40e, iavf, igb (flower)
– Mellanox: mlx5, mlxsw (flower)
– Netronome: nfp (flower)
– QLogic: qede (ethtool + flower)
