AccelUPF: Accelerating the 5G user plane using programmable hardware
Abhik Bose∗ , Shailendra Kirtikar∗ , Shivaji Chirumamilla∗ , Rinku Shah+ , Mythili
Vutukuru∗
Indian Institute of Technology Bombay∗ , Indraprastha Institute of Information Technology Delhi+
India
{abhik,shailendra,shivaji}@cse.iitb.ac.in,[email protected],[email protected]
ABSTRACT
The latest generation of 5G telecommunication networks are expected to provide high throughput and low latency while catering to diverse applications like mobile broadband, dense IoT, and self-driving cars. A high performance User Plane Function (UPF), the main element in the 5G user plane, is critical to achieving these performance goals. This paper presents AccelUPF, a 5G UPF that offloads functionality to programmable dataplane hardware for performance acceleration. While prior work has proposed accelerating the UPF by offloading its data forwarding functionality to programmable hardware, the Packet Forwarding Control Protocol (PFCP) messages from the control plane that configure the hardware data forwarding rules were still processed in software. We show that only offloading data forwarding and not PFCP message processing leads to suboptimal performance in the UPF for applications like IoT that have a much higher ratio of PFCP messages to data traffic, due to a bottleneck at the software control plane that configures the hardware packet forwarding rules. In contrast to prior work, AccelUPF offloads both PFCP message processing as well as data forwarding to programmable hardware. AccelUPF overcomes several technical challenges pertaining to the processing of the complex variable-sized PFCP messages within the memory and compute constraints of programmable hardware platforms. Our evaluation of AccelUPF implemented over a Netronome programmable NIC and an Intel Tofino programmable switch demonstrates performance gains over the state-of-the-art UPFs for real-world traffic scenarios.

CCS CONCEPTS
• Networks → In-network processing; Programmable networks; Network performance analysis; Mobile networks.

KEYWORDS
5G core, 5G user plane, programmable networks, in-network computation

ACM Reference Format:
Abhik Bose∗, Shailendra Kirtikar∗, Shivaji Chirumamilla∗, Rinku Shah+, Mythili Vutukuru∗. 2022. AccelUPF: Accelerating the 5G user plane using programmable hardware. In The ACM SIGCOMM Symposium on SDN Research (SOSR) (SOSR ’22), October 19–20, 2022, Virtual Event, USA. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3563647.3563651

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SOSR ’22, October 19–20, 2022, Virtual Event, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9892-3/22/10...$15.00
https://doi.org/10.1145/3563647.3563651

1 INTRODUCTION
The mobile packet core connects the wireless radio access network (with base stations and mobile users) to external networks. The packet core consists of several control plane components that process signaling messages from mobile users (e.g., for authentication, setting up sessions to transfer data, handling mobility-related events) and the User Plane Function (UPF) on the data plane that forwards user traffic to and from external networks. The two planes communicate using PFCP (Packet Forwarding Control Protocol) messages that are sent by the control plane to establish, modify, and delete packet forwarding rules in the user plane, as shown in Figure 1. The most recent fifth generation (5G) telecommunication networks aim to support use cases with high throughput (∼1 Gbps/user), very low processing latencies (<1 ms), stringent quality of service (QoS), and diverse traffic characteristics, e.g., enhanced mobile broadband, dense deployments of IoT devices, self-driving cars, AR/VR, high-speed entertainment in a moving vehicle, and delay-sensitive video applications [27, 42]. A high performance and low cost UPF is necessary for meeting these requirements.
Most state-of-the-art UPFs today are built as multicore-scalable software packet processing appliances running over
SOSR ’22, October 19–20, 2022, Virtual Event, USA Abhik Bose, et al.
deletion request, to establish, modify, and delete sessions at the UPF respectively. After processing these messages, the UPF sends back the corresponding response messages over PFCP as well, indicating the status (success or failure) of the request. A single signaling procedure of the UE such as a PDU session establishment can trigger multiple PFCP request/response exchanges between the SMF and the UPF. For example, we show a simplified UE initial PDU session establishment call flow in Figure 2. During this procedure, the SMF first sends a PFCP session establishment request to the UPF to set up the uplink GTP tunnel, and later, after further communication with the base station, sends a PFCP session modification message to set up the downlink GTP tunnel. Several other signaling procedures like the AN release, service request, and handover will also involve one or more PFCP request/response messages exchanged between the SMF and UPF, e.g., to mark a session as idle/active or to switch the tunnel to another base station. The rate of PFCP messages received at a UPF will depend on the applications running on the UEs being served by the UPF. Several new use cases of 5G like dense IoT or high speed mobility are expected to generate a high rate of PFCP messages. For example, a UE running an IoT application will frequently establish sessions, go idle and become active again, while transferring small amounts of data in between, leading to a relatively higher proportion of PFCP messages in its generated traffic.

UPF processing. The UPF primarily handles two types of incoming traffic: PFCP messages that set up, modify, and delete various packet forwarding rules corresponding to UE data sessions at the UPF, and user plane (GTP) traffic that is then handled as per these established rules. There are several types of rules at the UPF, as shown in Figure 2. Packet Detection Rules (PDRs) help match the traffic of a session based on packet header fields, e.g., source/destination IP address/port number, or GTP TEIDs. With each PDR, we have other associated rules that specify the action to be taken on the traffic that matches the PDR: Forwarding Action Rules (FARs) specify the forwarding action to be applied on a packet (e.g., GTP TEIDs to use for encapsulation and decapsulation), QoS Enforcement Rules (QERs) specify the QoS that must be enforced (e.g., maximum bit rate allowed for the session), Buffering Action Rules (BARs) specify buffering requirements when the UE is idle, and Usage Reporting Rules (URRs) specify how usage reporting should be performed for billing and charging. These various PDRs and their associated FARs, QERs, BARs, and URRs are established, modified, and deleted via PFCP messages from the SMF to the UPF. Once these rules are in place at the UPF, user plane GTP traffic is handled by finding a PDR that matches the received packet, and executing the actions specified by the associated FARs, QERs, BARs, and URRs. Prior work that uses programmable data plane hardware to accelerate the UPF [5, 7, 8, 18, 22, 24, 33] processes PFCP messages in the software control plane, installs the various rules in the hardware, and offloads only the GTP user plane traffic handling to the programmable hardware.

PFCP message structure. A PFCP message has a highly complex structure. It carries several Information Elements (IEs), which are used to create, modify, or delete the packet forwarding rules at the UPF. For example, a PFCP session establishment message contains the following IEs [3]: a node identifier of the SMF which sent this message, a unique session identifier (SEID) that identifies the session, followed by one or more IEs to create PDRs, FARs, BARs, QERs, and URRs. While some IEs like the node ID, SEID, and the IEs to create PDR and FAR are mandatory, other IEs are optional and need not always be specified. Furthermore, a PFCP session establishment message can have a variable number of IEs to create PDRs, and each of these PDRs can cross reference the same or different FARs, BARs, and so on. Each of these IEs to create PDRs and other rules has a nested structure with several smaller IEs contained within, which can further have mandatory and optional elements. To complicate things further, the 3GPP standards allow IEs to be present in any order inside a message. Therefore, parsing and processing a PFCP message is a highly complicated operation that is hard to fully implement within the restricted processing available in hardware. This is the reason why no prior work that uses programmable hardware to accelerate the UPF proposes processing PFCP messages in hardware.

Programmable hardware. Before the introduction of programmable data plane hardware, a high performance packet processing network element like the UPF was either developed as a fixed function hardware appliance or as a software packet processing application running over commodity hardware. While a hardware implementation provided higher and more deterministic performance, a software implementation had the benefit of easy programmability to add new features. In contrast to fixed function hardware, programmable dataplane hardware can be easily programmed (and quickly reprogrammed) to perform complex packet processing functions, via code written in a high-level language like P4 [21]. Therefore, programmable data planes provide the best of both worlds, with the performance of a hardware implementation and the flexibility of a software implementation. Packet processing specifications written in a high-level language like P4 are compiled to a variety of targets, e.g., programmable hardware ASICs [1, 2, 36, 37], NPUs [13, 45], and FPGAs [10, 17]. Languages like P4 have several limitations put in place in order to ensure line-rate processing of the software specification. They have limited expressiveness in terms of the supported instruction set and programming constructs. The packets cannot stall during the switch pipeline processing; they have to be either forwarded or dropped. The amount of on-board memory on such hardware is limited in capacity
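The nesting and ordering freedom described under "PFCP message structure" can be illustrated with a small software sketch. It assumes the type-length-value layout of 3GPP TS 29.244 (a 2-byte type followed by a 2-byte length), and the numeric IE type codes and field widths are quoted from memory of that specification, so they should be checked before reuse. All function names are hypothetical. This is ordinary host software, not the AccelUPF hardware parser: the loop and recursion it relies on are exactly the constructs that are restricted on programmable targets.

```python
import struct
from collections import defaultdict

# IE type codes as recalled from 3GPP TS 29.244; treat them as assumptions.
CREATE_PDR, CREATE_FAR = 1, 3
PDR_ID, FAR_ID = 56, 108
GROUPED = {CREATE_PDR, CREATE_FAR}  # IEs whose value is itself a list of IEs


def encode_ie(ie_type, value):
    """Encode one IE as type (2 bytes), length (2 bytes), then the value."""
    return struct.pack("!HH", ie_type, len(value)) + value


def parse_ies(buf):
    """Parse a byte string of IEs into {type: [values]}, in any order.

    Grouped IEs are parsed recursively; other values stay as raw bytes.
    """
    out = defaultdict(list)
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated IE header")
        ie_type, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if offset + length > len(buf):
            raise ValueError("truncated IE value")
        value = buf[offset:offset + length]
        offset += length
        out[ie_type].append(parse_ies(value) if ie_type in GROUPED else value)
    return dict(out)


def pdr_to_far(parsed):
    """Map each PDR ID to the FAR ID it references."""
    mapping = {}
    for pdr in parsed.get(CREATE_PDR, []):
        pdr_id = struct.unpack("!H", pdr[PDR_ID][0])[0]  # assumed 16-bit rule ID
        far_id = struct.unpack("!I", pdr[FAR_ID][0])[0]  # assumed 32-bit rule ID
        mapping[pdr_id] = far_id
    return mapping


def example_message():
    """Two PDRs that share one FAR, with IEs deliberately out of order."""
    far = encode_ie(CREATE_FAR, encode_ie(FAR_ID, struct.pack("!I", 7)))
    pdr1 = encode_ie(CREATE_PDR, encode_ie(FAR_ID, struct.pack("!I", 7))
                     + encode_ie(PDR_ID, struct.pack("!H", 1)))
    pdr2 = encode_ie(CREATE_PDR, encode_ie(PDR_ID, struct.pack("!H", 2))
                     + encode_ie(FAR_ID, struct.pack("!I", 7)))
    return pdr1 + far + pdr2
```

Calling `pdr_to_far(parse_ies(example_message()))` recovers the cross references regardless of the order in which the IEs were encoded, which is the property that makes a fixed-order hardware parser insufficient.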
one has to perform other kinds of complex matching (e.g., a PDR specifies a prefix and we must perform a longest prefix match), such matching cannot be performed using register arrays. AccelUPF handles sessions with such packet rules in the software slowpath. In addition to forwarding actions, AccelUPF must also enforce other rules corresponding to QoS enforcement, buffering data for idle users, and usage reporting. Of such rules, our implementation currently supports the enforcement of a session-wide aggregate maximum bit rate (AMBR) by computing per-flow rates using ingress timestamps and interpacket gaps. Support for more complex policies is deferred to future work. If a session exceeds its configured AMBR, its data packets have to be buffered, which is once again handled in the slowpath.

What if packets belonging to two different sessions hash to the same index within the match array? Our current implementation stores two entries in a hash bucket using dual-width registers and other such mechanisms available via P4 extern units on most programmable switches. We can also use techniques such as multiple hash functions to find alternate indices [38]. However, we will eventually face a hash collision, where two different sessions are contending for the same entry in the register array. AccelUPF handles hash collisions by processing all traffic of the colliding session in the software slowpath.

3.5 Software fallback
Any PFCP message that cannot be handled within the hardware fastpath in AccelUPF is redirected to a software UPF running in host userspace, as shown in Figure 3. Examples of such PFCP messages include: (i) node-related PFCP messages that are not on the critical path of user-perceived performance; (ii) PFCP messages that contain a large number of PDRs and associated action rules, which cannot be easily parsed in hardware; (iii) sessions that require complex algorithms like longest prefix matching to match incoming data traffic to packet forwarding rules; (iv) sessions which hash to the same index in the register arrays. All such PFCP messages are redirected to the slowpath software UPF, which handles them normally, and creates suitable state in the form of packet forwarding rules in software. All subsequent PFCP session modification/deletion messages or GTP user plane packets of this session will also not find a matching rule in hardware and will thus be forwarded to, and correctly processed in, the software slowpath.

How are the session state and packet forwarding rules shared correctly across the hardware fastpath and software slowpath? Note that for most sessions, the state is created, modified, and deleted exclusively either within the hardware fastpath or in software. Therefore, the question of ownership of state is trivial to resolve in most cases. The only tricky scenario is when a session is initially created in hardware, but needs to fall back to the software slowpath midway due to reasons such as: (i) a session that was being handled by the hardware fastpath starts sending data at a rate beyond its configured maximum bit rate, and must fall back to the software for buffering, or (ii) we receive a PFCP session modification request for an already established session that was being handled in the hardware, and this PFCP message has a complex structure, e.g., a rule that requires longest prefix matching or one where multiple PDRs refer to the same FAR, and must therefore be processed in software. For such scenarios, all subsequent PFCP message processing and GTP forwarding of the session must migrate from hardware to software.

This state migration is accomplished as follows. When the hardware realizes that it can no longer process a certain session in the fastpath, it marks all of the session's state in the register arrays as invalid and under migration. All subsequent PFCP messages and user plane packets are forwarded to the software slowpath as a fallback. When the software slowpath receives a PFCP session modification or deletion message, or a GTP user plane packet, but does not find corresponding state in its data structures, it probes the hardware to find if this is a case of a session being migrated from hardware to software after initially being created in hardware. If it finds the corresponding state in the hardware register arrays marked for migration, it copies this state to software, and deletes the corresponding invalid entry in the hardware register array data structures. All subsequent packets of this session will find the session and packet forwarding state in software and will be correctly handled in the slowpath. If the software UPF does not find the state to process a PFCP/GTP packet either in its own software data structures or after probing the hardware data structures, it drops the packet.

We note that an alternate design is possible where we handle complex PFCP messages in software and install the session rules directly to the hardware, allowing us to process future traffic of such sessions in the fastpath. However, this approach requires more frequent updates to the hardware rules from the software slowpath. Considering the hardware-software communication bottleneck and limited performance gains only for a small set of complex PFCP sessions, AccelUPF has not implemented this hybrid approach.

3.6 Fault tolerance of switch state
AccelUPF stores packet forwarding rules in register arrays, which are not persistent across switch failures. While a software UPF, or even a hardware accelerated UPF that only offloads GTP user plane forwarding, can use software-based mechanisms to replicate and persist session state across switch/host failures, AccelUPF cannot rely on software replication for hardware switch state. Therefore, AccelUPF relies
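The hardware-to-software state migration of Section 3.5 can be summarised as a small state machine. The sketch below is a plain Python model under stated assumptions, not the AccelUPF implementation: a dictionary stands in for the hash-indexed register arrays, and the class names, method names, and return strings are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryState(Enum):
    VALID = "valid"
    MIGRATING = "migrating"  # invalid for the fastpath, awaiting copy to software


@dataclass
class HardwareTable:
    """Stand-in for the register arrays, keyed by a session identifier."""
    entries: dict = field(default_factory=dict)  # session_id -> (state, rules)

    def install(self, session_id, rules):
        self.entries[session_id] = (EntryState.VALID, rules)

    def mark_for_migration(self, session_id):
        _, rules = self.entries[session_id]
        self.entries[session_id] = (EntryState.MIGRATING, rules)

    def fastpath_lookup(self, session_id):
        """Return rules only for valid entries; anything else goes to software."""
        entry = self.entries.get(session_id)
        if entry is None or entry[0] is not EntryState.VALID:
            return None
        return entry[1]

    def probe_and_release(self, session_id):
        """Return and delete a migrating entry, or None if there is none."""
        entry = self.entries.get(session_id)
        if entry is None or entry[0] is not EntryState.MIGRATING:
            return None
        del self.entries[session_id]
        return entry[1]


@dataclass
class SoftwareUPF:
    hardware: HardwareTable
    sessions: dict = field(default_factory=dict)

    def handle(self, session_id):
        """Process a slowpath packet; returns 'processed' or 'dropped'."""
        if session_id not in self.sessions:
            rules = self.hardware.probe_and_release(session_id)
            if rules is None:
                return "dropped"  # no state in software or in hardware
            self.sessions[session_id] = rules  # copy migrated state to software
        return "processed"


def dispatch(hw, sw, session_id):
    """Try the fastpath first, then fall back to the software slowpath."""
    if hw.fastpath_lookup(session_id) is not None:
        return "fastpath"
    return sw.handle(session_id)
```

In this model the hardware never pushes state to software. The slowpath pulls it on the first miss, so the migration cost is paid once per migrated session and only when the session actually sends more traffic.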
UPF design             PFCP Tput  PFCP       PFCP        PFCP RTT  GTP Tput  GTP       GTP        GTP RTT
                       (msg/s)    msg/s/USD  msg/s/Watt  (µs)      (Mpps)    Kpps/USD  Kpps/Watt  (µs)
SoftwareUPF            8309       85.51      949.60      40        11.93     17.53     194.77     85
GTPOffload Netronome   1953       6.39       78.12       1470      10.51     17.20     210.20     71
GTPOffload Tofino      499        1.91       31.23       447       11.94     22.91     373.12     49
AccelUPF Netronome     794849     2601.80    31793.96    114       4.83      7.91      96.60      115
AccelUPF Tofino        4389254    16841.26   274328.37   35        11.94     22.91     373.12     49
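To put the PFCP throughput column in perspective, the relative gains can be computed directly from the table. The snippet below is only an arithmetic aid: the throughput values are copied from the table, while the dictionary and function names are ours.

```python
# PFCP throughput (msg/s) copied from the table above.
PFCP_TPUT = {
    "SoftwareUPF": 8309,
    "GTPOffload Netronome": 1953,
    "GTPOffload Tofino": 499,
    "AccelUPF Netronome": 794849,
    "AccelUPF Tofino": 4389254,
}


def speedup(design, baseline):
    """Ratio of PFCP throughput of `design` over `baseline`."""
    return PFCP_TPUT[design] / PFCP_TPUT[baseline]


if __name__ == "__main__":
    for design in ("AccelUPF Netronome", "AccelUPF Tofino"):
        platform = design.split()[1]
        offload = f"GTPOffload {platform}"
        print(f"{design}: {speedup(design, 'SoftwareUPF'):.0f}x vs SoftwareUPF, "
              f"{speedup(design, offload):.0f}x vs {offload}")
```

The same table shows that GTP throughput is unchanged on Tofino (11.94 Mpps with and without PFCP offload) and lower on Netronome (4.83 versus 10.51 Mpps), so the PFCP gain is not free on every platform.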
also use programmable hardware or specialized processing engines to offload some part of the UPF processing to hardware. A few proposals [7, 8, 18, 22, 24] offload the GTP encap/decap based forwarding to hardware, while some [31] offload packet steering to cores via deep packet inspection (DPI) of the inner IP header. Kaloom [5] offloads a subset of QoS processing (bit rate policing) along with GTP processing to the programmable hardware. TurboEPC [40] offloads a subset of 4G core signaling messages to the programmable hardware, but the proposed changes are not standards compliant. uP4 [33] offloads the 5G UPF user plane processing to programmable hardware and uses microservices that run on commodity hardware to process the corresponding PFCP signaling messages.

Our previous position paper [20] evaluated the costs and benefits of multiple 5G UPF designs (with and without hardware offload) and quantified the performance gains of user plane traffic offload. That work also identified the PFCP processing bottleneck of the programmable data plane accelerated UPF when only data handling is offloaded, and proposed offloading PFCP processing to programmable hardware as well.

Control plane offload. Much like AccelUPF, prior work has also proposed offloading the control plane logic of network functions (and not just the data plane) to programmable hardware, and highlighted the challenges with the same. Mantis [48] designs a control plane architecture over programmable switches that can react to data center network conditions within tens of µs to resolve congestion events that are microscopic in duration. Molero et al. [35] achieve line rate for internet routing by processing failure detection, distributed path-vector computations (shortest-path and BGP-like policies), and forwarding state updates, entirely within the data plane. D2R [43] implements fast reroute during network failure by performing route computation without control plane intervention. Lucid [41] presents a framework that simplifies the in-network implementation of control plane constructs such as stateful table data structures, periodic event triggers and event handler processing, packet buffering, traffic shaping, and synchronized state writes. Lucid also proposes a high-level language for writing control function code, and the compiler translates this code to the optimized target code for Intel Tofino switches. AccelUPF is complementary to, and strengthens the case for, frameworks like Lucid.

Fault-tolerance of switch state. With many stateful applications offloaded to the programmable data planes, protecting application state under switch failure conditions and concurrent state access is essential. Prior work has proposed state replication and fault tolerance solutions for such applications, some of which we leverage for fault tolerance of switch state in AccelUPF. NetChain [29] proposes protocols and algorithms that ensure strong consistency and fault-tolerance for an in-network key-value store. Choi et al. [23] and SwiSh [49] introduce new replication protocols for in-network state. RedPlane [30] implements a fault-tolerant state store that ensures consistent application state access even if the switch fails or traffic is rerouted to another switch, while offering two consistency modes: strong consistency and bounded inconsistency.

7 CONCLUSION
This paper presented the design, implementation, and evaluation of AccelUPF, a programmable data plane hardware accelerated 5G user plane function. Prior work on using programmable hardware to accelerate the mobile packet core user plane was restricted to offloading only the GTP user data forwarding functionality to hardware, while continuing to process PFCP messages that configure the packet forwarding rules in software. These designs perform badly when applications frequently reconfigure packet forwarding rules while sending little data in between (e.g., IoT applications), because the software control plane APIs that reconfigure hardware rules have a limited capacity. To overcome this bottleneck, AccelUPF offloads the processing of most PFCP messages to the programmable hardware as well, carefully working around the memory and compute constraints of the hardware platforms when processing the complex PFCP messages. Our experiments show that AccelUPF significantly improves UPF packet processing performance as compared to previous offload-based UPF designs, especially when the traffic has a high proportion of PFCP messages. Our work highlights the challenges in processing a complex protocol like PFCP in programmable hardware. Given the significant performance gains that accrue from processing PFCP messages in programmable hardware at the UPF, our work provides guidance on how the future versions of PFCP for 6G and beyond can evolve to make them amenable to acceleration using programmable dataplane platforms.

ACKNOWLEDGEMENTS
We thank our shepherd Hyojoon Kim, and the anonymous reviewers, for their insightful feedback. We thank the 5G testbed project, funded by the Department of Telecommunications, Govt. of India, for access to the various 5G core components. We thank Dr. Venkanna U. and his research team at IIIT Naya Raipur, especially Suvrima Datta, for providing access to their hardware setup during our initial work. We also thank the Fast Forward Initiative Hardware Grant Program by Intel® Connectivity Research Program (ICRP) for their grant of a programmable switch.
Research (SOSR).
[41] John Sonchack, Devon Loehr, Jennifer Rexford, and David Walker. 2021. Lucid: A Language for Control in the Data Plane. In Proceedings of the ACM SIGCOMM Conference.
[42] Gábor Soós, Ferenc Nándor Janky, and Pál Varga. 2019. Distinguishing 5G IoT Use-Cases through Analyzing Signaling Traffic Characteristics. In 2019 42nd International Conference on Telecommunications and Signal Processing (TSP).
[43] Kausik Subramanian, Anubhavnidhi Abhashkumar, Loris D’Antoni, and Aditya Akella. 2021. D2R: Policy-Compliant Fast Reroute. In Proceedings of the ACM SIGCOMM Symposium on SDN Research (SOSR).
[44] UNSW Sydney. 2021. IOT TRAFFIC TRACES. https://iotanalytics.unsw.edu.au/iottraces.html
[45] Netronome systems. 2020. Agilio CX 2x10GbE SmartNIC. https://www.netronome.com/media/documents/PB_Agilio_CX_2x10GbE-7-20.pdf
[46] Netronome systems. 2022. Agilio CX 2x40GbE SmartNIC. https://colfaxdirect.com/store/pc/viewPrd.asp?idproduct=2871
[47] Pablo B. Viegas, Ariel G. de Castro, Arthur F. Lorenzon, Fábio D. Rossi, and Marcelo C. Luizelli. 2021. The Actual Cost of Programmable SmartNICs: Diving into the Existing Limits. In Advanced Information Networking and Applications.
[48] Liangcheng Yu, John Sonchack, and Vincent Liu. 2020. Mantis: Reactive Programmable Switches. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication.
[49] Lior Zeno, Dan R. K. Ports, Jacob Nelson, Daehyeok Kim, Shir Landau-Feibish, Idit Keidar, Arik Rinberg, Alon Rashelbach, Igor De-Paula, and Mark Silberstein. 2022. SwiSh: Distributed Shared State Abstractions for Programmable Switches. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22).