AI Driven VNF Splitting in O RAN For Enhancin Esmaeil Amiri
Esmaeil Amiri
August 2023
Summary
In this thesis, the focus is on addressing the challenges faced by Radio Access Net-
works (RANs) in adapting to dynamic demands without manual intervention. The
Open RAN (O-RAN) architecture is introduced, which enables programmability,
openness, virtualization, and disaggregation principles. The base station functions
are implemented as Virtual Network Functions (VNF) and split across O-RAN
nodes, including the Radio Unit (RU), Distributed Unit (DU), and Centralized Unit
(CU).
One of the main objectives of this thesis is to achieve load-balanced VNF splitting
by intelligently distributing the workload across CUs and network links, thereby pre-
venting network congestion and overload. This is addressed by proposing a heuristic
algorithm. Additionally, Artificial Intelligence (AI)-based methods are employed to
intelligently manage resource allocation for dynamic VNF splitting with different
objectives. These objectives are robust VNF splitting to minimize frequent VNF
reconfigurations, energy-efficient VNF splitting, and edge-AI empowered dynamic
VNF splitting for network slicing. These objectives are formulated mathematically
and incorporated into Deep Reinforcement Learning (DRL) and federated DRL
frameworks, where reward functions are defined to guide the learning process.
The thesis presents significant contributions in proposing diverse O-RAN sys-
tem designs, and evaluating the proposed solutions using abstract and real network
topologies. The simulation results demonstrate that the heuristic solution effectively
achieves load balance, with a small gap of ≤ 2% compared to optimal solutions for
small network scales. Moreover, by fine-tuning the AI hyperparameters, the perfor-
mance gap of the edge-AI enabled solution can be reduced by 3% compared to the
optimal solution. The proposed solution for robust dynamic VNF splitting reduces
the overhead of VNF reconfigurations by up to 76%, with a minor increase of up
to 23% in computational costs. Additionally, the solution for energy-efficient VNF
splitting achieves noteworthy energy savings, with up to 56% reduction compared
to non-VNF splitting solutions.
Acknowledgements
I am deeply grateful to my supervisor, Prof. Ning Wang, for their guidance, exper-
tise, and unwavering support throughout my journey. I extend my sincere thanks
to Dr. Mohammad Shojafar for their valuable insights and encouragement.
A special appreciation goes to Prof. Rahim Tafazolli, Head of the 5G/6G Centre,
for their visionary leadership. I also thank my family and friends for their constant
encouragement.
This work would not have been possible without the collective contributions of
these individuals, and for that, I am truly thankful.
Contents
1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivations and Objectives . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Technical Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Training Attended . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 List of Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.1 Network Constraints . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.3 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 The Proposed Heuristic Solution . . . . . . . . . . . . . . . . . . . . . 33
3.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.4.1 Decision Variables . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4.2 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . 69
5.5 Seq2Seq-A2C Decision Algorithm . . . . . . . . . . . . . . . . . . . . 70
5.5.1 Basics of A2C and Seq2Seq Model . . . . . . . . . . . . . . . . 70
5.5.2 Seq2Seq-A2C . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.6.2 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.6.3 Comparison Benchmarks . . . . . . . . . . . . . . . . . . . . . 76
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 The architecture of the proposed Seq2Seq-A2C algorithm. . . . . . . . . . 72
5.3 Normalized Traffic (NT) condition for business and residential area [3]. . . 75
5.4 Training of the proposed DRL: (a) Normalized Energy Consumption
(NEC), and (b) Normalized Penalty Cost (NPC). . . . . . . . . . . . 76
5.5 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for residential area week-day traffic . . . . . . . . . . . . . . . . . . 77
5.6 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for business area week-day traffic . . . . . . . . . . . . . . . . . . . 77
5.7 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for residential area weekend traffic . . . . . . . . . . . . . . . . . . 77
6.1 The proposed O-RAN system model architecture leverages federated learn-
ing at the edge network to enable dynamic VNF splitting for network slices. 83
6.2 The aggregated fronthaul link traffic dataset used in our simulations,
representing the traffic for eMBB slices of each agent . . . . . . . . . 87
6.3 Training process of F-DQN vs L-DQN for agent 1 and 2 . . . . . . . 87
6.4 Training process for agent 1 with varying penalty coefficients (P) . . 88
6.5 Training process for agent 2 with varying penalty coefficients (P) . . 89
6.6 Performance gap comparison between different learning rates for F-
DQN, C-DQN, and L-DQN approaches . . . . . . . . . . . . . . . . . 89
6.7 VNF reconfigurations with varying α values . . . . . . . . . . . . . . 90
6.8 Edge sites utilization with varying α values . . . . . . . . . . . . . . . 90
List of Tables
Introduction
1.1 Overview
In this chapter, we delve into the motivations and objectives of the thesis, explor-
ing the significant challenges that have arisen in traditional Radio Access Networks
(RANs) and the potential solutions presented by the Open RAN (O-RAN) archi-
tecture. The ever-increasing demand for data and the proliferation of connected
devices have placed tremendous strain on traditional RANs, calling for a more flexi-
ble, scalable, and efficient approach. O-RAN, with its virtualization and disaggrega-
tion techniques, provides a promising solution to meet these demands and optimize
network operations.
In this thesis, we set forth a wide-ranging set of objectives that span critical as-
pects of O-RAN optimization. At the forefront of our approach, we acknowledge the
significance of traditional optimization methods, which include heuristic algorithms,
in addressing the complexities of RAN optimization. These time-tested techniques
form a sturdy foundation for our research, offering reliable and efficient solutions to
enhance the performance of O-RAN networks.
Furthermore, the thesis explores the application of AI-driven network manage-
ment and machine learning/artificial intelligence (ML/AI) techniques to enhance
overall network performance. By leveraging these advanced methodologies, we aim
to optimize decision-making processes and adaptively allocate resources in real-time,
ultimately leading to improved efficiency and responsiveness in O-RAN environ-
ments.
1.2 Motivations and Objectives
Radio Access Networks (RANs) traditionally rely on integrated, inflexible hardware
at base stations, posing challenges in quickly reconfiguring networks to adapt to dy-
namic demands without manual intervention. To address this, the Open RAN (O-
RAN) architecture introduces programmability, openness, virtualization, and disag-
gregation principles [5]. Disaggregation involves separating base station functions
into different O-RAN nodes: the Open Radio Unit (O-RU), Open Distributed Unit
(O-DU), and Open Centralized Unit (O-CU). These nodes feature open interfaces
for seamless interoperability among vendors. Virtual Network Functions (VNFs)
now implement legacy RAN operations, enabling adaptability to dynamic network
environments. The RAN Intelligent Controller (RIC) acts as a centralized decision-
maker based on real-time or non-real-time network intelligence.
RAN disaggregation empowers operators to determine VNF split points, decid-
ing which functions reside in O-CUs or O-DUs. This flexibility caters to diverse
requirements in 5G and beyond. However, choosing optimal split points presents
technical challenges due to variations in delay, bandwidth, and computational load
across O-CUs and O-DUs [4, 6]. Fixed split points struggle to sustainably optimize
performance in dynamic traffic conditions, where data processing loads and traffic
demands fluctuate along the network links. Recent studies have introduced an effi-
cient optimization scheme that addresses various objectives, including mobility man-
agement and resource allocation, while considering optimized VNF placement and
traffic steering in O-RAN environments. Nonetheless, more adaptive solutions are
needed to enable dynamic re-optimization of traffic and load distributions through
VNF migration and dynamic traffic steering between O-CUs and O-DUs. It is crucial
to maintain moderate network reconfiguration operations to avoid excessive costs
and traffic instability.
To address the challenges mentioned earlier, a combination of AI-driven net-
work management, ML/AI techniques, and traditional optimization methods, such
as heuristic algorithms, can be employed. AI-driven network management lever-
ages advanced analytics, automated decision-making, and adaptive learning to opti-
mize resource allocation, traffic management, and network slicing in real-time. The
objectives of this thesis, focused on enhancing VNF splitting in O-RAN, can be
summarized as follows:
• Load Balance VNF Splitting: The goal is to distribute the workload across
CUs and network links to prevent network congestion and overload on O-RAN
nodes. A heuristic solution will be employed to intelligently allocate resources
and optimize traffic distribution, ensuring effective load balancing.
Moreover, as part of our solution, we leverage the power of Artificial Intelligence
(AI). By incorporating AI techniques into our approach, we can effectively tackle
the complex constraints associated with VNF splitting in O-RAN. AI enables us
to dynamically optimize processing capacity, satisfy end-to-end delay requirements,
and efficiently manage mid-haul link bandwidth. This strategic integration of AI
empowers our solution to adapt in real-time to varying network conditions and
demands, thus achieving improved network efficiency and overall performance. Here
are the technical challenges associated with each objective:
• Edge AI Empowered Dynamic VNF Splitting (Chapter 6):
– Performance evaluation shows that the gap between optimal and heuristic
solutions does not exceed 2%.
– An in-depth analysis of different centralization levels shows that using
multi-CUs could reduce the total bandwidth usage by up to 20%.
– The network constraints defined in this chapter serve as a reference point
for the subsequent chapters.
– We evaluate the impact of centralizing computing resources on network
performance and vary the weights of different terms in the multi-objective
optimization problem.
– Our performance evaluation highlights the significant improvements of
the proposed solution, including a reduction of up to 76% in the over-
head of VNF reconfigurations, despite a slight increase of up to 23% in
computational costs.
– When compared to the most robust O-RAN system that doesn’t require
VNF reconfigurations, which is Centralized RAN (C-RAN), our solution
offers up to 76% savings in bandwidth while showing up to 27% overpro-
visioning of CPU.
– The proposed DRL-based solution in this chapter serves as a reference
point for the subsequent chapters.
approach. Our approach involves each edge site acting as an independent
agent, utilizing a federated DRL approach for dynamic VNF splitting in
each network slice. By training models locally, we accelerate the decision-
making process, reduce latency, improve responsiveness, and eliminate
single points of failure. The agents then share their local models with
a near-real-time RIC (near-RT RIC), which aggregates them to create a
global model. This federated approach enhances agent performance while
maintaining decentralized control.
– The objective of dynamic VNF splitting is to maximize resource utiliza-
tion at edge sites while minimizing the overhead of VNF reconfigurations
due to dynamic traffic conditions.
– The performance evaluation shows the superiority of the proposed solu-
tion over distributed DRL. We explore network Key Performance Indica-
tors (KPIs) by adjusting the reward function weighting factor in DRL.
Additionally, fine-tuning the learning rate narrows the performance gap
with the optimal solution by 3%.
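The aggregation step of the federated approach described above can be illustrated with a minimal sketch. It assumes the near-RT RIC combines the agents' local model parameters with a sample-weighted mean (the FedAvg rule); the aggregation rule actually used in Chapter 6 is not stated in this chapter, and the names `federated_average`, `local_models`, and `num_samples`, as well as all numeric values, are hypothetical.

```python
def federated_average(local_models, num_samples):
    """Sample-weighted average of per-agent parameter vectors (FedAvg-style).

    local_models: list of equally long lists of floats, one per edge agent.
    num_samples:  number of local training samples behind each model.
    """
    total = float(sum(num_samples))
    n_params = len(local_models[0])
    return [
        sum(n / total * model[i] for model, n in zip(local_models, num_samples))
        for i in range(n_params)
    ]


# Two hypothetical edge agents; agent 2 trained on three times more samples,
# so its parameters carry three times the weight in the global model.
agent_1 = [1.0, 2.0, 0.5]
agent_2 = [3.0, 4.0, 1.5]
global_model = federated_average([agent_1, agent_2], num_samples=[100, 300])
```

In a full training loop, the global model would be sent back to the agents, which continue training locally before the next aggregation round.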
• Chapter 3: The focus of this chapter is on load balancing VNF splitting across
CUs and network links, taking into account delay and bandwidth requirements.
• Chapter 5: In this chapter, we propose an innovative energy-efficient RAN
disaggregation and virtualization method for O-RAN. This method effectively
tackles the challenges posed by dynamic traffic conditions.
• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, and Rahim Tafazolli. “Energy-Aware
Dynamic VNF Splitting in O-RAN Using Deep Reinforcement Learning.”
IEEE Wireless Communications Letters (2023) [7].
• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, and Rahim Tafazolli. “Edge-AI
Empowered Dynamic VNF Splitting in O-RAN Slicing: A Federated DRL
Approach.” IEEE Communications Letters (2023), (Submitted on Nov. 2023).
1.8 Summary
In this chapter, we delved into the motivations, objectives, technical challenges,
and key technical contributions of the thesis. The motivations arise from the chal-
lenges faced by traditional RANs and the opportunities presented by the O-RAN
architecture, which introduces virtualization, disaggregation, and AI-driven network
management. The objectives revolve around load-balancing VNF splitting, robust
VNF splitting, energy-efficient VNF splitting, and delegation decision-making. Each
objective poses technical challenges, such as network modeling, problem formulation,
and AI integration. The thesis addresses these challenges through the proposal of O-
RAN system designs, the development of heuristic algorithms, and the exploration
of centralized and distributed AI approaches. The proposed solutions are evaluated
using both abstract and real network topologies, demonstrating remarkable results
and the effectiveness of the approaches.
Chapter 2
2.1 Overview
In this chapter, we examine Open Radio Access Networks (O-RAN) and the potential
of incorporating Machine Learning (ML) and Artificial Intelligence (AI) techniques
within this architecture. We begin with an overview of the O-RAN architecture in
Section 2.2, highlighting its key components and functionalities. Next, in
Section 2.3, we discuss software-based ML techniques, covering the approaches and
algorithms that can be applied within the O-RAN context.
Section 2.5 focuses on the practical applications of ML and AI within the O-RAN
framework. Here, we explore how ML algorithms can be leveraged to enhance various
aspects of O-RAN operations, such as traffic classification, routing optimization,
QoS/QoE prediction, and resource management. We delve into the specifics of each
application, highlighting relevant research papers and methodologies.
2.2 O-RAN
Legacy Radio Access Networks (RANs) [10] have traditionally relied on black-box
hardware deployed at base stations. In this setup, network functions are tightly
integrated physically, making it challenging to quickly reconfigure the network in
response to changing demands without manual operations at the site [11–15]. How-
ever, the emergence of the Open RAN (O-RAN) solution has brought about a new
architecture that addresses these limitations.
Figure 2.1: O-RAN architecture overview [1]
The O-RAN architecture [16], as shown in Fig. 2.1, is built on the principles
of programmability, openness, virtualization, and disaggregation. Disaggregation
refers to the separation of base station functions into different types of O-RAN
nodes: the Open Radio Unit (RU), Open Distributed Unit (DU), and Open Centralized
Unit (CU). The CU further enhances the disaggregation by dividing the
control plane (CU-CP) and user plane (CU-UP) functionalities. The CU-CP han-
dles control and management tasks, including radio resource management, handover
decisions, and network optimization. It acts as the intelligent controller that orches-
trates and coordinates the various elements of the network. The CU-UP, on the
other hand, focuses on processing user data traffic, providing functions such as en-
cryption, decryption, and traffic routing. These nodes are designed to have open
interfaces between them, ensuring interoperability across vendors. By decoupling
the functions and introducing standardized interfaces, O-RAN enables greater flexibility
and vendor diversity in RAN deployments. The main O-RAN interfaces, as
shown in Fig. 2.1, are:
• O2 Interface: Connects the SMO to the O-RAN O-Cloud, managing infrastructure
and deployment services. Infrastructure management handles cloud
infrastructure deployment and management, while deployment management
oversees life cycle management on the cloud infrastructure.
With the adoption of the O-RAN architecture, the operations of legacy RANs
are now implemented through Virtual Network Functions (VNFs). Softwarization
of RAN functions offers exciting opportunities to program VNFs to dynamically
respond to changes in the network environment. This programmability allows for
adaptive configurations, enabling operators to optimize network performance and
resource allocation based on real-time conditions.
The RIC plays a pivotal role in O-RAN deployments, enabling intelligent man-
agement and decision-making. Serving as a centralized software controller, the RIC
leverages its equipped network intelligence to make informed decisions, catering to
both real-time and non-real-time scenarios. This allows for dynamic and adaptive
network operations within the O-RAN architecture.
Software-Defined Networking (SDN) [17–30] and the RIC are closely intercon-
nected. SDN revolutionizes network management by separating the control plane
from the data plane, providing flexibility and programmability. By leveraging SDN
principles, RIC optimizes network performance, orchestrates resources efficiently,
and facilitates the deployment of innovative services.
The RIC acts as a coordination point, facilitating efficient management and or-
chestration of the diverse O-RAN nodes. By analyzing network data and leveraging
ML algorithms, the RIC can intelligently adapt and respond to changing network
conditions, ensuring optimal utilization of network resources.
In O-RAN deployments, the RIC architecture consists of two distinct compo-
nents: the near-real-time RIC (near-RT RIC) and the non-real-time RIC (non-RT
RIC). The near-RT RIC focuses on real-time decision-making, enabling quick re-
sponses to dynamic network events. It handles critical operations that require im-
mediate actions, such as load balancing, congestion control, and network slicing. The
near-RT RIC ensures efficient and reliable real-time network operation, enhancing
the overall user experience.
On the other hand, the non-RT RIC is responsible for non-real-time decision-
making processes. It handles tasks that require a broader network view and long-
term planning, such as network optimization, capacity planning, and policy enforce-
ment. The non-RT RIC utilizes historical data and predictive analytics to make
strategic decisions that optimize network performance and resource allocation over
an extended time horizon.
In addition to the RIC platforms themselves, O-RAN deployments can incorporate
third-party applications known as xApps and rApps. xApps run on top of the
near-RT RIC and use its network intelligence and interfaces to provide specialized
functionalities. These can include intelligent traffic steering, network slicing
management, or interference mitigation. xApps allow operators to customize and
enhance the capabilities of the RIC to suit their specific network requirements.
rApps are software applications that run on the non-RT RIC, which resides in the
SMO framework. They operate on longer time scales than xApps and focus on tasks
such as policy guidance, data analytics, and ML model training, and they steer the
near-RT RIC through policies delivered over the A1 interface.
By embracing the O-RAN architecture, operators can benefit from increased
agility, interoperability, and flexibility. The programmability and disaggregation of
network functions allow for more rapid deployment of new services, improved scal-
ability, and efficient resource utilization. Moreover, the open interfaces promote
competition among vendors, leading to innovation, cost reduction, and faster tech-
nology evolution in the RAN domain.
Figure 2.2: VNF split options in O-RAN [2]
However, selecting the optimal split point poses technical challenges due to vari-
ations in delay, bandwidth, and computational load between CUs and DUs [2]. The
characteristics and requirements of different services may vary significantly, making
it difficult to determine a fixed split point that can consistently optimize network
performance under dynamic traffic conditions. The dynamic nature of data process-
ing loads at CUs and DUs, as well as the traffic demand along the network links
connecting them, further complicates the optimization process [4, 6].
In recent research, an efficient optimization scheme has been proposed to deter-
mine the optimized placement of VNFs and traffic steering in O-RAN environments,
with a specific focus on load balancing [6]. This work aims to achieve an optimal
distribution of traffic and loads by strategically placing VNFs and steering traffic
between CUs and DUs. By dynamically adapting to changing network conditions,
this approach enhances overall network performance and resource utilization.
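To make the idea of load-balanced placement concrete, the following is a minimal greedy sketch. It is not the scheme of [6] nor the heuristic proposed later in this thesis; it only assumes that each flow has a CPU demand and a midhaul bandwidth demand, and that each CU has a processing capacity and a link capacity. The function `place_flows` and all numbers are hypothetical.

```python
def place_flows(flows, cu_capacity, link_capacity):
    """Greedily assign each flow to the feasible CU with the lowest resulting
    CPU utilization.

    flows:         list of (name, cpu_demand, bandwidth_demand) tuples.
    cu_capacity:   dict CU name -> processing capacity.
    link_capacity: dict CU name -> capacity of the midhaul link towards that CU.
    """
    cu_load = {cu: 0.0 for cu in cu_capacity}
    link_load = {cu: 0.0 for cu in link_capacity}
    placement = {}
    # Placing the heaviest flows first usually gives a more even final load.
    for name, cpu, bw in sorted(flows, key=lambda f: f[1], reverse=True):
        feasible = [
            cu for cu in cu_capacity
            if cu_load[cu] + cpu <= cu_capacity[cu]
            and link_load[cu] + bw <= link_capacity[cu]
        ]
        if not feasible:
            placement[name] = None  # no CU can host it; the flow stays at the DU
            continue
        best = min(feasible, key=lambda cu: (cu_load[cu] + cpu) / cu_capacity[cu])
        cu_load[best] += cpu
        link_load[best] += bw
        placement[name] = best
    return placement, cu_load


flows = [("f1", 4.0, 10.0), ("f2", 3.0, 10.0), ("f3", 2.0, 10.0), ("f4", 1.0, 10.0)]
placement, cu_load = place_flows(
    flows,
    cu_capacity={"CU1": 10.0, "CU2": 10.0},
    link_capacity={"CU1": 30.0, "CU2": 30.0},
)
```

With these example values, both CUs end up with the same processing load, which is the balanced outcome the optimization aims for.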
However, while existing optimization schemes provide initial solutions, there
is a need for more adaptive and dynamic approaches to achieve continuous re-
optimization of traffic and load distributions. This can be accomplished through
the migration of O-RAN VNFs and dynamic traffic steering between CUs and DUs.
By constantly monitoring network conditions, such as traffic patterns and resource
utilization, operators can make informed decisions to dynamically reconfigure the
network and optimize performance in real-time.
It is important to note that network reconfiguration operations should be conducted
with moderation to avoid excessive reconfiguration costs and traffic instability.
Frequent and drastic reconfigurations may lead to disruptions in service,
suboptimal resource allocation, and additional operational expenses. Therefore, a
careful balance needs to be struck between achieving dynamic re-optimization and
maintaining network stability and reliability.
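One simple way to express this balance is a hysteresis rule: reconfigure only when the expected saving, net of the migration cost, exceeds a threshold. The sketch below is an illustration under that assumption; the function `should_reconfigure`, the cost units, and the 10% threshold are hypothetical and are not taken from the thesis.

```python
def should_reconfigure(current_cost, candidate_cost, migration_cost, min_gain=0.1):
    """Return True only if the relative net saving exceeds min_gain.

    current_cost:   cost of keeping the current VNF split.
    candidate_cost: cost of operating under the candidate split.
    migration_cost: one-off cost of moving VNFs to the candidate split.
    """
    if current_cost <= 0:
        return False
    net_saving = current_cost - candidate_cost - migration_cost
    return net_saving / current_cost > min_gain


# A marginal improvement (3% net) is ignored; a substantial one (25% net) triggers
# a reconfiguration.
small_gain = should_reconfigure(100.0, 95.0, 2.0)
large_gain = should_reconfigure(100.0, 70.0, 5.0)
```

Raising `min_gain` trades some optimality for fewer reconfigurations, which mirrors the stability concern described above.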
To address these challenges, ongoing research is exploring various innovative
approaches. Here are eight different splits proposed in the literature:
• Functional Split: This approach involves the differentiation between user plane
and control plane functions, allowing for more specialized and optimized pro-
cessing. By separating these functions, the network can achieve better scala-
bility, flexibility, and efficiency. Research studies have explored optimal func-
tional splits considering various factors, such as traffic demands, latency re-
quirements, and resource utilization [36–38].
Service-based splitting allows for customized resource allocation to meet di-
verse service needs. Researchers have investigated the benefits of service-based
splitting for different 5G and 6G use cases [44–46].
By exploring these split options and developing advanced algorithms and mech-
anisms, operators can achieve more granular control over their networks, optimize
performance, and effectively address the challenges posed by dynamic traffic condi-
tions in disaggregated RAN environments.
• Supervised learning: This is a type of ML technique where the learning
process is guided by labeled data. In the training step, the goal is to discover
an unknown function f : x → y that maps input data to known output labels.
This process requires a labeled training dataset to train the model. In the
testing step, when a new input is provided to the system, the learned function
f is used to predict the expected output. Commonly used supervised learning
algorithms include Logistic Regression (LR), K-Nearest Neighbors (K-NN),
Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM),
Bayes’ theory, and hidden Markov models.
• DRL: DRL differs from supervised and unsupervised learning as it does not
involve explicit training and testing steps. DRL is focused on an agent learning
an optimal policy through interactions with an environment. The interaction
is defined by the agent taking actions, receiving observations, and obtaining
rewards from the environment. The agent’s goal is to maximize the cumula-
tive rewards over time by learning which actions to take in different states.
Q-learning is one of the most well-known and effective algorithms used in rein-
forcement learning. Deep RL (DRL) as shown in Fig. 2.3 further extends RL
by incorporating deep neural networks as function approximators, enabling the
handling of high-dimensional and complex state spaces. This allows deep RL
Figure 2.3: DRL architecture overview
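The agent-environment loop described above can be sketched with tabular Q-learning, the non-deep precursor of DRL. The toy environment is an assumption made purely for illustration: a chain of four states in which the agent moves left or right and receives a reward of 1 on reaching the last state. A DRL agent would replace the table `q` with a neural network.

```python
import random


def train_q_learning(n_states=4, episodes=2000, alpha=0.5, gamma=0.9,
                     epsilon=0.3, max_steps=100, seed=0):
    """Tabular Q-learning on a chain; action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                action = max((0, 1), key=lambda a: (q[state][a], rng.random()))
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The goal is terminal, so no future value is added there.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


q_table = train_q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(3)]
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest path to the reward.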
2.4 Traditional vs AI/ML Methods for Network
Optimization
Traditional optimization methods, such as heuristic methods, rely on predefined
rules to find a good solution to a problem, whereas ML, and particularly DRL, uses
data and experience to learn a solution through trial and error. ML techniques
such as DRL can learn more complex patterns and may deliver better performance,
but they require large amounts of data and training time. DRL has been successfully
applied to many optimization problems, particularly in dynamic and complex
systems where traditional methods may not be effective. In DRL, the goal is to learn
a policy that maximizes the cumulative reward obtained from an environment, with
neural networks serving as function approximators. This policy is learned iteratively
by interacting with the environment and observing the outcome. Applications of DRL
in a RAN environment include traffic engineering and routing [51], resource
allocation [52–54], and energy consumption [55, 56]. For instance, [51] proposes a
DRL-based method that selects VNF split points to minimize routing and VNF
placement costs. Similarly, the authors of [56] propose a Deep Q-Network (DQN)-based
algorithm for functional splits in O-RAN that minimizes the end-to-end delay and the
deployment cost of CUs/DUs. Overall, while traditional optimization methods can be
useful for simpler problems, DRL has the potential to provide better solutions for
more complex and dynamic problems in RAN environments.
Recently, a novel method called neural combinatorial optimization, which uses
a neural network model to learn near-optimal solutions to combinatorial optimization
problems through DRL, has been proposed [57]. The DRL approach is used
to iteratively update the neural network weights. Studies have shown that neural
combinatorial optimization can achieve near-optimal solutions for typical combina-
torial problems [58, 59]. In recent years, the effectiveness of this approach in solving
various optimization problems in communications and networks has been demon-
strated [51, 60–62].
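As an illustration of how the network weights can be updated, the policy-gradient (REINFORCE) estimator with a baseline is commonly used in this setting. The notation below is an assumption introduced for this sketch and is not taken from the thesis: $s$ is a problem instance, $\pi$ a candidate solution sampled from the policy $p_\theta$, $L(\pi \mid s)$ its cost, $b(s)$ a baseline that reduces variance, and $\eta$ the learning rate.

```latex
\nabla_\theta J(\theta \mid s)
  = \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}
    \Big[ \big( L(\pi \mid s) - b(s) \big)\,
          \nabla_\theta \log p_\theta(\pi \mid s) \Big],
\qquad
\theta \leftarrow \theta - \eta\, \nabla_\theta J(\theta \mid s)
```

Solutions with a cost below the baseline have their probability increased, and those above it have their probability decreased, so the policy gradually concentrates on low-cost solutions.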
2.5 Optimizing O-RAN: Enhancing Efficiency and
Performance
In the context of Open Radio Access Networks (O-RAN), the RIC plays a crucial role in
network management. Similar to the SDN controller, the RIC provides a global view
of the network, enabling efficient management and optimization. ML techniques can
be applied in the O-RAN architecture to enhance network performance and
decision-making. The following subsections explore how ML can be applied in O-RAN.
data plane and control plane by maximizing resource utilization. Resource
allocation in the data plane and control plane is an active research area in O-
RAN, with ML techniques offering promising solutions for optimizing resource
allocation and utilization in a dynamic network environment [40, 52].
on factors such as location, processing power, and energy efficiency, leading to opti-
mized dynamic function splitting and enhanced service delivery to users.
The use of VNF in O-RAN systems leads to the development of different split
points, which refer to the division of network functions between the CU and the
DU. The white paper [2] discusses the different split points for VNF based on 3GPP
terminology, while authors in [49] review the gains and requirements for each split
point option proposed by 3GPP. In [73], the energy consumption variation caused
by different split point options using a real-time prototype is studied, while authors
in [4] propose an optimization problem to design Multi-cloud vRAN, aiming to
find the optimal number of CUs and their locations, as well as the optimal split
point for each flow in the network. The paper [74] proposes an optimal split point
for VNF that maximizes the centralization of vRAN while minimizing computing
resources, using CPLEX [75], which is an optimization tool, to solve the problem.
These studies demonstrate the importance of split point optimization in open RAN
systems to improve performance, energy efficiency, and resource utilization.
Recently, there have been several studies that explored various aspects of O-RAN
optimization, including energy efficiency [76], network slicing [77], resource alloca-
tion [53, 56], and load balancing [6]. In [6], an optimization problem is proposed
to determine the optimal split points for VNF in the O-RAN system for each flow
to balance the load across CUs and midhaul links while taking into account delay,
bandwidth, and computational load on CUs/DUs. To make the 5G architecture
more dynamic, scalable, and flexible, the authors in [40] proposed a hierarchical
RIC that uses Containerized Network Functions (CNFs) instead of VNF, which are
lighter and can be implemented through a microservices architecture. Addition-
ally, [78] proposes FlexNSP, which provides a flexible split point selection for each
network slice in the O-RAN system, and solves the proposed optimization problem
using Gurobi [79]. These studies highlight the importance of O-RAN optimization
for improving various aspects of the system, such as performance, efficiency, and
flexibility.
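A load-balancing split-selection problem of the kind described for [6] can be written, in generic form, as a min-max program. The formulation below is a hedged sketch and not the formulation of [6] or of Chapter 3; every symbol is an assumption: $x_{f,c,s} \in \{0,1\}$ selects CU $c$ and split option $s$ for flow $f$, $p_{f,s}$ and $b_{f,s}$ are the resulting processing and midhaul bandwidth demands, $P_c$ and $B_c$ are the capacities of CU $c$ and of its midhaul link, $d_{f,c,s}$ is the incurred delay, and $D_s$ is the delay bound of split $s$.

```latex
\begin{aligned}
\min_{x,\,\lambda}\quad & \lambda \\
\text{s.t.}\quad
& \sum_{f \in F} \sum_{s \in S} p_{f,s}\, x_{f,c,s} \le \lambda\, P_c
    && \forall c \in C \\
& \sum_{f \in F} \sum_{s \in S} b_{f,s}\, x_{f,c,s} \le \lambda\, B_c
    && \forall c \in C \\
& \sum_{c \in C} \sum_{s \in S} x_{f,c,s} = 1
    && \forall f \in F \\
& d_{f,c,s}\, x_{f,c,s} \le D_s
    && \forall f \in F,\ c \in C,\ s \in S \\
& x_{f,c,s} \in \{0,1\}, \quad 0 \le \lambda \le 1
\end{aligned}
```

Here $\lambda$ is the highest utilization over all CUs and midhaul links, so minimizing it spreads the load as evenly as the delay bounds allow.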
allocation, addressing challenges associated with limited resources, scalability, and
the dynamic nature of the O-RAN environment is essential to attain optimal en-
ergy efficiency. Overcoming these challenges will enable effective management of
resources and maximize energy efficiency while maintaining performance levels in
O-RAN networks.
The study presented in [80] introduces an energy-aware scheduling method for
virtualized Base Stations (vBS) in O-RAN. The method leverages online learning
and adversarial bandit learning to optimize scheduling policies that reduce energy
consumption and improve vBS performance. The proposed approach, implemented
through a Policy Decider application in the non-RT RIC, adapts policies based
on network conditions and user needs. Experimental results based on real-world
traffic traces and testbed measurements demonstrate that the proposed method
outperforms state-of-the-art approaches, achieving energy savings between 35.5%
and 74.3%.
2.5.5 Traffic Steering
Research on O-RAN traffic steering aims to enhance network traffic management
by intelligently directing traffic based on real-time network conditions and user re-
quirements. This involves analyzing various network parameters, such as traffic
load, congestion, and user behavior, to determine the best path for traffic
flow. Traffic steering techniques can dynamically distribute traffic across different
network functions and resources in real-time, resulting in improved network perfor-
mance and reduced latency. ML algorithms have a significant role to play in traffic
steering research, providing real-time intelligence and decision-making capabilities
to optimize traffic flow and enhance overall network performance. However, chal-
lenges related to scalability, complexity, and limited resources need to be addressed
to achieve optimal traffic steering in O-RAN environments.
In the study conducted by Kazemifard et al. [40], the O-RAN architecture sepa-
rates the Control Plane (CP) from the User Plane (UP) using the E1 interface, which
is derived from the software-defined network (SDN) framework. This decoupling al-
lows for increased flexibility in network programming. The CP is implemented in
hierarchical RICs and manages radio resource functions through the A1 and E2 in-
terfaces. The authors propose utilizing hierarchical RICs to minimize the end-to-end
delay of data plane traffic by efficiently placing Containerized Network Functions
(CNFs). CNFs, which are lighter than Virtual Network Functions (VNFs) and can
be implemented using microservices architecture, enable a dynamic, scalable, and
flexible architecture that aligns with the requirements of 5G networks [82].
Furthermore, in the paper [83], the application of ML techniques is explored
to achieve modular and flexible implementations of O-RAN in 6G networks, with
a specific focus on the traffic steering use case and O-RAN xApps. The authors
highlight various ML algorithms suitable for traffic steering, including decision trees,
k-nearest neighbor (KNN), and neural networks. Additionally, they discuss the
potential of DRL in training an agent to make real-time traffic steering decisions.
On the other hand, Erdöl et al. propose a federated meta-learning approach for
traffic steering in O-RAN systems in [84]. This approach facilitates multiple Radio
Access Technologies (RATs) to learn from each other without sharing sensitive data.
The authors present a neural network architecture that employs meta-learning to
adapt to different RATs and acquire the ability to steer traffic in a decentralized
manner.
The utilization of Non-Orthogonal Multiple Access (NOMA) in O-RAN systems
to enhance radio resource efficiency is proposed in the paper by Akhtar et al. [85].
They present a resource allocation algorithm that adaptively assigns radio resources
by considering the traffic demand and channel conditions of users. By incorpo-
rating NOMA, multiple users can efficiently share the same radio resources while
maintaining a high-quality user experience.
In the paper by Kavehmadavani et al. [86], a traffic steering approach is intro-
duced to ensure effective coexistence between eMBB and uRLLC services within
O-RAN systems. They address this through a multi-objective optimization problem
formulation and a traffic steering algorithm that dynamically directs traffic based on
the network load and user requirements, while also guaranteeing a minimum Quality
of Service (QoS) level for both eMBB and uRLLC users.
2.6 Summary
This chapter provided an overview of the O-RAN architecture, highlighting disag-
gregated nodes (RU, DU, and CU), and explored ML/AI techniques’ applications in
traffic classification, routing optimization, QoS/QoE prediction, and resource man-
agement within O-RAN deployments. While these efforts have optimized network
performance and user experiences, there are gaps in dynamic VNF splitting in O-
RAN environments.
One critical gap is the lack of simultaneous load balancing across RAN nodes and
links, leading to network congestion and server overloading. Existing research has
focused on specific aspects like energy efficiency and resource allocation, necessitat-
ing a more holistic approach. To address this, our thesis proposes a comprehensive
and integrated solution employing multi-objective optimization to enhance resource
utilization and network performance while considering both RAN nodes and links.
Furthermore, another gap is the lack of a solution that minimizes frequent VNF
reconfigurations under dynamic traffic conditions. Such a solution must sustain
operation under varying conditions to mitigate reconfiguration costs and traffic
instability, ensuring a stable network environment. The solution needs to
leverage DRL and advanced neural network
architectures to handle the complexity of dynamic traffic conditions, enabling more
effective on-demand resource management. Additionally, the resulting solution could
be extended to other objectives such as energy efficiency.
Additionally, in the context of dynamic VNF splitting in O-RAN environments,
there exists a significant gap in offloading decision-making processes for changing
VNF splitting under varying traffic conditions to the edge networks. The current
state of research lacks an approach that leverages the potential of federated DRL at
the network edge, which could enable efficient local and real-time decision-making.
By introducing federated DRL, the intelligence and decision-making capabilities
can be distributed across the edge, allowing for more agile and responsive network
management.
In conclusion, this thesis’s contributions lie in addressing the gaps in dynamic
VNF splitting in O-RAN environments through an integrated and comprehensive
approach. By considering load balancing, employing DRL for real-time decisions,
and ensuring robustness under varying traffic conditions, we aim to enhance O-RAN
system performance while advancing the field of AI-driven network management.
Chapter 3
3.1 Overview
In this chapter, we propose an optimization problem for selecting split points in
the O-RAN architecture. We start by introducing a system model that effectively
captures the VNF placement in the O-RAN setup. Subsequently, we formulate the
problem with the primary objective of achieving load balancing between CUs and
midhaul links while considering essential factors like delay and bandwidth require-
ments, as well as the computing capacity of DUs/CUs.
To assess the complexity of the optimization problem, we demonstrate its NP-
hardness, highlighting the inherent challenges in finding an optimal solution. To
tackle this, we devise a novel heuristic algorithm that efficiently solves the problem
while maintaining a performance gap of less than 2% compared to the optimal
solutions.
Furthermore, we conduct performance evaluations considering different central-
ization levels, which represent the distribution of total computing capacity across
DUs/CUs. Our results indicate that with multiple CUs, there can be up to a 20%
reduction in total bandwidth usage, demonstrating the potential benefits of decen-
tralized computing resources. Additionally, by implementing multipath routing,
we improve load balancing between midhaul links, albeit with a slight increase in
bandwidth utilization.
In conclusion, our proposed optimization approach addresses the challenges of
split point selection in O-RAN, aiming to achieve efficient load balancing and re-
source utilization. Through our heuristic algorithm and performance evaluations,
we demonstrate the effectiveness of the proposed solution, paving the way for more
efficient and responsive O-RAN networks.
3.2 Motivation
The motivation behind this chapter’s exploration and proposal of an optimization
problem for split point selection within the O-RAN architecture is rooted in ad-
dressing critical challenges within modern wireless networks. The ongoing evolution
towards O-RAN aims for increased flexibility, efficiency, and cost-effectiveness in
network infrastructures. However, the efficient placement of VNFs and the
optimization of split points pose significant challenges. The primary motivation is to
ensure the effective utilization of resources, such as computing capacity and
bandwidth, and to minimize latency, while achieving load balancing among critical
network elements like CUs and midhaul links.
The C-RAN was proposed to implement the network functions of base stations
in common hardware, which would permit different functional splits [2]. The split
specifications are detailed in [87], while [88] discussed the split requirements and
gains. Despite this, only a few works have optimized split selection [89]. The adap-
tive RAN, which can switch between two different centralized options at runtime,
was examined in [90]. A joint RAN slicing and functional split was developed by
the authors in [91] to optimize centralization degree (CD) and throughput.
None of the works above considers multiple CUs or load balancing between them,
even though load balancing is a vital consideration in vRAN design. The works in
[4] and [92] investigate the minimization of the cost of splits in tree networks
with fixed CUs. The authors of [93] identify CU locations and formulate a
minimization problem for the CU locations. Despite considering multiple CUs, [94]
and [95] do not balance the load between CUs. Co-locating DUs with CUs in [96]
and [73] aims to reduce energy costs; therefore, the assignment decisions do not
affect load balancing. Hence, in this chapter, we deploy multiple CUs alongside
DUs, using the new Open RAN architecture proposed in [97], and balance the load
between the CUs and midhaul links.
Table 3.1: Performance gains of different splits [4]
Split  VNF in O-DU      VNF in O-CU      UL BW (Mbps)   Delay (ms)  DU load (ρds)   CU load (ρcs)
1      f1 → f2 → f3     None             λ              30          ρ1 + ρ2 + ρ3    0
2      f1 → f2          f3               λ              30          ρ1 + ρ2         ρ3
3      f1               f2 → f3          1.02λ + 1.5    2           ρ1              ρ2 + ρ3
4      None             f1 → f2 → f3     2500           0.25        0               ρ1 + ρ2 + ρ3
proposed in [97]. In this architecture, the traffic flows are aggregated at only two
points of the RAN, RUs, and CUs. This means that from RUs to CUs, there is no
traffic aggregation. Due to this assumption, the network functions are placed per
RUs in DUs and CUs.
We consider a chain of network functions (f0 → f1 → f2 → f3). f0 is the RF
network function, which is implemented in RUs, similar to [3, 4], while the remaining
network functions can be placed in the CU or DU. f1 corresponds to the PHY function,
f2 corresponds to the RLC and MAC functions, and f3 corresponds to the PDCP and RLC
functions. Table 3.1 shows that there are four possible splits (S0 to S3). Moving
from S0 to S3, more network functions are placed in the CU. Table 3.1 also shows
that placing more network functions in the CU increases the bandwidth demand. For
example, the bandwidth demand grows from λ Mbps (payload) to 2.5 Gbps from
split S0 to S3. Furthermore, the delay requirement tightens from 30 ms to 0.25 ms.
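As a minimal sketch of how Table 3.1 can be read, the following Python snippet encodes the four rows of the table and evaluates the bandwidth demand, the delay requirement, and the DU/CU load per traffic unit for a given demand λ. The names SPLITS and split_requirements are hypothetical, the rows are indexed 0 to 3 to match the S0 to S3 notation of the text, and any numeric values chosen for ρ1, ρ2, and ρ3 are illustrative assumptions rather than values from the thesis.

```python
# Illustrative encoding of Table 3.1; rows are taken in order and indexed S0..S3.
# "du"/"cu" list the functions (1, 2, 3) hosted in the DU and CU, respectively.
SPLITS = {
    0: {"du": (1, 2, 3), "cu": (), "bw": lambda lam: lam, "delay_ms": 30.0},
    1: {"du": (1, 2), "cu": (3,), "bw": lambda lam: lam, "delay_ms": 30.0},
    2: {"du": (1,), "cu": (2, 3), "bw": lambda lam: 1.02 * lam + 1.5, "delay_ms": 2.0},
    3: {"du": (), "cu": (1, 2, 3), "bw": lambda lam: 2500.0, "delay_ms": 0.25},
}


def split_requirements(s, lam, rho):
    """Return the requirements of split s for a flow with demand lam (Mbps).

    rho maps a function index (1, 2, 3) to its computational load per
    traffic unit; the values passed in are assumptions of the caller.
    """
    row = SPLITS[s]
    return {
        "bandwidth_mbps": row["bw"](lam),
        "delay_ms": row["delay_ms"],
        "rho_du": sum(rho[f] for f in row["du"]),
        "rho_cu": sum(rho[f] for f in row["cu"]),
    }
```

For example, split_requirements(2, 100, {1: 3, 2: 2, 3: 1}) evaluates the third row of the table for a 100 Mbps flow under the assumed loads.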
The O-RAN is modeled as a graph G = (I, E) where I is the set of nodes and
E is the set of edges. The nodes include RUs, DUs, CUs, routers, and a core node
(EPC). As shown in Fig. 3.1, multiple RUs are connected to each DU-n by fronthaul
links. The DUs are connected to CUs by routers and midhaul links, and then CUs
have a direct interface to the core node (EPC). We define M, N , R, and Q as the
set of RUs, DUs, routers, and CUs respectively (M = |M|, N = |N |, R = |R|,
Q = |Q|). We assume that Q ≪ N . The edges include network links, and each link
(i, j) has a capacity (Cij). Each RU-m is connected to CU-q through a set of paths
(Pmq). For each p ∈ Pmq, an end-to-end delay (dpmq) is defined as well. The network
functions (f1, f2, f3) are implemented in DUs and CUs as virtual machines (VMs).
The processing capacities (cycles/s) of DU-n and CU-q are denoted as Hn and Hq,
respectively. It is noted that Hq > Hn. The computational loads (cycles/Mbps) of f1,
f2, and f3 are denoted as ρ1, ρ2, and ρ3 per traffic unit, respectively. Table 3.1
shows the computational loads of the CU (ρcs) and the DU (ρds) resulting from each
split, calculated from ρ1, ρ2, and ρ3. We represent only the downlink traffic case in this chapter. In
this way, the traffic flows are aggregated in RU-m, m ∈ M , with the demand of λm
(Mbps), and the O-RAN needs to provide routes for M different flows. Table 4.3
Figure 3.1: The network architecture overview
Notation       Definition
M, M, m        Set of RUs, number of RUs, RU index
N, N, n        Set of DUs, number of DUs, DU index
Mn             Set of RUs connected to DU-n
Q, |Q|, q      Set of CUs, number of CUs, CU index
R              Set of routers
Cij            Capacity of link (i, j)
Pmq            Set of paths from RU-m to CU-q
dpmq           Delay of path p from RU-m to CU-q
ρcs, ρds       Computational load of CU and DU resulting from each split s
ρ1, ρ2, ρ3     Computational load of f1, f2, and f3
Hn, Hq         Processing capacity (cycles/s) of DU-n and CU-q
λm             Traffic demand of RU-m (Mbps)
CUs as well as midhaul links. In the first place, it leaves free bandwidth on
midhaul links, which avoids network congestion. Secondly, it keeps servers from being
overloaded. The following is the problem formulation for this objective.
\[ \sum_{s=0}^{3} x_{s,m} = 1 \quad \forall m \in \mathcal{M} \tag{3.1} \]
Constraint (3.1) ensures that exactly one split is selected for each RU-m. The
total computational load that the RUs impose on each DU-n as a result of the
selected splits must not exceed the processing capacity of DU-n, hence:
\[ \sum_{m \in \mathcal{M}_n} \sum_{s=0}^{3} \lambda_m x_{s,m} \rho^{d}_{s} \le H_n \quad \forall n \in \mathcal{N} \tag{3.3} \]
where ρds is the computational load at DU-n resulting from split s, Hn is the
processing capacity of DU-n, and λm is the aggregated traffic flow from RU-m to
the core node (EPC).
Similarly, the total computational load that the RUs impose on each CU-q as a
result of the selected splits must not exceed the fraction α of the processing
capacity of CU-q, hence:
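\[ \sum_{m \in \mathcal{M}} \sum_{s=0}^{3} \lambda_m \varphi_{mq} x_{s,m} \rho^{c}_{s} \le \alpha H_q \quad \forall q \in \mathcal{Q} \tag{3.4} \]
\[ 0 < \alpha \le 1 \tag{3.5} \]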
where ρcs is the computational load at the CU resulting from split s, and λm is the
aggregated traffic flow from RU-m to the core node (EPC). α is the maximum load
ratio of the CUs and performs load balancing among CUs. φmq is a binary variable
that indicates whether RU-m is connected to CU-q. It is noted that the aggregated
traffic flow of each RU-m to the core node (EPC) needs to go through only one CU:
\[ \sum_{q \in \mathcal{Q}} \varphi_{mq} = 1 \quad \forall m \in \mathcal{M} \tag{3.6} \]
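Constraints (3.1), (3.3), (3.4), and (3.6) can be checked for a candidate assignment as in the following minimal Python sketch. The function name violated_constraints, the dictionary-based data layout, and the 0 to 3 split indexing are assumptions made for illustration; they are not part of the original formulation.

```python
def violated_constraints(x, phi, lam, du_of, rho_d, rho_c, H_du, H_cu, alpha):
    """Return labels of the constraints that a candidate assignment violates.

    x[m][s]   -- 1 if RU m uses split s (s = 0..3), else 0
    phi[m][q] -- 1 if RU m is served by CU q, else 0 (one entry per CU)
    lam[m]    -- traffic demand of RU m (Mbps)
    du_of[m]  -- DU to which RU m is attached
    rho_d[s], rho_c[s] -- DU and CU load per traffic unit under split s
    H_du[n], H_cu[q]   -- processing capacities of DU n and CU q
    """
    violated = []
    for m in x:
        if sum(x[m]) != 1:                    # (3.1): one split per RU
            violated.append(f"(3.1) RU {m}")
        if sum(phi[m].values()) != 1:         # (3.6): one CU per RU
            violated.append(f"(3.6) RU {m}")
    for n, capacity in H_du.items():          # (3.3): DU processing capacity
        load = sum(lam[m] * x[m][s] * rho_d[s]
                   for m in x if du_of[m] == n for s in range(4))
        if load > capacity:
            violated.append(f"(3.3) DU {n}")
    for q, capacity in H_cu.items():          # (3.4): CU capacity scaled by alpha
        load = sum(lam[m] * phi[m][q] * x[m][s] * rho_c[s]
                   for m in x for s in range(4))
        if load > alpha * capacity:
            violated.append(f"(3.4) CU {q}")
    return violated
```

An empty list means the assignment satisfies these four constraints; the path and delay constraints would need a separate check.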
To send the aggregated traffic flow from RU-m to the core node (EPC), only one
path from available paths from RU-m to CU-q (Pmq ) needs to be selected, hence:
\[ \sum_{p \in \mathcal{P}_{mq}} \phi^{p}_{mq} = 1 \quad \forall m \in \mathcal{M},\; q \in \mathcal{Q} \tag{3.8} \]
where ϕpmq is a binary variable that indicates whether path p is selected for the
connection between RU-m and CU-q. Next, we define rm as the bandwidth demand
between RU-m and CU-q due to split s, calculated as follows:
\[ 0 < \beta \le 1 \tag{3.12} \]
where Ipij indicates whether path p includes link (i, j). This constraint ensures that
the bandwidth demand of the RUs assigned to link (i, j) does not exceed its capacity.
The coefficient β is the maximum utilization rate of the midhaul links and performs
load balancing between them. The delay of the selected path (dpmq) from the set of
available paths (Pmq) needs to meet the delay required for the selected split s (given
in Table 3.1), hence:
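\[ \sum_{q \in \mathcal{Q}} \sum_{p \in \mathcal{P}_{mq}} \varphi_{mq} \phi^{p}_{mq} d^{p}_{mq} \le \sum_{s=0}^{3} x_{s,m} d_s \quad \forall m \in \mathcal{M} \tag{3.13} \]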
3.4.2 Objective
Now we can define the load balance problem for midhaul links and CUs as problem
P:
\[ \mathcal{P}: \quad \min \{\alpha + \beta\} \quad \text{s.t. (3.1) to (3.13)} \tag{3.14} \]
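To illustrate how the objective of problem P can be evaluated for a fixed assignment, the following Python sketch computes the smallest α and β that the assignment admits: α is taken as the highest CU utilization, in line with (3.4), and β as the highest link utilization, in line with the description of (3.12). The function name evaluate_objective and the flow record layout are hypothetical, and the per-flow bandwidth demand rm and CU load are assumed to be supplied by the caller.

```python
def evaluate_objective(flows, cu_capacity, link_capacity):
    """Return (alpha, beta) for a fixed assignment.

    flows         -- list of dicts with keys:
                     "cu"        CU serving the flow,
                     "cu_load"   lambda_m * rho_c_s of the chosen split,
                     "bandwidth" bandwidth demand r_m of the chosen split,
                     "path"      list of links (i, j) on the selected path
    cu_capacity   -- processing capacity H_q per CU
    link_capacity -- capacity C_ij per link (i, j)
    """
    cu_load = {q: 0.0 for q in cu_capacity}
    link_load = {link: 0.0 for link in link_capacity}
    for flow in flows:
        cu_load[flow["cu"]] += flow["cu_load"]
        for link in flow["path"]:
            link_load[link] += flow["bandwidth"]
    # Smallest alpha and beta for which the capacity constraints still hold.
    alpha = max(cu_load[q] / cu_capacity[q] for q in cu_capacity)
    beta = max(link_load[l] / link_capacity[l] for l in link_capacity)
    return alpha, beta
```

The sum of the two returned values is the quantity that P minimizes, so two candidate assignments can be compared by calling the function on each.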
Algorithm 1 Place as many NFs as possible in DUs
Input: Sn: set of RUs connected to DU-n, Hn, λm, ρf
Output: The split for each RU
Set split S3 for all RUs
for each DU-n do
    Sort Sn by decreasing order of λm
    f ← 1
    while f ≤ 3 do
        for each RU-m in Sn do
            if λm ρf ≤ Hn then
                Hn ← Hn − λm ρf
                Update split of RU-m
            else
                Remove RU-m from Sn
            end if
        end for
        f ← f + 1
    end while
end for
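The greedy procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name place_functions_in_dus and the dictionary inputs are hypothetical, and splits are indexed so that S3 places all functions in the CU and each function moved into the DU lowers the split index by one, following the S0 to S3 description in the text.

```python
def place_functions_in_dus(rus_of_du, du_capacity, demand, rho):
    """Greedily move f1, f2, f3 into the DUs, largest demand first.

    rus_of_du   -- maps each DU to the list of RUs attached to it (S_n)
    du_capacity -- processing capacity H_n of each DU
    demand      -- traffic demand lambda_m of each RU
    rho         -- computational load of function f (keys 1, 2, 3)
    Returns the split index (0..3) chosen for each RU.
    """
    # Start with every function in the CU (split S3).
    split = {m: 3 for rus in rus_of_du.values() for m in rus}
    for n, rus in rus_of_du.items():
        remaining = du_capacity[n]
        active = sorted(rus, key=lambda m: demand[m], reverse=True)
        for f in (1, 2, 3):
            still_active = []
            for m in active:
                cost = demand[m] * rho[f]
                if cost <= remaining:
                    remaining -= cost
                    split[m] = 3 - f      # function f now runs in the DU
                    still_active.append(m)
                # RUs that do not fit are dropped from later rounds.
            active = still_active
    return split
```

An RU that cannot host function f in its DU keeps its current split, so its remaining functions are left for the CU assignment step.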
Algorithm 2 Put aggregated unassigned NFs in CUs
Input: S: Set of RUs, Q: Set of CUs, L: Set of links, Rq: Set of RUs connected
    to the CU-q, Bm: Set of feasible CUs for RU-m, Tl: Set of RUs using link l
    (initially empty)
Output: Select a CU for each aggregated unassigned NF
while S ≠ ∅ do
    for m ∈ S do
        if |Bm| = 1 then
            Rq ← Rq ∪ {m}
            S ← S − {m}
        end if
        for l ∈ L do
            Update Tl
        end for
    end for
    m ← arg max_{i∈S} (λi ρcs)
    for q ∈ Bm do
        αq ← arg max_{i∈Q} (Σ_{j∈Ri∪{q}} λj ρcs) / Hi
        Pmq ← Feasible path from RU-m to CU-q with max free capacity of
            bottleneck link
        for l ∈ L do
            if l ∈ Pmq then
                Tl′ ← Tl ∪ {m}
            else
                Tl′ ← Tl
            end if
        end for
        βq ← arg max_{i∈L} (Σ_{j∈Ti′} rj) / ci
functions for all RUs. It is noted that this procedure could find a feasible solution
because it is assumed that there are sufficient resources to place all the network
functions in DUs and CUs. The details are given in Algorithm II.
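One step of Algorithm 2 selects, for a given RU and CU, a feasible path whose bottleneck link has the most free capacity. A minimal Python sketch of that selection is shown below. The function name widest_feasible_path and the path record layout are hypothetical, and the feasibility test (the path meets the delay requirement of the split and its bottleneck can carry the flow) is an assumption derived from the capacity and delay constraints of the formulation.

```python
def widest_feasible_path(paths, free_capacity, demand, max_delay_ms):
    """Pick the feasible path whose bottleneck link has the most free capacity.

    paths         -- list of dicts {"links": [...], "delay_ms": float}
    free_capacity -- remaining capacity per link
    demand        -- bandwidth the flow needs on every link of the path
    max_delay_ms  -- delay requirement of the selected split
    Returns the chosen path dict, or None when no path is feasible.
    """
    best, best_bottleneck = None, -1.0
    for path in paths:
        # The bottleneck is the link with the least free capacity on the path.
        bottleneck = min(free_capacity[link] for link in path["links"])
        feasible = path["delay_ms"] <= max_delay_ms and bottleneck >= demand
        if feasible and bottleneck > best_bottleneck:
            best, best_bottleneck = path, bottleneck
    return best
```

Preferring the widest bottleneck leaves the most headroom on the most loaded link of the chosen path, which is in keeping with the load-balancing goal on midhaul links.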
The final phase of Algorithm II is the improvement phase. In this phase, better
solutions may be achieved by reassigning RUs to different CUs, so these
possibilities are checked. First, the solution found so far is set as the
best solution. Then, the midhaul link with the lowest free capacity is selected,
together with the RUs that use this midhaul link. For each of these RUs, the
alternative CUs and paths with lower α + β are checked, and the RU is reassigned
to the best alternative if possible. This process is repeated until no further
improvement is achieved. Finally, the
complexity of Algorithm II is O(|L||S|^2), where |S| is the number of RUs and |L|
Table 3.3: Comparison of Algorithm II and optimal solution
[Figure: CU and DU loads versus λm; legend: CU load, DU load]
[Figure: Number of NFs versus λm; legend: CU f1, DU f1, CU f2, DU f2, CU f3, DU f3]
[Figure: links load distribution (two panels); x-axis label truncated: "Number of C"]