
AI-Driven VNF Splitting in O-RAN for

Enhancing Resource Allocation Efficiency

Esmaeil Amiri

Submitted for the Degree of


Doctor of Philosophy
from the
University of Surrey

Supervisors: Professor Ning Wang and Dr Mohammad Shojafar

5G/6G Innovation Center


Faculty of Engineering and Physical Sciences
University of Surrey

August 2023
Summary
In this thesis, the focus is on addressing the challenges faced by Radio Access Net-
works (RANs) in adapting to dynamic demands without manual intervention. The
Open RAN (O-RAN) architecture is introduced, which enables programmability,
openness, virtualization, and disaggregation principles. The base station functions
are implemented as Virtual Network Functions (VNFs) and split across O-RAN
nodes, including the Radio Unit (RU), Distributed Unit (DU), and Centralized Unit
(CU).
One of the main objectives of this thesis is to achieve load-balanced VNF splitting
by intelligently distributing the workload across CUs and network links, thereby pre-
venting network congestion and overload. This is addressed by proposing a heuristic
algorithm. Additionally, Artificial Intelligence (AI)-based methods are employed to
intelligently manage resource allocation for dynamic VNF splitting with different
objectives. These objectives are robust VNF splitting to minimize frequent VNF
reconfigurations, energy-efficient VNF splitting, and edge-AI empowered dynamic
VNF splitting for network slicing. These objectives are formulated mathematically
and incorporated into Deep Reinforcement Learning (DRL) and federated DRL
frameworks, where reward functions are defined to guide the learning process.
The thesis presents significant contributions in proposing diverse O-RAN sys-
tem designs, and evaluating the proposed solutions using abstract and real network
topologies. The simulation results demonstrate that the heuristic solution effectively
achieves load balancing, with a small gap of ≤ 2% compared to optimal solutions for small network scales. Moreover, by fine-tuning the AI hyperparameters, the performance gap between the edge-AI enabled solution and the optimal solution can be narrowed by 3%. The proposed solution for robust dynamic VNF splitting reduces
the overhead of VNF reconfigurations by up to 76%, with a minor increase of up
to 23% in computational costs. Additionally, the solution for energy-efficient VNF
splitting achieves noteworthy energy savings, with up to 56% reduction compared
to non-VNF splitting solutions.

Acknowledgements

I am deeply grateful to my supervisor, Prof. Ning Wang, for their guidance, exper-
tise, and unwavering support throughout my journey. I extend my sincere thanks
to Dr. Mohammad Shojafar for their valuable insights and encouragement.
A special appreciation goes to Prof. Rahim Tafazolli, Head of the 5G/6G Centre,
for their visionary leadership. I also thank my family and friends for their constant
encouragement.
This work would not have been possible without the collective contributions of
these individuals, and for that, I am truly thankful.
Contents

1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivations and Objectives . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Technical Contributions . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Training Attended . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 List of Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Background and Literature Review 10


2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 O-RAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 RAN Disaggregation . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 AI/ML Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Traditional vs AI/ML Methods for Network Optimization . . . . . . 19
2.5 Optimizing O-RAN: Enhancing Efficiency and Performance . . . . . . 20
2.5.1 O-RAN Resource Management . . . . . . . . . . . . . . . . . 21
2.5.2 Dynamic Function Split . . . . . . . . . . . . . . . . . . . . . 21
2.5.3 Energy Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.4 Mobility Management . . . . . . . . . . . . . . . . . . . . . . 23
2.5.5 Traffic Steering . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Load Balance VNF Splitting in O-RAN 27


3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.1 Network Constraints . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.3 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 The Proposed Heuristic Solution . . . . . . . . . . . . . . . . . . . . . 33
3.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4 Robust Dynamic VNF Splitting in O-RAN 41


4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.3.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.1 Network Model . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.2 VNF Placement . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4.3 Bandwidth Demand . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.4 End-to-End Delay . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.5 Overall Optimization Objective . . . . . . . . . . . . . . . . . 47
4.4.6 NP-hardness of the Proposed Problem . . . . . . . . . . . . . 49
4.5 Policy Optimization with Neural Network . . . . . . . . . . . . . . . 50
4.5.1 Neural Network Architecture . . . . . . . . . . . . . . . . . . . 52
4.5.2 Policy Gradient Optimization with Baseline . . . . . . . . . . 53
4.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6.2 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.6.3 Comparison Benchmarks . . . . . . . . . . . . . . . . . . . . . 59
4.6.4 Constraint Dissatisfaction . . . . . . . . . . . . . . . . . . . . 60
4.6.5 Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6.6 Network KPIs . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5 Energy Efficient Dynamic VNF Splitting in O-RAN 65


5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.4 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.4.1 Decision Variables . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4.2 Energy Consumption . . . . . . . . . . . . . . . . . . . . . . . 69
5.5 Seq2Seq-A2C Decision Algorithm . . . . . . . . . . . . . . . . . . . . 70
5.5.1 Basics of A2C and Seq2Seq Model . . . . . . . . . . . . . . . . 70
5.5.2 Seq2Seq-A2C . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.6.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.6.2 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.6.3 Comparison Benchmarks . . . . . . . . . . . . . . . . . . . . . 76
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6 Edge AI Empowered Dynamic VNF Splitting in O-RAN Slicing: A Federated DRL Approach 79
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Cost of Distributed and Federated Learning . . . . . . . . . . . . . . 81
6.4 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 O-RAN Slicing with F-DQN . . . . . . . . . . . . . . . . . . . . . . . 83
6.5.1 State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.5.2 Action space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.5.3 Reward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.5.4 DQN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.5.5 F-DQN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6.1 Simulation settings . . . . . . . . . . . . . . . . . . . . . . . . 87
6.6.2 Model training . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.6.3 Comparison with optimal solution . . . . . . . . . . . . . . . . 89
6.6.4 Network KPIs . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

7 Conclusion and Future Works 91


7.1 Summary of Achievements . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
List of Figures

2.1 O-RAN architecture overview [1] . . . . . . . . . . . . . . . . . . . . . . 11


2.2 VNF split options in O-RAN [2] . . . . . . . . . . . . . . . . . . . . . . 14
2.3 DRL architecture overview . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.1 The network architecture overview . . . . . . . . . . . . . . . . . . . 30


3.2 Total DU and CU loads vs λm . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Total number of NFs placed in DU and CU vs λm . . . . . . . . . . . 38
3.4 The link load distribution vs number of CUs . . . . . . . . . . . . . . 38
3.5 Total bandwidth usage vs centralization Ratio (CR): (a) and (b) for
different number of CUs, (c) and (d) for different k values. . . . . . . 39

4.1 The proposed O-RAN architecture . . . . . . . . . . . . . . . . . . . 45


4.2 The proposed deep reinforcement learning architecture overview . . . 51
4.3 Real case study: (a) possible area for O-RAN deployment in Bristol,
UK, and (b) traffic demands of five 4G sites (RUs) over 24 hours. . . . 58
4.4 Training of the proposed deep RL for the studied topology with (a) sufficient
available resources and (b) constrained resources. . . . . . . . . . . . . 59
4.5 Comparison of our proposed solution with two commonly used meth-
ods, DQN and AC, based on three criteria over 2000 epochs: (a) total
O-RAN cost, (b) penalization cost, and (c) Lagrangian cost. . . . . . 60
4.6 The heat map displays the number of constraint dissatisfactions in
the studied topology during five different time slots. . . . . . . . . . . 61
4.7 The network KPIs with respect to the Centralization Ratio (CR) in
five time slots: (a) bandwidth usage, (b) CPU utilization in DUs, and
(c) average CPU utilization in CUs . . . . . . . . . . . . . . . . . . . 64

5.1 Proposed O-RAN system architecture overview. It consists of three key


components: RUs, edge sites containing DUs, and a cloud site containing
CUs, along with a transport layer. . . . . . . . . . . . . . . . . . . . . . 68

5.2 The architecture of the proposed Seq2Seq-A2C algorithm. . . . . . . . . . 72
5.3 Normalized Traffic (NT) condition for business and residential area [3]. . . 75
5.4 Training of the proposed DRL: (a) Normalized Energy Consumption
(NEC), and (b) Normalized Penalty Cost (NPC). . . . . . . . . . . . 76
5.5 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for residential area week-day traffic . . . . . . . . . . . . . . . . . . 77
5.6 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for business area week-day traffic . . . . . . . . . . . . . . . . . . . 77
5.7 Comparison benchmarks for Average Energy Consumption (AEC) per sec-
ond for residential area weekend traffic . . . . . . . . . . . . . . . . . . 77

6.1 The proposed O-RAN system model architecture leverages federated learn-
ing at the edge network to enable dynamic VNF splitting for network slices. 83
6.2 The aggregated fronthaul link traffic dataset used in our simulations,
representing the traffic for eMBB slices of each agent . . . . . . . . . 87
6.3 Training process of F-DQN vs L-DQN for agent 1 and 2 . . . . . . . 87
6.4 Training process for agent 1 with varying penalty coefficients (P) . . 88
6.5 Training process for agent 2 with varying penalty coefficients (P) . . 89
6.6 Performance gap comparison between different learning rates for F-
DQN, C-DQN, and L-DQN approaches . . . . . . . . . . . . . . . . . 89
6.7 VNF reconfigurations with varying α values . . . . . . . . . . . . . . 90
6.8 Edge sites utilization with varying α values . . . . . . . . . . . . . . . 90
List of Tables

3.1 Performance gains of different splits [4] . . . . . . . . . . . . . . . . . 29


3.2 Notation Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Comparison of Algorithm II and optimal solution . . . . . . . . . . . 37

4.1 Performance gains of different splits [4] . . . . . . . . . . . . . . . . . 43


4.2 Summary of network model notations . . . . . . . . . . . . . . . . . . 46
4.3 Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4 CPU overprovisioning (OVP) vs bandwidth (BW) saving . . . . . . . 62
4.5 Different Centralization Ratios (CRs) . . . . . . . . . . . . . . . . . . 63
4.6 Trade-off between computational cost and the number of VNF recon-
figurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.1 Bandwidth requirement of different splits [4] . . . . . . . . . . . . . . . . 67


5.2 Network and Seq2Seq-A2C parameters . . . . . . . . . . . . . . . . . . . 74

6.1 Network and F-DQN parameters . . . . . . . . . . . . . . . . . . . . . . 88


Chapter 1

Introduction

1.1 Overview
In this chapter, we delve into the motivations and objectives of the thesis, explor-
ing the significant challenges that have arisen in traditional Radio Access Networks
(RANs) and the potential solutions presented by the Open RAN (O-RAN) archi-
tecture. The ever-increasing demand for data and the proliferation of connected
devices have placed tremendous strain on traditional RANs, calling for a more flexi-
ble, scalable, and efficient approach. O-RAN, with its virtualization and disaggrega-
tion techniques, provides a promising solution to meet these demands and optimize
network operations.
In this thesis, we set forth a wide-ranging set of objectives that span critical as-
pects of O-RAN optimization. At the forefront of our approach, we acknowledge the
significance of traditional optimization methods, which include heuristic algorithms,
in addressing the complexities of RAN optimization. These time-tested techniques
form a sturdy foundation for our research, offering reliable and efficient solutions to
enhance the performance of O-RAN networks.
Furthermore, the thesis explores the application of AI-driven network manage-
ment and machine learning/artificial intelligence (ML/AI) techniques to enhance
overall network performance. By leveraging these advanced methodologies, we aim
to optimize decision-making processes and adaptively allocate resources in real-time,
ultimately leading to improved efficiency and responsiveness in O-RAN environ-
ments.

1.2 Motivations and Objectives
Radio Access Networks (RANs) traditionally rely on integrated, inflexible hardware
at base stations, posing challenges in quickly reconfiguring networks to adapt to dy-
namic demands without manual intervention. To address this, the Open RAN (O-
RAN) architecture introduces programmability, openness, virtualization, and disag-
gregation principles [5]. Disaggregation involves separating base station functions
into different O-RAN nodes: the Open Radio Unit (O-RU), Open Distributed Unit
(O-DU), and Open Centralized Unit (O-CU). These nodes feature open interfaces
for seamless interoperability among vendors. Virtual Network Functions (VNFs)
now implement legacy RAN operations, enabling adaptability to dynamic network
environments. The RAN Intelligent Controller (RIC) acts as a centralized decision-
maker based on real-time or non-real-time network intelligence.
RAN disaggregation empowers operators to determine VNF split points, decid-
ing which functions reside in O-CUs or O-DUs. This flexibility caters to diverse
requirements in 5G and beyond. However, choosing optimal split points presents
technical challenges due to variations in delay, bandwidth, and computational load
across O-CUs and O-DUs [4, 6]. Fixed split points struggle to sustainably optimize
performance in dynamic traffic conditions, where data processing loads and traffic
demands fluctuate along the network links. Recent studies have introduced efficient optimization schemes that address various objectives, including mobility management and resource allocation, while considering optimized VNF placement and traffic steering in O-RAN environments. Nonetheless, more adaptive solutions are
needed to enable dynamic re-optimization of traffic and load distributions through
VNF migration and dynamic traffic steering between O-CUs and O-DUs. It is crucial
to maintain moderate network reconfiguration operations to avoid excessive costs
and traffic instability.
To address the challenges mentioned earlier, a combination of AI-driven net-
work management, ML/AI techniques, and traditional optimization methods, such
as heuristic algorithms, can be employed. AI-driven network management lever-
ages advanced analytics, automated decision-making, and adaptive learning to opti-
mize resource allocation, traffic management, and network slicing in real-time. The
objectives of this thesis, focused on enhancing VNF splitting in O-RAN, can be
summarized as follows:

• Load Balance VNF Splitting: The goal is to distribute the workload across CUs and network links to prevent network congestion and overload on O-RAN nodes. A heuristic solution will be employed to intelligently allocate resources and optimize traffic distribution, ensuring effective load balancing.

• Robust Decision-Making for Dynamic VNF Splitting in O-RAN: In real-life O-RAN environments, dynamic traffic conditions may necessitate VNF reconfigurations during runtime, leading to increased overhead costs and traffic instability. Minimizing the overhead of VNF reconfigurations under such conditions is critical for ensuring network stability. To address this challenge, the proposed solution leverages AI algorithms to provide robust dynamic VNF splitting effectively.

• Energy-Efficient VNF Splitting: The growing need for energy-efficient solutions emphasizes the importance of optimizing dynamic VNF splitting for improved energy efficiency. To achieve this goal, AI-based methods will be employed to intelligently manage resource allocation, dynamically adjust VNF assignments, and optimize power usage.

• Edge AI Empowered Dynamic VNF Splitting: Edge-AI enhances the O-RAN architecture by enabling local data processing and faster decision-making at the network edge. We propose a federated Deep Reinforcement Learning (DRL) approach for dynamic VNF splitting across multiple network slices. The objective is to maximize edge site resource utilization while minimizing VNF reconfiguration overhead.

By addressing these objectives through the integration of AI-based solutions, this thesis aims to enhance the efficiency, robustness, energy efficiency, and responsiveness of VNF splitting in O-RAN environments.
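As a concrete illustration of the load-balancing idea behind the first objective, the sketch below shows a simple greedy placement heuristic. This is a hypothetical simplification for illustration only, not the algorithm of Chapter 3: each VNF workload is assigned to the least-loaded CU that still has spare capacity.

```python
def greedy_load_balance(workloads, cu_capacities):
    """Map each workload index to a CU index; return None if placement fails.

    Illustrative sketch only; names and the placement rule are our own.
    """
    loads = [0.0] * len(cu_capacities)
    placed = {}
    # Placing the largest workloads first tends to balance better
    # (longest-processing-time-first rule).
    for idx in sorted(range(len(workloads)), key=lambda i: -workloads[i]):
        best = None
        for cu, cap in enumerate(cu_capacities):
            # Candidate CUs must have room; among them, pick the least loaded.
            if loads[cu] + workloads[idx] <= cap:
                if best is None or loads[cu] < loads[best]:
                    best = cu
        if best is None:
            return None  # infeasible under the given capacities
        loads[best] += workloads[idx]
        placed[idx] = best
    return [placed[i] for i in range(len(workloads))]
```

For example, four workloads of 5, 3, 2, and 2 units spread across two CUs of capacity 7 end up with loads of 7 and 5, rather than overloading one CU.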

1.3 Technical Challenges


In addressing the technical challenges of this thesis, it is imperative to focus on the
mathematical formulation of the objectives outlined in Section 1.2. The primary aim
is to propose robust algorithms and strategies for Virtual Network Function (VNF)
splitting in O-RAN, with the overarching goal of enhancing network performance
and resource utilization. It is essential to highlight that these objectives give rise to
multi-objective optimization problems, which are known to be NP-hard, presenting
significant computational complexity.

Moreover, as part of our solution, we leverage the power of Artificial Intelligence
(AI). By incorporating AI techniques into our approach, we can effectively tackle
the complex constraints associated with VNF splitting in O-RAN. AI enables us
to dynamically optimize processing capacity, satisfy end-to-end delay requirements,
and efficiently manage mid-haul link bandwidth. This strategic integration of AI
empowers our solution to adapt in real-time to varying network conditions and
demands, thus achieving improved network efficiency and overall performance. Here
are the technical challenges associated with each objective:

• Load Balance VNF Splitting (Chapter 3):

– Challenge 1: Determine a precise multi-objective problem formulation for optimal load balancing strategies across CUs and network links, considering network constraints such as delay, bandwidth requirements, and the capacities of CUs and DUs, and verify the NP-hardness of the resulting optimization problem.
– Challenge 2: Propose a heuristic algorithm that efficiently allocates re-
sources to achieve load balancing while ensuring a small gap with the
optimal solution.

• Robust VNF Splitting (Chapter 4):

– Challenge 1: Determine a precise multi-objective problem formulation, considering network constraints, for robust VNF splitting that minimizes the overhead of VNF reconfigurations, and prove the NP-hardness of the resulting optimization problem.
– Challenge 2: Explore AI-based approaches to effectively handle the multi-
objective nature of the problem while meeting network constraints such
as delay, bandwidth requirements, and the capacities of CUs and DUs.

• Energy-Efficient VNF Splitting (Chapter 5):

– Challenge 1: Determine a precise multi-objective problem formulation for dynamic VNF splitting that minimizes total energy consumption in the O-RAN system.
– Challenge 2: Design an AI-based solution that effectively solves the proposed optimization problem while respecting network constraints such as delay and the capacities of CUs and DUs.

• Edge AI Empowered Dynamic VNF Splitting (Chapter 6):

– Challenge 1: Develop a novel system model that effectively integrates edge-AI into the O-RAN architecture, leveraging federated learning as a distributed learning approach.
– Challenge 2: Formulate a precise reward function in DRL that effectively captures multiple objectives, aiming to maximize resource utilization at edge sites while simultaneously minimizing the overhead of VNF reconfigurations.

These challenges highlight the importance of network design, precise problem formulation, and the integration of AI techniques to address load balancing, robustness, energy efficiency, and decision-making aspects in VNF splitting for O-RAN networks.
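A recurring pattern across these challenges is turning a constrained multi-objective problem into a signal a DRL agent can learn from. The sketch below is a minimal, illustrative shaping of that idea; the weights, penalty coefficient, and function names are placeholders of our own, not the formulations of Chapters 3-6:

```python
def penalized_cost(compute_cost, reconfig_count, delay, delay_budget,
                   bw_used, bw_capacity,
                   w_compute=1.0, w_reconfig=0.5, penalty=10.0):
    """Weighted multi-objective cost plus penalties for violated constraints.

    Illustrative placeholder values; real formulations would use the
    network model of the relevant chapter.
    """
    cost = w_compute * compute_cost + w_reconfig * reconfig_count
    # Constraint violations (delay budget, link bandwidth) are penalized
    # in proportion to how badly they are exceeded.
    violation = max(0.0, delay - delay_budget) + max(0.0, bw_used - bw_capacity)
    return cost + penalty * violation

def reward(*args, **kwargs):
    # DRL agents maximize reward, so the cost enters with a negative sign.
    return -penalized_cost(*args, **kwargs)
```

When all constraints hold, the penalty term vanishes and the agent simply trades off computational cost against reconfiguration overhead; any violation adds a sharp cost that pushes the learned policy back into the feasible region.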

1.4 Technical Contributions


Here are the key technical contributions of this thesis:

• Load Balance VNF Splitting (Chapter 3):

– Performance evaluation shows that the gap between optimal and heuristic
solutions does not exceed 2%.
– An in-depth analysis of different centralization levels shows that using
multi-CUs could reduce the total bandwidth usage by up to 20%.
– The network constraints defined in this chapter serve as a reference point
for the subsequent chapters.

• Robust VNF Splitting (Chapter 4):

– We propose a multi-objective optimization problem that simultaneously minimizes VNF computational costs and the overhead of periodic reconfigurations.
– Our solution uses constrained combinatorial optimization with deep rein-
forcement learning, where an agent minimizes a penalized cost function
calculated by the proposed optimization problem.

– We evaluate the impact of centralizing computing resources on network
performance and vary the weights of different terms in the multi-objective
optimization problem.
– Our performance evaluation highlights the significant improvements of
the proposed solution, including a reduction of up to 76% in the over-
head of VNF reconfigurations, despite a slight increase of up to 23% in
computational costs.
– Compared to Centralized RAN (C-RAN), the most robust O-RAN configuration in that it requires no VNF reconfigurations, our solution offers up to 76% bandwidth savings at the cost of up to 27% CPU overprovisioning.
– The proposed DRL-based solution in this chapter serves as a reference
point for the subsequent chapters.

• Energy-Efficient VNF Splitting (Chapter 5):

– We propose an innovative energy-efficient RAN disaggregation and virtualization method for O-RAN that effectively addresses the challenges posed by dynamic traffic conditions.
– The energy consumption is first formulated as a multi-objective optimization problem and then solved by integrating the Advantage Actor-Critic (A2C) algorithm with a sequence-to-sequence model, reflecting the sequential nature of RAN disaggregation and its long-term dependencies.
– The proposed solution contributes to the research on dynamic VNF splitting in O-RAN environments and highlights the importance of considering the impact of dynamic traffic conditions on energy consumption. According to the results, our proposed solution for dynamic VNF splitting outperforms approaches that do not involve VNF splitting, achieving energy savings of up to 56% and 63% under business and residential area traffic conditions, respectively.
– The proposed O-RAN architecture in this chapter serves as a reference
point for the subsequent chapter.

• Edge AI Empowered Dynamic VNF Splitting (Chapter 6):

– We propose a novel system model that integrates edge-AI into the O-RAN architecture, leveraging federated learning as a distributed learning approach. Our approach involves each edge site acting as an independent agent, utilizing a federated DRL approach for dynamic VNF splitting in each network slice. By training models locally, we accelerate the decision-making process, reduce latency, improve responsiveness, and eliminate single points of failure. The agents then share their local models with a near-real-time RIC (near-RT RIC), which aggregates them to create a global model. This federated approach enhances agent performance while maintaining decentralized control.
– The objective of dynamic VNF splitting is to maximize resource utiliza-
tion at edge sites while minimizing the overhead of VNF reconfigurations
due to dynamic traffic conditions.
– The performance evaluation shows the superiority of the proposed solu-
tion over distributed DRL. We explore network Key Performance Indica-
tors (KPIs) by adjusting the reward function weighting factor in DRL.
Additionally, fine-tuning the learning rate narrows the performance gap
with the optimal solution by 3%.

These contributions demonstrate the effectiveness of the proposed approaches in addressing resource allocation and VNF splitting challenges in O-RAN environments.
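The model-aggregation step performed by the near-RT RIC in the federated approach can be sketched as a plain federated-averaging routine. This is an illustrative simplification under our own naming; the thesis implementation aggregates DQN parameters, whereas here each model is reduced to named lists of floats:

```python
def federated_average(local_models, weights=None):
    """Average each parameter across agents' local models (FedAvg-style).

    local_models: list of dicts {param_name: list of floats}.
    weights: optional per-agent weights (e.g. traffic volume); normalized
    so they sum to one before averaging. Names are ours, for illustration.
    """
    n = len(local_models)
    if weights is None:
        weights = [1.0] * n  # plain average by default
    total = sum(weights)
    weights = [w / total for w in weights]
    global_model = {}
    for name in local_models[0]:
        vectors = [m[name] for m in local_models]
        # Element-wise weighted average across agents.
        global_model[name] = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(vectors[0]))
        ]
    return global_model
```

The RIC would broadcast the returned global model back to the edge agents for the next round of local training, which is what lets each agent benefit from the others' experience without sharing raw traffic data.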

1.5 Structure of the Thesis


The remainder of the thesis is organized as follows:

• Chapter 2: This chapter provides a comprehensive overview of the research background, including the O-RAN architecture, VNF splitting in O-RAN, and the application of ML/AI techniques in O-RAN.

• Chapter 3: The focus of this chapter is on load balancing VNF splitting across
CUs and network links, taking into account delay and bandwidth requirements.

• Chapter 4: This chapter addresses the challenge of dynamic traffic conditions in real-life O-RAN environments by proposing a multi-objective optimization problem. The objective is to simultaneously minimize VNF computational costs and the overhead of periodic reconfigurations.

• Chapter 5: In this chapter, we propose an innovative energy-efficient RAN
disaggregation and virtualization method for O-RAN. This method effectively
tackles the challenges posed by dynamic traffic conditions.

• Chapter 6: The focus of this chapter is on the utilization of edge AI to enhance the O-RAN architecture. By enabling local data processing and faster decision-making for dynamic VNF splitting, this approach contributes to the efficient management of network slices.

• Chapter 7: This chapter concludes the thesis and outlines future work.

These chapters collectively contribute to a comprehensive understanding of VNF splitting in O-RAN, addressing various challenges and proposing innovative solutions for improved network performance and resource utilization.

1.6 Training Attended


• Welcome to your Doctorate - Virtual

• Demonstration in Laboratories - Virtual

• Confirmation Process - Virtual

• Viva Examination - Virtual

1.7 List of Publications


• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, and Rahim Tafazolli. "Optimizing virtual network function splitting in open-RAN environments." In 2022 IEEE 47th Conference on Local Computer Networks (LCN), pp. 422-429. IEEE, 2022 [6].

• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, and Rahim Tafazolli. "Energy-Aware Dynamic VNF Splitting in O-RAN Using Deep Reinforcement Learning." IEEE Wireless Communications Letters (2023) [7].

• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, Mutasem Q. Hamdan, Chuan Heng Foh, and Rahim Tafazolli. "Deep Reinforcement Learning for Robust VNF Reconfigurations in O-RAN." IEEE Transactions on Network and Service Management (2023) [8].

• Amiri, Esmaeil, Ning Wang, Mohammad Shojafar, and Rahim Tafazolli. "Edge-AI Empowered Dynamic VNF Splitting in O-RAN Slicing: A Federated DRL Approach." IEEE Communications Letters (2023), submitted Nov. 2023.

I also contributed to a collaborative survey paper, focusing on the traffic steering and VNF splitting chapters:

• Hamdan, Mutasem Q., Haeyoung Lee, Dionysia Triantafyllopoulou, Rúben Borralho, Abdulkadir Kose, Esmaeil Amiri, David Mulvey, Wenjuan Yu, Rafik Zitouni, Riccardo Pozza, Bernie Hunt, Hamidreza Bagheri, Chuan Heng Foh, Fabien Heliot, Gaojie Chen, Pei Xiao, Ning Wang, and Rahim Tafazolli. "Recent Advances in Machine Learning for Network Automation in the O-RAN." Sensors 23, no. 21 (2023): 8792 [9].

1.8 Summary
In this chapter, we delved into the motivations, objectives, technical challenges,
and key technical contributions of the thesis. The motivations arise from the chal-
lenges faced by traditional RANs and the opportunities presented by the O-RAN
architecture, which introduces virtualization, disaggregation, and AI-driven network
management. The objectives revolve around load-balanced VNF splitting, robust VNF splitting, energy-efficient VNF splitting, and edge-AI empowered dynamic VNF splitting. Each
objective poses technical challenges, such as network modeling, problem formulation,
and AI integration. The thesis addresses these challenges through the proposal of O-
RAN system designs, the development of heuristic algorithms, and the exploration
of centralized and distributed AI approaches. The proposed solutions are evaluated
using both abstract and real network topologies, demonstrating remarkable results
and the effectiveness of the approaches.

Chapter 2

Background and Literature Review

2.1 Overview
In this chapter, we will delve into the realm of Open Radio Access Networks (O-
RAN) and explore the potential of incorporating Machine Learning (ML) and Arti-
ficial Intelligence (AI) techniques within this architecture. We begin by providing an
overview of the O-RAN architecture in Section 2.2, highlighting its key components
and functionalities. Next, in Section 2.3, we review the main categories of AI/ML
techniques, discussing the approaches and algorithms that can be applied within
the O-RAN context, and in Section 2.4 we contrast these techniques with traditional
optimization methods.
Section 2.5 focuses on the practical applications of ML and AI within the O-RAN
framework. Here, we explore how ML algorithms can be leveraged to enhance various
aspects of O-RAN operations, such as traffic classification, routing optimization,
QoS/QoE prediction, and resource management. We delve into the specifics of each
application, highlighting relevant research papers and methodologies.

2.2 O-RAN
Legacy Radio Access Networks (RANs) [10] have traditionally relied on black-box
hardware deployed at base stations. In this setup, network functions are tightly
integrated physically, making it challenging to quickly reconfigure the network in
response to changing demands without manual operations at the site [11–15]. How-
ever, the emergence of the Open RAN (O-RAN) solution has brought about a new
architecture that addresses these limitations.

Figure 2.1: O-RAN architecture overview [1]

The O-RAN architecture [16], as shown in Fig. 2.1, is built on the principles
of programmability, openness, virtualization, and disaggregation. Disaggregation
refers to the separation of base station functions into different types of O-RAN
nodes: the Open Radio Unit (RU), Open Distributed Unit (DU), and Open Cen-
tralized Unit (CU). The CU further enhances the disaggregation by dividing the
control plane (CU-CP) and user plane (CU-UP) functionalities. The CU-CP han-
dles control and management tasks, including radio resource management, handover
decisions, and network optimization. It acts as the intelligent controller that orches-
trates and coordinates the various elements of the network. The CU-UP, on the
other hand, focuses on processing user data traffic, providing functions such as en-
cryption, decryption, and traffic routing. These nodes are designed to have open
interfaces between them, ensuring interoperability across vendors. By decoupling
the functions and introducing standardized interfaces, O-RAN enables greater flex-
ibility and vendor diversity in RAN deployments. The main O-RAN interfaces as
shown in Fig. 2.1 are:

• O1 Interface: Supports management entities within the Service Manage-
ment and Orchestration (SMO) framework, encompassing O-RAN managed
elements such as Operation and Maintenance (OAM), multi-vendor manage-
ment functions including FCAPS (Fault, Configuration, Accounting, Perfor-
mance, Security), and software management.

• O2 Interface: Connects the SMO to the O-RAN O-Cloud, managing infras-
tructure and deployment services. Infrastructure management handles cloud
infrastructure deployment and management, while deployment management
oversees life cycle management on the cloud infrastructure.

• A1 Interface: Defined between the non-RT RIC and near-RT RIC. It involves
the non-RT RIC providing operational guidance, such as policies for managing
machine learning models in xApps, orchestration, automation, and 5G gNB.

• E2 Interface: Forwards measurements from E2-nodes (DUs, CUs, O-RAN
compliant LTE eNBs) to the near-RT RIC and sends configuration commands
back to the DUs and CUs. Allows network control over ongoing operations
within the base station and supports monitoring, controlling, and data collec-
tion from these units via xApps.

With the adoption of the O-RAN architecture, the operations of legacy RANs
are now implemented through Virtual Network Functions (VNFs). Softwareization
of RAN functions offers exciting opportunities to program VNFs to dynamically
respond to changes in the network environment. This programmability allows for
adaptive configurations, enabling operators to optimize network performance and
resource allocation based on real-time conditions.
The RIC plays a pivotal role in O-RAN deployments, enabling intelligent man-
agement and decision-making. Serving as a centralized software controller, the RIC
leverages its equipped network intelligence to make informed decisions, catering to
both real-time and non-real-time scenarios. This allows for dynamic and adaptive
network operations within the O-RAN architecture.
Software-Defined Networking (SDN) [17–30] and the RIC are closely intercon-
nected. SDN revolutionizes network management by separating the control plane
from the data plane, providing flexibility and programmability. By leveraging SDN
principles, RIC optimizes network performance, orchestrates resources efficiently,
and facilitates the deployment of innovative services.
The RIC acts as a coordination point, facilitating efficient management and or-
chestration of the diverse O-RAN nodes. By analyzing network data and leveraging
ML algorithms, the RIC can intelligently adapt and respond to changing network
conditions, ensuring optimal utilization of network resources.
In O-RAN deployments, the RIC architecture consists of two distinct compo-
nents: the near-real-time RIC (near-RT RIC) and the non-real-time RIC (non-RT
RIC). The near-RT RIC focuses on real-time decision-making, enabling quick re-
sponses to dynamic network events. It handles critical operations that require im-
mediate actions, such as load balancing, congestion control, and network slicing. The
near-RT RIC ensures efficient and reliable real-time network operation, enhancing
the overall user experience.
On the other hand, the non-RT RIC is responsible for non-real-time decision-
making processes. It handles tasks that require a broader network view and long-
term planning, such as network optimization, capacity planning, and policy enforce-
ment. The non-RT RIC utilizes historical data and predictive analytics to make
strategic decisions that optimize network performance and resource allocation over
an extended time horizon.
In addition to the RIC, O-RAN deployments can incorporate external applica-
tions known as xApps and rApps. xApps run on top of the RIC platform and utilize
its network intelligence and interfaces to provide specialized functionalities. These
can include intelligent traffic steering, network slicing management, or interference
mitigation. xApps allow operators to customize and enhance the capabilities of the
RIC to suit their specific network requirements.
rApps are software applications that run directly on the O-RAN nodes, such
as the DU or CU. These applications focus on localized optimization and control
of specific network elements, such as radio resource management or interference
cancellation. By running directly on the O-RAN nodes, rApps enable efficient and
localized decision-making, reducing the need for centralized control and minimizing
latency.
By embracing the O-RAN architecture, operators can benefit from increased
agility, interoperability, and flexibility. The programmability and disaggregation of
network functions allow for more rapid deployment of new services, improved scal-
ability, and efficient resource utilization. Moreover, the open interfaces promote
competition among vendors, leading to innovation, cost reduction, and faster tech-
nology evolution in the RAN domain.

2.2.1 RAN Disaggregation


RAN disaggregation introduces the concept of choosing a split point for Virtual
Network Functions (VNFs) based on specific operator requirements [31–35]. As
shown in Fig. 2.2, there are eight different split options across DU and CU. This
flexibility is crucial to meet the diverse demands of various network applications in
the 5G and future networks.

Figure 2.2: VNF split options in O-RAN [2]

However, selecting the optimal split point poses technical challenges due to vari-
ations in delay, bandwidth, and computational load between CUs and DUs [2]. The
characteristics and requirements of different services may vary significantly, making
it difficult to determine a fixed split point that can consistently optimize network
performance under dynamic traffic conditions. The dynamic nature of data process-
ing loads at CUs and DUs, as well as the traffic demand along the network links
connecting them, further complicates the optimization process [4, 6].
In recent research, an efficient optimization scheme has been proposed to deter-
mine the optimized placement of VNFs and traffic steering in O-RAN environments,
with a specific focus on load balancing [6]. This work aims to achieve an optimal
distribution of traffic and loads by strategically placing VNFs and steering traffic
between CUs and DUs. By dynamically adapting to changing network conditions,
this approach enhances overall network performance and resource utilization.
However, while existing optimization schemes provide initial solutions, there
is a need for more adaptive and dynamic approaches to achieve continuous re-
optimization of traffic and load distributions. This can be accomplished through
the migration of O-RAN VNFs and dynamic traffic steering between CUs and DUs.
By constantly monitoring network conditions, such as traffic patterns and resource
utilization, operators can make informed decisions to dynamically reconfigure the
network and optimize performance in real-time.
It is important to note that network reconfiguration operations should be con-
ducted with moderation to avoid excessive reconfiguration costs and traffic insta-
bility. Frequent and drastic reconfigurations may lead to disruptions in service,
suboptimal resource allocation, and additional operational expenses. Therefore, a
careful balance needs to be struck between achieving dynamic re-optimization and
maintaining network stability and reliability.
To address these challenges, ongoing research is exploring various innovative
approaches. Eight different splits proposed in the literature are as follows:

• Functional Split: This approach involves the differentiation between user plane
and control plane functions, allowing for more specialized and optimized pro-
cessing. By separating these functions, the network can achieve better scala-
bility, flexibility, and efficiency. Research studies have explored optimal func-
tional splits considering various factors, such as traffic demands, latency re-
quirements, and resource utilization [36–38].

• Time-Domain Split: In this method, functions are separated based on their
latency requirements, with low-latency functions placed closer to the radio
interface. This time-domain splitting allows for better responsiveness in real-
time applications and enhances the overall quality of service. Recent research
efforts have focused on minimum delay-based splitting strategies and their
impact on network performance [39–41].

• Frequency-Domain Split: This approach involves allocating functions to dif-
ferent frequency bands or subcarriers, optimizing resource allocation and in-
terference management. By dividing functions based on frequency domains,
the network can achieve better spectral efficiency and reduced inter-cell inter-
ference. Studies have explored the impact of frequency-domain splitting on
network capacity and overall system performance [42].

• Load-Based Split: This dynamic approach splits functions based on workload
distribution, enabling load balancing across CUs and DUs. By adapting the
functional splitting based on traffic variations, the network can efficiently uti-
lize available resources and maintain a high quality of service during peak
periods. Research has addressed load-based splitting algorithms and their
impact on network performance [43].

• Service-Based Split: This approach involves allocating functions based on spe-
cific service requirements, tailoring the network architecture to different ap-
plications. Different services, such as enhanced mobile broadband and ultra-
reliable low-latency communications, have varying demands on the network.
Service-based splitting allows for customized resource allocation to meet di-
verse service needs. Researchers have investigated the benefits of service-based
splitting for different 5G and 6G use cases [44–46].

• Location-Based Split: In this approach, functions are placed based on geo-
graphical proximity to user clusters, optimizing latency and capacity for spe-
cific areas. By strategically distributing functions, the network can ensure
efficient service delivery to specific regions, such as urban areas or hotspots.
Studies have examined scalable location-based splitting schemes to cater to
varying geographical demands [47].

• Cost-Optimized Split: This approach involves determining split points consid-
ering both network performance and cost factors. By finding the right balance
between functionality and cost-effectiveness, the network can achieve better
efficiency in deploying O-RAN architectures. Research efforts have addressed
the trade-offs and optimization techniques for cost-optimized splitting [48].

• QoS-Aware Split: This approach involves adaptive splitting of functions based
on Quality of Service (QoS) requirements. By dynamically adjusting the func-
tional split based on QoS metrics, the network can ensure optimal service
delivery for different applications and user demands. Research studies have
explored QoS-aware splitting algorithms and their impact on network perfor-
mance and user experience [49].

By exploring these split options and developing advanced algorithms and mech-
anisms, operators can achieve more granular control over their networks, optimize
performance, and effectively address the challenges posed by dynamic traffic condi-
tions in disaggregated RAN environments.
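The kind of constraint-driven reasoning behind these splits can be made concrete with a small sketch. The code below picks the most centralized feasible split for a given midhaul link; the option names echo the 3GPP-style numbering shown in Fig. 2.2, but every bandwidth and latency figure is an illustrative placeholder, not a standardized value.

```python
# Hypothetical sketch: choose the most centralized functional split whose
# midhaul requirements fit the available link. All numeric requirements are
# invented placeholders for illustration only.

SPLIT_OPTIONS = [
    # (option name, required bandwidth in Gbps, max tolerated latency in ms)
    ("Option 8 (RF/PHY)",       157.3, 0.25),
    ("Option 7 (low/high PHY)",  10.1, 0.25),
    ("Option 6 (PHY/MAC)",        5.6, 0.25),
    ("Option 5 (intra-MAC)",      5.6, 6.0),
    ("Option 4 (MAC/RLC)",        5.2, 6.0),
    ("Option 3 (intra-RLC)",      5.2, 6.0),
    ("Option 2 (RLC/PDCP)",       4.0, 10.0),
    ("Option 1 (PDCP/RRC)",       3.0, 10.0),
]

def select_split(link_bandwidth_gbps, link_latency_ms):
    """Return the most centralized split the midhaul link can support."""
    for name, bw_req, max_lat in SPLIT_OPTIONS:
        if link_bandwidth_gbps >= bw_req and link_latency_ms <= max_lat:
            return name
    return None  # no feasible split: functions must stay fully distributed

print(select_split(12.0, 0.2))  # a fast, high-capacity link allows a low split
print(select_split(4.5, 8.0))   # a slower link forces a higher-layer split
```

In practice the choice would also weigh computational load on CUs/DUs and per-flow QoS, as the optimization formulations cited above do; this greedy scan only illustrates the bandwidth/latency trade-off.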

2.3 AI/ML Methods


ML is a subset of AI that focuses on enabling systems to learn patterns and struc-
tures from data. ML techniques typically involve two main steps: training and
testing. During the training step, the system utilizes a training dataset to apply
ML algorithms and learn a model that captures the underlying patterns in the data.
In the testing step, the trained model is used to predict outputs for new input data.
ML algorithms can be classified into four main categories: supervised learning, un-
supervised learning, semi-supervised learning, and reinforcement learning [50]; fed-
erated learning, a decentralized training paradigm, is also discussed below.

• Supervised learning: This is a type of ML technique where the learning
process is guided by labeled data. In the training step, the goal is to discover
an unknown function f : x → y that maps input data to known output labels.
This process requires a labeled training dataset to train the model. In the
testing step, when a new input is provided to the system, the learned function
f is used to predict the expected output. Commonly used supervised learning
algorithms include Logistic Regression (LR), K-Nearest Neighbors (K-NN),
Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM),
Bayes’ theory, and hidden Markov models.

• Unsupervised learning: This is a type of ML technique that does not rely
on labeled data for training. In the training step, the objective is to classify
or cluster unlabeled data based on similarity measures. Unsupervised learning
algorithms aim to discover patterns or structures in the data without prior
knowledge of the output labels. Popular unsupervised learning algorithms
include k-means, hierarchical clustering, Gaussian Mixture Models (GMM),
and self-organizing maps.

• Semi-Supervised Learning: Semi-supervised learning combines elements
of both supervised and unsupervised learning. In many real-world scenarios,
obtaining labeled data can be expensive or challenging, while acquiring un-
labeled data is often more accessible. Semi-supervised learning leverages the
availability of unlabeled data along with a smaller set of labeled data to im-
prove learning accuracy. It offers an effective approach in situations where
labeled data is limited. Common semi-supervised learning algorithms include
Pseudo Labeling, Expectation Maximization (EM), co-training, transductive
SVM, and graph-based methods.

• Reinforcement learning (RL): RL differs from supervised and unsupervised
learning as it does not involve explicit training and testing steps. RL is focused
on an agent learning an optimal policy through interactions with an environ-
ment. The interaction is defined by the agent taking actions, receiving obser-
vations, and obtaining rewards from the environment. The agent’s goal is to
maximize the cumulative rewards over time by learning which actions to take
in different states. Q-learning is one of the most well-known and effective al-
gorithms used in reinforcement learning. Deep RL (DRL), as shown in Fig. 2.3,
further extends RL by incorporating deep neural networks as function approx-
imators, enabling the handling of high-dimensional and complex state spaces.
This allows deep RL to tackle challenging tasks, such as playing complex games
or controlling sophisticated systems, by leveraging the representation power of
neural networks to approximate the optimal policy more efficiently and effec-
tively.

Figure 2.3: DRL architecture overview

• Federated learning (FL): FL is a decentralized machine learning approach
where models are trained across multiple decentralized devices or servers hold-
ing local data, without the need to share that data centrally. This innovative
technique enables training machine learning models collectively while keeping
sensitive information on individual devices, thus addressing privacy concerns.
Instead of transferring raw data to a central server, only model updates are
shared, allowing each device to learn from the collective knowledge while pre-
serving data privacy and security. Federated learning has gained prominence
in various fields, including healthcare, finance, and telecommunications, offer-
ing a privacy-preserving way to enhance machine learning models’ accuracy
and efficiency across distributed environments.
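A minimal sketch of the federated averaging idea described above can make the "share updates, not data" principle concrete. The sketch assumes a toy one-parameter linear model and invented client datasets; it is not the FedAvg algorithm as standardized anywhere, just its core loop.

```python
# Federated-averaging-style sketch: each client fits y = w * x on its private
# data by local gradient descent, and only the resulting parameters (never
# the data) are averaged by the server. All numbers are illustrative.

def local_train(w, data, lr=0.01, epochs=20):
    """One client's local gradient descent on squared error; returns new w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# each client's private dataset roughly follows y = 2x with local noise
clients = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(1.5, 3.0), (3.0, 6.2)],
    [(0.5, 1.0), (2.5, 4.8)],
]

w_global = 0.0
for _ in range(10):                            # communication rounds
    local_ws = [local_train(w_global, d) for d in clients]
    w_global = sum(local_ws) / len(local_ws)   # server averages the models

print(round(w_global, 2))  # converges close to the underlying slope of 2
```

A real deployment would weight the average by client dataset sizes and train neural networks rather than a scalar, but the communication pattern is the same.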

In summary, ML techniques encompass various approaches that enable systems
to learn from data. Supervised learning utilizes labeled data to learn a function that
maps inputs to known outputs. Unsupervised learning discovers patterns or struc-
tures in unlabeled data without relying on predefined output labels. Semi-supervised
learning combines labeled and unlabeled data to improve learning accuracy. Rein-
forcement learning involves an agent learning optimal actions through interactions
with an environment. Federated learning trains models collaboratively across dis-
tributed devices without sharing raw data. Each category of ML algorithms has its
own strengths and applications, making them suitable for different types of prob-
lems and domains.
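The agent-environment loop summarized above can be illustrated with tabular Q-learning on a toy problem. The "corridor" environment, reward scheme, and hyperparameters below are invented for demonstration and do not correspond to any experiment in this thesis.

```python
import random

# Tabular Q-learning sketch: the agent walks a 1-D corridor of four states,
# starting at state 0, and earns a reward of +1 on reaching state 3.

N_STATES, ACTIONS = 4, [-1, +1]      # actions: move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: clamp to the corridor, reward the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(200):                 # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection balances exploration/exploitation
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])  # Bellman update
        s = nxt

# after training, the greedy policy moves right in every non-goal state
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

DRL replaces the table `Q` with a neural network so the same update rule scales to state spaces far too large to enumerate.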

2.4 Traditional vs AI/ML Methods for Network
Optimization
Traditional optimization methods, such as heuristic methods, rely on pre-defined
rules and heuristics to find the optimal solution to a problem, while ML, and par-
ticularly DRL, uses data and experience to learn the optimal solution through trial
and error. ML techniques such as DRL can learn more complex patterns and po-
tentially achieve better performance, but they require large amounts of data and
training time. DRL has been successfully
applied to many optimization problems, particularly in dynamic and complex sys-
tems where traditional methods may not be effective. In DRL, the goal is to learn
an optimal strategy to get the maximum reward from an environment by using neu-
ral networks. This optimal strategy is learned periodically by interacting with the
environment and observing the result. Applications of DRL in a RAN environment
include traffic engineering and routing [51], resource allocation [52–54], and energy
consumption [55, 56]. For instance, in [51], a DRL-based method for optimal split
points for VNFs has been proposed that minimizes routing and VNF placement costs.
Similarly, in [56], the authors proposed a Deep Q-Network (DQN)-based algorithm
for functional splits in O-RAN to minimize the end-to-end delay and deployment
cost of CUs/DUs. Overall, while traditional optimization methods can be useful
for simpler problems, DRL has the potential to provide better solutions for more
complex and dynamic problems in RAN environments.
Recently, a novel method called neural combinatorial optimization, which uses
a neural network model to learn near-optimal solutions to combinatorial optimiza-
tion problems through DRL, has been proposed [57]. The DRL approach is used
to iteratively update the neural network weights. Studies have shown that neural
combinatorial optimization can achieve near-optimal solutions for typical combina-
torial problems [58, 59]. In recent years, the effectiveness of this approach in solving
various optimization problems in communications and networks has been demon-
strated [51, 60–62].

2.5 Optimizing O-RAN: Enhancing Efficiency and
Performance
In the context of Open Radio Access Networks (O-RAN), RIC plays a crucial role in
network management. Similar to the SDN controller, the RIC provides a global view
of the network, enabling efficient management and optimization. ML techniques can
be applied in the O-RAN architecture to enhance network performance and decision-
making. Let’s explore how ML can be applied in O-RAN.

• Traffic classification: Traffic classification is essential for Quality of Service
(QoS) and network resource allocation. Traditional methods like TCP/UDP
port matching have limitations due to the use of private and dynamic ports.
Deep Packet Inspection (DPI) matches packet payloads with known patterns
to infer the application. However, DPI has constraints related to pattern avail-
ability and encrypted flows. ML-based methods are more practical, capable
of identifying encrypted flows and classifying traffic accurately. The RIC, act-
ing as an intelligent agent, can collect data from the data planes and apply
supervised and unsupervised ML techniques for traffic classification [63, 64].

• Routing optimization: In O-RAN, the RIC controls routing by setting rules in
the routing tables of the data planes. Routing decisions directly impact end-
to-end delay and throughput. Optimal routing decisions are crucial research
topics in network optimization. While shortest path and heuristic algorithms
are commonly used, ML-based routing optimization can provide near-optimal
solutions without precise mathematical models. Reinforcement learning has
been studied as an effective method in some papers, enabling the RIC to make
intelligent routing decisions [65–67].

• QoS/QoE prediction: Quality of Service (QoS) parameters such as packet
loss, jitter, throughput, and delay are used to analyze network performance.
With the increasing use of multimedia technologies, the concept of Quality of
Experience (QoE) has emerged, referring to user satisfaction with a service.
The RIC can collect data from the data planes and utilize ML techniques to
predict the required QoS/QoE, enabling proactive optimization and resource
allocation to meet user expectations [68].

• Resource management: Efficient network resource management is a crucial
aspect of O-RAN. The RIC can provide resource management for both the
data plane and control plane by maximizing resource utilization. Resource
allocation in the data plane and control plane is an active research area in O-
RAN, with ML techniques offering promising solutions for optimizing resource
allocation and utilization in a dynamic network environment [40, 52].
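The ML-based traffic classification discussed above can be sketched with a tiny k-nearest-neighbour classifier over two hand-crafted flow features. The flows, features, and labels below are invented for demonstration and are not drawn from any real dataset or from the classifiers cited in [63, 64].

```python
import math

# Toy supervised traffic classifier: label a flow by majority vote among its
# k nearest training flows. Features: (mean packet size in bytes, mean
# inter-arrival time in ms). All values are illustrative placeholders.

TRAINING_FLOWS = [
    ((1200.0, 1.0), "video"),      # large packets, steady arrivals
    ((1350.0, 0.8), "video"),
    ((120.0, 45.0), "messaging"),  # small, bursty packets
    ((90.0, 60.0), "messaging"),
    ((600.0, 5.0), "web"),
    ((700.0, 4.0), "web"),
]

def classify(flow, k=3):
    """Return the majority label among the k nearest training flows."""
    dists = sorted(
        (math.dist(flow, feat), label) for feat, label in TRAINING_FLOWS
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

print(classify((1100.0, 1.2)))  # resembles the video flows
print(classify((100.0, 50.0)))  # resembles the messaging flows
```

Unlike port matching or DPI, such feature-based classifiers need only flow statistics, which remain observable even when payloads are encrypted; production systems would use richer features and stronger models.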

By leveraging ML techniques in O-RAN, the RIC can enhance network per-
formance, optimize resource allocation, improve QoS/QoE, and make intelligent
routing decisions. These advancements contribute to a more efficient and adaptive
O-RAN architecture, enabling operators to meet the evolving demands of modern
communication networks.

2.5.1 O-RAN Resource Management


In [69], on-policy and off-policy DRL methods were compared, with on-policy show-
ing better overall results, including simpler implementation and improved trade-offs
between user latency and energy consumption. In [70], resource allocation between
eMBB and URLLC services is addressed using DRL, achieving close performance to
an optimal solver with better fairness. The study in [71] introduced a DRL-based
framework for managing traffic flows in O-RAN, offering flexibility and scalability
for different numbers of flows. Finally, [72] focused on adaptive resource allocation
in the non-RT RIC domain, demonstrating the ML agent’s ability to learn a policy
that can dynamically adapt to changing environments while meeting energy-related
criteria.
Collectively, these studies contribute to the advancement of DRL techniques in
O-RAN environments, addressing resource allocation, fairness, traffic management,
and adaptability, and showcasing the potential of DRL for improving the efficiency
and performance of O-RAN systems.

2.5.2 Dynamic Function Split


Dynamic function splitting plays a crucial role in enhancing the efficiency and adapt-
ability of O-RAN by dividing network node or application functions into smaller
modular components and distributing them across various computing resources. By
incorporating ML, O-RAN can harness real-time intelligence and decision-making
capabilities. ML algorithms analyze network traffic patterns and resource utilization
to forecast future demands, enabling optimized allocation of computing resources
for improved resource utilization, reduced latency, and enhanced overall network
performance. Moreover, ML facilitates the selection of computing resources based
on factors such as location, processing power, and energy efficiency, leading to opti-
mized dynamic function splitting and enhanced service delivery to users.
The use of VNF in O-RAN systems leads to the development of different split
points, which refer to the division of network functions between the CU and the
DU. The white paper [2] discusses the different split points for VNF based on 3GPP
terminology, while authors in [49] review the gains and requirements for each split
point option proposed by 3GPP. In [73], the energy consumption variation caused
by different split point options using a real-time prototype is studied, while authors
in [4] propose an optimization problem to design Multi-cloud vRAN, aiming to
find the optimal number of CUs and their locations, as well as the optimal split
point for each flow in the network. The paper [74] proposes an optimal split point
for VNF that maximizes the centralization of vRAN while minimizing computing
resources, using CPLEX [75], which is an optimization tool, to solve the problem.
These studies demonstrate the importance of split point optimization in open RAN
systems to improve performance, energy efficiency, and resource utilization.
Recently, there have been several studies that explored various aspects of O-RAN
optimization, including energy efficiency [76], network slicing [77], resource alloca-
tion [53, 56], and load balancing [6]. In [6], an optimization problem is proposed
to determine the optimal split points for VNF in the O-RAN system for each flow
to balance the load across CUs and midhaul links while taking into account delay,
bandwidth, and computational load on CUs/DUs. To make the 5G architecture
more dynamic, scalable, and flexible, the authors in [40] proposed a hierarchical
RIC that uses Containerized Network Functions (CNFs) instead of VNFs, which are
lighter and can be implemented through a microservices architecture. Addition-
ally, [78] proposes FlexNSP, which provides a flexible split point selection for each
network slice in the O-RAN system, and solves the proposed optimization problem
using Gurobi [79]. These studies highlight the importance of O-RAN optimization
for improving various aspects of the system, such as performance, efficiency, and
flexibility.

2.5.3 Energy Efficiency


In O-RAN, energy efficiency poses a notable challenge due to the intricate coor-
dination required among multiple components and technologies within its highly
flexible and scalable design. Balancing energy efficiency with other critical per-
formance metrics like latency and throughput is crucial. While dynamic function
splitting and ML-driven optimization techniques offer potential for efficient resource
allocation, addressing challenges associated with limited resources, scalability, and
the dynamic nature of the O-RAN environment is essential to attain optimal en-
ergy efficiency. Overcoming these challenges will enable effective management of
resources and maximize energy efficiency while maintaining performance levels in
O-RAN networks.
The study presented in [80] introduces an energy-aware scheduling method for
virtualized Base Stations (vBS) in O-RAN. The method leverages online learning
and adversarial bandit learning to optimize scheduling policies that reduce energy
consumption and improve vBS performance. The proposed approach, implemented
through a Policy Decider application in the non-RT RIC, adapts policies based
on network conditions and user needs. Experimental results based on real-world
traffic traces and testbed measurements demonstrate that the proposed method
outperforms state-of-the-art approaches, achieving energy savings between 35.5%
and 74.3%.

2.5.4 Mobility Management


Efficient mobility management plays a critical role in ensuring seamless cellular
communication and maintaining service quality for users in dynamic mobile envi-
ronments. The O-RAN architecture provides capabilities for collecting, storing, and
accessing historical traffic and radio data, enabling effective coordination of radio
resources. Real-time monitoring of traffic and radio conditions is facilitated by the
near-RT RIC framework, allowing the deployment of AI/ML applications to detect
and predict handover anomalies at the user level, thereby preventing communication
disruptions and ensuring a high level of service quality.
In [81], the authors propose a predictive handover method to reduce handover
failures by predicting target cells in advance. They utilize the Anomaly Detection,
Traffic Steering, and QoE Predictor xApps in the near-RT RIC platform to ana-
lyze UE data, detect anomalies, and make predictions based on past throughput
data. By incorporating mobile users’ RSRP measurements, the proposed method
achieves higher successful transmission rates compared to conventional handover al-
gorithms. This intelligent prediction approach enables the Traffic Steering xApp to
send commands, such as handover commands, to the RAN for improved handover
performance.

2.5.5 Traffic Steering
Research on O-RAN traffic steering aims to enhance network traffic management
by intelligently directing traffic based on real-time network conditions and user re-
quirements. This involves analyzing various network parameters, such as traffic
load, congestion, and user behavior, to determine the most optimal path for traffic
flow. Traffic steering techniques can dynamically distribute traffic across different
network functions and resources in real-time, resulting in improved network perfor-
mance and reduced latency. ML algorithms have a significant role to play in traffic
steering research, providing real-time intelligence and decision-making capabilities
to optimize traffic flow and enhance overall network performance. However, chal-
lenges related to scalability, complexity, and limited resources need to be addressed
to achieve optimal traffic steering in O-RAN environments.
In the study conducted by Kazemifard et al. [40], the O-RAN architecture sepa-
rates the Control Plane (CP) from the User Plane (UP) using the E1 interface, which
is derived from the software-defined network (SDN) framework. This decoupling al-
lows for increased flexibility in network programming. The CP is implemented in
hierarchical RICs and manages radio resource functions through the A1 and E2 in-
terfaces. The authors propose utilizing hierarchical RICs to minimize the end-to-end
delay of data plane traffic by efficiently placing Containerized Network Functions
(CNFs). CNFs, which are lighter than Virtual Network Functions (VNFs) and can
be implemented using microservices architecture, enable a dynamic, scalable, and
flexible architecture that aligns with the requirements of 5G networks [82].
Furthermore, in the paper [83], the application of ML techniques is explored
to achieve modular and flexible implementations of O-RAN in 6G networks, with
a specific focus on the traffic steering use case and O-RAN xApps. The authors
highlight various ML algorithms suitable for traffic steering, including decision trees,
k-nearest neighbor (KNN), and neural networks. Additionally, they discuss the
potential of DRL in training an agent to make real-time traffic steering decisions.
On the other hand, Erdöl et al. propose a federated meta-learning approach for
traffic steering in O-RAN systems in [84]. This approach facilitates multiple Radio
Access Technologies (RATs) to learn from each other without sharing sensitive data.
The authors present a neural network architecture that employs meta-learning to
adapt to different RATs and acquire the ability to steer traffic in a decentralized
manner.
The utilization of Non-Orthogonal Multiple Access (NOMA) in O-RAN systems
to enhance radio resource efficiency is proposed in the paper by Akhtar et al. [85].

They present a resource allocation algorithm that adaptively assigns radio resources
by considering the traffic demand and channel conditions of users. By incorpo-
rating NOMA, multiple users can efficiently share the same radio resources while
maintaining a high-quality user experience.
In the paper by Kavehmadavani et al. [86], a traffic steering approach is intro-
duced to ensure effective coexistence between eMBB and uRLLC services within
O-RAN systems. They address this through a multi-objective optimization problem
formulation and a traffic steering algorithm that dynamically directs traffic based on
the network load and user requirements, while also guaranteeing a minimum Quality
of Service (QoS) level for both eMBB and uRLLC users.

2.6 Summary
This chapter provided an overview of the O-RAN architecture, highlighting disag-
gregated nodes (RU, DU, and CU), and explored ML/AI techniques’ applications in
traffic classification, routing optimization, QoS/QoE prediction, and resource man-
agement within O-RAN deployments. While these efforts have optimized network
performance and user experiences, there are gaps in dynamic VNF splitting in O-
RAN environments.
One critical gap is the lack of simultaneous load balancing across RAN nodes and
links, leading to network congestion and server overloading. Existing research has
focused on specific aspects like energy efficiency and resource allocation, necessitat-
ing a more holistic approach. To address this, our thesis proposes a comprehensive
and integrated solution employing multi-objective optimization to enhance resource
utilization and network performance while considering both RAN nodes and links.
Another gap is the lack of a solution that minimizes frequent VNF reconfigu-
rations under dynamic traffic conditions and sustains operation as those conditions
vary, mitigating reconfiguration costs and traffic instability and ensuring a stable
network environment. Such a solution needs to leverage DRL and advanced neural
network architectures to handle the complexity of dynamic traffic conditions, en-
abling more effective on-demand resource management. The resulting solution could
additionally be extended to other objectives such as energy efficiency.
Additionally, in the context of dynamic VNF splitting in O-RAN environments,
there exists a significant gap in offloading decision-making processes for changing
VNF splitting under varying traffic conditions to the edge networks. The current
state of research lacks an approach that leverages the potential of federated DRL at

the network edge, which could enable efficient local and real-time decision-making.
By introducing federated DRL, the intelligence and decision-making capabilities
can be distributed across the edge, allowing for more agile and responsive network
management.
In conclusion, this thesis’s contributions lie in addressing the gaps in dynamic
VNF splitting in O-RAN environments through an integrated and comprehensive
approach. By considering load balancing, employing DRL for real-time decisions,
and ensuring robustness under varying traffic conditions, we aim to enhance O-RAN
system performance while advancing the field of AI-driven network management.

Chapter 3

Load Balance VNF Splitting in


O-RAN

3.1 Overview
In this chapter, we propose an optimization problem for selecting split points in
the O-RAN architecture. We start by introducing a system model that effectively
captures the VNF placement in the O-RAN setup. Subsequently, we formulate the
problem with the primary objective of achieving load balancing between CUs and
midhaul links while considering essential factors like delay and bandwidth require-
ments, as well as the computing capacity of DUs/CUs.
To assess the complexity of the optimization problem, we demonstrate its NP-
hardness, highlighting the inherent challenges in finding an optimal solution. To
tackle this, we devise a novel heuristic algorithm that efficiently solves the problem
while maintaining a performance gap of less than 2% compared to the optimal
solutions.
Furthermore, we conduct performance evaluations considering different central-
ization levels, which represent the distribution of total computing capacity across
DUs/CUs. Our results indicate that with multiple CUs, there can be up to a 20%
reduction in total bandwidth usage, demonstrating the potential benefits of decen-
tralized computing resources. Additionally, by implementing multipath routing,
we improve load balancing between midhaul links, albeit with a slight increase in
bandwidth utilization.
In conclusion, our proposed optimization approach addresses the challenges of
split point selection in O-RAN, aiming to achieve efficient load balancing and re-
source utilization. Through our heuristic algorithm and performance evaluations,

we demonstrate the effectiveness of the proposed solution, paving the way for more
efficient and responsive O-RAN networks.

3.2 Motivation
The motivation behind this chapter’s exploration and proposal of an optimization
problem for split point selection within the O-RAN architecture is rooted in ad-
dressing critical challenges within modern wireless networks. The ongoing evolution
towards O-RAN aims for increased flexibility, efficiency, and cost-effectiveness in
network infrastructures. However, the efficient placement of VNFs and the opti-
mization of split points pose significant challenges. The primary motivation is to
ensure the effective utilization of resources, such as computing capacity, bandwidth,
and minimizing latency, while achieving load balancing among critical network ele-
ments like CUs and midhaul links.
The C-RAN was proposed to implement the network functions of base stations
in common hardware, which would permit different functional splits [2]. The split
specifications are detailed in [87], while [88] discussed the split requirements and
gains. Despite this, only a few works have optimized split selection [89]. The adap-
tive RAN, which can switch between two different centralized options at runtime,
was examined in [90]. A joint RAN slicing and functional split was developed by
the authors in [91] to optimize centralization degree (CD) and throughput.
None of the works above considers multiple CUs or load balancing between them,
even though this is a vital consideration in vRAN design. [4] and [92] investigate
the minimization of the cost of splits in tree networks with fixed CUs. The authors
of [93] formulate a minimization problem to identify suitable CU locations. Despite
considering multiple CUs, [94] and [95] do not balance the load
between CUs. Co-locating DUs with CUs in [96] and [73] aims to reduce energy
costs; therefore, the assignment decisions do not affect load balancing. Hence, in this
chapter, we decide to deploy multiple CUs besides DUs, using the new architecture
of the Open-RAN proposed in [97], and balance the load between the CUs and
midhaul links.

3.3 System Model


In this chapter, we present our system model based on 3GPP terminology [98]. We
use the VNF split options, which are described in [2], and the O-RAN architecture

Table 3.1: Performance gains of different splits [4]

Split | VNF in O-DU  | VNF in O-CU  | UL BW (Mbps) | Delay (ms) | ρ^d_s          | ρ^c_s
S0    | f1 → f2 → f3 | None         | λ            | 30         | ρ1 + ρ2 + ρ3   | 0
S1    | f1 → f2      | f3           | λ            | 30         | ρ1 + ρ2        | ρ3
S2    | f1           | f2 → f3      | 1.02λ + 1.5  | 2          | ρ1             | ρ2 + ρ3
S3    | None         | f1 → f2 → f3 | 2500         | 0.25       | 0              | ρ1 + ρ2 + ρ3

proposed in [97]. In this architecture, the traffic flows are aggregated at only two
points of the RAN, RUs, and CUs. This means that from RUs to CUs, there is no
traffic aggregation. Due to this assumption, the network functions are placed per
RUs in DUs and CUs.
We consider a chain of network functions (f0 → f1 → f2 → f3). f0 is the RF
network function, which is implemented in RUs, similar to [3, 4], while the remaining
network functions can be placed in the CU or DU. f1 corresponds to the PHY
function, f2 to the RLC and MAC functions, and f3 to the PDCP and RRC
functions. Table 3.1 shows that there are four possible splits (S0 to S3). Moving
from S0 to S3, more network functions are placed in the CU. Table 3.1 also shows
that putting more network functions in the CU requires more bandwidth. For
example, the bandwidth demand grows from λ Mbps (the payload) at split S0 to
2.5 Gbps at split S3, while the delay requirement tightens from 30 ms to 0.25 ms.
The O-RAN is modeled as a graph G = (I, E), where I is the set of nodes and
E is the set of edges. The nodes include RUs, DUs, CUs, routers, and a core node
(EPC). As shown in Fig. 3.1, multiple RUs are connected to each DU-n by fronthaul
links. The DUs are connected to CUs by routers and midhaul links, and the CUs
have a direct interface to the core node (EPC). We define M, N, R, and Q as the
sets of RUs, DUs, routers, and CUs, respectively (M = |M|, N = |N|, R = |R|,
Q = |Q|), and assume that Q ≪ N. The edges are network links, and each link
(i, j) has a capacity C_ij. Each RU-m is connected to CU-q through a set of paths
P_mq, and for each p ∈ P_mq an end-to-end delay d^p_mq is defined. The network
functions (f1, f2, f3) are implemented in DUs and CUs as virtual machines (VMs).
The processing capacities (cycles/s) of DU-n and CU-q are denoted H_n and H_q,
respectively; it is noted that H_q > H_n. The computational loads (cycles/Mbps) of
f1, f2, and f3 per traffic unit are denoted ρ1, ρ2, and ρ3, respectively. Table 3.1
shows the computational loads at the CU (ρ^c_s) and DU (ρ^d_s) resulting from each
split, calculated from ρ1, ρ2, and ρ3. Only the downlink traffic case is represented
in this chapter. The traffic flows are aggregated in RU-m, m ∈ M, with demand λ_m
(Mbps), and the O-RAN needs to provide routes for M different flows. Table 3.2

Figure 3.1: The network architecture overview

summarizes the model parameters.

Table 3.2: Notation Table

Notation         Definition
M, M, m          Set of RUs, number of RUs, RU index
N, N, n          Set of DUs, number of DUs, DU index
M_n              Set of RUs connected to DU-n
Q, Q, q          Set of CUs, number of CUs, CU index
R                Set of routers
C_ij             Capacity of link (i, j)
P_mq             Set of paths from RU-m to CU-q
d^p_mq           Delay of path p from RU-m to CU-q
ρ^c_s, ρ^d_s     Computational load at the CU and DU resulting from split s
ρ1, ρ2, ρ3       Computational load of f1, f2 and f3
H_n, H_q         Processing capacity (cycles/s) of DU-n and CU-q
λ_m              Traffic demand of RU-m (Mbps)

3.4 Problem Formulation


As mentioned, placing network functions in DUs reduces the bandwidth usage of
midhaul links, while placement in CUs is more cost-effective because computing
resources are shared among all RUs. In this chapter, the determination of the split
point is formulated as an optimization problem whose objective is to balance the
load across CUs as well as midhaul links. First, this leaves free bandwidth on
midhaul links, which avoids network congestion; second, it keeps servers from being
overloaded. The problem formulation for this objective follows.

3.4.1 Network Constraints


It is noted that the network function placement in DUs and CUs is done per RU.
Accordingly, we define the binary variable x_{s,m}, s ∈ {0, 1, 2, 3}, which indicates
which split, according to Table 3.1, is selected for RU-m.

    ∑_{s=0}^{3} x_{s,m} = 1    ∀m ∈ M                                (3.1)

    x_{s,m} ∈ {0, 1}                                                 (3.2)

Constraint (3.1) ensures that exactly one split is selected for each RU-m. The total
computational load that the chosen splits impose on each DU-n must not exceed
the processing capacity of DU-n, hence:

    ∑_{m∈M_n} ∑_{s=0}^{3} λ_m x_{s,m} ρ^d_s ≤ H_n    ∀n ∈ N          (3.3)

where ρ^d_s is the computational load left at the DU under split s, H_n is the
processing capacity of DU-n, and λ_m is the aggregated traffic flow from RU-m to
the core node (EPC).
Similarly, the total computational load that the chosen splits impose on each
CU-q must not exceed the processing capacity of CU-q, hence:

    ∑_{m∈M} ∑_{s=0}^{3} λ_m φ_{mq} x_{s,m} ρ^c_s ≤ α H_q    ∀q ∈ Q   (3.4)

    0 < α ≤ 1                                                        (3.5)

where ρ^c_s is the computational load at the CU under split s and λ_m is the
aggregated traffic flow from RU-m to the core node (EPC). α is the maximum load
ratio allowed in CUs and enforces load balancing among them. φ_{mq} is a binary
variable that indicates whether RU-m is connected to CU-q. It is noted that the
aggregated traffic flow of each RU-m to the core node (EPC) needs to go through
only one CU:

    ∑_{q∈Q} φ_{mq} = 1    ∀m ∈ M                                     (3.6)

    φ_{mq} ∈ {0, 1}                                                  (3.7)

To send the aggregated traffic flow from RU-m to the core node (EPC), only one
path from the set of available paths from RU-m to CU-q (P_mq) may be selected,
hence:

    ∑_{p∈P_mq} ϕ^p_{mq} = 1    ∀m ∈ M, q ∈ Q                         (3.8)

    ϕ^p_{mq} ∈ {0, 1}                                                (3.9)

where ϕ^p_{mq} is a binary variable that indicates whether path p is selected for the
connection between RU-m and CU-q. Next, we define r_m as the bandwidth demand
between RU-m and its CU under split s, calculated as follows:

    r_m = λ_m (x_{0,m} + x_{1,m}) + x_{2,m} (1.02 λ_m + 1.5) + 2500 x_{3,m}    (3.10)
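For illustration, (3.10) can be read off as a small per-split helper; the function name and the unit convention (Mbps) are our own, not from the text:

```python
# Midhaul bandwidth demand r_m (Mbps) of RU-m under each split,
# following Eq. (3.10) and Table 3.1. Illustrative sketch only.

def midhaul_demand(split: int, lam_m: float) -> float:
    if split in (0, 1):              # S0/S1: midhaul carries only the payload
        return lam_m
    if split == 2:                   # S2: f2 and f3 at the CU
        return 1.02 * lam_m + 1.5
    if split == 3:                   # S3: fully centralized, fixed 2.5 Gbps
        return 2500.0
    raise ValueError("split must be in {0, 1, 2, 3}")
```

Only one of the x_{s,m} indicators is 1 per RU, so (3.10) reduces to exactly one of these branches.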

The routing decisions are made with respect to the link capacities:


    ∑_{q∈Q} ∑_{m∈M} r_m ϕ^p_{mq} φ_{mq} I^{ij}_p ≤ β C_{ij}    ∀(i, j) ∈ E, p ∈ P_mq    (3.11)

    0 < β ≤ 1                                                        (3.12)

where I^{ij}_p indicates whether path p includes link (i, j). This constraint ensures
that the bandwidth demand of the RUs assigned to link (i, j) does not exceed its
capacity; the coefficient β is the maximum utilization rate of midhaul links and
performs load balancing between them. The delay of the selected path (d^p_{mq})
from the set of available paths (P_mq) needs to meet the delay requirement of the
selected split s (given in Table 3.1), hence:

    ∑_{q∈Q} ∑_{p∈P_mq} φ_{mq} ϕ^p_{mq} d^p_{mq} ≤ ∑_{s=0}^{3} x_{s,m} d_s    ∀m ∈ M    (3.13)

3.4.2 Objective
Now we can define the load balance problem for midhaul links and CUs as problem
P:

    P :  min  α + β
         s.t. constraints (3.1) – (3.13)                             (3.14)

3.4.3 Complexity Analysis


We characterize the complexity of problem P by mapping it to the balancing allo-
cation of customers in the capacitated facility location problem (CFLP) [99], which
is NP-hard. The CFLP assigns each customer to one of the potential facilities sub-
ject to the facilities' capacities and the transportation costs, with the objective of
minimizing the total transportation cost. Problem P resembles a multi-level CFLP
because both DUs and CUs serve the RUs. If we consider only the CUs as facilities
and the RUs as customers, the objective of problem P is to balance the allocation
of customers to facilities as well as the loads on the transportation links. Problem
P thus contains the balancing allocation variant of CFLP, an NP-hard problem, as
a special case, and is therefore NP-hard as well. In the following, we propose a
heuristic algorithm to solve problem P.

3.5 The Proposed Heuristic Solution


The objective of the optimization problem is to minimize the maximum load of
the CUs and the midhaul links, which means the network functions should be placed
in DUs as far as possible, subject to the processing capacity of the DUs. In the next
step, the remaining network functions, which we refer to as the aggregated unassigned
network functions, are placed in CUs subject to the processing capacity of the CUs.
The proposed heuristic starts by placing the functions (f1, f2, f3) in the DUs as
far as the DUs' processing capacity allows. In the initial step, the subordinate RUs
of each DU-n are sorted in decreasing order of λ_m. According to Table 3.1, placing
f1 and f2 in the CU requires more bandwidth than placing f3 there. Thus, the
algorithm first places f1 in the DU for as many RUs as possible, then f2 for those
RUs whose f1 has already been placed, and finally f3 for the RUs where both f1
and f2 have been placed. The details are given in Algorithm 1.
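As a rough illustration, this greedy DU-filling step might be sketched as follows; the data layout (per-DU RU lists, per-function loads) and all names are our own assumptions:

```python
# Sketch of the greedy DU-filling step: rus[n] lists (ru_id, lam_m) pairs
# attached to DU-n, H[n] is the DU capacity in reference cores (RC), and
# rho[f] is the per-Mbps load of function f. Returns split[ru] in {0..3},
# where 3 means all NFs at the CU (S3) and 0 means all at the DU (S0).

def fill_dus(rus, H, rho):
    split = {m: 3 for n in rus for m, _ in rus[n]}       # start from C-RAN
    for n, attached in rus.items():
        cap = H[n]
        active = sorted(attached, key=lambda t: -t[1])   # decreasing lam_m
        for f in (1, 2, 3):                              # f1 first: saves most BW
            kept = []
            for m, lam in active:
                if lam * rho[f] <= cap:                  # does f fit in DU-n?
                    cap -= lam * rho[f]
                    split[m] -= 1                        # one more NF in the DU
                    kept.append((m, lam))
            active = kept                                # f2 only where f1 placed
    return split
```

Dropping an RU from `active` once a function does not fit mirrors line 13 of Algorithm 1, so f2 is only attempted where f1 was placed, and likewise for f3.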
After placing network functions in DUs according to Algorithm 1, the aggregated
unassigned network functions of each RU need to be placed in exactly one CU. In
other words, the split for each RU according to Table 3.1 is given as the result of

Algorithm 1 Put as many as possible of NFs in DUs
1: Input: Sn : set of RUs connected to DU-n, Hn , λm , ρf
2: Output: The split for each RU
3: Set split S3 for all RUs
4: for each DU-n do
5: Sort Sn by decreasing order of λm
6: f ←1
7: while f ≤ 3 do
8: for each RU-m in Sn do
9: if λm ρf ≤ Hn then
10: Hn ← Hn − λm ρf
11: Update split of RU-m
12: else
13: Remove RU-m from Sn
14: end if
15: end for
16: f ←f +1
17: end while
18: end for

Algorithm 1. To place the aggregated unassigned network functions in CUs, we need
to define some notions. Feasible paths are the paths with enough free bandwidth
capacity to support the bandwidth demand, and with an end-to-end delay meeting
the requirement (both given in Table 3.1), incurred by placing the aggregated
unassigned network functions of RU-m at CU-q. Feasible CUs for RU-m are the
CUs with at least one feasible path. We assume the same bandwidth for all midhaul
links. The bottleneck link is then defined as the link with the minimum free
bandwidth capacity among the links on a feasible path from RU-m to CU-q.
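These notions can be sketched with networkx (assumed available); the edge attributes `free` and `delay` and the function names are our own conventions:

```python
# Feasible paths for RU-m at CU-q: every link must carry r_m and the total
# delay must meet the split's requirement d_max; k bounds the candidates
# examined (k-shortest paths by hop count). Illustrative sketch only.
import itertools
import networkx as nx

def feasible_paths(G, ru, cu, r_m, d_max, k=3):
    paths = []
    for p in itertools.islice(nx.shortest_simple_paths(G, ru, cu), k):
        links = list(zip(p, p[1:]))
        if all(G.edges[e]["free"] >= r_m for e in links) and \
           sum(G.edges[e]["delay"] for e in links) <= d_max:
            paths.append(p)
    return paths

def bottleneck(G, path):
    """Free capacity of the tightest link on a path."""
    return min(G.edges[e]["free"] for e in zip(path, path[1:]))
```

Algorithm 2 then prefers, among the feasible paths to a CU, the one whose bottleneck link has the most free capacity.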
In the initial step of the algorithm, the aggregated unassigned network functions
of the RUs with only one feasible CU are placed first, so that the resources of their
single feasible CU do not run out. For the second step, by Table 3.1 and equation
(3.10), the bandwidth demand of RU-m (r_m) and its computational load at the CU
(λ_m ρ^c_s) grow together, which means that the RU with the higher bandwidth
demand also has the higher computational load at the CU. Thus, all RUs are sorted
in decreasing order of the computational load of their aggregated unassigned network
functions. Next, the first item (RU-m) is picked. For each feasible CU, the feasible
path with the maximum free capacity on its bottleneck link is selected, and then
α + β is calculated. After calculating α + β for all feasible CUs, the CU with the
minimum α + β is selected and the aggregated unassigned network functions of
RU-m are placed there. Finally, the procedure returns to the initial step and
continues in this way to place all aggregated unassigned network

Algorithm 2 Put aggregated unassigned NFs in CUs
 1: Input: S: set of RUs, Q: set of CUs, L: set of links, R_q: set of RUs assigned
    to CU-q, B_m: set of feasible CUs for RU-m, T_l: set of RUs using link l
    (initially empty)
 2: Output: a CU for each aggregated unassigned NF
 3: while S ≠ ∅ do
 4:    for m ∈ S do
 5:       if |B_m| = 1 then
 6:          R_q ← R_q ∪ {m}
 7:          S ← S − {m}
 8:       end if
 9:       for l ∈ L do
10:          Update T_l
11:       end for
12:    end for
13:    m ← arg max_{i∈S} (λ_i ρ^c_s)
14:    for q ∈ B_m do
15:       α_q ← max_{i∈Q} ( ∑_{j∈R'_i} λ_j ρ^c_s / H_i ), where R'_i = R_i ∪ {m}
          if i = q, and R'_i = R_i otherwise
16:       P_mq ← feasible path from RU-m to CU-q with maximum free capacity
          on its bottleneck link
17:       for l ∈ L do
18:          if l ∈ P_mq then
19:             T'_l ← T_l ∪ {m}
20:          else
21:             T'_l ← T_l
22:          end if
23:       end for
24:       β_q ← max_{l∈L} ( ∑_{j∈T'_l} r_j / C_l )
25:    end for
26:    Find q ∈ B_m with min(α_q + β_q)
27:    for l ∈ L do
28:       Update T_l
29:    end for
30:    R_q ← R_q ∪ {m}
31:    S ← S − {m}
32: end while

functions for all RUs. It is noted that this procedure can always find a feasible
solution because it is assumed that there are sufficient resources to place all the
network functions in DUs and CUs. The details are given in Algorithm 2.
The final phase of Algorithm 2 is the improvement phase, in which better
solutions may be obtained by reassigning RUs to different CUs. We take the solution
found so far as the best solution. Then, the midhaul link with the lowest free
capacity is selected, along with the RUs that use it. For each of these RUs, the
alternative CUs and paths with lower α + β are checked, and the RU is reassigned
to the best alternative if one exists. This process is repeated until no further
improvement is achieved. Finally, the complexity of Algorithm 2 is O(|L| |S|²),
where |S| is the number of RUs and |L| is the number of links in the network.
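For illustration, the α + β score that Algorithm 2 evaluates per candidate CU can be condensed as follows; the data layout (dicts keyed by CU and link) and all names are our own assumptions:

```python
# Score of assigning RU-m to candidate CU q: alpha_q is the worst CU
# utilization and beta_q the worst link utilization if the assignment
# were made. cu_load/H are per-CU loads/capacities; link_load/C are
# per-link loads/capacities; path_links is the chosen path's link set.

def score(q, m_load, cu_load, H, path_links, link_load, r_m, C):
    alpha_q = max((cu_load[i] + (m_load if i == q else 0.0)) / H[i]
                  for i in H)
    beta_q = max((link_load[l] + (r_m if l in path_links else 0.0)) / C[l]
                 for l in C)
    return alpha_q + beta_q
```

Algorithm 2 picks, for RU-m, the feasible CU q that minimizes this score, which is exactly the objective α + β of problem P evaluated for the tentative assignment.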

3.6 Performance Evaluation


In this chapter, all system parameters are set to the testbed measurements reported
in [4] and [100]. This setting is based on 1 user/TTI, 2×2 MIMO, 20 MHz (100
PRBs), 2 TBs of 75376 bits/subframe, and an IP MTU of 1500 B. For CPU capacity,
an Intel Haswell i7-4770 3.40 GHz CPU is used as the reference core (RC), and the
maximum processing capacities of the DU and CU differ per experiment. For each
split s (s ∈ {0, 1, 2, 3}), ρ^d_s = {0.005, 0.004, 0.00325, 0} RCs per Mbps at the
DU and ρ^c_s = {0, 0.001, 0.00175, 0.005} RCs per Mbps at the CU. The standard
store-and-forward model of [4] and [100] is used to calculate delay: 12000/C_ij µs
for transmission, 4 µs/km for propagation, and 5 µs for processing. The link
capacity varies up to 100 Gbps, and kth-shortest path routing is used.
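As a sketch, this store-and-forward model can be written as a per-path delay helper, assuming link capacities in Mbps and lengths in km (names are our own):

```python
# Store-and-forward path delay: per hop, transmission takes 12000/c_ij us
# (a 1500 B MTU, i.e. 12000 bits, over c_ij Mbps), propagation 4 us/km,
# and processing 5 us. `links` is a list of (capacity_mbps, length_km).

def path_delay_us(links):
    return sum(12000.0 / c + 4.0 * d + 5.0 for c, d in links)

# e.g. two 10 Gbps hops of 10 km each:
print(path_delay_us([(10_000, 10), (10_000, 10)]))  # microseconds
```

A path is admissible for a split only if this total stays within the split's delay budget from Table 3.1 (e.g. 250 µs for S3).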
In the first experiment, 120 Mbps < λ_m < 200 Mbps, the parameter k of the kth-
shortest path algorithm is set to 1, the link capacity is set to 5 Gbps, and the DU
and CU capacities are 1 and 10 RC, respectively. Algorithm 1 is straightforward
and places as many network functions as possible in the DUs, while Algorithm 2
places the aggregated unassigned network functions in the CUs; it is Algorithm 2
that determines the performance of the proposed solution. Hence, Algorithm 2 and
the optimal solution are compared in Table 3.3 for small-scale networks. The
optimal solution is calculated using the SciPy library in Python. In Table 3.3,
Gap = |Optimal Solution − Proposed Heuristic Solution| / Optimal Solution. We
can see that, for the selected network sizes, the gap does not exceed 2%.
For the following experiments, we use Watts–Strogatz model [101] to generate

Table 3.3: Comparison of Algorithm 2 and the optimal solution

  Number of Nodes            Time                     GAP
 RU   DU   CU   R     Optimal        Algorithm 2
  8    4    2   4     0.2 sec        <0.1 sec         1%
 10    5    2   5     0.4 sec         0.1 sec         2%
 12    6    2   5     3.65 sec        0.1 sec         2%
 14    7    2   6     68.8 sec        0.1 sec         2%
 16    8    2   6     1037.8 sec      0.1 sec         2%
 18    9    2   7     >>20000 sec     0.14 sec        2%

Figure 3.2: Total DU and CU loads vs λ_m

the topology of the midhaul links. The Watts–Strogatz model takes three parameters
as input: n, k, and p. n is the number of nodes, each node is connected to its k
nearest neighbours in a ring topology, and p is the probability of rewiring each edge.
We create two different topologies, N1 (n = 20, k = 2, p = 0.2) and N2 (n = 20,
k = 4, p = 0.2). N1 has 25 links, while N2 has 46. In both topologies, the n nodes
are the routers connecting the midhaul links, and each DU and CU is directly
connected to exactly one router. We consider 240 RUs, 60 DUs, 3 CUs, and one
core node. Each DU is connected to 4 RUs, and each router is connected to 3 DUs.
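Assuming networkx is available, the router part of the two topologies can be reproduced as follows; this sketch covers only the router ring, not the DU/CU attachment links that the counts above include:

```python
# Sketch of the midhaul router topologies via the Watts-Strogatz generator.
# Rewiring preserves the edge count, so the router ring has n*k/2 links;
# the seed and the DU/CU attachments are illustrative or omitted here.
import networkx as nx

N1 = nx.watts_strogatz_graph(n=20, k=2, p=0.2, seed=1)   # sparse ring
N2 = nx.watts_strogatz_graph(n=20, k=4, p=0.2, seed=1)   # denser ring
print(N1.number_of_edges(), N2.number_of_edges())        # 20 and 40 router links
```

The denser N2 ring offers more alternative routes, which is why load balancing works better on it in the third experiment.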
In the second experiment, the total processing capacity used due to the splitting
of network functions is studied. H_n = 2 RC, H_q varies up to 100 RC, and Q = 3.
Fig. 3.2 shows the processing usage of the DUs and CUs for different values of λ_m,
and Fig. 3.3 shows the number of each network function placed in DUs and CUs.
For λ_m = 100, all network functions can be placed in DUs, but for the larger values
of λ_m the computational load of the network functions exceeds the DUs' processing
capacity, and the aggregated unassigned network functions are placed in CUs.

Figure 3.3: Total number of NFs placed in DU and CU vs λ_m

Figure 3.4: The link load distribution vs number of CUs: (a) Topology N1, (b) Topology N2

In the third experiment, the load distribution of the midhaul links with respect
to the number of CUs is studied for both topologies N1 and N2. The results are
shown for different values of k in the kth-shortest path algorithm, with 130 Mbps <
λ_m < 150 Mbps, H_n = 2 RC, H_q up to 80 RC, and C_ij = 20 Gbps. Fig. 3.4 shows
that the load cannot be balanced properly over the links with a single CU, while
good results are achieved by deploying more than 4 and 3 CUs in topologies N1 and
N2, respectively. Furthermore, k = 1 (simple shortest-path routing) gives the worst
result when it comes to balancing the load among midhaul links. In addition, the
load can be balanced better in topology N2 than in N1, since N2 includes more links.
In the next experiment, 130 Mbps < λ_m < 150 Mbps, k = 3, and H_n is varied
from 0 to 3 RC. The centralization ratio (CR) is defined as:

Figure 3.5: Total bandwidth usage vs centralization ratio (CR): (a) and (b) for
different numbers of CUs, (c) and (d) for different k values.

    CR = (Total CU capacity usage) / (Total CU capacity usage + Total DU capacity usage)    (3.15)

When CR = 0, all network functions can be placed in DUs (D-RAN), while CR = 1
corresponds to a C-RAN deployment. As CR increases from 0 to 1, the DU capacity
decreases and the CUs take over part of the network functions. Fig. 3.5 (a) and (b)
show the total bandwidth usage with respect to CR for different numbers of CUs
(|Q|). As we can see, both topologies N1 and N2 show a drastic increase in total
bandwidth usage once the CR exceeds 0.3. Furthermore, increasing the number of
CUs can reduce the total bandwidth usage. In topology N1, the bandwidth usage is
reduced by 20% when moving from a single-CU to a 5-CU deployment, but this
improvement diminishes as more CUs are added; in topology N2, the benefit keeps
growing as CUs are added. Fig. 3.5 (c) and (d) show the total bandwidth usage
with respect to CR for different k values. As before, the total bandwidth usage
increases dramatically once CR exceeds 0.3. In addition, increasing the k value also
increases bandwidth usage, since multipath routing uses alternative, longer paths.
Nevertheless, Fig. 3.4 illustrated how multipath routing can better balance the load
across midhaul links.

3.7 Summary
In this chapter, the optimization of VNF splitting in O-RAN was tackled, addressing
a significant gap in the literature. A standard-compatible model was employed to
present an optimization method that assigns RUs to CUs and identifies optimal
split points for the network functions. The objective was to balance the load between
CUs and midhaul links while respecting the delay requirements. For performance
evaluation, the centralization ratio (CR) was defined as the ratio of the total load
on CUs to the total computing load. An analysis of different CR deployments
demonstrated that multiple CUs can reduce the total bandwidth usage by up to
20%. Furthermore, it was found that multipath routing improves load balancing
between midhaul links, albeit with an increase in bandwidth usage. The network
constraints outlined in this chapter serve as a fundamental reference for the following
chapters; nevertheless, the upcoming objectives will diverge from these constraints.

Chapter 4

Robust Dynamic VNF Splitting in


O-RAN

4.1 Overview
In the previous chapter, we addressed the load-balancing problem using a heuristic
approach. In this chapter, our focus shifts towards tackling the challenge of
dynamic traffic conditions in real-life O-RAN environments. To achieve this, we
propose a novel VNF splitting solution by leveraging DRL, which has shown great
promise in the field of AI/ML.
Our proposed approach involves formulating a multi-objective optimization prob-
lem, aiming to simultaneously minimize VNF computational costs and the overhead
of periodical reconfigurations. This optimization considers crucial network con-
straints such as delay, bandwidth, and the computing capacity of DUs/CUs. To
effectively solve this problem, we employ constrained combinatorial optimization
with DRL, where an agent learns to minimize a penalized cost function based on
our proposed optimization framework.
During our evaluation, we analyze the impact of centralizing computing re-
sources, represented by the distribution of computing capacity across DUs/CUs,
on network performance. Additionally, we experiment with varying the weights of
different terms in the multi-objective optimization problem. The results of our per-
formance evaluation showcase significant improvements with our solution. Notably,
we observe a reduction of up to 76% in the overhead of VNF reconfigurations, at
the cost of an increase of up to 23% in computational cost. Moreover, our solution
outperforms the most robust O-RAN deployment, C-RAN, by offering up to
76% savings in bandwidth while demonstrating up to 27% overprovisioning of CPU.

In conclusion, our chapter contributes a powerful and efficient solution for dy-
namic VNF splitting in O-RAN environments through the integration of DRL and
multi-objective optimization. The demonstrated improvements in resource utiliza-
tion and network performance underscore the potential of our approach in advancing
the field of AI-driven network management.

4.2 Motivation
The development and optimization of VNF in O-RAN systems involve diverse split
points between O-CU and O-DU functions. Various papers [2, 4, 73, 74] investigate
different split point options’ impacts on network performance, energy consumption,
and resource utilization. These studies underscore the criticality of optimizing split
points in open RAN systems to enhance efficiency and overall system performance.
Additionally, recent research in O-RAN optimization, as exemplified by papers such
as [6, 40, 53, 56, 76–78], delves into various aspects, including energy efficiency, load
balancing, network slicing, and resource allocation, highlighting the need for system
enhancements across multiple fronts.
While traditional optimization methods rely on predetermined rules, recent fo-
cus has turned toward machine learning techniques, particularly DRL, which utilizes
experience and data to learn optimal solutions. DRL demonstrates potential in com-
plex and dynamic scenarios where conventional methods may fall short. Notably,
DRL applications in RAN environments, as evident in studies like [51–56], indi-
cate its efficacy in traffic engineering, resource allocation, and energy consumption.
While traditional methods suffice for simpler problems, DRL emerges as a promising
approach for more intricate and dynamic challenges within RAN environments.
Overall, there exists a gap in the existing research on VNF splitting or placement
optimization, as the current solutions do not address the specific needs of dynamic
traffic conditions. In dynamic traffic scenarios, the necessity of frequent network
reconfigurations introduces additional costs to the O-RAN system, which must be
considered in the optimization objectives. Therefore, we require a solution that can
minimize the frequency of VNF reconfigurations under varying traffic conditions and
ensure stable network performance. To achieve this, the proposed solution should
leverage DRL and advanced neural network architectures to effectively manage re-
sources on-demand and handle the complexities of dynamic traffic conditions. This
gap motivates this chapter, where we propose a generic system model and AI-based
solution to address the challenges of RAN disaggregation while taking into account

network constraints.

4.3 System Design


The VNF split options in this chapter are the same as in Chapter 3 for each split
requirement and are shown again in Table 4.1.

Table 4.1: Performance gains of different splits [4]


Split | VNF in DU    | VNF in CU    | UL BW (Mbps) | Delay (ms) | ρdp          | ρcp
1     | f1 → f2 → f3 | None         | λ            | 30         | ρ1 + ρ2 + ρ3 | 0
2     | f1 → f2      | f3           | λ            | 30         | ρ1 + ρ2      | ρ3
3     | f1           | f2 → f3      | 1.02λ + 1.5  | 2          | ρ1           | ρ2 + ρ3
4     | None         | f1 → f2 → f3 | 2500         | 0.25       | 0            | ρ1 + ρ2 + ρ3
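To make the split options concrete, Table 4.1 can be encoded as a small lookup structure. The following Python sketch is an illustrative encoding of our own (the field names are not part of any specification); it maps each split index to its uplink bandwidth function, delay budget, and the VNF loads placed on the DU and the CU:

```python
# Split options from Table 4.1, encoded per split index.
# bw(lam) returns the required uplink midhaul bandwidth (Mbps) for traffic lam;
# rho_d / rho_c list which VNF loads (rho1..rho3) fall on the DU / CU.
SPLITS = {
    1: {"bw": lambda lam: lam,              "delay_ms": 30.0, "rho_d": (1, 2, 3), "rho_c": ()},
    2: {"bw": lambda lam: lam,              "delay_ms": 30.0, "rho_d": (1, 2),    "rho_c": (3,)},
    3: {"bw": lambda lam: 1.02 * lam + 1.5, "delay_ms": 2.0,  "rho_d": (1,),      "rho_c": (2, 3)},
    4: {"bw": lambda lam: 2500.0,           "delay_ms": 0.25, "rho_d": (),        "rho_c": (1, 2, 3)},
}
```

For example, split 3 with λ = 100 Mbps requires 1.02 · 100 + 1.5 = 103.5 Mbps of midhaul bandwidth.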

In O-RAN, the RIC is a software-defined controller that is logically centralized.


The RIC is responsible for hosting network applications and O-RAN nodes are
connected to it. The RIC is divided into two parts: the Non-Real-Time RIC (Non-RT
RIC) and the Near-Real-Time RIC (Near-RT RIC). The Non-RT RIC manages network
operations that have control loops longer than one second, while the Near-RT RIC
manages tasks that take between 10 milliseconds and one second. Applications of
Non-RT RIC and Near-RT RICs are called rApps and xApps, respectively. These
applications provide advanced features and functions to manage RANs. To optimize
O-RAN, the Non-RT RIC provides policy-based guidance and management models
to the Near-RT RIC. RAN optimization policies in Non-RT RICs are specified using
data analytics, AI, and ML techniques. The Service Management and Orchestration
(SMO) platform provides data collection and provisioning services for disaggregated
nodes. The Non-RT RIC and Near-RT RIC are connected to all RAN nodes via O1
and E2 interfaces, respectively.

4.3.1 Problem Statement


In our O-RAN system, the VNFs are placed in DU/CU per RU basis. This placement
strategy allows for efficient resource allocation and distribution among the network
elements. However, under dynamic traffic conditions at RUs, the processing capacity
of the VNFs may reach its limit. In such scenarios, it may
become necessary to perform VNF reconfigurations between DUs and CUs to ensure
smooth network operation. Adaptive network reconfiguration is a technique that can
help reduce network overprovisioning by improving the utilization of computing and
bandwidth resources in the O-RAN system. However, it should be noted that the

reconfiguration process itself incurs additional overhead costs and may lead to traffic
instability. The proposed solution in this chapter aims to minimize the overhead of
VNF reconfigurations and the total computational cost by identifying an optimal
VNF placement strategy that can adapt to varying traffic conditions. By taking into
account the processing capacity of VNFs and the traffic load, the proposed solution
can dynamically adjust the VNF placement to ensure efficient resource utilization
and stable network performance.
The RIC plays a crucial role in the O-RAN system as it is responsible for network
configuration and reconfiguration. It functions as a logically centralized software-
defined component that controls and optimizes various O-RAN functions. The Non-
RT RIC’s primary objective is to provide policy-based guidance to optimize RAN
functions and support AI/ML models for near-RT RIC functions. Fig. 4.1 shows
that the Non-RT RIC has already been implemented to continuously monitor the
network. The RIC hosts rApps, which execute ML algorithms to provide policies
for network configuration and reconfiguration. Essentially, the rApp stores and
retrieves trained AI/ML models from the Non-RT RIC. This chapter uses a dataset
that includes traffic demand at RUs for different time slots of a single day. The data
collection procedure involves periodic transmission of data from the RUs to the ML
training hosts in the Non-RT RIC through the O1 interface. The FH interface is
responsible for sending the data, and the details of the data collection procedure are
provided in [102], although they are not part of this study. The VNF reconfigurations
are done by VNF migration to the target O-RAN node (i.e., DU). The O1 interface
enables the interworking between the rApp/Non-RT RIC and O-RAN nodes.

4.4 Problem Formulation


4.4.1 Network Model
We model the O-RAN system as an undirected graph G = (I, E), where I represents
the set of network nodes, and E represents the set of network edges. The network
nodes are RUs, DUs, CUs, and routers. The core nodes, which include the LTE
(EPC) and the 5G (5GNC), are not part of the nodes under this study. As shown
in Fig. 4.1, each DU serves multiple RUs connected by fronthaul (FH) links. Each
DU is connected to a router, and then the routers connect the DUs to a single
CU through midhaul (MH) links. The CU has a direct interface to the core nodes
(EPC and 5GNC). In the network graph, the set of RUs, DUs, and routers are

Figure 4.1: The proposed O-RAN architecture

defined as M, N , and R, respectively. Furthermore, we define M = |M|, N = |N |,


and R = |R| to describe the size of each set. The graph edges are midhaul (MH)
links, and Cij denotes the capacity of link (i, j). Path Pm with end-to-end delay
dm connects RU-m to the CU. Hn and H represent the computational capacity
(cycles/s) of DU-n and the single CU, where H ≫ Hn . The quantities ρ1, ρ2, and
ρ3 are the computational loads (cycles/Mbps) of VNFs f1, f2, and f3, respectively,
per traffic unit. ρcp and ρdp denote the computational load caused by split p ∈ P in
CU and DU, respectively, and are given in Table 4.1. Each VNF is deployed in DUs
or the CU as a Virtual Machine (VM). Using this model, all network traffic flows
are taken into account, regardless of whether the core is the source (downlink) or the
destination (uplink). This chapter derives the uplink case without loss of generality. In this case,
the O-RAN system needs to accommodate VNF of all traffic flows from RUs to the
core network. Note that the traffic flows are the aggregated traffic demands
at RUs. A summary of the network model parameters used in our study is given in
Table 4.2.
In the following sections, to formulate a robust VNF placement, we assume that
we have a dataset that includes traffic demand at RUs. The traffic demands are
split into T time slots based on the traffic dynamicity. Consequently, the goal is to
define VNF placement for each time slot τ ∈ T to minimize the total computational
cost of VNF placement as well as the number of VNF reconfigurations.

Table 4.2: Summary of network model notations

Notation Definition
M, M Set of RUs, Number of RUs
N,N Set of DUs, Number of DUs
Mn Set of RUs connected to DU-n
R Set of routers
Cij Capacity of link (i, j)
Pm The path from RU-m to the CU
dm End-to-end delay of path Pm
ρcp , ρdp Computational load of CU and DU for split p
ρ1, ρ2, ρ3 Computational loads of f1, f2, and f3
Hn , H Computational capacity of DU-n and the CU
λm Traffic demand at RU-m (Mbps)
T = {1, 2, . . . , T } Set of time slots
P = {1, 2, 3, 4} Set of splits

4.4.2 VNF Placement


As mentioned earlier, the VNFs are placed in DUs and the CU per RU. According
to Table 4.1, a set of VNFs is assumed for RU-m, and the binary indicator variable
$x^{\tau}_{p,m} \in \{0, 1\}$, ∀p ∈ P, ∀τ ∈ T , is used to specify which split is selected at time
slot τ .

$$\sum_{p \in \mathcal{P}} x^{\tau}_{p,m} = 1, \qquad \forall m \in \mathcal{M},\; \tau \in \mathcal{T} \tag{4.1}$$

The constraints (4.1) ensure that each RU can only have one split at time slot τ .
Next, we specify the constraints for VNF in DUs/CU. We first note that the total
computational load of VNF placed in each DU should not exceed its computational
capacity. That is

$$\sum_{m \in \mathcal{M}_n} \sum_{p \in \mathcal{P}} \lambda^{\tau}_{m}\, x^{\tau}_{p,m}\, \rho^{d}_{p} \leq H_n, \qquad \forall n \in \mathcal{N},\; \tau \in \mathcal{T} \tag{4.2}$$

where ρdp is the computational load of RU-m per traffic unit (Mbps) caused by split
p in DU-n. The traffic demand of RU-m at time slot τ is represented by λτm , and
Hn is the computational capacity of DU-n. Furthermore, the total computational
load of VNF placed in the CU should not exceed its computational capacity, hence:

$$\sum_{m \in \mathcal{M}} \sum_{p \in \mathcal{P}} \lambda^{\tau}_{m}\, x^{\tau}_{p,m}\, \rho^{c}_{p} \leq H, \qquad \tau \in \mathcal{T} \tag{4.3}$$

where ρcp is the computational load of RU-m per traffic unit (Mbps) caused by split
p in the CU at time slot τ , and H is the computational capacity of the CU.
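Constraints (4.1)–(4.3) can be checked mechanically for a candidate assignment. The sketch below is a hypothetical helper of our own (the ρ values are passed in as dictionaries); it accumulates the per-DU and CU computational loads and compares them against the capacities Hn and H:

```python
# Sketch of constraints (4.2)-(4.3): per-DU and CU capacity checks.
# x[m] is the split chosen for RU-m, lam[m] its demand (Mbps); rho_d / rho_c
# map split -> computational load per Mbps; du_of[m] gives the DU serving RU-m.
def capacity_ok(x, lam, rho_d, rho_c, du_of, H_du, H_cu):
    du_load = {n: 0.0 for n in H_du}        # cycles/s accumulated per DU
    cu_load = 0.0
    for m, p in x.items():
        du_load[du_of[m]] += lam[m] * rho_d[p]
        cu_load += lam[m] * rho_c[p]
    return all(du_load[n] <= H_du[n] for n in H_du) and cu_load <= H_cu
```

Constraint (4.1) is satisfied by construction here, since `x` stores exactly one split per RU.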

4.4.3 Bandwidth Demand


Assuming RU-m is connected to CU via path Pm (e.g., shortest path), the bandwidth
demand for RU-m at time slot τ , denoted as bτm , depends on the selected split. Based
on Table 4.1, bτm is calculated as follow:

$$b^{\tau}_{m} = \lambda^{\tau}_{m}\big(x^{\tau}_{1,m} + x^{\tau}_{2,m}\big) + x^{\tau}_{3,m}\big(1.02\,\lambda^{\tau}_{m} + 1.5\big) + 2500\, x^{\tau}_{4,m}, \qquad \forall m \in \mathcal{M},\; \tau \in \mathcal{T} \tag{4.4}$$

In determining the routing decision, the link capacity is taken into account as

$$\sum_{m \in \mathcal{M}} b^{\tau}_{m}\, I^{ij}_{P_m} \leq C_{ij}, \qquad \forall (i,j) \in \mathcal{E},\; \tau \in \mathcal{T} \tag{4.5}$$

where $I^{ij}_{P_m} \in \{0, 1\}$ indicates whether link (i, j) is used by path Pm . This constraint
ensures that the bandwidth usage of link (i, j) remains within its capacity.
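The bandwidth model of Eqs. (4.4)–(4.5) amounts to a per-RU demand lookup followed by a per-link accumulation. A minimal sketch, with paths and link capacities supplied as plain dictionaries (helper names are our own):

```python
# Sketch of Eqs. (4.4)-(4.5): per-RU bandwidth demand and link-capacity check.
def bandwidth_demand(split, lam):
    """Uplink midhaul bandwidth (Mbps) of one RU per Table 4.1."""
    if split in (1, 2):
        return lam
    if split == 3:
        return 1.02 * lam + 1.5
    return 2500.0                        # split 4 (fully centralized)

def links_ok(x, lam, paths, capacity):
    """paths[m] is the set of links used by P_m; capacity maps link -> Mbps."""
    usage = {e: 0.0 for e in capacity}
    for m, p in x.items():
        for e in paths[m]:
            usage[e] += bandwidth_demand(p, lam[m])
    return all(usage[e] <= capacity[e] for e in capacity)
```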

4.4.4 End-to-End Delay


Let dτm be the end-to-end delay of path Pm from RU-m to the CU. It should be less
than the maximum required delay for the selected split p, given in Table 4.1. This
gives:

$$d^{\tau}_{m} \leq \sum_{p \in \mathcal{P}} x^{\tau}_{p,m}\, d_{p}, \qquad \forall m \in \mathcal{M},\; \tau \in \mathcal{T} \tag{4.6}$$

where dp is given in Table 4.1.

4.4.5 Overall Optimization Objective


In this chapter, we define a specific O-RAN configuration instant at time slot τ as
$\chi^{\tau} = \{x^{\tau}_{p,m}\}$. The set of configurations across all time slots over a given period
of operation is denoted as $\mathcal{X} = \{\chi^{\tau}\}$. The cost of configuration for all time slots
is modeled by function f (X ), which includes the sum of the computational cost of
DUs and CU. Additionally, the network system incurs VNF reconfiguration overhead

which is defined as h (X ). The total computational costs for DU-n and CU as a result
of configuration χτ are defined as:

$$D_n(\chi^{\tau}) = \beta_n \sum_{m \in \mathcal{M}_n} \sum_{p \in \mathcal{P}} \lambda^{\tau}_{m}\, x^{\tau}_{p,m}\, \rho^{d}_{p}, \qquad \tau \in \mathcal{T},\; \forall n \in \mathcal{N} \tag{4.7}$$

$$C(\chi^{\tau}) = \beta_0 \sum_{m \in \mathcal{M}} \sum_{p \in \mathcal{P}} \lambda^{\tau}_{m}\, x^{\tau}_{p,m}\, \rho^{c}_{p}, \qquad \tau \in \mathcal{T} \tag{4.8}$$

where βn and β0 are the computing cost (monetary units/cycle) at DU-n, and CU [4].
As a result, f (X ) is:
$$f(\mathcal{X}) = \sum_{\tau \in \mathcal{T}} \Big( \sum_{n \in \mathcal{N}} D_n(\chi^{\tau}) + C(\chi^{\tau}) \Big) \tag{4.9}$$

The number of VNF reconfigurations at time slot τ compared to τ − 1 is counted


at both CU and DU, hence:

$$\|\chi^{\tau} - \chi^{\tau-1}\| = \frac{1}{2} \sum_{m \in \mathcal{M}} \sum_{p \in \mathcal{P}} \left| x^{\tau}_{p,m} - x^{\tau-1}_{p,m} \right|, \qquad \tau \in \mathcal{T} \tag{4.10}$$

We define the VNF reconfigurations overhead as:

$$h(\mathcal{X}) = \sum_{\tau=2}^{T} \|\chi^{\tau} - \chi^{\tau-1}\| \tag{4.11}$$

Thus, the objective is to minimize total cost:

$$\min_{\chi^{\tau} \in \mathcal{X}} \; \alpha f(\mathcal{X}) + (1 - \alpha)\, h(\mathcal{X}), \qquad \text{s.t. (4.1)–(4.6)} \tag{4.12}$$

We notice that normalized versions of f (X ) and h (X ) are used in (4.12). The
parameter α where 0 < α < 1 is a weight constant of the computational cost
compared to the reconfiguration cost, and the setting of α can be based on specific
network management policies. To summarize the objective, we define problem P as:

$$\mathbf{P}: \; \min_{x \in \mathcal{X}} F(x), \qquad \text{s.t. (4.1)–(4.6)} \tag{4.13}$$

Problem P is a combinatorial problem describing the placement of VNF in the


O-RAN system by considering the computational cost and the number of reconfigurations
as a metric for robustness.
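The objective of problem P can be evaluated directly for a candidate configuration sequence. The sketch below (our own illustration; the normalization of the two terms in Eq. (4.12) is omitted for brevity) computes f(X) via Eqs. (4.7)–(4.9), counts reconfigurations per Eqs. (4.10)–(4.11), and combines them as in Eq. (4.12):

```python
# Sketch of Eqs. (4.7)-(4.12): computational cost f(X), reconfiguration
# overhead h(X), and the weighted objective for a sequence of configurations.
def total_cost(configs, lam, rho_d, rho_c, du_of, beta_n, beta_0, alpha):
    f = 0.0
    for tau, x in enumerate(configs):
        for m, p in x.items():
            f += beta_n[du_of[m]] * lam[tau][m] * rho_d[p]   # DU cost, Eq. (4.7)
            f += beta_0 * lam[tau][m] * rho_c[p]             # CU cost, Eq. (4.8)
    # h(X): number of RUs whose split changed between consecutive slots,
    # Eqs. (4.10)-(4.11).
    h = sum(sum(1 for m in configs[t] if configs[t][m] != configs[t - 1][m])
            for t in range(1, len(configs)))
    return alpha * f + (1 - alpha) * h                       # Eq. (4.12), unnormalized
```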

4.4.6 NP-hardness of the Proposed Problem


We shall now apply the polynomial reduction method to analyze the complexity
of problem P using the multi-choice multidimensional knapsack problem (MMKP)
as a well-known NP-hard problem. We first revisit the Knapsack problem in the
following.
The Knapsack problem: Assume a set of N items (N = {1, 2, . . . , n}), and
item i has a weight $w_i$ and value $v_i$. The objective of the 0-1 knapsack problem is to
select the items $x_i \in \{0, 1\}$, ∀i ∈ N, that maximize the total value $\sum_i x_i v_i$ subject to
$\sum_i x_i w_i \leq R$, where R is the maximum weight capacity. The 0-1 knapsack problem
is an NP-hard problem. In [103], a reduced pseudo-polynomial algorithm based on
dynamic programming is proposed with a complexity of O(nR) to derive a solution.
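For reference, the O(nR) dynamic program for the 0-1 knapsack problem can be sketched as follows (pseudo-polynomial, since the running time depends on the capacity value R):

```python
# Classic O(nR) dynamic program for the 0-1 knapsack problem.
def knapsack(weights, values, R):
    best = [0] * (R + 1)                 # best[r] = max value with capacity r
    for w, v in zip(weights, values):
        for r in range(R, w - 1, -1):    # reverse scan: each item used at most once
            best[r] = max(best[r], best[r - w] + v)
    return best[R]
```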
MMKP is a variant of the 0–1 knapsack problem. There are M groups of items
in the MMKP, and group i has $l_i$ items. Item j from group i has value $v_{ij}$, and
its weight is given by a D-dimensional vector $W_{ij} = (w_{ij1}, w_{ij2}, \ldots, w_{ijD})$, and the
knapsack has a D-dimensional capacity vector $R = (r_1, r_2, \ldots, r_D)$. The MMKP aim
is to pick exactly one item from each group, i.e., $\sum_{j=1}^{l_i} x_{ij} = 1$, $x_{ij} \in \{0, 1\}$,
in order to maximize the total value $\sum_{i=1}^{M} \sum_{j=1}^{l_i} x_{ij} v_{ij}$, subject to the weight
capacity constraints $\sum_{i=1}^{M} \sum_{j=1}^{l_i} x_{ij} w_{ijd} < r_d$, $d \in \{1, \ldots, D\}$.
It is computationally harder to solve MMKP than the 0–1 knapsack problem.
The search space for a solution to MMKP is smaller than that for the 0–1 knapsack
problem, but exact algorithms are not applicable to many practical situations. This
is due to the limited selection of items from a group in MMKP [103]. Now, we shall
illustrate that our problem P is harder than MMKP.
Consider an unlimited midhaul link capacity without any delay requirements. In
this case, all paths between the DUs and CU will be eligible since constraints (4.5)
and (4.6) are always satisfied. It is then possible to map this problem to MMKP by
setting the following:

• M groups to M RUs

• Group i with li items to RU-m with four splits (Table 4.1)

• Item j from group i to split p for RU-m

• Wij to the computational loads on DU and CU

• D equals (N + 1) × T

• The knapsack capacity R = {r1 , r2 , . . . , rD } to computational capacities of


DUs and CU

• The value vij of item j from group i to the cost of VNF placement as well as
reconfiguration cost of split p for RU-m

• Maximization to minimization in the objective

As problem P also imposes constraints (4.5)–(4.6), it is harder than MMKP.
Therefore, if problem P could be solved in polynomial time, so could MMKP.
Problem P is a combinatorial optimization problem and in the subsequent sec-
tion we are going to use DRL to solve it. Choosing a combinatorial optimization
problem stems from its ability to navigate complex decision spaces and adapt to
dynamic environments. Combinatorial problems involve a vast search space with
numerous possible combinations, making it computationally challenging to find the
best solution. DRL offers a promising approach as it can autonomously learn op-
timal strategies through interactions with the problem environment. Its capacity
to handle high-dimensional spaces and intricate relationships between variables en-
ables it to effectively explore these complex decision spaces. Unlike traditional
methods, DRL doesn’t rely heavily on pre-defined heuristics, allowing it to adapt
and learn over time, improving its decision-making process and eventually finding
near-optimal or optimal solutions. Moreover, its adaptability to changing conditions
makes it suitable for scenarios where solutions might vary due to evolving constraints
or dynamic parameters, making DRL a compelling choice for tackling combinatorial
optimization problems.

4.5 Policy Optimization with Neural Network


This section addresses the robust VNF placement problem using the neural combina-
torial optimization paradigm. To infer the VNF placement policy, we leverage a deep
neural network. The proposed deep neural network is a sequence-to-sequence model,
a powerful architecture for prediction and classification with remarkable
success [104]. The learning process is described as follows.
The agent first receives an input state s = (m, t) ∈ M × T that points RU-m at
time slot τ . It then chooses an output action as , as ∈ P, indicating VNF placement

according to Table 4.1. In this case, the neural network with parameter θp infers
the policy of VNF placement, πθp (as |s), for each RU in each time slot.

Figure 4.2: The proposed deep reinforcement learning architecture overview


In Fig. 4.2, we illustrate the proposed deep RL architecture for robust VNF place-
ment. Assume that the agent wishes to take an action for RU-3 at time slot 1,
which corresponds to state s3 = (3, 1). To do this, a state vector that embeds the
environment state (i.e., the previously allocated RUs, s1 and s2 , as well as the requested
VNF placement, s3 , depicted in red) is created and then passed to the
agent. The agent generates an action for the corresponding RU indicating VNF
placement. In order to determine the quality of the placement decision (or action),
the environment uses the problem formulation in Section 4.4 to compute a feedback
signal as a reward. The environment is described by problem P. The agent sees
the environment as a black box, meaning that the agent interacts with the environ-
ment and finds an optimal policy that solves problem P without understanding the
underlying system.
Neural combinatorial optimization cannot be applied directly to constrained
optimization problems because the cost function alone cannot provide a sufficient
reward. It is essential to deal with constraint dissatisfaction to develop an interaction
between the agent and the environment that enables the learning process. In this
chapter, we apply constraint relaxation techniques to the cost function of policy

gradient method in order to receive enough positive rewards. In summary, the
reward signal in this work includes the cost function (objective) of problem P and
the degree of constraint dissatisfaction. The degree of constraint dissatisfaction is
calculated using the Lagrange relaxation technique. Details are discussed later in
this section.

4.5.1 Neural Network Architecture


Our proposed solution is composed of two networks, as shown in Fig. 4.2: a policy
network and an auxiliary network. The policy network with weight θp generates
actions according to current states. The policy network includes the sequence-
to-sequence model and a fully connected layer. The sequence-to-sequence model
extracts action-related features from the state s. The fully connected layer takes
the outputs of the sequence-to-sequence model as input and maps them to the
probability of each action as ∈ P by SoftMax function. The auxiliary network with
weight θc has the same structure as the policy network. However, only one neuron
is used with weight θc to generate value-related features sent to the fully connected
layer to calculate the baseline.
To describe the sequence-to-sequence model, first, we need to explain LSTM
architecture. The LSTM is used in sequence prediction problems to learn long-term
dependencies [105]. In LSTM, a memory unit is proposed to replace the traditional
artificial neurons in the network’s hidden layer. As shown in Fig. 4.2, the memory unit
consists of a forget gate, an input gate, and an output gate. Forget gate determines
which data is forgotten and which will be used in the next step. The input gate
determines what information can be added to the memory cell based on the previous
output and current state. The output gate also determines the next hidden state.
The relationships of the different parts shown in Fig. 4.2 at time step t are:

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\hat{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t \\
h_t &= \tanh(c_t) \odot o_t
\end{aligned} \tag{4.14}$$

where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function, and the operator ⊙ is the Hadamard
product. $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$ and $b \in \mathbb{R}^{h}$ are weight matrices and bias vectors,

respectively, that are learned during the training. The quantities d and h are the
number of input features and the number of hidden units, respectively. As shown
in Fig. 4.2, the sequence-to-sequence model is composed of two LSTM cells, named the
encoder and the decoder. The final states of the encoder are passed to the initial states
of the decoder.
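The memory-unit update of Eq. (4.14) can be expressed in a few lines of NumPy. The sketch below uses placeholder weight dictionaries rather than learned parameters, so it only illustrates the data flow through the gates:

```python
import numpy as np

# One LSTM memory-unit step following Eq. (4.14) (a NumPy sketch; the weights
# W, U, b would normally be learned during training).
def lstm_step(x, h_prev, c_prev, W, U, b):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    c_hat = np.tanh(W["c"] @ x + U["c"] @ h_prev + b["c"])
    c = f * c_prev + i * c_hat                           # cell state update
    h = np.tanh(c) * o                                   # hidden state
    return h, c
```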

4.5.2 Policy Gradient Optimization with Baseline


In this section, our objective is to utilize Policy Gradients to iteratively learn the pa-
rameters of the stochastic policy πθp (as |s). The goal of this policy is to assign higher
probabilities to splits as ∈ P with reduced cost, focusing on efficient resource al-
location in network tasks. The selection probability of action as is contingent upon
the preceding action set a(<m,t) , owing to constraints on limited and shared network
resources as defined in Eq. (4.13). For the sake of simplicity, let’s denote the state
by s = (m, t). While the state currently simplifies to s = (m, t), incorporating
multiple states in the environment would involve integrating the traffic demand for
RU-m during the time slot τ alongside the environment’s state. Consequently, the
state input for the agent will encompass these combined aspects [57]. Henceforth,
to streamline discussions, as will be represented as a for clarity and conciseness.

$$\pi_{\theta_p}(a|s) = \prod_{m=1}^{M} \pi_{\theta_p}\!\left(a_m \mid a_{(<m,t)}, s\right) \tag{4.15}$$

We follow the framework given in [106] to estimate the parameters of the model
and derive the optimal policy using policy gradients, based on the reward function.
The reward function is the result returned from applying an action to the
environment. This function is influenced by both the environment and the neural
network model. In addition, this function is defined for all possible policies. To
determine this function, we start with the total expected cost F given the VNF
placement for state s:

$$J^{\pi}_{F}(\theta_p|s) = \mathbb{E}_{a \sim \pi_{\theta_p}(\cdot|s)}\left[F(a)\right] \tag{4.16}$$

Thus, the total expected cost for all RUs in all time slots is:

$$J^{\pi}_{F}(\theta_p) = \mathbb{E}_{s \sim \mathcal{M} \times \mathcal{T}}\left[J^{\pi}_{F}(\theta_p|s)\right] \tag{4.17}$$

Furthermore, problem P has several constraints. It means that the expectation

of constraint dissatisfaction needs to be considered:

$$J^{\pi}_{C}(\theta_p) = \mathbb{E}_{s \sim \mathcal{M} \times \mathcal{T}}\left[J^{\pi}_{C}(\theta_p|s)\right] \tag{4.18}$$

where function JCπ (θp ) is defined for each constraint dissatisfaction. JCπ (θp ) is a
signal returned by the environment. This signal depends on the dissatisfaction of
DUs/CU computational capacity, end-to-end delay, and bandwidth in problem P.
As a result, the problem is to find the policy that minimizes the expected cost of
VNF placement while satisfying all constraints:

$$\min_{\pi \in \Pi} J^{\pi}_{F}(\theta_p) \qquad \text{s.t.} \quad J^{\pi}_{C_i}(\theta_p) \leq 0 \tag{4.19}$$

In the next step, we rely on the Lagrange relaxation [106, 107] method to make
problem P unconstrained. In the new unconstrained problem, a penalty is imposed
on infeasible solutions as follows:

$$\begin{aligned}
g(\mu) &= \min_{\theta_p} J^{\pi}_{L}(\mu, \theta_p) \\
       &= \min_{\theta_p}\Big[J^{\pi}_{F}(\theta_p) + \sum_{i} \mu_i\, J^{\pi}_{C_i}(\theta_p)\Big] \\
       &= \min_{\theta_p}\big[J^{\pi}_{F}(\theta_p) + J^{\pi}_{\xi}(\theta_p)\big]
\end{aligned} \tag{4.20}$$

where g(µ), JLπ (µ, θp ) and µi are the Lagrangian dual function, Lagrangian objective
function and Lagrange multipliers (penalty coefficients), respectively. The term
Jξπ (θp ) is used as the expectation of the penalties and is equal to the sum of all
constraint dissatisfaction signals weighted by µi .
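The resulting penalized cost is straightforward to compute once the environment reports the constraint-violation degrees. A minimal sketch of our own (violation degrees Ci ≥ 0 and multipliers µi supplied as lists):

```python
# Sketch of the penalized cost L(a|s) = F(a|s) + sum_i mu_i * C_i(a|s), used
# as the (negated) reward signal; violations are constraint dissatisfaction
# degrees (>= 0, zero when the constraint is satisfied).
def penalized_cost(objective_cost, violations, mu):
    return objective_cost + sum(m_i * c_i for m_i, c_i in zip(mu, violations))
```

For instance, an objective cost of 10 with violation degrees (0, 2.5) and µ = (10, 10) yields a penalized cost of 35.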
While the Lagrangian dual function is always concave, the primal function need
not be convex [107]. The dual problem provides a lower bound on the primal
minimization problem. Hence, the objective is to find the Lagrange coefficients that
give the best lower bound:

$$\max_{\mu} g(\mu) = \max_{\mu}\min_{\theta_p} J^{\pi}_{L}(\mu, \theta_p) \tag{4.21}$$

In this chapter, like [106] coefficient µi is chosen manually from [0, µmax ]. It is
also possible to obtain the optimal value using other alternatives, such as a multi-
time scale learning approach [108]. µ = 0 means that the agent ignores the penalty,
while the agent will only pay attention to the penalty for µ → ∞. In this case, the

Lagrangian function would be JLπ (θp ), which specifies the quality of the policy.
In the following, we rely on Monte-Carlo policy gradients and stochastic gradient
descent to calculate the weights θp that minimize JLπ (θp ). The weights θp are updated
as follows:

$$\theta_p^{k+1} = \theta_p^{k} - \eta\, \nabla_{\theta_p^{k}} J^{\pi}_{L}\big(\theta_p^{k}\big) \tag{4.22}$$

where η is the learning rate. The gradient of the objective function with respect to
weights θp can be given by the log-likelihood technique:

$$\nabla_{\theta_p} J^{\pi}_{L}(\theta_p) = \mathbb{E}_{a \sim \pi_{\theta_p}(\cdot|s)}\left[L(a|s)\, \nabla_{\theta_p} \log \pi_{\theta_p}(a|s)\right] \tag{4.23}$$

We use L (a|s) to represent total cost as well as constraint dissatisfaction signals:

$$L(a|s) = F(a|s) + \xi(a|s) = F(a|s) + \sum_i \mu_i\, C_i(a|s) \tag{4.24}$$

Finally, Monte-Carlo sampling is used to estimate ∇θp JLπ (θp ). The B sample
RUs are s1 , s2 , s3 , . . . , sB ∼ M × T . With a baseline estimator [57], b(s), we can reduce
the gradient variance, thereby accelerating convergence. To encourage exploration
and prevent premature convergence, we have added entropy regularization to the
objective of the policy network:

$$\nabla_{\theta_p} J^{\pi}_{L}(\theta_p) \approx \frac{1}{B}\sum_{j=1}^{B}\Big[\big(L(a_j|s_j) - b_{\theta_c}(s_j)\big)\,\nabla_{\theta_p}\log\pi_{\theta_p}(a_j|s_j) + \beta H\big(\pi_{\theta_p}(a_j|s_j)\big)\Big] \tag{4.25}$$

where β corresponds to the weight assigned to the action entropy. We use a state-
dependent baseline given by the auxiliary network with weights θc . The auxiliary
network provides an estimate of the expected penalized cost, bθc (sj ). It is trained
with stochastic gradient descent on the mean squared error objective between its
predictions bθc (sj ) and the actual penalized cost L (a|s), which is given by the
environment. Consequently, the auxiliary network objective is formulated as follows:

Algorithm 3 Training of the proposed deep RL approach
 1: Input: Train Agent, No. of epochs: K, Batch size: B, Learning Set: V ,
    S = {1, 2, . . . , B}
 2: Initialize: Policy and auxiliary networks with random weights
 3: for k = 1, . . . , K do
 4:    Reset gradient: dθ ← 0
 5:    Sample input sj ∼ M × T for j ∈ S
 6:    Sample output aj ∼ P for j ∈ S
 7:    bj ← bθc (sj ) for j ∈ S
 8:    Calculate L(aj ) for j ∈ S
 9:    g(θp ) ← (1/B) Σ_{j=1}^{B} (L(aj ) − b(sj )) ∇θp log πθp (aj |sj )
10:    L(θc ) ← (1/B) Σ_{j=1}^{B} ||bθc (sj ) − L(aj |sj )||²
11:    θp ← Adam(θp , g(θp ))
12:    θc ← Adam(θc , L(θc ))
13: end for
14: return θp , θc

$$L(\theta_c) = \frac{1}{B}\sum_{j=1}^{B}\big\|b_{\theta_c}(s_j) - L(a_j|s_j)\big\|^2 \tag{4.26}$$

Algorithm 3 summarizes the training over K epochs according to a single-time-step
Monte-Carlo policy gradient with a baseline estimator.
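To illustrate the training loop, the sketch below implements one batch update in the spirit of lines 5–12 of Algorithm 3 for a tabular softmax policy. This is a deliberately simplified stand-in of our own: the actual solution uses a sequence-to-sequence network, entropy regularization, and the Adam optimizer rather than a lookup table with plain gradient steps:

```python
import numpy as np

# One batch update of a tabular softmax policy with a baseline, mirroring
# lines 5-12 of Algorithm 3 (plain SGD instead of Adam for brevity).
def reinforce_step(theta, baseline, states, costs_fn, rng, lr=0.1):
    grad = np.zeros_like(theta)
    for s in states:
        logits = theta[s]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(probs), p=probs)          # sample split a ~ pi(.|s)
        L = costs_fn(s, a)                           # penalized cost from the env
        adv = L - baseline[s]                        # advantage vs. baseline
        dlog = -probs                                # grad of log pi(a|s) wrt logits
        dlog[a] += 1.0
        grad[s] += adv * dlog
        baseline[s] += 0.1 * (L - baseline[s])       # MSE-style baseline update
    theta -= lr * grad / len(states)                 # descend (we minimize cost)
    return theta, baseline
```

Repeating this step shifts probability mass toward the lower-cost action, which is the behaviour the penalized-cost formulation relies on.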

4.6 Performance Evaluation


This section evaluates the proposed algorithms under different traffic patterns over a
real-world network topology [53]. The simulation setup, model training, and various
scenarios are discussed in the upcoming subsections.

4.6.1 Simulation Setup


For the experiment, we adopted the network topology presented in [53]. This topol-
ogy considers a 14×13 km2 area covering the city center of Bristol, UK, as a potential
O-RAN deployment site, which contains 100 sites, including 4G and 5G base sta-
tions, and 11 sites allocated for edge computing centers. The map of this area
and the site information are shown in Fig. 4.3a. In our O-RAN system model, we
consider 100 sites as RUs, with 10 DUs and a single CU allocated to the 11 edge
computing centers. On average, each DU serves 10 RUs, and each DU is connected

Table 4.3: Hyperparameters

Parameter Value
Learning Rate 0.0001
Batch Size 32
LSTM hidden Size 128
Clip norm 0.8
γ 0.99
Lagrange multiplier 10

to one of the routers. We placed five routers in the town center to connect DUs to
the single CU. Midhaul links, which have more capacity than other links, including
fronthaul links, connect the routers to the CU. In [53], the CPU usage at RUs is
monitored, which we map to traffic demand between 40 and 120 Mbps without loss
of generality. Fig. 4.3b shows the hourly average traffic demand of five randomly
selected RUs from the map, which we split into five different time slots. The peak
traffic demand occurs during time slot four, and the first and second time slots
represent off-peak hours.
We use an Intel Haswell i7-4770 3.40GHz CPU for both DUs and the CU,
as in [109]. For each split p ∈ P, ρdp = {0.005, 0.004, 0.00325, 0} and ρcp =
{0, 0.001, 0.00175, 0.005} per traffic unit (Mbps) in each DU and CU, respectively.
We calculate the end-to-end delay by ignoring fronthaul link delays and using the stan-
dard store-and-forward model, which includes transmission, propagation, and com-
putational delay, given by 12000/Cij , 4 µs/km, and 5 µs, respectively. The
midhaul link capacity varies up to 100 Gbps, and we use the shortest path for rout-
ing. The computational cost at CU is much less than DU, with β0 = 0.017βn [4].
We set the parameter α in Eq. (4.12) to 0.5. The learning rate for the policy and
auxiliary networks is 0.0001, and γ = 0.99. The sequence-to-sequence model has
two LSTM layers, and the size of the hidden state in LSTM is set to 32. We use the
Adam algorithm to update the weights of the neural network, with a batch size of
128 and a clip norm for the gradient is 0.8. The Lagrangian coefficients are set to
10. The hyperparameters are given in Table 4.3.
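The store-and-forward delay model described above can be sketched as follows. We read 12000/Cij as the transmission delay in µs of a 1500-byte (12000-bit) packet on a link of capacity Cij Mbps, which is an assumption on our part:

```python
# Sketch of the store-and-forward end-to-end delay model from the setup:
# per-link transmission delay 12000/C_ij (C_ij in Mbps), propagation delay
# 4 us/km, and 5 us computational delay per hop; fronthaul delay is ignored.
def path_delay_us(links):
    """links: list of (capacity_mbps, length_km) pairs along path P_m."""
    delay = 0.0
    for capacity, length in links:
        delay += 12000.0 / capacity      # transmission (us)
        delay += 4.0 * length            # propagation (us)
        delay += 5.0                     # per-hop computational delay (us)
    return delay
```

The resulting dτm is then compared against the split delay budget of Eq. (4.6).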

4.6.2 Model Training


As the first step, we tested the learning process of the proposed DRL model with
different occupancy ratios. Fig. 4.4a demonstrates the agent’s training with suffi-
cient resources, while Fig. 4.4b illustrates agent training with constrained resources


Figure 4.3: Real case study (a) possible area for O-RAN deployment in Bristol, UK,
and, (b) traffic demands of five 4G sites (RUs) in 24 hours.

in the O-CU and O-DUs. Both normal and worst-case scenarios are analyzed to
effectively demonstrate the reliability and impact of the proposed solution in these
two cases. For each epoch, Fig. 4.4a and 4.4b present the total O-RAN cost (JFπ ),
penalization cost (Jξπ ), and Lagrangian cost (JLπ ). Initially, the agent generates ran-
dom actions that do not satisfy various constraints, leading to a high penalization
cost. Consequently, the agent initially prioritizes constraint satisfaction by reducing
penalization cost, while ignoring the total O-RAN cost. As the number of epochs
increases, the agent employs stochastic gradient descent to adjust its neural network
weights for minimizing the Lagrangian cost function. Through iterative repetition


Figure 4.4: Training of the proposed deep RL for studied topology with (a) sufficient
available resources and (b) constrained resources

of this process, the agent effectively reduces the total O-RAN cost, indicating policy
improvement.
As depicted in Fig. 4.4a, the penalization cost nearly vanishes upon completion of
the training. In this case, the agent successfully deduced the desired policy from the
provided information. Consequently, the probability of receiving a placement that
violates constraints is minimal. However, if such an event occurs, the constraint sat-
isfaction might experience slight compromise, resulting in marginal dissatisfaction.
Contrastingly, in the scenario depicted in Fig. 4.4b, where resources are constrained,
a significant increase in the likelihood of constraint dissatisfaction is observed. As
indicated by Fig. 4.4b, the penalty cost becomes comparable to the total O-RAN
cost as the agent nears the end of training. This suggests that the agent’s focus has
shifted towards minimizing penalties arising from constraint dissatisfaction. This
means that the agent prioritizes accommodating demands, which is the optimal
strategy in this scenario.

4.6.3 Comparison Benchmarks


In this part, we compare our proposed solution with two widely-used RL methods,
Deep Q-Network (DQN) and simple Actor-Critic (AC). We evaluate our proposed
solution using two different neural network architectures: LSTM and the sequence-
to-sequence model. To enable a more meaningful comparison, we also employed the
sequence-to-sequence model as the neural network architecture for DQN and AC. As
depicted in Fig. 4.5(a), our proposed solution outperforms DQN and AC in terms of
total O-RAN cost. Moreover, while both DQN and AC fail to completely eliminate
the penalization cost, our proposed solution with two neural network architectures
successfully eliminates it, thanks to the incorporation of entropy regularization
in our proposed solution.

Figure 4.5: Comparison of our proposed solution with two commonly used methods,
DQN and AC, based on three criteria over 2000 epochs: (a) total O-RAN cost, (b)
penalization cost, and (c) Lagrangian cost.

Furthermore, employing the sequence-to-sequence model
instead of LSTM leads to faster convergence and better results with lower costs. For
instance, in Fig. 4.5(b), the penalization cost almost disappears after 750 epochs with
the sequence-to-sequence model, compared to 1250 epochs with LSTM, resulting in
a 40% improvement. This improvement can be attributed to the fact that the
sequence-to-sequence model is more capable of capturing the underlying sequential
patterns. In summary, our proposed solution with the sequence-to-sequence model
outperforms the other methods, with the ranking of methods from best to worst
being: our proposed solution with the sequence-to-sequence model, LSTM, AC, and
DQN.

4.6.4 Constraint Dissatisfaction


As discussed previously, the neural network infers a policy that minimizes the
objective function. As shown in Fig. 4.4a, when the infrastructure has sufficient re-
sources, the neural network can obtain a VNF placement that satisfies all constraints.
There is a pattern to how these constraints are dissatisfied: a variable is more likely
to be violated by a small margin than by a large one. However, it cannot be predicted
which constraints will be dissatisfied.
Fig. 4.1 illustrates that the CU is deployed in the cloud, making it easier to
modify the number of CPUs in the CU compared to the DUs. In our experiments,
we kept the number of CPUs in each DU fixed at two and increased the number of
CPUs in the CU until all constraints were satisfied.

Figure 4.6: The heat map displays the number of constraint dissatisfactions in the
studied topology during five different time slots.

As described in Section 4.6.2, the
disappearance of the penalization cost during training indicates that all constraints
have been satisfied. Fig. 4.6 presents a heat map depicting the number of constraint
dissatisfactions as a function of the bandwidth and the number of CPUs in the CU.
Specifically, for time slots one and five, with low traffic demands, a minimum of 16
CPUs in the CU and 60 Gbps of bandwidth are required to accommodate all traffic
flows. Time slots two and three, with medium traffic demands, require 22 and 24
CPUs in the CU, respectively, and 100 Gbps of bandwidth. Finally, for time slot four,
with the highest demand, 32 CPUs and 100 Gbps of bandwidth are required to satisfy
all constraints.
To compare our proposed solution with a C-RAN scenario with a single CU, we
consider time slot four as an example. In C-RAN, we calculate the total number
of required CPUs for time slot τ as (ρ_1 + ρ_2 + ρ_3) ∑_{m∈M} λ^τ_m, which is equal
to ⌈46.6⌉ = 47 for time slot four. Furthermore, in time slot four, we calculated
that a total of 52 CPUs are needed to satisfy all constraints. This includes 32
CPUs in the CU and 2 CPUs in each DU, based on our constraint dissatisfaction
analysis as shown in Fig. 4.6. In this case, our solution showed a (52 − 47)/47 ≈ 10%
overprovisioning of CPUs. When both CU and DUs are deployed, it is important
to note that the overprovisioning of CPUs occurs because the load of each VNF
cannot be distributed between DUs and CU, making it impossible to utilize 100%
of the capacity of both CU and DUs. Additionally, in C-RAN, each RU requires
2.5 Gbps of bandwidth, regardless of the demand (as shown in Table 4.1). Therefore,
for the studied topology with 100 RUs, C-RAN would require 250 Gbps of bandwidth
on midhaul links. However, our constraint dissatisfaction analysis showed that only
100 Gbps is needed in our solution, resulting

Table 4.4: CPU overprovisioning (OVP) vs bandwidth (BW) saving

Time Slot | No. of CPUs (C-RAN / Prop. Sol.) | BW (C-RAN / Prop. Sol.) | CPU OVP | BW Saving
1 | 21.2 ≈ 22 / 28 | 250 / 60 | 27% | 76%
2 | 38.6 ≈ 39 / 42 | 250 / 100 | 7% | 60%
3 | 41.5 ≈ 42 / 44 | 250 / 100 | 4% | 60%
4 | 46.6 ≈ 47 / 52 | 250 / 100 | 10% | 60%
5 | 22.85 ≈ 23 / 28 | 250 / 60 | 21% | 76%

in a (250 − 100)/250 = 60% reduction in midhaul link bandwidth requirements compared to
the C-RAN. Table 4.4 shows the CPU overprovisioning (OVP) vs bandwidth (BW)
saving for all time slots. As the table indicates, time slots 1 and 5 exhibit higher
CPU overprovisioning than the other time slots, but also achieve higher bandwidth
savings.
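The overprovisioning and saving figures above can be reproduced in a few lines. This is a sketch using only values quoted in the text; the function names are illustrative, and the count of 10 DUs with 2 CPUs each is inferred from the 52-CPU total.

```python
import math

def cran_cpus(total_load):
    """C-RAN provisions the ceiling of the summed per-VNF processing load."""
    return math.ceil(total_load)

def cpu_overprovisioning(proposed_cpus, cran_load):
    """Relative CPU overprovisioning of the proposed solution vs. C-RAN."""
    base = cran_cpus(cran_load)
    return (proposed_cpus - base) / base

def bw_saving(cran_bw, proposed_bw):
    """Relative midhaul bandwidth saving vs. C-RAN."""
    return (cran_bw - proposed_bw) / cran_bw

# Time slot four: 32 CPUs in the CU plus 2 CPUs in each of the 10 DUs = 52.
print(f"OVP = {cpu_overprovisioning(52, 46.6):.1%}")   # ~10%, as in Table 4.4
print(f"BW saving = {bw_saving(250, 100):.0%}")        # 60%
```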

4.6.5 Robustness
This section investigates the impact of CPU centralization in the CU on network
performance. We define the centralization ratio (CR) as:

CR = Total CPU in CU / (Total CPU in CU + Total CPU in DUs)        (4.27)
CR = 0 corresponds to a D-RAN deployment with all CPUs in DUs, while CR = 1
corresponds to a C-RAN deployment with all CPUs in the CU. Following the analysis
in Section 4.6.4, we utilize 52 CPUs in total, and the CR values for different
implementations are presented in Table 4.5.
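Eq. (4.27) can be sketched directly; the loop below reproduces the CR column of Table 4.5 under the 52-CPU budget (values rounded to one decimal, as in the table):

```python
def centralization_ratio(cu_cpus, du_cpus_total):
    """Eq. (4.27): fraction of the total CPU budget placed in the CU."""
    return cu_cpus / (cu_cpus + du_cpus_total)

# (DUs capacity, CU capacity) rows of Table 4.5
for du_cap, cu_cap in [(40, 12), (30, 22), (20, 32), (10, 42)]:
    print(du_cap, cu_cap, round(centralization_ratio(cu_cap, du_cap), 1))
```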
As discussed in this study, we introduced a new term to the objective function
of problem P, represented by Eq. (4.11). This term denotes the number of VNF
reconfigurations within a specified time period T . The purpose of this addition was
to enhance the robustness of the VNF reconfiguration process within the O-RAN
system. Equation (4.12) defines a parameter α, where α = 0 indicates that only
the number of VNF reconfigurations term is included in the objective function, and
α = 1 indicates that only the computational cost term is included. To examine the
impact of varying the relative weights of computational cost and number of VNF
reconfigurations on the results, we analyze three cases where α takes the values 0, 0.5,
and 1. Table 4.6 shows the results. When we consider only the computational cost in
the objective function (α = 1), we observe that the number of VNF reconfigurations
is higher compared to the other two scenarios. However, we can reduce this number
by increasing the CR parameter. The reason for this is that when the CPUs are
centralized in one CU, the need for VNF reconfigurations decreases. By adding the
number of VNF reconfigurations to the objective with equal weight to computational

Table 4.5: Different Centralization Ratios (CRs)

DUs Capacity | CU Capacity | CR
40 | 12 | 0.2
30 | 22 | 0.4
20 | 32 | 0.6
10 | 42 | 0.8

Table 4.6: Trade-off between computational cost and the number of VNF reconfigurations

CR | Computational Cost (α=0 / α=0.5 / α=1) | No. of VNF Reconfigs (α=0 / α=0.5 / α=1) | Overhead Reduction Gain vs α=1 (α=0 / α=0.5) | Increased Computing Cost vs α=1 (α=0 / α=0.5)
0.2 | 13.27 / 12.96 / 12.48 | 118 / 238 / 486 | 76% / 51% | 6% / 3%
0.4 | 55.86 / 51.37 / 48.03 | 157 / 298 / 465 | 66% / 36% | 16% / 7%
0.6 | 99.64 / 96.45 / 80.76 | 286 / 320 / 402 | 29% / 20% | 23% / 19%
0.8 | 133.38 / 123.67 / 115.59 | 87 / 97 / 113 | 23% / 14% | 15% / 7%

cost (α = 0.5), the frequency of VNF reconfigurations is reduced: the overhead
reduction gains range from 14% to 51%, while computational costs increase by 3%
to 19% across the different CRs. When only the number of VNF reconfigurations is
considered in the objective (α = 0), the overhead reduction gains range from 23%
to 76%, while the computational costs increase by 6% to 23%.
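The reduction-gain figures follow directly from the reconfiguration counts in Table 4.6; a small sketch for the CR = 0.2 row (function name illustrative):

```python
def reduction_gain(reconfigs_alpha, reconfigs_alpha1):
    """Overhead reduction relative to the cost-only objective (alpha = 1)."""
    return (reconfigs_alpha1 - reconfigs_alpha) / reconfigs_alpha1

# CR = 0.2 row of Table 4.6: 118 (alpha=0), 238 (alpha=0.5), 486 (alpha=1)
print(round(100 * reduction_gain(118, 486)))  # 76, matching the 76% gain
print(round(100 * reduction_gain(238, 486)))  # 51, matching the 51% gain
```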

4.6.6 Network KPIs


Figure 4.7 illustrates the impact of CR on various network Key Performance Indica-
tors (KPIs). Specifically, the figure depicts the changes in bandwidth usage, CPU
utilization of CU, and average CPU utilization in DUs as the CR increases. As
shown in Fig. 4.7a, increasing CR leads to a gradual increase in bandwidth usage.
Meanwhile, Fig. 4.7b shows that the average CPU utilization in DUs decreases as
the centralization ratio increases. On the other hand, Fig. 4.7c shows that CPU
utilization in the CU is not significantly affected by centralization, mainly because
the CU has a high capacity that changes slowly.

4.7 Summary
In this chapter, we formulated robust VNF reconfigurations under dynamic traffic
conditions to minimize the necessity of VNF reconfigurations and the computational
cost in DUs/CU. The problem formulation was done to understand the objective
function and constraints. The complexity analysis specified that the proposed prob-
lem was an NP-hard optimization problem that could not be solved optimally, es-


Figure 4.7: The network KPIs with respect to the Centralization Ratio (CR) in five
time slots: (a) bandwidth usage, (b) average CPU utilization in DUs, and (c) CPU
utilization in CU.
pecially for a large-scale O-RAN system. We leveraged constrained combinatorial
optimization with deep RL. The sequence-to-sequence model was used to estimate
the optimal policy by learning long-term dependencies to solve the problem. In
this case, the agent interacted with the O-RAN environment and found an optimal
policy to solve the problem. It was impossible to directly apply neural combinato-
rial optimization to constrained optimization problems because, without constraint
dissatisfaction, the cost function could not provide sufficient information. Thus, we
relied on the Lagrangian relaxation technique to define a penalized cost function
made from the cost of VNF placement in the objective and penalty for the con-
straint dissatisfactions. Our experimental results showed that, after training was
completed, the penalty disappeared, and the VNF placement cost decreased and
converged to a specific value.

Chapter 5

Energy Efficient Dynamic VNF Splitting in O-RAN

5.1 Overview
In this chapter, we shed light on the importance of energy efficiency as another
critical objective for VNF splitting in O-RAN environments. As the demand for
sustainable network solutions grows, energy-aware VNF splitting becomes increas-
ingly vital to address the challenges posed by dynamic traffic conditions in real-life
O-RAN scenarios.
To tackle this challenge, we first model the energy consumption in DUs/CUs
and O-RAN links, enabling us to quantify and optimize energy usage effectively.
We then formulate a multi-objective problem that minimizes the total energy con-
sumption, taking into account various network constraints such as delay, bandwidth,
and computing resources in DUs/CUs.
To propose a novel and effective solution, we integrate a well-known DRL ap-
proach called Advantage Actor-Critic (A2C) with a sequence-to-sequence model.
This integration allows us to handle the sequential nature of RAN disaggregation
and consider long-term dependencies effectively. By leveraging DRL in the opti-
mization process, we aim to enhance the energy efficiency of VNF splitting and
contribute to more sustainable and eco-friendly O-RAN deployments.
The results of our performance evaluation demonstrate the superiority of our
proposed solution for dynamic VNF splitting over approaches that do not involve
VNF splitting. Notably, we observe a significant reduction in energy consumption,
achieving up to 56% and 63% improvements for business and residential areas, re-
spectively, under varying traffic conditions.

By focusing on energy efficiency as a key objective in VNF splitting, our chapter
contributes to the research on sustainable and eco-friendly O-RAN deployments.
With our integrated DRL approach and multi-objective optimization, we pave the
way for more energy-efficient and environmentally conscious network management
strategies. These results are paramount in advancing the field of AI-driven network
management and contributing to a greener and more sustainable future.

5.2 Motivation
Telecommunication consumes around 4-6% of global energy, with energy costs ac-
counting for 20-40% of mobile network operating expenditure (OPEX) [110]. How-
ever, as 5G systems expand, energy consumption is expected to rise by 2-3 times,
prompting growing concerns from operators [111]. The RAN and base stations have
traditionally been a substantial source of energy consumption in cellular networks.
This chapter focuses on optimizing energy consumption in the O-RAN architecture.
In this chapter, the goal is to develop an energy-efficient dynamic VNF splitting
method in an O-RAN environment, considering a broad range of O-RAN config-
urations. To achieve this, our proposed system model and problem formulation
incorporate various factors such as VNF processing load, bandwidth demand, and
energy consumption costs of DUs/CUs and interconnecting links. The novelty lies in
our comprehensive approach that considers diverse aspects of O-RAN architecture
and energy efficiency.
The proposed solution tackles the challenge of achieving dynamic VNF splitting
that can adapt to varying traffic conditions in O-RAN environments. To address this
challenge, we apply a DRL approach, which is promising for intelligent re-
source management [112]. Dynamic traffic conditions can make on-demand resource
management in O-RAN more challenging because service requests fluctuate. While
some prior efforts [55,76] have attempted to address this challenge, they did not uti-
lize DRL and advanced neural network architectures that have the potential to learn
more complex patterns and potentially provide better performance for effective on-
demand resource management. Our approach is novel because we integrate Advan-
tage Actor-Critic (A2C), a well-known DRL algorithm, into a sequence-to-sequence
(Seq2Seq) model that considers the sequential nature of RAN disaggregation and
long-term dependencies. This enables efficient energy consumption management.
Overall, this chapter contributes to the research on dynamic VNF splitting in O-RAN
environments and highlights the importance of considering the impact of dynamic

Table 5.1: Bandwidth requirement of different splits [4]

Split | VNFs in DU | VNFs in CU | UL BW (Mbps)
1 | f1 → f2 → f3 | None | λ
2 | f1 → f2 | f3 | λ
3 | f1 | f2 → f3 | 1.02λ + 1.5
4 | None | f1 → f2 → f3 | 2500

traffic conditions on energy consumption.

5.3 System Model


This chapter adopts the VNF splits proposed in [4], as in Chapters 3 and 4, for the O-
RAN system, where traffic flows are aggregated at RUs and CUs. A chain of VNFs
(f0 → f1 → f2 → f3) is considered in this case. Four possible split points are
presented in Table 5.1. As more VNFs are placed in the CU from split 1 to 4, the
bandwidth demand between DU and CU increases. In Fig. 5.1, RUs are connected
to multiple DUs as they are multi-homed. Each edge site has multiple DUs that
serve multiple RUs through Fronthaul (FH) links. The edge site is connected to the
cloud site through midhaul (MH) links, providing the required delay for the splits
(2-20ms) [4]. The cloud site includes multiple CUs and has a direct interface to the
core nodes (EPC and 5GNC).
The network defines the set of RUs and edge sites as R and E, respectively.
The set of DUs in edge site e and CUs in the cloud site are defined as Me and N ,
respectively. CM H denotes the capacity of MH links, and a binary matrix WR×E is
used to indicate the allowed associations between RUs and edge sites since RUs are
multi-homed. ρ_{p,n} and ρ_{p,m} denote the processing loads of CU-n and DU-m,
respectively, resulting from split p (p ∈ P). ρ_1, ρ_2, and ρ_3 are the processing
loads of f1, f2, and f3, respectively. H_{m,e} is the processing capacity (cycles/s)
of DU-m at edge site e, and H_n is the processing capacity of CU-n (H_n ≫ H_{m,e}).
While this system model covers all network traffic flows, both downlink and uplink,
this chapter focuses only on the uplink case. In this scenario, the O-RAN system
must accommodate VNFs for all traffic flows from RUs to the core network.

Figure 5.1: Proposed O-RAN system architecture overview. It consists of three key
components: RUs, edge sites containing DUs, and a cloud site containing CUs, along with
a transport layer.

5.4 Problem Formulation


In the following section, we will outline the problem formulation for splitting VNFs
across CUs and DUs in O-RAN to minimize energy consumption.

5.4.1 Decision Variables


As previously stated, the VNFs are located in DUs and CUs per RU. Based on
Table 5.1, for RU-r, four split options (P = {1, 2, 3, 4}) are available. Thus, we use
a binary indicator variable xτp,r ∈ {0, 1} as the decision for deploying split p ∈ P for
RU-r at time slot τ .

∑_{p∈P} x^τ_{p,r} = 1,   ∀ r ∈ R, i ∈ I        (5.1)

The constraint (5.1) ensures that RU-r can only have one split at time slot τ .
The total computational load of VNFs assigned to each DU should not go beyond
its computational capacity:
∑_{r∈R} ∑_{p∈P} γ^τ_{r,e,m} λ̂^τ_r ρ_{p,m} x^τ_{p,r} ≤ H_{m,e},   ∀ m ∈ M_e, e ∈ E        (5.2)

∑_{m∈M_e} γ^τ_{r,e,m} ≤ ω_{r,e},   ∀ r ∈ R, e ∈ E        (5.3)

∑_{e∈E} ∑_{m∈M_e} γ^τ_{r,e,m} = 1,   ∀ r ∈ R        (5.4)

where λ̂^τ_r is the peak traffic at time slot τ, and γ^τ_{r,e,m} is a binary variable
that indicates whether RU-r is connected to DU-m at edge site e. Constraint (5.3) guarantees

that RU-r can exclusively be allocated to edge sites given by matrix W , while
constraint (5.4) ensures that only one DU and one edge site are selected for each
RU at time slot τ . Next, the total computational load of VNFs assigned to each CU
should not exceed its computational capacity:
∑_{r∈R} ∑_{p∈P} z^τ_{r,n} λ̂^τ_r ρ_{p,n} x^τ_{p,r} ≤ H_n,   ∀ n ∈ N        (5.5)

∑_{n∈N} z^τ_{r,n} = 1,   ∀ r ∈ R        (5.6)

where z^τ_{r,n} is a binary variable that indicates whether RU-r is connected to CU-n at time
slot τ , and constraint (5.6) ensures that only one CU is selected for each RU at time
slot τ. The bandwidth demand for RU-r at time slot τ, denoted as b^τ_r, depends on
the selected split. Based on Table 5.1, b^τ_r is calculated as follows:

b^τ_r = λ̂^τ_r (x^τ_{1,r} + x^τ_{2,r}) + x^τ_{3,r} (1.02 λ̂^τ_r + 1.5) + 2500 x^τ_{4,r},   ∀ r ∈ R        (5.7)

In determining the routing decision, the link capacity is taken into account:
∑_{r∈R} ∑_{m∈M_e} b^τ_r γ^τ_{r,e,m} ≤ C_MH,   ∀ e ∈ E        (5.8)
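The per-split bandwidth demand used in the link-capacity constraint can be transcribed directly from Eq. (5.7) and Table 5.1; a minimal sketch (the function name and the `peak_demand` parameter are illustrative):

```python
def uplink_bw(split, peak_demand):
    """Uplink midhaul bandwidth demand (Mbps) of Eq. (5.7) / Table 5.1
    for one RU with peak traffic `peak_demand` (Mbps) under split 1..4."""
    if split in (1, 2):          # all or most VNFs stay in the DU
        return peak_demand
    if split == 3:               # only f1 in the DU
        return 1.02 * peak_demand + 1.5
    if split == 4:               # fully centralized: constant high rate
        return 2500.0
    raise ValueError("split must be in {1, 2, 3, 4}")

# A 100 Mbps RU: splits 1-2 need 100, split 3 needs 103.5, split 4 needs 2500
```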

5.4.2 Energy Consumption


We adapted the energy consumption model from the EARTH project [113] to the
O-RAN setting, accounting for static and traffic-dependent dynamic energy costs.
The energy consumed by DU-m depends on the average load of its CPU at time slot
τ. The average load of DU-m at edge site e at time slot τ for the selected
split points will be:
ℓ̄_{m,e}(x^τ_m) = ∑_{r∈R} ∑_{p∈P} γ^τ_{r,e,m} λ̄^τ_r ρ_{p,m} x^τ_{p,r},   ∀ e ∈ E, m ∈ M_e        (5.9)
r∈R p∈P

where λ̄τr is the average traffic demand of RU-r at time slot τ . Energy consumption
of DU-m at time slot τ over a period t^τ is modeled as [113]:

E_{m,e}(x^τ_m) = t^τ · I(ℓ̄_{m,e}(x^τ_m) > 0) · ( P_{m,0} + P_{m,max} · ℓ̄_{m,e}(x^τ_m)/H_{m,e} )        (5.10)

where Pm,0 in Watts represents the fixed power costs (e.g., cooling, power amplifica-
tion, etc.) of running a DU-m server. Pm,max in Watts is the dynamic load-dependent

power consumption that increases linearly with the DU’s load. The indicator vari-
able I is set to 1 when the condition ℓ̄m,e (xτm ) > 0 is true, indicating that DU-m is
busy, and 0 when it is idle. As a result, the total energy consumption of all edge
sites is:
E_Edges(x^τ) = ∑_{e∈E} ∑_{m∈M_e} E_{m,e}(x^τ_m)        (5.11)

The energy consumption of a CU-n and cloud site are expressed similarly:
E_n(x^τ_n) = t^τ · I(ℓ̄_n(x^τ_n) > 0) · ( P_{n,0} + P_{n,max} · ℓ̄_n(x^τ_n)/H_n )        (5.12)

E_Cloud(x^τ) = ∑_{n∈N} E_n(x^τ_n)        (5.13)

Energy consumption on the MH links is given, following [114], as:

E_Net(x^τ) = t^τ · P_net · ∑_{e∈E} ∑_{m∈M_e} ∑_{r∈R} b^τ_r γ^τ_{r,e,m}        (5.14)

where Pnet is the power consumption per bandwidth unit on MH links. Finally, the
overall objective is:

P:   min_{x^τ ∈ X^τ} ( E_Edges(x^τ) + E_Cloud(x^τ) + E_Net(x^τ) )        (5.15)
     s.t. (5.1)–(5.8)

where xτ represents the set of splits for all RUs at time slot τ , and X τ denotes the
domain of optimization.
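A minimal sketch of the node and link energy models of Eqs. (5.10), (5.12), and (5.14), using the Table 5.2 power figures as defaults; `load` and `capacity` stand in for the ℓ̄ and H terms, and the single-node signature is an illustrative simplification:

```python
def node_energy(load, capacity, t, p0=120.0, p_max=280.0):
    """Eq. (5.10)/(5.12): energy (J) of one DU/CU over a period of t seconds.
    p0 is the fixed power draw, p_max the maximum load-dependent draw (W);
    an idle node (load == 0) consumes nothing under this model."""
    if load <= 0:
        return 0.0
    return t * (p0 + p_max * load / capacity)

def link_energy(total_bw_mbps, t, p_net=0.005):
    """Eq. (5.14): midhaul link energy, with P_net in W per Mbps."""
    return t * p_net * total_bw_mbps

# A DU at 50% load for one hour: 3600 * (120 + 280 * 0.5) = 936 kJ
print(node_energy(load=0.5, capacity=1.0, t=3600.0))
```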

5.5 Seq2Seq-A2C Decision Algorithm


To adapt proactively to a dynamic environment and make effective decisions, we
propose a solution called Seq2Seq-A2C. This solution integrates the Seq2Seq model
with the A2C algorithm to address problem P.

5.5.1 Basics of A2C and Seq2Seq Model


First, we briefly introduce RL, which seeks to improve control and decision-
making in a given environment through trial and error. At each time-step t,
the agent receives a state s_t ∈ S and chooses an action a_t from the available actions
A based on its policy π(a_t|s_t). The agent moves to the next state s_{t+1} and receives a
reward r_t after interacting with the environment. R_t = ∑_{k=0}^{∞} γ^k r_{t+k} represents the

total accumulated return at time-step t. 0 < γ < 1 is the discount factor. The RL
agent’s objective is to maximize the expected return from each state st . This can
be estimated through the state-value function V (s) and the action-value function
Q(s, a). A2C is a useful approach that approximates only the state value function
V (st ) instead of approximating both Q(st , at ) and V (st ). This simplifies the learning
process. The term "advantage" refers to A(s_t, a_t) = Q(s_t, a_t) − V(s_t), which denotes
the advantage of taking action at in state st . The actor is responsible for π (at |st ),
and the critic evaluates the actor’s actions by V (st ). It is common to use Temporal
Difference (TD) error [115] to estimate A (st , at ) with minimal variance:

A(s_t, a_t) = Q(s_t, a_t) − V(s_t) ≈ r_t + γ V(s_{t+1}|s_t, a_t) − V(s_t) = δ(s_t)        (5.16)

To understand the Seq2Seq model, one first needs to understand the Long Short-Term
Memory (LSTM) [116] architecture. LSTM is an artificial Recurrent Neural Network
(RNN) used for prediction and classification with remarkable success. The Seq2Seq
model [104] is performed by two LSTM cells named encoder and decoder. The final
states of the encoder are passed to the initial states of the decoder. Due to the space
limitation, we omit the details of the LSTM and Seq2Seq model here, and interested
readers could resort to [104, 116] to find the details.
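The TD-error estimate of Eq. (5.16) reduces to a one-liner; the `done` flag, which drops the bootstrap term at episode boundaries, is an added convention not spelled out in the text:

```python
def td_advantage(r, v_s, v_next, gamma=0.99, done=False):
    """Eq. (5.16): delta(s_t) = r_t + gamma * V(s_{t+1}) - V(s_t),
    used as a low-variance estimate of the advantage A(s_t, a_t).
    `done` drops the bootstrapped value at terminal states."""
    target = r if done else r + gamma * v_next
    return target - v_s
```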

5.5.2 Seq2Seq-A2C
The process of learning is explained as follows. Initially, the agent is given an input
state s = (r, i) ∈ R × I that identifies RU-r during time slot τ . Next, it selects an
output action, a_s, which specifies one of the splits from Table 5.1. Assuming each
RU is connected to k edge sites, the action space would be 4k, representing the four
different splits for each edge site. To make this decision, the actor network with
parameters θa is used to calculate the policy πθa (as |s) to select a split for each RU
at each time slot.
Fig. 5.2 shows the DRL architecture proposed for energy-aware VNF splitting.
Since the energy consumption model of all DUs in each edge site is the same, we
adopt a method of placing VNF that involves maximizing the number of VNF in
each DU before moving on to the next one. The same approach is used for CUs.
Suppose the agent intends to choose an action for RU-3 during time slot 1, identified
as state s3 = (3, 1). To accomplish this, a state vector is constructed that includes
the current environment state (consisting of the previously assigned RUs, s1 and s2)
along with the target RU (s3, shown in red). This vector is then provided to the agent. The

Figure 5.2: The architecture of the proposed Seq2Seq-A2C algorithm.

agent produces an action that specifies the placement of VNF for the corresponding
RU. The environment evaluates the effectiveness of this decision by applying problem
formulation methods outlined in Section 5.4 to calculate a reward signal as feedback.
The environment is denoted by problem P. The agent perceives the environment as
a black box, in the sense that it interacts with the environment to derive an optimal
policy that solves problem P, without understanding the underlying system. We use
the Lagrange relaxation technique [107] to define the reward function to eliminate
the constraints of problem P. This results in an unconstrained problem, where we
penalize solutions that are not feasible by applying a penalty:

r_t = −E(s_t, a_t) − ∑_i µ_i · C_i(s_t, a_t)        (5.17)

where E(s_t, a_t) is the total energy consumption of taking action a_t in state s_t in problem
P, µ_i is the Lagrange multiplier (penalty coefficient) for constraint i, and C_i is the
amount of dissatisfaction of constraint i.
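A sketch of the reward of Eq. (5.17), under the sign convention that each µ_i C_i term is subtracted so that infeasible placements score lower; the uniform multiplier of 10 follows Table 5.2:

```python
def reward(energy, dissatisfactions, mu=10.0):
    """Eq. (5.17): reward = -(energy cost) minus Lagrangian penalties.
    `dissatisfactions` is an iterable of non-negative constraint-violation
    amounts C_i (0 when the constraint is satisfied); mu is the Lagrange
    multiplier, assumed identical for all constraints here."""
    penalty = sum(mu * c for c in dissatisfactions)
    return -energy - penalty

# A feasible placement is scored by energy alone: reward(10.0, [0, 0]) == -10.0
```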
As shown in Fig. 5.2, the proposed Seq2Seq-A2C solution comprises two net-
works: an actor and a critic network. The actor network, which has a weight

Algorithm 4 Seq2Seq-A2C
1: Initialize actor and critic networks with random weights
2: Set learning rate α, discount factor γ, and entropy coefficient β
3: for each epoch do
4: Reset environment to initial state
5: Collect B samples and store (si , ai , ri ) in buffer
6: Calculate advantage function A(si , ai ) for each sample
7: Calculate entropy H(πθ (si )) for each sample
8: Calculate actor and critic loss: Eqs. (5.18) and (5.20)
9: Update actor and critic parameters: Eqs. (5.19) and (5.21)
10: end for
11: return trained actor and critic neural networks with updated parameters

parameter θa , produces actions based on the current state. The actor network con-
sists of a Seq2Seq model and a fully connected layer. The Seq2Seq model extracts
feature from the state s relevant to selecting an action. The fully connected layer
takes the output of the Seq2Seq model and maps it to a probability distribution
over the possible actions using the SoftMax function. The critic network, which has
a weight parameter θc , has the same structure as the actor. However, it only uses a
single neuron with weight θc to extract features relevant to value estimation.
Algorithm 4 presents the Seq2Seq-A2C algorithm. To promote exploration, we
incorporate entropy regularization into the actor network’s loss function, which can
be expressed as follows:
L_a = −(1/B) ∑_{i=1}^{B} ( δ(s_i) log π_{θ_a}(a_i|s_i) + β H(π_{θ_a}(a_i|s_i)) )        (5.18)

where β corresponds to the weight assigned to the action entropy, and B is batch
size. The actor network's parameter update can be represented by:

θa ← θa + α∇θa La (5.19)

The critic network’s loss function and parameter update:


L_c = (1/B) ∑_{i=1}^{B} A(s_i, a_i)²        (5.20)

θc ← θc + α∇θc Lc (5.21)
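The batch losses of Eqs. (5.18) and (5.20) can be sketched with NumPy; this is a framework-agnostic illustration (in practice these would be autograd losses in a deep learning framework), and the default β is an assumed value:

```python
import numpy as np

def actor_loss(deltas, log_probs, action_dists, beta=0.01):
    """Eq. (5.18): policy-gradient loss with entropy regularization.
    deltas: TD-error advantages delta(s_i); log_probs: log pi(a_i|s_i) of the
    taken actions; action_dists: full per-state action distributions (B x |A|)
    used for the entropy term H; beta weights the entropy bonus."""
    entropy = -np.sum(action_dists * np.log(action_dists + 1e-12), axis=1)
    return -np.mean(deltas * log_probs + beta * entropy)

def critic_loss(advantages):
    """Eq. (5.20): mean squared advantage over the batch."""
    return np.mean(np.asarray(advantages) ** 2)
```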

5.6 Performance Evaluation


This section tests the proposed algorithm under various settings and compares the
attained results against the benchmark solutions.

Table 5.2: Network and Seq2Seq-A2C parameters

Parameter Value
Pm,0 , Pm,max 120, 280 (W)
Pn,0 , Pn,max 240, 560 (W)
PN et 0.005 (Watt/Mbps)
λ̂r , λ̄r [40,200], [50,160] Mbps
α 0.0001
Batch Size 32
LSTM hidden Size 32
Clip norm 0.8
γ 0.99
Lagrange multiplier 10

5.6.1 Simulation Setup


To evaluate the quality of our solution, we rely on traffic measured in [3] for different
cells in a business and a residential area, as shown in Fig. 5.3. It is observed that the
traffic fluctuates significantly throughout the day, and there are long periods of low
activity. We consider an O-RAN with one cloud, four edge sites, and 120 RUs. Each
edge site serves 30 RUs, and each RU is associated with two edge sites. The cloud
site has 8 CUs, while each edge site has 4 DUs. Each DU contains 4 CPUs, and each
CU contains 10 CPUs. DUs and CUs use an Intel Haswell i7-4770 3.40GHz CPU,
similar to the approach in [111]. In this scenario, ρp,m = {0.005, 0.004, 0.00325, 0}
and ρp,n = {0, 0.001, 0.00175, 0.005} per traffic unit (Mbps). Table 5.2 provides
each DU and CU’s fixed and dynamic power consumption. Based on the traffic
fluctuation for the business area shown in Fig. 5.3, we have defined four time slots
with intervals of 8, 6, 6, and 4 hours per day. The residential area has four time
slots with intervals of 8, 8, 4, and 4 hours per day. Table 5.2 details the network
configuration and parameters of Seq2Seq-A2C. The hyperparameters for DRL are
based on the guidelines outlined in [100], while the energy consumption model is
derived from the framework presented in [76].

5.6.2 Model Training


Initially, we examined how Seq2Seq-A2C performed during the learning phase
and compared the results against A2C with a single LSTM cell (LSTM-A2C), and
against simple AC and Deep Q-Network (DQN) combined with the Seq2Seq model
(Seq2Seq-AC and Seq2Seq-DQN). For
each epoch, the normalized energy consumption and normalized penalty cost are


Figure 5.3: Normalized Traffic (NT) condition for business and residential area [3].

shown in Fig. 5.4a and Fig. 5.4b, respectively. Fig. 5.4 demonstrates that Seq2Seq-
A2C performs better than other methods by significantly reducing the penalty cost,
bringing it closer to zero. Additionally, at the end of the training, Seq2Seq-A2C
had lower energy consumption than the other methods. The observed improvement
in Seq2Seq-A2C performance during training is attributed to integrating A2C with
entropy regularization into the actor loss function. This integration facilitates ex-
ploration, ultimately enhancing learning efficiency. To compare LSTM and Seq2Seq
models, we can see that the proposed solution with the Seq2Seq model (Seq2Seq-
A2C) offers faster convergence and superior results at a lower cost than the
LSTM-A2C model. For example, Fig. 5.4b illustrates that the Seq2Seq-A2C model
experiences a near disappearance of the penalization cost after epoch 1000, whereas
this occurs at epoch 1500 for the LSTM-A2C model, which means a 33% improve-
ment.
All methods in Fig. 5.4 begin by generating random actions that do not sat-
isfy many constraints, leading to a high penalty cost. Initially, the agent prioritizes
constraint satisfaction by reducing the penalty cost, ignoring the energy consump-
tion cost. As the number of periods increases, both costs decrease, which indicates
the improvement of the agent’s policies. The penalty cost almost disappears after
training, indicating that the agent has learned the desired policy based on the given
information. This means that the likelihood of receiving a VNF split that violates
the constraints is low, but if it occurs, there will be a slight dissatisfaction.

Figure 5.4: Training of the proposed DRL: (a) Normalized Energy Consumption
(NEC), and (b) Normalized Penalty Cost (NPC).

5.6.3 Comparison Benchmarks


In this section, we evaluate our solution against four different common benchmarks
in the context of RAN management:

• Traditional distributed RAN (D-RAN): All baseband processing occurs at the base station.

• Centralized RAN (C-RAN): All VNFs are located in CUs.

• Greedy-A (G-A) approach: We first place the VNFs in DUs without splitting
them. When the DUs are full, we allocate the remaining VNFs to CUs.

• Greedy-B (G-B) approach: We first place the VNFs in CUs without splitting
them. When the CUs are full, we allocate the remaining VNFs to DUs.
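
The two greedy baselines are simple first-fit placement policies. A minimal sketch of Greedy-A is given below (Greedy-B is symmetric, filling CUs first); the capacity and VNF loads are illustrative values, not taken from this thesis:

```python
def greedy_a(vnf_loads, du_capacity):
    """Greedy-A baseline: place whole VNFs in the DU until it is full,
    then spill the remaining VNFs to the CU (no VNF splitting)."""
    du, cu, used = [], [], 0.0
    for i, load in enumerate(vnf_loads):
        if used + load <= du_capacity:
            du.append(i)
            used += load
        else:
            cu.append(i)
    return du, cu

# Four VNFs with CPU loads 3, 4, 5, 2 against a DU capacity of 10.
du_vnfs, cu_vnfs = greedy_a([3.0, 4.0, 5.0, 2.0], du_capacity=10.0)
```

Note that this first-fit variant keeps scanning after a VNF spills to the CU, so the last VNF (load 2) still lands in the DU; the thesis does not specify the exact tie-breaking of its greedy baselines.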

As shown in Fig. 5.5 to 5.7, Seq2Seq-A2C consistently outperforms the other meth-
ods in all time slots. Regarding energy efficiency, the ranking of the methods from
best to worst is Seq2Seq-A2C, G-A, D-RAN, G-B, and C-RAN. For example, for
week-day traffic in the business area, Seq2Seq-A2C significantly reduces energy
consumption compared with G-A, the second-best benchmark: the improvement
ranges from 24% to 56% across time slots. In the residential area, the improvement
ranges from 15% to 63%.

Figure 5.5: Comparison benchmarks for Average Energy Consumption (AEC) per second
for residential area week-day traffic

Figure 5.6: Comparison benchmarks for Average Energy Consumption (AEC) per second
for business area week-day traffic

Figure 5.7: Comparison benchmarks for Average Energy Consumption (AEC) per second
for residential area weekend traffic

5.7 Summary
In this chapter, we have proposed an intelligent energy-efficient VNF splitting ap-
proach in O-RAN using the Seq2Seq-A2C algorithm. By incorporating the Seq2Seq

model, our approach is able to capture long-term dependencies in dynamic traffic
demands and make more informed decisions. The energy consumption is formulated
as a multi-objective optimization problem with constraints. This problem is then
utilized to define the reward function, which consists of two parts: energy cost and
penalty for constraint violation. Our experimental results show that, after training
is completed, the penalty disappears and the energy cost decreases and converges to
a specific value, which is lower than the values given by other compared methods.

Chapter 6

Edge AI Empowered Dynamic VNF Splitting in O-RAN Slicing: A Federated DRL Approach

6.1 Overview
Throughout the previous chapters, we have explored the effectiveness of centralized
DRL in addressing VNF splitting challenges within O-RAN environments. How-
ever, the potential of distributed AI remains untapped, prompting the focus of this
chapter. Here, we delve into the combined benefits of O-RAN and distributed AI in
network slicing.
O-RAN’s virtualization and disaggregation capabilities offer efficient resource al-
location, while AI-driven networks contribute to optimized performance and decision-
making. The inclusion of Edge-AI further enhances the architecture, enabling local
data processing and faster decision-making, ultimately optimizing the system.
In response to this, we propose a novel federated DRL approach, allowing dy-
namic RAN disaggregation to be offloaded to edge sites for multiple network slices.
Our objective is to achieve optimized dynamic RAN disaggregation by maximizing
resource utilization and minimizing reconfiguration overhead.
Through comprehensive performance evaluations, our proposed approach outperforms
the distributed DRL approach. By fine-tuning the learning rate, we narrow the
performance gap with the optimal solution to about 3%. Moreover, by adjusting
the reward function weighting factor, we achieve the desired network Key
Performance Indicators (KPIs), resulting in a highly efficient and effective
network-slicing solution.

6.2 Motivation
Edge-AI plays a pivotal role in enabling real-time decision-making for dynamic net-
work management [117, 118]. By leveraging AI capabilities at the edge of the net-
work, intelligent and automated decisions can be made in response to changing
network conditions and demands. In the context of O-RAN, one of the key moti-
vations for providing edge-AI lies in the need to create on-demand virtual network
slices and control the entire physical infrastructure of the network in real-time [119].
Edge-AI technology enhances the capabilities of edge devices integrated into the
O-RAN architecture, enabling them to locally process and
analyze data. This decentralized approach empowers edge devices to swiftly make
decisions without solely depending on centralized infrastructure, facilitating quicker
response times and reducing latency. Moreover, the federated approach within edge-
AI enables distributed edge devices to collectively refine and enhance their individ-
ual performance. By employing techniques that involve sharing model updates while
safeguarding the privacy of data, this federated approach significantly bolsters the
accuracy and efficiency of decision-making processes. These advancements lead to
superior resource allocation optimization and elevate the overall Quality of Service
(QoS) standards within various network slicing scenarios present in O-RAN envi-
ronments. This combined utilization of Edge-AI and the federated approach not
only ensures faster local decision-making but also contributes to the collaborative
enhancement of edge device capabilities, ultimately improving the efficiency and
adaptability of O-RAN infrastructure.
In the context of the O-RAN architecture, the division of VNFs between the DU
and CU introduces various possibilities [2]. This chapter focuses on harnessing AI
to facilitate on-the-fly, dynamic VNF splitting for different network slices, thereby
delegating decision-making processes to the network edge, closer to end-users. Prior
research has delved into VNF splitting within O-RAN slicing, covering diverse facets
like energy efficiency [55], financial gains [78], and resource allocation [77]. Further-
more, significant studies have examined VNF splitting in scenarios devoid of network
slicing [6, 120]. However, none of these endeavors have explored the application of
federated DRL at the edge network to enable localized, real-time decision-making
processes.

6.3 Cost of Distributed and Federated Learning
The implementation of distributed learning, including federated learning, brings
about several notable costs and considerations. Firstly, in distributed learning se-
tups, there are increased communication and coordination expenses due to the need
for constant data exchanges among multiple devices or nodes. This communication
overhead can be a significant cost factor, particularly in scenarios where large
volumes of data need to be synchronized across diverse devices. Additionally, there
is an inherent computational cost associated with managing and orchestrating the
learning process across multiple nodes, often demanding higher computational
resources. In the case of federated learning specifically, while it preserves data
privacy, the process requires a considerable initial investment in setting up secure
communication protocols and encryption mechanisms to ensure confidentiality and
integrity. Moreover, there is the cost of potential model degradation due to
heterogeneity among the distributed data sources, requiring additional effort to
harmonize the diverse
datasets across nodes. These combined costs underscore the importance of weigh-
ing the trade-offs between the advantages of distributed/federated learning and the
expenses incurred in terms of communication, computation, and data privacy main-
tenance.

6.4 System Model


Fig. 6.1 shows our proposed O-RAN architecture consisting of multiple network
functions and slices. The deployment includes two RICs, namely the non-RT RIC
and the near-RT RIC, as well as CUs, DUs, and RUs. The CU is further divided
into the control plane (CP) and user plane (UP). Within each RU, there are two
types of slices, eMBB and URLLC, with multiple slices of each type. In our architecture,
the VNFs are deployed individually for each network slice. Due to dynamic traffic
conditions, we need dynamic VNF splitting to make on-demand resource manage-
ment. In our proposed architecture, an xAPP is responsible for facilitating VNF
migration in response to dynamic VNF splitting decisions made by edge sites. This
xAPP ensures efficient relocation of VNF instances to accommodate the changing
requirements of network slices.
In the network, there are a total of N edge sites connected to the cloud site via
midhaul (MH) links, as shown in Fig. 6.1. The set of edge sites is denoted as N ,
and the number of agents in the set is represented by N = |N |. Each individual

edge site, denoted as n, comprises a pool of DUs. Furthermore, each edge site is
linked to a collection of RUs through fronthaul (FH) links. The cloud site includes
a pool of CUs. Each edge site n is characterized by its computing capacity, denoted
as Hn . However, for the cloud site, we assume an infinite computing capacity. This
assumption is made to eliminate conflicting decisions regarding the placement of
VNFs between the edge sites and the cloud site. The network slices are represented
by M, and the number of eMBB network slices is denoted as M = |M|. The
traffic demand for slice m at time t is represented by λ_m^t.
We utilize the VNF splitting proposed in [121] for each network slice. In this
case, VNFs are categorized into two groups: User-Related Functions (URF), which
are dedicated to specific user data, and Cell-Related Functions (CRF), responsible
for processing the multiplexed URFs at the physical layer. For eMBB slices, the
top-tier component, CU, is capable of executing only URFs. On the other hand,
DUs are responsible for handling both URFs and CRFs. There are two reasons for
not processing CRFs at the CU level. Firstly, processing CRFs at this level would
require a significant allocation of bandwidth on the MH links. Secondly, operating
these functions at the DUs allows for more flexibility in meeting stringent delay
requirements. By contrast, due to the high sensitivity of URLLC slices to delay and
lower traffic volume compared to eMBB slices, both URFs, and CRFs are placed in
DUs. As a result, we differentiate between URLLC and eMBB slices in terms of VNF
splitting. For URLLC slices, we prioritize reserving processing capacity at the edge
sites and do not consider VNF splitting. However, for eMBB slices, we focus on VNF
splitting between the DUs and the CUs. Consequently, the subsequent sections of
this chapter specifically delve into the VNF splitting of eMBB slices. Additionally, it is
important to note that for each network slice, due to dynamic traffic conditions, the
VNF split decisions must be dynamically made at different time intervals (τ ∈ T ).
In the upcoming section, our goal is to optimize the dynamic VNF splitting
for eMBB network slices between DUs and CUs, in response to dynamic traffic
conditions. The aim is to maximize the utilization of edge sites while minimizing
the overhead associated with VNF reconfiguration. Maximizing edge site utilization
by placing more VNFs close to users enhances network efficiency and improves the
user experience. In the next section, we employ Federated Deep Q-Network (F-
DQN) to effectively achieve this goal.

Figure 6.1: The proposed O-RAN system model architecture leverages federated learning
at the edge network to enable dynamic VNF splitting for network slices.

6.5 O-RAN Slicing with F-DQN


In F-DQN, we integrate the power of federated learning with DQN methodology. F-
DQN encompasses three critical components: state representation, reward function,
and action space. The state captures the current context of the federated learning
agents, enabling them to perceive and understand their environment. The reward
function evaluates the actions taken by the agents based on their performance,
facilitating learning and decision-making. The action space defines the range of
possible actions that the agents can select to interact with their environment. By
combining the benefits of federated learning and DQN, F-DQN empowers distributed
agents to collectively learn and optimize their decision-making process in a federated
manner.
In F-DQN-based O-RAN Slicing, each edge site is treated as an independent
agent. As mentioned in Section 6.4, for eMBB slices, the CRF is placed in the DU.
For the URFs, a binary variable x_m^τ is introduced, where 1 represents placement
in the DU and 0 represents placement in the CU.

6.5.1 State
The state in our scenario refers to the VNF splits of eMBB slices in the previous
time interval (τ − 1), as well as the corresponding traffic demand for all slices. Thus:

s_τ = (x_1^{τ−1}, …, x_M^{τ−1}, λ_1^τ, …, λ_M^τ)    (6.1)

6.5.2 Action space


The action in our context represents the decision made for VNF splitting of eMBB
slices. Since there are two available options for VNF splitting for each eMBB slice,
the action space will consist of 2M possibilities, where M represents the number of
eMBB slices.
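
As a quick illustration, an action index can be decoded into the per-slice binary placement variables x_m (a sketch under our own bit-ordering convention, which the thesis does not specify — bit m set means slice m's URF stays in the DU):

```python
def decode_action(action_index: int, num_slices: int) -> list[int]:
    """Map an action index in [0, 2**M) to per-slice split bits x_m,
    where 1 places the slice's URF in the DU and 0 in the CU."""
    return [(action_index >> m) & 1 for m in range(num_slices)]

def encode_action(splits: list[int]) -> int:
    """Inverse mapping: per-slice bits back to an action index."""
    return sum(bit << m for m, bit in enumerate(splits))

# With M = 3 eMBB slices there are 2**3 = 8 possible actions.
splits = decode_action(5, 3)   # action 5 = 0b101
```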

6.5.3 Reward
In order to achieve our objective of optimizing the VNF splitting to maximize the
utilization of edge sites while minimizing VNF reconfiguration overhead, we define a
reward function for each agent (n ∈ N) that consists of two components: r_n^τ and
r′_n^τ. The first component, r_n^τ, aims to maximize the utilization of edge site n,
while the second component, r′_n^τ, focuses on minimizing the VNF reconfiguration
overhead of edge site n. The final reward is calculated as:

R_n^τ = α r_n^τ + (1 − α) r′_n^τ    (6.2)

where α represents the weighting factor. To define r_n^τ, we first establish the
processing load of the CRF per traffic unit (Mbps) as ρ_CRF and that of the URF
as ρ_URF. With this information, we can determine the utilization of edge site n as:

U_n = (1 / H_n) Σ_{m∈M} λ_m^τ (ρ_CRF + ρ_URF x_m^τ)    (6.3)

where H_n is the processing capacity of edge site n. Based on the utilization U_n
of edge site n, we define r_n^τ as follows:

r_n^τ = { U_n        if U_n < 1
        { −P U_n     otherwise    (6.4)

where P represents the penalty coefficient applied when edge site n becomes
overloaded.
r′_n^τ is defined to minimize the VNF reconfiguration overhead:

r′_n^τ = 1 − (1 / M) Σ_{m∈M} |x_m^τ − x_m^{τ−1}|    (6.5)

where Σ_{m∈M} |x_m^τ − x_m^{τ−1}| is the total number of VNF reconfigurations at
time interval τ compared to time τ − 1, normalized by the total number of slices M.
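
Eqs. (6.2)–(6.5) combine into a short reward routine. The sketch below follows the chapter's definitions, but the concrete values of α, P, and the traffic demands are illustrative examples only:

```python
def reward(splits, prev_splits, traffic, H_n,
           rho_crf=0.001, rho_urf=0.005, alpha=0.5, P=10):
    """Per-agent reward R_n = alpha * r_n + (1 - alpha) * r'_n (Eq. 6.2).

    splits, prev_splits: per-slice binary URF placements x_m (1 = DU, 0 = CU).
    traffic: per-slice demand lambda_m in Mbps.
    H_n: processing capacity of the edge site.
    """
    M = len(splits)
    # Eq. (6.3): utilization of edge site n.
    U_n = sum(lam * (rho_crf + rho_urf * x)
              for lam, x in zip(traffic, splits)) / H_n
    # Eq. (6.4): utilization reward, penalized when the site is overloaded.
    r_util = U_n if U_n < 1 else -P * U_n
    # Eq. (6.5): reconfiguration reward (fewer split changes is better).
    r_reconf = 1 - sum(abs(x - xp) for x, xp in zip(splits, prev_splits)) / M
    return alpha * r_util + (1 - alpha) * r_reconf

# One slice's URF moved from the CU to the DU since the last interval.
R = reward(splits=[1, 1, 0], prev_splits=[1, 0, 0],
           traffic=[400, 400, 400], H_n=10)
```

With these numbers U_n = 0.52 and r′_n = 2/3, giving R ≈ 0.593; raising all three demands to 800 Mbps with every URF in the DU overloads the site (U_n = 1.44), and the penalty term makes the reward strongly negative.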

6.5.4 DQN
The objective of DQN is to maximize the expected long-term reward, represented
as:

Q(s, a) = E[ r_t + γ Q(s_{t+1}, a_{t+1}) | s_t = s, a_t = a ]    (6.6)

In the DQN approach, the Q-values are estimated using a deep neural network
(DNN). The parameters of the DNN are updated using the stochastic gradient de-
scent algorithm. This update process can be represented as follows:

θ_{t+1} = θ_t + α [ r_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; θ_t) − Q(s_t, a_t; θ_t) ] ∇_θ Q(s_t, a_t; θ_t)    (6.7)

where θ represents the parameters of the DQN, α represents the learning rate, and γ
represents the discount factor. Through experience replay and iterative parameter
updates, the DNN can predict the expected accumulated reward and select the
optimal action accordingly.
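
To make Eq. (6.7) concrete, here is a toy tabular stand-in for the DNN: with a table, ∇Q equals 1 for the visited entry, so the update reduces to the classic Q-learning rule. The state/action sizes and transition values are illustrative, and the chapter's actual agent uses a neural network with experience replay:

```python
import numpy as np

def dqn_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One TD update per Eq. (6.7), tabular case:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((4, 2))   # 4 states, 2 actions, all Q-values start at 0
Q = dqn_step(Q, s=0, a=1, r=1.0, s_next=1)
```

Starting from an all-zero table, the TD target is r = 1.0 and the visited entry moves a step of size α toward it, so Q[0, 1] becomes 0.1 while every other entry stays 0.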

6.5.5 F-DQN
The algorithm involves multiple agents, each having its own local dataset. At each
agent, a local training procedure is performed, where the dataset is split into mini-
batches, and the model parameters are updated using gradient descent. The global
model is initialized at the RIC, and in each iteration, a random subset of agents is
chosen. Each chosen agent performs local training in parallel, updating their local
model. The updated local models are then aggregated at the RIC using averaging
aggregation. This process is repeated for a specified number of iterations. The
output of the algorithm is the final global model. The pseudo-code in Algorithm 5
provides a structured framework for implementing F-DQN, where participants

Algorithm 5 Federated DQN (F-DQN) Algorithm
Require: local mini-batch size B; number of local epochs E; number of participants N;
learning rate α; global model θ_G; loss function L(θ)
Ensure: global model θ_G
1: function LocalTrain(i, θ)
2:   Split local dataset D_i into mini-batches B_i, each of size B
3:   for j = 1 to E do
4:     for b in B_i do
5:       Update θ by performing a gradient descent step: θ ← θ − α ∇L(θ, b)
6:     end for
7:   end for
8: end function
9: Training at the server (RIC):
10: Initialize θ_G
11: for t = 1 to T do
12:   Randomly choose m participants from the total participants
13:   for each chosen participant i in parallel do
14:     Compute θ_{t+1}^i by calling LocalTrain(i, θ_G)
15:   end for
16:   Aggregate the local models: θ_G ← (1/N) Σ_{i=1}^{N} D_i · θ_{t+1}^i (averaging aggregation)
17: end for
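
The server-side aggregation (line 16 of Algorithm 5) can be sketched in NumPy. Here each local model is weighted by its dataset size and normalized by the total number of samples, the usual FedAvg convention; the parameter vectors and dataset sizes are made-up values:

```python
import numpy as np

def fedavg(local_models, dataset_sizes):
    """Aggregate local model parameters into the global model: a
    weighted average with weights proportional to each agent's
    dataset size (standard FedAvg averaging aggregation)."""
    weights = np.asarray(dataset_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * theta for w, theta in zip(weights, local_models))

theta_1 = np.array([1.0, 2.0])   # agent 1's local parameters
theta_2 = np.array([3.0, 6.0])   # agent 2's local parameters
theta_G = fedavg([theta_1, theta_2], dataset_sizes=[100, 300])
```

Agent 2 holds three times as much data, so its parameters dominate: θ_G = 0.25·θ_1 + 0.75·θ_2 = [2.5, 5.0].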

Figure 6.2: The aggregated fronthaul link traffic dataset used in our simulations,
representing the traffic for eMBB slices of each agent

Figure 6.3: Training process of F-DQN vs L-DQN for agent 1 and 2

collaborate to train a shared model while keeping their data local and maintaining
privacy.

6.6 Performance Evaluation


6.6.1 Simulation settings
In our simulations, we set up a scenario with two edge sites functioning as agents,
and each agent is responsible for serving 10 RUs. Each edge site is equipped with 10
CPU cores. We define the processing load of the CRF as ρ_CRF = 0.001 and that of
the URF as ρ_URF = 0.005. For each RU, we consider three eMBB slices. The training
dataset used in our simulations contains the aggregated fronthaul link traffic for each
eMBB slice of each agent, as depicted in Fig. 6.2. The traffic demand ranges from
50 Mbps to 1300 Mbps. Additional simulation parameters are given in Table 6.1. This setup
allows us to analyze and evaluate the performance of the dynamic VNF splitting
algorithm in the context of eMBB network slices.
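
A back-of-the-envelope check using Eq. (6.3) shows why dynamic splitting matters under these settings. We assume, purely for illustration, that all three aggregated slice demands of an agent peak at 1300 Mbps simultaneously:

```python
# CPU load per Mbps for cell- and user-related functions, and the edge
# capacity, as given in the simulation settings.
RHO_CRF, RHO_URF, H_N = 0.001, 0.005, 10
peak = [1300.0, 1300.0, 1300.0]  # illustrative simultaneous peak per slice

# All URFs kept at the edge (x_m = 1 for every slice), per Eq. (6.3):
u_all_du = sum(lam * (RHO_CRF + RHO_URF) for lam in peak) / H_N
# All URFs offloaded to the CU (x_m = 0): only the CRF load remains.
u_all_cu = sum(lam * RHO_CRF for lam in peak) / H_N
```

At this peak, keeping every URF local would demand 2.34 times the edge capacity, while offloading all URFs leaves the site at only 39% utilization, so the agent must adapt the split to the traffic.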

Table 6.1: Network and F-DQN parameters

Parameters | Values
λ_m^τ | [50, 1300] Mbps
P | {1, 10, 100}
ρ_CRF, ρ_URF | 0.001, 0.005
Edge Capacity | 10 CPUs
Learning Rate (α) | {0.0001, 0.001, 0.01, 0.1}
Batch Size | 64
Hidden Size | 64
γ | 0.99
Local model aggregation interval | Every 100 episodes

Figure 6.4: Training process for agent 1 with varying penalty coefficients (P)

6.6.2 Model training


In the initial evaluation, we conduct a comparison between the training processes
of F-DQN and Local DQN (L-DQN). In L-DQN, each agent starts training inde-
pendently without sharing knowledge. The normalized reward for each episode is
depicted in Fig. 6.3. In F-DQN, both agents converge sooner than in L-DQN, and
F-DQN achieves a higher reward after convergence. This is be-
cause F-DQN leverages collective intelligence from multiple agents. By aggregating
and sharing local models, F-DQN benefits from a larger and more diverse knowledge
base, leading to improved decision-making and higher overall performance.
In the next step, we investigate the impact of the penalty coefficient (P) on
the training process of both agents. The results, depicted in Fig. 6.4 and Fig. 6.5,
demonstrate that the value of P significantly influences the convergence behavior.
Higher values of P result in later convergence. For instance, when P = 1, conver-
gence starts around episode 100, whereas for P = 100, convergence initiates around
episode 250.

Figure 6.5: Training process for agent 2 with varying penalty coefficients (P)

Figure 6.6: Performance gap comparison between different learning rates for F-DQN,
C-DQN, and L-DQN approaches

6.6.3 Comparison with optimal solution


In this experiment, we aimed to assess the performance gap between the solutions
obtained using different learning rates compared to the optimal solution obtained
through exhaustive search. Exhaustive search is a method that systematically eval-
uates all possible solutions to find the optimal solution. Fig. 6.6 illustrates the
percentage gap between three different methods: F-DQN, L-DQN, and Centralized
DQN (C-DQN), in which the near-RT RIC performs the training using the global dataset.
The results highlight the significance of choosing an appropriate learning rate to
achieve a near-optimal solution. Among the various learning rate values examined,
0.01 demonstrated the lowest gap. Specifically, F-DQN and C-DQN exhibited de-
sirable performance with a gap of approximately 3% when the learning rate was set
to 0.01. On the other hand, L-DQN showed a higher gap of around 9%.

6.6.4 Network KPIs


This section focuses on network KPIs. In Fig. 6.7 and 6.8, we vary the parameter
α in Eq. (6.2) from 0.1 to 0.9. We measure the number of reconfigurations and
the utilization of edge sites for two types (type 1 and type 2) of traffic with normal
distributions. Type 1 has a mean of 800 Mbps and a standard deviation of 5, while
type 2 traffic also has a mean of 800 Mbps but with a higher standard deviation

Figure 6.7: VNF reconfigurations with varying α values

Figure 6.8: Edge sites utilization with varying α values

of 20 (note that type 2 fluctuates more than type 1). By adjusting the parameter
α, we can control the number of reconfigurations and the utilization of edge sites.
For example, setting α = 0.1 reduces the number of reconfigurations by 7, while
α = 0.9 increases the average edge site utilization by 85%. Interestingly, for type
1 traffic (less fluctuated), increasing α beyond 0.5 has minimal impact on average
utilization and the number of reconfigurations. However, for type 2 traffic (more
fluctuated), both average utilization and the number of reconfigurations continue to
increase with higher α values.

6.7 Summary
This chapter proposes a novel approach that leverages federated DRL for dynamic
VNF splitting in network slicing within the O-RAN architecture. The approach
enables real-time decision-making at the network edge, improving responsiveness
and reducing latency. The performance evaluation shows comparable performance to
centralized DRL, highlighting the effectiveness of the federated approach. This work
contributes to decentralized control, optimized resource utilization, and efficient
VNF reconfigurations in network slicing scenarios.

Chapter 7

Conclusion and Future Works

In this thesis, our focus was on exploring the challenges and opportunities in op-
timizing O-RAN using both heuristic and AI approaches. This chapter provides a
thorough summary of the key contributions that have been made to advance the
field.

7.1 Summary of Achievements


This thesis presents a comprehensive investigation into optimizing VNF splitting and
dynamic network management within the context of O-RAN architecture, leveraging
the capabilities of AI. The motivation stems from the challenges faced by traditional
RANs and the promising opportunities offered by the O-RAN architecture, which
introduces virtualization, disaggregation, and softwarization. Throughout this re-
search, we delve into the motivations, objectives, technical challenges, and key tech-
nical contributions, seeking to address various aspects related to VNF splitting and
dynamic network management. The main objectives revolve around load-balancing
VNF splitting, robust VNF splitting, energy-efficient VNF splitting, and delega-
tion decision-making, each posing unique technical challenges. To address these
challenges, we propose a range of solutions, including innovative O-RAN system
designs, heuristic algorithms, and both centralized and distributed AI approaches.
Extensive evaluations are conducted to showcase the effectiveness and remarkable
performance of the proposed methodologies, ultimately contributing to optimized
VNF splitting, resource utilization, and real-time decision-making at the network
edge. The key contributions are:

1. Extensively explored motivations, objectives, challenges, and contributions
related to VNF splitting and dynamic network management within O-RAN
architecture.

2. Proposed solutions encompass load-balancing VNF splitting, robust VNF
splitting, energy-efficient VNF splitting, and delegation of decision-making to
the edge network.

3. Utilized various methodologies, including O-RAN system designs, heuristic
algorithms, and centralized/distributed AI approaches.

4. Demonstrated remarkable performance and effectiveness of the proposed
approaches through extensive evaluations.

5. We optimized VNF splitting using an efficient heuristic algorithm, achieving
load balance between CUs and midhaul links for enhanced performance.

6. We proposed robust VNF reconfigurations that minimize reconfiguration
overhead and computational cost. To address the NP-hard nature of the problem,
we employed constrained combinatorial optimization with deep RL. Additionally,
we utilized a penalized cost function to make improved VNF placement decisions
under dynamic traffic conditions.

7. We developed an intelligent energy-efficient VNF splitting approach by
leveraging the Seq2Seq-A2C algorithm. This approach effectively captured
long-term dependencies in dynamic traffic demands using LSTM and
sequence-to-sequence methods, resulting in more informed decisions. After
training, the energy cost converged to a lower value than that of the compared
methods.

8. Made significant contributions to decentralized control, optimized resource
utilization, and efficient VNF reconfigurations in O-RAN network slicing
scenarios.

The findings of this thesis demonstrate the potential of AI-driven optimization
in the context of O-RAN architecture, offering valuable insights for future research
and advancements in dynamic network management and VNF splitting strategies.

7.2 Future Work


The main limitations that I encountered in this research were related to:

1. The proposed VNF splitting and dynamic network management approaches
were evaluated using abstract and real network topologies. However, the scala-
bility and performance of these solutions in O-RAN deployments with complex
and heterogeneous network configurations need further investigation.

2. The optimization method for VNF splitting considered a fixed set of network
functions in the chain (f0 → f1 → f2 → f3). Future research could explore
the extension of the approach to accommodate all eight possible splits.

3. The proposed robust VNF reconfiguration approach utilized deep RL to
approximate optimal policies under dynamic traffic conditions. However, the
training process could be computationally intensive and time-consuming,
particularly for large-scale O-RAN systems. Addressing the scalability and
efficiency of the training process is essential for real-world deployment.

4. The intelligent energy-efficient VNF splitting approach integrated the
Seq2Seq-A2C algorithm to capture long-term dependencies in dynamic traffic
demands. Nevertheless, further investigation is required to understand the impact
of varying traffic patterns and traffic loads on the approach's effectiveness and
adaptability.

For future work, several areas of research can be explored to address these limi-
tations and further enhance the proposed solutions:

1. To enhance the proposed VNF splitting solutions, future research could ex-
plore a more comprehensive approach that considers all possible split options.
Investigating the trade-offs and performance implications of different split con-
figurations can lead to more robust and adaptable VNF placement strategies.

2. Considering the heterogeneous nature of O-RAN nodes in energy consumption
modeling is a potential avenue for future research. Developing energy-aware
optimization techniques that account for the varying computational capabilities
and power consumption of different network nodes can lead to more
energy-efficient O-RAN deployments.

3. The federated DRL approach for dynamic VNF splitting presented in the
thesis demonstrates promising results. Future studies could delve deeper into
the exploration of federated learning techniques for other aspects of O-RAN
management, such as resource allocation, interference mitigation, and load
balancing, to further enhance the overall network performance and efficiency.

4. The performance evaluation of the proposed approaches primarily focused on
the simulation environment. Future work could involve conducting extensive
field trials and practical deployments to validate the effectiveness and real-
world performance of the proposed solutions in diverse O-RAN scenarios.

References

[1] O-RAN Alliance, “O-RAN architecture and specifications,” 2020.

[2] Small Cell Forum, “Small cell virtualization: Functional splits and use cases,” 2016.

[3] M. A. Marsan and et al., “Towards zero grid electricity networking: Powering
bss with renewable energy sources,” in ICC workshops 2013-IEEE, Budapest,
Hungary, pp. 596–601, IEEE, 2013.

[4] F. W. Murti, J. A. Ayala-Romero, A. Garcia-Saavedra, X. Costa-Pérez, and


G. Iosifidis, “An optimal deployment framework for multi-cloud virtualized
radio access networks,” IEEE Transactions on Wireless Communications,
vol. 20, no. 4, pp. 2251–2265, 2020.

[5] M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understanding


o-ran: Architecture, interfaces, algorithms, security, and research challenges,”
arXiv preprint arXiv:2202.01032, 2022.

[6] E. Amiri, N. Wang, M. Shojafar, and R. Tafazolli, “Optimizing virtual network


function splitting in open-ran environments,” in 2022 IEEE 47th Conference
on Local Computer Networks (LCN), pp. 422–429, IEEE, 2022.

[7] E. Amiri, N. Wang, M. Shojafar, and R. Tafazolli, “Energy-aware dynamic vnf


splitting in o-ran using deep reinforcement learning,” IEEE Wireless Commu-
nications Letters, 2023.

[8] E. Amiri, N. Wang, M. Shojafar, M. Q. Hamdan, C. H. Foh, and R. Tafazolli,


“Deep reinforcement learning for robust vnf reconfigurations in o-ran,” IEEE
Transactions on Network and Service Management, 2023.

[9] M. Q. Hamdan, H. Lee, D. Triantafyllopoulou, R. Borralho, A. Kose, E. Amiri,


D. Mulvey, W. Yu, R. Zitouni, R. Pozza, et al., “Recent advances in machine

learning for network automation in the o-ran,” Sensors, vol. 23, no. 21, p. 8792,
2023.

[10] T. O. Olwal, K. Djouani, and A. M. Kurien, “A survey of resource management


toward 5g radio access networks,” IEEE Communications Surveys & Tutorials,
vol. 18, no. 3, pp. 1656–1686, 2016.

[11] V. S. Pana, O. P. Babalola, and V. Balyan, “5G radio access networks: A


survey,” Array, vol. 14, p. 100170, 2022.

[12] S. K. Singh, R. Singh, and B. Kumbhani, “The evolution of radio access net-
work towards open-ran: Challenges and opportunities,” in 2020 IEEE Wireless
Communications and Networking Conference Workshops (WCNCW), Seoul,
Korea (South), pp. 1–6, IEEE, 2020.

[13] A. Arnaz, J. Lipman, M. Abolhasan, and M. Hiltunen, “Towards integrating


intelligence and programmability in open radio access networks: A compre-
hensive survey,” IEEE Access, 2022.

[14] S. T. Arzo, R. Bassoli, F. Granelli, and F. H. Fitzek, “Study of virtual network


function placement in 5g cloud radio access network,” IEEE Transactions on
Network and Service Management, vol. 17, no. 4, pp. 2242–2259, 2020.

[15] Z. Piao, M. Peng, Y. Liu, and M. Daneshmand, “Recent advances of edge cache
in radio access networks for internet of things: techniques, performances, and
challenges,” IEEE Internet of Things Journal, vol. 6, no. 1, pp. 1010–1028,
2018.

[16] O. R. Alliance, “O-ran: towards an open and smart ran,” White paper, vol. 19,
2018.

[17] E. Amiri, N. Wang, S. Vural, and R. Tafazolli, “Dynamic anchor point se-
lection in software defined distributed mobility management,” in 2022 IEEE
Symposium on Computers and Communications (ISCC), pp. 1–7, IEEE, 2022.

[18] E. Amiri, N. Wang, S. Vural, and R. Tafazolli, “Hsd-dmm: Hierarchical soft-


ware defined distributed mobility management,” in 2021 IEEE 20th Interna-
tional Symposium on Network Computing and Applications (NCA), pp. 1–7,
IEEE, 2021.

[19] E. Amiri, N. Wang, S. Vural, and R. Tafazolli, “Hsd-dmm: Hierarchical soft-
ware defined distributed mobility management,” in 2021 IEEE 20th Interna-
tional Symposium on Network Computing and Applications (NCA), pp. 1–7,
IEEE, 2021.

[20] E. Amiri, E. Alizadeh, and M. H. Rezvani, “Optimized controller placement for software defined wide area networks,” in 2021 7th International Conference on Web Research (ICWR), pp. 216–221, IEEE, 2021.

[21] E. Amiri, E. Alizadeh, and M. H. Rezvani, “Controller selection in software defined networks using best-worst multi-criteria decision-making,” Bulletin of Electrical Engineering and Informatics, vol. 9, no. 4, pp. 1506–1517, 2020.

[22] K. R. Lejjy, E. Amiri, E. Alizadeh, and M. H. Rezvani, “A game theory-based mechanism to optimize the traffic congestion in VANETs,” in 2020 6th International Conference on Web Research (ICWR), pp. 217–222, IEEE, 2020.

[23] E. Amiri and R. Hooshmand, “Retracted article: Improved AODV based on TOPSIS and fuzzy algorithms in vehicular ad-hoc networks,” Wireless Personal Communications, vol. 111, pp. 947–961, 2020.

[24] E. Amiri and R. Javidan, “A new method for layer 2 loop prevention in software defined networks,” Telecommunication Systems, vol. 73, no. 1, pp. 47–57, 2020.

[25] I. Gholizdeh, E. Amiri, and R. Javidan, “An efficient key distribution mechanism for large scale hierarchical wireless sensor networks,” in 2019 27th Iranian Conference on Electrical Engineering (ICEE), pp. 1553–1559, IEEE, 2019.

[26] E. Amiri and R. Hooshmand, “Improving AODV with TOPSIS algorithm and fuzzy logic in VANETs,” in 2019 27th Iranian Conference on Electrical Engineering (ICEE), pp. 1367–1372, IEEE, 2019.

[27] E. Amiri, E. Alizadeh, and K. Raeisi, “An efficient hierarchical distributed SDN controller model,” in 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pp. 553–557, IEEE, 2019.

[28] E. Amiri, M. R. Hashemi, and K. R. Lejjy, “Policy-based routing in RIP-hybrid network with SDN controller,”

[29] E. Amiri, E. Alizadeh, and K. R. Lejjy, “Loop free clusters in layer 2 hybrid software defined networks,”

[30] K. R. Lejjy, E. Alizadeh, and E. Amiri, “A game theory approach for data
dissemination in vehicular ad-hoc network,” EMCE, 2018.

[31] A. G. Dalla-Costa, L. Bondan, J. A. Wickboldt, C. B. Both, and L. Z. Granville, “Orchestra: A customizable split-aware NFV orchestrator for dynamic cloud radio access networks,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 6, pp. 1014–1024, 2020.

[32] F. Ben Jemaa, G. Pujolle, and M. Pariente, “Analytical models for QoS-driven VNF placement and provisioning in wireless carrier cloud,” in Proceedings of the 19th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pp. 148–155, 2016.

[33] R. Riggio, A. Bradai, T. Rasheed, J. Schulz-Zander, S. Kuklinski, and T. Ahmed, “Virtual network functions orchestration in wireless networks,” in 2015 11th International Conference on Network and Service Management (CNSM), pp. 108–116, IEEE, 2015.

[34] F. Z. Morais, G. M. de Almeida, L. Pinto, K. V. Cardoso, L. M. Contreras, R. d. R. Righi, and C. B. Both, “PlaceRAN: Optimal placement of virtualized network functions in the next-generation radio access networks,” arXiv preprint arXiv:2102.13192, 2021.

[35] D. Bhamare, A. Erbad, R. Jain, M. Zolanvari, and M. Samaka, “Efficient virtual network function placement strategies for cloud radio access networks,” Computer Communications, vol. 127, pp. 50–60, 2018.

[36] I. Koutsopoulos, “Optimal functional split selection and scheduling policies in 5G radio access networks,” in 2017 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 993–998, IEEE, 2017.

[37] J. Pérez-Romero, O. Sallent, A. Gelonch, X. Gelabert, B. Klaiqi, M. Kahn, and D. Campoy, “A tutorial on the characterisation and modelling of low layer functional splits for flexible radio access networks in 5G and beyond,” IEEE Communications Surveys & Tutorials, 2023.

[38] A. M. Alba and W. Kellerer, “A dynamic functional split in 5G radio access networks,” in 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, IEEE, 2019.

[39] B. Németh, N. Molner, J. Martín-Pérez, C. J. Bernardos, A. De la Oliva, and B. Sonkoly, “Delay and reliability-constrained VNF placement on mobile and volatile 5G infrastructure,” IEEE Transactions on Mobile Computing, vol. 21, no. 9, pp. 3150–3162, 2021.

[40] N. Kazemifard and V. Shah-Mansouri, “Minimum delay function placement and resource allocation for Open RAN (O-RAN) 5G networks,” Computer Networks, vol. 188, p. 107809, 2021.

[41] A. Alleg, T. Ahmed, M. Mosbah, R. Riggio, and R. Boutaba, “Delay-aware VNF placement and chaining based on a flexible resource allocation approach,” in 2017 13th International Conference on Network and Service Management (CNSM), pp. 1–7, IEEE, 2017.

[42] Y. L. Lee, D. Qin, L.-C. Wang, and G. H. Sim, “6G massive radio access networks: Key applications, requirements and challenges,” IEEE Open Journal of Vehicular Technology, vol. 2, pp. 54–66, 2020.

[43] L. Valcarenghi, K. Kondepu, F. Giannone, and P. Castoldi, “Requirements for 5G fronthaul,” in 2016 18th International Conference on Transparent Optical Networks (ICTON), pp. 1–5, IEEE, 2016.

[44] J. Yusupov, A. Ksentini, G. Marchetto, and R. Sisto, “Multi-objective function splitting and placement of network slices in 5G mobile networks,” in 2018 IEEE Conference on Standards for Communications and Networking (CSCN), pp. 1–6, IEEE, 2018.

[45] L. M. M. Zorello, L. Bliek, S. Troia, T. Guns, S. Verwer, and G. Maier, “Baseband-function placement with multi-task traffic prediction for 5G radio access networks,” IEEE Transactions on Network and Service Management, vol. 19, no. 4, pp. 5104–5119, 2022.

[46] T. Pamuklu, S. Mollahasani, and M. Erol-Kantarci, “Energy-efficient and delay-guaranteed joint resource allocation and DU selection in O-RAN,” in 2021 IEEE 4th 5G World Forum (5GWF), pp. 99–104, IEEE, 2021.

[47] D. Spatharakis, I. Dimolitsas, D. Dechouniotis, G. Papathanail, I. Fotoglou, P. Papadimitriou, and S. Papavassiliou, “A scalable edge computing architecture enabling smart offloading for location based services,” Pervasive and Mobile Computing, vol. 67, p. 101217, 2020.

[48] F. W. Murti, S. Ali, G. Iosifidis, and M. Latva-aho, “Deep reinforcement learning for orchestrating cost-aware reconfigurations of vRANs,” IEEE Transactions on Network and Service Management, 2023.

[49] L. M. Larsen, A. Checko, and H. L. Christiansen, “A survey of the functional splits proposed for 5G mobile crosshaul networks,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 146–172, 2018.

[50] J. Xie, F. R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, and Y. Liu, “A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 393–430, 2018.

[51] F. W. Murti, S. Ali, and M. Latva-aho, “Constrained deep reinforcement based functional split optimization in virtualized RANs,” IEEE Transactions on Wireless Communications, 2022.

[52] H. Zhang, H. Zhou, and M. Erol-Kantarci, “Federated deep reinforcement learning for resource allocation in O-RAN slicing,” arXiv preprint arXiv:2208.01736, 2022.

[53] X. Wang, J. D. Thomas, R. J. Piechocki, S. Kapoor, R. Santos-Rodríguez, and A. Parekh, “Self-play learning strategies for resource assignment in Open-RAN networks,” Computer Networks, vol. 206, p. 108682, 2022.

[54] R. Li, C. Wang, Z. Zhao, R. Guo, and H. Zhang, “The LSTM-based advantage actor-critic learning for resource management in network slicing with user mobility,” IEEE Communications Letters, vol. 24, no. 9, pp. 2005–2009, 2020.

[55] T. Pamuklu, M. Erol-Kantarci, and C. Ersoy, “Reinforcement learning based dynamic function splitting in disaggregated green Open RANs,” in ICC 2021 - IEEE International Conference on Communications, pp. 1–6, IEEE, 2021.

[56] R. Joda, T. Pamuklu, P. E. Iturria-Rivera, and M. Erol-Kantarci, “Deep reinforcement learning-based joint user association and CU-DU placement in O-RAN,” IEEE Transactions on Network and Service Management, 2022.

[57] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” arXiv preprint arXiv:1611.09940, 2016.

[58] M. Deudon, P. Cournut, A. Lacoste, Y. Adulyasak, and L.-M. Rousseau, “Learning heuristics for the TSP by policy gradient,” in International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 170–181, Springer, 2018.

[59] A. Mirhoseini, H. Pham, Q. V. Le, B. Steiner, R. Larsen, Y. Zhou, N. Kumar, M. Norouzi, S. Bengio, and J. Dean, “Device placement optimization with reinforcement learning,” in International Conference on Machine Learning, pp. 2430–2439, PMLR, 2017.

[60] A. Ferdowsi, M. A. Abd-Elmagid, W. Saad, and H. S. Dhillon, “Neural combinatorial deep reinforcement learning for age-optimal joint trajectory and scheduling design in UAV-assisted networks,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 5, pp. 1250–1265, 2021.

[61] Q. Jiang, Y. Zhang, and J. Yan, “Neural combinatorial optimization for energy-efficient offloading in mobile edge computing,” IEEE Access, vol. 8, pp. 35077–35089, 2020.

[62] R. Solozabal, J. Ceberio, A. Sanchoyerto, L. Zabala, B. Blanco, and F. Liberal, “Virtual network function placement optimization with deep reinforcement learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 2, pp. 292–303, 2019.

[63] T. Tsourdinis, I. Chatzistefanidis, N. Makris, and T. Korakis, “AI-driven service-aware real-time slicing for beyond 5G networks,” in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1–6, IEEE, 2022.

[64] A. Garcia-Saavedra and X. Costa-Perez, “O-RAN: Disrupting the virtualized RAN ecosystem,” IEEE Communications Standards Magazine, vol. 5, no. 4, pp. 96–103, 2021.

[65] T. D. Tran, K.-K. Nguyen, and M. Cheriet, “Joint route selection and content caching in O-RAN architecture,” in 2022 IEEE Wireless Communications and Networking Conference (WCNC), pp. 2250–2255, IEEE, 2022.

[66] C. C. Zhang, K. K. Nguyen, and M. Cheriet, “Joint routing and packet scheduling for URLLC and eMBB traffic in 5G O-RAN,” in ICC 2022 - IEEE International Conference on Communications, pp. 1900–1905, IEEE, 2022.

[67] M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “Understanding O-RAN: Architecture, interfaces, algorithms, security, and research challenges,” IEEE Communications Surveys & Tutorials, 2023.

[68] G. Kougioumtzidis, V. Poulkov, Z. D. Zaharis, and P. I. Lazaridis, “Intelligent and QoE-aware open radio access networks,” in 2022 3rd URSI Atlantic and Asia Pacific Radio Science Meeting (AT-AP-RASC), pp. 1–4, IEEE, 2022.

[69] N. Hammami and K. K. Nguyen, “On-policy vs. off-policy deep reinforcement learning for resource allocation in open radio access network,” in 2022 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1461–1466, 2022. https://ieeexplore.ieee.org/document/9771605.

[70] M. Sharara, T. Pamuklu, S. Hoteit, V. Vèque, and M. Erol-Kantarci, “Policy-gradient-based reinforcement learning for computing resources allocation in O-RAN,” in 2022 IEEE 11th International Conference on Cloud Networking (CloudNet), pp. 229–236, 2022.

[71] F. Mungari, “An RL approach for radio resource management in the O-RAN architecture,” in 2021 18th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pp. 1–2, 2021. https://ieeexplore.ieee.org/document/9491579.

[72] J. A. Ayala-Romero, A. Garcia-Saavedra, X. Costa-Perez, and G. Iosifidis, “Bayesian online learning for energy-aware resource orchestration in virtualized RANs,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, pp. 1–10, 2021. https://ieeexplore.ieee.org/document/9488845.

[73] H. Gupta, M. Sharma, B. R. Tamma, et al., “APT-RAN: A flexible split-based 5G RAN to minimize energy consumption and handovers,” IEEE Transactions on Network and Service Management, vol. 17, no. 1, pp. 473–487, 2019.

[74] F. Z. Morais, G. M. F. De Almeida, L. L. Pinto, K. Cardoso, L. M. Contreras, R. da Rosa Righi, and C. B. Both, “PlaceRAN: Optimal placement of virtualized network functions in beyond 5G radio access networks,” IEEE Transactions on Mobile Computing, 2022.

[75] IBM, “IBM ILOG CPLEX Optimization Studio: CPLEX user’s manual,” Version 12, 1987–2018.

[76] R. Singh, C. Hasan, X. Foukas, M. Fiore, M. K. Marina, and Y. Wang, “Energy-efficient orchestration of metro-scale 5G radio access networks,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, pp. 1–10, IEEE, 2021.

[77] M. K. Motalleb, V. Shah-Mansouri, S. Parsaeefard, and O. L. A. López, “Resource allocation in an Open RAN system using network slicing,” IEEE Transactions on Network and Service Management, vol. 20, no. 1, pp. 471–485, 2022.

[78] E. Sarikaya and E. Onur, “Placement of 5G RAN slices in multi-tier O-RAN 5G networks with flexible functional splits,” in 2021 17th International Conference on Network and Service Management (CNSM), pp. 274–282, IEEE, 2021.

[79] Gurobi Optimization, LLC, “Gurobi Optimizer reference manual,” 2020. https://www.gurobi.com.

[80] M. Kalntis and G. Iosifidis, “Energy-aware scheduling of virtualized base stations in O-RAN with online learning,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, pp. 6048–6054, IEEE, 2022.

[81] B. H. Prananto, A. Kurniawan, et al., “O-RAN intelligent application for cellular mobility management,” in 2022 International Conference on ICT for Smart Society (ICISS), pp. 1–6, IEEE, 2022.

[82] M. Di Mauro, G. Galatro, M. Longo, F. Postiglione, and M. Tambasco, “Availability analysis of IP multimedia subsystem in cloud environments,” in 2019 4th International Conference on System Reliability and Safety (ICSRS), pp. 111–115, IEEE, 2019.

[83] M. Dryjański, L. Kulacz, and A. Kliks, “Toward modular and flexible Open RAN implementations in 6G networks: Traffic steering use case and O-RAN xApps,” Sensors, vol. 21, no. 24, p. 8173, 2021.

[84] H. Erdol, X. Wang, P. Li, J. D. Thomas, R. Piechocki, G. Oikonomou, R. Inacio, A. Ahmad, K. Briggs, and S. Kapoor, “Federated meta-learning for traffic steering in O-RAN,” in 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), pp. 1–7, IEEE, 2022.

[85] M. W. Akhtar, A. Mahmood, S. F. Abedin, S. A. Hassan, and M. Gidlund, “Exploiting NOMA for radio resource efficient traffic steering use-case in O-RAN,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, pp. 5771–5776, IEEE, 2022.

[86] F. Kavehmadavani, V.-D. Nguyen, T. X. Vu, and S. Chatzinotas, “Traffic steering for eMBB and URLLC coexistence in open radio access networks,” in 2022 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 242–247, IEEE, 2022.

[87] N. Nikaein, “Processing radio access network functions in the cloud: Critical issues and modeling,” in Proceedings of the 6th International Workshop on Mobile Cloud Computing and Services, pp. 36–43, 2015.

[88] U. Dötsch, M. Doll, H.-P. Mayer, F. Schaich, J. Segel, and P. Sehier, “Quantitative analysis of split base station processing and determination of advantageous architectures for LTE,” Bell Labs Technical Journal, vol. 18, no. 1, pp. 105–128, 2013.

[89] K. C. Garikipati, K. Fawaz, and K. G. Shin, “RT-OPEX: Flexible scheduling for cloud-RAN processing,” in Proceedings of the 12th International Conference on emerging Networking EXperiments and Technologies, pp. 267–280, 2016.

[90] A. M. Alba, J. H. G. Velásquez, and W. Kellerer, “An adaptive functional split in 5G networks,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 410–416, IEEE, 2019.

[91] A. Garcia-Saavedra, J. X. Salvat, X. Li, and X. Costa-Perez, “WizHaul: On the centralization degree of cloud RAN next generation fronthaul,” IEEE Transactions on Mobile Computing, vol. 17, no. 10, pp. 2452–2466, 2018.

[92] N. Mharsi, M. Hadji, D. Niyato, W. Diego, and R. Krishnaswamy, “Scalable and cost-efficient algorithms for baseband unit (BBU) function split placement,” in 2018 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6, IEEE, 2018.

[93] O. Arouk, T. Turletti, N. Nikaein, and K. Obraczka, “Cost optimization of cloud-RAN planning and provisioning for 5G networks,” in 2018 IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2018.

[94] N. Mharsi and M. Hadji, “Edge computing optimization for efficient RRH-BBU assignment in cloud radio access networks,” Computer Networks, vol. 164, p. 106901, 2019.

[95] L. Wang and S. Zhou, “Flexible functional split and power control for energy harvesting cloud radio access networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 1535–1548, 2019.

[96] H. Gupta and A. Franklin, A flexible split based 5G C-RAN to minimize energy consumption & handovers. PhD thesis, Indian Institute of Technology Hyderabad, 2019.

[97] “Everything you need to know about Open RAN: An e-book,” 2020. Available at: www.parallelwireless.com.

[98] 3GPP TR 38.801, “Study on new radio access technology; radio access architecture and interfaces,” V2.0.0, 2017.

[99] A. Marín, “The discrete facility location problem with balanced allocation of customers,” European Journal of Operational Research, vol. 210, no. 1, pp. 27–38, 2011.

[100] F. W. Murti, S. Ali, and M. Latva-aho, “Deep reinforcement based optimization of function splitting in virtualized radio access networks,” in 2021 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6, IEEE, 2021.

[101] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.

[102] O-RAN Alliance, “O-RAN Working Group 2 (Non-RT RIC and A1 interface WG) Non-RT RIC architecture (O-RAN.WG2.Non-RT-RIC-ARCH-TS-v02.01),” Technical Specification, 2022.

[103] M. M. Akbar, M. S. Rahman, M. Kaykobad, E. G. Manning, and G. C. Shoja, “Solving the multidimensional multiple-choice knapsack problem by constructing convex hulls,” Computers & Operations Research, vol. 33, no. 5, pp. 1259–1273, 2006.

[104] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Advances in Neural Information Processing Systems, vol. 27, 2014.

[105] Y. Hua, Z. Zhao, R. Li, X. Chen, Z. Liu, and H. Zhang, “Deep learning with long short-term memory for time series prediction,” IEEE Communications Magazine, vol. 57, no. 6, pp. 114–119, 2019.

[106] R. Solozabal, J. Ceberio, and M. Takáč, “Constrained combinatorial optimization with reinforcement learning,” arXiv preprint arXiv:2006.11984, 2020.

[107] C. Lemaréchal, “Lagrangian relaxation,” in Computational Combinatorial Optimization, pp. 112–156, Springer, 2001.

[108] C. Tessler, D. J. Mankowitz, and S. Mannor, “Reward constrained policy optimization,” arXiv preprint arXiv:1805.11074, 2018.

[109] A. Garcia-Saavedra, G. Iosifidis, X. Costa-Perez, and D. J. Leith, “Joint optimization of edge computing architectures and radio access networks,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 11, pp. 2433–2443, 2018.

[110] “Energy consumption of ICT,” White Paper, 1 September 2022.

[111] T. Barnett, S. Jain, U. Andra, and T. Khurana, “Cisco Visual Networking Index (VNI) complete forecast update, 2017–2022,” Americas/EMEAR Cisco Knowledge Network (CKN) Presentation, pp. 1–30, 2018.

[112] C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless networking: A survey,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2224–2287, 2019.

[113] G. Auer et al., “How much energy is needed to run a wireless network?,” IEEE Wireless Communications, vol. 18, no. 5, pp. 40–49, 2011.

[114] A. Marotta et al., “On the energy cost of robustness for green virtual network function placement in 5G virtualized infrastructures,” Computer Networks, vol. 125, pp. 64–75, 2017.

[115] K. P. Murphy, Probabilistic Machine Learning: An Introduction. MIT Press, 2022.

[116] A. Graves, “Long short-term memory,” Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45, 2012.

[117] S. Deng et al., “Edge intelligence: The confluence of edge computing and artificial intelligence,” IEEE Internet of Things Journal, vol. 7, no. 8, pp. 7457–7469, 2020.

[118] Z. Zhou et al., “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.

[119] L. Bonati et al., “Intelligence and learning in O-RAN for data-driven NextG cellular networks,” IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, 2021.

[120] G. M. Almeida et al., “Optimal joint functional split and network function placement in virtualized RAN with splittable flows,” IEEE Wireless Communications Letters, vol. 11, no. 8, pp. 1684–1688, 2022.

[121] X. Wang, A. Alabbasi, and C. Cavdar, “Interplay of energy and bandwidth consumption in CRAN with optimal function split,” in 2017 IEEE International Conference on Communications (ICC), (Paris, France), pp. 1–6, May 2017.