Written on 03/06/2022 by IPSIENNE students
Faculty of Sciences of Rabat - Mohammed V
Master IPS
An expedition into the world of 5G
Big Data Analysis for the Next Generation:
A 5G Dataset with Channel and Context Metrics
Abstract :
The fifth generation of mobile networks (5G) was introduced after a significant growth in data traffic exposed many limitations of Long-Term Evolution (LTE), also known as the fourth generation (4G). On the one hand, these limitations concern latency and the bandwidth available for downloading, uploading and streaming video online, among other features. On the other hand, society needs to move toward a better future and a complete transformation of industries and businesses. This study relies on a dataset gathered from an Irish mobile operator, consisting of channel-related, context-related and cell-related metrics as well as throughput information. The measurements were collected with G-NetTrack Pro, and the dataset is complemented by a large-scale multi-cell 5G simulation framework based on ns-3. The results show that 5G can be up to 100 times faster than 4G and that it offers features such as wider frequency bands and ultra-low latency, creating unprecedented opportunities for consumers and businesses. On the whole, this new generation is promising, and the dataset illustrates its performance through cellular key performance indicators (KPIs). 5G is going to improve our connected lives and introduce further innovations, paving the way for the next generation, 6G.
Keywords: 5G, LTE, 6G, Smart City
Introduction :
Since the first generation appeared in the 1970s, mobile technologies have evolved very strongly, allowing more connectivity around the world. This evolution has allowed network users to browse faster than before and has pushed people to develop new technologies and digital activities and to exploit them better. Over time, the Internet revolution has produced a huge mass of complex data that is difficult to load or analyze with weak technology, so it has become essential to develop network technologies. For several years the number of mobile network generations has kept increasing. The main 2G standard is GSM. Unlike 1G, the second-generation standard gives access to various services, such as WAP for accessing the Internet. The third generation, 3G, then brought high speeds for Internet access and data transfer. The fourth generation, 4G (LTE), allows very high speeds, and the new generation, 5G, increases throughput and the number of connected devices while reducing latency, which will give rise to new use cases and scenarios.
Despite the development of technology and the arrival of 4G, the world still needs high download quality and transfer capacity, and this is the role of 5G, whose download capacity is expected to be 10 to 100 times greater than before. For example, a two-hour movie that takes about 1 hour to download on a 4G network takes only about 6 minutes on 5G, which is why 4G is no longer good enough for big data. In 2020, 5G had been activated in some countries, such as China, South Korea and the USA, and in some European cities, and it is expected to spread massively soon. Despite this still limited spread, it has already driven a massive development of big data, especially in video streaming services such as Netflix and Amazon Prime.
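The movie comparison above is plain arithmetic: download time equals file size divided by throughput. The minimal sketch below assumes a hypothetical 9 GB movie and steady effective throughputs of 20 Mbit/s for 4G and 200 Mbit/s for 5G; these figures are illustrative assumptions, not measurements, chosen only so that the result matches the one-hour versus six-minute example.

```python
def download_time_minutes(size_gb: float, throughput_mbps: float) -> float:
    """Minutes needed to download `size_gb` gigabytes at a steady `throughput_mbps` (Mbit/s)."""
    size_megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB, 1 byte = 8 bits
    seconds = size_megabits / throughput_mbps
    return seconds / 60


MOVIE_SIZE_GB = 9.0        # hypothetical size of a two-hour movie
THROUGHPUT_4G_MBPS = 20    # assumed effective 4G throughput
THROUGHPUT_5G_MBPS = 200   # assumed effective 5G throughput

print(download_time_minutes(MOVIE_SIZE_GB, THROUGHPUT_4G_MBPS))  # 60.0
print(download_time_minutes(MOVIE_SIZE_GB, THROUGHPUT_5G_MBPS))  # 6.0
```

Because download time is inversely proportional to throughput, a tenfold throughput increase divides the download time by ten, whatever the file size.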
Problem Statement:
Dependence on Internet connectivity was the main reason for the development of 4G after 1G, 2G and 3G. Now, the growing demand for high-speed data transfer is driving the need for 5G technology. 5G technology has undoubtedly matured over the past two years: it offers lower latency and faster download speeds and can connect the Internet of Things globally. The possibilities offered by 5G will revolutionize the way the world works. This is the general idea; through our analysis we will try to verify it and discover more details by simulating these effects, answering questions such as: How will 5G differ from 4G? What created the need for this evolution? What evolutions will it bring? And what challenges will it pose?
Our case study is based on 5G, which requires a better understanding of the importance of 5G technology. We must understand what distinguishes the transition to 5G from previous generational leaps, especially 4G (LTE), and we also need to know what comes next (6G). This follows from the idea that to analyze a phase (n), we should analyze the phase that precedes it (n-1) to understand the reasons for the change, and the phase that follows it (n+1) to see the limits of the phase under study and the next requirements.
Firstly, existing 4G networks cannot meet consumers' demand to transfer large data at fast speeds. Moreover, the need for more data will continue to grow as the number of IoT devices increases rapidly. To minimize congestion, network speed is very important, hence the emergence of the next-generation network. And there will always be a need for 6G … and more.
Finally, although 6G networks have not yet been introduced, many studies are already looking at how they will be implemented, and reports say that 6G will speed up many development processes and save considerable time compared to previous networks.
Keywords: lower latency, faster download speeds, fast transfer of large data
The theoretical phase :
Technologies from 4G to 6G:
4G or LTE
Given the daily growth of Internet traffic on our phones, the quality of service provided by the 3G infrastructure was no longer sufficient to meet customers' needs, so in late 2010 an alternative called the Fourth Generation, or LTE (Long Term Evolution), was born. The network was to offer its users browsing 10 times faster than 3G+ networks and higher speeds than ADSL. 4G LTE stands out for its download speed (for example, when downloading data or watching a movie), and it is also excellent for sending data (upload). The technology is based on multiplexing, where the same channel carries different types of information. Technically, LTE operates mainly in two frequency bands, around 800 MHz and around 2600 MHz.
5G
5G is the fifth generation of cellular networks. It can be up to 100 times faster than 4G, creating unprecedented opportunities for consumers and businesses. Faster connections, ultra-low latency and higher bandwidth are helping to drive society forward, transform industries and dramatically improve everyday experiences. Services we once thought belonged to the future, such as e-health, connected vehicles, connected traffic management systems and advanced mobile cloud gaming, have arrived. 5G technology can help create a smarter, safer and more sustainable future. Today, 5G is revolutionizing our business and personal lives by enabling new use cases such as connected cars, augmented reality, video and augmented gaming.
The limits of 5G:
Building 5G-enabled mobile devices has a real impact on our planet, since most of the materials they contain are mined. Another concern is that the technology could have health effects: prolonged exposure to electromagnetic waves is suspected of being harmful, and the International Agency for Research on Cancer (IARC), the World Health Organization's cancer agency, classifies radiofrequency electromagnetic fields in general, not 5G specifically, as possibly carcinogenic to humans (Group 2B).
6th Generation :
After 5G, the world of telecommunications did not stop: researchers continued their work toward 6G, expected around 2030. This new generation promises more than 5G. For example, speeds should reach 10 to 11 Gbps, response time should be divided by 10, and it will combine 5G with satellite technology. Some projections even claim that it would allow downloading 100 hours of Netflix content in a single second.
In the future, 6G applications will need more network capacity than 5G networks provide, and wireless networks will be the link between humans and machines.
This new generation will improve communication, allowing everyone to communicate anytime, anywhere.
The uses of 6G include sustainable development, local trust zones, the move from robots to cobots, telepresence, massive twinning …
Generation/Features | 4G | 5G | 6G
Year | 2000-2010s | 2015 onwards | After 5G
Speed | 200 Mbps to 1 Gbps | 1 Gbps and higher | 10 to 11 Gbps
Technology | Unified IP and seamless combination of broadband LAN, WAN, WLAN, PAN | 4G + WWWW | 5G + satellite
Standard | LTE, WiMAX | LAS-CDMA, OFDM, MC-CDMA, UWB, Network-LMDS, IPv6 | GPS, COMPASS, GLONASS, Galileo systems
Multiplexing | CDMA | CDMA | CDMA
Switching | Packet | Packet | Packet
Core Network | Internet | Internet | Internet
Handoff | Horizontal & Vertical | Horizontal & Vertical | Horizontal & Vertical
Services | Dynamic information access, wearable devices | Dynamic information access, wearable devices | Ultra-fast Internet access
5G to Strengthen Big Data's Role
➢ 5G will make connections faster, which will greatly increase network capacity, an important gain since big data deals with very large data sets.
➢ As we use the Internet, data grows every second at a very high rate; the value of the Internet market has risen to $690 billion and is expected to reach $1.3 trillion by 2025.
➢ Currently, speed and latency constraints mean that IoT devices must rely on their own internal processors and memory. With 5G, it will be possible to do most of the computing in the cloud, make IoT devices cheaper and enable big data to an unprecedented degree.
➢ The ability to collect data will not grow with the number of smart devices alone, but rather with the increasing speeds that enable rapid data collection.
➢ Currently, large files already take only a few seconds to download. Imagine the magnitude of change you will see in the data world when 5G arrives: download capacity will be 10 to 100 times greater than before.
How 5G and Big Data work for IoT :
5G technology and big data are two key elements of the Internet of Things revolution, because IoT devices exist to produce actionable data. Collecting a large amount of data from different sources requires high speed, so efficient means of collecting, storing and analyzing data in real time are needed. 5G supports many devices and sensors simultaneously and collects their data. Previous technologies focused on centralization in the cloud, while 5G focuses on data processing at the edge of the network. 5G will change how data is handled, which will develop the IoT: faster and more immediate connections will allow large volumes of data to be loaded and collected. In short, 5G enables very fast, immediate connectivity with IoT devices, and combining big data with 5G drives the development of the IoT field.
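The contrast between central cloud processing and processing at the edge can be made concrete with a small sketch. Instead of shipping every raw reading to a central cloud, each edge node reduces its local readings to a compact summary (count, sum, minimum, maximum), and the central side only merges those summaries. The class and function names and the sensor values are hypothetical, and this shows one common edge-aggregation pattern, not how any particular 5G platform works.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact summary an edge node forwards instead of every raw reading."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Summary") -> "Summary":
        # Summaries combine associatively, so the cloud can merge them in any order.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def summarize_at_edge(readings):
    """Runs on the edge node: reduce raw sensor readings to one Summary."""
    summary = Summary()
    for value in readings:
        summary.add(value)
    return summary


# Two hypothetical edge nodes, each with its own local sensor readings.
edge_a = summarize_at_edge([21.0, 22.5, 23.0])
edge_b = summarize_at_edge([19.5, 20.0])

# The central (cloud) side only receives and merges the two summaries.
combined = edge_a.merge(edge_b)
print(combined.count, combined.minimum, combined.maximum, combined.mean)  # 5 19.5 23.0 21.2
```

Only four numbers per node travel over the network, however many readings each node collected, which is the bandwidth saving that motivates edge processing.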
Big Data Processing
What is Big Data?
Big data is a widely used term for voluminous, massive, varied and complex data structures that must be analyzed and visualized to support future goals and processes.
What are the 5 V's?
Volume: how much data is gathered, generated and stored from different sources.
Variety: the diversity of data types and formats, including a continuous flow of new unstructured data.
Velocity: how fast data is gathered and acted upon.
Value: the usefulness of the data; extracting it relies on multiple quantitative techniques.
Veracity: the trustworthiness and accuracy of the data, which determines how far the conclusions of a study can be trusted.
Life Cycle of Big Data Analytics:
The big data analytics life cycle follows these 9 steps:
Definition of the business problem: first, the team learns about the field of activity and outlines the key goal of the analysis. The issue is then framed as a challenge that can be treated in the following stages. In addition, depending on the business requirements, it may or may not be considered a big data problem (5V characteristics).
Data identification/definition: once the problem is identified, it is time to find the right datasets to use. These may come from external sources (third-party providers) or internal ones (the company itself, feedback forms), depending on the scope of the analysis project.
Data acquisition and filtration: once the data has been identified and collected from the different sources, treatment starts at this stage. Filtration means clearing corrupted and irrelevant data (missing records, incompatible data, data with no impact on the focus of the study). The filtered dataset is then compressed for later use.
Data extraction: after filtration, the data compatible with the scope and objectives is extracted and transformed.
Data munging: because the data is gathered from different sources, it is heterogeneous and may lead to unwanted or unsuitable results, so it must be validated by establishing complex validation rules.
Data aggregation and representation: the data is cleaned and validated according to rules defined in the project. Data may be spread across several datasets, but working with multiple datasets is not recommended, so they are brought together.
Exploratory data analysis: data analysis can be confirmatory or exploratory. In confirmatory analysis, the suspected cause of a phenomenon is stated as a hypothesis, which is then tested to accept or reject it, giving definitive answers to specific questions. In exploratory analysis, the goal is to understand why something occurred, which leads to discovering patterns.
Data visualization: the analysis produces answers in a format that may not suit business users, so some kind of representation is needed to convey the value or conclusion of the analysis. A variety of tools are used to represent data in a way that non-technical experts can easily interpret.
Utilization of analysis results: once the analysis is finished and the results are presented, users can decide how to deploy them to better optimize the business process.
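Steps 3 to 6 of this life cycle (acquisition and filtration, extraction, munging/validation, and aggregation) can be sketched in plain Python. The two traces below are made up; they borrow a few column names from the 5G dataset described later (NetworkMode, RSRP, DL_bitrate, State), while the timestamps, the missing-value marker and the validation rules are assumptions for illustration, not the rules used in this study.

```python
import csv
import io

# Two hypothetical trace files using a few column names from the 5G dataset.
TRACE_1 = """Timestamp,NetworkMode,RSRP,DL_bitrate,State
t1,5G,-95,4200,D
t2,5G,-,3900,D
t3,LTE,-88,0,I
"""
TRACE_2 = """Timestamp,NetworkMode,RSRP,DL_bitrate,State
t4,LTE,-250,1500,D
t5,5G,-101,5100,X
t6,LTE,-79,2600,D
"""


def acquire_and_filter(text):
    """Acquisition and filtration: parse rows and drop records with missing fields."""
    rows = csv.DictReader(io.StringIO(text))
    # Assumption: a missing value is written as "-" or left empty.
    return [r for r in rows if all(v not in ("", "-") for v in r.values())]


def extract(row):
    """Extraction: keep the columns in scope and convert them to proper types."""
    return {
        "mode": row["NetworkMode"],
        "rsrp_dbm": float(row["RSRP"]),
        "dl_bitrate": float(row["DL_bitrate"]),
        "state": row["State"],
    }


def is_valid(rec):
    """Munging: hypothetical validation rules applied to each record."""
    return (
        -140 <= rec["rsrp_dbm"] <= -44       # 3GPP RSRP reporting range
        and rec["dl_bitrate"] >= 0           # bitrates cannot be negative
        and rec["state"] in {"I", "D"}       # idle or downloading
    )


def pipeline(*traces):
    """Aggregation: bring the cleaned traces together into one dataset."""
    merged = []
    for text in traces:
        records = (extract(r) for r in acquire_and_filter(text))
        merged.extend(rec for rec in records if is_valid(rec))
    return merged


dataset = pipeline(TRACE_1, TRACE_2)
for rec in dataset:
    print(rec)
```

Of the six input rows, one is dropped by filtration (missing RSRP) and two by validation (an out-of-range RSRP and an unknown state), leaving three records in the merged dataset.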
What is Apache Spark?
Apache Spark is a data processing framework that can quickly process very large data sets and distribute processing tasks across multiple computers, either on its own or together with other distributed computing tools. These two characteristics are essential in big data and machine learning, which require massive computing power to process large data sets. Spark also relieves developers of some of the programming burden of these tasks with an easy-to-use API that abstracts away much of the heavy lifting of distributed computing and big data processing.
The Use of Spark in Big Data
Before looking at how Apache Spark works, it is worth asking why we turned to it.
This open-source framework appeared to solve a basic problem with MapReduce on Hadoop, which wastes more than half of its processing time on read and write operations in HDFS: every time data is needed, it has to be read again from HDFS, which increases processing time.
Spark was introduced to fix this problem. It is an open-source distributed computing framework, not a modified version of Hadoop. It can use Hadoop for storage only, while managing clusters with its own cluster manager.
Spark stores intermediate processing data in memory, where it remains shared between jobs, which makes execution fast.
Spark was therefore created to overcome the limitations of MapReduce by processing data in memory, minimizing the number of steps in a task, and reusing in-memory data across multiple parallel operations. It does so through RDDs, which are also kept in memory and allow developers to run parallel computations on a cluster; the computation runs as jobs that are created when an action is called on an RDD.
With these improvements, MapReduce-style computations run quickly, and Spark can still access big data stored in the Hadoop Distributed File System (HDFS), which is used to scale a single Hadoop cluster to thousands of nodes.
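Two ideas from this section, transformations that only run when an action is called and intermediate results kept in memory for reuse, can be illustrated with a toy class. `MiniRDD` and its methods are hypothetical; they only loosely mirror the names of Spark's RDD operations (`map`, `filter`, `cache`, `collect`, `count`), and real RDDs are also partitioned across machines and fault-tolerant, which this sketch ignores.

```python
class MiniRDD:
    """Toy stand-in for an RDD (NOT Spark's API): lazy transformations, eager actions."""

    def __init__(self, compute, log):
        self._compute = compute   # function that produces the data when called
        self._cached = False      # set by cache()
        self._cache = None        # filled on first evaluation when cached
        self.log = log            # records every time the source data is read

    @classmethod
    def from_source(cls, data, log):
        def read():
            log.append("read source")  # stands in for a disk/HDFS read
            return list(data)
        return cls(read, log)

    # Transformations: lazy, they only record the step.
    def map(self, fn):
        return MiniRDD(lambda: [fn(x) for x in self._evaluate()], self.log)

    def filter(self, pred):
        return MiniRDD(lambda: [x for x in self._evaluate() if pred(x)], self.log)

    def cache(self):
        self._cached = True
        return self

    def _evaluate(self):
        if self._cached and self._cache is not None:
            return self._cache        # reuse the in-memory result
        result = self._compute()
        if self._cached:
            self._cache = result
        return result

    # Actions: these trigger evaluation of the whole chain.
    def collect(self):
        return list(self._evaluate())

    def count(self):
        return len(self._evaluate())


log = []
rsrp = MiniRDD.from_source([-95, -88, -79, -110], log)
strong = rsrp.filter(lambda v: v > -90).cache()  # nothing is read yet
print(log)               # []
print(strong.count())    # 2, the source is read once here
print(strong.collect())  # [-88, -79], served from memory
print(log)               # ['read source']
```

Without the `cache()` call, the second action would trigger another read of the source, which is the repeated-read cost the section attributes to MapReduce on HDFS.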
Pros and Cons of Apache Spark
Among Spark's essential strengths is speed, which always matters in big data processing. Spark is popular because of its speed: it can be up to 100 times faster than Hadoop MapReduce for data processing. Apache Spark has easy-to-use APIs for working on large data sets, and it is also powerful because of its well-built libraries for graph analysis algorithms and machine learning. Spark supports many languages for writing code, such as Python, Java and Scala.
Spark also has drawbacks. It does not automatically optimize low-level RDD code, so such code must be optimized manually. It also has no file management system of its own and depends on other platforms, such as Hadoop, for file management. Finally, Spark offers fewer machine learning algorithms.
Spark components: Spark SQL, Spark Streaming, MLlib, GraphX
Apache Spark Core: Spark Core is the general execution engine underlying the Spark platform, on which all other features are built. It provides in-memory computing for speed, memory management, task scheduling, distribution and monitoring, and interaction with storage systems, through a common execution model that supports a variety of applications and Java, Scala and Python APIs that ease development.
Spark SQL is Apache Spark's engine for processing structured data. It includes a cost-based optimizer that turns logical query plans into optimized physical plans, and a code generator that makes queries fast.
Spark Streaming
This component allows Spark to process data coming from multiple sources (Kafka, HDFS, Flume) in real time and to update results over time, instead of processing the data only periodically. The data is processed using complex algorithms, and the results are pushed to file systems, databases and dashboards.
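Spark Streaming divides the incoming stream into small micro-batches and updates results as each one arrives. The idea of keeping a running state that each new batch updates, rather than recomputing everything from scratch, can be sketched in plain Python. The event format (network mode, downlink bitrate) and the batch contents are hypothetical, and this is not Spark's streaming API.

```python
from collections import defaultdict


def update_state(state, batch):
    """Fold one micro-batch of (network_mode, dl_bitrate) events into running totals."""
    for mode, bitrate in batch:
        total, count = state[mode]
        state[mode] = (total + bitrate, count + 1)
    return state


# Hypothetical micro-batches arriving one after another (e.g. every second).
batches = [
    [("5G", 4000.0), ("LTE", 1500.0)],
    [("5G", 6000.0)],
    [("LTE", 2500.0), ("5G", 5000.0)],
]

state = defaultdict(lambda: (0.0, 0))  # mode -> (sum of bitrates, number of events)
for batch in batches:
    update_state(state, batch)
    # After each batch, a dashboard would see the up-to-date average per mode.
    averages = {mode: total / count for mode, (total, count) in state.items()}
    print(averages)
```

Only the per-mode sums and counts are kept between batches, so the cost of each update depends on the batch size, not on the amount of data seen so far.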
Spark MLlib
MLlib (Machine Learning Library): this library makes practical machine learning scalable and easy by providing a wide range of machine learning algorithms (classification, regression, clustering, collaborative filtering) and utilities such as linear algebra and statistics.
Spark GraphX
Spark GraphX: a distributed graph processing framework built on Spark that allows users to interactively build and transform graph-structured data at scale. A graph models a network of meaningfully connected objects: it consists of vertices, which carry the information, and the edges between them.
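The vertex-and-edge model described above is a property graph. A plain-Python sketch of that data structure, with hypothetical cells as vertices and observed handovers as edges, shows the kind of simple graph computation (in-degree, out-degree, one-hop neighbours) that GraphX distributes over a cluster. It does not use the GraphX API, and the cells and handovers are made up.

```python
from collections import Counter

# Hypothetical property graph: vertices carry information, edges connect them.
vertices = {
    1: {"cell": "A", "mode": "5G"},
    2: {"cell": "B", "mode": "LTE"},
    3: {"cell": "C", "mode": "5G"},
}
# Directed edges (src, dst, attribute), e.g. handovers observed between cells.
edges = [
    (1, 2, "handover"),
    (2, 3, "handover"),
    (1, 3, "handover"),
    (3, 1, "handover"),
]

out_degree = Counter(src for src, _, _ in edges)
in_degree = Counter(dst for _, dst, _ in edges)


def neighbours(vid):
    """Cells reachable in one hop from vertex `vid`."""
    return sorted(vertices[dst]["cell"] for src, dst, _ in edges if src == vid)


print(dict(out_degree))  # {1: 2, 2: 1, 3: 1}
print(dict(in_degree))   # {2: 1, 3: 2, 1: 1}
print(neighbours(1))     # ['B', 'C']
```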
Spark Core and the Resilient Distributed Dataset concept
Spark Core is the foundation of the entire project. It provides dispatching, scheduling and basic I/O functionality for distributed tasks. Spark uses a special underlying data structure called the RDD (Resilient Distributed Dataset), which represents a logical collection of data partitioned across machines.
Resilient Distributed Datasets (RDD)
Resilient Distributed Datasets (RDDs) are the basic data structure of Spark. An RDD is an immutable distributed collection of objects. Each RDD is divided into logical partitions that can be computed on different nodes of the cluster. An RDD can contain any type of Python, Java or Scala object, including user-defined classes.
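The partitioning idea can be sketched in plain Python: a collection is split into logical partitions, each partition is processed independently (threads stand in for cluster nodes here), and the driver combines the partial results. The function names and bitrate values are hypothetical; in Spark this splitting is done for you, for example when an RDD is created with `SparkContext.parallelize`.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a list into contiguous, roughly equal logical partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_and_count(part):
    """Work done independently on each partition (one task per partition)."""
    return sum(part), len(part)


dl_bitrates = [4200.0, 3800.0, 1500.0, 5100.0, 2600.0, 800.0]  # hypothetical values
parts = partition(dl_bitrates, 3)

# Each partition could live on a different node; threads stand in for nodes here.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_sum_and_count, parts))

# The driver combines the per-partition results into the final answer.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print(parts)          # [[4200.0, 3800.0], [1500.0, 5100.0], [2600.0, 800.0]]
print(total / count)  # 3000.0
```

Returning (sum, count) pairs rather than per-partition averages keeps the combination step correct even when partitions have different sizes.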
Spark can be used for a variety of purposes, e.g., real-time processing, machine learning and graph processing. Spark consists of various independent components that can be used depending on the use case. The following figure gives an overview of the Spark ecosystem.
Dataset Description: 5G Dataset with Channel and Context Metrics
DATASET OVERVIEW
For this study we used a 5G trace dataset obtained from a major Irish mobile operator. The dataset combines two mobility patterns (static and car) and two application patterns (video streaming and file download). It consists of client-side cellular key performance indicators (KPIs), including channel-related metrics, context-related metrics, cell-related metrics and throughput data. These metrics were collected with G-NetTrack Pro, a well-known Android network monitoring tool that does not require a rooted device.
ATTRIBUTES DESCRIPTION
Timestamp: time of data capture.
Longitude & Latitude: GPS coordinates of the mobile device.
Speed: the speed at which the phone is moving; it depends on the mobility pattern (static or driving).
Operator name: the mobile network operator (also called wireless provider or carrier), a telecommunications company that provides wireless Internet and GSM services to mobile device users. The customer is given a SIM card, which must be inserted into the mobile device to use the service.
CellID: the cell identifier (CID), a number used to identify each BTS (base transceiver station).
NetworkMode: the current network mode, e.g. LTE or 5G.
SNR: the signal-to-interference-plus-noise ratio (SINR), used for link adaptation together with packet scheduling. At SINR values below 0 dB, the received signal contains more noise and interference than useful information, so the connection becomes very slow and may even be lost.
CQI (Channel Quality Indicator): the CQI report is a key factor in determining the maximum data rate for multimedia transmission.
RSRP (Reference Signal Received Power): a signal-strength indicator used as a factor in cell reselection and handover decisions. It is the measured power of a single reference-signal resource element within the considered bandwidth, and it is one of the factors used to determine which candidate cell is best to connect to based on signal strength. It is reported in dBm (the 3GPP reporting range is -140 dBm to -44 dBm); the closer the value is to zero, the stronger the signal.
RSRQ (Reference Signal Received Quality): the second reliable indicator for handover decisions. This metric is primarily used to rank candidate cells based on signal quality [1]. Even if the RSRP is low, a connection with a high RSRQ should be good, because the modem can extract information from the weak signal thanks to minimal noise [2].
RSSI (Received Signal Strength Indicator): the linear average of the total received power observed by the UE, only in OFDM symbols carrying reference symbols, from all sources within the measurement bandwidth.
DL_bitrate (downlink bit rate): the number of bits per second coming from the cell tower to the cellular device.
UL_bitrate (uplink bit rate): the number of bits per second leaving the cellular device toward the cell tower.
State: the current state of the download; it takes one of two values: I (idle) or D (downloading).
PINGAVG, PINGMIN, PINGMAX, PINGSTDEV, PINGLOSS: ping statistics (average, minimum, maximum, standard deviation and loss).
CELLHEX: cell ID in hexadecimal format.
NODEHEX: node ID in hexadecimal format.
LACHEX: Location Area Code in hexadecimal format. A location area is a set of base transceiver stations grouped together to optimize signaling.
RAWCELLID: cell ID in raw form.
NRxRSRP: RSRP values for the neighbouring cells.
NRxRSRQ: RSRQ values for the neighbouring cells.
Mob_Pattern: the mobility pattern, indicating whether the mobile is moving (driving) or not (static).
Type: the application pattern used, either file download or video streaming (Netflix, Amazon Prime).
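Several of the attributes above are related by simple formulas. The helpers below are a sketch under stated assumptions: the SINR expression is the standard signal-to-(interference + noise) ratio in decibels; RSRQ uses the 3GPP definition RSRQ = N × RSRP / RSSI, with N the number of resource blocks in the measurement bandwidth; the RSRP bands other than the -80 dBm "excellent" threshold used later in this report are common rules of thumb, not standardized values; and using the population standard deviation for the ping summary is an assumption about how PINGSTDEV is computed. All function names are hypothetical.

```python
import math
import statistics


def sinr_db(signal_mw, interference_mw, noise_mw):
    """SINR in dB = 10*log10(S / (I + N)); negative when I + N exceeds S."""
    return 10 * math.log10(signal_mw / (interference_mw + noise_mw))


def rsrq_db(n_resource_blocks, rsrp_dbm, rssi_dbm):
    """3GPP RSRQ = N * RSRP / RSSI, written in the dB domain."""
    return 10 * math.log10(n_resource_blocks) + rsrp_dbm - rssi_dbm


def rsrp_quality(rsrp_dbm):
    """Rule-of-thumb bands; only the -80 dBm 'excellent' threshold comes from this report."""
    if rsrp_dbm > -80:
        return "excellent"
    if rsrp_dbm > -90:
        return "good"
    if rsrp_dbm > -100:
        return "fair"
    return "poor"


def ping_stats(rtts_ms, sent):
    """Summaries in the spirit of PINGAVG/PINGMIN/PINGMAX/PINGSTDEV/PINGLOSS."""
    return {
        "avg": statistics.mean(rtts_ms),
        "min": min(rtts_ms),
        "max": max(rtts_ms),
        "stdev": statistics.pstdev(rtts_ms),  # assumption: population std dev
        "loss_pct": 100 * (sent - len(rtts_ms)) / sent,
    }


def hex_to_int(value):
    """Convert a CELLHEX/NODEHEX/LACHEX-style hexadecimal string to an integer."""
    return int(value, 16)


print(round(sinr_db(2.0, 0.5, 0.5), 2))          # 3.01
print(round(sinr_db(0.5, 0.5, 0.5), 2))          # -3.01, i.e. below 0 dB
print(round(rsrq_db(50, -95.0, -65.0), 2))       # -13.01 (50 RBs = 10 MHz LTE carrier)
print(rsrp_quality(-75.0), rsrp_quality(-95.0))  # excellent fair
print(ping_stats([20.0, 22.0, 24.0], sent=4))
print(hex_to_int("1A2B"))                        # 6699
```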
The practical phase
CQI (Channel Quality Indicator) reflects the data rate the channel can support, and it differs between the two network modes. The plot shows that CQI is better on 5G than on LTE.
RSRP (Reference Signal Received Power) is an important metric for distinguishing one network from another. Here, the boxplot shows that LTE has excellent RSRP despite the new features of 5G. An RSRP above -80 dBm is considered excellent.
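The boxplot comparison can be reproduced numerically by grouping RSRP by network mode and computing the five-number summary that each box represents, together with the share of samples above the -80 dBm threshold. The samples below are made up for illustration and are not taken from the dataset. `statistics.quantiles` uses its default "exclusive" method here, and plotting libraries may compute quartiles slightly differently.

```python
import statistics
from collections import defaultdict

# Made-up (NetworkMode, RSRP in dBm) samples for illustration only.
samples = [
    ("LTE", -78), ("LTE", -82), ("LTE", -75), ("LTE", -80), ("LTE", -85),
    ("5G", -95), ("5G", -88), ("5G", -101), ("5G", -92), ("5G", -90),
]

EXCELLENT_RSRP_DBM = -80  # threshold used in this report


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum: what a boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, median, q3, max(values)


by_mode = defaultdict(list)
for mode, rsrp in samples:
    by_mode[mode].append(rsrp)

for mode, values in by_mode.items():
    share = sum(v > EXCELLENT_RSRP_DBM for v in values) / len(values)
    print(mode, five_number_summary(values), f"share above -80 dBm: {share:.0%}")
```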
As mentioned
before, RSRQ