05 Oalibj - 2022021516353500
Haya Smaya
Mechanical Engineering Faculty, Institute of Technology, MATE Hungarian University of Agriculture and Life Science, Gödöllő,
Hungary
Subject Areas
Information Management
Keywords
Big Data, Information Tools
1. Introduction
The research background is dedicated to defining big data, explaining how to analyze
it, outlining the challenges, and distinguishing between data analysis and big data
analysis. Therefore, a comprehensive literature review has been carried out to define
and characterize big data and its analysis processes. Several keywords, which are (big-data),
history, and every click on the application. Some researchers estimate that Facebook
generates more than 500 terabytes of data every day, including photos, videos, and
messages. Online activity in every industry produces data in much the same way, which
is why big data attracts so much attention [7].
Generally, big data is a data set so massive that it cannot be stored, processed, or
analyzed using traditional tools [8]. This data can exist in several forms: structured,
semi-structured, and unstructured. Structured data has a definite format, such as an
Excel sheet. Semi-structured data can be represented by an email, for example.
Unstructured data has no predefined format, such as pictures and videos. Combining all
these types of data creates so-called big data (Figure 1) [9] [10].
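The difference between the three forms can be illustrated with a minimal Python sketch. The sample CSV text, the email, and the byte string are made-up illustrations, and `field_names` is a hypothetical helper introduced here, not part of any cited tool.

```python
import csv
import io
from email import message_from_string

# Structured: tabular data with a fixed set of named columns (like a spreadsheet).
structured_text = "id,name,amount\n1,Alice,20.5\n2,Bob,13.0\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: an email has labelled header fields but a free-text body.
raw_email = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: Weekly report\n"
    "\n"
    "See the attached figures.\n"
)
message = message_from_string(raw_email)

# Unstructured: raw bytes (for example image or video content) with no fields.
blob = bytes([0x89, 0x50, 0x4E, 0x47])


def field_names(item):
    """Return the named fields an item exposes, or an empty list if it has none."""
    keys = getattr(item, "keys", None)
    return list(keys()) if callable(keys) else []


print(field_names(rows[0]))   # column names of the structured row
print(field_names(message))   # header names of the semi-structured email
print(field_names(blob))      # no named fields in the unstructured bytes
```

Only the structured row and the email headers expose named fields; the free-text body and the raw bytes would need further processing before they could be queried.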
(1 tera = 1000 giga) and petabytes (1 peta = 1000 tera). The cost of storing this
tremendous amount of data is a hurdle for data scientists to overcome.
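As a small worked example of these units, the sketch below converts the estimate of 500 terabytes per day quoted in the introduction, using the decimal prefixes given above. The 365-day year and the function names are assumptions made for illustration.

```python
GIGA_PER_TERA = 1000   # 1 tera = 1000 giga
TERA_PER_PETA = 1000   # 1 peta = 1000 tera


def terabytes_to_gigabytes(terabytes):
    """Convert terabytes to gigabytes using decimal prefixes."""
    return terabytes * GIGA_PER_TERA


def terabytes_to_petabytes(terabytes):
    """Convert terabytes to petabytes using decimal prefixes."""
    return terabytes / TERA_PER_PETA


daily_tb = 500                                  # estimate quoted in the introduction
daily_gb = terabytes_to_gigabytes(daily_tb)     # 500000 gigabytes per day
daily_pb = terabytes_to_petabytes(daily_tb)     # 0.5 petabytes per day
yearly_pb = daily_pb * 365                      # 182.5 petabytes per year (365 days assumed)

print(daily_gb, daily_pb, yearly_pb)
```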
2-Variety
Variety refers to the different data types, such as structured, unstructured, and
semi-structured data, in relational database storage systems. The data can take the
form of documents, emails, social media text messages, audio, video, graphics, images,
graphs, and the output of all types of machine-generated data from various sensors,
devices, machine logs, cell phone GPS signals, and more [11].
3-Velocity
The motion of data sets is a significant aspect by which data types can be
categorized. Data-at-rest and data-in-motion are the terms that deal with velocity.
The major concern is the consistency and completeness of fast-paced data streams
and obtaining results that match the desired outcome. Velocity also includes time and
latency characteristics: whether the data is analyzed, processed, stored, managed, and
updated at a fast rate or with a lag time between events.
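The lag-time aspect of velocity can be sketched in a few lines of Python. This is only an illustration: the `Event` record, its field names, and the one-second threshold are assumptions, not definitions taken from the literature cited here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    event_time: float       # seconds: when the event occurred
    processed_time: float   # seconds: when the system handled it


def lag_seconds(event):
    """Lag between the moment an event occurred and the moment it was processed."""
    return event.processed_time - event.event_time


def late_events(events, max_lag):
    """Return the events whose processing lag exceeds the allowed maximum."""
    return [e for e in events if lag_seconds(e) > max_lag]


events = [
    Event("click", event_time=0.0, processed_time=0.5),
    Event("purchase", event_time=10.0, processed_time=14.0),
]

# With a hypothetical one-second budget, only the purchase event counts as late.
late = late_events(events, max_lag=1.0)
print([e.name for e in late])
```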
4-Value
Value deals with what value should result from a set of data.
5-Veracity
Veracity describes the quality of data: is the data free of noise and conflicts?
Accuracy and completeness are the main concerns.
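Two of these quality concerns, completeness and conflicts, can be checked with a minimal Python sketch. The record layout and both helper functions are hypothetical and serve only to make the idea concrete.

```python
def completeness(records, required_fields):
    """Fraction of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) not in (None, "") for field in required_fields)
        for record in records
    )
    return complete / len(records)


def conflicting_keys(records, key_field, value_field):
    """Keys that appear with more than one distinct value (conflicting records)."""
    values_by_key = {}
    for record in records:
        values_by_key.setdefault(record[key_field], set()).add(record.get(value_field))
    return sorted(key for key, values in values_by_key.items() if len(values) > 1)


records = [
    {"id": 1, "city": "Budapest"},
    {"id": 1, "city": "Gödöllő"},   # conflicts with the previous record for id 1
    {"id": 2, "city": ""},          # incomplete: the city is missing
    {"id": 3, "city": "Pécs"},
]

print(completeness(records, ["id", "city"]))    # 0.75
print(conflicting_keys(records, "id", "city"))  # [1]
```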
3. Big-Data Analysis
3.1. Viability of Big-Data Analysis
Big data analytics helps businesses harness their data and identify new opportunities.
The result is smarter business decisions, more effective operations, higher profits,
and happier customers.
Figure 3. Frequency distribution of documents containing the term “big data” in ProQuest Research
Library. Source: [6].
More than 50 firms were interviewed for the publication Big Data in Big Companies
(Figure 3) [13] to learn how they used big data. According to the report, they gained
value in the following ways:
Cost reduction. When it comes to storing massive volumes of data, big data
technologies like Hadoop and cloud-based analytics provide significant cost
savings, and they can also help identify more efficient ways of doing business.
Faster, better decision-making. Thanks to Hadoop’s speed and in-memory analytics,
as well as the ability to analyze new sources of data, businesses can evaluate
information immediately and make decisions based on what they have learned.
New products and services. The capacity to use analytics to gauge customer needs
and satisfaction brings the potential to give customers exactly what they want.
According to Davenport, more organizations are using big data analytics to create
new products that meet their customers’ needs.
when dealing with vast amounts of scattered data. They are appropriate for
raw and unstructured data because they do not require a fixed schema.
• A data lake is a big storage repository that stores raw data in native format
until it’s needed. A flat architecture is used in data lakes.
• A data warehouse is a data repository that holds vast amounts of data gathered
from many sources. Predefined schemas are used to store data in data warehouses.
• Knowledge discovery/big data mining tools enable businesses to mine vast
amounts of structured and unstructured big data.
• An in-memory data fabric distributes large volumes of data across system
memory resources, which helps keep data access and processing latency low.
• Data virtualization allows data to be accessed without technical limitations.
• Data integration software enables big data to be streamlined across different
platforms, including Apache Hadoop, MongoDB, and Amazon EMR.
• Data quality software cleans and enriches massive amounts of data.
• Data preprocessing software prepares data for further analysis, for example by
cleaning unstructured data.
• Spark is a free and open-source cluster computing platform for batch and
real-time data processing.
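The contrast between storing raw data in its native format and storing data under a predefined schema can be sketched with the Python standard library. This is an illustration under stated assumptions: a plain list stands in for a data lake, an in-memory SQLite table stands in for a data warehouse, and the table and column names are hypothetical.

```python
import json
import sqlite3

# Data lake style: raw records are kept in their native form; no schema is enforced.
lake = [
    '{"user_id": "u1", "clicks": 3}',
    '{"user_id": "u2", "comment": "great product"}',  # different fields are accepted
]

# Data warehouse style: a predefined schema is declared before any data is loaded.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE clicks (user_id TEXT NOT NULL, clicks INTEGER NOT NULL)"
)


def load_into_warehouse(raw_records):
    """Load raw JSON records into the table; count those that violate the schema."""
    loaded, rejected = 0, 0
    for raw in raw_records:
        record = json.loads(raw)
        try:
            warehouse.execute(
                "INSERT INTO clicks (user_id, clicks) VALUES (?, ?)",
                (record.get("user_id"), record.get("clicks")),
            )
            loaded += 1
        except sqlite3.IntegrityError:
            # A missing required column breaks the NOT NULL constraint.
            rejected += 1
    return loaded, rejected


loaded, rejected = load_into_warehouse(lake)
print(loaded, rejected)  # 1 1
```

The second record stays usable in the raw store but is rejected by the fixed schema, which is the trade-off the list above describes.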
massive amounts of data from numerous sources in a variety of forms and types
(Figure 5).
• Better-informed decisions made more quickly support more successful
strategizing and can improve the supply chain, operations, and other areas of
strategic decision-making.
• Cost savings can be realized through increased business process efficiencies
and optimizations.
• Greater marketing insights and information for product creation can come
from a better understanding of client demands, behaviour, and sentiment.
• Risk management tactics become better and more informed as a result of
huge data sample sizes [15].
from the huge diversity of big data analytics tools and platforms available on
the market.
• Some firms are having difficulty filling the gaps due to a probable lack of
internal analytics expertise and the high cost of acquiring professional data
scientists and engineers.
need to construct the data report and design it in such a way that the
decision-maker can understand it quickly. Data can be displayed in a variety of ways,
including pie charts, graphs, charts, diagrams, and more. Depending on the nature
of the data to be displayed, data reporting can also be done in the form of a
table.
3) Deriving Valuable Insights from the Data: To benefit their organizations,
Data Analysts need to extract relevant and meaningful insights from the data
package. The company can then use those valuable and unique insights to make
the best decision for its long-term growth.
4) Collection, Processing, and Summarizing of Data: A Data Analyst must
first collect data, then process it using the appropriate tools, and finally
summarize the information so that it is easily understood. The summarized
data can reveal a lot about the trends and patterns that are used to forecast and
predict outcomes.
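The collect, process, and summarize sequence can be sketched as follows. The raw readings are invented sample values, and the `process` and `summarize` helpers are hypothetical names chosen for this illustration rather than functions of any particular analytics tool.

```python
from statistics import mean

# Collect: raw values as they might arrive from a form or a log, including bad entries.
raw_values = ["12.5", "  14.0", "n/a", "13.5", ""]


def process(values):
    """Keep only the entries that can be parsed as numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip entries such as "n/a" or empty strings
    return cleaned


def summarize(numbers):
    """Reduce the cleaned numbers to a few easily read figures."""
    return {
        "count": len(numbers),
        "min": min(numbers),
        "max": max(numbers),
        "mean": mean(numbers),
    }


cleaned = process(raw_values)
summary = summarize(cleaned)

# Report the summary as a small two-column table.
for label, value in summary.items():
    print(f"{label:<6}{value:>8.2f}")
```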
Job Responsibilities of Big Data Professionals
1) Analyzing Real-time Situations: Big Data professionals are in high demand
for analyzing and monitoring scenarios that occur in real time. This helps
many businesses take immediate and timely action to address any issue
or problem, as well as capitalize on opportunities [18]. Many businesses can
cut losses, boost earnings, and become more successful this way.
2) Building a System to Process Large Scale Data: Processing large amounts
of data promptly is difficult. Unstructured data that cannot be processed by a
simple tool is sometimes referred to as Big Data. A Big Data Professional must
create a complex technological tool or system to handle and analyze Big Data to
make better decisions [19].
3) Detecting Fraudulent Transactions: Fraud is on the rise, and it is critical to
combat the problem. Big Data experts should be able to spot any potentially
fraudulent transactions. Many sectors, particularly banking, have important
duties in this area. Every day, many fraudulent transactions occur in the financial
sector, and banks must act quickly to address this problem. Otherwise, people who
save their hard-earned money in banks will lose trust in the financial system.
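As a toy illustration of spotting potentially fraudulent transactions, the sketch below flags amounts that are far above a customer's usual spending. The median-based rule, the factor of five, and the sample amounts are assumptions chosen for clarity; real fraud detection systems rely on far richer data and models.

```python
from statistics import median


def flag_suspicious(history, new_amounts, factor=5.0):
    """Flag new amounts greater than `factor` times the median of past amounts."""
    baseline = median(history)
    return [amount for amount in new_amounts if amount > factor * baseline]


history = [20, 35, 25, 30, 40]     # past transaction amounts (median is 30)
incoming = [28, 500, 149, 151]     # new transactions to screen

print(flag_suspicious(history, incoming))  # [500, 151]
```

Flagged transactions would still need review by an analyst, since a simple threshold like this one produces both false alarms and misses.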
4. Conclusions
The business sector is gradually relying more on data science for its
development. A tremendous amount of data is used to describe the behaviour of
complex systems, anticipate the output of processes, and evaluate this output. Based
on what has been discussed in this essay, it can be stated that big data analytics is a
cutting-edge methodology in data science, alongside every other technological
aspect, and studying this field comprehensively would be essential for further
development.
Several methods and software are commercially available for analyzing
big-data sets. Each of them can relate to technology, business, or social media.
Further studies using analyzing software could enhance the depth of the know-
Conflicts of Interest
The author declares no conflicts of interest.
References
[1] Siegfried, P. (2017) Strategische Unternehmensplanung in jungen KMU: Probleme
und Lösungsansätze. de Gruyter/Oldenbourg Verlag, Berlin.
[2] Siegfried, P. (2014) Knowledge Transfer in Service Research—Service Engineering
in Startup Companies. EUL-Verlag, Siegburg.
[3] Divesh, S. (2017) Proceedings of the VLDB Endowment. Proceedings of the VLDB
Endowment, 10, 2032-2033.
[4] Su, X. (2012) Introduction to Big Data. In: Opphavsrett: Forfatter og Stiftelsen
TISIP, Institutt for informatikk og e-læring ved NTNU, Zürich, Vol. 10, Issue 12,
2269-2274.
[5] Siegfried, P. (2015) Die Unternehmenserfolgsfaktoren und deren kausale
Zusammenhänge. In: Zeitschrift Ideen- und Innovationsmanagement, Deutsches Institut
für Betriebswirtschaft GmbH/Erich Schmidt Verlag, Berlin, 131-137.
https://doi.org/10.37307/j.2198-3151.2015.04.04
[6] Gandomi, A. and Haider, M. (2015) Beyond the Hype: Big Data Concepts, Methods,
and Analytics. International Journal of Information Management, 35, 137-144.
https://doi.org/10.1016/j.ijinfomgt.2014.10.007
[7] Lembo, D. (2015) An Introduction to Big Data. In: Application of Big Data for
National Security, Elsevier, Amsterdam, 3-13.
https://doi.org/10.1016/B978-0-12-801967-2.00001-X
[8] Siegfried, P. (2014) Analysis of the Service Research Studies in the German Research
Field, Performance Measurement and Management. Publishing House of Wroclaw
University of Economics, Wrocław, Band 345, 94-104.
[9] Cheng, O. and Lau, R. (2015) Big Data Stream Analytics for Near Real-Time
Sentiment Analysis. Journal of Computer and Communications, 3, 189-195.
https://doi.org/10.4236/jcc.2015.35024
[10] Abu-salih, B. and Wongthongtham, P. (2014) Chapter 2. Introduction to Big Data
Technology. 1-46.
[11] Sharma, S. and Mangat, V. (2015) Technology and Trends to Handle Big Data:
Survey. International Conference on Advanced Computing and Communication
Technologies, Haryana, 21-22 February 2015, 266-271.
https://doi.org/10.1109/ACCT.2015.121
[12] Davenport, T.H. and Dyché, J. (2013) Big Data in Big Companies. Baylor Business
Review, 32, 20-21.
http://search.proquest.com/docview/1467720121?accountid=10067
[13] Riahi, Y. and Riahi, S. (2018) Big Data and Big Data Analytics: Concepts, Types and
Technologies. International Journal of Research and Engineering, 5, 524-528.
https://doi.org/10.21276/ijre.2018.5.9.5
[14] Verma, J.P. and Agrawal, S. (2016) Big Data Analytics: Challenges and Applications
for Text, Audio, Video, and Social Media Data. International Journal on Soft
Computing, Artificial Intelligence and Applications, 5, 41-51.
https://doi.org/10.5121/ijscai.2016.5105
[15] Begoli, E. and Horey, J. (2012) Design Principles for Effective Knowledge Discovery
from Big Data. Proceedings of the 2012 Joint Working Conference on Software
Architecture and 6th European Conference on Software Architecture, WICSA/ECSA,
Helsinki, 20-24 August 2012, 215-218.
https://doi.org/10.1109/WICSA-ECSA.212.32
[16] Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R. and
Muharemagic, E. (2015) Deep Learning Applications and Challenges in Big Data
Analytics. Journal of Big Data, 2, 1-21. https://doi.org/10.1186/s40537-014-0007-7
[17] Bätz, K. and Siegfried, P. (2021) Complexity of Culture and Entrepreneurial
Practice. International Entrepreneurship Review, 7, 61-70.
https://doi.org/10.15678/IER.2021.0703.05
[18] Bockhaus-Odenthal, E. and Siegfried, P. (2021) Agilität über Unternehmensgrenzen
hinaus - Agility across Boundaries. Bulletin of Taras Shevchenko National
University of Kyiv. Economics, 3, 14-24. https://doi.org/10.17721/1728-2667.2021/216-3/2
[19] Kaisler, S.H., Armour, F.J. and Espinosa, A.J. (2017) Introduction to Big Data and
Analytics: Concepts, Techniques, Methods, and Applications Mini Track.
Proceedings of the Annual Hawaii International Conference on System Sciences, Hawaii,
4-7 January 2017, 990-992. https://doi.org/10.24251/HICSS.2017.117