
UNIT I INTRODUCTION TO BIG DATA 9

Evolution of Big Data - Best Practices for Big Data Analytics - Big Data Characteristics - The Promotion of the Value of Big
Data - Unstructured Data – Big Data Use Cases- Industry Examples of Big Data – Web Analytics – Big Data and Marketing – Fraud
and Big Data – Risk and Big Data – Credit Risk Management – Big Data and Algorithmic Trading – Big Data and Healthcare –
Big Data in Medicine – Advertising and Big Data – Big Data Technologies – Characteristics of Big Data Applications - Perception
and Quantification of Value -Understanding Big Data Storage -Big Data Analytics.

What is Big Data? Introduction, Types, Characteristics, Examples

What is Data?

Data refers to the quantities, characters, or symbols on which operations are performed by a computer, which may be
stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording
media.

Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data of
such large size and complexity that none of the traditional data management tools can store or process it
efficiently.

What is an Example of Big Data?

Following are some of the Big Data examples-

Stock Exchange
The New York Stock Exchange is an example of Big Data, generating about one terabyte of new trade data
per day.

Social Media
Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site
Facebook every day. This data is mainly generated through photo and video uploads, message exchanges,
comments, etc.

Jet Engines
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of
flights per day, data generation reaches many petabytes.

One bit = 1 or 0; 8 bits = 1 Byte

1024^1 Bytes = 1 KiloByte  - 1 KB
1024^2 Bytes = 1 MegaByte  - 1 MB
1024^3 Bytes = 1 GigaByte  - 1 GB
1024^4 Bytes = 1 TeraByte  - 1 TB
1024^5 Bytes = 1 PetaByte  - 1 PB
1024^6 Bytes = 1 ExaByte   - 1 EB
1024^7 Bytes = 1 ZettaByte - 1 ZB
1024^8 Bytes = 1 YottaByte - 1 YB
1024^9 Bytes = 1 BrontoByte - 1 BB
1024^10 Bytes = 1 GeopByte - 1 gB
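As a quick illustration (not part of the original notes), the following Python sketch walks a raw byte count up this 1024-based scale and prints it in the largest convenient unit:

# Minimal sketch: convert a raw byte count into a human-readable 1024-based unit.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Repeatedly divide by 1024 until the value fits the current unit."""
    for unit in UNITS:
        if num_bytes < 1024 or unit == UNITS[-1]:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024

print(human_readable(10 * 1024**4))    # 10 TB of jet-engine data  -> "10.00 TB"
print(human_readable(500 * 1024**4))   # ~500 TB/day of social media data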

Types Of Big Data


Following are the types of Big Data:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Structured data is highly organized and easily understood by machines. Any data that can be stored,
accessed and processed in the form of a fixed format is termed 'structured' data. Over time, computer
science has achieved great success in developing techniques for working with such data (where the format
is well known in advance) and deriving value out of it. However, nowadays we are seeing issues when the
size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Examples of structured data include names, dates, addresses, credit card numbers, stock information,
geolocation, and more.

Examples Of Structured Data


● An ‘Employee’ table in a database is an example of Structured Data
● Data stored in a relational database management system is one example of 'structured' data.
● OLTP systems are built to work with structured data wherein data is stored in relations (tables).

Online Transaction Processing is a type of data processing that consists of executing a number of
transactions occurring concurrently—online banking, shopping, order entry, or sending text messages, for
example.
Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000
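As a minimal illustrative sketch (using nothing beyond Python's built-in sqlite3 module), the Employee table above can be created and queried precisely because its format is fixed and known in advance:

# Minimal sketch: storing and querying structured data with a fixed schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID INTEGER PRIMARY KEY,
    Employee_Name TEXT,
    Gender TEXT,
    Department TEXT,
    Salary_In_lacs INTEGER)""")
rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# Because the format is well known in advance, querying is straightforward.
for name, salary in conn.execute(
        "SELECT Employee_Name, Salary_In_lacs FROM Employee "
        "WHERE Department = 'Finance' ORDER BY Salary_In_lacs DESC"):
    print(name, salary)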

Unstructured
Any data with unknown form or structure is classified as unstructured data. In addition to the size being
huge, unstructured data poses multiple challenges in terms of its processing for deriving value out of it. A
typical example of unstructured data is a heterogeneous data source containing a combination of simple text
files, images, videos etc. Nowadays organizations have a wealth of data available with them but unfortunately,
they don’t know how to derive value out of it since this data is in its raw form or unstructured format.

Examples Of Unstructured Data


The output returned by ‘Google Search’

Please note that web application data, which is unstructured, consists of log files, transaction history files etc.
Semi-structured
Semi-structured data can contain both forms of data. We can see semi-structured data as structured in
form, but it is actually not defined with, for example, a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.
It contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields
within the data. Therefore, it is also known as a self-describing structure.

A few examples of semi-structured data sources are emails, XML and other markup languages, binary
executables, TCP/IP packets, zipped files, data integrated from different sources, and web pages.

Examples Of Semi-structured Data


Personal data stored in an XML file-
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
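A minimal sketch (illustrative only, using Python's standard xml.etree module) of how the self-describing tags above let a program recover a table-like structure from this semi-structured file:

# Minimal sketch: the <name>, <sex> and <age> markers act as the "schema"
# of each record, so fields can be pulled out without a predefined table.
import xml.etree.ElementTree as ET

xml_data = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</people>"""

root = ET.fromstring(xml_data)
for rec in root.findall("rec"):
    print(rec.findtext("name"), rec.findtext("sex"), rec.findtext("age"))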
Other Examples:
● E-mails
● XML and other markup languages
● Binary executables
● TCP/IP packets
● Zipped files
● Integration of data from different sources
● Web pages
CHARACTERISTICS OF BIG DATA
Big data can be described by the following characteristics:
● Volume
● Variety
● Velocity
● Variability

(i) Volume – The name Big Data itself relates to a size which is enormous. The size of data plays a very crucial
role in determining its value. Whether particular data can actually be considered Big Data or not also depends
upon its volume. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big
Data solutions.

(ii) Variety – The next aspect of Big Data is its variety.


Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier
days, spreadsheets and databases were the only sources of data considered by most applications.
Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also being
considered in analysis applications. This variety of unstructured data poses certain issues for storing,
mining and analyzing the data.
(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data is generated and
processed to meet demands determines the real potential of the data.
Big Data Velocity deals with the speed at which data flows in from sources like business processes,
application logs, networks, and social media sites, sensors, Mobile devices, etc. The flow of data is massive
and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus hampering the
process of being able to handle and manage the data effectively.

Advantages Of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:
● Businesses can utilize outside intelligence while taking decisions
Access to social data from search engines and sites like Facebook and Twitter is enabling
organizations to fine-tune their business strategies.

● Improved customer service


Traditional customer feedback systems are getting replaced by new systems designed with Big
Data technologies. In these new systems, Big Data and natural language processing (NLP)
technologies are being used to read and evaluate consumer responses.

The top 7 techniques Natural Language Processing (NLP) uses to extract data from text are:

● Sentiment Analysis.
● Named Entity Recognition.
● Summarization.
● Topic Modeling.
● Text Classification.
● Keyword Extraction.
● Lemmatization and stemming.
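As a toy illustration of the first technique above (sentiment analysis), the sketch below scores customer feedback by counting positive and negative keywords. The word lists and feedback strings are made up; a real system would use a trained NLP model instead.

# Toy sketch: keyword-based sentiment scoring of consumer responses.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

feedback = ["Great service, love the new app", "Terrible wait times and slow support"]
for f in feedback:
    print(sentiment(f), "->", f)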

● Early identification of risk to the product/services, if any


e.g. Modi meeting-design house, mobile design…, mini bus survey

● Better operational efficiency


Big Data technologies can be used for creating a staging area or landing zone for new data before
identifying what data should be moved to the data warehouse. In addition, such integration of Big
Data technologies and data warehouses helps an organization to off-load infrequently accessed
data.

EVOLUTION OF BIG DATA

The term ‘Big Data’ has been in use since the early 1990s. In its true essence, Big Data is not something that
is completely new or only of the last two decades. Over the course of centuries, people have been trying to
use data analysis and analytics techniques to support their decision-making process.

However, in the last two decades, the volume and speed with which data is generated have changed – beyond
measures of human comprehension. The total amount of data in the world was 4.4 zettabytes in 2013. That is
set to rise steeply to 44 zettabytes by 2020. To put that in perspective, 44 zettabytes is equivalent to 44 trillion
gigabytes. Even with the most advanced technologies today, it is impossible to analyze all this data. The need
to process these increasingly larger (and unstructured) data sets is how traditional data analysis transformed
into ‘Big Data’ in the last decade.

Data Growth over the years

To illustrate this development over time, the evolution of Big Data can roughly be subdivided into three main
phases. Each phase has its own characteristics and capabilities. In order to understand the context of Big Data
today, it is important to understand how each phase contributed to the contemporary meaning of Big Data

Big Data phase 1.0


Data analysis, data analytics and Big Data originate from the longstanding domain of database management. This
phase relied heavily on the storage, extraction, and optimization techniques that are common for data stored in
Relational Database Management Systems (RDBMS).

Database management and data warehousing are considered the core components of Big Data Phase 1. They
provide the foundation of modern data analysis as we know it today, using well-known techniques such as
database queries, online analytical processing and standard reporting tools.

Big Data phase 2.0

Since the early 2000s, the Internet and the Web began to offer unique data collections (surveys, online tracking,
interviews, forms to collect customer feedback, etc.) and data analysis opportunities.

Conjoint analysis:
a statistical technique that firms use in market research to understand how customers value different components
or features of their products or services.

With the expansion of web traffic and online stores, companies such as Yahoo, Amazon and eBay started to
analyze customer behavior by analyzing click-rates, IP-specific location data and search logs. This opened a
whole new world of possibilities.

From a data analysis, data analytics, and Big Data point of view, HTTP-based web traffic introduced a
massive increase in semi-structured and unstructured data. Besides the standard structured data types,
organizations now needed to find new approaches and storage solutions to deal with these new data types in
order to analyze them effectively. The arrival and growth of social media data greatly intensified the need for
tools, technologies and analytics techniques that were able to extract meaningful information out of this
unstructured data.

Big Data phase 3.0

Although web-based unstructured content is still the main focus for many organizations in data analysis, data
analytics, and big data, the current possibilities to retrieve valuable information are emerging out of mobile
devices.

Mobile devices not only give the possibility to analyze behavioral data (such as clicks and search queries), but
also give the possibility to store and analyze location-based data (GPS-data). With the advancement of these
mobile devices, it is possible to track movement, analyze physical behavior and even health-related data
(number of steps you take per day). This data provides a whole new range of opportunities, from
transportation, to city design and health care.

Simultaneously, the rise of sensor-based internet-enabled devices is increasing data
generation like never before. Famously coined as the 'Internet of Things' (IoT), millions of TVs, thermostats,
wearables and even refrigerators are now generating zettabytes of data every day. And the race to extract
meaningful and valuable information out of these new data sources has only just begun.

It all starts with the explosion in the amount of data we have generated since the dawn of the digital age.
This is largely due to the rise of computers, the Internet and technology capable of capturing data from the
world we live in. Going back even before computers and databases, we had paper transaction records,
customer records etc. Computers, and particularly spreadsheets and databases, gave us a way to store and
organize data on a large scale. Suddenly, information was available at the click of a mouse.

We've come a long way since early spreadsheets and databases, though. Today, every two days we create as
much data as we did from the beginning of time until 2000. And the amount of data we're creating continues
to increase rapidly.

Nowadays, almost every action we take leaves a digital trail. We generate data whenever we go online, when
we carry our GPS-equipped smartphones, when we communicate with our friends through social media or
chat applications, and when we shop. You could say we leave digital footprints with everything we do that
involves a digital action, which is almost everything. On top of this, the amount of machine generated data is
rapidly growing too.

How does Big Data work?

Principle:
Big Data works on the Principle that the more you know about anything or any situation, the more
reliably you can gain new insights and make predictions about what will happen in the future.

Process of Big Data:


By comparing more data points, relationships begin to emerge that were previously hidden, and these
relationships enable us to learn and make smarter decisions. Most commonly, this is done through a process
that involves building models based on the data we can collect, and then running simulations,
tweaking (changing) the value of data points each time and monitoring how it impacts our results. This
process is automated – today's advanced analytics technology will run millions of these simulations,
tweaking all the possible variables until it finds a pattern – or an insight – that helps solve the problem it is
working on.
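A minimal sketch of this simulate-and-tweak loop (the demand model and every number are invented purely for illustration): one variable, here a price, is tweaked across many runs of a simple model, and the value giving the best simulated result is kept.

# Minimal sketch: tweak one data point (price) across many simulation runs
# and monitor how it impacts the result (revenue).
def simulate_revenue(price: float) -> float:
    demand = max(0.0, 1000 - 8 * price)   # toy model: demand falls as price rises
    return price * demand

best_price, best_revenue = None, float("-inf")
for price in range(10, 121):              # run the simulation for many candidate prices
    revenue = simulate_revenue(price)
    if revenue > best_revenue:
        best_price, best_revenue = price, revenue

print(f"Best price found: {best_price}, expected revenue: {best_revenue:.0f}")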

Anything that wasn't easily organised into rows and columns was simply too difficult to work with and was
ignored. Now though, advances in storage and analytics mean that we can capture, store and work with
different types of data. Thus, data can now mean anything from databases to photos, videos, sound
recordings, written text and sensor data.

To make sense of all this messy data, Big Data projects often use cutting-edge analytics involving artificial
intelligence (AI) and machine learning (ML), including deep learning (DL), neural networks and NLP. By teaching
computers to identify what this data represents – through image recognition or natural language processing, for
example – they can learn to spot patterns much more quickly and reliably than humans.

INDUSTRIAL IMPACT OF BIG DATA IN 2020:


Machine Learning and Artificial Intelligence will proliferate. The deadly duo will get beefed up with more
muscles. Continuing with our round-up of the latest trends in big data, we will now take stock of how AI and
ML are doing in the big data industry. Artificial intelligence and machine learning are the two sturdy
technological workhorses working hard to transform the seemingly unwieldy big data into an approachable
stack. Deploying them will enable businesses to experience the algorithmic magic via various practical
applications like video analytics, pattern recognition, customer churn modeling, dynamic pricing, fraud
detection, and many more. IDC predicts that spending on AI and ML will rise to $57.6 billion in 2021.
Similarly, companies pouring money into AI are optimistic that their revenues will increase by 39% in
2020.
Rise of Quantum Computing
The next computing juggernaut getting ready to strike is the quantum computer. These are powerful
computers that work on the principles of quantum mechanics. You must, however, wait patiently for at
least another half a decade before the technology hits the mainstream. One thing is for sure: it
will push the envelope of traditional computing and do analytics of unthinkable proportions.
Predictions for big data are thus incomplete without quantum computing.
Edge analytics will gain increased traction
The phenomenal proliferation of IoT devices demands a different kind of analytics solution, and edge
analytics is probably the befitting answer. Edge analytics means conducting real-time analysis of data at
the edge of a network or the point where data is being captured, without transporting that data to a
centralized data store. Owing to its on-site nature, it offers certain cool benefits: reduction in bandwidth
requirements, minimization of the impact of load spikes, reduction in latency, and superb scalability.
Surely, edge analytics will find more corporate takers in future. One survey says between 2017 and 2025, the
total edge analytics market will expand at a moderately high CAGR of 27.6% to pass the $25 billion mark.
This will have a noticeable impact on big data analytics as well.
Dark data
So, what is Dark Data, anyway? Every day, businesses collect a lot of digital data that is stored but is
never used for any purpose other than regulatory compliance, kept because we never know when it might
become useful. Since data storage has become easy and cheap, businesses are not leaving anything out.
Old data formats, files, documents within the organization are just lying there and being accumulated in huge
amounts every second. This unstructured data can be a goldmine of insights, but only if it is analyzed
effectively.
According to IBM, by 2020, upwards of 93% of all data will fall under the Dark Data category. Thus, big
data in 2020 will inarguably reflect the inclusion of Dark Data. The fact is we must process all types of data to
extract maximum benefit from data crunching.

Usage:

This ever-growing stream of sensor information, photographs, text, voice and video data means we can now
use data in ways that were not possible before. This is revolutionising the world of business across almost
every industry. Companies can now accurately predict what specific segments of customers will want to buy,
and when to buy. And Big Data is also helping companies run their operations in a much more efficient way.

Even outside of business, Big Data projects are already helping to change our world in several ways, such as:

Improving healthcare:

Data-driven medicine involves analyzing vast numbers of medical records and images for patterns that can
help spot disease early and develop new medicines.

Predicting and responding to natural and man-made disasters:

Sensor data can be analyzed to predict where earthquakes are likely to strike next, and patterns of
human behavior give clues that help organisations give relief to survivors and much more.

Preventing crime:

Police forces are increasingly adopting data-driven strategies based on their own intelligence and public data
sets in order to deploy resources more efficiently and act as a deterrent where one is needed.

Marketing effectiveness: Along with helping businesses and organizations make smart decisions, Big Data
also drastically increases their sales and marketing effectiveness, thus greatly improving their performance
in the industry.

PREDICTION AND DECISION MAKING

Now that organizations can analyze Big Data, they have successfully started using it to mitigate risks
revolving around various factors of their businesses. Using Big Data to reduce the risk in organizational
decisions and to make predictions has become one of the many benefits coming from big data in
industries.
Concerns: Big Data gives us unprecedented insights and opportunities, but it also raises concerns and
questions that must be addressed:

Data privacy: The Big Data we now generate contains a lot of information about our personal lives, much of
which we have a right to keep private

Data security: Even if we decide we are happy for someone to have our data for a purpose, can we trust them
to keep it safe?

Data discrimination: When everything is known, will it become acceptable to discriminate against people
based on data we have on their lives? We already use credit scoring to decide who can borrow money, and
insurance is heavily data-driven.

Data quality: Not enough emphasis is placed on quality and contextual relevance. The trend with technology is
collecting more raw data closer to the end user. The danger is that data in raw format has quality issues;
reducing the gap between the end user and raw data increases issues in data quality.

Facing these challenges is an important part of Big Data, and they must be addressed by organisations who
want to take advantage of data. Failure to do so can leave businesses vulnerable, not just in terms of their
reputation, but also legally and financially.

BEST PRACTICES FOR BIG DATA ANALYTICS

Business is awash in data—and also big data analytics programs meant to make sense of this data and apply it
toward competitive advantage. A recent Gartner study found that more than 75 percent of businesses either
use big data or plan to spin it up within the next two years.

Not all big data analytics operations are created equal, however; there's plenty of noise around big data, but
some big data analytics initiatives still don't capture the bulk of useful business intelligence and others are
struggling to get off the ground.

For those businesses currently struggling with the data, or still planning their approach, here are five best
practices for effectively using big data analytics.

1. Start at the End

The most successful big data analytics operations start with the pressing questions that need answering and
work backwards. While technology considerations can steal the focus, utility comes from starting with the
problem and figuring out how big data can help find a solution. There are many directions that most
businesses can take their data, so the best operations let key questions drive the process and not the
technology tools themselves.

"Businesses should not try to boil the ocean, and should work backwards from the expected outcomes," says
Jean-Luc Chatelain, chief technology officer for Accenture Analytics, part of Accenture Digital.

2. Build an Analytics Culture

Change management and training are important components of a good big data analytics program. For
greatest impact, employees must think in terms of data and analytics so they turn to it when developing
strategy and solving business problems. This requires a considerable adjustment in both how employees and
businesses operate.

Training also is key so employees know how to use the tools that make sense of the data; the best big data
system is useless if employees can't functionally use it.
"We approach big data analytics programs with the same mindset as any other analytic or transformational
program: You must address the people, process and technology in the organization rather than just data and
technology," says Paul Roma, chief analytics officer for Deloitte Consulting.

Be ready to change the way you work, adds Luc Burgelman, CEO of NGDATA, a firm that helps financial
services, media firms and telecoms with big data utilization.

"Big data has the power to transform your entire business, but only if you are flexible and prepared to be
open to change."

3. Re-Engineer Data Systems for Analytics

An increasing range and volume of devices now generate data, creating substantial variation both in sources
and types of data. An important component of a successful big data analytics program is re-engineering the
data pipelines so data gets to where it needs to be and in a form that is useful for analysis.
Many existing systems were not developed for today's big data analysis needs.
"This is still an issue in many businesses, where the data supply chain is blocked or significantly more
complex than is necessary, leading to 'trapped data' that value can't be extracted from," says Chatelain at
Accenture Digital.
"From a data engineering perspective, we often talk about re-architecting the data supply chain, in part to
break down silos in where data is coming from, but also to make sure insights from data are available where
they are relevant."
4. Focus on Useful Data Islands
There's a lot of data. Not all of it can be mined and fully exploited. One key to the most successful big data
analytics operations is correctly identifying which islands of data offer the most promise.
"Finding and using precise data is rapidly becoming the Holy Grail of analytics activities," says Chatelain.
"Enterprises are taking action to address the challenges present in grappling with big data, but [they]
continue to struggle to identify the islands of relevant data in the big data ocean." Burgelman at NGDATA also
stresses the importance of data selection.
"Most companies are overwhelmed by the sheer volume of the data they possess, much of which is
irrelevant to the stated goal at hand and is just taking up space in the database," he says.
"By determining which parameters will have the most impact for your company, you'll be able to make
better use of the data you have through a more focused approach rather than attempting to sort through it all."
5. Iterate Often
Business velocity is at an all-time high thanks to more globally connected markets and rapidly evolving
information technology. The data opportunities are constantly changing, and with that comes the need for an
agile, iterative approach toward data mining and analysis. Good big data analytics systems are nimble and
always iterating as new technology and data opportunities emerge.
Big data itself can help drive this evolution.
"One of the amazing things about big data analytics is that it can help organizations gain a better
understanding of what they don't know," says Burgelman.
"So as data comes in and conclusions are reached, you've got to be flexible and open to changing the scope
of the project. Don't be afraid to ask new questions of your data on an ongoing basis."
The importance of effective big data use grows by the day. This makes analytics best practices all the more
important, and these five top the list.
BIG DATA CHARACTERISTICS


Big Data refers to large amounts of data that cannot be processed by traditional data storage or
processing units. It is used by many multinational companies to process data and run the business of many
organizations. The data flow can exceed 150 exabytes per day before replication.

There are five V's of Big Data that explain its characteristics.

5 V's of Big Data:

● Volume

● Veracity

● Variety

● Value

● Velocity

Volume:
Volume refers to the unimaginable amounts of information generated every second from social media, cell
phones, cars, credit cards, M2M sensors, images, video, and whatnot. We currently use distributed
systems to store data in several locations, brought together by a software framework like Hadoop.

Big Data is a vast 'volume' of data generated from many sources daily, such as business processes,
machines, social media platforms, networks, human interactions, and many more.

Facebook, for example, generates approximately a billion messages, 4.5 billion presses of the "Like" button,
and more than 350 million new posts each day. Big data technologies can handle such large amounts
of data.

Variety:

Big Data can be structured, unstructured, or semi-structured, collected from different sources. Data may
be collected from databases, spreadsheets, PDFs, emails, audio, social media posts, photos, videos,
etc.

The data is categorized as below:

a. Structured data: Data with a defined schema and all the required columns, in tabular form.
Structured data is stored in relational database management systems; OLTP (Online Transaction
Processing) systems are built to work with such data, stored in relations, i.e., tables.
b. Semi-structured: In semi-structured data, the schema is not appropriately defined, e.g., JSON, XML,
CSV, TSV, and email.

c. Unstructured Data: All the unstructured files, log files, audio files, and image files are included in
the unstructured data. Some organizations have much data available, but they do not know how to
derive value from it since the data is raw.

d. Quasi-structured Data: Textual data with inconsistent formats that can be formatted with effort,
time and some tools.

Example: Web server logs, i.e., log files created and maintained by a server, containing a list of
activities.
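A minimal sketch (the log line and the pattern are made-up examples) of the "effort and tools" involved: a regular expression turns one quasi-structured web server log entry into named fields.

# Minimal sketch: extracting named fields from a web server log line.
import re

log_line = '192.168.1.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+)')

match = pattern.match(log_line)
if match:
    print(match.groupdict())   # {'ip': '192.168.1.10', 'time': '10/Oct/2023:...', ...}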

Veracity:

Veracity means how reliable and trustworthy the data is. There are many ways to filter or translate the data, and
veracity is about being able to handle and manage that data effectively. It is also essential for deriving business
value from Big Data.

For example, Facebook posts with hashtags.

Value:

Value is an essential characteristic of big data. It is not just the data that we process or store that matters; it is
the valuable and reliable data that we store, process, and analyze.

Velocity:
Velocity plays an important role compared to the other characteristics. Velocity is the speed at which the data
is created in real time. It covers the speed of incoming data sets, the rate of change, and activity bursts. A
primary aspect of Big Data is to provide demanded data rapidly.

Big data velocity deals with the speed at which data flows from sources like application logs, business
processes, networks, and social media sites, sensors, mobile devices, etc.

Applications of Big Data:


Big Data is considered the most valuable and powerful fuel that can run the massive IT industries of the 21st
Century. Big Data has become the most widespread technology, used in almost every business sector.

i) Travel and Tourism is one of the biggest users of Big Data technology. It has enabled us to predict the
requirements for travel facilities in many places, improving business through dynamic pricing and more.

ii) Financial and Banking Sectors extensively use Big Data technology. Big data analytics can aid banks in
understanding customer behaviour based on the inputs received from their investment patterns, shopping
trends, motivation to invest and personal or financial backgrounds.

iii) Healthcare Sector: Big Data has already started to create a huge difference in the healthcare sector. With
the help of predictive analytics, medical professionals and Health Care Personnel are now able to provide
personalized healthcare services to individual patients.

iv) The Telecommunication and Multimedia sector is one of the primary users of Big Data. Zettabytes of data
are generated every day, and handling such huge data needs nothing other than Big Data technologies.

v) Government and Military also use Big Data technology at a high rate. Consider the amount of data a
government generates in its records; in the military, a normal fighter jet needs to process petabytes of data
during its flight.

Advantages of Big Data:


● Big Data has enabled predictive analysis which can save organisations from operational risks.
● Predictive analysis has helped organisations grow business by analysing customer needs.

● Big Data has enabled many multimedia platforms to share data, e.g., YouTube, Instagram

● Medical and Healthcare sectors can keep patients under constant observations.

● Big Data changed the face of customer-based companies and the worldwide market

THE PROMOTION OF THE VALUE OF BIG DATA


That being said, a thoughtful approach must differentiate between hype and reality, and one way to do this is
to review the difference between what is being said about big data and what is being done with big data.

Validating (Against) The Hype: Organizational Fitness

There are a number of factors that need to be considered before making a decision about adopting the
technology.

As a way to properly ground any initiatives around big data, one initial task would be to evaluate the
organization's fitness as a combination of five factors:

1. Sustainability of the technology
2. Feasibility
3. Integrability
4. Value
5. Reasonability

Table 2.1 provides a sample framework for determining a score for each of these factors ranging from 0
(lowest level) to 4 (highest level). The resulting scores can be reviewed with a degree of objectivity,
especially when considering the value of big data.

Key reasons big data technologies are beneficial for an organization are:

● Reduced capital and operational cost
● No need for high-end servers, as the software can run on commodity hardware
● Support for both structured and unstructured data
● Support for high-performance and scalable analytics operations
● A simple programming model for scalable applications

Earlier, the implementation of a high-performance computing system was restricted to large organizations.
However, with improvements in market conditions and the economy, high-performance computing has
attracted many organizations that are willing to invest in implementing big data analytics.

There are many factors that need to be considered before adopting any new technology like big data analytics.
● The technology cannot be adopted blindly just because of its popularity; its feasibility within the
organization must be assessed.
● The risk of failure needs to be considered, since a failed adoption may lead to the trough of the hype
cycle and nullify the expectations for clear business improvements.

Table 2.1 A sample framework for scoring organizational fitness (0 = lowest level, 4 = highest level)

Sustainability
0 - No plan in place for acquiring funding for ongoing management and maintenance costs; no plan for managing skills inventory
1 - Continued funding for maintenance and engagement is given on an ad hoc basis
2 - Need for year-by-year business justifications for continued funding; sustainability is at risk on a continuous basis
3 - Business justifications ensure continued funding and investments in skills
4 - Program management office effective in absorbing and amortizing management and maintenance costs

Feasibility
0 - Evaluation of new technology is not officially sanctioned
1 - Organization tests new technologies in reaction to market pressure
2 - Organization evaluates and tests new technologies after market evidence of successful use
3 - Organization is open to evaluation of new technology; adoption of technology on an ad hoc basis based on convincing business justifications
4 - Organization encourages evaluation and testing of new technology; clear decision process for adoption or rejection; organization supports allocation of time to innovation

Integrability
0 - Significant impediments to incorporating any nontraditional technology into the environment
1 - Willingness to invest effort in determining ways to integrate the technology, with some successes
2 - New technologies can be integrated into the environment, within limitations and with some level of effort
3 - Clear processes exist for migrating or integrating new technologies, but they require dedicated resources and level of effort
4 - No constraints or impediments to fully integrating technology into the operational environment

Value
0 - Investment in hardware resources, software tools, skills training, and ongoing management and maintenance exceeds the expected quantifiable value
1 - The expected quantifiable value is evenly balanced by the investment in hardware resources, software tools, skills training, and ongoing management and maintenance
2 - Selected instances of perceived value may suggest a positive return on investment
3 - Expectations for some quantifiable value for investing in limited aspects of the technology
4 - Expectations for quantifiable value for investing widely in the technology

Reasonability
0 - Organization's resource requirements for the near-, mid-, and long-terms are satisfactorily met
1 - Organization's resource requirements for the near- and mid-terms are satisfactorily met; unclear whether long-term needs are met
2 - Organization's resource requirements for the near-term are satisfactorily met; unclear whether mid- and long-term needs are met
3 - Business challenges are expected to have resource requirements in the mid- and long-terms that will exceed the capability of the existing and planned environment
4 - Business challenges have resource requirements that clearly exceed the capability of the existing and planned environment; the organization's go-forward business model is highly information-centric

● To review the difference between reality and hype, one must see what can be done with big data and
what is said about it.
● The Centre for Economics and Business Research (CEBR) has published the advantages of big data as:
⮚ Provide improvements in the strategy, business planning, research and analytics leading to new
innovations and product development.
⮚ Optimized spending with improved customer marketing.
⮚ Provide predictive, descriptive and prescriptive analytics for improving supply chain
management and provide accuracy in fraud detection.

Beyond assessing feasibility, reasonability, value, integrability, and sustainability, a scan of existing content
shows what is being promoted as the expected result of big data analytics and, more interestingly, how
familiar those expectations sound.

A good example is provided within an economic study on the value of big data undertaken and published by
the Center for Economics and Business Research (CEBR) that speaks to the cumulative value of:
● optimized consumer spending as a result of improved targeted customer marketing;
● improvements to research and analytics within the manufacturing sectors to lead to new product
development;
● improvements in strategizing and business planning leading to innovation and new start-up companies;
● predictive analytics for improving supply chain management to optimize stock management,
replenishment, and forecasting;
● improving the scope and accuracy of fraud detection.
These are much the same benefits promoted by business intelligence and data warehouse tools vendors and
system integrators for the past 15-20 years, namely:
⮚ Better targeted customer marketing
⮚ Improved product analytics
⮚ Improved business planning
⮚ Improved supply chain management
⮚ Improved analysis for fraud, waste, and abuse
Further articles, papers, and vendor messaging on big data reinforce these presumptions, but if these were the
same improvements promised by wave after wave of new technologies, what makes big data different?

BIG DATA USE CASES:


The Big Data environment largely consists of:
⮚ Parallel computing resources,
⮚ Distributed storage,
⮚ Scalable performance management,
⮚ Data exchange via high-speed networks.

The result is improved performance and scalability using big data techniques in a group of Projects, which are
categorized as:
● Business intelligence, querying, reporting, searching, including many implementations of searching,
filtering, indexing, speeding up aggregation for reporting and for report generation, trend analysis,
search optimization, and general information retrieval.
● Improved performance for common data management operations, with the majority focusing on log
storage, data storage and archiving, followed by sorting, running joins,
extraction/transformation/loading (ETL) processing, other types of data conversions, as well as
duplicate analysis and elimination.
● Non-database applications, such as image processing, text processing in preparation for publishing,
genome(gene-set) sequencing, protein sequencing and structure prediction, web crawling, and
monitoring workflow processes.
● Data mining and analytical applications, including social network analysis, facial recognition, profile
matching, other types of text analytics, web mining, machine learning, information extraction,
personalization and recommendation analysis, ad optimization, and behavior analysis.
Generally, Processing applications can combine these core capabilities in different ways.
BIG DATA USE CASES
1. Optimize Funnel Conversion - from a large pool of potential customers to a smaller set of purchasers
2. Behavioral Analytics - learn what keeps customers around longer and gives them quick access
3. Customer Segmentation - based on interests, companies can offer their products or services
4. Predictive Support - identify repairs and downtime in advance for prior action
5. Market Basket Analysis and Pricing Optimization
6. Predict Security Threats - anticipate malfunctions and threats so solutions can be prepared
7. Fraud Detection - finance security
8. Industry specific - healthcare: improve patient outcomes; agriculture: use data to improve crop yields
1. Optimize Funnel Conversion (The funnel starts with a large number of potential customers and ends with a
much smaller number of people who actually make a purchase.)
Big data analytics allows companies to track leads through the entire sales conversion process, from a click on
an adword ad to the final transaction, in order to uncover insights on how the conversion process can be
improved.
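A minimal sketch (the stage counts are invented) of funnel analysis: compute stage-to-stage conversion rates so the weakest step between the first ad click and the final purchase can be identified.

# Minimal sketch: conversion rate at each stage of a sales funnel.
funnel = [("ad click", 100000), ("site visit", 42000), ("add to cart", 6000), ("purchase", 1500)]

for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:]):
    print(f"{stage} -> {next_stage}: {next_count / count * 100:.1f}% converted")

print(f"Overall funnel conversion: {funnel[-1][1] / funnel[0][1] * 100:.2f}%")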
COMPANY T-Mobile EMPLOYEES 38,000 INDUSTRY Communication
Purpose:
T-mobile uses multiple indicators, such as billing and sentiment analysis, in order to identify customers that
can be upgraded to higher quality products, as well as to identify those with a high lifetime customer-value,
so its team can focus on retaining those customers.
COMPANY Credem EMPLOYEES 5,600 INDUSTRY Finance
Purpose
Credem uses data analytics to predict which financial products or services a customer would appreciate, so it
can better target consumers during the sales process. With these insights, the bank increased average revenue
by 22 percent and reduced costs by 9 percent.

2. Behavioral Analytics
With access to data on consumer behavior, companies can learn what prompts a customer to stick around
longer, as well as learn more about their customer’s characteristics and purchasing habits in order to improve
marketing efforts and boost profits
COMPANY Mastercard EMPLOYEES67,000 INDUSTRY Finance
Purpose
With 1.8 billion customers, MasterCard is in the unique position of being able to analyze the behavior of
customers in not only their own stores, but also thousands of other retailers. The company teamed up with Mu
Sigma to collect and analyze data on shoppers’ behavior, and provide the insights it finds to other retailers in
benchmarking reports.
COMPANY Time Warner Cable EMPLOYEES 34,000 INDUSTRY Entertainment
Purpose:
With services like Hulu and Netflix competing for viewers’ attention, Time Warner collects data on how
frequently customers tune in, the effect of bandwidth on consumer behavior, customer engagement and peak
usage times in order to improve their service and increase profits. The company also segments its customers
for advertisers by correlating viewing habits with public data—such as voter registration information—in
order to launch highly targeted campaigns to specific locations or demographics.
COMPANY Nestlé EMPLOYEES >330,000 INDUSTRY Food & Beverage
Purpose
Customer complaints and PR crises have become more difficult to handle thanks to social media. To better
keep track of customer sentiment and what is being said about the company online, Nestle created a 24/7
monitoring centre to listen to all of the conversations about the company and its products on social media. The
company will actively engage with those that post about them online in order to mitigate damage and build
customer loyalty.
COMPANY McDonald’s EMPLOYEES >750,000 INDUSTRY Food & Beverage
Purpose:
McDonalds tracks vast amounts of data in order to improve operations and boost the customer experience.
The company looks at factors such as the design of the
drive-thru, information provided on the menu, wait times, the size of orders and ordering patterns in order to
optimize each restaurant to its particular market.
COMPANY Starbucks Coffee EMPLOYEES 160,000 INDUSTRY Food & Beverage
Purpose:
Starbucks collects data on its customers’ purchasing habits in order to send personalized ads and coupon
offers to the consumers’ mobile phones. The company also identifies trends indicating whether customers are
losing interest in their product and directs offers specifically to those customers in order to regenerate interest.
3. CUSTOMER SEGMENTATION
By accessing data about the consumer from multiple sources, such as social media data and transaction
history, companies can better segment and target their customers and start to make personalized offers to those
customers.
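A minimal sketch (the thresholds and customer records are invented) of simple rule-based segmentation from transaction history, so each segment can receive a different offer.

# Minimal sketch: assign each customer to a segment from purchasing behavior.
customers = [
    {"id": 1, "orders_per_year": 24, "avg_order_value": 80},
    {"id": 2, "orders_per_year": 2,  "avg_order_value": 300},
    {"id": 3, "orders_per_year": 1,  "avg_order_value": 15},
]

def segment(c: dict) -> str:
    if c["orders_per_year"] >= 12:
        return "loyal"          # frequent buyers: retention and rewards offers
    if c["avg_order_value"] >= 200:
        return "high value"     # infrequent but big spenders: premium offers
    return "occasional"         # everyone else: broad promotional campaigns

for c in customers:
    print(c["id"], segment(c))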
COMPANY Heineken EMPLOYEES 64,252 INDUSTRY Food & Beverage
Purpose:
Thanks to its partnerships with Google and Facebook, Heineken has access to vast amounts of data about its
customers that it uses to create real-time, personalized marketing messages. One project provides real-time
content to fans who happen to be watching a sponsored event.
COMPANY Walmart EMPLOYEES 2,000,000 INDUSTRY Retail
Purpose:
Walmart combines public data, social data and internal data to monitor what customers and friends of
customers are saying about a particular product online. The retailer uses this data to send targeted messages
about the product, and to share discount offers. Walmart also uses data analysis to identify the context of an
online message, such as if a reference to "salt" is about the movie or the condiment.
COMPANY Spotify EMPLOYEES 5,000 INDUSTRY Entertainment
Purpose:
Spotify uses data from user profiles and users’ playlists, and historical data on music played to provide
recommendations for each user. By combining data from millions of users, Spotify is able to make
recommendations even if a particular user doesn’t have an extensive history with the site.
COMPANY Nordstrom EMPLOYEES 48,000 INDUSTRY Retail
Purpose:
Nordstrom collects data from its website, social media, transactions and customer rewards program in order to
create customized marketing messages and shopping experiences for each customer, based on the products
and channels that customer prefers.
COMPANY Intercontinental Hotel Group EMPLOYEES 7,981 INDUSTRY Hotel/Travel
Purpose:
IHG collects extensive data about their customers in order to provide a personalized web experience for each
customer, so as to boost conversion rates. It also uses data analytics to evaluate and adjust its marketing mix
4. PREDICTIVE SUPPORT
Through sensors and other machine-generated data, companies can identify when a malfunction is likely to
occur. The company can then preemptively order parts and make repairs in order to avoid downtime and lost
profits.
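A minimal sketch (the readings and threshold are invented) of the idea: watch a stream of sensor readings and raise a maintenance alert when a rolling average drifts toward a failure threshold, so repairs can be scheduled before downtime.

# Minimal sketch: predictive maintenance alert from a rolling sensor average.
from collections import deque

FAILURE_THRESHOLD = 90.0          # e.g. bearing temperature in degrees Celsius
window = deque(maxlen=5)          # rolling window of the most recent readings

readings = [71, 72, 74, 73, 78, 81, 84, 88, 91, 93]
for i, value in enumerate(readings):
    window.append(value)
    rolling_avg = sum(window) / len(window)
    if rolling_avg >= FAILURE_THRESHOLD * 0.95:   # alert before the threshold is hit
        print(f"reading {i}: rolling average {rolling_avg:.1f} -> schedule maintenance")
        break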
COMPANY Southwest Airlines EMPLOYEES >45,000 INDUSTRY Travel
Purpose:
Southwest analyses sensor data on their planes in order to identify patterns that indicate a potential
malfunction or safety issue. This allows the airline to address potential problems and make necessary repairs
without interrupting flights or putting passengers in danger.
COMPANY Engine Yard EMPLOYEES 130 INDUSTRY Cloud Storage
Purpose:
Engine yard provides big data analytics to its users, so they can monitor the performance of applications in
real time, pinpoint problems with the infrastructure and optimize the platform to correct performance issues.
COMPANY Union Pacific Railroad EMPLOYEES 44,000
INDUSTRY Transportation
Purpose:
With predictive analytics and tools such as visual sensors and thermometers, Union Pacific can detect
imminent problems with railway tracks in order to predict potential derailments days before they would likely
occur. So far the sensors have reduced derailments by 75 percent.

COMPANY Morgan Stanley EMPLOYEES 60,000 INDUSTRY Financial

Purpose:
Morgan Stanley uses real-time wire data analytics to detect problems in its applications and prioritize which
issues should be addressed first. The company also uses big data to determine the impact of a particular
market event, as well as its original cause.

COMPANY Purdue University INDUSTRY Education EMPLOYEES 40,000 students 6,600 staff

Purpose:
Purdue University uses big data analytics for a unique kind of predictive support. Its system predicts academic
and behavioral issues so that students and teachers can be notified when changes need to be made in order for
the student to be successful.

5. MARKET BASKET ANALYSIS & PRICING OPTIMIZATION

By quickly pulling data together from multiple sources, retailers can better optimize their product selection
and pricing, as well as decide where to target ads.
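A minimal sketch (the baskets are invented) of market basket analysis: count how often pairs of products appear in the same basket, so frequently co-purchased items can be priced or placed together.

# Minimal sketch: pairwise co-occurrence counts across shopping baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, "bought together in", count, "baskets")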

COMPANY Procter & Gamble EMPLOYEES 129,000 INDUSTRY Household Retail

P&G uses simulation models and predictive analytics in order to create the best design for its products. It
creates and sorts through thousands of iterations in order to develop the best design for a disposable diaper,
and uses predictive analytics to determine how moisture affects the fragrance molecules in dish soap, so the
right fragrance comes out at the right time in the dishwashing process.

COMPANY Etihad EMPLOYEES more than 9,000 INDUSTRY Travel

As Etihad Airways seeks to expand internationally, it uses big data to determine which destinations and
connections should be added in order to maximize revenue.
COMPANY Coca-Cola Co. EMPLOYEES 146,200 INDUSTRY Food

Coca-Cola uses an algorithm to ensure that its orange juice has a consistent taste throughout the year. The
algorithm incorporates satellite imagery, crop yields, consumer preferences and details about the flavours that
make up a particular fruit in order to determine how the juice should be blended.

6. PREDICT SECURITY THREATS

Big data analytics can track trends in security breaches and allow companies to proactively go after threats
before they strike.

COMPANY Rabobank EMPLOYEES 27,000 INDUSTRY Finance

Rabobank analysed criminal activities at ATMs to determine factors that increased the risk of becoming
victimized. It discovered that proximity to highways, weather conditions and the season all affect the risk of a
security threat.

COMPANY Amazon EMPLOYEES110,000 INDUSTRY Online Retail

With more than 1.5 billion items in its catalog, Amazon has a lot of product to keep track of and protect. It
uses its cloud system, S3, to predict which items are most likely to be stolen, so it can better secure its
warehouses.

7. FRAUD DETECTION

Financial firms use big data to help them identify sophisticated fraud schemes by combining multiple points
of data.
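A minimal sketch (the amounts are invented) of one simple fraud signal: flag a transaction whose amount lies far from the customer's usual spending, measured in standard deviations. Real systems combine many such signals from multiple data sources.

# Minimal sketch: z-score check of a new transaction against spending history.
import statistics

history = [42.0, 55.0, 48.0, 61.0, 50.0, 45.0, 58.0]   # customer's past amounts
mean = statistics.mean(history)
stdev = statistics.stdev(history)

for amount in [52.0, 930.0]:
    z = (amount - mean) / stdev
    flag = "SUSPICIOUS" if abs(z) > 3 else "ok"
    print(f"amount={amount:.2f}  z-score={z:.1f}  -> {flag}")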

COMPANY Zion’s Bank EMPLOYEES 2,700 INDUSTRY Finance

Zions Bank uses data analytics to detect anomalies across channels that indicate potential fraud. The fraud
team receives data from 140 sources—some in real-time—to monitor activity, such as if a customer makes a
mobile banking transaction at the same
time as a branch transaction.

COMPANY Discovery Health EMPLOYEES 5,000 INDUSTRY Insurance

Discovery Health uses big data analytics to identify fraudulent claims and possible fraudulent prescriptions.
For example, it can identify if a healthcare provider is charging for a more expensive procedure than was
actually performed


COMPANY Memorial Healthcare INDUSTRY Healthcare EMPLOYEES Enterprise

Memorial Health Care uses data analytics to vet vendors and to uncover unethical activities, such as bid
rigging.

8. INDUSTRY SPECIFIC
Virtually every industry has invested in big data to help solve specific challenges those industries face.
Healthcare, for example, uses big data to improve patient outcomes, and agriculture uses data to boost crop
yields.

COMPANY Kayak EMPLOYEES 101 INDUSTRY Travel

Kayak uses big data analytics to create a predictive model that tells users if the price for a particular flight will
go up or down within the next week. The system uses one billion search queries to find the cheapest flights, as
well as popular destinations and the busiest airports. The algorithm is constantly improved by tracking the
flights to see if its predictions are correct.

COMPANY Aurora Health Care EMPLOYEES 30,000 INDUSTRY Health Care

Aurora collects internal as well as national data in order to create a benchmark for healthcare quality. It also
analyzes data on groups of patients with similar medical conditions, to reveal trends in the diseases and to
identify the right candidates for medical research. Finally, the real-time data analysis allows Aurora to predict
and improve patient outcomes, and so far has reduced readmissions by 10 percent.

COMPANY Catalyst IT EMPLOYEES 130+ INDUSTRY H.R/Recruiting Technology

Catalyst IT Services built a program to screen job candidates based on how the candidate completed a survey.
The program collects thousands of data points, such as how the candidate approaches a difficult question to
determine how the candidate works. Since implementing the program, employee turnover at the company has
been reduced to 15 percent.

COMPANY Shell EMPLOYEES 87,000 INDUSTRY Oil

Shell uses sensor data to map its oil and gas wells in order to increase output and boost the efficiency of its
operations. The data received from the sensors is analyzed by artificial intelligence and rendered in 3D and
4D maps.

COMPANY John Deere EMPLOYEES 60,000 INDUSTRY Farming

Sensors placed on John Deere equipment, along with historical and real-time data on soil conditions, the
weather and crop features are all used together to help farmers determine where and when to plant to get the
highest yield, and how to boost the efficiency of their work to reduce fuel costs.

CHARACTERISTICS OF BIG DATA APPLICATIONS

The availability of a low-cost, high-performance computing framework allows more users to develop these
applications, run larger deployments, or speed up execution time. The big data approach is mostly
suited to addressing or solving business problems that are subject to one or more of the following criteria:

1. Data throttling: The business challenge has existing solutions, but on traditional hardware, the
performance of a solution is throttled as a result of data accessibility, data latency, data availability, or
limits on bandwidth in relation to the size of inputs.

2. Computation-restricted throttling: There are existing algorithms, but they are heuristic and have
not been implemented because the expected computational performance has not been met with
conventional systems.
3. Large data volumes: The analytical application combines a multitude of existing large datasets and
data streams with high rates of data creation and delivery.

4. Significant data variety: The data in the different sources vary in structure and content, and
some (or much) of the data is unstructured.

5. Benefits from data parallelization: Because of the reduced data dependencies, the application's
runtime can be improved through task- or thread-level parallelization applied to independent data
segments (a minimal sketch follows this list).
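To make the data-parallelization criterion concrete, here is a minimal Python sketch; the dataset, segment size, and per-segment scoring function are hypothetical stand-ins chosen only for illustration, not anything prescribed by the source.

```python
from multiprocessing import Pool

def score_segment(segment):
    # Hypothetical per-segment computation with no cross-segment
    # dependencies, so every segment can be processed independently.
    return sum(value * value for value in segment)

if __name__ == "__main__":
    # Toy stand-in for a large dataset split into independent segments.
    data = list(range(1_000_000))
    chunk = 100_000
    segments = [data[i:i + chunk] for i in range(0, len(data), chunk)]

    # Task-level parallelism: each worker scores one segment at a time.
    with Pool(processes=4) as pool:
        partial_results = pool.map(score_segment, segments)

    print(sum(partial_results))
```

Because the segments share no data dependencies, the same pattern scales out to cluster-level frameworks; the speedup comes from the independence of the segments, not from any particular library.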

These criteria can be used to assess the degree to which business problems are suited to big data technology.
ETL processing, for example, is hampered by data throttling and computation throttling, can involve large data
volumes, may consume a variety of different types of datasets, and can benefit from data parallelization. This
makes it the equivalent of a big data "home run" application!

INDUSTRY EXAMPLES OF BIG DATA

BIG DATA APPLICATIONS IN DIFFERENT SECTORS

In this era where every aspect of our day-to-day life is gadget-oriented, a huge volume of data emanates from
various digital sources.

Needless to say, analyzing and studying such a huge volume of data with traditional data processing tools
poses many challenges. To overcome these challenges, big data solutions such as Hadoop were introduced.
These big data tools helped realize the applications of big data.

In this section, we will cover the following Big Data applications used in various sectors:

● Big Data in Education Industry


● Big Data in Healthcare Industry
● Big Data in Government Sector
● Big Data in Media and Entertainment
● Big Data in Weather Patterns
● Big Data in Transportation Industries
● Big Data in Banking Sector
● Big Data in Marketing
● Big Data in Business Insights
● Big Data in Space Sector
● Conclusion

More and more organizations, both big and small, are leveraging the benefits provided by big data
applications. Businesses find that these benefits can help them grow fast.

Big Data in Education Industry


The education industry is flooded with huge amounts of data related to students, faculty, courses, results, and
whatnot. Now, we have realized that proper study and analysis of this data can provide insights that can be
used to improve the operational effectiveness and working of educational institutes.

Following are some of the fields in the education industry that have been transformed by big data-motivated
changes:

Customized and Dynamic Learning Programs

Customized programs and schemes to benefit individual students can be created using the data collected based
on each student’s learning history. This improves the overall student results.

Reframing Course Material

Course material can be reframed according to data collected through real-time monitoring of the components
of a course, which shows what a student learns and to what extent. This is beneficial for the students.

Grading Systems

New advancements in grading systems have been introduced as a result of a proper analysis of student data.

Career Prediction

Appropriate analysis and study of every student’s records will help understand each student’s progress,
strengths, weaknesses, interests, and more. It would also help in determining which career would be the most
suitable for the student in the future.

The applications of big data have provided a solution to one of the biggest pitfalls in the education system,
that is, the one-size-fits-all fashion of academic set-up, by contributing to e-learning solutions.

Example of big data in the Education Industry

The University of Alabama has more than 38,000 students and an ocean of data. In the past, when there were
no practical solutions to analyze that much data, much of it went unused. Now, administrators can use
analytics and data visualizations on this data to draw out patterns among students, revolutionizing the
university's operations, recruitment, and retention efforts.


Big Data in Healthcare Industry

Healthcare is yet another industry that is bound to generate a huge amount of data. Following are some of the
ways big data has contributed to healthcare:

● Big data reduces the costs of a treatment since there are fewer chances of having to perform
unnecessary diagnoses.
● It helps in predicting outbreaks of epidemics and also in deciding what preventive measures could be
taken to minimize the effects of the same.
● It helps avoid preventable diseases by detecting them in the early stages and prevents them from getting
any worse, which in turn makes their treatment easy and effective.
● Patients can be provided with evidence-based medicine identified and prescribed after researching past
medical results.

Example of Big Data In Healthcare

Wearable devices and sensors have been introduced in the healthcare industry that can provide a real-time
feed to a patient's electronic health record. Apple is one company working on such technology.

Apple has come up with Apple HealthKit, CareKit, and ResearchKit. The main goal is to empower iPhone
users to store and access their real-time health records on their phones.

Big Data in Government Sector


Governments of every country come face to face with a huge amount of data almost daily, because they have
to keep track of various records and databases regarding their citizens, economic growth, energy resources,
geographical surveys, and much more. All this data contributes to big data. Proper study and analysis of this
data therefore helps governments in endless ways. A few of them are as follows:

Welfare Schemes

● To make faster and more informed decisions regarding various political programs
● To identify areas that are in immediate need of attention
● To stay up to date in the field of agriculture by keeping track of all existing land and livestock
● To overcome national challenges such as unemployment, terrorism, energy resource exploration, and
much more

Cyber Security

● Big data is hugely used for fraud and deceit detection in the domain of cyber security.
● It is also used in catching tax evaders.
● Cyber security engineers protect networks and data from unauthorized access.

Example

The Food and Drug Administration (FDA), which runs under the jurisdiction of the Federal Government of the
USA, leverages big data analysis to discover patterns and associations in order to identify and examine
expected or unexpected occurrences of food-based infections.

Big Data in Media and Entertainment Industry

With people having access to various digital gadgets, the generation of a large amount of data is inevitable and
this is the main cause of the rise in big data in the media and entertainment industry.

Other than this, social media platforms are another source of huge amounts of data. Businesses in the media
and entertainment industry have realized the importance of this data and have been able to benefit from it for
their growth.

Some of the benefits extracted from big data in the media and entertainment industry are given below:

● Predicting the interests of audiences


● Optimized or on-demand scheduling of media streams in digital media distribution platforms
● Getting insights from customer reviews
● Effective targeting of the advertisements

Example

Spotify, an on-demand music platform, uses Big Data Analytics to collect data from all its users around
the globe and then uses the analyzed data to give informed music recommendations and suggestions to every
individual user.

Amazon Prime, which offers videos, music, and Kindle books in a one-stop shop, also makes heavy use of big data.
Big Data in Weather Patterns

There are weather sensors and satellites deployed all around the globe. A huge amount of data is collected
from them, and then this data is used to monitor the weather and environmental conditions.

All of the data collected from these sensors and satellites contribute to big data and can be used in different
ways such as:

● In weather forecasting
● To study global warming
● In understanding the patterns of natural disasters
● To make necessary preparations in the case of crises
● To predict the availability of usable water around the world

Example

IBM Deep Thunder, which is a research project by IBM, provides weather forecasting through
high-performance computing of big data. IBM is also assisting Tokyo with improved weather forecasting for
natural disasters or predicting the probability of damaged power lines.

Big Data in Transportation Industry

Since the rise of big data, it has been used in various ways to make transportation more efficient and easy.
Following are some of the areas where big data contributes to transportation.

● Route planning: Big data can be used to understand and estimate users’ needs on different routes and
multiple modes of transportation and then utilize route planning to reduce their wait time.
● Congestion management and traffic control: Using big data, real-time estimation of congestion and
traffic patterns is now possible. For example, people are using Google Maps to locate the least
traffic-prone routes.
● The safety level of traffic: Using real-time processing of big data and predictive analysis to identify
accident-prone areas can help reduce accidents and increase the safety level of traffic.

Example

Let’s take Uber as an example here. Uber generates and uses a huge amount of data regarding drivers, their
vehicles, locations, every trip from every vehicle, etc. All this data is analyzed and then used to predict supply,
demand, location of drivers, and fares that will be set for every trip.

And guess what? We too make use of this application when we choose a route to save fuel and time, based on
our knowledge of having taken that particular route sometime in the past. In this case, we analyzed and made
use of the data that we had previously acquired through experience, and then we used it to make a smart
decision. It's pretty cool that big data plays a part not only in big fields but also in our smallest day-to-day
decisions.

Big Data in Banking Sector

The amount of data in the banking sector is skyrocketing every second. According to the GDC prognosis, this
data is estimated to grow 700 percent by the end of the next year. Proper study and analysis of this data can
help detect illegal activities and manage related risks, such as:

● Misuse of credit/debit cards


● Venture credit hazard treatment
● Business clarity
● Customer statistics alteration
● Money laundering
● Risk mitigation

Example

Various anti-money laundering software such as SAS AML uses Data Analytics in Banking to detect
suspicious transactions and analyze customer data. Bank of America has been a SAS AML customer for more
than 25 years.

Big Data in Marketing

Traditional marketing techniques were based on the survey and one-on-one interactions with the customers.
Companies would run advertisements on radios, TV channels, and newspapers, and put huge banners on the
roadside. Little did they know about the impact of their ads on the customer.
With the evolution of the internet and technologies like big data, this field of marketing also went digital,
known as Digital Marketing. Today, with big data, you can collect huge amounts of data and get to know the
choices of millions of customers in a few seconds. Business Analysts analyze the data to help marketers run
campaigns, increase click-through rates, put relevant advertisements, improve the product, and cover the
nuances to reach the desired target.

For example, Amazon collected data about the purchases made by millions of people around the world. It
analyzed the purchase patterns and payment methods used by customers and used the results to design
new offers and advertisements.

Big Data in Business Insights

One of the best Big Data applications we can see in modern industries is generating business insights. Around
60 percent of the total data collected by various enterprises and social media websites is either unstructured or
never gets analyzed. If used correctly, this data can solve a lot of problems related to profits, customer
satisfaction, and product development. Luckily, companies are now becoming aware of the importance
of using the latest technologies to manage and analyze this data more effectively.

Netflix, for example, uses Big Data to understand user behavior, the type of content users like, popular titles
on the platform, similar content that can be suggested to a user, and which series or movies it should
invest in.

Big Data in Space Sector

Space agencies of different countries collect huge amounts of data every day from observations of outer space,
from satellites orbiting the Earth, from probes studying outer space, and from rovers on other planets. They
analyze petabytes of data and use them to simulate the flight path before launching the actual payload into
space. Before launching any rocket, it is necessary to run complex simulations and consider various factors
like weather, payload, orbit location, trajectory, etc.

For example, NASA collects data from different satellites and rovers about the geography, atmospheric
conditions, and other characteristics of Mars for its upcoming missions. It uses big data tools to manage all that
data and analyzes it to run simulations.

Conclusion

In this section, we have seen some of the applications of big data in the real world.

No wonder there is so much hype for big data, given all of its applications. The importance of big data lies in
how an organization uses the collected data, not in how much data it has been able to collect. There are Big
Data solutions that analyze big data easily and efficiently, and these solutions are used to gain benefits from
the heaping amounts of data in almost all industry verticals.

WEB ANALYTICS

What is Web Analytics?

Web Analytics is the process of collecting, processing, and analyzing website data.

With Web analytics, we can truly see how effective our marketing campaigns have been, find problems in
our online services and make them better, and create customer profiles to boost the profitability of
advertisement and sales efforts.
Every successful business is based on its ability to understand and utilize the data provided by its customers,
competitors, and partners.

Benefits of Web Analytics


1. Measure online traffic
Web analytics will tell you:
● How many users and visitors you have on your website at any given time
● Where they come from
● What they are doing on the website
● How much time they are spending on the website
The analytics will break down all the sources of traffic and website conversions in an easily understandable
way. By analyzing the data provided, a company can recognize which activities contribute the most profit to
the bottom line.

For example, through data we learned the effects that ranking higher on Google Search had on a niche online
store. Analytics tracks in real time how organic and paid traffic have been developing over time, and this
helps a company invest its time and money more effectively.
2. Tracking Bounce Rate

Bounce Rate in analytics means that a user who has visited the website leaves without interacting with it.

A high bounce rate might tell us the following:

● The users didn't feel that the content was for them, or it didn't match well with the search query.
● The user experience overall is weak.

When a high bounce rate occurs on a website, it's hard to expect the website to produce quality leads, sales, or
any other business-related conversions. Tracking and improving the user experience, and making sure that the
content is what users want, will lower the bounce rate and increase the profitability of the website. Tracking
different exit pages in the analytics will reveal the worst-performing pages on the site.
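As a minimal sketch of the metric itself, bounce rate can be computed as the share of sessions with no interaction beyond the landing page; the session records below are hypothetical sample data, not output from any analytics tool.

```python
# Each session records how many interactions (further page views, clicks,
# form fills) the visitor performed after landing. Hypothetical sample data.
sessions = [
    {"session_id": "a1", "interactions": 0},
    {"session_id": "b2", "interactions": 3},
    {"session_id": "c3", "interactions": 0},
    {"session_id": "d4", "interactions": 1},
]

bounced = sum(1 for s in sessions if s["interactions"] == 0)
bounce_rate = bounced / len(sessions) * 100
print(f"Bounce rate: {bounce_rate:.1f}%")  # 50.0% for this sample
```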
3. Optimizing and Tracking of Marketing Campaigns

For different marketing campaigns, whether online or offline, you can create unique, specific links that can be
tracked. Tracking these unique links will provide you with details on how the marketing campaigns have been
received by users and whether they have been profitable. By tracking everything possible, you will find
potentially high-returning campaigns to invest more in, and you can cancel campaigns that are performing poorly.
You can easily create unique links with the Google Campaign URL Builder. Unique links also allow tracking of
offline-to-online campaigns. For example, a business could share a unique link at an event or use the link
in mailing campaigns whose effects could then be tracked online.
4. Finding the Right Target Audience and its Capitalization

In marketing, it's crucial to find the right target audience for your products and services. An accurate target
group will improve the profitability of marketing campaigns and leave a positive mark on the company itself.
Web analytics will provide companies with the information to create and find the right target audiences. Finding
the audience will help companies create marketing materials that leave a positive impression on their customers.
Running the right marketing campaigns for the right audiences will increase sales and conversions and make the
website better.
5. Improves and Optimizes Website and Web Services

With web analytics, a company will find potential problems on its website and in its services. For example, a bad
and unclear sales funnel on an online store will decrease the number of purchases, thus reducing revenue.
Users must find the right content at the right time when they are on the site. Creating specific landing pages
for different purposes could also help. Tracking the performance of the mobile version is one example of how
to create a better experience for users.
6. Conversion Rate Optimization (CRO)

Only through the utilization of web analytics can websites improve their conversion optimization. The goal
of CRO is to get users to complete the tasks set for them. The conversion rate is calculated by dividing the
number of completed goals by the number of users (a short calculation sketch appears below). There are many
conversions a website could measure, and every business should measure those that are most important to it.

A list of a few conversions anyone can start with:


● Every step of a sales funnel (add-to-carts, purchases, product views, etc.)
● Leads
● Newsletter sign-ups
● Registrations
● Video views
● Brochure downloads
● Clicks on text links
● Bids and offers
● Event registrations
● Time spent on the website
● Shares on social media
● Contacts from contact forms
By improving conversion rates with web analytics, a company will improve its website's profitability and return
on investment. Taking conversion optimization into consideration is always necessary, especially when
increasing visitor numbers won't cut it anymore; it's impossible to grow the number of users forever.
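The short sketch below shows the conversion-rate calculation described above, completed goals divided by the number of users, computed per goal; the user count and goal tallies are hypothetical figures, not real analytics output.

```python
# Hypothetical tallies as they might appear in an analytics export.
users = 12_500
completed_goals = {
    "newsletter_signup": 430,
    "add_to_cart": 1_250,
    "purchase": 310,
}

for goal, count in completed_goals.items():
    conversion_rate = count / users * 100  # completed goals / users
    print(f"{goal}: {conversion_rate:.2f}%")
```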
7. Tracking business goals online

A thriving business and its website have to have clear goals they try to achieve. With web analytics, companies
can create specific goals to track. Measuring goals actively allows a company to react faster to certain events
through data.

As important as creating goals is, it's also important to know which goals any given business should track. Not
every goal online is created equal, so tracking too many goals could become an issue for a business. Always
track goals that measure the effectiveness, profitability, and weaknesses of certain events.

8. Improve the results from Google Ads and Facebook ads

Analytics has a major role when it comes to managing online advertisements. The data tells us how many clicks
and conversions the online advertisements have produced, and how the ads have been received by the target
audience.

For example, discovering through data the most common Google Ads mistakes can drastically
improve your results and increase the efficiency of your ads.
Efficient data collection will increase the results of online advertisements. Web analytics also enables the use of
remarketing in advertisements.

9. Starting is easy

For most companies and websites, using Google Analytics will be enough. Google Analytics is a free web
analytics tool that is fairly simple to install on any platform. Google Analytics will quickly give you an
overview of how your online business is performing.

10. New Creative Ideas

Analyzing data gives a unique opportunity to find new perspectives within your business model.

Tracking your data will provide you with more insights about trends and customer experiences within your
business. These opportunities could potentially be seeds for growth internally and organically.

For example, a newly written article might bring in more organic traffic than the rest of the site. Knowing
this early on could shift your marketing efforts onto a more profitable path.

Free Tools of Web Analytics

In web analytics, there is a multitude of different tools with specialized purposes for tracking almost anything
online. There are free and paid tools for tracking general traffic and even more specific goals. The most common
tools, like Google Analytics and Google Search Console, should be used by everyone who runs a website.
Using other tools requires more thought about whether they are really necessary for a business. Having more
data does not automatically mean better and improved results. The worst-case scenario is that excess data will
lead to bad decisions.

Choose only the tools you need to achieve a goal. Google Analytics is the foundation of web analytics.
Google Analytics (GA) is the most important tool for websites to start collecting online data for web
analytics.

Without the tool, it can become extremely hard to understand the following:

● Time spent on the website
● What is done on the website
● The breakdown of traffic sources
● The effectiveness of marketing campaigns

The sooner Google Analytics is installed, the sooner a business can make data-driven decisions that drive its
online business.

Passively collecting data helps to recognize how business decisions have impacted the bottom line online. It’s
important to understand what actions have produced the most for the company and reproduce and improve
them.

As an example, one can study how different digital marketing investments have produced traffic to a website
over time and compare it to the period before the investments. Combining that data with current goals makes it
possible to decide where and how much to invest next.

Google Tag Manager

Google Tag Manager (GTM) is a fairly simple tool that lets you install and manage various web analytics and
marketing tools without coding. Tag Manager enables a quick way to measure website events that can then be
used in analytics. Managing multiple scripts or tags on a website can quickly become overwhelming and
time-consuming, so using GTM is recommended.

Installing Google Tag Manager can also improve site speed, because instead of multiple tags and scripts there
is only one. An added benefit of using GTM is that it removes the need to contact the website developer every
time you want to test new tags.

Facebook Pixel

Facebook Pixel is a valuable tool for web analytics. It provides more data with a different perspective for web
analytics.

The Pixel easily tracks important events like purchases, leads, and revenue, among many others. The Pixel is
used for creating better Facebook advertising campaigns: the more data that has been collected with the Pixel,
the better the advertising audiences that can be created. The data will make Facebook advertising more efficient.
The Pixel also enables the use of Facebook Analytics dashboards. Currently, you would otherwise need to use
Facebook Insights, which doesn't need the Pixel to get started. Lookalike audiences in Facebook marketing
require the use of the Pixel, and it also enables remarketing.

Hotjar

The Hotjar tool presents visual data on how users behave on a website. It provides a better understanding
of your users when real-time data is present. The heatmaps of Hotjar, for example, tell you the following:
how a user has reacted to certain elements, whether the user reacts the way the website was designed to be
used, and how the user reacts to the goals given to them.

Using the tool will help you quickly see what works and what doesn’t to make your website a better
experience for the users. Through Hotjar, it’s possible to create queries that can be optimized for different
target groups. The collected data will help in planning the necessary changes.

The tool is only free for the first 2,000 page views per day, though (as of 2020).

Web Analytics as a Service

In our service for web analytics, we start by studying your company and your current goals. Depending on the
availability of the data, we start by either analyzing your website and its traffic or installing the necessary
tools to start collecting data.

Next, we design and provide your business with actionable ideas to help you develop a better business online.
Our goal is to find ideas that can bring your company to the next level today and tomorrow.

Through A/B testing, we find agile, sustainable, and creative ideas to match your goals. Every result we find
will be told precisely and transparently.

We help companies find their core strengths online and develop them.

We combine online data that matters:

● Google and Facebook Ads


● Organic traffic to improve search engine optimization.
● Social media marketing activity and traffic
● Brand monitoring online
● Website analytics
● Email marketing analytics

BIG DATA AND MARKETING



What is big data marketing?
In marketing, big data comprises gathering, analyzing, and using massive amounts of digital information to
improve business operations.
A Big Data and Integrated Marketing Management strategy makes an impact in the following areas:
● Customer engagement. Big data can deliver insight into not just who your customers are, but where
they are, what they want, how they want to be contacted, and when.
● Customer retention and loyalty. Big data can help you discover what influences customer loyalty and
what keeps them coming back again and again.
● Marketing optimization/performance. With big data, you can determine the optimal marketing spend
across multiple channels, as well as continuously optimize marketing programs through testing,
measurement, and analysis.

How is Big Data transforming marketing and sales?


In marketing, big data comprises gathering, analyzing, and using massive amounts of digital information to
improve business operations, such as:
● Getting a 360-degree view of their audiences. The concept of “know your customer” (KYC) was
initially conceived many years ago to prevent bank fraud. KYC provides insight into customer
behavior that was once limited to large financial institutions. Now, because of the accessibility of big
data, the benefits of KYC are available to even small and medium businesses, thanks to cloud
computing and big data.
● Customer engagement, specifically how your customers view and interact with your brand, is a key
factor in your marketing efforts. Big data analytics provides the business intelligence you need to bring
about positive change, like improving existing products or increasing revenue per customer.
● Brand awareness is another way big data can have a significant impact on marketing. Aberdeen
Group’s Data-Driven Retail study showed that “data-driven retailers enjoy a greater annual increase in
brand awareness by 2.7 times (20.1% vs. 7.4%) when compared to all others.”
● The 360-degree view from big data allows marketers to present customer-specific content when and
where it is most effective to improve online and in-store brand recognition and recall. Big data allows
you to be the Band-Aid of your product category even if you don’t have the marketing budget of
Johnson & Johnson.
● Improved customer acquisition is another great benefit that big data brings to marketing. A McKinsey
survey found that “intensive users of customer analytics are 23 times more likely to clearly outperform
their competitors in terms of new customer acquisition.” Leveraging the cloud allows for the gathering
and analysis of consistent and personalized data from multiple sources, such as web, mobile
applications, email, live chat, and even in-store interactions.
● Big data can help marketers leverage real-time data in cloud computing environments. The ability of
big data to acquire, process, and analyze real-time data quickly and accurately enough to take
immediate and effective action cannot be matched by any other technology. This is critical when
analyzing data from GPS, IoT sensors, clicks on a webpage, or other real-time data.
● Big data analytics is an essential component of big data. It provides business intelligence that results in
time and cost savings by optimizing marketing performance.
Three types of big data for marketers
Marketers are interested in three types of big data: customer, financial, and operational. Each type of data is
typically obtained from different sources and stored in different locations.
1. Customer data helps marketers understand their target audience. The obvious data of this type are facts
like names, email addresses, purchase histories, and web searches. Just as important, if not more so, are
indications of your audience’s attitudes that may be gathered from social media activity, surveys, and
online communities.
2. Financial data helps you measure performance and operate more efficiently. Your organization’s sales
and marketing statistics, costs, and margins fall into this category. Competitors’ financial data such as
pricing can also be included in this category.
3. Operational data relates to business processes. It may relate to shipping and logistics, customer
relationship management systems, or feedback from hardware sensors and other sources. Analysis of
this data can lead to improved performance and reduced costs.
Real-life examples of big data in marketing
Use cases for big data possibilities are inspirational, but what does big data in marketing look like in the real
world? These examples show how three companies improved their marketing success using big data.
Elsevier uses big data to streamline a marketing calendar
Elsevier is the world’s largest provider of scientific, technical, and medical information, publishing 430,000
peer-reviewed research articles annually.
Big data and a multi-cloud environment provide an efficient way to closely track journals and books
throughout their lifecycle and more effectively schedule resources to streamline production and support
marketing. Those articles come from a wide variety of resources across the global organization. Combining
big data from multiple clouds and sources across the globe merges many regional marketing efforts into a
single global marketing message strategy.
DMD Marketing Corp. outperforms competition 3x with big data
DMD Marketing Corp. offers the only authenticated database available that can reach, report, and respond to
the dynamic digital behavior of more than six million fully opted-in U.S. healthcare professionals. To date,
DMD has deployed more than 300 million emails and 30,000 email marketing campaigns.
Given that marketing emails to healthcare professionals is a very competitive commodity business, big data
gives DMD a way to differentiate. Using cloud-based big data integration tools, DMD refreshes email data
every day, rather than every three days, which helps the company outpace the competition with 95% email
deliverability.
Big data gives Beachbody near real-time user behavior to reduce customer churn
Beachbody provides world-class fitness, nutrition, motivation, and support to over 23 million customers.
Their business is all about the customer experience; keeping people motivated and matching them with the
content that keeps them coming back for more.
You may be familiar with Beachbody’s on-demand videos, but they also offer live sessions at gyms. Big data
has enabled the company to acquire near real-time consumer behavior in fitness centers. Combined with
analysis from online data sources, Beachbody’s big data allows the brand to create more personalized offers
for customers and decrease customer churn.

Challenges of big data in marketing


Beachbody leveraged the customer 360 view to better understand their customers. While it is one of the
benefits of big data, it is also one of the most challenging to get right.
While 88 percent of IT leaders believe their organization truly understands its customers, only 61 percent of
consumers feel companies understand their needs. Clearly, there is a disconnect between these perceptions
that must be addressed.
1. Disparate data systems
One possible cause of the disconnect is the time to acquire data from a variety of sources. Users’ perceptions
are immediate, so the greater the lag in data acquisition time the greater the disconnect. This is especially
challenging for marketers because the disconnect time makes customer personalization less effective.
Organizations often have a mix of systems that store and process their data. Gathering data from these
disparate systems, often through multiple channels, is a challenge that can easily delay data analysis,
compromise security and compliance, and hinder efficiency.
One way to address this is with customer master data management. Customer MDM is a method to link all
customer data to a single golden record that provides a 360-degree view of the customer, and then share that
information where and when needed. This greatly decreases the time to acquisition.
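As a loose illustration of the golden-record idea (the record layout, the use of email as the matching key, and the "first non-empty value wins" survivorship rule are all simplifying assumptions, not the behavior of any particular MDM product), customer records from disparate systems can be merged onto a single key:

```python
# Hypothetical customer records arriving from separate systems (CRM,
# web shop, support desk), matched here by email address.
records = [
    {"email": "ana@example.com", "source": "crm", "name": "Ana Perez", "phone": None},
    {"email": "ana@example.com", "source": "shop", "name": None, "phone": "555-0101"},
    {"email": "ana@example.com", "source": "support", "name": "A. Perez", "phone": None},
]

golden = {}
for record in records:
    key = record["email"]
    merged = golden.setdefault(key, {"email": key})
    for field, value in record.items():
        # Simple survivorship rule: keep the first non-empty value seen.
        if field not in ("email", "source") and value and field not in merged:
            merged[field] = value

print(golden)
# {'ana@example.com': {'email': 'ana@example.com', 'name': 'Ana Perez', 'phone': '555-0101'}}
```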
2. Streaming data sources
The challenges in acquiring data are even greater in the case of streaming data. IoT systems can have
hundreds of sensors, so the quantity of streaming data can be quite demanding, even on big data systems. In
addition to acquiring the data, you also need real-time event processing to make use of it. As marketers invest
more and more in the possibility of reaching a target audience through IoT devices, they need cloud-native big
data tools to effectively handle the influx of streaming data.
Some streaming data, like GPS, website clicks, and video viewer interaction, are directly related to customer
behaviors that provide essential marketing data. These challenges can be addressed using tools that are
currently available on major cloud platforms like AWS, Azure, and Google Cloud, allowing marketers to get
the full benefit of streaming data from these big data cloud platforms.
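A minimal, framework-free sketch of the windowing step behind real-time event processing follows; the click events and the ten-second window size are hypothetical, and a production deployment would rely on the managed streaming services of the cloud platforms mentioned above rather than a hand-rolled loop.

```python
from collections import defaultdict

# Hypothetical click events: (timestamp in seconds, page visited).
events = [
    (1, "/home"), (2, "/pricing"), (4, "/home"),
    (11, "/signup"), (12, "/home"), (23, "/pricing"),
]

WINDOW_SECONDS = 10
windows = defaultdict(lambda: defaultdict(int))

for ts, page in events:
    window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
    windows[window_start][page] += 1  # clicks per page per time window

for start in sorted(windows):
    print(f"window {start}-{start + WINDOW_SECONDS}s:", dict(windows[start]))
```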
3. Cross-department cooperation
The three elements of any successful transformation are people, process, and technology. Technology is not
the only challenge with big data in marketing. Big data adoption requires the involvement of different teams
within an organization. Yet each team requires its own view and has its own use of the data.
Marketers can only benefit from big data if analysis of that data is accessible and efficient. Big data and
multi-cloud environments make that possible. It allows IT and other data management departments to use
their own tools in their own environments, while making crucial information accessible to other departments.
This is very evident in comparing the IT and business teams. IT teams need complex tools with extensive user
interfaces. Business teams need focused, simple, yet powerful tools. There is no compromise that will work
for both teams. Separate tools must run for each team to work effectively.
Without a single tool that meets the needs of different teams, we need multiple tools that communicate with each
other, an approach known as collaborative data management (CDM). The CDM system allows different teams to
share, operate on, and transfer data, each using a user interface that suits their specific needs. This allows each
team to use the tools they need while maintaining data quality.
How the cloud is driving big data for marketing
It is hard to imagine practical implementations of big data in any industry without cloud computing. Big
data’s demand for compute power and data storage are difficult to meet without the on-demand, self-service,
pooled resource, and elastic characteristics of cloud computing. Beyond those basic characteristics,
innovations in cloud computing continue to provide benefits to marketing initiatives using big data.
As it does for big data, cloud computing facilitates the use of virtual machines and containers. This provides
portability of workloads that would not be possible without the cloud. It gives marketing teams the flexibility
to move workloads, avoid vendor lock-in, reduce costs, and innovate new solutions that physical
infrastructure cannot provide.
In addition to the benefits inherent in cloud technology, cloud service providers like AWS, Azure, and Google
Cloud provide extensive marketplaces that make it easy to buy, install, and run big data tools for marketing.
While the “one-click” simplicity often touted may be a bit of an exaggeration, many of these tools can be up
and running in a matter of minutes.
Getting started with big data in marketing
Big data gives us eyes and ears into our marketing initiatives. It captures insights into our prospects and
customers at a level of detail never before possible. We can respond to real-time audience actions and drive
customer behavior in the moment. Big data is transforming marketing and sales in ways that were
unachievable just a few years ago.
Talend Master Data Management combines the power of MDM and data integration to deliver a single view
of your data across internal and external sources in real time. By creating and sharing unified 360 views of
data records, you can make the right decisions for your business at the right time, all the time. It allows you to
develop business cases based on clear and quantifiable business benefits and concrete operational outcomes.
Marketers today have the tools and know-how to launch highly effective big data marketing efforts, enabled
by cloud technology that lets us do it quickly and relatively easily at a reasonable cost. There will be
challenges, but there is a collection of lessons learned on how to tackle those challenges. AWS, Azure, and
Google have been proactive in facilitating big data initiatives to make the effort even easier.
Talend Data Fabric allows you to integrate and analyze data from almost any source, and pre-built connectors
to applications like Salesforce, Marketo, SAP, and Netsuite make building those connections incredibly easy.
Built-in data quality and governance functions mean you are using the best data to create the most trustworthy
insights. There has never been a better time to leverage big data in marketing.

FRAUD AND BIG DATA

Fraud Detection and Prevention Definition


Banking and healthcare fraud account for tens of billions of dollars in losses annually, which results in
compromised financial institutions, personal impact for bank clients, and higher premiums for patients. Fraud
detection and prevention refers to the strategies undertaken to detect and prevent attempts to obtain money or
property through deception.

What is Fraud Detection and Prevention?


Fraudulent activities can encompass a wide range of cases, including money laundering, cybersecurity threats,
tax evasion, fraudulent insurance claims, forged bank checks, identity theft, and terrorist financing, and they
are prevalent throughout the financial, government, healthcare, public, and insurance sectors.
To combat this growing list of opportunities for fraudulent transactions, organizations are implementing
modern fraud detection and prevention technologies and risk management strategies, which combine big data
sources with real-time monitoring, and apply adaptive and predictive analytics techniques, such as Machine
Learning, to create a risk of fraud score.
Detecting fraud with data analytics, fraud detection software and tools, and a fraud detection and prevention
program enables organizations to predict conventional fraud tactics, cross-reference data through automation,
manually and continually monitor transactions and crimes in real time, and decipher new and sophisticated
schemes.
Fraud detection and prevention software is available in both proprietary and open source versions. Common
features in fraud analytics software include: a dashboard, data import and export, data visualization, customer
relationship management integration, calendar management, budgeting, scheduling, multi-user capabilities,
password and access management, Application Programming Interfaces (API), two-factor authentication,
billing, and customer database management.
Fraud Detection and Prevention Techniques
Fraud data analytics methodologies can be categorized as either statistical data analysis techniques or artificial
intelligence (AI).

Statistical data analysis techniques include:
1. Statistical parameter calculation, such as averages, quantiles, and performance metrics
2. Regression analysis - estimates relationships between independent variables and a dependent variable
3. Probability distributions and models
4. Data matching - used to compare two sets of collected data, remove duplicate records, and identify links
between sets
5. Time-series analysis
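As a hedged sketch of the first technique in the list above, the snippet below uses simple statistical parameters (mean and standard deviation) to flag transaction amounts that sit far outside an account's historical baseline; the amounts and the three-standard-deviation threshold are illustrative assumptions, not a production fraud rule.

```python
import statistics

# Hypothetical historical amounts for one account (the "normal" baseline).
history = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 49.9]
mean = statistics.mean(history)
stdev = statistics.stdev(history)

# New transactions to screen: flag anything far outside the baseline.
new_transactions = [51.2, 4_900.0, 39.5]
flagged = [amount for amount in new_transactions
           if stdev and abs(amount - mean) / stdev > 3]
print(flagged)  # [4900.0]
```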

AI techniques include:
● Data mining - data mining for fraud detection and prevention classifies and segments data groups in
which millions of transactions can be performed to find patterns and detect fraud
● Neural networks - suspicious patterns are learned and used to detect further repeats
● Machine Learning - machine learning for fraud analytics automatically identifies characteristics associated
with fraud
● Pattern recognition - detects patterns or clusters of suspicious behavior

The four most crucial steps in the fraud prevention and detection process include:
● Capture and unify all manner of data types from every channel and incorporate them into the analytical
process.
● Continually monitor all transactions and employ behavioral analytics to facilitate real-time decisions.
● Incorporate analytics culture into every facet of the enterprise through data visualization.
● Employ layered security techniques.

Traditional and Novel Fraud Detection Methods


Fraud Detection Using Big Data Analytics
Fraud detection and prevention analytics relies on data mining and Machine Learning, and is used in fraud
analytics use cases such as payment fraud analytics, financial fraud analytics, and insurance fraud detection
analytics. Data mining reveals meaningful patterns, turning raw, big datasets into valuable information.
Machine Learning then submits that information to either Supervised or Unsupervised algorithms.
Supervised Machine Learning algorithms, such as logistic regression and time-series analysis, learn from
historical data and identify patterns of interest that require further investigation. Unsupervised Machine
Learning algorithms, such as cluster analysis and peer group analysis,
examine data without any identified fraud and reveal new anomalies and patterns of interest. Data analysts
and scientists can then act on these anomalies.
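As a hedged sketch of the supervised approach (the features, labels, and transactions below are synthetic toy data; a real system would train on large labeled transaction histories with far richer features), a scikit-learn logistic regression can score new transactions for fraud risk:

```python
from sklearn.linear_model import LogisticRegression

# Synthetic training data: [amount, hour_of_day, is_foreign_merchant],
# labeled 1 for known fraud and 0 for legitimate transactions.
X_train = [
    [25.0, 14, 0], [60.0, 10, 0], [12.5, 9, 0], [80.0, 19, 0],
    [950.0, 3, 1], [1200.0, 2, 1], [700.0, 4, 1], [15.0, 13, 0],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)

# Score new transactions: the probability of the "fraud" class becomes
# a risk score that analysts can review or act on.
new_transactions = [[30.0, 15, 0], [1100.0, 3, 1]]
for tx, prob in zip(new_transactions, model.predict_proba(new_transactions)[:, 1]):
    print(tx, f"fraud risk: {prob:.2f}")
```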
Does HEAVY.AI Offer a Fraud Detection and Prevention Solution?
Mainstream analytics and fraud detection and prevention systems are programmed to flag unusual behavior,
but when seconds matter, real time fraud detection analytics and Machine Learning is crucial in very quickly
identifying and halting these fraudulent transactions.
HEAVY.AI accelerates existing fraud detection software and tools and fraud detection Machine Learning
models, enabling these tools to analyze massive transaction datasets with millisecond results. With an
immersive dashboard to cross-filter dozens of attributes, such as amount, merchant, location, and time,
forensic and fraud analysts can use HEAVY.AI to take an ultra fine-grain look into potentially fraudulent
transactions.
HEAVY.AI's data science platform enables data science analysts and researchers to visualize, analyze, and
interact with massive datasets to gain new insights aimed at preventing fraud in the healthcare, telecom,
federal, and finance industries.

Thus, big data analytics are used in fraud analytics. These tools enable the implementation of payment fraud
analytics, financial fraud analytics, and insurance fraud detection analytics.
What are the Common Problems in Big Data Analytics in Fraud Detection?
We mentioned the importance of big data analytics in detecting fraud. Although it makes it easier to detect
fraud, it can also bring some problems with it. Some of these problems can be listed as:
Unrelated or Insufficient Data: The data from transactions may come from many different sources. In
some cases, insufficient or irrelevant data can produce false results in fraud detection, or detection can be
based on inappropriate rules in the algorithm. Because of this risk of failure, companies may be hesitant to use
big data analytics and machine learning.
High Costs: Big data analytics and fraud detection systems incur costs such as software, hardware, the
components needed to keep these systems running, and the time spent maintaining them.
Dynamic Fraud Methods: As technology develops, fraud methods develop at the same pace. To keep up and
detect fraud, it is necessary to constantly monitor the data and feed the algorithms with new and accurate rules
and analytics.
Data Security: While processing data and making decisions with this analytics system, the security of the data
is also a concern, so data security must be checked continuously.

Solutions About Big Data Analytics Problems


It is necessary to filter out unnecessary data by processing the complex data coming from many channels with
appropriate analysis and big data analytics. This organized, prepared data is then given to the algorithms, which
ensure that fraudulent transactions are detected and that quick action is taken.
Monitoring access to this data, along with reports and alarms, from a single tool with simple, visual dashboards
prevents wasted money and time. Even though such a tool costs money up front, it provides far more benefit in
the long run than it costs, by preventing the fraudulent transactions it detects.
In conclusion, an engineering system should be established to analyze big data and to manage and control its
analytics. It is also necessary to ensure data security by involving cyber security experts. Most importantly,
using software such as Formica, which provides features such as data processing, analysis, inference, and
alarming for fraud within the company, saves the time and effort of analysts and engineers.

RISK AND BIG DATA


What are the risks of big data?
While it’s easy to get caught up in the opportunities big data offers, it’s not necessarily a cornucopia of
progress. If gathered, stored, or used wrongly, big data poses some serious dangers. However, the key to
overcoming these is to understand them. So let’s get ahead of the curve.
Broadly speaking, the risks of big data can be divided into four main categories: security issues, ethical issues,
the deliberate abuse of big data by malevolent players (e.g. organized crime), and unintentional misuse.
Big data’s security issues
The more data an organization collects, the more expensive and difficult it is to store safely. This is already a
problem. According to the Risk-Based Security Mid-Year Data Breach report, 4.1 billion records were
exposed through data breaches in the first half of 2019 alone. This highlights just how important data security
is, but also the challenges organizations face in keeping our data safe. The more data a company holds, the
higher the cost and practical burden of keeping it secure.
Related to this is the issue of privacy. Governments, social media giants, insurance companies, and healthcare
providers are just a handful of organizations that have unprecedented levels of access to our data. While
they’re bound by data protection laws (with the potential for huge fines) the increasing number of high profile
data breaches in the last few years shows that more action is needed. Organizations—especially big
tech—may have information on where we live, where we go, how we spend our money, and so on. With
personal bank details and other sensitive information under their protection, and cyberattacks on the rise, this
begs the question: just because companies can store vast amounts of data, does that mean they should? This
segues nicely into the next section…
Ethical issues with big data
Presuming organizations manage to keep our data safe from hackers and cyberattacks, that does not preclude
the possibility that they might misuse the information themselves. While data protection laws are in place,
there is still some grey area about how data can be used by companies who have obtained it legally.
Take insurance providers and credit card companies. It’s no revelation that these organizations impose
premiums and limits based on customer behaviors. For instance, if you’ve ever had a car accident, you’ll
know your car insurance premium goes up. Big data allows these companies to make ever-more refined
predictions about the future, allowing them to conduct ever-more invasive financial profiling.
Way back in 2009 (even before big data was as big as it is today) one man had his credit limit cut, simply
because other customers who shopped in the same stores as him had poor repayment histories. This is just one
small example of a murky area of big data use that has clear ethical implications. There are multiple other
ethical issues too, around consent, ownership, and privacy. These have resulted in the emergence of the Right
To Be Forgotten, which has led to new laws being introduced.
Abuse of big data by malevolent players
Another danger with big data is if third parties get their hands on sensitive information. In 2020, it’s estimated
that we’ll produce 2.5 quintillion bytes of data every day. That’s tough to visualize, but you can trust that it’s
an immense amount—far more than any organization can easily manage or analyze. Nevertheless, hackers and
cyberattackers can target this data to sell on the DarkNet.
Phishing, bank fraud, and insurance scams are all common examples of how big data can be deliberately
misused by organized crime groups. The days of try-their-luck emails offering you a million dollars if you just
send through your bank details are long gone! If you’ve recently been the victim of a scam, you’ll know just
how sophisticated they can be.
Big data also plays a big part in the misinformation and spread of fake news that has characterized public
debate for the last half-decade. Nefarious organizations can use big data to target ads or fake news that aims to
influence our ideas, beliefs, and even who we vote for. The reason so much fake news is successful is because
it is well targeted and preys on people’s fears—all of which can be tracked (or at least inferred) from big data.
With the risks of data theft growing by the day, this issue remains to be solved.
Unintentional misuse of big data (including systematic errors)
While those deliberately seeking to abuse big data are one problem, not all dangers are necessarily
premeditated. Enter machine learning. This is a crucial tool for analyzing and extracting insights from big
data. However, while machine learning algorithms learn on their own, they must first be programmed how to
learn, which allows human bias to sneak into the algorithm. Human bias, as well as bad practice in data
analytics, or even just poor quality data, can lead to bad insights. If these insights are used to make important
financial or safety decisions (for example) there are going to be negative effects.
Since data science is a new field, we can’t yet predict how problems like these will evolve. The use of
artificial intelligence is rising, but there are unknown risks attached to this nascent technology. While it’s
unlikely that machines will rise to overthrow us any time soon, there are certainly risks associated with
artificial intelligence. AI can already do amazing things, but it has limitations. For example, it is not very
good at nuance and lacks the intuition of a human being. This can have tragic results, as illustrated by a
self-driving Uber car, which killed a woman in 2018. It turns out the accident occurred because the AI in
charge of the car did not understand that pedestrians sometimes jaywalk.
To avoid these kinds of risks in future, we must address systemic problems before the technology becomes
more widely adopted.
Examples of dangerous big data in action
Before looking at how we might tackle some of the problems big data poses, here are some real-world
examples of how it has been misused.
Big data and election interference
Probably the most obvious examples of big data misuse are the 2016 US Presidential Election and the 2016
Brexit referendum in the UK. Following shock results in both polls, Vote Leave in the UK and the Trump
Campaign in the US were linked to a shady data analytics firm called Cambridge Analytica. The now-defunct
firm used information illegally gathered from Facebook to inform the communications strategies for both
campaigns. Its impact has shaped the global political scene ever since.
Big data and state surveillance
The Chinese government is currently launching a new social credit system. Linked to each citizen’s
permanent record, it aims to promote good citizen behavior. ‘Good’ citizens, e.g. those giving to charity or
paying bills on time, will receive credit that can be exchanged for things like first-class airplane or train
tickets. Meanwhile, ‘bad’ citizens, e.g. those with traffic violations or unpaid debts, could receive
disincentives such as slower internet connections or reduced access to private education. This system, due to
be launched in 2020, clearly represents the dark potential of big data.
Big data and racial profiling
Once again, deliberate misuse is not the only danger of big data. A prime example is Amazon’s facial
recognition software, Rekognition. In 2018, the software incorrectly identified 28 members of the US
Congress as convicted criminals. While this highlighted an overall problem with the software, a
disproportionate number of those misidentified were people of color. This is not an isolated
incident—numerous studies have shown there is significant racial (and in some cases gender) bias within
these kinds of technologies.
How can we minimize the dangers of big data?
While big data poses clear dangers we cannot ignore, nor should we toss the baby out with the bathwater, so
to speak. Big data’s potential for positive change is huge. Luckily for us, it’s not a binary choice.
Big data analytics is a new discipline. Naturally, mistakes will be made. The key thing is to learn from these
mistakes and improve safety. By implementing security measures and ethical guidelines, we can reap big
data’s benefits while mitigating its risks. Here are a few ways that data analysts and data scientists can
advocate for safer use of big data.
Stay vigilant about security measures
For any curator of big data, it’s crucial to have effective security measures in place and to ensure that these are
up to date. One area where many organizations trip up is on their back-door security. While it’s common to
have well-guarded front ends, back-up data is often stored in disaster recovery systems or test environments
that are not always as well-protected.
Eliminate unnecessary information
One of the surest ways to prevent a data breach is not to have sensitive data in the first place. Many
companies stockpile data they don’t use, thinking it may be helpful in the future. However, by conducting
regular audits, organizations can keep the data necessary for their business operations, while purging what
remains. Good housekeeping has the added benefit of focusing analytics tasks where they’re most needed.
Check compliance with data legislation
Although we have data protection legislation to secure people’s data, many companies don’t fully comply
with it. For instance, in a 2019 survey by Talend, only 58% of global businesses were complying with GDPR
legislation. In order to protect data, companies need to invest properly in data protection and security, as well
as adhering to other guidelines. As a data analyst, it’s important to advocate for your organization’s
compliance with data protection measures.
A Hippocratic oath of big data
An individual company’s actions are important for big data security, but other initiatives are needed, too.
British mathematician and data scientist, Hannah Fry, has called on data scientists to take an ethical pledge.
The idea is much like the medical Hippocratic oath that doctors take to “do no harm.” Though controversial,
the idea of a Data Science Oath encourages discussion about the ethics of big data, which is no bad thing. In
conjunction, many data scientists are also lobbying governments to introduce stricter rules around how big
data can be collected, stored, and used.

Key takeaways
In this post, we’ve explored the benefits and risks of big data. To answer our initial question—“is big data
dangerous?”—in short, it’s only dangerous if we allow it to be. As we’ve seen:
1. Big data has vast potential—it can be used to glean ever more powerful insights and to transform the way
the world works.
2. Big data comes with security issues—security and privacy issues are key concerns when it comes to big
data.
3. Bad players can abuse big data—if data falls into the wrong hands, big data can be used for phishing,
scams, and to spread disinformation.
4. Insights are only as good as the quality of the data they come from—bad, noisy, or ‘dirty’ data (or
applying poor best practice) can lead to poor insights, which can be risky in the wrong situations.
5. There are ethical issues—as a new field, the ethics of big data is still evolving. This is why some are
pushing for a Data Science Oath and for ethical guidelines to be developed.
The battle between big data’s potential and its dangers remains ongoing. However, identifying and
acknowledging its potential risks goes a long way to resolving them. Ultimately, we all need to do our part to
promote a culture of integrity within data science. Putting safeguards in place, and regularly reviewing them,
is key.
