Big Data Introduction
Due to the advent of new technologies, devices, and means of communication such as social networking
sites, the amount of data produced by mankind is growing rapidly every year. The amount of data
produced from the beginning of time until 2003 was 5 billion gigabytes. If you piled up that data in
the form of disks, it could fill an entire football field. The same amount was created every two days
in 2011, and every ten minutes in 2013. This rate is still growing enormously. Although all this
information is meaningful and can be useful when processed, it is being neglected.
What is Big Data?
Big data is a collection of large datasets that cannot be processed using traditional computing
techniques. It is not a single technique or tool; rather, it has become a complete subject that
involves various tools, techniques, and frameworks.
What Comes Under Big Data?
Big data involves the data produced by different devices and applications. Given below are some of
the fields that come under the umbrella of Big Data.
Black Box Data − The black box is a component of helicopters, airplanes, jets, etc. It captures
the voices of the flight crew, recordings from microphones and earphones, and the performance
information of the aircraft.
Social Media Data − Social media such as Facebook and Twitter hold information and the
views posted by millions of people across the globe.
Stock Exchange Data − Stock exchange data holds information about the 'buy' and
'sell' decisions made by customers on the shares of different companies.
Power Grid Data − Power grid data holds information about the power consumed by a particular
node with respect to a base station.
Transport Data − Transport data includes model, capacity, distance and availability of a
vehicle.
Search Engine Data − Search engines retrieve lots of data from different databases.
Thus Big Data includes a huge volume, a high velocity, and an extensible variety of data. The data
in it will be of three types.
Structured data − Relational data.
Semi Structured data − XML data.
Unstructured data − Word, PDF, Text, Media Logs.
Businesses use big data to enhance B2B operations, advertising, and communication. Big data is
primarily being used by many industries, such as travel, real estate, finance, and insurance, to
enhance decision-making. Because big data reveals more information in a usable format, businesses
can use it to predict more accurately what customers want and do not want, as well as their
behavioural tendencies.
Big data provides business intelligence and cutting-edge analytical insights that help with decision-
making. A company can get a more in-depth picture of its target market by collecting more customer
data.
Business trends and behaviours are revealed by data-driven insights, which also help businesses
compete and grow by enhancing their decision-making. Additionally, these insights help companies
develop more specialised goods and services, strategies, and intelligent marketing campaigns to
compete in their sector.
According to surveys done by NewVantage and Syncsort (now Precisely), big data analytics has
helped businesses significantly cut their costs. In the NewVantage survey, 66.7% of participants
said that big data is being used to cut costs. Moreover, 59.4% of Syncsort survey participants
stated that using big data tools improved operational efficiency and reduced costs. Hadoop and
cloud-based analytics, two popular big data analytics tools, can also help lower the cost of
storing big data.
3. Detection of Fraud
Financial companies especially use big data to identify fraud. To find anomalies and transaction
patterns, data analysts use artificial intelligence and machine learning algorithms. These irregularities
in transaction patterns show that something is out of place or that there is a mismatch, providing us
with hints about potential fraud.
For credit unions, banks, and credit card companies, fraud detection is crucial for identifying misuse
of account information, materials, or product access. By spotting fraud before it causes problems, any
industry, including finance, can provide better customer service.
For instance, using big data analytics, banks and credit card companies can identify fraudulent
purchases or credit cards that have been stolen even before the cardholder becomes aware of the
issue.
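The idea of finding anomalies in transaction patterns can be illustrated with a minimal sketch. It
flags amounts that lie far from a cardholder's typical spending, using the median and the median
absolute deviation (MAD). The function name, the threshold of 3.5, and the sample amounts are
illustrative assumptions only; real fraud systems rely on far richer features and models.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    The median and MAD are used instead of mean and standard deviation
    because they are less distorted by the outliers we want to find.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread at all; this simple rule cannot rank anything
    flagged = []
    for i, a in enumerate(amounts):
        score = 0.6745 * abs(a - med) / mad  # modified z-score
        if score > threshold:
            flagged.append(i)
    return flagged

# Hypothetical card history: mostly small purchases and one large outlier.
history = [12.5, 40.0, 23.9, 18.2, 31.0, 27.4, 950.0, 22.1]
suspicious = flag_anomalies(history)  # indices of purchases worth reviewing
```

A flagged index is only a hint that something is out of place; it still has to be confirmed by
other checks or by the cardholder.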
4. A rise in productivity
A survey by Syncsort found that 59.9% of respondents said they were using big data analytics tools
like Spark and Hadoop to boost productivity. They have been able to increase sales and improve
customer retention as a result of this rise in productivity. Modern big data tools make it possible for
data scientists and analysts to analyse a lot of data quickly and effectively, giving them an overview
of more data.
This makes them more productive. Big data analytics also helps data scientists and analysts learn
more about their own work, so they can figure out how to be more effective in their tasks and job
responsibilities. As a result, investing in big data analytics gives businesses across all sectors
a chance to stand out through improved productivity.
Social media, email exchanges, customer CRM (customer relationship management) systems, and
other major data sources are the main sources of big data. As a result, it provides businesses with
access to a wealth of data about the needs, interests, and trends of their target market.
Big data also enables businesses to better understand the thoughts and feelings of their clients and
so provide them with more individualised goods and services. Providing a personalised experience can
increase client satisfaction, strengthen bonds with clients, and, most importantly, foster loyalty.
Increased business agility is a competitive benefit of big data. Big data analytics can assist
businesses in becoming more innovative and adaptable in the marketplace. Large customer data sets
can be analysed to help businesses gain insights ahead of the competition and more effectively
address customer pain points.
Having a wealth of data at their disposal also enables businesses to assess risks, enhance
products and services, and improve communications. In addition, big data assists businesses in
strengthening their business tactics and strategies, which are crucial in coordinating their operations
to support frequent and quick changes in the industry.
7. Greater innovation
Innovation is another common benefit of big data, and the NewVantage survey found that 11.6 per
cent of executives are investing in analytics primarily as a means to innovate and disrupt their
markets. They reason that if they can glean insights that their competitors don't have, they may be
able to get out ahead of the rest of the market with new products and services.
A study by AtScale found that for the past three years, the biggest challenge in this industry has been
a lack of big data specialists and data scientists. Given that it requires a different skill set, big data
analytics is currently beyond the scope of many IT professionals. Finding data scientists who are also
knowledgeable about big data can be difficult.
Data scientists and big data specialists are two well-paid professions in the data science industry. As
a result, hiring big data analysts can be very costly for businesses, particularly for start-ups. Some
businesses must wait a long time to hire the necessary personnel to carry out their big data analytics
tasks.
2. Security hazard
For big data analytics, businesses frequently collect sensitive data. This data needs to be protected,
and a security lapse can be damaging if the data is not properly safeguarded.
Additionally, holding enormous data sets can attract the unwanted attention of hackers, and
your company could become the target of a cyber-attack. For many businesses today, data breaches
are the biggest threat. Unless you take all necessary precautions, important information could
also be leaked to rivals, which is another risk associated with big data.
3. Compliance
Another disadvantage of big data is the requirement for legal compliance with governmental
regulations. To store, handle, maintain, and process big data that contains sensitive or private
information, a company must make sure that it adheres to all applicable laws and industry
standards. As a result, managing data governance tasks, transmission, and storage will become more
challenging as big data volumes grow.
4. High Cost
Because big data is a constantly evolving field whose goal is to process ever-increasing amounts
of data, only large companies can sustain the investment needed to develop their own Big Data
techniques.
5. Data quality
Dealing with data quality issues is a major drawback of working with big data. Data scientists and
analysts must ensure the data they are using is accurate, pertinent, and in the right format for analysis
before they can use big data for analytics efforts.
This significantly slows down the reporting process, but if businesses don't address data quality
problems, they may discover that the insights their analytics produce are useless or even harmful if
used.
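As a rough sketch of what checking that data is "accurate, pertinent, and in the right format" can
look like, the following Python validates individual records before they reach an analysis step.
The field names, the date format, and the rules are hypothetical assumptions chosen for
illustration; a real pipeline would take them from its own schema.

```python
from datetime import datetime

REQUIRED_FIELDS = ("customer_id", "amount", "date")  # hypothetical schema

def validate_record(record):
    """Return a list of problems found in one record (empty list = clean)."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Accuracy: the amount must be a non-negative number.
    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                problems.append("negative amount")
        except (TypeError, ValueError):
            problems.append("amount is not a number")
    # Format: the date must follow YYYY-MM-DD.
    date = record.get("date")
    if date not in (None, ""):
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append("date is not YYYY-MM-DD")
    return problems

records = [
    {"customer_id": "C1", "amount": "19.99", "date": "2023-04-01"},
    {"customer_id": "", "amount": "abc", "date": "01/04/2023"},
]
report = [validate_record(r) for r in records]
```

Records with a non-empty problem list can be fixed or set aside, so that the insights produced
later are not built on bad input.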
6. Rapid Change
The fact that technology is evolving quickly is another potential disadvantage of big data analytics.
Businesses must deal with the possibility of spending money on one technology only to see
something better emerge a few months later. This big data drawback was ranked fourth among all the
potential difficulties by Syncsort respondents.
1. Volume:
The name 'Big Data' itself is related to a size which is enormous.
Volume is a huge amount of data.
To determine the value of data, the size of the data plays a very crucial role. If the volume of
data is very large, then it is actually considered 'Big Data'. This means that whether particular
data can be considered Big Data or not depends upon its volume.
Hence, while dealing with Big Data it is necessary to consider the characteristic 'Volume'.
Example: In the year 2016, the estimated global mobile traffic was 6.2 exabytes (6.2 billion
GB) per month. Also, it was projected that by the year 2020 there would be almost 40,000 exabytes of data.
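The unit conversion in the example can be checked with a few lines of Python. This sketch assumes
decimal (SI) units, where 1 GB is 10^9 bytes and 1 exabyte is 10^18 bytes; the function name is
made up for illustration.

```python
BYTES_PER_GB = 10**9    # decimal gigabyte (assumed)
BYTES_PER_EB = 10**18   # decimal exabyte (assumed)

def exabytes_to_gigabytes(eb):
    """Convert a size in exabytes to gigabytes using decimal units."""
    return eb * BYTES_PER_EB / BYTES_PER_GB

# 6.2 exabytes per month, expressed in gigabytes (about 6.2 billion).
monthly_gb = exabytes_to_gigabytes(6.2)
```

One exabyte is therefore one billion gigabytes, which is why 6.2 exabytes and 6.2 billion GB
describe the same volume.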
2. Velocity:
Velocity refers to the high speed of accumulation of data.
In Big Data, data flows in at high velocity from sources like machines, networks, social media,
mobile phones, etc.
There is a massive and continuous flow of data. Velocity determines the potential of the data: how
fast it is generated and processed to meet demand.
Sampling data can help in dealing with issues like 'velocity'.
Example: More than 3.5 billion searches per day are made on Google. Also,
Facebook users are increasing by approximately 22% year on year.
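One common way to sample a fast, continuous flow of data is reservoir sampling, which keeps a
fixed-size uniform random sample without storing the whole stream. The text does not name a
specific sampling method, so the following is only one possible sketch; the function name and the
simulated event stream are assumptions.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Simulated high-velocity stream; only 5 items are ever held in memory.
events = (f"event-{n}" for n in range(100_000))
picked = reservoir_sample(events, k=5, rng=random.Random(42))
```

Because each item is seen once and then discarded, memory use stays constant no matter how fast
or how long the data keeps arriving.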
3. Variety:
It refers to the nature of data, which may be structured, semi-structured, or unstructured.
It also refers to heterogeneous sources.
Variety is basically the arrival of data from new sources that are both inside and outside of an
enterprise. It can be structured, semi-structured, or unstructured.
Structured data: This is organized data. It generally refers to data
that has a defined length and format.
Semi-structured data: This is semi-organized data. It is generally
a form of data that does not conform to the formal structure of data. Log files are
an example of this type of data.
Unstructured data: This is unorganized data. It generally
refers to data that doesn't fit neatly into the traditional row and column structure of
a relational database. Texts, pictures, videos, etc. are examples of unstructured
data, which can't be stored in the form of rows and columns.
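The difference between the three kinds of data shows up in how they are read by a program. The
sketch below uses only the Python standard library and made-up sample values: CSV stands in for
structured data, XML for semi-structured data, and a free-text sentence for unstructured data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
structured = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but fields may vary per record.
semi_structured = """
<users>
  <user id="1"><name>Asha</name><city>Pune</city></user>
  <user id="2"><name>Ravi</name></user>
</users>
"""
root = ET.fromstring(semi_structured)
users = [
    {"id": u.get("id"), "name": u.findtext("name"), "city": u.findtext("city")}
    for u in root.findall("user")
]  # the second user has no <city>, so its "city" value is None

# Unstructured: free text with no schema; only generic processing applies
# until some structure is extracted from it.
unstructured = "Asha wrote: the delivery was late but support was helpful."
word_count = len(unstructured.split())
```

The structured rows can be loaded straight into a table, the semi-structured records need
handling for missing fields, and the unstructured text needs further analysis before it fits
into rows and columns at all.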
4. Veracity:
It refers to inconsistencies and uncertainty in data; that is, the available data can
sometimes get messy, and its quality and accuracy are difficult to control.
Big Data is also variable because of the multitude of data dimensions resulting from multiple
disparate data types and sources.
Example: Data in bulk could create confusion, whereas too little data could convey only half
or incomplete information.
5. Value:
After taking the 4 V's into account, there comes one more V, which stands for Value. A bulk
of data having no value is of no good to the company unless you turn it into something
useful.
Data in itself is of no use or importance; it needs to be converted into something valuable to
extract information. Hence, you can state that Value is the most important V of all the 6 V's.
6. Variability:
How fast, and to what extent, is the structure of your data changing?
How often does the meaning or shape of your data change?
Example: It is as if you ate the same ice cream every day and its taste kept changing.
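One simple way to observe how often the shape of data changes is to compare the field names of
consecutive records. This is only an illustrative sketch; the function name and the sample feed
are hypothetical, and it measures changes in shape, not changes in meaning.

```python
def count_shape_changes(records):
    """Count how often the set of field names differs between consecutive records."""
    changes = 0
    previous = None
    for record in records:
        shape = frozenset(record)  # the record's field names, order ignored
        if previous is not None and shape != previous:
            changes += 1
        previous = shape
    return changes

feed = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
    {"user": "c", "clicks": 1, "device": "mobile"},  # a new field appears
    {"user": "d", "device": "tablet"},               # a field disappears
]
changes = count_shape_changes(feed)
```

A high count relative to the number of records suggests that downstream processing has to cope
with frequently shifting structure.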