Big Data Analytics

What is big data?

Big data is a combination of unstructured, semi-structured or structured data collected by organizations. These data sets can be mined to gain insights and used in machine learning projects, predictive modeling and other advanced analytics applications.
Big data can be used to improve operations, provide better customer service
and create personalized marketing campaigns -- all of which can increase value
for an organization. As an example, big data analytics can provide companies
with valuable insights into their customers that can then be used to refine
marketing techniques to increase customer engagement and conversion rates.
1. Structured Data: This is data which is in an organized form, for example in rows and columns. The number of rows is called the Cardinality and the number of columns is called the Degree of a relation. Sources: databases, spreadsheets, OLTP systems.
Working with structured data:
- Storage: Data types, both built-in and user-defined, help with the storage of structured data.
- Update/delete: Updating, deleting, etc. are easy due to the structured form.
- Security: Can be provided easily in an RDBMS.
- Indexing/Searching: Data can be indexed not only on a text string but on other attributes as well. This enables streamlined search.
- Scalability (horizontal/vertical): Scalability is generally not an issue as data grows, since resources can be increased easily.
- Transaction processing: ACID properties (Atomicity, Consistency, Isolation, Durability) are supported.
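To make cardinality and degree concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and its values are invented for illustration.

```python
import sqlite3

# In-memory relational database: structured data in rows and columns
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Degree of this relation = 3 (number of columns)
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks REAL)")
cur.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82.5), (2, "Ravi", 74.0), (3, "Meena", 91.0)],
)

# Cardinality = number of rows currently in the relation
cur.execute("SELECT COUNT(*) FROM students")
print("Cardinality:", cur.fetchone()[0])  # -> 3

# Indexing on a non-key attribute enables streamlined search
cur.execute("CREATE INDEX idx_marks ON students (marks)")
cur.execute("SELECT name FROM students WHERE marks > 80")
print(cur.fetchall())
conn.close()
```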
2. Semi-Structured Data: This is data which doesn't conform to a data model but has some structure. Metadata for this data is available but is not sufficient. Sources: XML, JSON, E-mail.
Characteristics:
- Inconsistent structure.
- Self-describing (label/value pairs).
- Schema information is blended with the data values.
- Data objects may have different attributes that are not known beforehand.
Challenges:
- Storage cost: Storing data together with their schemas increases cost.
- RDBMS: Semi-structured data cannot be stored in existing RDBMSs, as the data cannot be mapped into tables directly.
- Irregular and partial structure: Some data elements may have extra information while others have none at all.
- Implicit structure: In many cases the structure is implicit, so interpreting relationships and correlations is very difficult.
- Flat files: Semi-structured data is usually stored in flat files, which are difficult to index and search.
- Heterogeneous sources: Data comes from varied sources, which is difficult to tag and search.
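To see the "self-describing, varying attributes" idea concretely, here is a minimal sketch using Python's standard json module; the two records are invented for illustration.

```python
import json

# Two semi-structured records: label/value pairs, but the second record
# carries an attribute ("phone") the first one never declares.
raw = '''
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "email": "ravi@example.com", "phone": "+91-98xxxxxx"}
]
'''
records = json.loads(raw)

# Schema information is blended with the data values themselves,
# so we discover each record's attributes only at read time.
for rec in records:
    print(sorted(rec.keys()))
# ['email', 'name']
# ['email', 'name', 'phone']
```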
3. Unstructured Data: This is data which does not conform to a data model or is not in a form which can be used easily by a computer program. About 80-90% of an organization's data is in this format.
Sources: memos, chat rooms, PowerPoint presentations, images, videos, letters, research papers, white papers, the body of an email, etc.
Characteristics:
 Does not conform to any data model
 Can't be stored in the form of rows and columns
 Not in any particular format or sequence
 Not easily usable by a program
 Doesn't follow any rules or semantics
Challenges:
 Storage space: The sheer volume of unstructured data and its unprecedented growth make it difficult to store. Audio, video, images, etc. consume huge amounts of storage space.
 Scalability: Scalability becomes an issue as unstructured data grows.
 Retrieving information: Retrieving and recovering unstructured data is cumbersome.
 Security: Ensuring security is difficult due to the varied sources of data (e.g. e-mail, web pages).
 Update/delete: Updating, deleting, etc. are not easy due to the unstructured form.
 Indexing and searching: Indexing becomes difficult as the data grows, and searching is difficult for non-text data.
 Interpretation: Unstructured data is not easily interpreted by conventional search algorithms.
 Tagging: As the data grows, it is not possible to tag it manually.
 Indexing algorithms: Designing algorithms that understand the meaning of a document and then tag or index it accordingly is difficult.
Dealing with unstructured data:
 Data Mining: Knowledge discovery in databases. Popular mining algorithms are association rule mining, regression analysis, and collaborative filtering.
 Natural Language Processing: Related to HCI, NLP is about enabling computers to understand human (natural) language input.
 Text Analytics: Text mining is the process of gleaning high-quality, meaningful information from text. It includes tasks such as text categorization, text clustering, sentiment analysis and concept/entity extraction (a small word-frequency sketch appears after this list).
 Noisy text analytics: The process of extracting structured or semi-structured information from noisy unstructured data such as chats, blogs, wikis and emails, which contain spelling mistakes, abbreviations, fillers ("uh", "hm") and non-standard words.
 Manual tagging with metadata: Tagging the data manually with adequate metadata to provide the requisite semantics needed to understand unstructured data.
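As referenced above under Text Analytics, here is a minimal word-frequency sketch using only the Python standard library; the sample text and stop-word list are invented for illustration.

```python
import re
from collections import Counter

# Invented sample of noisy, unstructured feedback text
docs = [
    "Delivery was late, uh, really late... but support was helpful!",
    "Great product. Support team is helpful and fast.",
]

STOP_WORDS = {"was", "but", "is", "and", "the", "a", "uh"}

counts = Counter()
for doc in docs:
    # Lowercase and keep word characters only (a crude normalization step)
    tokens = re.findall(r"[a-z']+", doc.lower())
    counts.update(t for t in tokens if t not in STOP_WORDS)

# The most frequent content words act as rough "keywords" for the corpus
print(counts.most_common(3))  # e.g. [('late', 2), ('support', 2), ('helpful', 2)]
```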

Parts-of-Speech Tagging: POS tagging (POST) is the process of reading text and tagging each word in a sentence as belonging to a particular part of speech, such as noun, verb or adjective.
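A minimal POS-tagging sketch using the NLTK library (an assumption, since the notes do not name a tool; NLTK must be installed, and its resource names can differ across versions):

```python
import nltk

# One-time model downloads (resource names can vary by NLTK version)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Big data enables better decisions."
tokens = nltk.word_tokenize(sentence)

# Each token gets a Penn Treebank part-of-speech tag
print(nltk.pos_tag(tokens))
# e.g. [('Big', 'JJ'), ('data', 'NNS'), ('enables', 'VBZ'),
#       ('better', 'JJR'), ('decisions', 'NNS'), ('.', '.')]
```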

Unstructured Information Management Architecture (UIMA): An open-source platform from IBM used for real-time content analytics.

Define Big Data. What are the characteristics of Big Data?


Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Characteristics (V's):
1. Volume: It refers to the amount of data. The size of data being handled has grown from bits to yottabytes:
Bits -> Bytes -> KBs -> MBs -> GBs -> TBs -> PBs -> Exabytes -> Zettabytes -> Yottabytes

There are different sources of data such as DOC and PDF files, YouTube, a chat conversation on an internet messenger, a customer feedback form on an online retail website, CCTV coverage and weather forecasts.
The sources of big data:
1. Typical internal data sources: data present within an organization's firewall. Data storage: file systems, SQL RDBMSs (Oracle, MS SQL Server, DB2, MySQL, PostgreSQL, etc.), NoSQL stores (MongoDB, Cassandra, etc.) and so on. Archives: archives of scanned documents, paper archives, customer correspondence records, patients' health records, students' admission records, students' assessment records, and so on.
2. External data sources: data residing outside an organization's firewall. Public web: Wikipedia, regulatory, compliance, weather, census data, etc.
3. Both (internal + external sources): sensor data, machine log data, social media, business apps, media and documents.

2. Variety: Variety deals with the wide range of data types and sources of data: structured, semi-structured and unstructured. Structured data: from traditional transaction processing systems, RDBMSs, etc. Semi-structured data: for example Hypertext Markup Language (HTML) and eXtensible Markup Language (XML). Unstructured data: for example unstructured text documents, audio, video, emails, photos, PDFs, social media, etc.

3. Velocity: It refers to the speed of data processing. We have moved from the days of batch processing to real-time processing.

4. Veracity: Veracity refers to the biases, noise and abnormality in data. The key question is: "Is all the data that is being stored, mined and analysed meaningful and pertinent to the problem under consideration?"

5. Value: This refers to the value that big data can provide, and it relates directly to what organizations can do with that collected data. It is often quantified as the potential social or economic value that the data might create.
Further V's are sometimes added:
Volatility: It deals with "How long is the data valid?"
Validity: Validity refers to the accuracy and correctness of data. Any data picked up for analysis needs to be accurate.
Variability: Data flows can be highly inconsistent, with periodic peaks.

What are the 5 V's?


The 5 V's are defined as follows:
1. Velocity is the speed at which the data is created and how fast
it moves.
2. Volume is the amount of data qualifying as big data.
3. Value is the value the data provides.
4. Variety is the diversity that exists in the types of data.
5. Veracity is the data's quality and accuracy.

Velocity
Velocity refers to how quickly data is generated and how fast it
moves. This is an important aspect for organizations that need their
data to flow quickly, so it's available at the right times to make the
best business decisions possible.

An organization that uses big data will have a large and continuous
flow of data that's being created and sent to its end destination. Data
could flow from sources such as machines, networks, smartphones
or social media. Velocity applies to the speed at which this
information arrives -- for example, how many social media posts per
day are ingested -- as well as the speed at which it needs to be
digested and analyzed -- often quickly and sometimes in near real
time.
As an example, in healthcare, many medical devices today are
designed to monitor patients and collect data. From in-hospital
medical equipment to wearable devices, collected data needs to be
sent to its destination and analyzed quickly.

In some cases, however, it might be better to have a limited set of collected data than to collect more data than an organization can handle, because the latter can lead to slower data velocities.

Volume
Volume refers to the amount of data that exists. Volume is like the
base of big data, as it's the initial size and amount of data that's
collected. If the volume of data is large enough, it can be considered
big data. However, what's considered to be big data is relative and
will change depending on the available computing power that's on
the market.

Value

Value refers to the benefits that big data can provide, and it relates
directly to what organizations can do with that collected data. Being
able to pull value from big data is a requirement, as the value of big
data increases significantly depending on the insights that can be
gained from it.

Variety

Variety refers to the diversity of data types. An organization might obtain data from several data sources, which might vary in value. Data can come from sources both inside and outside an enterprise. The challenge in variety concerns the standardization and distribution of all data being collected.

Unstructured data is data that's unorganized and comes in different files or formats. Typically, unstructured data isn't a good fit for a mainstream relational database because it doesn't fit into conventional data models. Semi-structured data is data that hasn't been organized into a specialized repository but has associated information, such as metadata. This makes it easier to process than unstructured data. Structured data, meanwhile, is data that has been organized into a formatted repository. This means the data is made more addressable for effective data processing and analysis.

Raw data also qualifies as a data type. While raw data can fall into
other categories -- structured, semi-structured or unstructured -- it's
considered raw if it has received no processing at all. Most often, raw
applies to data imported from other organizations or submitted or
entered by users. Social media data often falls into this category.

A more specific example can be found in a company that gathers a variety of data about its customers. This can include structured data culled from transactions or unstructured social media posts and call center text. Much of this might arrive in the form of raw data, requiring cleaning before processing.

Veracity

Veracity refers to the quality, accuracy, integrity and credibility of data. Gathered data could have missing pieces, might be inaccurate or might not be able to provide real, valuable insight. Veracity, overall, refers to the level of trust there is in the collected data.

Data can sometimes become messy and difficult to use. A large amount of data can cause more confusion than insight if it's incomplete. For example, in the medical field, if data about which drugs a patient is taking is incomplete, the patient's life could be endangered.

The challenges with big data:


1. Data today is growing at an exponential rate; most of the data that we have today has been generated in the last two years. The key question is: will all this data be useful for analysis, and how will we separate knowledge from noise?
2. How to host big data solutions outside an organization's own infrastructure (for example, in the cloud).
3. Deciding the period of retention of big data.
4. A dearth of skilled professionals who possess the high level of proficiency in data science that is vital for implementing big data solutions.
5. Challenges with respect to capture, curation, storage, search, sharing, transfer, analysis, privacy violations and visualization.
6. A shortage of data visualization experts.
7. Scale: The storage of data is becoming a challenge for everyone.
8. Security: The production of more and more data increases security and privacy concerns.
9. Schema: There is no place for rigid schemas; dynamic schemas are needed.
10. Continuous availability: How to provide 24x7 support.
11. Consistency: Should one opt for consistency or eventual consistency?
12. Partition tolerance: How to build partition-tolerant systems that can take care of both hardware and software failures.
13. Data quality: Inconsistent data, duplicates, logic conflicts and missing data all result in data quality challenges.

What is big data analytics?


Big data analytics examines and analyzes large and complex data sets known as
“big data.”
Through this analysis, you can uncover valuable insights, patterns, and trends
to make more informed decisions. It uses several techniques, tools, and
technologies to process, manage, and examine meaningful information from
massive datasets.
We typically apply big data analytics when data is too large or complicated for
traditional data processing methods to handle efficiently. The more
information there is, the greater the need for diverse analytical approaches,
quicker handling times, and a more extensive data capacity.

How does big data analytics work?


Big data analytics combines several stages and processes to extract insights.
Here’s a quick overview of what this could look like:

1. Data collection: Gather data from various sources, such as surveys, social media, websites, databases, and transaction records. This data can be structured, unstructured, or semi-structured.
2. Data storage: Store data in distributed systems or cloud-based solutions. These types of storage can handle a large volume of data and provide fault tolerance.
3. Data preprocessing: It's best to clean and preprocess the raw data before performing analysis. This process could involve handling missing values, standardizing formats, addressing outliers, and structuring the data into a more suitable format (a minimal sketch follows this list).
4. Data integration: Data usually comes from various sources in different formats. Data integration combines the data into a unified format.
5. Data processing: Most organizations benefit from using distributed frameworks to process big data. These break down the tasks into smaller chunks and distribute them across multiple machines for parallel processing.
6. Data analysis techniques: Depending on the goal of the analysis, you'll likely apply several data analysis techniques. These could include descriptive, predictive, and prescriptive analytics using machine learning, text mining, exploratory analysis, and other methods.
7. Data visualization: After analysis, communicate the results visually with charts, graphs, dashboards, or other visual tools. Visualization helps you communicate complex insights in an understandable and accessible way.
8. Interpretation and decision making: Interpret the insights gained from your analysis to draw conclusions and make data-backed decisions. These decisions impact business strategies, processes, and operations.
9. Feedback and scale: One of the main advantages of big data analytics frameworks is their ability to scale horizontally. This scalability enables you to handle increasing data volumes and maintain performance, so you have a sustainable method for analyzing large datasets.
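As referenced in step 3, here is a minimal preprocessing sketch using pandas 2.x (an illustrative assumption; the column names and values are invented):

```python
import pandas as pd

# Invented raw records: mixed date formats, a missing value, an outlier
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "January 6, 2024", "2024-01-07", "2024-01-08"],
    "amount": [250.0, 310.0, None, 999999.0],
})

# Standardize formats: parse the mixed date strings into one datetime dtype
raw["order_date"] = pd.to_datetime(raw["order_date"], format="mixed")

# Handle missing values: fill the gap with the column median
raw["amount"] = raw["amount"].fillna(raw["amount"].median())

# Address outliers: cap values above the 95th percentile
raw["amount"] = raw["amount"].clip(upper=raw["amount"].quantile(0.95))

print(raw)
```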
It’s important to remember that big data analytics isn’t a linear process, but a
cycle.
You’ll continually gather new data, analyze it, and refine business strategies
based on the results. The whole process is iterative, which means adapting to
changes and making adjustments is key.
The importance of big data analytics
Big data analytics has the potential to transform the way you operate, make
decisions, and innovate. It’s an ideal solution if you’re dealing with massive
datasets and are having difficulty choosing a suitable analytical approach.
By tapping into the finer details of your information, using techniques and
specific tools, you can use your data as a strategic asset.
Big data analytics enables you to benefit from:
 Informed decision-making: You can make informed decisions based on
actual data, which reduces uncertainty and improves outcomes.
 Business insights: Analyzing large datasets uncovers hidden patterns and
trends, providing a deeper understanding of customer behavior and
market dynamics.
 Customer understanding: Get insight into customer preferences and
needs so you can personalize experiences and create more impactful
marketing strategies.
 Operational efficiency: By analyzing operational data, you can optimize
processes, identify bottlenecks, and streamline operations to reduce
costs and improve productivity.
 Innovation: Big data analytics can help you uncover new opportunities
and niches within industries. You can identify unmet needs and
emerging trends to develop more innovative products and services to
stay ahead of the competition.

Types of big data analytics


There are four main types of big data analytics: descriptive, diagnostic, predictive, and prescriptive.
Collectively, they enable businesses to comprehensively understand their big
data and make decisions to drive improved performance.

Descriptive analytics
This type focuses on summarizing historical data to tell you what's happened in the past. It uses aggregation, data mining, and visualization techniques to understand trends, patterns, and key performance indicators (KPIs).
Descriptive analytics helps you understand your current situation and make informed decisions based on historical information.
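A minimal descriptive-analytics sketch using pandas; the sales records are invented for illustration. Aggregating historical data per group is the classic way to surface trends and KPIs:

```python
import pandas as pd

# Invented historical sales records
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "revenue": [1200, 800, 1500, 950, 1100],
})

# Aggregate history per region: a classic descriptive summary (a KPI table)
print(sales.groupby("region")["revenue"].agg(["count", "sum", "mean"]))
```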

Diagnostic analytics
Diagnostic analytics goes beyond describing past events and aims to understand why they occurred. It drills down into the data to identify the root causes of specific outcomes or issues.
By analyzing relationships and correlations within the data, diagnostic analytics
helps you gain insights into factors influencing your results.

Predictive analytics
This type of analytics uses historical data and statistical algorithms to predict
future events. It spots patterns and trends and forecasts what might happen
next.
You can use predictive analytics to anticipate customer behavior, product
demand, market trends, and more to plan and make strategic decisions
proactively.
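A minimal predictive-analytics sketch using scikit-learn's LinearRegression (an illustrative choice of statistical algorithm; the demand history is invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented monthly demand history: month index -> units sold
months = np.array([[1], [2], [3], [4], [5]])
demand = np.array([100, 120, 138, 160, 179])

# Fit a simple trend model on the historical data
model = LinearRegression().fit(months, demand)

# Forecast demand for the next (sixth) month
print(model.predict(np.array([[6]])))  # roughly [198.8]
```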

Prescriptive analytics
Prescriptive analytics builds on predictive analytics by recommending actions
to optimize future outcomes. It considers various possible actions and their
potential impact on the predicted event or outcome.
Prescriptive analytics helps you make data-driven decisions by suggesting the best course of action based on your desired goals and any constraints.

Q) What is NoSQL? What is the need for NoSQL? Explain different types of NoSQL databases.

NoSQL stands for "Not Only SQL". These are non-relational, open-source, distributed databases. Features of NoSQL:
1. NoSQL databases are non-relational: They do not adhere to the relational data model. In fact they are either key-value, document-oriented, column-oriented or graph-based databases.
2. Distributed: The data is distributed across several nodes in a cluster built from low-cost commodity hardware.
3. No support for ACID properties: They do not offer support for the ACID properties of transactions. Instead, they adhere to the CAP theorem.
4. No fixed table schema: NoSQL databases are becoming increasingly popular owing to their schema flexibility. They do not mandate that the data strictly adhere to any schema structure at the time of storage.
Need for NoSQL:
1. It has a scale-out architecture instead of the monolithic architecture of relational databases.
2. It can house large volumes of structured, semi-structured and unstructured data.
3. Dynamic schema: It allows insertion of data without a predefined schema.
4. Auto-sharding: It automatically spreads data across an arbitrary number of servers or nodes in a cluster.
5. Replication: It offers good support for replication, which in turn guarantees high availability, fault tolerance and disaster recovery.
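As a concrete example of the document-oriented type, here is a minimal sketch using the pymongo driver; it assumes a MongoDB server is running locally, and the database, collection and field names are invented:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]       # database and collection are created lazily
orders = db["orders"]

# Dynamic schema: documents in one collection may carry different fields
orders.insert_one({"order_id": 1, "item": "laptop", "amount": 55000})
orders.insert_one({"order_id": 2, "item": "mouse", "gift_wrap": True})

# Query by field value; no predefined table schema was required
print(orders.find_one({"order_id": 2}))
client.close()
```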

What are the advantages and disadvantages of NoSQL?

Advantages:
 Big data capability
 No single point of failure
 Easy replication
 It provides fast performance and horizontal scalability.
 Can handle structured, semi-structured, and unstructured data with equal effect
 NoSQL databases don't need a dedicated high-performance server
 It can serve as the primary data source for online applications.
 Excels at distributed database and multi-data centre operations
 Eliminates the need for a specific caching layer to store data
 Offers a flexible schema design which can easily be altered without downtime or service disruption

Disadvantages:
1. Limited query capabilities.
2. RDBMS databases and tools are comparatively more mature.
3. It does not offer traditional database capabilities, like consistency when multiple transactions are performed simultaneously.
4. When the volume of data increases, it becomes difficult to maintain unique values as keys.
5. Doesn't work as well with relational data.
6. Being open-source options, they are not yet so popular with enterprises.
7. No support for join and group-by operations.
