BD - Unit - I - Introduction To Big Data

The document provides an overview of big data including: 1. It defines big data and discusses why organizations care about it for better decision making and new products/services. 2. It outlines the syllabus which covers Hadoop frameworks, MapReduce, Hive, Pig and case studies. 3. It describes the characteristics of big data including volume, velocity, variety and more, and lists common big data sources like ERP systems, social media, sensors and transactions.

Uploaded by Prem Kumar
Copyright © All Rights Reserved

BIG DATA

Syllabus

Unit-I : Introduction to Big Data


Unit-II : Hadoop Frameworks and HDFS
Unit-III : MapReduce
Unit-IV : Hive and Pig
Unit-V : Mahout, Sqoop and Case Study

Unit - I
• Introduction to Big Data: Big data refers to high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• Big data also refers to extremely large data sets that may be analysed computationally to reveal patterns, trends and associations, especially relating to human behaviour and interactions.
• The Big Data caption is:

• Why do organizations care about Big Data?


• More knowledge leads to better customer engagement, fraud prevention and new products.
• Big data matters for aggregation, statistics, indexing, searching, querying and discovering knowledge.
• Cost Reduction
• Faster Analysis
• Better Decision Making
• New Products and Services

• Types of Data:
1. Structured Data - This data is organized in a highly mechanized and manageable way. Ex: Tables, Transactions, Legacy Data etc.
2. Unstructured Data - This data is raw and unorganized; it varies in its content and can change from entry to entry. Ex: Videos, images, audio, text data, graph data, social media etc.
3. Semi-Structured Data - This data is not stored in rigid tables but carries tags or markers that separate its elements, so it lies between structured and unstructured data. Ex: XML and JSON documents.
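As an illustration of the three types, the Python sketch below shows one small example of each form using only the standard library (`csv`, `xml.etree.ElementTree`, `json`). All records, names and values are hypothetical. It shows that structured rows share fixed columns, semi-structured records carry their own tags and may omit fields, and unstructured text has no fields at all.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields (like a table).
structured_csv = "txn_id,customer,amount\n101,Asha,250.00\n102,Ravi,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tags mark the fields, but records need not share a schema
# (the second customer has no <email> element).
semi_xml = """<customers>
  <customer id="1"><name>Asha</name><email>asha@example.com</email></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>"""
root = ET.fromstring(semi_xml)
customers = [
    {"id": c.get("id"), **{child.tag: child.text for child in c}}
    for c in root.findall("customer")
]

# The same idea in JSON: the second record simply omits the "tags" field.
semi_json = json.loads('[{"name": "Asha", "tags": ["vip"]}, {"name": "Ravi"}]')

# Unstructured: free text with no fields; any structure must be inferred.
unstructured = "Loved the quick delivery, but the app keeps crashing!"

print(rows[0]["amount"])                   # 250.00
print(customers[1])                        # {'id': '2', 'name': 'Ravi'}
print([r.get("tags") for r in semi_json])  # [['vip'], None]
print(len(unstructured.split()))           # 9
```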
Fig: Structure of Big Data
Big Data Advantages
• Flexible Schema
• Massive Scalability
• Cheaper to Set Up
• Understanding and Targeting Customers
• Understanding and Optimizing Business Processes
• Improving Science and Research
• Improving Healthcare and Public Health
• Financial Trading
• Improving Sports Performance
• Improving Security and Law Enforcement
• Eventual consistency – higher performance
• Detect Risks and Check for Fraud
• Reduce Costs
Fig: Advantages of Big Data
Big Data Disadvantages

• Big data can violate the privacy principle.
• Data can be used for manipulating customers.
• Big data may increase social stratification.
• Big data is not useful in the short run.
• Big data faces difficulties in parsing and interpreting.
• Big data is difficult to handle: with no declarative query language, it needs more programming.
• Eventual consistency – fewer guarantees
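The eventual consistency trade-off (higher performance, fewer guarantees) can be sketched with two replicas of a key-value store. This is a minimal Python illustration, assuming a last-writer-wins rule (the newest timestamp wins) and a hypothetical `anti_entropy` sync step; real distributed stores use a variety of conflict-resolution schemes. Writes return without coordinating with other replicas, which is the performance gain; reads can be stale until the replicas sync, which is the weaker guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store; each value carries its write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept the write locally and return at once: no coordination with
        # other replicas, which is where the extra performance comes from.
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a, b):
    """Hypothetical sync step: both replicas adopt the newest write per key."""
    for key in set(a.data) | set(b.data):
        candidates = [r.data[key] for r in (a, b) if key in r.data]
        newest = max(candidates, key=lambda entry: entry[0])  # last writer wins
        a.data[key] = newest
        b.data[key] = newest


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:42", ["book"], ts=1)
r2.write("cart:42", ["book", "pen"], ts=2)

print(r1.read("cart:42"), r2.read("cart:42"))  # ['book'] ['book', 'pen'] (stale read on r1)
anti_entropy(r1, r2)
print(r1.read("cart:42"), r2.read("cart:42"))  # ['book', 'pen'] ['book', 'pen']
```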
Big Data Challenges

• Data Complexity
• Data Volume
• Data Velocity
• Data Variety
• Data Veracity
• Data Capture
• Data Curation
• Performance
• Data Storage
• Data Search
• Data Transfer
• Data Visualization
• Data Analysis
• Privacy and Security
Big Data Tools
• Big Data tools include Hadoop, Cloudera, Datameer, Splunk, Mahout, Hive, HBase, LucidWorks, R and MapR; these typically run on Linux distributions such as Ubuntu.
Big Data Applications
• Social Networks and Relationships
• Cyber-Physical Models
• Internet of Things (IoT)
• Retail Market
• Retail Banking
• Real Estate
• Fraud Detection and Prevention
• Telecommunications
• Healthcare and Research
• Automotive and Production
• Science and Research
• Trading Analytics
Fig: Applications of Big Data Analytics
• The Big Data topics covered in this unit are:
1. Big Data Characteristics
2. Types of Data Sources
3. Types of Data Elements
4. Analytics Process Model
5. Outlier Detection
6. Sampling
7. Missing Values
1. Big Data Characteristics: Big data refers to high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• Big data also refers to extremely large data sets that may be analysed computationally to reveal patterns, trends and associations, especially relating to human behaviour and interactions.
• Data volumes were once measured in megabytes and gigabytes; they are now measured in terabytes and petabytes.
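To make the jump in scale concrete, the worked conversion below uses decimal (SI) units, where each step is a factor of 10^3; binary units (KiB, MiB and so on) use 1024 instead, e.g. 1 PiB = 2^50 ≈ 1.13 × 10^15 bytes. The last line assumes a hypothetical 10 TB disk and ignores replication and overhead.

```latex
\begin{aligned}
1\,\text{GB} &= 10^{3}\,\text{MB} = 10^{9}\,\text{bytes} \\
1\,\text{TB} &= 10^{3}\,\text{GB} = 10^{12}\,\text{bytes} \\
1\,\text{PB} &= 10^{3}\,\text{TB} = 10^{15}\,\text{bytes} \\
\frac{1\,\text{PB}}{10\,\text{TB per disk}} &= \frac{10^{15}}{10^{13}} = 100\ \text{disks}
\end{aligned}
```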

• The characteristics are:

1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
6. Validity
7. Visibility
Fig: Characteristics of Big Data
1. Volume: Data size, i.e. the amount or quantity of data (data at rest). Big data does not sample; it observes and tracks what happens.

2. Velocity: Data speed, i.e. the speed of change (data in motion): the content changes quickly. Big data is often available in real time.

3. Variety: Data types, i.e. the range of data types and sources, or data with multiple formats. Big data draws from text, images, audio and video, and it completes missing pieces through data fusion.

4. Veracity: Fuzzy, cloudy or messy data: can we trust the data?

5. Value: Data alone is not enough; the question is how value can be derived from it.

6. Validity: Ensure that the interpreted data is sound.

7. Visibility: Data from diverse sources needs to be stitched together.
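Velocity (data in motion) means results are usually computed over a moving time window rather than over a finished data set. The Python sketch below counts the events seen in the last `window` seconds using a `collections.deque`; the class name `SlidingWindowCounter`, the 10-second window and the click timestamps are hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # timestamps, oldest on the left

    def add(self, ts):
        # Assumes timestamps arrive in non-decreasing order.
        self.events.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop events at or before now - window; each event is removed at most once.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


# Hypothetical click stream: timestamps in seconds, arriving in order.
counter = SlidingWindowCounter(window=10)
for ts in [1, 2, 3, 9, 12, 14, 15]:
    counter.add(ts)

print(counter.count(now=15))  # events in (5, 15]: 9, 12, 14, 15 -> 4
print(counter.count(now=25))  # window (15, 25] is empty -> 0
```

Each timestamp is appended once and evicted once, so the work per event is constant on average, which is what lets this style of counting keep up with a fast stream.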
2. Big Data Sources: Big data sources bring more information to users' applications without requiring that the data be held in a single repository or in a cloud vendor's proprietary data store.
• Ex: Data stores such as Amazon Redshift, HP Vertica and MongoDB, and sources such as ERP, social media, sensor data, transactions and public data.

Fig: Big Data Sources

• The main big data sources are:

1. ERP Data: ERP (enterprise resource planning) data is data used to manage company resources. Ex: Financials, materials, human resources and other company assets.
2. Transactions Data: Transaction data describes an event and is usually described with verbs. Typical financial transactions are orders, invoices and payments.
3. Public Data: Data can be classified as public if the information is available to all employees and to all individuals or entities external to the corporation. Ex: Press releases, job descriptions and marketing materials intended for the general public.
4. Social Media Data: Social media data is unstructured, consisting primarily of textual comments, which are analysed for the sentiments expressed in them.
5. Sensor Data: Sensor data is the output of a device that detects and responds to some type of input from the physical environment.
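For social media data, scoring comments by the sentiment they express can be sketched with a simple word-list (lexicon) approach: count positive and negative words and use the sign of the difference as the label. This Python sketch is for illustration only; the word lists and comments are hypothetical, and the slides do not specify which sentiment method is used in practice.

```python
import re

# Hypothetical toy lexicons; real systems use far larger lexicons or trained models.
POSITIVE = {"love", "loved", "great", "fast", "excellent", "happy"}
NEGATIVE = {"hate", "slow", "broken", "crashing", "terrible", "bad"}


def sentiment(comment):
    """Score = (#positive words - #negative words); the sign gives the label."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


comments = [
    "Loved the fast delivery, great service!",
    "App keeps crashing and support is slow.",
    "Order arrived on Tuesday.",
]
for c in comments:
    print(sentiment(c), c)
```

A word-list score ignores negation and sarcasm ("not great" still counts as positive), which is one reason the messiness (veracity) of social media text matters.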
Fig: Big Data Sources
