BD - Unit - I - Introduction To Big Data
Syllabus
Unit - I
Introduction to Big Data: Big Data is high-volume, high-velocity
and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight
and decision making.
Big Data refers to extremely large data sets that may be analysed
computationally to reveal patterns, trends, and associations,
especially those relating to human behaviour and interactions.
Types of Data:
1. Structured Data: This data is organized in a highly
systematic, manageable way with a fixed format. Ex: Tables,
Transactions, Legacy Data, etc.
2. Unstructured Data: This data is raw and unorganized; its
content varies and can change from entry to entry. Ex:
Videos, images, audio, text data, graph data, social media, etc.
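3. Semi-Structured Data: This data has no fixed table format but
uses tags or markers to give it partial structure, so it mixes
structured and unstructured elements. Ex: XML Database.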
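To make the three types concrete, the sketch below reads similar order information held as structured CSV, semi-structured XML, and unstructured free text. The sample data and the function names (`read_structured`, `read_semi_structured`, `read_unstructured`) are hypothetical and chosen only for illustration; only the Python standard library is used.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns, like a database table.
STRUCTURED_CSV = """order_id,customer,amount
1001,Asha,250.00
1002,Ravi,99.50
"""

# Semi-structured: tags describe each field, but records need not share
# the same fields (the second order has no <coupon> element).
SEMI_STRUCTURED_XML = """<orders>
  <order id="1001"><customer>Asha</customer><amount>250.00</amount><coupon>NEW10</coupon></order>
  <order id="1002"><customer>Ravi</customer><amount>99.50</amount></order>
</orders>"""

# Unstructured: free text with no schema; any field must be extracted.
UNSTRUCTURED_TEXT = "Asha ordered again today, paid 250.00 and loved the delivery!"


def read_structured(text):
    """Parse CSV rows into dicts keyed by the fixed header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Parse XML orders; each record keeps only the tags it actually has."""
    root = ET.fromstring(text)
    records = []
    for order in root.findall("order"):
        record = {"order_id": order.get("id")}
        for child in order:
            record[child.tag] = child.text
        records.append(record)
    return records


def read_unstructured(text):
    """Pull out a money amount with a regex; everything else stays raw."""
    match = re.search(r"\d+\.\d{2}", text)
    return {"amount": match.group() if match else None, "raw": text}


if __name__ == "__main__":
    print(read_structured(STRUCTURED_CSV))
    print(read_semi_structured(SEMI_STRUCTURED_XML))
    print(read_unstructured(UNSTRUCTURED_TEXT))
```

The structured reader relies entirely on the header, the semi-structured reader relies on the tags, and the unstructured reader has to guess at meaning with a pattern, which is why unstructured data usually needs the most processing.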
Fig: Structure of Big Data
Big Data Advantages
Flexible schema
Massive scalability
Cheaper to set up
Understanding and Targeting Customers
Understanding and Optimizing Business Process
Improving Science and Research
Improving Healthcare and Public Health
Financial Trading
Improving Sports Performance
Improving Security and Law Enforcement
No declarative query language
Eventual consistency – higher performance
Detect risks and check for fraud
Reduce costs
Fig: Advantages of Big Data
Big Data Disadvantages
Data Complexity
Data Volume
Data Velocity
Data Variety
Data Veracity
Data capture
Data curation
Performance
Data storage
Data search
Data transfer
Data visualization
Data analysis
Privacy and security
Big Data Tools
Common Big Data tools and platforms include Hadoop, Cloudera, Datameer, Splunk,
Mahout, Hive, HBase, LucidWorks, R and MapR; they typically run on Ubuntu and
other Linux distributions.
Big Data Applications
Social Networks and Relationships
Cyber-Physical Models
Internet of Things (IoT)
Retail Market
Retail Banking
Real Estate
Fraud detection and prevention
Telecommunications
Healthcare and Research
Automotive and production
Science and Research
Trading Analytics
Fig: Applications of Big Data Analytics
The study of Big Data covers the following topics:
1. Big Data Characteristics
2. Types of Data Sources
3. Types of Data Elements
4. Analytics Process Model
5. Outlier Detection
6. Sampling
7. Missing Values
1. Big Data Characteristics: Big Data is high-volume, high-velocity
and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight
and decision making.
Big Data refers to extremely large data sets that may be analysed
computationally to reveal patterns, trends, and associations,
especially those relating to human behaviour and interactions.
1. Volume: Earlier, data volumes were measured in megabytes and
gigabytes; now they are measured in terabytes and petabytes.
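The jump from megabytes to petabytes is easiest to see by converting raw byte counts into units. The sketch below assumes decimal (SI) prefixes, so 1 KB = 1000 bytes and 1 PB = 10^15 bytes; some tools instead use binary steps of 1024. The function name `human_readable` is hypothetical.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes, base=1000):
    """Express a byte count in the largest unit where the value is >= 1.

    base=1000 uses decimal (SI) prefixes, so 1 TB = 10**12 bytes.
    Pass base=1024 for binary-style sizes (strictly KiB, MiB, ...).
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    value = float(num_bytes)
    # Divide by the base until the value drops below one step of the next unit.
    for unit in UNITS[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {UNITS[-1]}"


if __name__ == "__main__":
    print(human_readable(2_500_000_000))   # a typical file size in GB
    print(human_readable(5 * 10**15))      # a data-warehouse scale in PB
```

Each step up the unit ladder is a factor of 1000, so a petabyte is a million gigabytes, which is why storage and processing techniques that worked at the gigabyte scale stop being practical at the petabyte scale.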
3. Variety: The range of data types, sources, and formats. Big Data
draws from text, images, audio, and video, and it can complete
missing pieces through data fusion.
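Data fusion can be illustrated with a deliberately simplified sketch: several partial records about the same entity, coming from different sources, are merged so that fields missing in one source are filled from another. It assumes the records are already matched to one entity (here by a shared `customer_id`) and are listed in priority order; real fusion also needs entity matching and conflict resolution. The function `fuse_records` and the example sources are hypothetical.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def fuse_records(records: List[Record]) -> Record:
    """Merge partial records that describe the same entity.

    Records are given in priority order: for each field, the first
    non-None value wins, so later sources only fill remaining gaps.
    """
    fused: Record = {}
    for record in records:
        for field, value in record.items():
            # Fill the field if it is absent or still missing (None).
            if fused.get(field) is None:
                fused[field] = value
    return fused


if __name__ == "__main__":
    crm = {"customer_id": "C42", "name": "Asha", "email": None, "city": None}
    web_log = {"customer_id": "C42", "email": "asha@example.com", "last_page": "/checkout"}
    store = {"customer_id": "C42", "city": "Pune", "email": "old@example.com"}
    print(fuse_records([crm, web_log, store]))
```

The CRM record has no email or city, the web log supplies the email, and the store record supplies the city; its older email is ignored because a higher-priority source already provided one.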