IT 222 POINTERS TO REVIEW
BIG DATA CHARACTERISTICS
(1) VOLUME: Quantity of data to be stored
• As the quantity of data needing to be stored increases, the need for larger storage devices increases as well.
• Scaling up means keeping the same number of systems but migrating each one to a larger system.
• Scaling out means that when the workload exceeds the capacity of a server, it is spread out across several servers.
(2) VELOCITY: Speed at which data is entered into the system and must be processed
• Can be broken down into two categories:
  - Stream processing focuses on input processing and requires analysis of the data stream as it enters the system.
  - Feedback loop processing refers to the analysis of the data to produce actionable results.
(3) VARIETY: Variations in the structure of the data to be stored
• Structured data fits into a predefined data model.
• Unstructured data is not organized to fit into a predefined data model.
• Semi-structured data combines elements of both – some parts of the data fit a predefined model while other parts do not.
(4) VARIABILITY: Changes in the meaning of the data based on context
• Sentiment analysis attempts to determine the attitude of a statement (positive, negative, or neutral).
(5) VERACITY: Trustworthiness of the data
(6) VALUE: The degree to which the data can be analyzed to provide meaningful insights
(7) VISUALIZATION: The ability to graphically present the data in such a way as to make it understandable to users

HADOOP
• De facto standard for most Big Data storage and processing
• Java-based framework for distributing and processing very large data sets across clusters of computers
• Most important components:
  - Hadoop Distributed File System (HDFS): low-level distributed file processing system that can be used directly for data storage
  - MapReduce: programming model that supports processing large data sets (a minimal sketch of the idea follows the HDFS notes below)

HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
• Approach is based on several key assumptions:
  - High volume: the default block size is 64 MB and can be configured to even larger values.
  - Write-once, read-many: this model simplifies concurrency issues and improves overall data throughput.
  - Streaming access: Hadoop is optimized for batch processing of entire files as a continuous stream of data.
  - Fault tolerance: HDFS is designed to replicate data across many different devices so that when one device fails, the data is still available from another.
• A data node communicates with the name node by regularly sending block reports and heartbeats.
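To make the MapReduce programming model mentioned above concrete, the following is a minimal pure-Python sketch of the map → shuffle → reduce flow for a word count. It is only an illustration of the idea, not how Hadoop itself is programmed (real jobs are typically written in Java or run through Hadoop Streaming), and the function names used here are invented for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in an input split."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine all counts emitted for one word."""
    return (key, sum(values))

documents = ["big data needs big storage", "data drives decision making"]

# Simulate the framework: map every split, shuffle by key, then reduce per key.
mapped = [pair for doc in documents for pair in map_phase(doc)]
for word, counts in sorted(shuffle(mapped).items()):
    print(reduce_phase(word, counts))   # e.g. ('big', 2), ('data', 2), ...
```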
HADOOP ECOSYSTEM
• MapReduce simplification applications:
  - Hive is a data warehousing system that sits on top of HDFS and supports its own SQL-like query language.
  - Pig compiles a high-level scripting language (Pig Latin) into MapReduce jobs for execution in Hadoop.
• Data ingestion applications:
  - Flume is a component for ingesting data into Hadoop.
  - Sqoop is a tool for converting data back and forth between a relational database and HDFS.
• Direct query applications:
  - HBase is a column-oriented NoSQL database designed to sit on top of HDFS that quickly processes sparse data sets.
  - Impala was the first SQL-on-Hadoop application.

NOSQL
• Name given to the non-relational database technologies developed to address Big Data challenges
• Key-value (KV) databases store data as a collection of key-value pairs organized as buckets, which are the equivalent of tables.
• Document databases store data in key-value pairs in which the value components are tag-encoded documents grouped into logical groups called collections.
• Column-oriented databases refer to two technologies:
  - Column-centric storage: data is stored in blocks that hold data from a single column across many rows.
  - Row-centric storage: data is stored in blocks that hold data from all columns of a given set of rows.
• Graph databases store relationship-rich data as a collection of nodes and edges (see the sketch after this list).
  - Properties are the attributes of a node or edge that are of interest to a user.
  - Traversal is a query in a graph database.
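As a purely illustrative sketch of how the same information might be shaped in a key-value bucket, a document collection, and a graph, here is a small example using plain Python data structures; the names (customers_bucket, orders_collection, and so on) are invented for this example and do not correspond to any particular NoSQL product.

```python
# Key-value style: a "bucket" maps opaque keys to values (buckets ~ tables).
customers_bucket = {
    "cust:1001": "Alice Reyes",
    "cust:1002": "Bob Cruz",
}

# Document style: a "collection" of tag-encoded (here JSON-like) documents.
orders_collection = [
    {"_id": 1, "customer": "cust:1001", "items": ["laptop", "mouse"], "total": 1250.00},
    {"_id": 2, "customer": "cust:1002", "items": ["keyboard"], "total": 45.50},
]

# Graph style: nodes and edges, each with properties; a traversal follows edges.
nodes = {
    "cust:1001": {"label": "Customer", "name": "Alice Reyes"},
    "cust:1002": {"label": "Customer", "name": "Bob Cruz"},
    "prod:laptop": {"label": "Product", "name": "Laptop"},
}
edges = [
    ("cust:1001", "BOUGHT", "prod:laptop", {"qty": 1}),
    ("cust:1002", "KNOWS", "cust:1001", {"since": 2020}),
]

def traverse(start, relationship):
    """A minimal 'traversal': nodes reachable from start via one edge type."""
    return [dst for src, rel, dst, _props in edges if src == start and rel == relationship]

print(traverse("cust:1002", "KNOWS"))   # ['cust:1001']
```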
DATA INFORMATION DECISION MAKING CYCLE

DATA ANALYTICS
• Subset of business intelligence (BI) functionality that encompasses the mathematical, statistical, and modeling techniques used to extract knowledge from data
• A continuous spectrum of knowledge acquisition that goes from discovery to explanation to prediction
• Explanatory analytics focuses on discovering and explaining data characteristics based on existing data.
• Predictive analytics focuses on predicting future data outcomes with a high degree of accuracy.
• Used to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful information that can help organizations make more-informed business decisions.

PREDICTIVE ANALYTICS
• Refers to the use of advanced mathematical, statistical, and modeling tools to predict future business outcomes with a high degree of accuracy
• Focuses on creating actionable models to predict future behaviors and events
• Most BI vendors are dropping the term data mining and replacing it with predictive analytics.
• Models are used in customer service, fraud detection, targeted marketing, and optimized pricing.
• Can add value in many different ways, but needs to be monitored and evaluated to determine the return on investment (a toy sketch follows this list).
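As a toy illustration of the "predict future outcomes from existing data" idea rather than of any specific vendor tool, the sketch below fits a simple least-squares trend line to made-up monthly sales figures and extrapolates one month ahead.

```python
# Toy predictive model: fit a least-squares trend line to past sales, then extrapolate.
months = [1, 2, 3, 4, 5, 6]
sales = [120, 135, 150, 160, 178, 195]   # made-up historical figures

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

# Ordinary least-squares slope and intercept for y = slope * x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / \
        sum((x - mean_x) ** 2 for x in months)
intercept = mean_y - slope * mean_x

next_month = 7
forecast = slope * next_month + intercept
print(f"Forecast for month {next_month}: {forecast:.1f} units")
```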
CASE TOOLS
• Computer-Aided Systems Engineering
• Automated framework for the SDLC
• Structured methodologies and powerful graphical interfaces
• Front-end CASE tools provide support for the planning, analysis, and design phases.
• Back-end CASE tools provide support for the coding and implementation phases.
• A typical CASE tool has five components.

CASE TOOL COMPONENTS
• Graphics design: produces structured diagrams (DFDs, ERDs, class diagrams, object diagrams)
• Screen painters and report generators: produce the information system's input and output formats (the end-user interface)
• Integrated repository: stores and cross-references the system design data; includes a comprehensive data dictionary
• Analysis segment: provides a fully automated check on system consistency, syntax, and completeness
• Program documentation generator

DBMS FACILITIES xxx
A DBMS facilitates:
• Interpretation and presentation of data
• Distribution of data and information
• Preservation and monitoring of data
• Control over data duplication and use

ADVANTAGES OF SQL xxx

MANAGING USERS AND ESTABLISHING SECURITY
• User: a uniquely identifiable object that allows a given person to log on to the database
• Role: a named collection of database access privileges that authorizes a user to connect to the database and use system resources
• Profile: a named collection of settings that controls how much of a resource a given user can use
(A conceptual sketch of how these three objects relate follows this list.)
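As a purely conceptual illustration of how users, roles, and profiles relate, here is a small Python sketch; the class names, privilege strings, and limit values are invented for the example. In a real DBMS such as Oracle these objects are created with SQL statements such as CREATE USER, CREATE ROLE, and CREATE PROFILE rather than with application code.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    """A named collection of database access privileges."""
    name: str
    privileges: set = field(default_factory=set)

@dataclass
class Profile:
    """A named collection of settings that limits how much of a resource a user can use."""
    name: str
    sessions_per_user: int = 1
    idle_time_minutes: int = 30

@dataclass
class User:
    """A uniquely identifiable object that can log on to the database."""
    username: str
    roles: list = field(default_factory=list)
    profile: Profile = None

    def has_privilege(self, privilege: str) -> bool:
        # A user holds a privilege if any of their assigned roles grants it.
        return any(privilege in role.privileges for role in self.roles)

# Hypothetical example: one role, one profile, one user.
clerk = Role("clerk", {"CREATE SESSION", "SELECT ON orders"})
limited = Profile("limited", sessions_per_user=2, idle_time_minutes=15)
alice = User("alice", roles=[clerk], profile=limited)

print(alice.has_privilege("SELECT ON orders"))   # True
print(alice.has_privilege("DROP TABLE orders"))  # False
```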