Overview of Big Data Analytics (MCSD 1053)
Data Science Governance Framework
OVERVIEW 1 – BIG DATA ANALYTICS FRAMEWORK
Contents of the Overview
• What is Big Data? Why Should We Care?
• Who Uses Big Data & How?
• Big Data Skills & Technologies
Part 1
What is Big Data?
Why should We Care?
Copyright © 2015 Andy Koronios and Jing Gao, All Rights Reserved.
Some things are SO BIG
that they have implications for
EVERYONE!
Big Data is one of those things
Source: Peppard 2011
Everything, Everywhere,
Intelligent, Instrumented &
Interconnected world!
The Internet of Things
More Data, More Often, From More Sources…..
Big Data Sources
COMPARATIVE VOLUMES
• ENTERPRISE DATA WAREHOUSE – 1 TB
• KLSE STOCK EXCHANGE – 1 TB/DAY
• MALAYSIA AIRLINES – 1 TB/MIN
Square Kilometre Array (SKA) radio telescope could generate more data per day than the entire Internet
Data Volumes Are Exploding!
Torrents of data:
• 40% increase per year
• 90% generated in the last 2 years
• Only 5% is structured
[Chart: Social Media & World Population, 2009–2014]
Source: McCrindle Social Media Highlights, 2014
Everything we
do is leaving a
digital trace
Laptops and Smartphones Lead Data Traffic Growth
• 92 percent: Compound annual growth rate in data
traffic from 2010 to 2015.
• 5.6 billion: Number of personal devices connected to
mobile networks by 2015.
• 1.5 billion: Number of machine-to-machine nodes.
• 66 percent: Portion of data traffic allocated to video by
2015.
• 159 percent: Increase in global mobile data traffic
from 2009 to 2010.
• 129 percent: Compound annual growth rate of
mobile data traffic growth projected in Middle East
and Africa over 2010 to 2015.
• 248 petabytes: Amount of monthly data expected
from tablets in 2015. That's more than the entire
global mobile network in 2010.
• 295 petabytes: Amount of mobile data traffic
expected to come from machine-to-machine
connections in 2015.
• 613 kbps: Average smartphone connection speed in
2009.
• 4,404 kbps: Average smartphone connection speed in
2015.
The V’s of Big Data
‘Big’ Data @ Rest & In Motion
• Data @ Rest (vs. the DWH): millions of times more data
• Data in Motion: thousands of times faster
Data Velocity
■ The speed at which data is generated
■ The speed at which data is transferred
& analyzed
SELF-DRIVING CAR
• Data is analyzed in memory, while it is being generated
BIG DATA VARIETY
LIFE-CRITICAL DATA
4 Disruptive Technology Clusters in the
4th Industrial Revolution
Trends of New Business Models
New Value Pools
Digital Capacity Trends
• Information Flow
• Information Stock
• Information Computation
Data Treatment Trends
01 Cheap non-traditional data sources
02 No more random sampling
03 Real-time data
04 Merged data sources
05 Self-learning algorithms
Data Sources Trends
1. DIGITALLY GENERATED – the data are created digitally, can be stored using a series of ones and zeros, and can be manipulated by computers.
2. PASSIVELY PRODUCED – a by-product of our daily lives or of interaction with digital services.
3. AUTOMATICALLY COLLECTED – a system is in place that extracts and stores the relevant data as it is generated.
4. GEOGRAPHICALLY OR TEMPORALLY TRACKABLE – e.g. mobile phone location data or call duration time.
5. CONTINUOUSLY ANALYSED – information relevant to human well-being and development that can be analysed in real time.
Data Types Trends
1. DATA EXHAUST – passively collected transactional data from people’s use of digital services (mobile phones, purchases, web searches) and/or operational metrics and other real-time data collected by agencies to monitor their projects and programmes (stock levels, school attendance).
2. PHYSICAL SENSORS – satellite or infrared imagery of changing landscapes, traffic patterns, light emissions, urban development, topographic changes, etc.; this approach focuses on remote sensing of changes in human activity.
3. ONLINE INFORMATION – web content such as news media and social media interactions (e.g. blogs, Twitter), news articles and obituaries, e-commerce, job postings; this approach considers web usage and content as a sensor of human intent, sentiments, perceptions, and wants.
4. CITIZEN REPORTING OR CROWD-SOURCED DATA – information actively produced or submitted by citizens through mobile phone-based surveys, hotlines, user-generated maps, etc.; while not passively produced, this is a key information source for verification and feedback.
How well can we analyze
and use these ever-increasing
volumes of data?
So what about ‘big data’?
1. A problem
The Volume, Variety and Velocity of the data generated are stressing our IT
systems and our ability to handle the data
2. A Capability
That will allow us to squeeze more value from data
3. An Opportunity
To optimise processes, enhance decision-making & monetise data through
new business models
Part 2
Who Uses Big Data and How?
Big Data Analytics
• Input: structured or unstructured data
• Methods: statistical methods, machine learning, artificial intelligence…
• Valuable advanced insights: identify patterns, predict & forecast, optimization, decision making
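The "predict & forecast" outcome above can be illustrated with the simplest statistical method of all, a least-squares trend line. The sketch below is pure Python with invented monthly data volumes; it is an illustration of the idea, not part of the course materials.

```python
# Fit a linear trend y = a*x + b to monthly data volumes with ordinary
# least squares, then forecast a future month. Numbers are illustrative.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data volume (TB) observed over six months.
months = [1, 2, 3, 4, 5, 6]
volume = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0]

a, b = fit_line(months, volume)
forecast_month_12 = a * 12 + b
print(round(forecast_month_12, 1))  # 54.0
```

In practice the "statistical methods" box covers far richer models, but every one of them shares this shape: fit parameters to observed data, then extrapolate.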
Example:
Talent scouting – a data-driven decision
Part 3
What are the Skill Sets and
Technologies underpinning Big Data?
Big Data Technology
• Relational databases struggle to store and process Big Data.
• As a result, a new class of big data technology has emerged
and is being used in many big data analytics environments.
What is Big Data Technology?
• A set of tools and mechanisms that enable a computer to process data that is
too big for conventional systems
High-level Declarative Languages for Writing
Queries and Data Analysis
• Pig, from Yahoo / Apache
• JAQL, from IBM
• Hive, from Facebook, etc.
NoSQL Databases & Data Management Tools
• Store and manage data not using Structured Query Language
(SQL), relational database schema, or other common relational
database internal operations
• A non-relational database management system, used where no fixed
schemas are required and data is scaled horizontally
Categories of NoSQL databases
• KEY-VALUE PAIR (e.g. Cassandra)
• Keys are used to retrieve values from opaque data blocks; essentially a hash map; tremendously fast
• DOCUMENT DATABASE (e.g. MongoDB and CouchDB)
• Again a key-value store, but the value is a document. Documents have no fixed schema and can be nested; queries are based on content as well as keys. Use cases: blogging websites
• COLUMNAR DATABASE (e.g. Microsoft Columnstore, SAP HANA)
• Works on attributes rather than tuples: the key is the column name and the value is the contiguous column values. Best for aggregation queries of the form SELECT (one or two columns’ values) WHERE (the same or another column’s value) = some value
• GRAPH DATABASES (e.g. Neo4j and Giraph)
• A collection of nodes and edges, where nodes represent data and edges represent the links between them. The most dynamic and flexible category
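The difference between the first two categories can be sketched in a few lines of plain Python. This is a toy model, not a real database: the keys, documents and field names below are invented, and real systems add persistence, distribution and indexing.

```python
# Key-value store: a hash map. Lookups only by key; values are opaque
# blocks the store does not interpret.
kv_store = {
    "user:1001": b"\x00\x01 opaque block",
    "user:1002": b"\x00\x02 opaque block",
}
value = kv_store["user:1001"]  # tremendously fast, but key-only access

# Document store: schema-less, possibly nested documents, queryable by
# content as well as by key (as in a blogging website).
doc_store = [
    {"_id": 1, "author": "amin", "tags": ["big data"],
     "comments": [{"user": "sara", "text": "nice post"}]},  # nested
    {"_id": 2, "author": "sara", "tags": ["nosql", "mongodb"]},
]

# Query by content: all posts tagged "nosql".
nosql_posts = [d for d in doc_store if "nosql" in d.get("tags", [])]
print([d["_id"] for d in nosql_posts])  # [2]
```

The same content-based query is impossible against the key-value store without fetching and decoding every value, which is exactly the trade-off the slide describes.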
Apache Hadoop
• Open-source software framework
• Distributed, scalable system for large data sets on
commodity hardware
• Written primarily in Java
• Architecture:
• File system – Hadoop Distributed File System (HDFS)
• Processing (programming model) – MapReduce
• Major users: Facebook, Yahoo, Amazon.com, Microsoft,
etc.
Apache Hadoop Ecosystem
Hadoop Distributed File System (HDFS)
[Diagram: a Client, the NameNode (NN), and several DataNodes (DN)]
• HDFS stores data in a distributed, scalable and fault-tolerant way
• The NameNode (NN) holds metadata about the data on the DataNodes (DN)
• The DNs actually hold the data, in the form of blocks, and are capable of communicating with one another
• Data is stored as compressed files spread across n commodity servers – not as relational tables and columns
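The block-and-metadata idea above can be sketched as follows. This is not HDFS code: the block size, node names and round-robin placement are invented for illustration (real HDFS uses 128 MB blocks and rack-aware placement).

```python
# A file is split into fixed-size blocks; each block is replicated on
# several DataNodes; the NameNode keeps only the metadata mapping
# blocks to nodes.
BLOCK_SIZE = 4          # bytes, tiny for illustration
REPLICATION = 3
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data: bytes, size: int = BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_blocks(blocks, nodes=DATANODES, replication=REPLICATION):
    """NameNode-style metadata: block index -> list of DataNodes."""
    metadata = {}
    for i in range(len(blocks)):
        # naive round-robin placement across the nodes
        metadata[i] = [nodes[(i + r) % len(nodes)]
                       for r in range(replication)]
    return metadata

blocks = split_into_blocks(b"hello hdfs!")
meta = place_blocks(blocks)
print(len(blocks), meta[0])  # 3 ['dn1', 'dn2', 'dn3']
```

Fault tolerance falls out of the metadata: if one DN dies, every block it held still exists on the other nodes listed for it.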
MapReduce
[Diagram: the Client, JobTracker (JT), NameNode (NN), and DataNodes (DN), each DN paired with a TaskTracker (TT)]
• Mappers extract data from HDFS and put it into maps
• Reducers aggregate the results produced by the mappers
• The Job Tracker (JT) is the server component; it
• finds how many blocks the data occupies,
• contacts the NN, and
• sends the program to the data nodes
• The Task Tracker (TT) is the slave component; it completes the process on its DN
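The mapper/reducer division of labour can be shown with the classic word-count example. The sketch below runs the map, shuffle and reduce phases in a single Python process; in Hadoop the same phases would be distributed across TaskTrackers.

```python
# Word count in the MapReduce style: mappers emit (word, 1) pairs,
# a shuffle groups pairs by key, reducers aggregate each group.
from collections import defaultdict

def mapper(line):
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    return (key, sum(values))

lines = ["Big data big analytics", "big data"]
pairs = [p for line in lines for p in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts["big"], counts["data"])  # 3 2
```

Because each mapper sees only its own lines and each reducer only its own key, both phases parallelise naturally, which is the whole point of the model.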
The Apache Hadoop Family
Name Description
Hadoop Common Common utilities
HDFS Distributed file system
YARN Job scheduling and cluster resource management
MapReduce Parallel processing of large data sets
Chukwa A data collection system for large distributed systems
HBase Scalable distributed db supporting structured data storage
Hive A DWH infrastructure providing SQL-like, ad hoc querying; can cater for unstructured data
Mahout Scalable machine learning & data mining library (e.g. k-means for data clustering; random forest and logistic regression for data classification). Widely used to develop recommender systems for online businesses
The Apache Hadoop Family
(cont.)
Name Description
Pig High-level, procedural data-flow language to process data, speed up coding and make it handier. It can extract, transform and load data (ETL)
Zookeeper High performance co-ordination service for distributed applications
Flume Responsible for collecting, aggregating and moving data into HDFS.
Sqoop (SQL to Hadoop) To transfer data between Hadoop clusters and relational databases (such as
Oracle or Microsoft SQL Server) that traditionally use SQL instructions.
Kerberos Provides authentication services in Hadoop clusters
Serengeti Virtualization tool that helps build virtual Hadoop cluster in the cloud
Spark Processing engine that performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Provides in-memory cluster computing for speed. Supports Java, Scala, and Python APIs and combines SQL, streaming and complex analytics. Runs on Hadoop, Mesos, standalone, or in the cloud. Can access diverse data sources such as HDFS, Cassandra, HBase, or S3
BIG DATA ARCHITECTURE
[Architecture diagram: data flows left to right through six layers]
• Data Sources (structured & unstructured): internal operational systems; web data (social media, news, forums, open/public data); mixed media; machine data; spatial-temporal data (maps, land); external data streams (weather data, commercial BI data, stock data)
• Data Consolidation: select, extract, transform, integrate, load; incremental data; integrity checks; replication; archives
• Data Storage: staging database (shards 1…n); Data Lake; Data Warehouse (datamarts 1…n); BigchainDB (future)
• Data Provisioning: jConnect; data dictionaries & data model; business rules & dictionaries; security & access control; jClean; self-service dynamic extract; Smart Data Lake (future)
• Data Discovery: jView; descriptive, diagnostic, predictive and prescriptive analytics; visualization; OLAP analysis
• Applications & Users: analytics services, routine reports, dashboards, alerts, web apps, smart apps, mobile apps, automated systems – serving clients, business users, business analysts, data scientists and knowledge workers
JCORP DATA ADVISORY: MongoDB Data Lake Design
• Collections in the data lake: HR, FINANCE, NEWS & SOCIAL MEDIA
• Data sources:
• Internal server – SAP HANA DB: 1. HR data, 2. Procurement data; SQL Server DB: 1. Finance data, 2. Intrapreneur data
• Semi-private server – SQL Server DB: 1. Projects data (via stored procedure)
MANAGEMENT DASHBOARD: DATA SOURCES
Dashboard: Finance, Intrapreneur
Info System: Periodic Financial Reporting (FRP) & Account Consolidation
Database: SQL Server
Data Extraction: system report → Excel sheet → export to shared folder; shared folder → script → MongoDB
Data Frequency: monthly
Data Captured: 1. Closing Account (revenue, income statement, dimension, disclosure, corporate info); 2. Flows; 3. Investment; 4. Intercompany; 5. Partner; 6. Dimension
MANAGEMENT DASHBOARD: DATA SOURCES (cont.)
Dashboard: HR, Procurement
Info System: SAP
Database: SAP HANA
Data Extraction: SAP ad hoc query → Excel sheet → export to shared folder; shared folder → script → MongoDB
Data Frequency: monthly
Data Captured: 1. HR Info (new staff, active staff, resigned staff, payroll by department); 2. Material Management (purchasing)

Dashboard: Projects
Info System: UI template / Excel (internal access)
Database: SQL Server
Data Extraction: direct DB connection
Data Frequency: monthly
Data Captured: 1. Projects (project info, project timeline)
SOCIAL MEDIA DASHBOARD
Dashboards: JCORP, KPJ, QSR (KFC & Pizza Hut)
System: in-house development
Analysis methods: 1. Sentiment analysis – VADER Sentiment; 2. Social media engagement – Word2Vec; 3. Related news – machine learning; 4. Related word cloud
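As a rough illustration of lexicon-based sentiment analysis in the spirit of the VADER method named above, here is a minimal scorer in plain Python. It is far cruder than VADER: the tiny lexicon is invented, and there is no handling of intensifiers, negation or emoticons.

```python
# Each word carries a valence score; a post's sentiment is the sign of
# the summed score. Lexicon values here are made up for illustration.
LEXICON = {"great": 3.0, "good": 1.9, "love": 3.2,
           "bad": -2.5, "terrible": -3.1, "slow": -1.2}

def sentiment(text: str) -> str:
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0.0)
                for w in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love the new KFC menu, great taste!"))   # positive
print(sentiment("Service was slow and the food was bad."))  # negative
```

A production dashboard would swap this lexicon for VADER's and aggregate scores per brand over time.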
KPJ ANALYTICS DASHBOARD
Dashboards: KPJ Operations, KPJ Predictive Analytics
System: internal system (KPJ)
Analysis methods: descriptive; predictive – deep learning, random forest
Process Job: Data Sources → Data Lake in MongoDB

Data flow: Data Sources → Data Consolidation → Data Storage (Data Lake in MongoDB): Staging Database → Create API
• Source SAP HANA – HR (personnel data): tables HR1, HR2, HR3, HR4 → HR data template → API: HR; Procurement: tables EKKO, EKKN, EKPO → API: PROC
• Source SQL Server – Finance / Project Management: IVR, PFR, ACT → Finance data template → API: FINANCE

Notes:
1. Data is prepared by JCORP from the server. An API was created to upload data from the staging DB to MongoDB.
2. The staging database has two processes:
a) Process 1: for HR and FINANCE, the data is cleaned and prepared based on the template given by the UTM Data team.
b) Process 2: for PROC, the data from JCORP is used directly to run the API.
MongoDB (flat file) was created and is ready to be used for descriptive and diagnostic analysis (e.g. in Tableau).
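The staging step described above (clean records against a template, then emit a flat file for the MongoDB upload API) might look roughly like the sketch below. The field names, template and file name are hypothetical, not JCORP's actual schema.

```python
# Clean source-system rows against a template (required fields, renamed
# keys) and write the result as a flat JSON file ready for upload.
import json

TEMPLATE = {"staff_id": "STAFF_ID", "name": "FULL_NAME", "dept": "DEPT"}

def clean(record, template=TEMPLATE):
    """Keep and rename only the fields the template defines."""
    out = {}
    for target, source in template.items():
        if source not in record or record[source] in ("", None):
            return None  # reject incomplete rows
        out[target] = record[source]
    return out

raw = [
    {"STAFF_ID": "H001", "FULL_NAME": "Aminah", "DEPT": "HR", "JUNK": 1},
    {"STAFF_ID": "H002", "FULL_NAME": "", "DEPT": "HR"},  # incomplete
]
staged = [r for r in (clean(rec) for rec in raw) if r is not None]

with open("hr_staged.json", "w") as f:
    json.dump(staged, f)  # flat file for the MongoDB upload API

print(len(staged))  # 1
```

The actual Process 1 would read from the SAP HANA tables (HR1–HR4) and use the UTM Data team's template, but the shape of the job is the same: filter, rename, serialise.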
Process Job: Data Lake (MongoDB) → Data Warehouse / Mart in MySQL Server

MongoDB (flat file) was created and is ready to be used for analysis in Tableau. An API was created to extract data from the data mart into the data warehouse in the MySQL server, to be used especially for predictive and prescriptive analysis.
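A minimal stand-in for the mart-to-warehouse step above, using SQLite in place of a MySQL server so the sketch is self-contained; the table, column names and records are invented.

```python
# Load lake records into a SQL table and run the kind of aggregate
# query a warehouse/mart would serve.
import sqlite3

lake_records = [
    {"dept": "HR", "month": "2024-01", "headcount": 40},
    {"dept": "HR", "month": "2024-02", "headcount": 42},
    {"dept": "Finance", "month": "2024-01", "headcount": 15},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff_mart (dept TEXT, month TEXT, headcount INT)")
conn.executemany(
    "INSERT INTO staff_mart VALUES (:dept, :month, :headcount)",
    lake_records)

# Warehouse-style aggregate: average headcount per department.
rows = conn.execute(
    "SELECT dept, AVG(headcount) FROM staff_mart "
    "GROUP BY dept ORDER BY dept").fetchall()
print(rows)  # [('Finance', 15.0), ('HR', 41.0)]
```

Against the real MySQL warehouse the extraction API would run the same kind of SQL, with predictive models then trained on the aggregated tables.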
DASHBOARD - FINANCIAL
DASHBOARD – HUMAN RESOURCE
DASHBOARD - PROJECTS
DASHBOARD – KPJ OPERATIONS & PERFORMANCE
DASHBOARD – SOCIAL MEDIA
THANK YOU
In the Name of God for Mankind
www.utm.my