
Trends in databases

ITP 249
Lecture 11
Outline
• Big data
– The Features of Big Data
– What Drives Big Data?
– Big Data Applications
– Real-time Analytics and Its Impact
• In-memory db
• Columnar db
• Limitations of SQL
• NoSQL db
What is big data?
• Something large and full of information?
– Maybe, but that says little about what Big Data really is
• Universal definition
– Extremely large data sets
– Grown beyond the capacity of traditional tools
– Also the processes of leveraging the data (e.g. analytics, BI, data mining)
• What kind of data?
– Every day we create 2.5 quintillion (2.5 × 10¹⁸) bytes of data
– 90% of the data in the world today was created in the last two years
– Sensors (IoT), blogs, pictures, videos, e-commerce, GPS, etc.
• Analytics and research define Big Data today
– More data, more analysis, more results
– Presents opportunities for deep analysis, pattern prediction, and correlation
Structured vs. Unstructured Data
• Structured
– Strictly organized, common schema
– Designed for management by computers
– Relational databases & spreadsheets
– Standard search operations
• Unstructured
– No uniform structure
– Designed for use by humans & devices
– Word docs, PDFs, emails, videos, IoT sensor data, audio files, HTML, & images
– Limited data visibility
With the rise of 4K video, medical images, IoT, digital
information, AI and analytics, the data explosion is
accelerating.

[Chart, © IBM Corporation 2017: projected storage capacity in exabytes, 2010–2020, split into unstructured, file and object, and structured block storage. Callouts: 80% of all data was created in the last 2 years; 331 EB of object-based (unstructured) storage capacity by 2021; the number of enterprises with 1 PB+ of unstructured data grows 3X from 2016.]
The growing imperative of Business Data
• Analytics have emerged over the years from transactional, structured data (sales transactions, databases)…
• …toward massive interactive, unstructured content (documents, web pages, cameras, text messages, emails)
• 80% of that content is unstructured
Who is using Big Data?
– Science / Research (NASA / NOAA)
– Pharma / Health
– Energy
– Media and Entertainment
– Manufacturing
– Finance
– All Businesses today leverage some form of big data
– References:
• http://www.cnbc.com/id/100792215
• http://video.cnbc.com/gallery/?video=3000168940
• http://www.cnbc.com/id/100638376
The Features of Big Data
• 7 ‘V’s that describe the features of big data
Volume
• Volume of data collected, stored, and shared is growing
faster than ever before
• Not all data are stored. Some are discarded, others are
archived. Even then the total volume is growing
Variety
• Source of data
• Form of data
• Business data, social media data
• Multiple languages
• Formats – text, voice, photos, video, audio
Velocity
• Speed at which data are generated and collected
• Can also refer to how quickly data can be
processed
Variability
• Changes in the meaning of data over time or in
context (asset class over time)
• Data of unknown or indistinct type or structure
or format (number, text, emoji, etc)
• Sentiment analysis uses natural language
processing to derive the attitude of the writer
Veracity
• Reliability or truthfulness of data
• Errors and inaccuracies
• Separating noise from signal
Volatility
• Lifespan of data
• How long data are available
• How long should it be stored
Value
• Driving force of big data analytics is value
• Should provide benefit to someone
• Providing big data itself is a business
• Evaluate the benefit of investing in big data
against the cost
What continues to drive Big Data?
• World is becoming more digital
• World is becoming more connected
• Electronic/digital devices are becoming more
economical (putting technology in the hands of
more people)
• Traditional forms of social communications are
being replaced with digital ones that are often
‘free’
Not just the Data => Big Data Applications
• Business Intelligence -> AI
• AI -> Machine Learning -> Deep Learning
• Application categories:
– Statistical applications (trends)
– Predictive analysis (trends -> predictions)
– Data modeling / data visualization
– What-if scenarios
In-memory Databases (IMDB)
• Using RAM instead of hard disk for the database
• All relevant data are in memory all the time
• Speeds up queries to provide real-time or near-real-time analytics capabilities
• Innovations
– Data are stored in RAM
– Use of columnar storage for the relational database
– Indexing (comes essentially free with columnar storage)
– Data compression
– Parallel data processing
– Partitioning data
• SAP HANA is an IMDB
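To make the idea concrete, here is a minimal sketch using SQLite's in-memory mode. The table and values are invented for illustration; a production IMDB such as SAP HANA layers columnar storage, compression, and parallel execution on top of the basic keep-everything-in-RAM idea.

```python
import sqlite3

# An in-memory database: all pages live in RAM, nothing touches disk.
# (Illustrative only -- real IMDBs add columnar storage, compression,
# and parallel query execution on top of this idea.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, product TEXT, pieces INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("USA", "DXTR1100", 5), ("USA", "DXTR1100", 21),
     ("Germany", "DXTR3100", 12), ("Germany", "DXTR3100", 34)],
)

# Queries run against RAM, so analytic aggregations return with very low latency.
for row in conn.execute(
        "SELECT country, SUM(pieces) FROM sales GROUP BY country ORDER BY country"):
    print(row)   # ('Germany', 46) then ('USA', 26)
```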
Real-time Analytics and Its Impact
• Provides almost instantaneous feedback from analytics processing
• React to changing customer needs
• React to opportunities in real time
• Example: customer service at a credit card company
– Customer navigates the website but cannot find a resolution, then calls a customer service rep
– Real-time analytics helps improve the customer experience:
– Re-order the call tree to match the customer's most likely reason for calling
– Prepopulate the rep's screens
– Eliminate options from the phone tree based on the customer's browsing history
– Change the language of the chat or call
– Make promotional offers to the customer
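As a toy illustration of the "re-order the call tree" idea only: the snippet below ranks phone-tree options by how often the caller just visited related self-service pages. Every name and mapping here is hypothetical.

```python
# Hypothetical sketch: rank call-tree options by the caller's recent
# browsing history so the most likely reason for calling is offered first.
recent_pages = ["dispute-a-charge", "dispute-a-charge", "travel-notice"]

# Map call-tree options to the self-service pages that usually precede the call.
option_pages = {
    "Dispute a transaction": {"dispute-a-charge"},
    "Report card lost or stolen": {"lost-card"},
    "Set a travel notice": {"travel-notice"},
    "Ask about rewards": {"rewards"},
}

def score(option):
    pages = option_pages[option]
    return sum(1 for page in recent_pages if page in pages)

# Options the customer was just reading about float to the top of the menu.
reordered = sorted(option_pages, key=score, reverse=True)
print(reordered[0])   # "Dispute a transaction"
```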
In-Memory Appliance Development
• Drivers
– Big data
– Predictive analytics
– Real-time analytics
– Self-service BI
• Enabling hardware innovations
– High-capacity RAM
– Multi-core processor architectures
– Massive parallel scaling
– Massively parallel processing (MPP)
– Large symmetric multiprocessors (SMP)

Image Source: Ralokota, R. (May 15, 2011). New tools for new times – primer on big data, Hadoop and "in-memory" data clouds. Retrieved from http://practicalanalytics.wordpress.com/2011/05/15/new-tools-for-new-times-a-primer-on-big-data/
Performance Bottleneck Comparison
• Without high-capacity RAM
– Database stored on disk
– Bottleneck: latency between disk and RAM
• With high-capacity RAM
– Database stored in memory
– Bottleneck: latency between CPU and RAM
– Orders-of-magnitude response time improvements

Image Source: Morrison, A. (2012). The art and science of new analytics technology. PwC Technology Forecast, 1, 31–43. Retrieved from http://www.pwc.com/en_US/us/technology-forecast/2012/issue1/features/feature-art-science-analytics-technology.jhtml
Software That Leverages Hardware Innovations

Source: Plattner, H. & Zeier, A. (2011). In Memory Data Management: An Inflection Point for Enterprise Applications. Retrieved from http://www3.weforum.org/docs/GITR/2012/GITR_Chapter1.7_2012.pdf
Another Innovation - Columnar Databases
• Advantages
– Better I/O bandwidth utilization
– Higher cache efficiency
– Faster data aggregation
– High compression rates
– Column-based parallel processing
• Disadvantages
– Load times can be slow
– Less efficient for transactional processes
– Possibly slower relational interfaces
Columnar Storage Example

Source table:
Country   Customer  Product Sold  Pieces
USA       3000      DXTR1100      5
USA       4000      DXTR1100      21
Germany   23000     DXTR3100      12
Germany   17000     DXTR3100      34

Row table (each row stored contiguously):
Row 1: USA, 3000, DXTR1100, 5
Row 2: USA, 4000, DXTR1100, 21
Row 3: Germany, 23000, DXTR3100, 12
Row 4: Germany, 17000, DXTR3100, 34

Column table (each column stored contiguously):
Column 1: USA, USA, Germany, Germany
Column 2: 3000, 4000, 23000, 17000
Column 3: DXTR1100, DXTR1100, DXTR3100, DXTR3100
Column 4: 5, 21, 12, 34
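A rough Python model of the two layouts above, only to show why column stores aggregate a single column quickly; it is not how a real engine stores bytes.

```python
# Row store: each record kept together -- good for fetching a whole row.
rows = [
    ("USA",     3000,  "DXTR1100", 5),
    ("USA",     4000,  "DXTR1100", 21),
    ("Germany", 23000, "DXTR3100", 12),
    ("Germany", 17000, "DXTR3100", 34),
]

# Column store: each attribute kept together -- good for scans/aggregates
# over one column, and repeated values (country, product) compress well.
columns = {
    "country":  ["USA", "USA", "Germany", "Germany"],
    "customer": [3000, 4000, 23000, 17000],
    "product":  ["DXTR1100", "DXTR1100", "DXTR3100", "DXTR3100"],
    "pieces":   [5, 21, 12, 34],
}

# Aggregating pieces only touches one contiguous column, not every row.
print(sum(columns["pieces"]))   # 72
# Fetching one complete record is the row store's strength.
print(rows[2])                  # ('Germany', 23000, 'DXTR3100', 12)
```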


Super Simple App & Schema
Monolithic ERP Application with super simple
schema:
• Employee
• Salary
• Department
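A minimal sketch of that schema as SQL DDL, run through SQLite purely for concreteness; the column names are invented for illustration.

```python
import sqlite3

# Hypothetical columns for the 'super simple' ERP schema on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    dept_id   INTEGER REFERENCES department(dept_id)
);
CREATE TABLE salary (
    emp_id    INTEGER REFERENCES employee(emp_id),
    amount    NUMERIC NOT NULL,
    effective DATE
);
""")
```

Every record must fit this fixed layout; adding an attribute later means altering the table for all existing rows, which is exactly the rigidity the next slide pushes against.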
Modern Apps (Mobile/Social)
• A new app comes along that needs to be 'internet scale'. What if in your schema…
– You need to add or remove fields, lots of them, frequently?
– You need another table with a 'variable' schema?
• What if for your infrastructure…
– You need to scale out, not up
– Writes are as numerous as reads
– Data volume is high and the growth rate is high
– Use is decentralized (web, mobile, IoT)
Limitations of SQL (RDBMS)
• Rigid schema, not easy to add columns
(attributes) as needed
• JOINs are expensive!
• Transaction handling is complex with millions of
concurrent users
• Maintenance, schema changes, and scaling typically require some downtime
• Unstructured data is not easily handled
• Not adaptive to new requirements
NoSQL
• "Not Only SQL"
• Not based on the relational model
• May still support SQL-like querying
• Often based on key-value pairs
• Schema-less
• ACID transactions may be compromised to increase performance, availability, and speed; typically eventually consistent
SQL vs. NoSQL
Enter NoSQL Data Stores
• Key-Value: Amazon DynamoDB
• Column: Cassandra
• Graph DB: Neo4j
• Document: MongoDB
Key-Value Stores
• Use case:
– Quick lookups with no 'relational' component (no joins)
– Fast and highly scalable
– Often (mostly) in memory
• Example: Amazon DynamoDB
• Application:
– User session data shared between applications
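A conceptual sketch of the session-data use case, with a plain Python dict standing in for a key-value store such as Amazon DynamoDB (the real store adds replication, expiry, and persistence behind a similar get/put interface); the keys and fields are made up.

```python
import json, uuid

# Conceptual model of a key-value store: an opaque value looked up by key,
# no joins, no schema. This is a toy stand-in, not a real client API.
kv_store = {}

def put(key, value):
    kv_store[key] = json.dumps(value)        # values are just blobs to the store

def get(key):
    raw = kv_store.get(key)
    return json.loads(raw) if raw is not None else None

# Shared session data: any app server can fetch the session by its ID.
session_id = str(uuid.uuid4())
put(f"session:{session_id}", {"user": "alice", "cart": ["DXTR1100"], "lang": "en"})
print(get(f"session:{session_id}")["cart"])   # ['DXTR1100']
```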
Column Stores
• Use case:
– Super scalable
– MapReduce support
• Example: Cassandra
• Application:
– Large-scale real-time data logging (finance, web analytics)
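A rough model of the wide-row idea behind column-family stores such as Cassandra, applied to real-time event logging. This is a toy in-process model, not Cassandra's API, and the row keys and fields are invented.

```python
from collections import defaultdict
import time

# Conceptual wide-row model: one row key (e.g. a page or ticker symbol)
# holds many timestamp-named columns, appended as events stream in.
column_family = defaultdict(dict)   # row_key -> {column_name: value}

def log_event(row_key, value):
    column_name = str(time.time_ns())        # timestamp-named column
    column_family[row_key][column_name] = value

log_event("page:/checkout", {"user": "alice", "status": 200})
log_event("page:/checkout", {"user": "bob", "status": 500})

# Reading one row key returns that entity's whole event history.
print(len(column_family["page:/checkout"]))   # 2
```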
Graph DB
• Use case:
– Dense network of strongly connected entities
– Nodes and relationships
– Graph data modeling
• Example: Neo4j
• Application:
– Facebook graph search, Google knowledge graph, Twitter
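A tiny adjacency-list sketch of the nodes-and-relationships model. This is plain Python, not Neo4j's API (in Neo4j the same traversal would be a Cypher query), and the people and edges are made up.

```python
# Nodes and relationships as a plain adjacency list -- a toy model of the
# property-graph idea, not a graph database engine.
follows = {
    "alice": {"bob", "carol"},
    "bob":   {"carol"},
    "carol": {"dave"},
    "dave":  set(),
}

def friends_of_friends(person):
    direct = follows[person]
    # Traverse one more hop; graph databases make this kind of
    # relationship traversal cheap even across millions of nodes.
    return set().union(*(follows[f] for f in direct)) - direct - {person}

print(friends_of_friends("alice"))   # {'dave'}
```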
Document Store
• Use case:
– Semi-structured data with SQL-like queries
– Collections of related key-value pairs with variable schemas
• Example: MongoDB
• Application:
– Document-driven web or other applications
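A sketch of a document collection as plain Python dicts with variable schemas, plus a small filter standing in for a MongoDB-style query; it is not MongoDB's actual API, and the documents are invented.

```python
# Documents: related key-value pairs grouped per record; two documents in
# the same collection need not share a schema. Toy model, not a real driver.
products = [
    {"_id": 1, "name": "DXTR1100", "price": 199, "tags": ["desk", "oak"]},
    {"_id": 2, "name": "DXTR3100", "price": 349,
     "dimensions": {"w": 120, "d": 60}},   # extra nested field, no ALTER TABLE
]

def find(collection, **criteria):
    # SQL-like filtering over semi-structured records.
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

print(find(products, name="DXTR3100")[0]["price"])   # 349
```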
Distributed Computing
• Apache Hadoop
– Distributed computing
– Parallel processing
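The classic word count sketches the split-map-reduce pattern that Hadoop parallelizes across a cluster; the version below runs in a single process purely to show the programming model, with made-up input splits.

```python
from collections import Counter
from functools import reduce

# Word count, the canonical MapReduce example. Hadoop runs the map and
# reduce phases in parallel across many machines; this single-process
# version only sketches the programming model.
chunks = ["big data big results", "more data more analysis"]   # input splits

def map_phase(chunk):
    return Counter(chunk.split())   # local counts per split

def reduce_phase(a, b):
    return a + b                    # merge partial counts

totals = reduce(reduce_phase, map(map_phase, chunks), Counter())
print(totals["data"])   # 2
```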
When to…

Use an RDBMS when you need/have…
• Centralized applications (e.g. ERP)
• Moderate to high availability
• Moderate-velocity data
• Data coming in from one or a few locations
• Primarily structured data
• Complex/nested transactions
• Primary concern is scaling reads
• Philosophy of scaling up for more users/data
• Moderate data volumes, maintained with purging

Use NoSQL when you need/have…
• Decentralized applications (e.g. web, mobile, and IoT)
• Continuous availability; no downtime
• High-velocity data (devices, sensors, etc.)
• Data coming in from many locations
• Structured, with semi-/unstructured data
• Simple transactions
• Concern is to scale both writes and reads
• Philosophy of scaling out for more users/data
• High data volumes, retained forever
What if you have both? (and they are Big)
• SQL-like distributed query engines:
– Hive
– Presto
– Drill
– Impala
– Spark SQL
– Lingual
• Distributed computing platforms:
– Hadoop
– Spark
– Tez
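For example, Spark SQL lets you run ordinary SQL over distributed data. The sketch below assumes a local PySpark installation and an invented events.json file with a page column; it is illustrative, not a tuned cluster job.

```python
# Hedged sketch: assumes pyspark is installed and 'events.json' exists;
# the path and column names are invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-on-big-data").getOrCreate()

events = spark.read.json("events.json")        # semi-structured input
events.createOrReplaceTempView("events")

# Familiar SQL, executed as a distributed job across the cluster.
top_pages = spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM events
    GROUP BY page
    ORDER BY hits DESC
    LIMIT 10
""")
top_pages.show()
```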
When in doubt, ask…
• What are the application's use cases?
• What is the application's data model?
• What is the need for scalability on reads/writes?
• What is the query pattern for the application or its users?
