Chap 3. NoSQL
Kanchan Doke
Asst. Professor, Dept. of Computer Engineering, B.V.C.O.E
Contents
Introduction
Business drivers
NoSQL Data Architecture Pattern
Key-Value Store
Graph Store
Column Family store
Document Store
RDBMS: the relational database management system.
[Figure: a relational table whose columns are labeled Attributes and whose rows are labeled Tuples]
Need of NoSQL
Explosion of social media sites (Facebook, Twitter, Google etc.) with large data
needs.
The system response time becomes slow when you use RDBMS for massive
volumes of data.
Solution:
"Scale up" the system by upgrading the existing hardware. This process is expensive.
"Scale out" by distributing the database load across multiple hosts whenever the load increases.
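The scale-out idea can be sketched in a few lines of Python. This is only an illustration, not how any particular database does it: the host names, the keys, and the helper names `pick_host` and `load_per_host` are all assumptions made for the example.

```python
import zlib

def pick_host(key, hosts):
    """Route a key to one host using a stable hash of the key."""
    # zlib.crc32 gives the same value on every run, unlike the built-in hash() for str.
    return hosts[zlib.crc32(key.encode("utf-8")) % len(hosts)]

def load_per_host(keys, hosts):
    """Count how many keys each host would have to serve."""
    counts = {host: 0 for host in hosts}
    for key in keys:
        counts[pick_host(key, hosts)] += 1
    return counts

hosts = ["db-host-0", "db-host-1"]          # hypothetical host names
keys = [f"user:{i}" for i in range(1000)]   # hypothetical keys

before = load_per_host(keys, hosts)
after = load_per_host(keys, hosts + ["db-host-2"])  # scale out: add one more host

print(before)
print(after)
```

The same 1000 keys are spread over more hosts after scaling out. Note that this simple modulo routing moves many keys to a different host whenever the host list changes.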
Kanchan Doke, Computer Dept, B.V.C.O.E.
4 Marks
CAP Theorem
Consistency –
All the servers in the system will have the same data so
anyone using the system will get the same copy regardless
of which server answers their request.
Availability –
The system will always respond to a request (even if it's not
the latest data or consistent across the system or just a
message saying the system isn't working)
Partition Tolerance –
The system continues to operate as a whole even if
individual servers fail or can't be reached.
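The trade-off between these three properties can be sketched with a toy model. Two replicas hold the same value, one becomes unreachable (a partition), and a write arrives. The `Replica` class, the `write` function and the "CP"/"AP" mode labels are assumptions made for this sketch, not part of any real system.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = "v1"       # every replica starts with the same copy
        self.reachable = True   # False simulates a network partition

def write(replicas, new_value, mode):
    """Try to write new_value; mode is "CP" or "AP" (labels assumed for this sketch)."""
    up = [r for r in replicas if r.reachable]
    if mode == "CP" and len(up) < len(replicas):
        # Prefer consistency: refuse the write rather than let the copies diverge.
        return "rejected"
    for r in up:
        # Apply the write on every server that can be reached.
        r.value = new_value
    return "accepted"

# Partition: server b cannot be reached.
a, b = Replica("a"), Replica("b")
b.reachable = False
cp_result = write([a, b], "v2", "CP")
cp_values = (a.value, b.value)

a2, b2 = Replica("a"), Replica("b")
b2.reachable = False
ap_result = write([a2, b2], "v2", "AP")
ap_values = (a2.value, b2.value)

print(cp_result, cp_values)
print(ap_result, ap_values)
```

In "CP" mode the copies stay identical but the request is refused. In "AP" mode the request succeeds but server b still holds the old value.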
10 Marks
NoSQL Data Architecture Pattern
Example: two documents in the same collection
{
  "name": "Phil",
  "age": 26,
  "status": "A"
}
{
  "name": "Phil",
  "age": 26,
  "status": "A",
  "citiesVisited": ["Chicago", "LA", "San Francisco"]
}
Documents can differ in their attributes but still belong to the same collection.
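The point about differing attributes can be sketched with plain Python dictionaries standing in for documents. This is not a real document database API. The `visited` helper is a name invented for the example, and the field values are taken from the slide's example.

```python
# One collection, two documents; only the second has the optional citiesVisited field.
collection = [
    {"name": "Phil", "age": 26, "status": "A"},
    {"name": "Phil", "age": 26, "status": "A",
     "citiesVisited": ["Chicago", "LA", "San Francisco"]},
]

def visited(collection, city):
    """Return the documents whose optional citiesVisited list contains city."""
    # doc.get(...) supplies an empty list when the attribute is missing.
    return [doc for doc in collection if city in doc.get("citiesVisited", [])]

matches = visited(collection, "LA")
print(len(matches))
```

A query has to tolerate missing attributes, because no schema forces every document to carry the same fields.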
A document can be
PDF
Microsoft Word document
XML
JSON file.
Document Store
Examples:
MongoDB
CouchDB
DocumentDB
[Figure: rows identified by a row key, each holding columns with values]
Relational databases
Tables (relations) consist of rows and columns
Columns have a type. Type information is stored once per column.
A row contains just the values for a record (no type information).
All rows in a table have the same columns and are homogeneous.
[Figure: a table in which each column has a type and each row holds a row key followed by one value per column]
Example rows:
"foo", "bar", 25, 35.63
"bar", "baz", 42, -673.342
row0 header column0 value column1 value column2 value column3 value
row1 header column0 value column1 value column2 value column3 value
row2 header column0 value column1 value column2 value column3 value
Indexes
Indexes on high-cardinality columns make accessing a single row very fast.
Key  RowID
1    0001B008D23A671A
2    0001B008D23A671B
3    0001B008D23A671C
Table columns: Key, Fname, Lname, State, Zip, Phone, Age, Sex
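A rough sketch of why an index speeds up single-row access: the index maps each key straight to a RowID, so the lookup does not have to scan every row. The RowIDs below are the ones on the slide. The record contents and the function names are made up for the illustration.

```python
# RowID -> record. The RowIDs come from the slide; the record values are invented.
rows = {
    "0001B008D23A671A": {"Key": 1, "Fname": "Ann", "State": "NY"},
    "0001B008D23A671B": {"Key": 2, "Fname": "Raj", "State": "CA"},
    "0001B008D23A671C": {"Key": 3, "Fname": "Lee", "State": "TX"},
}

# Build the index once: Key -> RowID.
key_index = {record["Key"]: row_id for row_id, record in rows.items()}

def find_by_key(key):
    """Indexed access: one index lookup, then one fetch by RowID."""
    return rows[key_index[key]]

def find_by_scan(key):
    """Access without an index: inspect rows one by one until the key matches."""
    for record in rows.values():
        if record["Key"] == key:
            return record
    return None

print(find_by_key(2)["Fname"])
```

Both functions return the same record. The indexed version does a constant amount of work, while the scan grows with the number of rows.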
[Figure: column-oriented layout. The column0 values for rows r0 to r17 are stored together, as are the column1 values; each block is labeled with a filesize.]
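The difference between the row layout and the column layout can be sketched with the two example rows from the relational slide. This is an in-memory illustration only. Real column stores add file formats and compression that are not shown here.

```python
# Row-oriented: one tuple per record, all values of a record kept together.
rows = [
    ("foo", "bar", 25, 35.63),
    ("bar", "baz", 42, -673.342),
]

# Column-oriented: one list per column, all values of a column kept together.
columns = [list(column) for column in zip(*rows)]

# Summing the third column touches every record in the row layout ...
total_from_rows = sum(record[2] for record in rows)
# ... but only one list in the column layout.
total_from_columns = sum(columns[2])

print(columns[2])
print(total_from_rows, total_from_columns)
```

Both layouts give the same total. The column layout only has to read the one column the query needs.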
Graph stores are ideal when you have many items that are related to each other in
complex ways and these relationships have properties.
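A minimal sketch of that idea: nodes carry properties, and so do the relationships between them. The page names, URLs, property names and the `outgoing` helper are hypothetical, chosen only to make the example concrete.

```python
# Nodes with properties (hypothetical web pages).
nodes = {
    "home":  {"url": "http://example.com/"},
    "docs":  {"url": "http://example.com/docs"},
    "about": {"url": "http://example.com/about"},
}

# Relationships as (source, destination, properties) triples.
edges = [
    ("home", "docs",  {"type": "links_to", "anchor_text": "Documentation"}),
    ("home", "about", {"type": "links_to", "anchor_text": "About us"}),
    ("docs", "home",  {"type": "links_to", "anchor_text": "Back"}),
]

def outgoing(node_id):
    """Return (destination URL, relationship properties) for every link leaving node_id."""
    return [(nodes[dst]["url"], props) for src, dst, props in edges if src == node_id]

links_from_home = outgoing("home")
for url, props in links_from_home:
    print(url, props["anchor_text"])
```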
[Figure: destination web page nodes, each with a URL property]
Linking external data
When stored in a graph store, the two statements are independent and
may even be stored on different systems around the world.
Link metadata
Group ID the graph belongs to
The date and time the node was created or last updated
Rules and inference are used when you want to run queries on
complex structures such as class libraries, taxonomies and rule-based
systems.
Integrating linked data is used with large amounts of open linked data
to do real-time integration and build mashups without storing the data.
Link analysis
Sometimes the best way to solve a business problem is to traverse
graph data.
As you add new contacts to your friends list, you might want to know if
you have any mutual friends.
To find them, you need to get a list of your friends and, for each one of them, a list of
their friends (friends-of-friends).
Relational database: after the initial pass of listing out your
friends, the system performance drops dramatically.
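The friends-of-friends traversal can be sketched on a small in-memory graph. The names and friendships are invented for the example. A graph store would run the same kind of traversal over its own storage.

```python
# Adjacency sets: each person maps to the set of their direct friends (invented data).
friends = {
    "you": {"ann", "bob"},
    "ann": {"you", "bob", "cat"},
    "bob": {"you", "ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob"},
}

def friends_of_friends(graph, person):
    """People exactly two hops away: friends of your friends who are not already your friends."""
    direct = graph[person]
    reached = set()
    for friend in direct:
        reached |= graph[friend]     # follow one more hop from each direct friend
    return reached - direct - {person}

def mutual_friends(graph, a, b):
    """Friends that a and b have in common."""
    return graph[a] & graph[b]

print(sorted(friends_of_friends(friends, "you")))
print(sorted(mutual_friends(friends, "you", "cat")))
```

Each hop is a direct lookup on the neighbor set. In a relational schema the same question typically needs one self-join of the friendship table per hop.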
Game Data-
Requires a backend dataset that needs to scale quickly.
Stores and shares the high scores of all users and the game data for each player.
Open Linked Data-
Organizations can publish datasets that other systems can integrate.
a. The left panel shows a shared RAM architecture, where many CPUs access a single
shared RAM over a high-speed bus. This system is ideal for large graph traversal.
b. The middle panel shows a shared disk system, where processors have independent
RAM but share disk using a storage area network (SAN).
c. The right panel shows an architecture used in big data solutions: cache-friendly, using
low-cost commodity hardware, and a shared-nothing architecture.
Analyzing big data with a shared-nothing architecture
[Figure: keys placed on servers by hash value; labels shown include Server 4 and the positions 130, 400 and 450]
hash("redis") = 200
hash("charsyam") = 450
hash("udemy") = 50
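One common way to place keys on servers by hash value is a hash ring: each key goes to the first server at or after its hash position, wrapping around at the end. The sketch below assumes that rule. The slide residue does not say which rule or which server positions the original figure used, so the server names and their positions (130, 400, 450) are assumptions. Only the three key hashes are taken from the slide.

```python
import bisect

# (ring position, server name), sorted by position. Both are assumptions for this sketch.
ring = [(130, "server-1"), (400, "server-2"), (450, "server-3")]
positions = [position for position, _ in ring]

def server_for(hash_value):
    """Return the first server at or after hash_value, wrapping around the ring."""
    i = bisect.bisect_left(positions, hash_value)
    return ring[i % len(ring)][1]

# Hash values as given on the slide.
key_hashes = {"redis": 200, "charsyam": 450, "udemy": 50}
placement = {key: server_for(h) for key, h in key_hashes.items()}

print(placement)
```

With this rule, adding or removing one server only moves the keys that fall between that server and its neighbor on the ring.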