NoSQL databases - An introduction

An introduction to NoSQL databases
POOYAN MEHRPARVAR
DEC 2014
To get more references visit:
http://bit.ly/nosql_srbiau

What is covered in this presentation:
 A brief history of databases
 NoSQL: why, what and when?
 Aggregate data models
 BASE vs ACID
 CAP theorem
 Polyglot persistence: the future of database systems

Why did we choose this topic?
 Is NoSQL replacing traditional databases?
 Where should we use NoSQL databases?
 Should we use NoSQL in every kind of project?

A brief history of databases

Relational databases
Benefits of relational databases:
Designed for all purposes
ACID
Strong consistency, concurrency, recovery
Mathematical background
Standard query language (SQL)
Lots of tooling, e.g. reporting services, entity frameworks, ...
Vertical scaling (scaling up)
Object / object-relational databases were not practical, mainly because of impedance mismatch

Era of distributed computing
But...
 Relational databases were not built for distributed applications.
Because...
 Joins are expensive
 Hard to scale horizontally
 Impedance mismatch occurs
 Expensive (product cost, hardware, maintenance)
And...
Relational databases are weak in:
 Speed (performance)
 High availability
 Partition tolerance

Rise of big data
The three Vs of big data:
 Volume
 Velocity
 Variety

Rise of big data
 Walmart: 1 million transactions per hour
 Facebook: 40 billion photos
 People are talking about petabytes today

NoSQL: why, what and when?
 Google and Amazon built their own databases (Bigtable and Dynamo)
 Facebook created Cassandra and runs thousands of instances of it
 #NoSQL was a Twitter hashtag for a conference in 2009
 The name doesn’t indicate its characteristics
 There is no strict definition of NoSQL databases
 There are more than 150 NoSQL databases (nosql-database.org)

Characteristics of NoSQL databases
 Non-relational
 Cluster-friendly
 Schema-less
 Built for the 21st-century web
 Open source

Characteristics of NoSQL databases
NoSQL avoids:
 Overhead of ACID transactions
 Complexity of SQL queries
 Burden of up-front schema design
 DBA presence
 Transactions (they should be handled at the application layer)
Provides:
 Easy and frequent changes to the DB
 Horizontal scaling (scaling out)
 A solution to impedance mismatch
 Fast development
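Horizontal scaling usually means spreading keys across several machines. The sketch below is one simple way to do that, hash-modulo placement, and is not taken from the slides; the node names and the `node_for` helper are made up for illustration, and real systems often use consistent hashing instead.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick the node that owns a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# The same key always maps to the same node.
owner = node_for("user:42")
```

Adding a node changes `len(nodes)` and therefore moves most keys, which is the main reason production systems tend to prefer consistent hashing.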

NoSQL is getting more & more popular

What is a schema-less data model?
In relational databases:
 You can’t add a record that does not fit the schema
 You need to add NULLs to unused items in a row
 You have to respect the data types, e.g. you can’t add a string to an integer field
 You can’t add multiple items in a field (you have to create another table: primary key, foreign key, joins, normalization, ...)

What is a schema-less data model?
In NoSQL databases:
 There is no schema to consider
 There are no unused cells
 There are no declared data types (typing is implicit)
 Most checks are done in the application layer
 We gather all items in an aggregate (document)
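A minimal sketch of what "no schema, checks in the application layer" can look like, using plain Python dicts in place of a real database. The field names and the `validate_customer` helper are assumptions made for this example.

```python
# Two documents in the same collection with different fields:
# nothing forces them to share a schema.
customers = [
    {"name": "Sara", "email": "sara@example.com"},
    {"name": "Ali", "phones": ["555-0100", "555-0101"]},
]

def validate_customer(doc: dict) -> list[str]:
    """Application-level checks that a relational schema would enforce."""
    errors = []
    if not isinstance(doc.get("name"), str):
        errors.append("name must be a string")
    if "phones" in doc and not isinstance(doc["phones"], list):
        errors.append("phones must be a list")
    return errors

problems = [validate_customer(c) for c in customers]
```

Note that the second document holds several phone numbers in one field, which in a relational design would need a separate table.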

What is aggregation?
 The term comes from Domain-Driven Design
 Shared-nothing architecture
 An aggregate is a cluster of domain objects that can be treated as a single unit
 Aggregates are the basic unit of data transfer and storage: you request to load or save whole aggregates
 Transactions should not cross aggregate boundaries
 This mechanism reduces join operations to a minimum
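To make the idea concrete, here is a small sketch of an order treated as one aggregate. The order data, the in-memory `store` dict and the `save`/`load` helpers are all made up for this example; a real database would persist the aggregate instead.

```python
import copy

# One aggregate: the order together with its line items and shipping address.
order = {
    "id": "order-1001",
    "customer_id": "cust-7",
    "shipping_address": {"city": "Tehran"},
    "items": [
        {"product": "keyboard", "qty": 1, "price": 30.0},
        {"product": "mouse", "qty": 2, "price": 10.0},
    ],
}

store = {}  # stands in for the database

def save(aggregate: dict) -> None:
    """Save the whole aggregate under its id as one unit."""
    store[aggregate["id"]] = copy.deepcopy(aggregate)

def load(aggregate_id: str) -> dict:
    """Load the whole aggregate; no joins are needed to assemble it."""
    return copy.deepcopy(store[aggregate_id])

save(order)
loaded = load("order-1001")
total = sum(item["qty"] * item["price"] for item in loaded["items"])
```

Everything needed to compute the order total arrives in a single load, which is why aggregates reduce the need for joins.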

Aggregate data models
NoSQL databases are classified into four major data models:
 Key-value
 Document
 Column family
 Graph
Each DB has its own query language

Key-value data model
 The main idea is the use of a hash table
 Data (values) are accessed by strings called keys
 Data has no required format; values may have any format
 Data model: (key, value) pairs
 Basic operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
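The four basic operations can be sketched with a Python dict, which is itself a hash table. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not the API of any particular product; `update` is assumed here to take the new value as a second argument.

```python
class KeyValueStore:
    """In-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def insert(self, key: str, value: bytes) -> None:
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = value

    def fetch(self, key: str) -> bytes:
        return self._data[key]

    def update(self, key: str, value: bytes) -> None:
        if key not in self._data:
            raise KeyError(f"no such key: {key}")
        self._data[key] = value

    def delete(self, key: str) -> None:
        del self._data[key]

kv = KeyValueStore()
kv.insert("session:1", b"first")
kv.update("session:1", b"second")
current = kv.fetch("session:1")
kv.delete("session:1")
```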

Key-value data model
 The “value” is stored as a “blob”
- The store does not know or care what is inside
- The application is responsible for understanding the data
 Main observation from Amazon (using Dynamo):
- “There are many services on Amazon’s platform that only need primary-key access to a data store.”
- E.g. best-seller lists, shopping carts, customer preferences, session management, sales rank, product catalog
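A short sketch of the "opaque blob" idea, using a shopping cart as in the examples above. The `save_cart` and `load_cart` helpers and the choice of JSON as the serialization format are assumptions for illustration; the store itself is a plain dict that only ever sees bytes.

```python
import json

store = {}  # stands in for a key-value store; it only ever sees bytes

def save_cart(user_id: str, cart: dict) -> None:
    """The application serializes the cart; the store keeps an opaque blob."""
    store[f"cart:{user_id}"] = json.dumps(cart).encode("utf-8")

def load_cart(user_id: str) -> dict:
    """The application is the only party that knows the blob is JSON."""
    return json.loads(store[f"cart:{user_id}"].decode("utf-8"))

save_cart("u1", {"items": ["book", "pen"]})
cart = load_cart("u1")
```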

Column family data model
 The column is the smallest unit of data.
 It is a tuple that contains a name, a value and a timestamp
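The column tuple can be sketched as follows. The slide does not say what the timestamp is for; one common use, for example in Cassandra, is to let the newest write win when two writes touch the same column, and that is the assumption made here. The `Column` type and `write` helper are hypothetical.

```python
from typing import NamedTuple

class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch

def write(row: dict, column: Column) -> None:
    """Keep the column with the newest timestamp (last write wins)."""
    existing = row.get(column.name)
    if existing is None or column.timestamp >= existing.timestamp:
        row[column.name] = column

row = {}  # one row: column name -> Column
write(row, Column("email", "old@example.com", 100))
write(row, Column("email", "new@example.com", 200))
write(row, Column("email", "stale@example.com", 150))  # older write is ignored
```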

Column family data model
Some statistics about Facebook Search (using Cassandra)
 MySQL, > 50 GB of data
- Average write: ~300 ms
- Average read: ~350 ms
 Rewritten with Cassandra, > 50 GB of data
- Average write: 0.12 ms
- Average read: 15 ms

Graph data model
 Based on graph theory
 Scales vertically; no clustering
 Graph algorithms are easy to apply
 Supports transactions
 ACID
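As an example of a graph algorithm applied to this kind of data, the sketch below runs a breadth-first search over a small "follows" graph. The graph contents and the `shortest_path` helper are made up for illustration; a graph database would expose such traversals through its own query language.

```python
from collections import deque

# Hypothetical "follows" graph as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}

def shortest_path(graph: dict, start: str, goal: str):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(graph, "alice", "erin")
```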

Document-based data model
 Usually a JSON-like interchange model
 Query model: JavaScript-like or custom
 Aggregations: Map/Reduce
 Indexes are implemented as B-trees
 Unlike simple key-value stores, both keys and values are fully searchable in document databases
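The last point, searching inside values, can be sketched with plain Python dicts. The `posts` collection and the `find` and `get_path` helpers are hypothetical and only imitate the flavor of a document query; they are not the API of any document database.

```python
# A small "collection" of JSON-like documents.
posts = [
    {"_id": 1, "title": "Intro to NoSQL", "author": {"name": "Sara"}, "tags": ["nosql", "db"]},
    {"_id": 2, "title": "SQL joins", "author": {"name": "Ali"}, "tags": ["sql"]},
    {"_id": 3, "title": "CAP explained", "author": {"name": "Sara"}, "tags": ["nosql"]},
]

def get_path(doc: dict, dotted: str):
    """Follow a dotted path such as 'author.name' into a nested document."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection: list, criteria: dict) -> list:
    """Return documents whose fields equal (or, for lists, contain) the given values."""
    def matches(doc):
        for path, wanted in criteria.items():
            actual = get_path(doc, path)
            if isinstance(actual, list):
                if wanted not in actual:
                    return False
            elif actual != wanted:
                return False
        return True
    return [doc for doc in collection if matches(doc)]

by_sara = find(posts, {"author.name": "Sara"})
tagged_nosql = find(posts, {"tags": "nosql", "author.name": "Sara"})
```

This version scans every document; a real document database would typically answer such queries from an index.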

Overview of a document-based data model

A sample MongoDB query
MySQL: (query image not reproduced)
MongoDB: (query image not reproduced)
There is no join in the MongoDB query, because we are using an aggregate data model
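The queries on the original slide were images and are not recoverable, so the sketch below is a different, made-up example of the same contrast. It uses Python's built-in sqlite3 module in place of MySQL and a plain dict in place of MongoDB; the table names, fields and data are all assumptions.

```python
import sqlite3

# Relational version: two tables, assembled with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, product TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Sara');
    INSERT INTO order_items VALUES (1, 'keyboard', 1), (1, 'mouse', 2);
""")
rows = conn.execute("""
    SELECT o.customer, i.product, i.qty
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.product
""", (1,)).fetchall()
conn.close()

# Aggregate version: the order document already contains its items,
# so a single lookup by id returns everything.
orders = {
    1: {"customer": "Sara", "items": [
        {"product": "keyboard", "qty": 1},
        {"product": "mouse", "qty": 2},
    ]},
}
order = orders[1]
```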

What do we need?
 We need a distributed database system with these features:
 – Fault tolerance
 – High availability
 – Consistency
 – Scalability
Which is impossible, according to the CAP theorem!

Should we...?
 In some cases getting an answer quickly is more important than getting a correct answer
 By giving up ACID properties, one can achieve higher performance and scalability
 Any data store can achieve atomicity, isolation and durability, but do you always need consistency?
 Maybe we should implement asynchronous inserts and updates and not wait for confirmation?

BASE
Almost the opposite of ACID.
 Basically available: nodes in a distributed environment can go down, but the system as a whole shouldn’t be affected.
 Soft state (scalable): the state of the system and its data changes over time.
 Eventual consistency: given enough time, data will be consistent across the distributed system.
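Eventual consistency can be illustrated with two in-memory replicas and a queue of pending writes. This is a toy model under stated assumptions: the `Replica` class, the `write` and `replicate` helpers and the explicit replication step are made up, whereas a real system replicates in the background.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

primary, secondary = Replica(), Replica()
pending = deque()  # writes accepted by the primary but not yet replicated

def write(key, value):
    """Acknowledge immediately; replicate to the secondary later."""
    primary.data[key] = value
    pending.append((key, value))

def replicate():
    """Apply all pending writes to the secondary, in order."""
    while pending:
        key, value = pending.popleft()
        secondary.data[key] = value

write("likes", 10)
stale_read = secondary.data.get("likes")   # replication has not run yet
replicate()
fresh_read = secondary.data.get("likes")   # replicas have converged
```

Between the write and the replication step, a client reading from the secondary sees old data; given enough time, both replicas agree.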

CAP theorem
Consistency: all clients should read the same data. There are many levels of consistency.
o Strict consistency – RDBMS.
o Tunable consistency – Cassandra.
o Eventual consistency – MongoDB.
Availability: data stays available to clients.
Partition tolerance: the system keeps working when network failures split it into separate network segments.
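The trade-off during a network partition can be sketched with a toy two-replica cluster. The `Cluster` and `Node` classes and the "CP"/"AP" mode flag are hypothetical; the point is only that, once the replicas cannot talk to each other, a write must either be refused (giving up availability) or accepted on one side only (giving up consistency).

```python
class Node:
    def __init__(self):
        self.value = None

class Cluster:
    """Two replicas; `mode` is 'CP' or 'AP'. Hypothetical toy model."""

    def __init__(self, mode: str):
        self.mode = mode
        self.a, self.b = Node(), Node()
        self.partitioned = False

    def write_via_a(self, value) -> bool:
        if not self.partitioned:
            self.a.value = self.b.value = value  # replicate synchronously
            return True
        if self.mode == "CP":
            return False          # refuse the write: consistent, not available
        self.a.value = value      # accept locally: available, not consistent
        return True

cp, ap = Cluster("CP"), Cluster("AP")
for cluster in (cp, ap):
    cluster.write_via_a("v1")
    cluster.partitioned = True

cp_accepted = cp.write_via_a("v2")
ap_accepted = ap.write_via_a("v2")
```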

CAP theorem in different SQL/NoSQL databases
We cannot achieve all three properties at once in a distributed database system (the center of the diagram).
Proven by Nancy Lynch et al. at MIT.

CAP theorem: a simple proof

Polyglot persistence: the future of database systems
 Future database systems will combine SQL and NoSQL
 We still need relational databases

New approach to database systems:
 Integrated databases have their own advantages and disadvantages
 With the advent of web services, it now seems time to switch to decentralized databases
 Single points of failure and bottlenecks would be avoided
 Clustering and replication would be much easier

Conclusion:
Before you choose NoSQL as a solution, consider these items:
 It needs a precise evaluation; maybe NoSQL is not the right thing
 You need to read lots of case studies
 Aggregation is a totally different approach
 NoSQL is still immature
 Becoming an expert in a particular NoSQL DB takes many hours of study and practice
 There is no standard query language
 Most controls have to be implemented at the application layer
 Relational databases are still the strongest in transactional environments and provide the best solutions for consistency and concurrency control

NewSQL: a brief definition
 The NewSQL group was founded in 2011
Michael Stonebraker’s definition:
 SQL as the primary interface
 ACID support for transactions
 Non-locking concurrency control
 High per-node performance
 Parallel, shared-nothing architecture: each node is independent and self-sufficient, and nodes do not share memory or storage

The technology is still in its infancy...
In 2000 no one even thought database systems could be a hot topic again!
To get more references visit:
http://bit.ly/nosql_srbiau

References:
 NoSQL Distilled, Pramod J. Sadalage and Martin Fowler
 Martin Fowler’s presentation at the GOTO conference
 www.mongodb.org