NoSQL Databases

The presentation covers the evolution of databases, focusing on the rise of NoSQL as a solution for big data challenges, including its characteristics, data models, and the differences between ACID and BASE principles. It discusses the CAP theorem and the need for polyglot persistence, highlighting that while NoSQL offers flexibility and scalability, relational databases remain strong in transactional environments. The conclusion emphasizes careful evaluation before adopting NoSQL, as it may not always be the right choice for every project.
Copyright © All Rights Reserved

What is covered in this presentation:
- A brief history of databases
- NoSQL: why, what and when?
- Aggregate data models
- BASE vs ACID
- CAP theorem
- Polyglot persistence: the future of database systems

Why did we choose this topic?

- Is NoSQL replacing traditional databases?
- Where should we use NoSQL databases?
- Should we use NoSQL in every kind of project?

A brief history of databases

Relational databases

Benefits of relational databases:

- Designed for all purposes
- ACID
- Strong consistency, concurrency, recovery
- Mathematical background
- Standard Query Language (SQL)
- Lots of tools to use with them, e.g. reporting services, entity frameworks, ...

Object / object-relational databases
- Vertical scaling (scaling up) was not practical, mainly because of the impedance mismatch

Era of Distributed Computing

But...
- Relational databases were not built for distributed applications.

Because...
- Joins are expensive
- Hard to scale horizontally
- Impedance mismatch occurs
- Expensive (product cost, hardware, maintenance)

And...
It's weak in:
- Speed (performance)
- High availability
- Partition tolerance

Rise of Big data

Three Vs of big data:

- Volume
- Velocity
- Variety

Rise of Big data

- Walmart: 1 million transactions per hour
- Facebook: 40 billion photos
- People are talking about petabytes today

NoSQL: why, what and when?

- Google and Amazon built their own databases (Bigtable and Dynamo)
- Facebook invented Cassandra and is using thousands of them
- #NoSQL was a Twitter hashtag for a conference in 2009
- The name doesn't indicate its characteristics
- There is no strict definition of NoSQL databases
- There are more than 150 NoSQL databases (nosql-database.org)

Characteristics of NoSQL databases

- Non-relational
- Cluster-friendly
- Schema-less
- 21st-century web
- Open-source

Characteristics of NoSQL databases

NoSQL avoids:
- Overhead of ACID transactions
- Complexity of SQL queries
- Burden of up-front schema design
- DBA presence
- Transactions (these should be handled at the application layer)

Provides:
- Easy and frequent changes to the DB
- Horizontal scaling (scaling out)
- A solution to the impedance mismatch
- Fast development
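
Horizontal scaling (scaling out) usually means spreading keys across several machines instead of buying a bigger one. The Python sketch below shows the simplest form, hash-based sharding; the node names and the modulo scheme are illustrative assumptions, not how any particular NoSQL product distributes its data.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for(key: str, nodes=NODES) -> str:
    """Choose the node that owns a key by hashing it (simple modulo sharding).

    A stable hash (MD5) is used because Python's built-in hash() of a str
    is randomized per process, so it would route keys differently each run.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Each node holds only its own share of the data (shared-nothing).
shards = {n: {} for n in NODES}
for user_id in range(1, 10):
    key = f"user:{user_id}"
    shards[node_for(key)][key] = {"id": user_id}

for n in NODES:
    print(n, sorted(shards[n]))
```

Plain modulo sharding moves most keys when a node is added or removed; Dynamo-style systems such as Cassandra use consistent hashing to limit that movement.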

NoSQL is getting more & more popular

What is a schema-less data model?

In relational databases:

- You can't add a record that does not fit the schema
- You need to add NULLs for unused items in a row
- You must respect the data types, e.g. you can't add a string to an integer field
- You can't add multiple items in a field (you have to create another table: primary key, foreign key, joins, normalization, ...!)

What is a schema-less data model?

In NoSQL databases:

- There is no schema to consider
- There are no unused cells
- There are no declared data types (types are implicit)
- Most considerations are handled in the application layer
- We gather all items in an aggregate (document)
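
The contrast between the two slides can be sketched in a few lines of Python. This is a toy model, not a real database: the column names and records are hypothetical, a tuple stands in for a relational row, and a dict stands in for a schema-less document.

```python
# Fixed relational schema (hypothetical columns).
COLUMNS = ("id", "name", "email", "phone")

def to_row(record: dict) -> tuple:
    """Fit a record into the fixed schema.

    Columns the record lacks become None (SQL NULL); fields the schema
    does not define are rejected, as a relational INSERT would reject them.
    """
    unknown = set(record) - set(COLUMNS)
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    return tuple(record.get(col) for col in COLUMNS)

# Schema-less documents: each one carries only the fields it has,
# and a field may hold several items (a list) without a separate table.
documents = [
    {"id": 1, "name": "Ali", "email": "ali@example.com"},
    {"id": 2, "name": "Sara", "phones": ["555-0100", "555-0101"]},
]

print(to_row(documents[0]))  # (1, 'Ali', 'ali@example.com', None)
try:
    to_row(documents[1])
except ValueError as err:
    print(err)  # fields not in schema: ['phones']
```

The second document is perfectly valid as a document, but it cannot become a row without a schema change or an extra table.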

What is Aggregation?

- The term comes from Domain-Driven Design
- Shared-nothing architecture
- An aggregate is a cluster of domain objects that can be treated as a single unit
- Aggregates are the basic unit of data transfer to and from storage: you request to load or save whole aggregates
- Transactions should not cross aggregate boundaries
- This mechanism reduces join operations to a minimum
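
As a rough illustration of "load or save whole aggregates", the Python sketch below stores an order (with its customer and line items) as one unit under one key. The `AggregateStore` class and the order fields are hypothetical; the point is only that a single read or write moves the whole aggregate, with no joins.

```python
import copy

class AggregateStore:
    """Hypothetical in-memory store keyed by the aggregate root's id."""

    def __init__(self):
        self._data = {}

    def save(self, key, aggregate):
        # The whole aggregate is written in one step; a deep copy keeps
        # later changes to the caller's object out of the store.
        self._data[key] = copy.deepcopy(aggregate)

    def load(self, key):
        # The whole aggregate comes back in one step; nothing to join.
        return copy.deepcopy(self._data[key])

order = {
    "order_id": "o-1001",
    "customer": {"id": "c-7", "name": "Ali"},
    "items": [
        {"product": "book", "qty": 2, "price": 12.5},
        {"product": "pen", "qty": 3, "price": 1.0},
    ],
    "shipping_address": {"city": "Tehran"},
}

store = AggregateStore()
store.save(order["order_id"], order)

loaded = store.load("o-1001")
total = sum(item["qty"] * item["price"] for item in loaded["items"])
print(total)  # 28.0
```

Because an update touches only one aggregate, the store can make that single write atomic without a transaction spanning several records.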

[Figures: What is Aggregation?]

Aggregate Data Models

NoSQL databases are classified into four major data models:

- Key-value
- Document
- Column family
- Graph

Each DB has its own query language.

Key-value data model

- The main idea is the use of a hash table
- Access data (values) by strings called keys
- Data has no required format: data may have any format
- Data model: (key, value) pairs
- Basic operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
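
The four basic operations map directly onto a hash table. The following Python sketch wraps a built-in `dict` (which is a hash table) behind the operation names from the slide; the `KeyValueStore` class itself is hypothetical and ignores persistence and distribution.

```python
class KeyValueStore:
    """Minimal key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def insert(self, key: str, value) -> None:
        if key in self._table:
            raise KeyError(f"key already exists: {key}")
        self._table[key] = value

    def fetch(self, key: str):
        return self._table.get(key)  # None if the key is absent

    def update(self, key: str, value) -> None:
        if key not in self._table:
            raise KeyError(f"no such key: {key}")
        self._table[key] = value

    def delete(self, key: str) -> None:
        self._table.pop(key, None)  # deleting a missing key is a no-op


kv = KeyValueStore()
kv.insert("user:42", {"name": "Ali"})
kv.update("user:42", {"name": "Ali", "lang": "fa"})
print(kv.fetch("user:42"))  # {'name': 'Ali', 'lang': 'fa'}
kv.delete("user:42")
print(kv.fetch("user:42"))  # None
```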

Key-value data model

- The "value" is stored as a "blob"
  - Without caring or knowing what is inside
  - The application is responsible for understanding the data
- Main observation from Amazon (using Dynamo):
  - "There are many services on Amazon's platform that only need primary-key access to a data store."
  - E.g. best-seller lists, shopping carts, customer preferences, session management, sales rank, product catalog
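
To show what "the application is responsible for understanding the data" means, the Python sketch below keeps shopping carts as opaque bytes. The `blob_store` dict stands in for a key-value store, and JSON is an arbitrary serialization choice made by the hypothetical application, not something the store requires.

```python
import json

blob_store: dict[str, bytes] = {}  # key -> opaque bytes; never inspected

def save_cart(session_id: str, cart: dict) -> None:
    # The application serializes; the store only sees bytes.
    blob_store[f"cart:{session_id}"] = json.dumps(cart).encode("utf-8")

def load_cart(session_id: str) -> dict:
    # The application deserializes; a missing key means an empty cart.
    raw = blob_store.get(f"cart:{session_id}")
    return json.loads(raw.decode("utf-8")) if raw is not None else {}

save_cart("s-123", {"book": 2, "pen": 3})
print(load_cart("s-123"))  # {'book': 2, 'pen': 3}
print(load_cart("s-999"))  # {}
```

A shopping cart fits this model well because it is always read and written by its key (the session id), matching Amazon's observation above.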

Column family data model

- The column is the lowest/smallest instance of data.
- It is a tuple that contains a name, a value and a timestamp.
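
The (name, value, timestamp) tuple can be modelled directly. The Python sketch below groups columns by row key and, assuming a last-write-wins rule similar in spirit to Cassandra's timestamp-based reconciliation, keeps the column with the newest timestamp. The row keys, values and tie-breaking choice are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the writer, e.g. microseconds since epoch

# Column family: row key -> column name -> Column
column_family: dict[str, dict[str, Column]] = defaultdict(dict)

def write(row_key: str, col: Column) -> None:
    current = column_family[row_key].get(col.name)
    # Last write wins: keep the newest timestamp even if writes arrive
    # out of order. Ties keep the existing column (a simplification).
    if current is None or col.timestamp > current.timestamp:
        column_family[row_key][col.name] = col

write("user:42", Column("email", "old@example.com", 100))
write("user:42", Column("email", "new@example.com", 200))
write("user:42", Column("email", "late@example.com", 150))  # arrives late, older
write("user:42", Column("city", "Tehran", 120))

print(column_family["user:42"]["email"].value)  # new@example.com
print(sorted(column_family["user:42"]))         # ['city', 'email']
```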

Column family data model

Some statistics about Facebook Search (using Cassandra):

- MySQL, > 50 GB of data
  - Writes average: ~300 ms
  - Reads average: ~350 ms
- Rewritten with Cassandra, > 50 GB of data
  - Writes average: 0.12 ms
  - Reads average: 15 ms

Graph data model

- Based on graph theory
- Scales vertically, no clustering
- You can use graph algorithms easily
- Transactions
- ACID
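
"You can use graph algorithms easily" is easiest to see with a traversal. The Python sketch below keeps a hypothetical social graph as an adjacency list and finds the shortest chain of friends with breadth-first search. It is plain in-memory Python, not the query language of any graph database.

```python
from collections import deque

# Hypothetical social graph: undirected "friend" edges as adjacency lists.
graph = {
    "Ali": ["Sara", "Reza"],
    "Sara": ["Ali", "Mina"],
    "Reza": ["Ali"],
    "Mina": ["Sara", "Omid"],
    "Omid": ["Mina"],
}

def shortest_path(g, start, goal):
    """Breadth-first search; returns the shortest path as a list of nodes,
    or None if goal is unreachable from start."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in g.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

print(shortest_path(graph, "Reza", "Omid"))  # ['Reza', 'Ali', 'Sara', 'Mina', 'Omid']
```

In a relational schema the same question needs one self-join per hop; here each hop is a direct lookup of a node's neighbours.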

Document-based data model

- Usually a JSON-like interchange model
- Query model: JavaScript-like or custom
- Aggregations: Map/Reduce
- Indexes are done via B-trees
- Unlike in simple key-value stores, both keys and values are fully searchable in document databases
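
Two of these points, searching inside values and aggregating with Map/Reduce, can be sketched over a list of JSON-like dicts. The `find` and `map_reduce` helpers below are hypothetical; the array-matching rule only loosely mimics how MongoDB matches a value against an array field.

```python
from collections import defaultdict

# Hypothetical collection of JSON-like documents.
posts = [
    {"_id": 1, "author": "Ali", "tags": ["nosql", "mongodb"], "likes": 10},
    {"_id": 2, "author": "Sara", "tags": ["sql"], "likes": 4},
    {"_id": 3, "author": "Ali", "tags": ["nosql"], "likes": 7},
]

def find(collection, **criteria):
    """Return documents whose fields match every criterion; a list-valued
    field matches if it contains the requested value."""
    def matches(doc, field, expected):
        actual = doc.get(field)
        if isinstance(actual, list):
            return expected in actual
        return actual == expected
    return [d for d in collection
            if all(matches(d, f, v) for f, v in criteria.items())]

def map_reduce(collection, mapper, reducer):
    groups = defaultdict(list)
    for doc in collection:
        for key, value in mapper(doc):  # map phase: emit (key, value) pairs
            groups[key].append(value)
    return {k: reducer(k, vs) for k, vs in groups.items()}  # reduce phase

print([d["_id"] for d in find(posts, tags="nosql")])  # [1, 3]

likes_per_author = map_reduce(
    posts,
    mapper=lambda d: [(d["author"], d["likes"])],
    reducer=lambda key, values: sum(values),
)
print(likes_per_author)  # {'Ali': 17, 'Sara': 4}
```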

[Figure: Document-based data model]
[Figures: Overview of a document-based data model]

A sample MongoDB query

MySQL:

MongoDB:

There is no join in the MongoDB query, because we are using an aggregate data model.
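
The original MySQL and MongoDB queries are not reproduced here, so the sketch below makes the same point with hypothetical data: SQLite (Python's built-in `sqlite3`) stands in for the relational side and needs a JOIN to rebuild an order, while a dict stands in for a document collection where one key lookup returns the whole aggregate.

```python
import sqlite3

# Relational layout: one row per order, one row per order line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(id),
        product  TEXT,
        qty      INTEGER
    );
    INSERT INTO orders VALUES (1, 'Ali');
    INSERT INTO order_items VALUES (1, 'book', 2), (1, 'pen', 3);
""")

# Reassembling the order requires a join across the two tables.
rows = conn.execute("""
    SELECT o.customer, i.product, i.qty
    FROM orders o JOIN order_items i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.product
""", (1,)).fetchall()
print(rows)  # [('Ali', 'book', 2), ('Ali', 'pen', 3)]
conn.close()

# Aggregate layout: the order and its items live in one document,
# so a single lookup by key returns everything, with no join.
orders_collection = {
    1: {"_id": 1, "customer": "Ali",
        "items": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 3}]},
}
doc = orders_collection[1]
print([(doc["customer"], i["product"], i["qty"]) for i in doc["items"]])
# [('Ali', 'book', 2), ('Ali', 'pen', 3)]
```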

What do we need?

- We need a distributed database system with the following features:
  - Fault tolerance
  - High availability
  - Consistency
  - Scalability

Which is impossible, according to the CAP theorem!

Should we...?

- In some cases getting an answer quickly is more important than getting a correct answer
- By giving up ACID properties, one can achieve higher performance and scalability
- Any data store can achieve atomicity, isolation and durability, but do you always need consistency?
- Maybe we should implement asynchronous inserts and updates and not wait for confirmation?

BASE

Almost the opposite of ACID.

- Basically Available: nodes in a distributed environment can go down, but the whole system shouldn't be affected.
- Soft state (scalable): the state of the system and its data changes over time.
- Eventual consistency: given enough time, data will be consistent across the distributed system.
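
Eventual consistency can be simulated with asynchronous replication. In the Python sketch below (all class and method names are hypothetical), a write is acknowledged as soon as one replica has it, the other replicas are updated later from a queue, and a read from a lagging replica returns stale data until `propagate()` runs.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy replicated store: writes are acknowledged after reaching one
    replica; copies to the other replicas are queued and applied later."""

    def __init__(self, names):
        self.replicas = {name: {} for name in names}
        self.pending = deque()  # queued (target replica, key, value)

    def write(self, via, key, value):
        self.replicas[via][key] = value  # acknowledged immediately
        for name in self.replicas:
            if name != via:
                self.pending.append((name, key, value))  # replicate later

    def read(self, via, key):
        return self.replicas[via].get(key)

    def propagate(self):
        """Deliver every queued update ("given enough time")."""
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name][key] = value

store = EventuallyConsistentStore(["A", "B", "C"])
store.write("A", "x", 1)
print(store.read("A", "x"), store.read("B", "x"))  # 1 None  (B is stale)
store.propagate()
print([store.read(n, "x") for n in "ABC"])         # [1, 1, 1]
```

With a single writer the replicas converge; concurrent conflicting writes through different replicas would also need a conflict-resolution rule (such as last-write-wins), which this sketch omits.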

BASE vs ACID

CAP theorem

- Consistency: clients should read the same data. There are many levels of consistency:
  - Strict consistency: RDBMS
  - Tunable consistency: Cassandra
  - Eventual consistency: MongoDB
- Availability: data should be available.
- Partition tolerance: the system keeps working when data is partitioned across network segments due to network failures.
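
Tunable consistency in Dynamo-style stores such as Cassandra is commonly explained with read and write quorums: with N replicas, a write waits for W of them and a read asks R of them. The Python sketch below (N = 3 is an assumed value) checks by brute force that every read set overlaps every write set exactly when R + W > N, which is what lets a read see the latest write.

```python
from itertools import combinations

N = 3  # number of replicas (assumed for illustration)

def read_sees_latest_write(r, w, n=N):
    """True if every possible set of r replicas read overlaps every
    possible set of w replicas written, i.e. any read reaches at least
    one replica that holds the latest write."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

for r, w in [(1, 1), (1, 3), (2, 2), (3, 1), (1, 2)]:
    # The last two columns agree on every line.
    print(r, w, r + w > N, read_sees_latest_write(r, w))
```

Choosing small R and W favours latency and availability; choosing R + W > N favours consistency. That trade-off, made per operation, is what "tunable" refers to.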

CAP theorem in different SQL/NoSQL databases

We cannot achieve all three properties (the center of the diagram) in distributed database systems. This was proven by Nancy Lynch et al. at MIT.

CAP theorem: a simple proof
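
The usual simple proof uses two replicas separated by a network partition: a write reaches one side, a read arrives at the other, and the system must either refuse to answer (losing availability) or answer with old data (losing consistency). The Python sketch below is a toy simulation of that argument, not a formal proof; the node names G1/G2 and the "CP"/"AP" modes are illustrative.

```python
class Node:
    def __init__(self):
        self.value = "v0"

class TwoNodeSystem:
    """Two replicas, G1 and G2, joined by a link that may be cut.
    mode "CP": refuse writes that cannot reach both replicas.
    mode "AP": always accept, even if G2 then serves stale data."""

    def __init__(self, mode):
        self.mode = mode
        self.g1, self.g2 = Node(), Node()
        self.partitioned = False

    def write_g1(self, value):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot replicate to G2")
        self.g1.value = value
        if not self.partitioned:
            self.g2.value = value  # synchronous replication while the link is up

    def read_g2(self):
        return self.g2.value

for mode in ("CP", "AP"):
    system = TwoNodeSystem(mode)
    system.partitioned = True  # the network splits G1 from G2
    try:
        system.write_g1("v1")
        print(mode, "write accepted, G2 reads", system.read_g2())
    except RuntimeError as err:
        print(mode, err)
```

Running it prints an availability failure for "CP" and a stale read (`v0`) for "AP"; no setting of `mode` gives both a fresh read and an accepted write while `partitioned` is true.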

Which data model to choose?

Polyglot persistence: the future of database systems

- Future databases are a combination of SQL and NoSQL
- We still need relational databases
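
Polyglot persistence means each kind of data goes to the store that suits it. The Python sketch below is a hypothetical service that keeps orders in a relational database (SQLite, for its transactions) and short-lived session data in a plain dict standing in for a key-value store; the class and table names are assumptions for illustration.

```python
import sqlite3

class OrderService:
    """Hypothetical service that uses two stores side by side."""

    def __init__(self):
        # Relational store for data that needs ACID transactions.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)"
        )
        # Stand-in for a key-value store holding session data.
        self.sessions = {}

    def place_order(self, session_id, total):
        with self.sql:  # commits on success, rolls back on an exception
            cur = self.sql.execute("INSERT INTO orders (total) VALUES (?)", (total,))
        self.sessions[session_id] = {"last_order": cur.lastrowid}
        return cur.lastrowid

svc = OrderService()
order_id = svc.place_order("s-1", 40.0)
print(order_id, svc.sessions["s-1"])  # 1 {'last_order': 1}
```

Keeping the two stores behind one service means callers do not need to know which database holds which piece of data.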

Overview of a polyglot DB

New approach to database systems:

- Integrated databases have their own advantages and disadvantages
- With the advent of web services, it now seems to be time to switch to decentralized databases
- Single points of failure and bottlenecks would be avoided
- Clustering and replication would be much easier

Conclusion:
Before you choose NoSQL as a solution, consider these items:

- It needs a precise evaluation; maybe NoSQL is not the right thing
- You need to read lots of case-study papers
- Aggregation is a totally different approach
- NoSQL is still immature
- It takes many hours of studying and working to become an expert in a particular NoSQL DB
- There is no standard query language
- Most controls have to be implemented at the application layer
- Relational databases are still the strongest in transactional environments and provide the best solutions for consistency and concurrency control

Say hello to...

NewSQL: a brief definition

- The NewSQL group was founded in 2011

Michael Stonebraker's definition:

- SQL as the primary interface
- ACID support for transactions
- Non-locking concurrency control
- High per-node performance
- Parallel, shared-nothing architecture: each node is independent and self-sufficient and does not share memory or storage

Technology is still in its infancy...

In 2000, no one even thought database systems could be a hot topic again!

To get more references, visit:

http://bit.ly/nosql_srbiau

References:

- NoSQL Distilled, Martin Fowler
- Martin Fowler's presentation at the GOTO conference
- www.mongodb.org
