Saying "Yes" to NoSQL: Overview

The document discusses the history and evolution of NoSQL databases. It begins with E.F. Codd's development of the relational model in 1970 and the creation of the first relational database management systems. SQL was developed in the early 1970s as a query language for relational databases. One early use of the term "NoSQL" referred to non-SQL interfaces to relational databases. More recently, NoSQL has come to refer to non-relational and distributed database systems that often sacrifice consistency for availability and flexibility. The document provides examples of both early and modern NoSQL databases.

Uploaded by

valer
Saying “Yes” to NoSQL

Overview:

The Relational Model

Structured Query Language (SQL)

The “original” NoSQL Movement

NoSQL Today

Inspiration for this talk:



Dr. Ford

Dr. Kaner

Dr. Menezes
The Relational Model

E.F. Codd: (1923-2003)



Developed the relational model while at IBM San Jose Research Laboratory

IBM Fellow 1976

Turing Award 1981

ACM Fellow 1994

British, by birth

Associations:

Raymond F. Boyce

Hugh Darwen

C.J. Date

Nikos Lorentzos

David McGoveran

Fabian Pascal

The Relational Model

“A Relational Model of Data for Large Shared Data Banks,” E.F. Codd, Communications of the ACM, Vol. 13, No. 6, June, 1970.

“Further Normalization of the Data Base Relational Model,” E.F. Codd, Data Base Systems, Proceedings of 6th Courant Computer Science Symposium, May, 1971.

“Relational Completeness of Data Base Sublanguages,” E.F. Codd, Data Base Systems, Proceedings of 6th Courant Computer Science Symposium, May, 1971.

Plus others…

The Relational Model

The basic data model:

Relations, tuples, attributes, domains

Primary & foreign keys

Normal forms

Example relation “Employee”:

ID     Last-Name  Date-of-Birth  Job-Category
15394  Jones      11/3/75        Software
21621  Smith      6/24/69        Management
17852  Brown      8/14/72        Hardware
32904  Carson     10/29/64       Software
:      :

Query model:

Relational algebra – Cartesian product, selection, projection, union, set-difference

Relational calculus
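
A minimal sketch of the five relational algebra operators listed above, applied to the “Employee” relation. Modeling a relation as a Python list of dicts, and the function names select, project, product, union, and difference, are assumptions made for illustration; they are not part of Codd's papers or of any particular RDBMS.

```python
# Illustrative only: a relation is modeled as a list of dicts (one dict per tuple).
employee = [
    {"id": 15394, "last_name": "Jones", "date_of_birth": "11/3/75", "job_category": "Software"},
    {"id": 21621, "last_name": "Smith", "date_of_birth": "6/24/69", "job_category": "Management"},
    {"id": 17852, "last_name": "Brown", "date_of_birth": "8/14/72", "job_category": "Hardware"},
    {"id": 32904, "last_name": "Carson", "date_of_birth": "10/29/64", "job_category": "Software"},
]


def select(relation, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result


def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s.

    Assumes the two relations have disjoint attribute names.
    """
    return [{**a, **b} for a in r for b in s]


def union(r, s):
    """Union: tuples in r or in s (assumes r itself holds no duplicates)."""
    return r + [t for t in s if t not in r]


def difference(r, s):
    """Set difference: tuples in r that are not in s."""
    return [t for t in r if t not in s]


software = select(employee, lambda t: t["job_category"] == "Software")
names = project(software, ["last_name"])
categories = project(employee, ["job_category"])
```

Composing select and then project here plays roughly the role of a query that asks for the last names of all employees in the Software category; nothing in the sketch says how the tuples are stored, which is the point of physical data independence.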

A primary theme:

Physical data independence

Relational Database Management Systems (RDBMS)

Database Management Systems Based on the Relational Model:



System R – IBM research project (1974)

Ingres – University of California, Berkeley (early 1970s)

Oracle – Relational Software, Inc., now Oracle Corporation (1979)

SQL/DS – IBM’s first commercial RDBMS (1981)

Informix – Relational Database Systems, now IBM (1981)

DB2 – IBM (1984)

Sybase SQL Server – Sybase, now SAP (1988)

Structured Query Language (SQL)

SQL is a language for querying relational databases.

History:

Developed at IBM San Jose Research Laboratory, early 1970s, for System R

Credited to Donald D. Chamberlin and Raymond F. Boyce

Based on relational algebra and tuple calculus

Originally called SEQUEL

Language Elements:

Clauses, expressions, predicates, queries, statements, transactions, operators, nesting, etc.

select o_orderpriority, count(*) as order_count
from orders
where o_orderdate >= date '[DATE]'
  and o_orderdate < date '[DATE]' + interval '3' month
  and exists (select *
              from lineitem
              where l_orderkey = o_orderkey
                and l_commitdate < l_receiptdate)
group by o_orderpriority
order by o_orderpriority;

SQL and the Relational Model

A text search of E.F. Codd’s early papers for “SQL” (or SEQUEL) reveals:

Relational Query Languages

Other Relational Query Languages:



Datalog

QUEL

Query By Example (QBE)

SQL variations

shell scripts, with relational extensions

The NoSQL RDBMS

One of the first uses of the phrase NoSQL is due to Carlo Strozzi, circa 1998.

NoSQL:

A fast, portable, open-source RDBMS

A derivative of the RDB database system (Walter Hobbs, RAND)

Not a full-function DBMS, per se, but a shell-level tool

User interface – Unix shell

Based on the “operator/stream paradigm”

http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page

Operator/stream Paradigm

Commonly referenced papers:



“The Next Generation,” E. Schaffer and M. Wolf, UNIX Review, March, 1991, page 24.

“The UNIX Shell as a Fourth Generation Language,” E. Schaffer and M. Wolf, Revolutionary Software.

Regarding Database Management Systems:

“…almost all are software prisons that you must get into and leave the power of UNIX behind.”

“…large, complex programs which degrade total system performance, especially when they are run in a multi-user environment.”

“…put walls between the user and UNIX, and the power of UNIX is thrown away.”

In summary:

Relational model => yes

UNIX => big yes

Big, COTS, relational DBMS => no

SQL => no

The NoSQL RDBMS

Getting back to Strozzi’s NoSQL RDBMS:



Based on the relational model

Based on UNIX and shell scripts

Does not have an SQL interface

In that sense, and interpreted literally, NoSQL means “no sql,” i.e., we are not using the SQL language.
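
A rough sketch of the operator/stream idea: a table lives in a plain text stream, and small operators are chained so that each one consumes the output of the previous one, much like a shell pipeline. This is not Strozzi's tool or its actual operator set; the tab-separated layout and the names rows, where, and columns are hypothetical, and Python generators stand in for UNIX pipes.

```python
import io

# Hypothetical table in tab-separated text form; the first line is the header.
TABLE = (
    "id\tlast_name\tjob_category\n"
    "15394\tJones\tSoftware\n"
    "21621\tSmith\tManagement\n"
    "17852\tBrown\tHardware\n"
)


def rows(stream):
    """Operator: turn a text stream into a stream of dict rows."""
    header = next(stream).rstrip("\n").split("\t")
    for line in stream:
        yield dict(zip(header, line.rstrip("\n").split("\t")))


def where(stream, column, value):
    """Operator: pass through only the rows whose column equals value."""
    for row in stream:
        if row[column] == value:
            yield row


def columns(stream, names):
    """Operator: keep only the named columns of each row."""
    for row in stream:
        yield {n: row[n] for n in names}


# Operators are composed like stages of a pipeline; no SQL is involved.
pipeline = columns(where(rows(io.StringIO(TABLE)), "job_category", "Software"), ["last_name"])
result = list(pipeline)
```

Each operator handles one row at a time and knows nothing about the others, which is what lets the relational operations be mixed freely with ordinary text-processing tools.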

NoSQL Today

More recently:
The term has taken on different meanings

One common interpretation is “not only SQL”

Most modern NoSQL systems diverge from the relational model or standard RDBMS functionality:

The data model: relations, tuples, attributes, domains, normalization vs. documents, graphs, key/values

The query model: relational algebra, tuple calculus vs. graph traversal, text search, map/reduce

The implementation: rigid schemas vs. flexible schemas (schema-less); ACID compliance vs. BASE

In that sense, NoSQL today is more commonly meant to be something like “non-relational.”

NoSQL Today

Motivation for recent NoSQL systems is also quite varied:


“…there are significant advantages to building our own storage solution at Google,” Chang et al., 2006

Scalability, performance, availability, flexibility

Speculation - $$$, control

MySQL vs. MongoDB:


• http://www.youtube.com/watch?v=b2F-DItXtZs

How “big” is the NoSQL movement?

Will they eventually eliminate the need for relational databases?

Is this another grand conspiracy by the government and, you know, that guy….

NoSQL Today
(a partial, unrefined list)

HBase Cassandra Hypertable Accumulo Amazon SimpleDB SciDB Stratosphere flare


Cloudata BigTable QD Technology SmartFocus KDI Alterian Cloudera C-Store
Vertica Qbase–MetaCarta OpenNeptune HPCC MongoDB CouchDB Clusterpoint Server Terrastore
Jackrabbit OrientDB Persevere CloudKit Djondb SchemaFreeDB SDB JasDB
RaptorDB ThruDB RavenDB DynamoDB Azure Table Storage Couchbase Server Riak
LevelDB Chordless GenieDB Scalaris Tokyo Kyoto Cabinet Tyrant Scalien
Berkeley DB Voldemort Dynomite KAI MemcacheDB Faircom C-Tree HamsterDB STSdb
Tarantool/Box Maxtable Pincaster RaptorDB TIBCO Active Spaces allegro-C nessDB HyperDex
Mnesia LightCloud Hibari BangDB OpenLDAP/MDB/Lightning Scality Redis
KaTree TomP2P Kumofs TreapDB NMDB luxio actord Keyspace
schema-free RAMCloud SubRecord MotionDb Dovetaildb JDBM Neo4j InfiniteGraph
Sones InfoGrid HyperGraphDB DEX GraphBase Trinity AllegroGraph BrightstarDB
Bigdata Meronymy OpenLink Virtuoso VertexDB FlockDB Execom IOG Java Univ Netwrk/Graph Framework
OpenRDF/Sesame Filament OWLim NetworkX iGraph Jena SPARQL OrientDb
ArangoDB AlchemyDB Soft NoSQL Systems Db4o Versant Objectivity Starcounter
ZODB Magma NEO PicoList siaqodb Sterling Morantex EyeDB
HSS Database FramerD Ninja Database Pro StupidDB KiokuDB Perl solution Durus
GigaSpaces Infinispan Queplix Hazelcast GridGain Galaxy SpaceBase Joafip Coherence
eXtremeScale MarkLogic Server EMC Documentum xDB eXist Sedna BaseX Qizx
Berkeley DB XML Xindice Tamino Globals Intersystems Cache GT.M EGTM
U2 OpenInsight Reality OpenQM ESENT jBASE MultiValue Lotus/Domino
eXtremeDB RDM Embedded ISIS Family Prevayler Yserial Vmware vFabric GemFire Btrieve
KirbyBase Tokutek Recutils FileDB Armadillo illuminate Correlation Database FluidDB
Fleet DB Twisted Storage Rindo Sherpa tin Dryad SkyNet Disco
MUMPS Adabas XAP In-Memory Grid eXtreme Scale MckoiDDB Mckoi SQL Database
Oracle Big Data Appliance Innostore FleetDB No-List KDI Perst IODB

NoSQL Today

It is easy to find diagrams that look like these:

• http://www.vertabelo.com/blog/vertabelo-news/jdd-2013-what-we-found-out-about-databases

• http://db-engines.com/en/ranking_categories

• http://www.odbms.org/2014/11/gartner-2014-magic-quadrant-operational-database-management-systems-2/

Primary NoSQL Categories

General Categories of NoSQL Systems:



Key/value store

(wide) Column store

Graph store

Document store

Compared to the relational model:



Query models are not as developed.

Distinction between abstraction & implementation is not as clear.

Key/Value Store

“Dynamo: Amazon’s Highly Available Key-value Store,” DeCandia, G., et al., SOSP’07, 21st ACM Symposium on Operating Systems Principles.

The basic data model:

Database is a collection of key/value pairs

The key for each pair is unique

No requirement for normalization (and consequently dependency preservation or lossless join)

Primary operations:

insert(key,value)

delete(key)

update(key,value)

lookup(key)

Additional operations:

variations on the above, e.g., reverse lookup

iterators

Examples (www.nosql-database.org, www.db-engines.com, www.wikipedia.com):

DynamoDB, Azure Table Storage, Riak, Redis, Aerospike, FoundationDB, LevelDB, Berkeley DB, Oracle NoSQL Database, GenieDB, BangDB, Chordless, Scalaris, Tokyo Cabinet/Tyrant, Scalien, Voldemort, Dynomite, KAI, MemcacheDB, Faircom C-Tree, LSM, KitaroDB, HamsterDB, STSdb, Tarantool/Box, Maxtable, Quasardb, Pincaster, RaptorDB, TIBCO Active Spaces, Allegro-C, nessDB, HyperDex, SharedHashFile, Symas LMDB, Sophia, PickleDB, Mnesia, LightCloud, Hibari, OpenLDAP, Genomu, BinaryRage, Elliptics, DBreeze, RocksDB, TreodeDB
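
The key/value operations named on this slide can be sketched in a few lines. This is an in-memory toy, assuming a single process and no persistence or distribution; the class name KeyValueStore and the method reverse_lookup are hypothetical and do not correspond to the API of any product listed here.

```python
class KeyValueStore:
    """Toy key/value store: unique keys, opaque values, no schema."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        # Keys are unique, so inserting an existing key is an error.
        if key in self._data:
            raise KeyError(f"duplicate key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

    def update(self, key, value):
        # Update only succeeds for a key that is already present.
        if key not in self._data:
            raise KeyError(f"unknown key: {key!r}")
        self._data[key] = value

    def lookup(self, key):
        return self._data[key]

    def reverse_lookup(self, value):
        """Additional operation: all keys currently mapped to value."""
        return [k for k, v in self._data.items() if v == value]

    def __iter__(self):
        """Additional operation: iterate over (key, value) pairs."""
        return iter(self._data.items())


store = KeyValueStore()
store.insert(15394, "Jones")
store.insert(21621, "Smith")
store.update(21621, "Jones")
same_name = store.reverse_lookup("Jones")
store.delete(15394)
remaining = list(store)
```

Note that the store never inspects a value, so a reverse lookup has to scan every pair; that is one reason such operations are offered only as variations rather than as part of the core interface.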

Wide Column Store

“Bigtable: A Distributed Storage System for Structured Data,” Chang, F., et al., OSDI’06: Seventh Symposium on Operating System Design and Implementation, 2006.

The basic data model:

Database is a collection of key/value pairs

Key consists of 3 parts – a row key, a column key, and a time-stamp (i.e., the version)

Flexible schema - the set of columns is not fixed, and may differ from row-to-row

One last column detail: Warning #1!

Column key consists of two parts – a column family, and a qualifier

Examples (www.nosql-database.org, www.db-engines.com, www.wikipedia.com):

Accumulo, Amazon SimpleDB, BigTable, Cassandra, Cloudata, Cloudera, Druid, Flink, HBase, Hortonworks, HPCC, Hypertable, KAI, KDI, MapR, MonetDB, OpenNeptune, Qbase, Splice Machine, Sqrrl
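
A small sketch of the three-part key described above: each value is addressed by a row key, a column key (itself a column family plus a qualifier), and a time-stamp. The class WideColumnTable and its methods are hypothetical names for illustration, and the in-memory dictionary layout is an assumption; it is not how Bigtable or any listed system stores data.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table keyed by (row key, family:qualifier, time-stamp)."""

    def __init__(self):
        # (row_key, "family:qualifier") -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row_key, family, qualifier, timestamp, value):
        self._cells[(row_key, f"{family}:{qualifier}")][timestamp] = value

    def get(self, row_key, family, qualifier, timestamp=None):
        """Return the newest version at or before timestamp (newest overall if None)."""
        versions = self._cells.get((row_key, f"{family}:{qualifier}"))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def columns(self, row_key):
        """Column keys present in one row; rows need not share the same set."""
        return sorted(col for (rk, col) in self._cells if rk == row_key)


table = WideColumnTable()
table.put(15394, "personal", "last_name", 0, "Jones")
table.put(15394, "professional", "salary", 0, 50000)
table.put(15394, "professional", "salary", 1, 55000)   # a newer version of the same cell
table.put(21621, "personal", "last_name", 0, "Smith")
table.put(21621, "professional", "hourly_rate", 0, 40)  # a column the first row lacks
```

Here table.get(15394, "professional", "salary") returns the latest version, while passing timestamp=0 returns the older one, and the two rows report different column sets.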

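Wide Column Store

(Diagram: one row, with its row key, column families, and column qualifiers)

Row key: ID

Column family “Personal data”, column qualifiers: First Name, Last Name, Date of Birth

Column family “Professional data”, column qualifiers: Job Category, Salary, Date of Hire, Employer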

Wide Column Store

(Diagram: one “table” whose rows carry different sets of columns, grouped into the column families Personal data, Professional data, and Medical data)

Row 1: ID, First Name, Last Name, Date of Birth, Job Category, Salary, Date of Hire, Employer

Row 2: ID, First Name, Middle Name, Last Name, Job Category, Hourly Rate, Employer

Row 3: ID, First Name, Last Name, Job Category, Salary, Employer, Group, Seniority, Bldg #, Office #

Row 4: ID, Last Name, Job Category, Salary, Date of Hire, Employer, Insurance ID, Emergency Contact

Wide Column Store

(Diagram: one “row”, identified by its row key, with versions at time-stamps t0 and t1)

Columns: ID, First Name, Last Name, Date of Birth, Job Category, Salary, Date of Hire, Employer

Column families: Personal data, Professional data

One “row” in a wide-column NoSQL database table = many rows in several relations/tables in a relational database

Graph Store

Neo4j - “The Neo Database – A Technology Introduction,” 2006.

The basic data model:

Directed graphs

Nodes & edges, with properties, i.e., “labels”

Examples (www.nosql-database.org, www.db-engines.com, www.wikipedia.com):

AllegroGraph, ArangoDB, Bigdata, Bitsy, BrightstarDB, DEX/Sparksee, Execom IOG, Fallen *, Filament, FlockDB, GraphBase, Graphd, Horton, HyperGraphDB, IBM System G Native Store, InfiniteGraph, InfoGrid, jCoreDB Graph, MapGraph, Meronymy, Neo4j, Orly, OpenLink Virtuoso, Oracle Spatial and Graph, Oracle NoSQL Database, OrientDB, OQGraph, Ontotext OWLIM, R2DF, ROIS, Sones GraphDB, SPARQLCity, Sqrrl Enterprise, Stardog, Teradata Aster, Titan, Trinity, TripleBit, VelocityGraph, VertexDB, WhiteDB
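
A minimal sketch of the graph data model: a directed graph whose nodes and edges both carry properties, queried by traversal rather than by joins. The class PropertyGraph, its methods, and the sample nodes are hypothetical; this is not the API of Neo4j or of any other system listed above.

```python
from collections import deque


class PropertyGraph:
    """Toy directed graph with properties on nodes and edges."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source, target, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, target, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((source, target, properties))

    def neighbors(self, node_id, label=None):
        """Targets of outgoing edges, optionally restricted to one edge label."""
        return [t for s, t, p in self.edges
                if s == node_id and (label is None or p.get("label") == label)]

    def reachable(self, start):
        """Breadth-first traversal along edge direction, in visiting order."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in self.neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order


g = PropertyGraph()
g.add_node("jones", kind="employee")
g.add_node("smith", kind="employee")
g.add_node("acme", kind="employer")
g.add_edge("jones", "smith", label="reports_to")
g.add_edge("smith", "acme", label="works_for")
g.add_edge("jones", "acme", label="works_for")
```

Because edges are directed, g.reachable("jones") visits all three nodes while g.reachable("acme") visits only the starting node.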

Document Store

MongoDB - “How a Database Can Make Your Organization Faster, Better, Leaner,” February 2015.

The basic data model:

The general notion of a document – words, phrases, sentences, paragraphs, sections, subsections, footnotes, etc.

Flexible schema – subcomponent structure may be nested, and vary from document-to-document.

Metadata – title, author, date, embedded tags, etc.

Key/identifier.

One implementation detail:

Formats vary greatly – PDF, XML, JSON, BSON, plain text, various binary, scanned image.

Examples (www.nosql-database.org, www.db-engines.com, www.wikipedia.com):

AmisaDB, ArangoDB, BaseX, Cassandra, Cloudant, Clusterpoint, Couchbase, CouchDB, Densodb, Djondb, EJDB, Elasticsearch, eXist, FleetDB, iBoxDB, Inquire, JasDB, MarkLogic, MongoDB, MUMPS, NeDB, NoSQL embedded db, OrientDB, RaptorDB, RavenDB, RethinkDB, SDB, SisoDB, Terrastore, ThruDB
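
A short sketch of the document model, assuming JSON-like documents represented as nested Python dicts. The class DocumentStore, the dotted-path find method, and the sample documents are hypothetical and are not modeled on MongoDB's or any other product's API.

```python
import itertools

_MISSING = object()  # sentinel for "this document has no such field"


class DocumentStore:
    """Toy document store: each document gets a key; structure may vary freely."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, document):
        doc_id = next(self._ids)
        self._docs[doc_id] = document
        return doc_id

    def get(self, doc_id):
        return self._docs[doc_id]

    def find(self, path, value):
        """Keys of documents whose nested field at a dotted path equals value."""
        matches = []
        for doc_id, doc in self._docs.items():
            node = doc
            for part in path.split("."):
                if not isinstance(node, dict) or part not in node:
                    node = _MISSING
                    break
                node = node[part]
            if node is not _MISSING and node == value:
                matches.append(doc_id)
        return matches


docs = DocumentStore()
report = docs.insert({
    "meta": {"title": "Q3 report", "author": "Jones", "tags": ["finance"]},
    "sections": [{"heading": "Summary", "text": "..."}],
})
memo = docs.insert({
    "meta": {"title": "Memo", "author": "Smith"},   # no tags, no sections
    "body": "Plain text only.",
})
by_jones = docs.find("meta.author", "Jones")
```

The two documents share only part of their structure, yet both are stored and queried the same way; a document that lacks the requested field is simply skipped.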

ACID vs. BASE

Database systems traditionally support ACID requirements:



Atomicity, Consistency, Isolation, Durability

In distributed web applications the focus shifts to:



Consistency, Availability, Partition tolerance

CAP theorem - At most two of the above can be enforced at any given time.

Conjecture – Eric Brewer, ACM Symposium on the Principles of Distributed Computing, 2000.

Proved – Seth Gilbert & Nancy Lynch, ACM SIGACT News, 2002.

Reducing consistency, at least temporarily, maintains the other two.

ACID vs. BASE

Thus, distributed NoSQL systems are typically said to support some form of BASE:

Basically Available

Soft state

Eventual consistency*
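
A toy illustration of the three BASE properties, assuming two replicas that accept writes independently and reconcile later. The last-writer-wins rule with caller-supplied logical time-stamps is an assumption chosen for brevity, not something the slides prescribe, and the Replica class is hypothetical; real systems use a range of conflict-resolution schemes.

```python
class Replica:
    """One copy of the data; accepts writes without contacting other replicas."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-writer-wins: keep the value with the strictly larger time-stamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Anti-entropy step: merge in everything the other replica has seen."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("salary:15394", 50000, timestamp=1)   # accepted locally: basically available
b.write("salary:15394", 55000, timestamp=2)   # a later write lands on the other replica

before = (a.read("salary:15394"), b.read("salary:15394"))  # replicas disagree: soft state

a.sync_from(b)
b.sync_from(a)
after = (a.read("salary:15394"), b.read("salary:15394"))   # replicas agree: eventual consistency
```

Between the writes and the sync, a reader may see either value depending on which replica answers; once updates stop and the replicas exchange state, both converge on the same value.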

“We’d really like everything to be structured, consistent and harmonious,…, but what we are faced with is a little bit of punk-style anarchy. And actually, whilst it might scare our grandmothers, it’s OK...”

-Julian Browne

https://www.youtube.com/watch?v=pOe9PJrbo0s
