
NoSQL - Database Revolution

The document summarizes the evolution of database management systems from punch cards to modern DBMS technologies. It discusses how punch cards were initially used for data storage and retrieval in the late 19th century. In the 1960s, the first navigational DBMS like IMS and IDS were developed. In 1970, Codd introduced the relational model which led to the emergence of relational DBMS and SQL. In the 2000s, RDBMS were widely adopted. However, new applications in the 2000s put pressure on RDBMS to scale, leading to the rise of NoSQL distributed databases to better support new applications.

Uploaded by baskarchennai
Copyright © All Rights Reserved

Origin: Punch Cards to DBMS

Let's revisit the origin of DBMS to align our understanding before you step into NoSQL.

• Data storage and retrieval have been a key focus area throughout the evolution of computers.
• Towards the end of the 19th century, punch cards were used for input, output, and data storage. They provided a faster way to key in data and to retrieve it.
• In the 1960s, two well-known DBMSs were launched: IBM's Information Management System (IMS), written for the Apollo program on System/360, and the Integrated Data Store (IDS), developed by Charles W. Bachman.
• Both IDS and IMS are known as navigational DBMSs.

In 1970, E.F. Codd envisioned a new model of DBMS in his paper "A Relational Model of Data for Large Shared Data Banks", which paved the way for the emergence of Relational DBMS (RDBMS).

• RDBMS introduced a new methodology for storing data and processing large databases.

• Records (data) are stored in tables of fixed-length records, unlike the free-form linked lists of records in IDS and IMS.

• Later, databases such as Ingres and query languages such as SQL evolved.

The benefits of RDBMS gained a wide reach, resulting in buy-in from different vendors and setting the stage for an era of database wars.

Many RDBMSs, such as Sybase, Microsoft SQL Server, Informix, MySQL, DB2, and Oracle, were launched around the same time, each claiming better:

• Performance

• Availability

• Functionality

• Cost of storage

• Economy of usage

With no real alternatives, RDBMS had become firmly entrenched by the early 2000s.

Later, around 2005, the shift in application architecture from the client-server era to the era of massive web-scale applications put heavy pressure on RDBMS in terms of:

• Level of usage
• Volume of data
• Ability to handle and monitor change

RDBMS could not scale to meet these demands through incremental innovation.

This started the era of distributed non-relational database management systems, later termed 'NoSQL', which were better aligned with new-age applications.

'NoSQL' drew attention to database systems that broke away from the practices of the traditional SQL database.

NoSQL - Journey Ahead!

Many big enterprises have now started adopting alternative databases such as NoSQL, moving away from their traditional RDBMS empire.
The expected benefits include saving money, innovating more rapidly, better productivity, and quicker ROI.
In this course, you will explore NoSQL in more depth: its types, modus operandi, and storage model, and finally how to choose the right NoSQL database.
As you progress, parallels will be drawn with RDBMS to aid understanding.

Eventual Consistency

• Inserts and updates are applied asynchronously.
• All changes eventually propagate through the system.
• Updates eventually reach every node.
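The propagation described above can be simulated with a minimal sketch. It assumes a hypothetical in-memory cluster: a write is applied to one node, acknowledged immediately, and queued for asynchronous delivery to the other nodes. `Replica`, `Cluster`, and the delivery queue are illustrative names, not any database's API. The sketch also assumes writes to a given key come from one node. Concurrent writers to the same key would additionally need a conflict-resolution rule, which is omitted here.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Hypothetical cluster: writes hit one replica and propagate asynchronously."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = deque()  # replication messages not yet delivered

    def write(self, node, key, value):
        # The write is acknowledged after updating a single replica...
        self.replicas[node].data[key] = value
        # ...and queued for delivery to every other replica.
        for other in self.replicas:
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node, key):
        return self.replicas[node].data.get(key)

    def deliver_one(self):
        # Deliver a single queued update (simulates network delay).
        target, key, value = self.pending.popleft()
        self.replicas[target].data[key] = value

    def drain(self):
        # With no new writes arriving, every queued update reaches its node.
        while self.pending:
            self.deliver_one()


cluster = Cluster(["n1", "n2", "n3"])
cluster.write("n1", "cart:42", ["book"])
print(cluster.read("n3", "cart:42"))  # None: stale read before propagation
cluster.drain()
print(cluster.read("n3", "cart:42"))  # ['book']
```

Between the write and the drain, a read on `n3` returns stale data. This is the window that eventual consistency allows.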

BASE
Basically Available - faults are possible, but a fault does not bring down the whole system.
Soft State - copies of data may be inconsistent.
Eventual Consistency - copies become consistent eventually.

RDBMS - Vertical Scaling

• The architecture is designed to run well on a single machine.
• The way to handle larger volumes of operations is to upgrade the machine with a faster processor or more memory.
• There is a limit to how far a single machine can be scaled.

NoSQL - Horizontal Scaling

• NoSQL databases are intended to run on clusters of comparatively low-specification servers.
• To handle more data, add more servers to the cluster.
• Designed to operate at full throttle even on low-cost hardware.
• A relatively cheaper approach to handling increases in:
o Number of operations
o Size of the data

RDBMS - High Maintenance

• Maintaining high-end RDBMS systems is expensive and requires a trained workforce for database management.

NoSQL - Low Maintenance

• NoSQL databases generally require less management, because built-in features reduce the need for administration and tuning. These include:
o Automatic repair
o Easier data distribution
o Simpler data models

RDBMS - Rigid Data Model:

• RDBMS requires data in a structured format that follows a defined data model.

• Change management is a big headache in SQL because of strong dependencies on primary and foreign keys, so ad hoc data insertion is harder.

NoSQL - No Schema/Data model:

• NoSQL databases are schema-less, so data can be inserted easily without any predefined schema.

• The format or data model can be changed at any time without disrupting the application.
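The contrast between a rigid and a schema-less data model can be sketched in a few lines. Here Python's built-in `sqlite3` stands in for an RDBMS, and a plain list of dicts stands in for a schema-less document collection. Neither is meant to represent a particular NoSQL product, and the table and field names are made up for illustration.

```python
import sqlite3

# Relational side: the table has a fixed set of declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Asha"))

try:
    # Inserting an undeclared column fails until the schema is altered.
    conn.execute("INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
                 (2, "Ravi", "@ravi"))
    rigid_insert_ok = True
except sqlite3.OperationalError as exc:
    rigid_insert_ok = False
    print("RDBMS rejected insert:", exc)

# Schema-less side (hypothetical in-memory collection):
# each record is a dict and may carry its own set of fields.
users = []
users.append({"id": 1, "name": "Asha"})
users.append({"id": 2, "name": "Ravi", "twitter": "@ravi"})  # new field, no migration
users.append({"id": 3, "name": "Mei", "languages": ["en", "zh"]})

# The flexibility moves work to readers: they must cope with absent fields.
handles = [u.get("twitter") for u in users]
print(handles)  # [None, '@ravi', None]
```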

RDBMS - Separate Hardware

• Caching in a typical RDBMS deployment requires separate infrastructure.
• This adds overhead, so retrieval incurs some delay.

NoSQL - Integrated:

• NoSQL databases support caching in system memory, which improves data retrieval performance.

Changing pH measure

Here you will learn about the core principles of database processing.


RDBMS - ACID
• Atomicity: If any one element of a transaction fails, the entire transaction fails.
• Consistency: A transaction must adhere to all defined rules (constraints) at all times, moving the database from one valid state to another.
• Isolation: No transaction can see another transaction that is in an intermediate or unfinished state.
• Durability: Once a transaction is complete (committed), it persists as complete and cannot be undone, even after a system failure.
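Atomicity can be demonstrated with Python's built-in `sqlite3` module, used here only as a convenient stand-in for an RDBMS. In this sketch, a hypothetical `transfer` credits one account and then debits another inside a single transaction. When the debit violates a `CHECK` constraint, the connection's context manager rolls back the credit as well. The table, account names, and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Violates the CHECK constraint if src has insufficient funds.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "bob", "alice", 500))  # False: the debit fails
print(balances(conn))  # {'alice': 70, 'bob': 80}: the failed credit was rolled back
```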
NoSQL - BASE
• Basically Available: The system guarantees the availability of the data, in the sense of the CAP (Consistency, Availability, and Partition Tolerance) theorem.
• Soft state: The state of the system may change over time, even during periods without input.
• Eventual consistency: The system will eventually become consistent once it stops receiving input.

Data replication means keeping copies of your data, often geo-distributed, through an automatic and reliable process, as a contingency measure against data loss.

Most NoSQL systems have data replication built in.

Data replication in RDBMS is somewhat more difficult, as RDBMS were not designed for horizontal scaling.

NoSQL data replication is homogeneous, in the sense that data cannot be replicated from a NoSQL system to an RDBMS (SQL) system.

The three types of data replication are:

• Sharding Replication

• Master-Slave Replication

• Peer-to-Peer Replication

Peer-to-Peer Replication Model

• Data is replicated across many nodes.
• Every node is equal.
• All nodes accept reads and writes.
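Because every peer accepts writes, two peers can receive conflicting values for the same key. They therefore need some rule for converging. The sketch below uses last-write-wins, one common strategy chosen here for illustration; the notes above do not specify one. Peers exchange data pairwise until all copies agree. A hypothetical global counter stands in for the timestamps or vector clocks that real systems use, and `PeerNode` is an illustrative name.

```python
import itertools

# Hypothetical logical clock used to stamp writes (a simplification:
# real systems use wall-clock timestamps or vector clocks instead).
_clock = itertools.count(1)


class PeerNode:
    """A peer that accepts both reads and writes."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> ((counter, node name), value)

    def put(self, key, value):
        self.store[key] = ((next(_clock), self.name), value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_with(self, other):
        """Exchange data with another peer; the highest version wins."""
        for key in set(self.store) | set(other.store):
            candidates = [s[key] for s in (self.store, other.store) if key in s]
            winner = max(candidates, key=lambda entry: entry[0])
            self.store[key] = other.store[key] = winner


peers = [PeerNode("a"), PeerNode("b"), PeerNode("c")]
peers[0].put("user:1", "Asha")     # write accepted by node a
peers[2].put("user:1", "Asha K.")  # later, conflicting write accepted by node c

# Pairwise syncs spread the winning version to every peer.
for x, y in [(peers[0], peers[1]), (peers[1], peers[2]), (peers[0], peers[1])]:
    x.sync_with(y)

print([p.get("user:1") for p in peers])  # ['Asha K.', 'Asha K.', 'Asha K.']
```

Last-write-wins is simple, but it silently discards the losing write. That is one reason some stores offer alternatives such as CRDTs.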

Consistency

• Consistency applies to operations on a single key, since the operations are get, put, or delete.
• Although optimistic writes can be performed, they are expensive to implement.
• In distributed key-value stores such as Riak, values are replicated to other nodes.

Buckets act as namespaces for keys, which reduces key collisions. For example, all Student keys may reside in the Student bucket.

• With buckets, a write can be configured to count as successful only when the data is consistent across all the nodes where it is stored.

'Buckets' in key-value DBs are similar to 'Tables' in RDBMS.

• Different key-value databases have different transaction specifications and implement transactions in different ways.
• In general, there is no guarantee on write operations.

• Riak provides write tolerance through a quorum mechanism, using the W value together with the replication factor.

Example: Consider a Riak cluster with:

• Replication factor of 7 (the number of data copies maintained across multiple nodes)

• W value of 4

A write is reported as successful only when it has been written to at least four nodes.
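The quorum arithmetic in this example can be sketched as follows. It shows which acknowledgement counts lead to success and how many replica failures writes can tolerate (N - W). The function name `write_succeeded` is hypothetical and is not part of any Riak client library.

```python
N = 7  # replication factor: number of copies kept across the cluster
W = 4  # write quorum: acknowledgements required before reporting success


def write_succeeded(acks: int, w: int = W) -> bool:
    """A write is reported successful only if at least w replicas acknowledged it."""
    return acks >= w


# Number of replicas that can be unreachable while writes still succeed.
tolerated_failures = N - W

for acks in range(N + 1):
    print(f"{acks}/{N} acks -> {'success' if write_succeeded(acks) else 'failure'}")
print("replica failures tolerated for writes:", tolerated_failures)  # 3
```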

• Key design plays a prominent role. Keys can be generated:

o using an algorithm

o from user inputs (user-id, name, email-id)

o from timestamps or external data
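The three key-generation approaches can be sketched as small helper functions. The prefixes (`session:`, `user:`, `reading:`) and the function names are a hypothetical naming scheme chosen for illustration, not a convention required by any key-value store.

```python
import hashlib
import time
import uuid
from typing import Optional


def key_from_algorithm() -> str:
    # Algorithm-generated key: a random UUID, unique without coordination.
    return f"session:{uuid.uuid4().hex}"


def key_from_user_input(email: str) -> str:
    # Derived from user input: hash the normalized e-mail so the key is
    # stable for the same user and does not expose the raw address.
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
    return f"user:{digest[:16]}"


def key_from_timestamp(sensor_id: str, ts: Optional[float] = None) -> str:
    # Derived from a timestamp plus external data: sensor id and time in ms.
    ts = time.time() if ts is None else ts
    return f"reading:{sensor_id}:{int(ts * 1000)}"


print(key_from_algorithm())
print(key_from_user_input("Asha@Example.com "))
print(key_from_timestamp("s-17", ts=1700000000.5))  # reading:s-17:1700000000500
```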

• Data is queried by its key, which returns the associated value.

• Querying on an attribute inside the value is generally not possible in the DB itself.

• In some DBs, such as Riak, the value for a key is retrieved using a fetch API.
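A minimal sketch of the key-value access pattern, with buckets acting as key namespaces. The `put`/`get`/`delete` method names echo the generic `put(key, value)` call, but `KeyValueStore` is a hypothetical in-memory class, not Riak's client API.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store with buckets as key namespaces."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: value}

    def put(self, bucket, key, value):
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        # Lookup is by key only; the store treats the value as opaque.
        return self._buckets.get(bucket, {}).get(key)

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)


school = KeyValueStore()
school.put("Student", "101", {"name": "Asha", "city": "Chennai"})
school.put("Teacher", "101", {"name": "Ravi"})  # same key, other bucket: no collision

print(school.get("Student", "101"))  # {'name': 'Asha', 'city': 'Chennai'}
print(school.get("Teacher", "101"))  # {'name': 'Ravi'}
school.delete("Student", "101")
print(school.get("Student", "101"))  # None
```

The class has no "find students in Chennai" operation. Answering that would require the application to scan every value, which matches the limitation on querying by an attribute of the value.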

• Scalability of a key-value database is achieved through sharding.
• In sharding, the value of the key determines the node on which the key is stored.

For example, say you are sharding by the first character of the key. The key k76151487d, which starts with 'k', can be sent to a different node than the key dgh396542.

Benefits

• Performance increases as more nodes are added to the cluster.

Impact

• If the node that stores keys starting with 'f' goes down, the data stored on that node becomes unavailable, and new data with keys that start with 'f' cannot be written.
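Sharding by first character, and the impact of losing a shard, can be sketched as follows. It assumes a hypothetical `ShardedStore` that assigns each first character to a node round-robin and keeps no replicas. As a result, a failed node makes its key range unreadable and unwritable while the other shards keep working.

```python
import string


class ShardedStore:
    """Hypothetical store that shards keys by their first character (no replicas)."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node -> its local data
        self.down = set()                               # nodes currently unavailable
        # Map every allowed first character to a node, round-robin.
        alphabet = string.ascii_lowercase + string.digits
        self.shard_map = {ch: node_names[i % len(node_names)]
                          for i, ch in enumerate(alphabet)}

    def node_for(self, key):
        return self.shard_map[key[0].lower()]

    def _live_node(self, key):
        node = self.node_for(key)
        if node in self.down:
            raise ConnectionError(f"{node} is down; keys starting with {key[0]!r} unavailable")
        return node

    def put(self, key, value):
        self.nodes[self._live_node(key)][key] = value

    def get(self, key):
        return self.nodes[self._live_node(key)].get(key)


store = ShardedStore(["node-a", "node-b", "node-c"])
print(store.node_for("k76151487d"), store.node_for("dgh396542"))  # node-b node-a

store.put("fig-001", "stored on the 'f' shard")
store.down.add(store.node_for("f"))  # the node holding 'f' keys fails

for action in (lambda: store.get("fig-001"), lambda: store.put("fox-9", "new")):
    try:
        action()
    except ConnectionError as exc:
        print(exc)  # node-c is down; keys starting with 'f' unavailable

store.put("k76151487d", "other shards keep working")
```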

How to overcome this issue?

Riak addresses this with replication and three tunable parameters that trade off the properties described by the CAP theorem:

• N - number of nodes that store replicas of each key-value pair.
• R - number of nodes that must respond to a read (fetch).
• W - number of nodes that must acknowledge a write.

For example, consider a 5-node Riak cluster configured with:

N = 3 => all data is replicated to 3 nodes.
R = 2 => any 2 nodes must respond to a GET request for it to be considered successful.
W = 2 => a PUT request must be written to 2 nodes before the write is considered successful.
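A useful way to reason about these settings is the general quorum-overlap rule for Dynamo-style systems. If R + W > N, every set of nodes that answers a read shares at least one node with every set that acknowledged the latest write, so a read can see that write. The sketch below checks this by brute force over all possible read and write sets. `quorums_overlap` is an illustrative helper, not a Riak setting.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-node read set shares a node with every W-node write set."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


N, R, W = 3, 2, 2
print(R + W > N, quorums_overlap(N, R, W))  # True True
print(quorums_overlap(3, 1, 1))             # False: a read may miss the latest write
```

With N = 3, R = 2, W = 2, the cluster tolerates one unavailable replica for both reads and writes while keeping the overlap.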

• Best practice is to choose a W value that matches your consistency needs when creating the bucket.

• Every web session is assigned a unique session-id value, which applications typically store on disk (log file) or in a database (RDBMS).

• Moving this to a key-value DB can greatly improve performance, as all information about the session can be:

o Stored with a PUT request

o Retrieved with a GET request

• The operation is very fast, as all session information is stored in a single object.
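The session pattern described above can be sketched with a hypothetical `SessionStore` class. It keeps one object per session id, so saving a session takes one PUT and loading it takes one GET. The session fields are made up for illustration.

```python
import uuid


class SessionStore:
    """Hypothetical key-value session store: one object per session id."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = session

    def get(self, session_id):
        return self._data.get(session_id)


sessions = SessionStore()
sid = uuid.uuid4().hex  # unique session id issued at login
sessions.put(sid, {"user_id": 42, "locale": "en-IN", "cart_items": 3})

# A single GET returns everything the application needs for this session.
print(sessions.get(sid)["locale"])  # en-IN
```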

Usage:

• Memcached for caching in web applications and microapps.

• Riak when availability is an important criterion.

Key-value is a good fit for storing a user profile:

• userId

• username

• additional attributes

and user preferences:

• language

• country

• timezone

• user favorites, and so on

All this information can be stored in a single object, so getting a user's preferences takes just a single GET operation.

On similar lines, product profiles can be stored as well.
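Many key-value stores treat the value as an opaque blob. The sketch below therefore serializes the whole profile to JSON under one key and reads it back with one lookup. A plain dict stands in for the store, and the `profile:` key prefix and field values are hypothetical.

```python
import json

# Hypothetical store: a dict whose values are opaque byte strings.
kv = {}

profile = {
    "userId": "u-1001",
    "username": "asha",
    "preferences": {"language": "ta", "country": "IN", "timezone": "Asia/Kolkata"},
    "favorites": ["p-17", "p-42"],
}

# PUT: serialize the whole profile into one value under one key.
kv["profile:u-1001"] = json.dumps(profile).encode("utf-8")

# GET: one lookup returns all preferences, with no joins across tables.
loaded = json.loads(kv["profile:u-1001"])
print(loaded["preferences"]["timezone"])  # Asia/Kolkata
```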

• All e-commerce websites have shopping carts deeply linked with the user.
• Shopping cart details should be available at all times, across different browsers, devices, machines, and sessions.
• Key-value is a good fit for this scenario: all shopping-related information goes into the 'value', and the 'key' is the userid.
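The cart is keyed by user id rather than by session. As a result, any device that knows the user reads and updates the same value. The sketch below, using a hypothetical `CartStore`, updates the cart by reading the whole value, changing it, and writing it back.

```python
class CartStore:
    """Hypothetical key-value cart store keyed by user id."""

    def __init__(self):
        self._carts = {}

    def get(self, user_id):
        # Return a copy so callers must PUT changes back explicitly.
        return dict(self._carts.get(user_id, {}))

    def put(self, user_id, cart):
        self._carts[user_id] = dict(cart)


def add_item(store, user_id, sku, qty=1):
    # Read-modify-write of the whole value: GET the cart, change it, PUT it back.
    cart = store.get(user_id)
    cart[sku] = cart.get(sku, 0) + qty
    store.put(user_id, cart)


carts = CartStore()
add_item(carts, "u-1001", "book-123")       # added from a laptop browser
add_item(carts, "u-1001", "book-123")       # added again from a phone
add_item(carts, "u-1001", "pen-9", qty=3)
print(carts.get("u-1001"))  # {'book-123': 2, 'pen-9': 3}
```

In a real distributed store, two devices updating the same cart concurrently could overwrite each other's changes. This simple sketch ignores that case.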

Usage

• Amazon uses its DynamoDB for storing its users' shopping cart details.

Key-value databases are not the best fit in the following scenarios:

• Relationships among multiple data sets - There are relationships or correlations between different sets of data or between different sets of keys.

• Multi-operation transactions - You store many keys, and if saving one of them fails, you want to roll back the rest of the operations.

• Querying data by 'value' - You need to search for 'keys' based on information found in the 'value' part of the key-value pairs. Some exceptions include Riak Search and indexing engines such as Lucene or Solr.

• Operations by groups - As operations are confined to one key at a time, there is no way to operate on several keys simultaneously.
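The multi-operation transaction limitation can be made concrete with a sketch. It assumes a hypothetical `FlakyStore` whose `put` fails for one key and which has no multi-key transactions. A naive batch leaves a partial write behind. An application-level compensation step, one possible workaround and not a feature of any particular database, has to undo the earlier writes by hand.

```python
class FlakyStore:
    """Hypothetical key-value store whose put() fails for one particular key."""

    def __init__(self, failing_key):
        self.data = {}
        self.failing_key = failing_key

    def put(self, key, value):
        if key == self.failing_key:
            raise OSError(f"write failed for {key!r}")
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


items = {"order:1": "paid", "order:2": "paid", "order:3": "paid"}

# Naive batch: the first write survives the failure of the second.
naive = FlakyStore(failing_key="order:2")
try:
    for key, value in items.items():
        naive.put(key, value)
except OSError:
    pass
print(naive.data)  # {'order:1': 'paid'}: partial write left behind

_MISSING = object()  # marks keys that did not exist before the batch


def put_all_with_compensation(store, items):
    """Write several keys; on failure, undo the successful writes by hand."""
    written = []  # (key, previous value or _MISSING) for each successful put
    try:
        for key, value in items.items():
            before = store.data.get(key, _MISSING)
            store.put(key, value)
            written.append((key, before))
    except OSError:
        for key, before in reversed(written):
            if before is _MISSING:
                store.delete(key)
            else:
                store.put(key, before)
        return False
    return True


store = FlakyStore(failing_key="order:2")
ok = put_all_with_compensation(store, items)
print(ok, store.data)  # False {}
```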

• Initially developed by Facebook for its inbox search feature and later handed over to Apache.

• Offers high scalability and availability, and avoids the single point of failure problem.

• Writes at very high speed without compromising read efficiency.

• Key variables that provide a variety of outcomes:

N - number of copies of each data item.

W - number of copies of the data item that must be written.

R - number of copies that must be read when reading the data item.

• Replication Factor: Determines the number of data copies maintained across multiple nodes.

• Read/Write Consistency - Key configuration parameter for each read/write operation:

ALL: all replica nodes.

ONE|TWO|THREE: the specified number of replica nodes.

QUORUM: a majority (quorum) of replica nodes.

EACH_QUORUM: a quorum of replica nodes in each data center.

LOCAL_QUORUM: a quorum of replica nodes in the current data center only.

ANY: any node (writes only).
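The arithmetic behind these consistency levels can be sketched as a small function. QUORUM is computed as floor(sum of replication factors / 2) + 1, which matches the formula in Cassandra's documentation. `replicas_required` is a hypothetical helper, not a driver API. For EACH_QUORUM it returns the total acknowledgements across data centers, although the real requirement is a quorum inside every data center.

```python
def quorum(rf: int) -> int:
    """Majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def replicas_required(level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Replica acknowledgements needed for a Cassandra-style consistency level.

    rf_per_dc maps each data-center name to its replication factor.
    """
    total_rf = sum(rf_per_dc.values())
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level == "ALL":
        return total_rf
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(total_rf)             # majority across all data centers
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])  # majority in the local DC only
    if level == "EACH_QUORUM":
        # A majority in every DC, reported here as the total across DCs.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "ANY":
        return 1  # write-only level: any node may accept the write
    raise ValueError(f"unknown consistency level: {level}")


rf = {"dc1": 3, "dc2": 3}
for level in ("ALL", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ONE", "ANY"):
    print(level, replicas_required(level, rf, local_dc="dc1"))
# ALL 6, QUORUM 4, LOCAL_QUORUM 2, EACH_QUORUM 4, ONE 1, ANY 1
```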

• Open source storage engine.

• Designed to support Hadoop ecosystem tools (Cloudera Impala, Apache Spark, and MapReduce).

• Distributes data using horizontal partitioning.

• Supports low-latency random access and efficient analytical access patterns.

• Offers an API for row-level inserts, updates, and deletes.

There are variations in how the columnar paradigm is implemented within both traditional relational systems and other NoSQL systems.

• SAP HANA (in-memory DB) supports column or row orientation on a table-by-table basis.

• Oracle 12c “Database In-Memory” incorporates a column store.

• Oracle Exadata uses Exadata Hybrid Columnar Compression (EHCC) to achieve a best-of-both-worlds combination of row and column storage technologies.
o Rows are stored within compression units of 1 MB, reducing the overhead of row-level modifications.
o Columns are stored together within smaller 8K blocks, yielding high levels of compression.
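The reason column storage compresses well can be sketched by laying out the same small table row-wise and column-wise. The sketch then applies run-length encoding, one simple compression technique used here for illustration. The table contents are made up.

```python
from itertools import groupby

rows = [
    ("o-1", "Chennai", "paid"),
    ("o-2", "Chennai", "paid"),
    ("o-3", "Chennai", "shipped"),
    ("o-4", "Mumbai", "shipped"),
]

# Row-oriented layout: all values of one record are stored together.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: all values of one column are stored together.
column_layout = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("order_id", "city", "status"))
}


def run_length_encode(values):
    """Collapse runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


print(run_length_encode(column_layout["city"]))    # [('Chennai', 3), ('Mumbai', 1)]
print(run_length_encode(column_layout["status"]))  # [('paid', 2), ('shipped', 2)]
print(len(run_length_encode(row_layout)))          # 12: no adjacent repeats
```

The flip side is that appending one row touches every column list. This is one reason column stores suit analytical workloads better than OLTP.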

A Graph Store similar to OLAP in RDBMS is – Graph Compute Engine


An XML document which satisfies the rules specified by W3C is – Well Formed XML
An API used in creating a Key-Value Pair in Key-Value datastore is - put(key, value)
An RDBMS equivalent component for a collection in a document database is – Table
An RDBMS equivalent component for a document in a document database is – Row
Document Store database contains data in the format of – All
HBase main server components include all except – HbaseMemStore
Graph databases are generally built for use with – OLTP
Cypher query language is associated with – Neo4j
Neo4j architecture is a self-driven and independent architecture because of – Both
The major components of a graph include the following, except – JSON
The key-value pair data storages include all, except - Network Attached Storage
In RDBMS, the attributes of an entity are stored in – Columns
In the Master-Slave Replication model, the slave node services – Read Operations
_________ in Key-Value Databases are similar to 'Tables' in RDBMS. – Buckets
In a Column Data Model, the number of columns that a row can have – Varies
Which among the following is the correct API call in Key-Value datastore? – put(key,value)
Which of the following factors influence(s) the choice of replication model? – All
Which Replication model supports database read and write operations in all the nodes? – Peer to Peer
Which Replication model has the strongest resiliency power? – All

_________distributes different data across multiple servers. – Sharding


_________ is the syntax for retrieving specific elements from an XML document. – XPath
A Riak Convergent Replicated Data Type (CRDT) includes ________. – Maps/Sets/Counters
A column-database is used to store ________ versions of each cell. – Multiple
Cassandra has properties of both __________ and ____________ . Google Bigtable / Amazon Dynamo
Hash Table Design is similar to __________ - Key-Value database
In MongoDB, data is represented as a collection of __________ - JSON Documents
In the Master-Slave Replication model, different Slave Nodes contain __________. – Same Data
In a column-database, a row is uniquely identified by __________. – Row Key
JSON documents are built up of _________. – All
Limitation(s) of RDBMS is/are ______________. – Scalability/Database design complexity
NoSQL databases are designed to expand _________. – Horizontally
NoSQL database supports caching in __________. – System Memory
The Document base unit of storage resembles __________ in an RDBMS. – Rows
The full form of 'CRUD' is _________. – Create, Read, Update and Delete
The type of Graph Store that works in real-time is __________. – Graph Database
The column store has to perform _______ IO to insert a new value. – As many disk blocks
The scalability of Key-Value database is achieved through __________. – Sharding Replication
The RDBMS 'table' equivalent terminology in Riak is ________. – Bucket
The MATCH clause is roughly equivalent to the _______ clause in SQL and the RETURN clause to a ______ clause. – Where, Select

In a columnar database, the columns are stored together on disk, achieving a higher compression ratio,
which is an expensive operation. – False
JSON is a lightweight substitute for XML. – True
Only Nodes have properties in the Graph database. – False
All NoSQL databases are similar. – False
Neo4j is an example of Document Store DB. – False
MongoDB read/write performance can be tuned with the help of Stored Procedures. – False
Columnar databases are preferable for OLTP systems. – False
In MongoDB, there is a similar feature of 'like' expression similar to that in RDBMS. – False
In HBase, 'Columns' are named and specified in the table definition. – False
Cassandra allows defining composite Primary Keys. – True
Document databases split a document into its constituent name/value pairs for indexing purposes. –
False
The horizontal scaling approach tends to be cheaper as the number of operations, and the size of the
data increases. – True
A Key-value store supports Secondary Indexes. – True

In NoSQL databases, the data can be stored ___________. – Multiple times


Distributing the database provides us an option of using cheaper servers called __________. – Commodity Servers
NoSQL relies upon a softer model called __________. – BASE
In the document-based database, documents are stored in the _________ part of the key-value store. – Value
NoSQL is a flexible database management system that provides a way to store _____________. – Both
A consistency model used in distributed computing to achieve high availability that informally
guarantees that, if no new updates are made to a given data item, all accesses to that item will return
the last updated value eventually is __________. – Eventually Consistent
NoSQL is classified into ________ types. – 4
MongoDB is an example of _________________. - Document
In document-based database, storage and retrieval is in the form of ___________. – Document
Master-Slave and Sharding are examples of the _________ approach. – Horizontal Scaling
CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than _________ out of Consistency, Availability, and Partition Tolerance. – Two
In Master-Slave databases, all reads are performed against the ___________. – Replicated Slaves
Vertical scaling is called __________. – Scaling Up
In NoSQL databases, data is stored in a ____________ manner. - Distributed

Graph-based database stores entities and the relationship between them as edges and nodes of a graph,
respectively. – True
NoSQL requires Schema like RDBMS. - False
Consistency of CAP theorem states that all the protocols must be satisfied by the transaction. There can
be half-completed transactions. – False

Which architecture does NoSQL follow? – Shared Nothing
Which model does RDBMS adopt? – ACID
Find the odd one out. – MySQL
