NoSQL - Database Revolution
Let's revisit the origin of DBMS to align our understanding before you step into NoSQL.
Data storage and retrieval have been a key focus area throughout the evolution of computers.
Towards the end of the 19th century, punch cards were used for input, output, and data
storage. They provided a faster way to key in data and to retrieve it.
Later, in the 1960s, two famous DBMSs were launched: IBM's Information Management
System (IMS), written for the Apollo program on System/360, and the Integrated Data Store
(IDS), designed by Charles W. Bachman.
Both IDS and IMS are classified as navigational DBMSs.
In 1970, E.F. Codd envisioned a new model of DBMS through his paper titled A Relational Model of
Data for Large Shared Data Banks which paved the way for the emergence of Relational DBMS
(RDBMS).
RDBMS formulated a new methodology for storing data and processing large databases.
Data would be stored in tables of fixed-length records, unlike the free-form lists of
linked records in IDS and IMS.
Later, databases such as Ingres and query languages such as SQL evolved.
The benefits of RDBMS reached a wide audience, resulting in buy-in from different vendors and setting
the stage for an era of database wars.
Many RDBMSs such as Sybase, Microsoft SQL Server, Informix, MySQL, DB2, and Oracle were
launched around the same time, each claiming better
Performance
Availability
Functionality
Cost of storage
Economy of usage
Later, in 2005, the shift in application architecture from the client-server era to the era of
massive web-scale applications put a lot of pressure on databases in terms of the
Level of usage
Volume of data handled
This started the era of distributed non-relational database management systems, later termed
'NoSQL', which were better aligned to new-age applications.
'NoSQL' drew attention to database systems that broke with the practices of traditional SQL
databases.
NoSQL - Journey Ahead!
Many big enterprises have started adopting alternative databases such as NoSQL, moving away from their
traditional empire of RDBMS.
As a result, they can save money, innovate more rapidly, yield better productivity and quicker ROI.
In this course, you will explore NoSQL and learn in detail about its types, modus
operandi, and storage models, and finally how to choose the right NoSQL database.
As you progress, parallels with RDBMS are drawn to aid understanding.
Eventual Consistency
BASE
Basically Available - Faults are possible, but they do not bring down the whole system.
Soft State - Copies of data may be inconsistent.
Eventual Consistency - Copies become consistent eventually.
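The three BASE properties above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `Replica` class, the integer version numbers, and the `sync` pass are hypothetical and do not reflect how any particular NoSQL product replicates data.

```python
class Replica:
    """One copy of the data; each value carries a version so newer writes win."""

    def __init__(self):
        self.store = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.store.get(key)
        # Accept the write only if it is newer than what this replica holds.
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Hypothetical background pass: offer every entry to every replica."""
    for source in replicas:
        for key, (version, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, version, value)


replicas = [Replica(), Replica(), Replica()]
replicas[0].write("cart:42", 1, ["book"])      # the write lands on one replica
stale = [r.read("cart:42") for r in replicas]  # soft state: copies disagree
sync(replicas)
fresh = [r.read("cart:42") for r in replicas]  # copies have converged
```

Before `sync` runs, a read may return old or missing data depending on which replica answers; after it, every replica returns the same value.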
NoSQL databases require minimal management: they support many features that reduce the
need for administration and tuning. These include
o Automatic repair
o Easier data distribution
o Simpler data models
A NoSQL database is schema-less, so data can be inserted into it with ease, even
without any predefined schema.
The format or data model can be changed at any time, without application disruption.
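As a rough illustration of schema-less inserts, the sketch below uses a plain Python list as a stand-in for a collection. The field names and the `emails` helper are made up for the example; a real database would expose its own insert and query API.

```python
collection = []  # stand-in for a schema-less collection

# The first record has two fields.
collection.append({"id": 1, "name": "Asha"})

# A later record adds an "email" field without any schema change or migration.
collection.append({"id": 2, "name": "Ravi", "email": "ravi@example.com"})


def emails(docs):
    """Readers must tolerate records that lack the newer field."""
    return [doc.get("email") for doc in docs]


result = emails(collection)
```

The flexibility moves responsibility to the application: code that reads the data has to handle both the old and the new shape.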
NoSQL - Integrated:
A NoSQL database supports caching in system memory, which increases data output performance.
Data replication is all about having your data geo-distributed through a non-interactive and reliable
process as a contingency measure to avoid loss of data.
Data replication in RDBMS is somewhat difficult, as these systems have not adopted horizontal scaling.
NoSQL data replication is homogeneous, in the sense that data cannot be replicated from a NoSQL system to
an RDBMS (SQL) system.
Sharding Replication
Master-Slave Replication
Peer-to-Peer Replication.
Consistency
Buckets act like namespaces for keys, which reduces key collisions. For example, all Student keys may reside
in the Student bucket.
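A minimal sketch of how a bucket keeps identical keys apart. The `BucketStore` class is hypothetical and only mimics the idea with a dictionary keyed by (bucket, key); it is not the API of Riak or any other product.

```python
class BucketStore:
    """Toy key-value store in which the bucket name is part of the lookup key."""

    def __init__(self):
        self._data = {}

    def put(self, bucket, key, value):
        self._data[(bucket, key)] = value

    def get(self, bucket, key):
        return self._data.get((bucket, key))


store = BucketStore()
store.put("Student", "101", {"name": "Asha"})
store.put("Course", "101", {"title": "Databases"})  # same key, different bucket
```

Both records use the key "101", yet neither overwrites the other because they live in different buckets.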
With buckets, a write is considered good only when the data is consistent across all the nodes
where the data is stored.
Different key-value databases have different specifications of transactions, and they implement
transactions in different ways.
There is no guarantee on write operations.
Suppose the value of W is 4.
Then a write is reported as successful only when it has happened on at least four nodes.
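The W rule can be sketched as counting acknowledgements. The `Node` class and `quorum_write` function below are hypothetical names used only for illustration; a real store would send the writes over the network and handle timeouts.

```python
class Node:
    """Toy storage node that can be marked as down."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def put(self, key, value):
        if not self.up:
            return False  # no acknowledgement from a node that is down
        self.data[key] = value
        return True


def quorum_write(nodes, key, value, w):
    """Report success only if at least w nodes acknowledged the write."""
    acks = sum(1 for node in nodes if node.put(key, value))
    return acks >= w


nodes = [Node(), Node(), Node(), Node(), Node(up=False)]
ok = quorum_write(nodes, "k1", "v1", w=4)      # four nodes acknowledge
nodes[0].up = False
failed = quorum_write(nodes, "k2", "v2", w=4)  # only three nodes acknowledge
```

Note that in this sketch the second write still lands on three nodes even though it is reported as failed; cleaning up or repairing such partial writes is outside the scope of the example.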
In some databases, the value for a key is retrieved using a fetch API, for example in Riak.
For example, say you are sharding by the first character of the key.
If the key is k76151487d, which starts with a 'k', it will be sent to a different node
than the key dgh396542.
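A minimal sketch of routing by the first character of the key. The mapping from character to node (character code modulo the number of nodes) and the cluster size of four are assumptions chosen for illustration; the text does not say how characters are assigned to nodes.

```python
def shard_for(key, num_nodes):
    """Pick a node index using only the first character of the key."""
    return ord(key[0]) % num_nodes


NUM_NODES = 4  # assumed cluster size
node_k = shard_for("k76151487d", NUM_NODES)
node_d = shard_for("dgh396542", NUM_NODES)
```

Every key starting with the same character lands on the same node, which is why the distribution of first characters determines how evenly the load is spread.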
Benefits
Impact
If the node used to store 'f' goes down, the data stored on that node becomes unavailable, and
new data cannot be written with keys that start with 'f'.
The best practice is to choose a W value that matches your consistency needs during bucket creation.
Every web session is assigned a unique session-id value, which applications store on
disk (log file) or in a database (RDBMS).
Moving this to a key-value database improves performance to a great extent, as all the information
about the session can be stored with a single PUT request or retrieved with a single GET.
The operation is very fast, as the session information is stored in a single object.
Usage:
userId
username and
additional attributes
and user preferences
language
country
timezone and
user favorites and so on
All this information can be stored in a single object, so getting the preferences of a user takes just a
single GET operation.
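The single-object idea can be sketched with a dictionary standing in for the key-value store. The key format `user:<id>`, the helper names, and the sample profile are assumptions for illustration only.

```python
import json

kv = {}  # stand-in for a key-value store: key -> opaque serialized value


def put_profile(user_id, profile):
    """Serialize the whole profile and store it under one key."""
    kv[f"user:{user_id}"] = json.dumps(profile)


def get_profile(user_id):
    """One lookup returns every attribute and preference of the user."""
    raw = kv.get(f"user:{user_id}")
    return None if raw is None else json.loads(raw)


put_profile("u1001", {
    "username": "asha",
    "preferences": {"language": "en", "country": "IN", "timezone": "Asia/Kolkata"},
    "favorites": ["sku-1", "sku-7"],
})
profile = get_profile("u1001")
```

The store treats the value as opaque, so the application decides how the profile is serialized and which attributes it contains.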
All e-commerce websites have shopping carts deeply linked with the user.
The shopping cart details should be available at all times, across different browsers, devices,
machines, and sessions.
A key-value store is the best fit for this scenario, with all shopping-related information put into the 'value',
where the 'key' is the userid.
Usage
Amazon uses DynamoDB to store its users' shopping cart details.
Key-value databases are not the best fit in the scenarios highlighted below.
Relationships among Data - When there are relationships between different sets of data, or you
need to correlate data across different sets of keys.
Multi-operation Transactions -
When you are saving many keys, one of the saves fails, and you want to roll back
the rest of the operations.
Query Data by 'value' - When you need to search for keys based on information found in the 'value' part of the
key-value pairs. Exceptions include Riak Search and indexing engines such as Lucene or
Solr.
Operation by Groups - As operations are confined to one key at a time, there is no way to
operate on several keys simultaneously.
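To see why querying by 'value' is a poor fit, compare a lookup by key with a search on a field inside the value. The sketch below uses a dictionary as a stand-in store; the sample records and the `find_keys_by_value` helper are hypothetical.

```python
kv = {
    "user:1": {"name": "Asha", "country": "IN"},
    "user:2": {"name": "Ravi", "country": "IN"},
    "user:3": {"name": "Lena", "country": "DE"},
}


def get(key):
    """Lookup by key: the operation a key-value store is built for."""
    return kv.get(key)


def find_keys_by_value(field, wanted):
    """Search by value: without an index, every stored pair must be inspected."""
    return sorted(k for k, v in kv.items() if v.get(field) == wanted)


one = get("user:3")
matches = find_keys_by_value("country", "IN")
```

The cost of the second function grows with the total number of keys, which is the gap that search add-ons and external indexing engines are meant to fill.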
Initially developed by Facebook for its inbox search feature and later handed over to Apache.
Designed to support Hadoop ecosystem tools (Cloudera Impala, Apache Spark, and
MapReduce).
There are variations in how the columnar paradigm is implemented within both traditional relational
systems and other NoSQL systems.
In a columnar database, the columns are stored together on disk, achieving a higher compression ratio,
which is an expensive operation. – False
JSON is a lightweight substitute for XML. – True
Only Nodes have properties in the Graph database. – False
All NoSQL databases are similar. – False
Neo4j is an example of Document Store DB. – False
MongoDB read/write performance can be tuned with the help of Stored Procedures. – False
Columnar databases are preferable for OLTP systems. – False
In MongoDB, there is a feature similar to the 'like' expression in RDBMS. – False
In HBase, 'Columns' are named and specified in the table definition. – False
Cassandra allows defining composite Primary Keys. – True
Document databases split a document into its constituent name/value pairs for indexing purposes. –
False
The horizontal scaling approach tends to be cheaper as the number of operations, and the size of the
data increases. – True
A Key-value store supports Secondary Indexes. – True
A graph-based database stores entities and the relationships between them as nodes and edges of a graph,
respectively. – True
NoSQL requires Schema like RDBMS. – False
Consistency of CAP theorem states that all the protocols must be satisfied by the transaction. There can
be half-completed transactions. – False