SQL and No PDF
MODULE 5
Syllabus
➢Introduction to NoSQL
➢Instead of the tabular structure of a relational database, NoSQL databases often keep related data together in a single data structure, such as a JSON document.
➢Since this non-relational database design does not require a schema, it offers
rapid scalability to manage large, typically unstructured data sets.
➢NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.
Features of NoSQL Databases:
➢Schema Agnostic: Unlike traditional RDBMS, NoSQL databases do not require a fixed schema or storage structure.
➢High scalability: NoSQL databases use sharding for horizontal scaling. Because of this, NoSQL can handle a huge amount of data: as the data grows, NoSQL scales itself automatically to handle that data efficiently.
➢Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data and traffic with ease. This makes them a good fit for applications that need to handle large amounts of data or traffic.
➢Performance: NoSQL databases are designed to handle large amounts of data and traffic, so for workloads that suit their data model they can offer better performance than traditional relational databases.
Disadvantages of NoSQL Databases:
➢Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means that they do not always guarantee atomicity, consistency, isolation, and durability across operations. This can be a drawback for applications that require strong data consistency guarantees.
➢Narrow focus: NoSQL databases have a very narrow focus: they are mainly designed for storage and provide very little other functionality. Relational databases are a better choice than NoSQL for transaction management.
➢Lack of support for complex queries: NoSQL databases are not designed to
handle complex queries, which means that they are not a good fit for
applications that require complex data analysis.
➢Lack of maturity: NoSQL databases are relatively new and lack the maturity of
traditional relational databases. This can make them less reliable and less secure than
traditional databases.
➢Unavailability of GUI: GUI tools for accessing the database are not as widely available in the market as they are for relational databases.
➢Backup: Backup is a significant weak point for some NoSQL databases, such as MongoDB.
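➢Large document size: Some database systems, such as MongoDB and CouchDB, store data in JSON format (MongoDB uses BSON, a binary encoding of JSON). Because field names are stored in every document, documents can be quite large, which affects big data storage, network bandwidth, and speed.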
SQL vs. NoSQL:
➢Hierarchical data storage: SQL databases are not suited for it; NoSQL databases are best suited for it.
➢Complex queries: SQL databases are best suited for complex queries; NoSQL databases are not so good at them.
➢Scaling: SQL databases are vertically scalable; NoSQL databases are horizontally scalable.
➢A data model is the model through which we perceive and manipulate our data.
➢In a database, the data model describes how we interact with the data in the database.
➢This is distinct from a storage model, which describes how the database stores and
manipulates the data internally.
➢The term “data model” often means the model of the specific data in an application.
➢The dominant data model is the relational data model, which is best
visualized as a set of tables.
➢ Each table has rows, with each row representing some entity of interest.
➢A column may refer to another row in the same or different table, which
constitutes a relationship between those entities
Aggregates
➢The relational model stores the information and divides it into tuples (rows).
➢A tuple is a limited data structure: It captures a set of values, and cannot nest one
tuple within another to get nested records, nor put a list of values or tuples within
another.
➢The term aggregate means a collection of objects that we treat as a unit.
➢These units of data, or aggregates, form the boundaries for ACID operations.
➢Aggregate is a term that comes from Domain-Driven Design.
➢UML uses a black-diamond composition marker to show how data fits into the aggregation structure.
➢The customer contains a list of billing addresses; the order contains a list of order
items, a shipping address, and payments.
➢The link between the customer and the order isn’t within either aggregate—it’s a
relationship between aggregates.
➢Similarly, the link from an order item would cross into a separate aggregate structure
for products
    // in customers
    {
      "customer": {
        "id": 1,
        "name": "Martin",
        "billingAddress": [{"city": "Chicago"}],
        "orders": [
          {
            "id": 99,
            "customerId": 1,
            "orderItems": [
              {
                "productId": 27,
                "price": 32.45,
                "productName": "NoSQL Distilled"
              }
            ],
            "shippingAddress": [{"city": "Chicago"}],
            "orderPayment": [
              {
                "ccinfo": "1000-1000-1000-1000",
                "txnId": "abelif879rft",
                "billingAddress": {"city": "Chicago"}
              }
            ]
          }
        ]
      }
    }
Consequences of Aggregate Orientation
➢Various data modeling techniques have provided ways of marking aggregate or
composite structures.
➢The problem is that modelers rarely provide any semantics for what makes an
aggregate relationship different from any other; where there are semantics, they vary.
➢Aggregation is, however, not a logical data property: it's all about how the data is being used by applications.
➢An aggregate structure helps with some data interactions but may be an obstacle for others.
➢Easy Replication.
➢It can handle structured, semi-structured, and unstructured data with equal effort.
Types of NoSQL Databases:
➢Document-based Databases
➢Key-value Stores
➢Column Family
➢Graph-based Databases
KEY-VALUE DATA MODEL
➢A key-value data model or database is also referred to as a key-value store.
➢In a key-value store, an associative array (map) is used as the basic data structure, in which each individual key is linked with exactly one value in a collection.
➢Key-value stores use an efficient, compact index structure so that they can find a value quickly and reliably by its key.
➢Examples:
➢ Couchbase
➢ Amazon DynamoDB
➢ Riak
➢ Aerospike
➢ Berkeley DB
➢Features:
➢ For storing, getting, and removing data, key-value databases utilize simple
functions.
➢ It is very easy to use. Because of the database's simplicity, a value can hold any kind of data, or even different kinds of data when required.
➢ Its response time is fast because of its simplicity, provided that the surrounding environment is well built and optimized.
➢ The key-value store is not refined for querying: you cannot query the database without a key.
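As a rough illustration of the store/get/remove interface described above, the following minimal Python sketch models a key-value store as an in-memory dictionary. The class and method names (KeyValueStore, put, get, delete) are hypothetical and are not the API of Couchbase, DynamoDB, Riak, or any other product; a real store adds persistence, distribution, and an on-disk index.

```python
class KeyValueStore:
    """Minimal in-memory key-value store (hypothetical, illustration only)."""

    def __init__(self):
        # The associative array: each key is linked with exactly one value.
        self._data = {}

    def put(self, key, value):
        # The store never inspects the value, so any type is accepted.
        self._data[key] = value

    def get(self, key, default=None):
        # Data can only be found through its key; there is no query by value.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove the key if present and report whether anything was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "Martin", "cart": [27]})  # a dict as the value
store.put("counter", 7)                                    # an int as the value
print(store.get("counter"))     # 7
print(store.delete("counter"))  # True
print(store.get("counter"))     # None
```

Because the value is opaque, different keys can hold completely different kinds of data, which is exactly why the store cannot answer a query that does not start from a key.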
DOCUMENT DATA MODEL
➢A document data model differs from other data models because it stores data in JSON, BSON, or XML documents.
➢In this data model, documents can be nested inside one another, and any particular element can be indexed to make queries run faster.
➢Documents are often stored and retrieved in a form close to the data objects used in many applications, which means very little translation is required to use the data in applications.
➢JSON is a native format that is often used to store and query data.
➢This data model works as a semi-structured data model: records and the data associated with them are stored in a single document, which means the data is not completely unstructured.
➢Examples:
➢ Amazon DocumentDB
➢ MongoDB
➢ Cosmos DB
➢ ArangoDB
➢ Couchbase Server
➢ CouchDB
➢Features:
➢ Flexible Schema: The schema is very flexible: not all documents in a collection need to have the same fields.
➢ Distributed and Resilient: Document databases are distributed, which enables horizontal scaling and distribution of data.
➢ Manageable Query Language: The query language allows developers to perform CRUD (Create, Read, Update, Delete) operations on the data.
➢Advantages:
➢ Schema-less: Document databases are very good at retaining existing data at massive volumes because there are no restrictions on the format and structure of data storage.
➢ Open formats: It has a very simple build process that uses open formats such as XML and JSON.
➢ Built-in versioning: As documents grow in size, they can also grow in complexity; built-in versioning helps decrease conflicts.
➢Disadvantages:
➢ Consistency Check Limitations: One can search collections and documents that are not connected to an author collection, but doing so can hurt database performance.
➢ Security: Many web applications today lack security, which can result in the leakage of sensitive data. This makes security a point of concern, and one must pay attention to web app vulnerabilities.
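To make the flexible schema, nesting, indexing, and CRUD ideas above concrete, here is a minimal Python sketch that treats a collection as a list of dictionaries. The helper names (insert, find, update, delete, build_index) and the sample orders are hypothetical; real document databases such as MongoDB or CouchDB have their own query APIs and index implementations.

```python
# A "collection" is modeled as a plain list of documents (dicts).
orders = []


def matches(doc, criteria):
    # True if every top-level field in criteria has the given value.
    return all(doc.get(field) == value for field, value in criteria.items())


def insert(collection, doc):  # Create
    collection.append(doc)


def find(collection, **criteria):  # Read
    return [doc for doc in collection if matches(doc, criteria)]


def update(collection, criteria, changes):  # Update
    hits = find(collection, **criteria)
    for doc in hits:
        doc.update(changes)
    return len(hits)


def delete(collection, **criteria):  # Delete
    before = len(collection)
    collection[:] = [doc for doc in collection if not matches(doc, criteria)]
    return before - len(collection)


def build_index(collection, field):
    # Secondary index: field value -> documents, so lookups avoid a full scan.
    index = {}
    for doc in collection:
        if field in doc:
            index.setdefault(doc[field], []).append(doc)
    return index


# Documents in one collection need not share the same fields, and can nest.
insert(orders, {"id": 99, "customerId": 1,
                "orderItems": [{"productId": 27, "price": 32.45}]})
insert(orders, {"id": 100, "customerId": 2, "giftNote": "Happy birthday"})
print(len(find(orders, customerId=1)))        # 1
update(orders, {"id": 100}, {"status": "shipped"})
by_customer = build_index(orders, "customerId")
print([doc["id"] for doc in by_customer[2]])  # [100]
print(delete(orders, id=99))                  # 1
print([doc["id"] for doc in orders])          # [100]
```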
COLUMN-FAMILY MODEL
➢Most databases use a row as the storage unit, which helps write performance.
➢ However, there are many scenarios where writes are rare but you often need to read a few columns of many rows at once.
➢In this situation, storing groups of columns for all rows as the basic storage unit is
better, which is why these databases are called column stores.
➢The columnar data model organizes information into columns instead of rows.
➢This makes them function the same way that tables work in relational databases.
➢The column-family model can be thought of as a two-level aggregate structure.
➢As with key-value stores, the first key is often described as a row identifier,
picking up the aggregate of interest.
➢Each column has to be part of a single column family, and the column acts as
a unit for access, with the assumption that data for a particular column family
will be usually accessed together.
➢Since the database knows about these common groupings of data, it can use
this information for its storage and access behavior.
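The two-level aggregate structure described above (a row key first, then columns grouped into column families) can be sketched as nested Python dictionaries. This is only a logical model under that assumption: the function names and sample rows are hypothetical, and it does not reproduce how Cassandra, HBase, or any other column store lays data out on disk.

```python
from collections import defaultdict

# row key -> column family -> column name -> value (a two-level aggregate)
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    # All columns of one family are fetched together as a unit.
    # .get() is used so that reading a missing row does not create it.
    return dict(table.get(row_key, {}).get(family, {}))


def read_column(family, column):
    # Read a single column across many rows, the access pattern column stores favor.
    return {
        row_key: families[family][column]
        for row_key, families in table.items()
        if column in families.get(family, {})
    }


put("martin", "profile", "name", "Martin")
put("martin", "profile", "billingCity", "Chicago")
put("martin", "orders", "order:99", {"total": 32.45})
put("pramod", "profile", "name", "Pramod")
print(get_family("martin", "profile"))  # {'name': 'Martin', 'billingCity': 'Chicago'}
print(read_column("profile", "name"))   # {'martin': 'Martin', 'pramod': 'Pramod'}
```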
➢Document stores make the content of the aggregate available to the database to
form indexes and queries.
➢Riak, a key-value store, allows you to put link information in metadata, supporting
partial retrieval and link-walking capability
GRAPH-BASED DATA MODEL
➢A graph-based data model is a type of data model that focuses on building the relationships between data elements.
➢As the name suggests, each element in a graph-based data model is stored as a node, and the associations between these elements are known as links.
➢ Associations are stored directly, because they are first-class elements of the data model. These data models give us a conceptual view of the data.
➢These data models are based on a topological network structure.
➢Nodes: These are the instances of data that represent the objects to be tracked.
➢ In these data models, the nodes which are connected together are connected
physically and the physical connection among them is also taken as a piece of data.
➢ This data model reads the relationship from storage directly instead of calculating
and querying the connection steps.
➢ Like many other NoSQL databases, these data models have no schema, which is a drawback because a schema keeps the model well organized and easy to edit.
➢ Small User Base: The user base is small, which makes it very difficult to get support when you run into a problem with the system.
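As a minimal sketch of the idea that relationships are stored as first-class data and read directly rather than computed through joins, the following Python code keeps nodes and edges in dictionaries and follows stored edges with a breadth-first traversal. The function names, relationship labels, and sample nodes are hypothetical; real graph databases provide their own storage engines and query languages.

```python
from collections import deque

nodes = {}  # node id -> properties
edges = {}  # node id -> list of stored relationships leaving that node


def add_node(node_id, **props):
    nodes[node_id] = props


def add_edge(src, rel, dst, **props):
    # The relationship is itself a record with a type and its own properties.
    edges.setdefault(src, []).append({"rel": rel, "to": dst, **props})


def reachable(start, rel):
    # Breadth-first traversal that follows stored edges of one type;
    # no join is computed, the links are read directly.
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for edge in edges.get(node, []):
            if edge["rel"] == rel and edge["to"] not in seen:
                seen.add(edge["to"])
                found.append(edge["to"])
                queue.append(edge["to"])
    return found


for name in ("anna", "barbara", "carol"):
    add_node(name, type="person")
add_node("book:27", type="product", title="NoSQL Distilled")
add_edge("anna", "FRIEND", "barbara", since=2015)
add_edge("barbara", "FRIEND", "carol")
add_edge("anna", "LIKES", "book:27")
print(reachable("anna", "FRIEND"))  # ['barbara', 'carol']
print(reachable("anna", "LIKES"))   # ['book:27']
```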
SCHEMA LESS DATABASES
➢A common theme across all the forms of NoSQL databases is that they are schemaless.
➢A key-value store allows you to store any data you like under a key.
➢A document database effectively does the same thing, since it makes no restrictions on the structure of the documents you store.
➢Column-family databases allow you to store any data under any column.
➢Graph databases allow you to freely add new edges and freely add properties to nodes and edges.
➢With a schema, you have to figure out in advance what you need to store, and that can be hard to do.
➢Without a schema, you can easily change your data storage as you learn more about your project.
➢Furthermore, if we find we don’t need some things anymore, we can just stop
storing them, without worrying about losing old data as we would if we delete
columns in a relational schema
➢As well as handling changes, a schemaless store also makes it easier to deal with
nonuniform data: data where each record has a different set of fields.
➢A schema forces all rows of a table into the same structure, which becomes awkward if you have different kinds of data in different rows.
➢You either end up with lots of columns that are usually null or with meaningless columns.
➢Schemalessness avoids this, allowing each record to contain just what it needs: no more, no less.
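The null-column problem described above can be shown with a small Python sketch: three nonuniform records (invented sample data) are forced into a single fixed set of columns, the way a relational table would require. The function name to_fixed_schema is hypothetical.

```python
records = [
    {"id": 1, "name": "Martin", "city": "Chicago"},
    {"id": 2, "name": "Pramod", "phone": "555-0100"},
    {"id": 3, "name": "Anna"},
]


def to_fixed_schema(rows):
    # A fixed schema needs one column for every field used by any row.
    columns = sorted({field for row in rows for field in row})
    table = [[row.get(col) for col in columns] for row in rows]
    return columns, table


columns, table = to_fixed_schema(records)
nulls = sum(value is None for row in table for value in row)
print(columns)  # ['city', 'id', 'name', 'phone']
print(nulls)    # 4 (of 12 cells)
```

In the schemaless form each record keeps only the fields it actually has; the fixed-schema form already wastes a third of its cells on nulls with just two optional fields.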
➢Challenges
➢ Schemaless Databases:
➢ Ideal for nonuniform data and rapid development; flexible within an aggregate.
➢ Relational Databases:
➢ Better for uniform data and controlled schema changes; ensures data integrity
and optimization.
➢Both types of databases have their own advantages and challenges, and the
choice depends on the specific requirements of the application.
Materialized Views
➢Aggregate-Oriented Data Models:
➢ Advantage: Useful for accessing all data for an order in a single unit.
➢ Disadvantage: Difficult to answer queries like total sales of a product over time
without reading every order.
➢Relational Databases:
➢Views: Defined by computation over base tables, providing a way to look at data
differently.
➢Materialized Views:
➢ Materialized views are effective for data that is read heavily but can stand
being somewhat stale
➢ Eager Approach:
➢ Batch Jobs:
➢ Outside Database:
➢ Read data, compute the view, and save it back to the database.
➢ Within Database:
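A minimal Python sketch, assuming orders are stored as aggregates like the customer/order example earlier, contrasts two ways of maintaining a "sales per product" materialized view: recomputing it in a batch job over every order, and eagerly updating it whenever an order is added. Function names and sample data are hypothetical; prices are kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict

orders = [
    {"id": 99, "items": [{"productId": 27, "qty": 1, "priceCents": 3245}]},
    {"id": 100, "items": [{"productId": 27, "qty": 2, "priceCents": 3245},
                          {"productId": 31, "qty": 1, "priceCents": 1500}]},
]


def compute_sales_view(all_orders):
    # Batch job: read every order aggregate and total the sales per product.
    view = defaultdict(int)
    for order in all_orders:
        for item in order["items"]:
            view[item["productId"]] += item["qty"] * item["priceCents"]
    return dict(view)


def add_order_eagerly(all_orders, view, order):
    # Eager approach: update the view in the same step as the base data.
    all_orders.append(order)
    for item in order["items"]:
        amount = item["qty"] * item["priceCents"]
        view[item["productId"]] = view.get(item["productId"], 0) + amount


sales_view = compute_sales_view(orders)
print(sales_view)  # {27: 9735, 31: 1500}
add_order_eagerly(orders, sales_view,
                  {"id": 101, "items": [{"productId": 31, "qty": 2, "priceCents": 1500}]})
print(sales_view[31])  # 4500
```

The batch version reads every order on each run, which is why its result may be somewhat stale between runs; the eager version keeps the view current at the cost of extra work on every write.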
➢Depending on the distribution model, a data store can give you the ability to handle larger quantities of data, the ability to process greater read or write traffic, or more availability in the face of network slowdowns or breakages.
➢Replication takes the same data and copies it over multiple nodes.
➢Single server: the simplest option is to run the database on a single machine that handles all the reads and writes to the data store.
➢The advantage is, it eliminates all the complexities that the other options
introduce; it’s easy for operations people to manage and easy for application
developers to reason about.
➢Although a lot of NoSQL databases are designed around the idea of running on a cluster, it can make sense to use NoSQL with a single-server distribution model if the data model of the NoSQL store is more suited to the application.
➢If the data usage is mostly about processing aggregates, then a single-server
document or key-value store may well be worthwhile because it’s easier on
application developers
Sharding
➢Sharding is a database partitioning technique that divides data into smaller subsets, or shards, based on a key or criterion, and stores them on separate servers.
➢ It is a core feature of many NoSQL databases, which are designed for distributed computing and often support automatic sharding.
➢Sharding allows the data to be distributed across multiple servers, which can
improve scalability, performance, availability, and load balancing
➢How does sharding work?
➢ The sharding function can be based on various criteria, such as a hash value, a
range, a list, or a custom logic.
➢ The sharding function should ensure that the data is evenly distributed among
the shards, and that the shards are easy to locate and access.
➢ The sharding function also defines the sharding key, which is the attribute or the
combination of attributes that identifies the shard for a given data item.
➢Benefits: Scalability, performance, and availability.
➢ It can help scale the database horizontally by adding more servers or nodes as the data
grows, thus reducing load and bottlenecks on a single server and increasing throughput
and storage capacity.
➢ Sharding can also improve performance by reducing query latency and network traffic
since queries can be executed on smaller and more relevant subsets of data.
➢ Sharding can improve availability, particularly when combined with replication, by providing redundancy and fault tolerance, avoiding single points of failure, and supporting backup for data consistency and durability.
➢Drawbacks: Complexity, Consistency, and Cost.
➢ It can increase the complexity of the database design, management, and maintenance, as
well as the cost of the database.
➢ Sharding requires careful planning and implementation to avoid data imbalance, hotspots,
or fragmentation.
➢ It can compromise the consistency of the database by introducing the possibility of data
inconsistency, duplication, or loss.
➢ It can create challenges for enforcing data integrity and atomicity of transactions.
➢ It can also create issues for performing complex queries, joins, or aggregations.
➢ It can demand more skills and expertise for managing the sharded database and resolving potential problems or conflicts.
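The sharding function described above can be sketched in Python for two of the listed criteria, a hash value and a range. The function names, the use of MD5 as a stable hash, and the range split points are assumptions for illustration; real databases use their own partitioning schemes (for example, consistent hashing or automatic range splitting).

```python
import hashlib
from bisect import bisect_right
from typing import Sequence


def hash_shard(key: str, num_shards: int) -> int:
    # Stable hash: Python's built-in hash() of a str is randomized per process,
    # so it could not be used to find the same shard again later.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, split_points: Sequence[str]) -> int:
    # split_points must be sorted; shard i holds keys from split_points[i-1]
    # (inclusive) up to split_points[i] (exclusive).
    return bisect_right(split_points, key)


customers = [f"customer:{i}" for i in range(1, 9)]
for c in customers:
    print(c, "-> hash shard", hash_shard(c, 4))

print(range_shard("anna", ["g", "p"]))    # 0
print(range_shard("martin", ["g", "p"]))  # 1
print(range_shard("zoe", ["g", "p"]))     # 2
```

Here the key string plays the role of the sharding key. Hashing tends to spread keys evenly but scatters neighboring keys; ranges keep neighboring keys together but can create hotspots if many keys fall into one range.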
Master-Slave Replication
➢In master-slave replication, one node is designated as the master, or primary; the other nodes are slaves, or secondaries. This master is the authoritative source for the data and is usually responsible for processing any updates to that data.
➢ 1. Read Scalability:
➢ Horizontal Scaling: Add more slave nodes to handle increased read requests.
➢ Limitation: The master node's ability to process updates and pass them on limits scalability for write-heavy datasets.
➢ 2. Read Resilience:
➢ Handling Master Failure: Slaves can continue to handle read requests if the master fails.
➢ Write Limitation: No writes can be processed until the master is restored or a new
master is appointed.
➢ Quick Recovery: Slaves can be quickly appointed as a new master, speeding up recovery
➢ Single-Server Store with Hot Backup: All traffic goes to the master, with the slave acting
as a backup.
➢ Separate Paths: Ensure different read and write paths in the application.
➢ Testing: Conduct tests to ensure reads still occur if writes are disabled.
Drawbacks
➢1. Inconsistency:
➢ Propagation Delay: Different clients may see different values due to delayed
propagation of changes.
➢ Hot Backup Concern: Updates not passed to the backup are lost if the master fails
➢2. Master-slave replication helps with read scalability but doesn’t help with the scalability
of writes.
➢3. It provides resilience against the failure of a slave, but not of a master.
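The behavior described above (writes only on the master, reads spread across slaves, stale reads during propagation delay, and lost updates when a slave is promoted after a master failure) can be simulated with a toy Python model. Propagation is an explicit replicate() step rather than real network replication, and all class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, num_slaves):
        self.master = Node("master")
        self.slaves = [Node(f"slave{i}") for i in range(num_slaves)]
        self.pending = []  # updates applied on the master but not yet propagated

    def write(self, key, value):
        # Only the master processes updates.
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Simulated asynchronous propagation from the master to every slave.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()

    def read(self, key, slave_index):
        # Reads go to a slave, so they scale with the number of slaves
        # but can return stale data.
        return self.slaves[slave_index].data.get(key)

    def promote(self, slave_index):
        # Master failure: appoint a slave as the new master.
        # Updates that were never propagated are lost.
        self.master = self.slaves.pop(slave_index)
        self.pending.clear()


cluster = MasterSlaveCluster(num_slaves=2)
cluster.write("price:27", 3245)
print(cluster.read("price:27", 0))  # None (not yet propagated)
cluster.replicate()
print(cluster.read("price:27", 1))  # 3245
cluster.write("price:31", 1500)     # the master fails before this is propagated
cluster.promote(0)
print(cluster.master.data.get("price:31"))  # None (update lost)
```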
Peer-to-Peer Replication
➢All the replicas have equal weight and can all accept writes, and the loss of any of them doesn't prevent access to the data store.
➢In peer-to-peer replication, all servers are considered peers, and any server can update the same data at the same time.
➢ Node Failure Resilience: Can handle node failures without losing data access.
➢Complications:
➢If both master-slave replication and sharding are used, there are multiple masters, but each data item has only a single master.
➢In this model, there might be tens or hundreds of nodes in a cluster with data sharded over them.
➢Should a node fail, the shards on that node will be rebuilt on the other nodes.
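A toy Python sketch of combining sharding with replication: each shard gets one master and some slaves on distinct nodes, so every node can be a master for some shards while each shard still has a single master, and a failed node's shards are rebuilt on the remaining nodes. The round-robin placement, the replication factor, and all names are assumptions for illustration, not how any particular database assigns shards.

```python
def place_shards(num_shards, nodes, replication_factor):
    # Round-robin placement: shard s gets its master on one node and its
    # slaves on the next (replication_factor - 1) nodes, all distinct.
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for shard in range(num_shards):
        replicas = [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]
        placement[shard] = {"master": replicas[0], "slaves": replicas[1:]}
    return placement


def handle_node_failure(placement, nodes, failed):
    # Rebuild every shard that had a copy on the failed node.
    alive = [n for n in nodes if n != failed]
    for shard, info in placement.items():
        replicas = [info["master"]] + info["slaves"]
        if failed not in replicas:
            continue
        survivors = [n for n in replicas if n != failed]
        if not survivors:
            raise RuntimeError(f"shard {shard} lost: no surviving copy")
        # Copy the shard to a live node that does not already hold it.
        spare = [n for n in alive if n not in survivors]
        if spare:
            survivors.append(spare[0])
        # If the master was lost, the first surviving slave becomes master.
        info["master"], info["slaves"] = survivors[0], survivors[1:]
    return placement


nodes = ["n1", "n2", "n3", "n4"]
placement = place_shards(num_shards=4, nodes=nodes, replication_factor=3)
print({s: p["master"] for s, p in placement.items()})  # {0: 'n1', 1: 'n2', 2: 'n3', 3: 'n4'}
handle_node_failure(placement, nodes, failed="n2")
print(placement[1])  # {'master': 'n3', 'slaves': ['n4', 'n1']}
```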