Unit IV Rdbms

The document discusses the evolution and significance of relational databases, highlighting their strengths such as data persistence, concurrency management, and integration capabilities. It also addresses the challenges posed by impedance mismatch and the emergence of NoSQL databases, which offer alternative data models like key-value and document stores to handle large-scale data and improve application development. The document concludes by exploring aggregate data models and their advantages in managing complex records compared to traditional relational approaches.

UNIT - IV: NoSQL: The Value of Relational Databases - Impedance Mismatch - Application and Integration Databases -
Attack of the Clusters - The Emergence of NoSQL - Aggregate Data Models: Aggregates - Key-Value and Document
Data Models - Column-Family Stores - Summarizing Aggregate-Oriented Databases.
1. The Value of Relational Databases
Relational databases have become such an embedded part of our computing culture that it’s easy to take them for
granted. It’s therefore useful to revisit the benefits they provide.
1.1 Getting at Persistent Data
A database’s main job is to safely store large amounts of information for a long time. Unlike main memory, which is fast
but temporary, databases use backing storage (like disks) that keeps data even when the computer shuts down. Unlike
simple files, databases make it easy to store huge amounts of data and quickly access just the parts you need.
1.2 Concurrency
In big business (enterprise) applications, many people often use the same data at the same time, and some may even try
to change it. Most of the time, they’re working on different parts of the data, but sometimes two people might try to
update the same piece, like booking the same hotel room. To prevent mistakes like double-bookings, we need a way to
carefully manage these interactions.
Handling this (called concurrency) is tricky and full of possible errors, even for skilled programmers. Since enterprise
systems can have many users and other systems accessing data all at once, the chance of problems is high.
Relational databases help with this by using transactions to control all access to the data. Transactions don’t solve every
problem (for example, you can still get an error if two people try to book the same room at the same time), but they
make concurrency much easier to manage.
Transactions are also useful for handling errors. If something goes wrong while making changes, you can roll back the
transaction, which undoes the partial changes and keeps the data clean.
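The rollback behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the bookings and audit_log tables and the book_room helper are hypothetical names, not part of the original notes. A primary key on the room number makes the second booking fail, and the transaction then undoes the partial change (the audit row) as well.

```python
import sqlite3

def book_room(conn, room_id, guest):
    """Try to book a room inside one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO audit_log (message) VALUES (?)",
                (f"{guest} requested room {room_id}",),
            )
            # The PRIMARY KEY on room_id makes a second booking fail.
            conn.execute(
                "INSERT INTO bookings (room_id, guest) VALUES (?, ?)",
                (room_id, guest),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (room_id INTEGER PRIMARY KEY, guest TEXT)")
conn.execute("CREATE TABLE audit_log (message TEXT)")
conn.commit()

first = book_room(conn, 101, "Alice")   # succeeds
second = book_room(conn, 101, "Bob")    # fails, and its audit row is rolled back
log_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
guests = [row[0] for row in conn.execute("SELECT guest FROM bookings")]
```

Note that the second caller still gets an error (here a False return value); the transaction only guarantees that the data is left clean.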
1.3 Integration
Big business systems often involve many different applications, built by different teams, that need to work together. This
teamwork can be tricky because the applications — and the teams behind them — must share information. If one
application updates some data, the others need to see that update too.
A common solution is called shared database integration. This means all the applications store and use data in the same
database. By doing this, each application can easily access the others’ data, and the database’s built-in concurrency
control makes sure everything stays consistent, even when many applications (or users) are using it at the same time.
Example: Imagine an online store. The website app, the inventory app, and the shipping app all need access to the same
product data. If someone buys the last laptop, the website should instantly show “out of stock,” and the shipping app
should prepare the delivery. A shared database ensures that all apps stay in sync.
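A minimal sketch of the online store example, assuming a hypothetical products table and two hypothetical functions standing in for the website app and the inventory app. For brevity both "apps" share one in-memory SQLite connection; in a real deployment each application would open its own connection to the same database server.

```python
import sqlite3

# One shared database; a single in-memory connection stands in for it here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT PRIMARY KEY, stock INTEGER)")
db.execute("INSERT INTO products VALUES ('laptop', 1)")
db.commit()

def website_in_stock(db, name):
    """Website app: read the current stock level."""
    (stock,) = db.execute(
        "SELECT stock FROM products WHERE name = ?", (name,)
    ).fetchone()
    return stock > 0

def inventory_record_sale(db, name):
    """Inventory app: decrement stock only if something is left."""
    with db:  # one transaction per sale
        cur = db.execute(
            "UPDATE products SET stock = stock - 1 WHERE name = ? AND stock > 0",
            (name,),
        )
    return cur.rowcount == 1

before = website_in_stock(db, "laptop")
sold = inventory_record_sale(db, "laptop")        # the last laptop is sold
after = website_in_stock(db, "laptop")            # website now sees "out of stock"
sold_again = inventory_record_sale(db, "laptop")  # nothing left to sell
```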
1.4 A (Mostly) Standard Model
Relational databases became popular because they offer important benefits in a standardized way. This means that
once developers and database experts learn the basics of how relational databases work, they can apply that knowledge
to many different projects.
Even though different relational databases (like Oracle, MySQL, or PostgreSQL) have small differences, the main ideas
stay the same: their SQL languages are very similar, and transactions work in almost the same way across all of them.
2. Impedance Mismatch
Relational databases have many strengths, but they aren’t perfect. A big problem for developers is the “impedance
mismatch”—the gap between how data is stored in relational databases (as tables and rows) and how data is
represented in programming languages (as objects, lists, or nested structures).
Because of this, developers must translate between the two, which can be frustrating. In the 1990s, many thought
object-oriented databases (which stored data like programming languages do) would replace relational databases. But
while object-oriented programming became popular, object-oriented databases faded away. Relational databases stayed
strong because of their standard SQL language and their role in integrating different systems.
Tools like ORM frameworks (e.g., Hibernate, iBATIS) helped reduce the pain by automating much of the translation work,
but they can also cause performance issues if overused. By the 2000s, relational databases still dominated, though
cracks in their dominance began to appear.
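The translation work behind the impedance mismatch can be made concrete with a small Python sketch. The field names and the second order item are illustrative assumptions. The nested order that an application works with has to be flattened into rows for two tables, and rebuilt again on the way back; this is roughly the work an ORM framework automates.

```python
# In-memory structure, as the application sees it (nested list of items).
order = {
    "id": 99,
    "customer_id": 1,
    "items": [
        {"product_id": 27, "price": 32.45},
        {"product_id": 31, "price": 12.00},  # hypothetical second item
    ],
}

def to_rows(order):
    """Flatten one nested order into flat tuples, one list per table."""
    order_rows = [(order["id"], order["customer_id"])]
    item_rows = [
        (order["id"], item["product_id"], item["price"])
        for item in order["items"]
    ]
    return order_rows, item_rows

def from_rows(order_rows, item_rows):
    """Rebuild the nested structure by matching item rows on the order id."""
    ((order_id, customer_id),) = order_rows
    return {
        "id": order_id,
        "customer_id": customer_id,
        "items": [
            {"product_id": pid, "price": price}
            for (oid, pid, price) in item_rows
            if oid == order_id
        ],
    }

order_rows, item_rows = to_rows(order)
rebuilt = from_rows(order_rows, item_rows)
```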
3. Application and Integration Databases
Relational databases won over object-oriented databases mainly because SQL made it easy for different applications to
share and access the same data. This “shared database” approach kept data consistent, but it also made databases more
complex. If one application wanted to change something, it had to coordinate with all the others. Different teams often
had different needs, which made managing the database harder.
Another approach is the “application database,” where a database is used only by a single application and its team. This
makes it easier to manage, update, and keep consistent, since the same team controls both the database and the
application code. To connect with other applications, data is shared through interfaces and services, not directly through
the database.
In the 2000s, web services became popular for this kind of integration. They allowed applications to talk over HTTP and
exchange richer data (like XML or JSON), not just simple tables like in SQL. This made communication more flexible.
With application databases, teams had more freedom to choose different database types, even non-relational ones. Still,
most teams stuck with relational databases because they were familiar, reliable, and good enough for most needs.
4. Attack of the Clusters
In the early 2000s, after the dot-com crash, some big websites grew massively in size. They started collecting huge
amounts of data—like user activity, logs, maps, and social networks—while also serving millions of users.
To handle this growth, companies needed more computing power. They had two options:
• Scale up: Buy bigger, more powerful machines (but these are very expensive and have limits).
• Scale out: Use lots of smaller, cheaper machines in a cluster. Clusters are also more reliable because even if one
machine fails, the system keeps running.
But relational databases were not built for clusters. Some solutions existed, like Oracle RAC (using shared disks) or
sharding (splitting data across different servers). However, these approaches caused big problems: sharding was
complex, lost important features such as queries, referential integrity, and transactions that cross shards, and felt
unnatural to developers. On top of that, relational databases had expensive licenses that became even costlier in
cluster setups.
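The sharding idea can be sketched in a few lines of Python. The shard names, the hash-based routing rule, and the helper functions are assumptions chosen for illustration, with plain dictionaries standing in for separate database servers. The sketch shows why the application becomes more complex: it must route every key itself, and any query that crosses shards has to visit each server and merge the results by hand.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server names

# Each shard is an independent store; dicts stand in for separate servers.
stores = {name: {} for name in SHARDS}

def shard_for(key):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

def put(key, value):
    stores[shard_for(key)][key] = value

def get(key):
    return stores[shard_for(key)].get(key)

def find_all(predicate):
    """A cross-shard query: the application visits every shard and merges."""
    results = []
    for name in SHARDS:
        results.extend(v for v in stores[name].values() if predicate(v))
    return results

put("customer:1", {"name": "Martin", "city": "Chicago"})
put("customer:2", {"name": "Pramod", "city": "Chicago"})
put("customer:3", {"name": "Ann", "city": "Boston"})

chicago = find_all(lambda customer: customer["city"] == "Chicago")
```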
This gap pushed companies to look for alternatives. Google and Amazon, who were dealing with massive clusters and
huge data, created new systems: Google’s Bigtable and Amazon’s Dynamo. These ideas inspired others to build databases
designed specifically for clusters.
At first, it seemed like only giant companies needed such solutions, but over time, more organizations also started
capturing and analyzing large data sets—and running into the same problems. This made cluster-based databases a
serious challenge to the dominance of relational databases.
5. The Emergence of NoSQL
The term NoSQL first appeared in the 1990s for a small open-source relational database that didn’t use SQL. But the
meaning we know today started in 2009 at a meetup in San Francisco, where developers of new nonrelational databases
(like Cassandra, MongoDB, CouchDB, HBase) gathered. The name “NoSQL” was picked mainly because it was short and
good for a hashtag—not because it had a clear definition.
What NoSQL means today:
• Not strictly defined, but usually refers to open-source, distributed, nonrelational databases.
• Don’t use standard SQL (some have SQL-like query languages).
• Often designed for clusters, with flexible consistency options.
• Often schema-less (you can add fields freely).
• Includes different types like document stores, key-value stores, column stores, and graph databases.
• Grew from the needs of big web systems in the 2000s.
People sometimes say NoSQL = “Not Only SQL”, meaning relational databases are still useful, but now we have more
options. This idea is called polyglot persistence—using different types of databases for different needs instead of always
picking relational ones.
Why use NoSQL?
1. To handle big data and clusters.
2. To make application development easier, with simpler and more flexible data handling.
6. Aggregate Data Models
A data model is how we view and work with data in a database. It’s different from a storage model, which is how the
database stores data internally. Ideally, we don’t need to worry about the storage model, but sometimes we must
understand it for better performance.
Often, “data model” can mean the structure of a specific application (like customers, orders, products). But here, it
means the way a database organizes data (the metamodel).
For decades, the relational model has dominated—data stored in tables (like spreadsheets) with rows (entities) and
columns (attributes). Relationships connect rows across tables.
With NoSQL, the shift is away from relational tables. NoSQL databases use four main models:
• Key-value
• Document
• Column-family
• Graph
The first three share an aggregate orientation, meaning they focus on storing data as grouped chunks (aggregates) rather
than breaking it into many tables.

6.1 Aggregates
In the relational model, data is stored in tuples (rows). Tuples are simple: each row just holds values. You can’t nest one
row inside another or store lists inside them. This simplicity makes operations clear, since everything works on rows.
Aggregate orientation is different. It allows complex records that can include lists or nested data. Key-value, document,
and column-family databases all use this idea. We call such a complex record an aggregate.
The term comes from Domain-Driven Design, where an aggregate is a group of related objects treated as one unit.
Aggregates are updated together (atomically), which makes it easier to:
• manage consistency,
• replicate or shard data across clusters,
• and work with the data in applications, since developers often use these structures directly.
6.1.1 Example of Relations and Aggregates
Suppose we need to build an e-commerce site to sell products online. We must store details about users, products,
orders, shipping, billing, and payments.
In a relational database, we would design a normalized model with multiple linked tables, ensuring no data is repeated
and relationships are maintained.
In a NoSQL (aggregate-oriented) approach, we would group related data together into larger units (aggregates), such as
keeping order details, shipping, and billing info inside the same record.
Here is some sample data, shown in JSON format since that is a common representation for data in NoSQL
databases.
// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{ "city": "Chicago" }]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{ "city": "Chicago" }],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": { "city": "Chicago" }
    }
  ]
}
In this model, we have two main aggregates:
• Customer → contains billing addresses.
• Order → contains order items, shipping address, and payments (each payment also has its own billing address).
Addresses are copied into each aggregate instead of using IDs. This avoids unwanted changes (e.g., an old shipping
address should stay the same for past orders).
Relationships between aggregates are kept separate—for example:
• Customer ↔ Order link.
• Order item ↔ Product link.
Sometimes product info (like name) is repeated inside the order to avoid extra lookups (denormalization).
The key idea: when designing aggregates, think about how the data will be accessed. For instance, we could also design
it so all customer orders are stored inside the customer aggregate.
With that alternative design, the example customer and order would look like this:
// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{ "city": "Chicago" }],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{ "city": "Chicago" }],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": { "city": "Chicago" }
          }
        ]
      }
    ]
  }
}
6.1.2 Consequences of Aggregate Orientation
Relational databases store data well with tables and relationships, but they don’t recognize aggregates (like “an order =
items + shipping address + payment”). They treat all relationships the same, so the database can’t optimize how data is
grouped or stored. That’s why relational databases—and graph databases—are called aggregate-ignorant.
Being aggregate-ignorant isn’t always bad. It makes data flexible for many uses (e.g., analyzing product sales), but it can
be harder to work with when we want to treat related data as a single unit.
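To see the trade-off from the other side, here is a rough Python sketch of a product sales analysis over order aggregates. The second order and product 31 are hypothetical data added for illustration. Because the items live inside each order, the analysis has to open every order aggregate and dig into it, whereas an aggregate-ignorant relational layout could answer the same question with a single grouped query over an order items table.

```python
from collections import defaultdict

# Order aggregates in the shape used above; order 100 is hypothetical.
orders = [
    {"id": 99, "customerId": 1,
     "orderItems": [{"productId": 27, "price": 32.45}]},
    {"id": 100, "customerId": 2,
     "orderItems": [{"productId": 27, "price": 32.45},
                    {"productId": 31, "price": 12.00}]},
]

def sales_by_product(orders):
    """Total revenue per product, found by scanning every order aggregate."""
    totals = defaultdict(float)
    for order in orders:
        for item in order["orderItems"]:
            totals[item["productId"]] += item["price"]
    return dict(totals)

totals = sales_by_product(orders)
```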
Aggregate-oriented databases (like many NoSQL types) explicitly group data (e.g., an order with all its details). This helps
in clusters, because the whole aggregate can live on one node, reducing cross-node queries.
For transactions, relational databases allow ACID operations across many rows/tables. NoSQL aggregate databases
usually support atomic updates only within a single aggregate. If multiple aggregates need updating together, the
application must handle it.
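A minimal in-memory sketch of "atomic per aggregate", assuming a hypothetical AggregateStore class (no real NoSQL product exposes exactly this API). An update is applied to a private copy of one aggregate and swapped in only if it completes, so a failed update leaves that aggregate untouched. Nothing here coordinates changes across two different keys; that part is left to the application.

```python
import copy
import threading

class AggregateStore:
    """Hypothetical store: each key maps to one aggregate (a nested dict)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply change(draft) to a copy; store it only if no error occurs."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)            # may raise; the stored aggregate is untouched
            self._data[key] = draft  # all changes become visible together

store = AggregateStore()
store.put("order:99", {"id": 99, "orderItems": []})

def add_item_then_fail(order):
    order["orderItems"].append({"productId": 27, "price": 32.45})
    raise ValueError("payment declined")

try:
    store.update("order:99", add_item_then_fail)
except ValueError:
    pass
after_failure = store.get("order:99")   # still has no items

def add_item(order):
    order["orderItems"].append({"productId": 27, "price": 32.45})

store.update("order:99", add_item)
after_success = store.get("order:99")   # now has one item
```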
So:
• Relational & graph DBs = aggregate-ignorant, flexible, ACID across tables.
• Aggregate-oriented DBs = good for clusters, atomic per aggregate, simpler for many use cases.
7. Key-Value and Document Data Models
Key-value and document databases are both aggregate-oriented.
• In a key-value database, each aggregate is just a “blob” of data linked to a key. The database doesn’t know
what’s inside; it only stores and retrieves it by key. This gives full freedom but limited access.
• In a document database, the aggregate has a defined structure (like JSON). The database can look inside it, run
queries on fields, return only parts of it, and build indexes.
Key differences:
• Key-value: Simple, flexible, but only key lookups.
• Document: Structured, supports queries, indexing, and partial retrieval.
In practice, the boundary blurs—key-value stores add features (like metadata, lists, or search), and document databases
still use IDs for key-based lookups. But generally:
• Key-value = lookup by key.
• Document = query by internal structure.
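The difference can be sketched with two toy Python classes. Both classes and their method names are hypothetical and do not mirror the API of any particular product. The key-value store only ever sees opaque bytes, so the application must fetch and decode the whole blob; the document store keeps the structure visible and can therefore filter on a field or return part of a document.

```python
import json

class KeyValueStore:
    """Stores opaque bytes; the only operations are put and get by key."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]

class DocumentStore:
    """Stores dicts and can look inside them."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id, fields=None):
        """Return the whole document, or only the requested fields."""
        doc = self._docs[doc_id]
        if fields is None:
            return dict(doc)
        return {f: doc[f] for f in fields if f in doc}

    def find(self, field, value):
        """Return the ids of documents whose field equals value."""
        return [i for i, d in self._docs.items() if d.get(field) == value]

order = {"id": 99, "customerId": 1, "shippingAddress": [{"city": "Chicago"}]}

kv = KeyValueStore()
kv.put("order:99", json.dumps(order).encode("utf-8"))
# The application must fetch and decode the whole blob to read one field.
customer_from_kv = json.loads(kv.get("order:99"))["customerId"]

docs = DocumentStore()
docs.put(99, order)
orders_for_customer = docs.find("customerId", 1)  # query by internal structure
only_id = docs.get(99, fields=["id"])             # partial retrieval
```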
8. Column-Family Stores
Google’s Bigtable was one of the first big NoSQL databases and inspired others like HBase and Cassandra. Although its
name suggests a table, it’s better to think of it as a two-level map.
Before NoSQL, column stores (like C-Store) still used SQL but stored data by columns instead of rows to make reading a
few columns across many rows faster.
Bigtable-style databases (called column-family databases) take a different approach:
• Data is grouped into column families.
• Each row (aggregate) is identified by a key.
• Inside each row, data is stored as a map of columns.
• You can fetch the whole row or just a specific column (e.g., get('1234', 'name')).
So, column-family databases mix the ideas of key-value stores and structured access, letting you read data either by row
or by column group.
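The two-level map can be sketched in Python with nested dictionaries. The ColumnFamilyTable class is hypothetical, and unlike the get('1234', 'name') call mentioned above, this sketch assumes the column family is passed explicitly, which is a simplification for illustration. The family names "profile" and "orders" and the column "OD99" are also assumptions.

```python
class ColumnFamilyTable:
    """Hypothetical table: row key -> column family -> column -> value."""

    def __init__(self, families):
        self._families = set(families)  # fixed when the table is created
        self._rows = {}

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get(self, row_key, family=None, column=None):
        """Fetch a whole row, one column family, or a single column."""
        row = self._rows[row_key]
        if family is None:
            return row
        if column is None:
            return row[family]
        return row[family][column]

table = ColumnFamilyTable(["profile", "orders"])
table.put("1234", "profile", "name", "Martin")
table.put("1234", "profile", "billingAddress", "Chicago")
table.put("1234", "orders", "OD99", "order 99 data")

name = table.get("1234", "profile", "name")  # a single column
profile = table.get("1234", "profile")       # one column family
whole_row = table.get("1234")                # the whole aggregate
```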
In column-family databases, data can be seen in two ways:
• Row-oriented: Each row is like an aggregate (e.g., a customer) with column families grouping related data
(profile, order history).
• Column-oriented: Each column family defines a type of record (e.g., customer profiles), and rows are like a join
of these records.
Unlike relational tables, rows in column-family databases don’t have to share the same columns—you can add new
columns freely. But creating new column families is rare and more complex.
In Cassandra, a row occurs in only one column family, but that column family can contain supercolumns (columns that
hold nested columns), which are the closest equivalent to Bigtable’s column families.
Column families can be:
• Skinny: few columns, same structure across rows (like records).
• Wide: many columns, often different per row, useful for modeling lists (e.g., order items).
Wide column families can also sort columns, so you can query data ranges (e.g., by date + ID like 20111027-1001).
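A small Python sketch of a range query over sorted column keys, using the standard bisect module. The WideRow class is hypothetical, and apart from 20111027-1001 the column keys and all values are made-up sample data. Because the keys start with the date, plain string ordering keeps the columns in date order, so a date range becomes a slice of the sorted key list.

```python
from bisect import bisect_left, bisect_right

class WideRow:
    """Hypothetical wide row whose column keys are kept in sorted order."""

    def __init__(self):
        self._keys = []    # sorted column keys
        self._values = {}  # column key -> value

    def put(self, column_key, value):
        if column_key not in self._values:
            self._keys.insert(bisect_left(self._keys, column_key), column_key)
        self._values[column_key] = value

    def range(self, start, end):
        """Return (key, value) pairs with start <= key <= end, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]

order_columns = WideRow()
# Inserted out of order on purpose; the row keeps them sorted.
order_columns.put("20111101-1003", "order 1003")
order_columns.put("20111027-1001", "order 1001")
order_columns.put("20111028-1002", "order 1002")

oct_27 = order_columns.range("20111027", "20111027-9999")
october = order_columns.range("20111001", "20111031-9999")
```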
So, column families give databases a flexible two-dimensional structure—part record-like, part list-like.
9. Summarizing Aggregate-Oriented Databases
We’ve now seen the three main aggregate-oriented data models:
• All share: an aggregate with a key for lookup. The aggregate stays on one node in a cluster and is the unit for
atomic updates.
• Key-Value model: the aggregate is a black box; you can only get the whole thing by key (no queries, no partial
access).
• Document model: the aggregate is readable by the database, which supports queries and partial retrievals, but since
there’s no schema, optimization is limited.
• Column-Family model: the aggregate is divided into column families, adding structure that helps the database
improve access and storage.
