
Ut 53

Key-value databases are simple NoSQL stores that allow data access primarily via a primary key, with operations including getting, putting, or deleting values. Riak, a popular key-value store, organizes keys into buckets to manage data and offers features like conflict resolution, transactions, and scaling through sharding. The design of keys and the management of consistency, transactions, and query capabilities are crucial for effective use of key-value stores.

Uploaded by

ckesava474
Copyright
© All Rights Reserved

Key-Value Databases

A key-value store is a simple hash table, primarily
used when all access to the database is via
primary key.
Think of a table in a traditional RDBMS with two
columns, such as ID and NAME, the ID column
being the key and NAME column storing the value.
In an RDBMS, the NAME column is restricted to
storing data of type String.
The application can provide an ID and VALUE and
persist the pair; if the ID already exists the current
value is overwritten, otherwise a new entry is
created.
Terminology comparison between Oracle and Riak
Key-Value Databases
1. Key-value stores are the simplest NoSQL data
stores to use from an API perspective.
2. The client can either get the value for the key,
put a value for a key, or delete a key from the
data store.
3. The value is a blob that the data store just
stores, without caring or knowing what’s inside.
4. It’s the responsibility of the application to
understand what was stored.
5. Since key-value stores always use primary-key
access, they generally have great performance.
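The three operations listed above can be sketched with a toy in-memory store. This is only an illustration, not Riak's API: the class name `KeyValueStore`, the sample profile, and the choice of JSON as the application's encoding are all assumptions made for the example.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical); values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not inspect the blob; it only insists on bytes.
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        # Overwrites the current value if the key exists, else creates it.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how the value is encoded.
profile = {"name": "Ada", "language": "en"}
store.put("288790b8a421", json.dumps(profile).encode("utf-8"))

# The application is also responsible for decoding what it stored.
loaded = json.loads(store.get("288790b8a421").decode("utf-8"))
```

Because the store never looks inside the value, swapping JSON for XML or a binary format would change only the application code.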
Some of the popular key-value databases are
Riak [Riak],
Redis (often referred to as a data structure server)
[Redis],
Memcached DB and its flavors [Memcached],
Berkeley DB [Berkeley DB],
HamsterDB (especially suited for embedded use)
[HamsterDB],
Amazon DynamoDB [Amazon’s Dynamo] (not
open-source), and
Project Voldemort [Project Voldemort] (an open-source
implementation of Amazon’s Dynamo design).
The discussion below focuses on Riak.

Riak stores keys in buckets, which are just a way to
segment the keys (think of buckets as flat
namespaces for the keys).
If we wanted to store
•user session data,
•shopping cart information, and
•user preferences in Riak
we could just store all of them in the same bucket with
a single key and single value for all of these objects.
In this scenario, we would have a single object that
stores all the data and is put into a single bucket.
The downside of storing all the different objects
(aggregates) in a single bucket would be that one
bucket would store different types of aggregates,
increasing the chance of key conflicts.

An alternate approach would be
to append the name of the object to the key, such
as 288790b8a421_userProfile,
so that we can get to individual objects as they are
needed.
Change the key design to segment the data in a
single bucket.
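A minimal sketch of this key design, assuming a plain dictionary stands in for the single bucket. The helper `make_key` and the stored values are hypothetical; only the key format `288790b8a421_userProfile` comes from the text.

```python
def make_key(session_id, object_name):
    """Build a composite key by appending the object name to the session ID."""
    return f"{session_id}_{object_name}"


# A plain dict stands in for one bucket holding several aggregate types.
bucket = {}
session_id = "288790b8a421"

bucket[make_key(session_id, "userProfile")] = b'{"name": "Ada"}'
bucket[make_key(session_id, "shoppingCart")] = b'{"items": []}'

# Each object can now be fetched on its own, without loading the others.
profile_blob = bucket[make_key(session_id, "userProfile")]
```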

We could also create buckets which store specific
data.
In Riak, they are known as domain buckets,
allowing the serialization and deserialization to be
handled by the client driver.
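The idea of a bucket dedicated to one type, with conversion handled on the client side, might look like the sketch below. It does not use the Riak client: `DomainBucket`, `UserProfile`, and the JSON encoding are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserProfile:
    name: str
    language: str


class DomainBucket:
    """Hypothetical bucket that stores exactly one domain type."""

    def __init__(self, name, domain_type):
        self.name = name
        self._type = domain_type
        self._data = {}

    def store(self, key, obj):
        if not isinstance(obj, self._type):
            raise TypeError(f"bucket {self.name!r} only stores {self._type.__name__}")
        # Serialization happens here, not in the calling application code.
        self._data[key] = json.dumps(asdict(obj)).encode("utf-8")

    def fetch(self, key):
        raw = self._data.get(key)
        if raw is None:
            return None
        # Deserialization back into the domain type is also hidden from callers.
        return self._type(**json.loads(raw.decode("utf-8")))


profiles = DomainBucket("userProfiles", UserProfile)
profiles.store("288790b8a421", UserProfile(name="Ada", language="en"))
restored = profiles.fetch("288790b8a421")
```

Callers work with `UserProfile` objects only; the byte representation stays an internal detail of the bucket wrapper.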
Key-Value Store Features
Some of the features we will discuss for all the
NoSQL data stores are
consistency,
transactions,
query features,
structure of the data, and
scaling.
Consistency
Consistency is applicable only for
operations on a single key,
since these operations are either a get, put, or
delete on a single key.

Optimistic writes can be performed, but are very
expensive to implement, because a change in value
cannot be determined by the data store.
Riak has two ways of resolving
update conflicts: either the newest write wins and
older writes lose, or
both (all) values are returned, allowing the client to
resolve the conflict.
In Riak, these options can be set up during
bucket creation.
Buckets are just a way to namespace keys so that
key collisions can be reduced.
If we need data in every node to be consistent,
we can increase the numberOfNodesToRespondToWrite
set by w to be the same as nVal.

Of course, doing that will decrease the write
performance of the cluster.
To improve on write or read conflicts, we can
change the allowSiblings flag during bucket
creation:
if it is set to false, we let the last write win and
do not create siblings.
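The two conflict-handling modes can be contrasted with a small sketch. It is a simplification and not how Riak is implemented: the `Bucket` class, the explicit `timestamp` argument, and the assumption that every write to the same key conflicts are made up for the example.

```python
class Bucket:
    """Hypothetical bucket showing last-write-wins versus siblings."""

    def __init__(self, allow_siblings):
        self.allow_siblings = allow_siblings
        self._data = {}  # key -> list of (timestamp, value)

    def put(self, key, value, timestamp):
        versions = self._data.setdefault(key, [])
        versions.append((timestamp, value))
        if not self.allow_siblings:
            # Last write wins: keep only the version with the newest timestamp.
            newest = max(versions, key=lambda version: version[0])
            self._data[key] = [newest]

    def get(self, key):
        # With siblings allowed, the client receives every conflicting value.
        return [value for _, value in self._data.get(key, [])]


last_write_wins = Bucket(allow_siblings=False)
last_write_wins.put("cart", b"A", timestamp=1)
last_write_wins.put("cart", b"B", timestamp=2)

with_siblings = Bucket(allow_siblings=True)
with_siblings.put("cart", b"A", timestamp=1)
with_siblings.put("cart", b"B", timestamp=2)
```

In the first bucket a read returns a single value; in the second the client gets both values and has to merge or choose between them.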
Transactions
Different products of the key-value store kind have
different specifications of transactions.
Many data stores do implement transactions in
different ways.

Riak uses the concept of quorum (“Quorums,” p.
57), implemented by supplying the W value (the number
of replicas that must confirm the write) during the
write API call.
Assume we have a Riak cluster with a replication
factor of 5 and we supply the W value of 3.
When writing,
“the write is reported as successful only when
it is written and reported as a success on at least
three of the nodes”.
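The quorum rule quoted above can be sketched as a simple count of acknowledgements. The function `quorum_write` and the boolean list that models which replicas are reachable are assumptions for illustration, not Riak's write path.

```python
def quorum_write(replica_up, w):
    """Report success only if at least w replicas acknowledge the write.

    replica_up holds one boolean per replica (True = reachable and acknowledging).
    """
    acks = sum(1 for up in replica_up if up)
    return acks >= w


N, W = 5, 3  # replication factor and write quorum from the example

all_up = [True] * N
two_down = [True, True, True, False, False]
three_down = [True, True, False, False, False]

ok_all_up = quorum_write(all_up, W)          # 5 acks, quorum met
ok_two_down = quorum_write(two_down, W)      # 3 acks, quorum still met
ok_three_down = quorum_write(three_down, W)  # 2 acks, write is rejected
```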

This allows Riak to have write tolerance.
In our example, with N equal to 5 and with a W
value of 3, the cluster can tolerate N - W = 2
nodes being down for write operations, though we
would still have lost some data on those nodes for
reads.
Query Features
All key-value stores can query by the key, and
that’s about it.
If you have requirements to query by using some
attribute of the value column, it’s not possible to
use the database:
your application needs to read the value to figure
out if the attribute meets the conditions.
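To see what this means in practice, the sketch below filters on an attribute inside the value by reading and decoding every entry on the application side. The sample records, the JSON encoding, and the helper `find_by_attribute` are hypothetical.

```python
import json

# A dict stands in for the store; every value is an opaque JSON blob.
store = {
    "u1": json.dumps({"name": "Ada", "city": "London"}).encode("utf-8"),
    "u2": json.dumps({"name": "Linus", "city": "Helsinki"}).encode("utf-8"),
    "u3": json.dumps({"name": "Alan", "city": "London"}).encode("utf-8"),
}


def find_by_attribute(store, attribute, expected):
    """Return the keys whose decoded value has attribute == expected."""
    matches = []
    for key, blob in store.items():
        # The store cannot filter for us, so each blob is read and decoded.
        record = json.loads(blob.decode("utf-8"))
        if record.get(attribute) == expected:
            matches.append(key)
    return matches


london_keys = find_by_attribute(store, "city", "London")
```

The cost grows with the number of stored values, which is why attribute queries are a poor fit for a plain key-value store.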
Query by key also has an interesting side effect.
Some key-value databases get around this by
providing the ability to search inside the value, such
as Riak Search that allows you to query the data
just like you would query it using Lucene indexes.
While using key-value stores,
lots of thought has to be given to the design of the
key.
Can the key be generated using some algorithm?
Can the key be provided by the user (user ID,
email, etc.)? Or
derived from timestamps or
other data that can be derived outside of the
database?
These query characteristics make
key-value stores likely candidates for storing session
data (with the session ID as the key),
shopping cart data,
user profiles, and so on.
The expiry_secs property can be used to expire keys
after a certain time interval, especially for
session/shopping cart objects.
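How time-based expiry behaves can be illustrated with a small in-memory sketch. Only the name `expiry_secs` comes from the text; the class `ExpiringStore`, the lazy removal on read, and the injectable clock are assumptions and do not describe how Riak implements expiry.

```python
import time


class ExpiringStore:
    """Hypothetical store whose keys expire expiry_secs after being written."""

    def __init__(self, expiry_secs, clock=time.monotonic):
        self.expiry_secs = expiry_secs
        self._clock = clock  # injectable so the behavior can be tested
        self._data = {}      # key -> (stored_at, value)

    def put(self, key, value):
        self._data[key] = (self._clock(), value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.expiry_secs:
            # Expired entries are dropped lazily, on the next read.
            del self._data[key]
            return None
        return value


# A fake clock lets us move time forward without sleeping.
now = [0.0]
sessions = ExpiringStore(expiry_secs=1800, clock=lambda: now[0])
sessions.put("288790b8a421", b"session-data")

fresh = sessions.get("288790b8a421")  # read immediately after the write
now[0] = 1800.0                       # thirty minutes later
stale = sessions.get("288790b8a421")  # read after the expiry interval
```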
Structure of Data
Key-value databases don’t care what is stored in the
value part of the key-value pair.
The value can be a blob, text, JSON, XML, and so
on.
In Riak, we can use the Content-Type in the POST
request to specify the data type.
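As a rough sketch, the request below sets a Content-Type header on a POST that carries a JSON value. The host, port, bucket name `userProfiles`, and URL layout are assumptions about a local Riak HTTP endpoint; the request is only constructed here, never sent.

```python
import json
import urllib.request

# Hypothetical value to store; the store itself treats it as a blob.
body = json.dumps({"name": "Ada", "language": "en"}).encode("utf-8")

# Assumed endpoint: adjust host, port, and bucket for a real cluster.
request = urllib.request.Request(
    url="http://localhost:8098/buckets/userProfiles/keys",
    data=body,
    method="POST",
    # The header tells the store (and later readers) how to interpret the blob.
    headers={"Content-Type": "application/json"},
)

# To actually send it, a running server would be needed:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Changing the header to `text/xml` or `application/octet-stream` would label the same kind of request as carrying XML or raw bytes.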
Scaling
Many key-value stores scale by using sharding
(“Sharding,” p. 38).
With sharding, the value of the key determines on
which node the key is stored.
Let’s assume we are sharding by
the first character of the key; if the key is
f4b19d79587d, which starts with an f, it will be sent
to a different node than the key ad9c7a396542.
This kind of sharding setup can increase
performance as more nodes are added to the
cluster.
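Sharding by the first character of the key can be sketched as follows, assuming keys are hexadecimal strings as in the example. The node names and the modulo mapping in `shard_for` are made up for illustration; real stores use their own placement schemes.

```python
def shard_for(key, nodes):
    """Pick a node from the first character of a hexadecimal key."""
    # Assumes the key starts with a hex digit (0-9, a-f).
    index = int(key[0], 16) % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c", "node-d"]

node_for_f = shard_for("f4b19d79587d", nodes)  # 'f' is 15, and 15 % 4 == 3
node_for_a = shard_for("ad9c7a396542", nodes)  # 'a' is 10, and 10 % 4 == 2
```

Every key that starts with the same character lands on the same node, so that node is the only place those keys can be read or written.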
Sharding also introduces some problems.
If the node used to store f goes down,
the data stored on that node becomes
unavailable, nor can new data be written
with keys that start with f.
Data stores such as Riak allow you to control the
aspects of the CAP Theorem (“The CAP Theorem,”
p. 53):
N (number of nodes to store the key-value replicas),
R (number of nodes that have to have the data
being fetched before the read is considered
successful), and W (the number of nodes the write
has to be written to before it is considered
successful).
Let’s assume
we have a 5-node Riak cluster.
Setting N to 3 means that all data is replicated to at
least three nodes,
setting R to 2 means any two nodes must reply to a
GET request for it to be considered successful, and
setting W to 2 ensures that the PUT request is
written to two nodes before the write is considered
successful.
These settings allow us to fine-tune node failures
for read or write operations.
Based on our need, we can change these values for
better read availability or write availability.
Generally speaking, choose a W value to match
your consistency needs; these values can be set as
defaults during bucket creation.
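The trade-off between the N, R, and W settings can be worked through with a small helper. It applies the N - W reasoning from the quorum example to writes and, by the same reasoning, N - R to reads; the function `tolerated_failures` is a hypothetical name used only for this sketch.

```python
def tolerated_failures(n, r, w):
    """Return how many replica nodes may be down for reads and for writes."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        "read": n - r,   # replicas that may be down while R can still reply
        "write": n - w,  # replicas that may be down while W can still confirm
    }


# Settings from the example: N = 3, R = 2, W = 2.
example = tolerated_failures(n=3, r=2, w=2)

# Raising W to N favors consistency but leaves no tolerance for writes.
strict_writes = tolerated_failures(n=3, r=2, w=3)
```

Lowering R or W raises availability for that operation, while raising them trades availability for stronger agreement between replicas.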
