DB - Sharding
Now what on Earth is Database Sharding? 🧐
In simple terms, DB sharding means splitting our dataset into multiple smaller
parts, each stored in its own database
Let's suppose you started a bakery 🍰 in a busy city and it suddenly started
gaining popularity, so there is an excessive demand for your cakes
and, of course, you are getting paid very well too 💰
So you hire a few employees, and if you still can't handle all the customers you open
different branches of your bakery in different areas of the city. This is exactly what
database sharding is.
When a very large dataset resides in a single DB, we split it up across multiple databases
🪓 In other words, sharding is a method of splitting and storing a single logical
dataset in multiple databases.
Those distributed databases are collectively referred to as your main DB
🔁
Types Of Sharding 🔁
There are two types of sharding 👇
Vertical Sharding
Horizontal Sharding
The names echo horizontal and vertical scaling, but here they describe the
direction in which the data is split: by rows or by columns
Horizontal Sharding:
This one is pretty easy: the rows (records) of a table are distributed across
multiple DBs, and every DB keeps the same columns.
Vertical Sharding:
Vertical sharding involves splitting your data by columns rather than rows. This is
less commonly used but can be powerful in certain situations.
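To make the row-versus-column difference concrete, here is a minimal sketch in Python. The `users` table, the `id % shard_count` rule and both helper functions are made up for illustration; real shards would be separate databases, not lists in memory.

```python
# Toy "users" table: each dict is one row (all values are made up).
users = [
    {"id": 1, "name": "Asha", "bio": "Loves cake"},
    {"id": 2, "name": "Ben", "bio": "Bakes bread"},
    {"id": 3, "name": "Chen", "bio": "Pastry chef"},
    {"id": 4, "name": "Dina", "bio": "Cookie fan"},
]


def horizontal_split(rows, shard_count):
    """Split by row: every shard keeps all columns but only some rows."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        # Assumed placement rule for this sketch: id modulo shard count.
        shards[row["id"] % shard_count].append(row)
    return shards


def vertical_split(rows, key, column_groups):
    """Split by column: every shard keeps all rows but only some columns."""
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]


row_shards = horizontal_split(users, 2)
col_shards = vertical_split(users, "id", [("name",), ("bio",)])
```

Note that each vertical shard still carries the `id` column, so the pieces of one record can be joined back together later.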
Well, well, everything comes with a price, so let's discuss some of the pros and cons
of sharding ⚖️
Advantages ✅
⬇️
1. No Single Point of Failure: Sharding reduces the risk of a single point of failure. If
one part of the DB goes down, only the part of the application that depends on it is
affected, as the other DBs keep working
2. Faster query response: Sharded DBs perform faster lookups as each one has less
data to look into
3. Scalability: Sharding facilitates horizontal scaling 🔼
4. Reduces Load on DB: The main advantage of sharding is that it reduces stress
📶 on the DB, as the data is distributed and not stored in a single DB
Disadvantages 🚫
1. Very Complicated: It should be noted that DB sharding is an extremely
complicated architecture and should be a last resort ⌛
2. Emerging Hot-spots: If data isn't evenly distributed, the DB holding more data
receives a disproportionate share of the load and the system becomes unbalanced.
This is known as a database hotspot. In this case any benefit of sharding the database
is cancelled out
3. Un-Sharding: Once you've sharded your data it is almost impossible to revert
back 🔙 (in the case of a large DB), so, as already said, this should be your last choice
4. No Native Support: Most of the traditional DBs that are used in production do
not support sharding by themselves 💀 so we have to implement the
complex architecture ourselves
Okay, so this seems to be a deadly thing ⚰️
Here are a few things that we must try out before thinking about sharding
Possible Alternatives To Sharding 〽️
1. Database Indexing 📇
A simple and easy way to optimise the DB is to create an index on the columns your
queries filter by. This should be the first thing to try when optimising database queries
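As a rough illustration of what an index changes, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns and the index name are invented for the example, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

# In-memory database with a made-up "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, cake TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer, cake) VALUES (?, ?)",
    [(f"customer{i % 100}", "sponge") for i in range(1000)],
)


def query_plan(connection, sql):
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM orders WHERE customer = 'customer7'"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup)

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = query_plan(conn, lookup)

print(plan_before)
print(plan_after)
```

The first plan should mention a scan of `orders`, and the second should mention `idx_orders_customer`.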
2. External Caching
Most DBs have built-in internal caching, but you can implement an extra
layer of caching using something like Redis
These dedicated servers are used only for caching and sit in front of the
database. They store the most frequently accessed data and, if possible,
handle the response themselves. If they don't have the requested data stored, the
request is forwarded to the DB. 🧑🌾
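The read path described above can be sketched in a few lines of Python. This is only a toy: a plain dict stands in for a cache server such as Redis, and `FakeDatabase` and `CachedReader` are hypothetical names, not part of any library.

```python
class FakeDatabase:
    """Stand-in for the real database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class CachedReader:
    """Answer from the cache when possible, otherwise forward to the DB."""

    def __init__(self, database):
        self.database = database
        self.cache = {}  # assumption: a dict playing the role of the cache server

    def get(self, key):
        if key in self.cache:
            return self.cache[key]          # cache hit: the DB is not touched
        value = self.database.get(key)      # cache miss: forward to the DB
        if value is not None:
            self.cache[key] = value         # remember it for next time
        return value


db = FakeDatabase({"cake:1": "chocolate", "cake:2": "vanilla"})
reader = CachedReader(db)

first = reader.get("cake:1")   # miss, goes to the DB
second = reader.get("cake:1")  # hit, served from the cache
```

A real setup would also need an eviction and expiry policy so the cache neither grows without bound nor keeps serving outdated values; that is left out here.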
3. Read Replicas ©️
Read replicas are clones of the DB that receive only read requests. Any modification
of the data happens in the main DB and is then reflected in the replicated
DBs
But the trade-off here is stale data: what if some data is updated in the
main DB but hasn't yet been updated in the replicated DBs? A read may then fetch an
older version of the data 📂
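Here is a small sketch of how such a stale read can happen. The `Primary` and `Replica` classes and the explicit `replicate()` step are assumptions made for the demo; real databases ship changes to replicas on their own schedule.

```python
class Replica:
    """Read-only copy of the main DB."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Main DB: takes every write and ships it to replicas later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Copy all pending writes to every replica (the replication step)."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary([replica])

primary.write("price", 10)
primary.replicate()

primary.write("price", 12)       # updated on the main DB only
stale = replica.read("price")    # replica has not caught up yet
primary.replicate()
fresh = replica.read("price")    # now the replica sees the new value
```

Between the second write and the second `replicate()` call, the replica still answers with the old price.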
4. Vertical Scaling 🚦
Yes, yes, yes, I know this is a very lame thing to suggest, but still try it once if you
haven't 😆
Okay! Now assume that you have no other option left than to shard your DB.
Here are a few sharding strategies you should be aware of.
Suppose there is a new entry in our database. How do we decide which shard
of the DB it should go to? This is where the different sharding strategies come into play
Types of Sharding Strategies 🥢
1. Key Based Sharding:
Key based sharding decides the position of a new write using a particular value
taken from the new record.
Let us assume that there's a new user entry and we have used key based
sharding in our DB. A primary attribute of the record, such as the User ID in
this case, is passed through a hash function, and that function decides which
shard the new entry should go to.
We can also derive the key from the user's IP address if we want to shard users based on
their geographical location.
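A minimal sketch of the User ID example in Python follows. The shard names and the choice of SHA-256 with a modulo are assumptions for illustration; any stable hash function would do. Python's built-in `hash()` is avoided because for strings it changes between runs.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Hash the key and map the result onto one of the shards."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and wrap it onto the shard list.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


user_id = 42
target = shard_for(user_id)  # the same ID always lands on the same shard
```

One design consequence of this simple modulo rule: changing the number of shards changes the result for most keys, so adding a shard means moving a lot of existing data.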
2. Range Based Sharding:
This strategy groups records by ranges of a key such as the primary key. Continuing the
above example, we could store all the users whose ID lies between 1 and 100 in
DB1 and those from 101 to 200 in DB2.
Or we could group them by date of birth: those born in January go to
DB1 and those born in February go to DB2.
Range based sharding does not protect data from being unevenly distributed,
which leads to database hotspots ♨️
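The ID ranges from the example can be sketched like this in Python. The helper name and the choice to raise an error for IDs outside every range are assumptions; the short loop at the end shows how an uneven spread of IDs turns into a hotspot.

```python
import bisect
from collections import Counter

# Inclusive upper bound of each range, ascending: 1-100 -> DB1, 101-200 -> DB2.
UPPER_BOUNDS = [100, 200]
SHARDS = ["DB1", "DB2"]


def shard_for(user_id):
    """Find the first range whose upper bound is >= user_id."""
    index = bisect.bisect_left(UPPER_BOUNDS, user_id)
    if user_id < 1 or index == len(SHARDS):
        raise ValueError(f"no shard covers user ID {user_id}")
    return SHARDS[index]


# Suppose only 120 users have signed up so far (IDs 1 to 120).
load = Counter(shard_for(user_id) for user_id in range(1, 121))
print(load)  # DB1 holds far more users than DB2
```

With these made-up numbers DB1 ends up with 100 users and DB2 with 20, which is exactly the kind of imbalance the text warns about.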
3. Directory Based Sharding:
This is a dynamic sharding technique: a lookup table records which shard holds each
key. Directory based sharding is more flexible than range-based sharding and
key-based sharding. Range-based
sharding limits you to specifying ranges of values, while key-based sharding limits
you to using a fixed hash function ⛽
With directory based sharding, the lookup table must be consulted before
every read or write, which degrades application performance. The lookup table
can also become a single point of failure 🐉
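A toy version of the lookup table might look like the sketch below. `ShardDirectory`, the key format and the idea of a default shard for unknown keys are all assumptions for the example; in practice the table would live in its own highly available store.

```python
class ShardDirectory:
    """Lookup table that records which shard holds each key."""

    def __init__(self, default_shard):
        self.default_shard = default_shard  # assumption: fallback for new keys
        self.lookup = {}

    def assign(self, key, shard):
        """Place (or move) a key on a specific shard."""
        self.lookup[key] = shard

    def shard_for(self, key):
        """Consulted before every read or write."""
        return self.lookup.get(key, self.default_shard)


directory = ShardDirectory(default_shard="DB1")
directory.assign("user:42", "DB2")

before = directory.shard_for("user:42")   # "DB2"
directory.assign("user:42", "DB3")        # move one key without touching others
after = directory.shard_for("user:42")    # "DB3"
unknown = directory.shard_for("user:7")   # falls back to "DB1"
```

The flexibility comes from `assign`: any single key can be moved to any shard, with no range or hash function to respect. The cost is that every request has to go through `shard_for` first.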
There are many sharding strategies other than these, and some are even
combinations of these three, but these three are the basic building blocks of sharding.
You can explore further 🚀
Happy Sharding 🤓