Introduction to NoSQL :
NoSQL (Not Only SQL) refers to a family of non-relational databases used to manage unstructured
and semi-structured data. These are distributed database systems designed to run across clusters
of machines, providing mechanisms for data storage and retrieval with a focus on scalability,
high performance, availability, and agility.
It was developed in response to the need to store a large volume of user-related data. NoSQL
databases are designed to scale easily and to handle products and objects that need to be
frequently accessed, updated, and changed, keeping up with the needs of the modern
industry.
Limitations of Traditional Relational Databases:
Relational databases:
Are not designed to handle frequent changes or unstructured data.
Do not take advantage of cheap storage and processing power from commodity hardware.
Are less agile in handling big data and dynamic applications.
Key Features of NoSQL:
1. Not Only SQL – SQL and other query languages can be used.
2. Non-relational and schema-free – No fixed structure required.
3. No JOINs – Avoids complex join operations.
4. Distributed architecture – Runs on multiple processors/nodes.
5. Horizontally scalable – Add more machines instead of upgrading one.
6. Open-source options – Many available for free.
7. Easy data replication – For better performance and backup.
8. Simple API usage – Easy to implement.
9. Handles huge volumes of data – Efficient at big data processing.
10. Can be run on commodity hardware – Follows the shared-nothing concept.
Why NoSQL? :
A traditional database model is not suitable for all types of applications, especially those with:
Unstructured or unpredictable data
Need for easy scalability
Real-time processing
NoSQL fits these needs well because of its:
High performance
Flexible structure
Scalability
Capability to handle dynamic data
Although NoSQL systems may not provide full ACID (Atomicity, Consistency, Isolation, Durability)
properties, most of them follow the BASE model instead:
Basically Available
Soft State
Eventually Consistent
This is achieved through its distributed and fault-tolerant architecture.
CAP Theorem (Brewer’s Theorem) :
CAP Theorem says a distributed system cannot guarantee all three of the following at the
same time:
Consistency – All nodes show the same data at the same time.
Availability – Every request gets a response (success or failure).
Partition Tolerance – System continues working despite network failure.
NoSQL often compromises consistency in favor of availability and partition tolerance.
BASE Transactions (Opposite of ACID) :
BASE stands for:
Basically Available – System responds to every request, even if the data is not consistent.
Soft State – System state can change over time even without input (due to eventual
consistency).
Eventually Consistent – All changes will eventually reflect across all nodes, but not
immediately.
Characteristics of BASE:
Weak consistency (stale data is okay)
Focus on availability
Best effort system
Approximate answers are acceptable
Optimistic in design
Simpler and faster than ACID systems
BASE Case Scenarios:
If data is consistent and available with no partition, then data is replicated and available in
both servers (A and B).
If data is available and partitioned, then it's not consistent. Example: Server A has new
data, B has old.
If data is consistent and partitioned, then it may not be available (B is waiting for update
from A).
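The three scenarios above can be sketched with two in-memory replicas. This is a minimal illustration, not how any particular database implements replication; the `Replica` class, the `write` function, and the `prefer_availability` flag are hypothetical names introduced only for this example.

```python
class Replica:
    """One server holding a copy of a single value (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.value = "old"


def write(primary, secondary, value, partitioned, prefer_availability):
    """Write to `primary` and try to replicate to `secondary`.

    Returns True if the write was accepted, False if it was refused.
    """
    if not partitioned:
        # No partition: both copies are updated, so the data is
        # consistent and available on A and B.
        primary.value = value
        secondary.value = value
        return True
    if prefer_availability:
        # Partitioned, availability preferred: accept the write locally.
        # The other replica keeps stale data until the partition heals.
        primary.value = value
        return True
    # Partitioned, consistency preferred: refuse the write rather than
    # let the two replicas diverge.
    return False


a, b = Replica("A"), Replica("B")

write(a, b, "v1", partitioned=False, prefer_availability=True)
print(a.value, b.value)            # both replicas hold the same value

write(a, b, "v2", partitioned=True, prefer_availability=True)
print(a.value, b.value)            # A has new data, B has old data

accepted = write(a, b, "v3", partitioned=True, prefer_availability=False)
print(accepted, a.value, b.value)  # write refused, nothing changes
```

A BASE-style system behaves like the second call: it stays available during the partition and reconciles the replicas later.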
Examples of NoSQL Implementations :
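There are more than a hundred NoSQL databases on the market. Some popular systems in the NoSQL
and big data ecosystem include:
Google BigTable
Apache Hadoop (a distributed storage and processing framework rather than a database)
MapReduce (a parallel processing model, commonly used with Hadoop)
SimpleDB
MemcacheDB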
NoSQL Business drivers :
Today’s businesses need fast, scalable, and always-available data storage systems. Traditional
relational database systems (RDBMS), which typically scale up on a single server, often fail to
keep up with the increasing demands of data processing, speed, and variety of data. This is
where NoSQL databases come in.
Businesses today need to:
Handle large and variable amounts of data
Make quick decisions based on real-time data
Be flexible with changing data types and needs
NoSQL addresses these needs through four major business drivers:
1. Volume :
Organizations now generate huge volumes of data. An RDBMS running on a single server eventually
hits the performance limits of that machine. When dealing with large datasets, distributed
processing using clusters of commodity (low-cost) machines becomes necessary.
This has led to the development of distributed systems like:
Apache Hadoop
HDFS
MapR
HBase
These systems break large data into smaller chunks and process them in parallel.
2. Velocity :
Velocity refers to the speed at which data is generated and processed.
For example:
E-commerce websites handle thousands of reads and writes per second.
During sales or discounts, traffic spikes slow down RDBMS systems, because every write must
also update multiple indexes.
NoSQL systems handle these high-speed real-time operations efficiently and ensure low
response time, even during heavy traffic.
3. Variability :
Data often comes in different formats and structures. In RDBMS, changing the schema (table
design) for new data fields is difficult and can affect the entire system.
Example: If you want to store a special field for a few customers, you need to change the
entire table schema. This creates a sparse matrix (empty fields for others) and affects
performance.
NoSQL systems offer schema-less models, allowing storage of different kinds of data without
any rigid structure.
4. Agility :
Handling complex queries in RDBMS requires multiple nested queries and object-relational
mapping (ORM) layers built with frameworks like Hibernate for Java. This slows down development
and updates.
NoSQL simplifies this by:
Supporting easy data retrieval
Reducing the need for complex SQL queries
Adapting quickly to changes in business requirements
Key Business Features of NoSQL :
1. 24x7 Availability
No single point of failure
Data and functions are replicated across multiple nodes
Even if a node fails, others continue operations without data loss
Dynamic updates can be made without downtime
2. Location Transparency
Read/write data from any location without knowing the physical location of the node
Data is synchronized across regions
Ensures fast local access and global availability
3. Schema-less Data Model
Accepts structured, semi-structured, and unstructured data
Handles large volumes of data efficiently
Suitable for flexible and unpredictable data patterns
Delivers fast performance for both read and write operations
4. Modern Transaction Analysis
NoSQL does not require strict ACID transactions
Consistency is chosen according to CAP trade-offs: data can be immediately or eventually
consistent across nodes
Suitable for customer reviews, branding, strategy planning, etc., where JOINs and foreign
keys are unnecessary
5. Architecture for Big Data
NoSQL databases support modern architectures by offering:
Scalability
Data distribution
Continuous availability
Support for multi-data centers
Big data architecture includes:
Huge data source handling (terabytes to petabytes)
Real-time data streaming instead of batch processing
Storage using Hadoop, MongoDB, Cassandra, Neo4j, etc.
Support for various compute methods (MapReduce, streaming, batch)
6. Analytics and Business Intelligence
NoSQL enables real-time data mining and analytics
Helps in quick decision-making
Extracts valuable insights from high-volume, complex datasets
Provides integrated analytics that traditional RDBMS struggle to offer
NoSQL Data architectural patterns :
NoSQL databases are designed for flexibility, scalability, and high performance. Based on the
data structure they use, there are four main types of NoSQL data stores:
Types of NoSQL Data Stores:
Key-Value Store
Column Store
Document Store
Graph Store
1. Key-Value Store
A key-value store stores data as a pair of key and value, just like a dictionary.
The key is unique and is used to find the value.
The value can be in formats like String, JSON, or Binary (BLOB).
It is schema-less, meaning no fixed structure is required.
How it works:
Internally uses a hash table to store data.
Keys can be system-generated or custom.
Buckets group keys logically (not physically), so same key names can exist in different
buckets.
The real key is a combination of bucket + key.
Basic Operations (APIs):
Operation                   Description
Get(key)                    Retrieves value using the key
Put(key, value)             Stores or updates value with the key
Multi-Get(key1, key2...)    Retrieves multiple values
Delete(key)                 Deletes the value for the key
Rules:
1. Distinct Keys: All keys must be unique.
2. No Queries on Values: You cannot search within values
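The operations and rules above can be sketched with a plain Python dictionary acting as the hash table. This is a minimal in-memory illustration; the `KeyValueStore` class and its method names are assumptions made for this example and do not correspond to the API of any specific product.

```python
class KeyValueStore:
    """Toy key-value store: the real key is (bucket, key)."""

    def __init__(self):
        self._table = {}  # hash table keyed by (bucket, key)

    def put(self, bucket, key, value):
        # Stores a new value or overwrites the existing one as a whole.
        self._table[(bucket, key)] = value

    def get(self, bucket, key):
        # Lookup is by key only; the value is never inspected.
        return self._table.get((bucket, key))

    def multi_get(self, bucket, *keys):
        return [self.get(bucket, key) for key in keys]

    def delete(self, bucket, key):
        self._table.pop((bucket, key), None)


store = KeyValueStore()
# The same key name "u1" can exist in two different buckets.
store.put("sessions", "u1", "token-abc")
store.put("profiles", "u1", '{"name": "Priya"}')
store.put("sessions", "u2", "token-xyz")

print(store.get("profiles", "u1"))
print(store.multi_get("sessions", "u1", "u2", "u3"))  # missing keys give None

store.delete("sessions", "u2")
print(store.get("sessions", "u2"))
```

Note that there is no method for searching by value: finding every session with a given token would require scanning all entries, which is exactly the limitation stated in the rules.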
Weaknesses:
Weak consistency and no partial updates: part of a value cannot be updated, so the whole value must be rewritten.
No querying: Cannot search based on value.
As data grows, performance can become difficult to manage
Use Cases:
Caching
Session storage
Image stores
Dictionaries (word-definition pairs)
2. Column Store / Wide Column Store
Stores data in columns instead of rows. It is good for storing large and sparse datasets.
Key Concepts:
A row key and column name together identify the cell.
Data is grouped in Column Families, which are like categories of related columns
Each cell stores data with a timestamp for versioning.
Very fast for reading data from specific columns.
Structure Format:
<Row Key, Column Family, Column Name, Timestamp> : Value
How it differs from Key-Value:
Supports grouping of columns.
Allows fast reading of selected columns.
Used in analytical systems (OLAP).
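The cell addressing described above (row key, column family, column name, timestamp) can be sketched as a nested mapping. This is a simplified in-memory model; the `WideColumnTable` class and the sample row keys and columns are hypothetical, and real column stores add persistence, compaction, and distribution on top.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table with versioned cells."""

    def __init__(self):
        # (row_key, column_family, column_name) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row_key, family, column, value, timestamp):
        self._cells[(row_key, family, column)][timestamp] = value

    def get(self, row_key, family, column, timestamp=None):
        versions = self._cells.get((row_key, family, column))
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp)

    def read_column(self, family, column):
        """Latest value of one column for every row that has it."""
        return {
            row: versions[max(versions)]
            for (row, fam, col), versions in self._cells.items()
            if fam == family and col == column
        }


table = WideColumnTable()
table.put("user1", "profile", "city", "Pune", timestamp=1)
table.put("user1", "profile", "city", "Mumbai", timestamp=2)
table.put("user2", "profile", "city", "Delhi", timestamp=1)
# Sparse data: only user1 has this column, and no empty cell is stored for user2.
table.put("user1", "activity", "last_login", "2024-01-01", timestamp=1)

print(table.get("user1", "profile", "city"))               # newest version
print(table.get("user1", "profile", "city", timestamp=1))  # older version
print(table.read_column("profile", "city"))
```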
Cassandra Data Model Highlights:
Keyspace: Like a database for one application.
Column Family: Stores data related to a specific topic.
Row Key: Unique identifier for each row.
Columns can be added dynamically.
Use Cases:
Analytics
Time-series data
IoT (Internet of Things) systems
Social media posts
3. Document Store
A document store is like a smart key-value store, where the value is a document (usually in
JSON or XML format).
Features:
Documents are semi-structured and self-describing.
Each document has a unique key (ID).
Properties inside the document can be indexed for fast search (some stores index every property automatically; MongoDB indexes only _id by default).
Can store nested data (tree structure) directly.
How it works:
1. You can search by any field inside the document.
2. Uses Document Path to access specific nested values.
Example Path: Employee[id='2300']/Address/street/BuildingName
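The document path idea can be sketched with nested Python dictionaries. The sample employees and the two helper functions below are hypothetical and only mimic the path shown above; real document stores provide their own path or query syntax.

```python
employees = [
    {
        "id": "2300",
        "name": "Asha",
        "Address": {"street": {"BuildingName": "Lotus Towers"}, "city": "Pune"},
    },
    {"id": "2301", "name": "Ravi", "Address": {"city": "Delhi"}},
]


def find_by_field(documents, field, value):
    """Return the first document whose `field` equals `value`, else None."""
    return next((doc for doc in documents if doc.get(field) == value), None)


def follow_path(document, path):
    """Walk a slash-separated path through nested dictionaries."""
    node = document
    for step in path.split("/"):
        if not isinstance(node, dict) or step not in node:
            return None  # the path does not exist in this document
        node = node[step]
    return node


employee = find_by_field(employees, "id", "2300")
print(follow_path(employee, "Address/street/BuildingName"))
print(follow_path(employees[1], "Address/street/BuildingName"))  # missing path
```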
Advantages Over Key-Value Store:
Allows searching inside documents.
Supports complex data and hierarchies.
Supports queries on values.
Use Cases:
Content management systems
User profiles
Ad services (for example, MongoDB-backed systems that serve real-time ads to millions of users)
Real-time analytics
4. Graph Store
A graph store uses nodes and relationships to represent and store data.
It is based on graph theory.
Structure:
Nodes: Entities (e.g., person, product)
Relationships: Connections between nodes (e.g., follows, friend)
Properties: Data stored inside nodes or relationships (key-value pairs)
Key Benefits:
Great for storing and exploring complex relationships.
No need for complex joins like in RDBMS.
Fast traversal between connected nodes.
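Nodes, relationships, and properties can be sketched with adjacency lists, and traversal with a breadth-first search. The `Graph` class, the relationship type names, and the sample people are assumptions for illustration; graph databases such as Neo4j expose this through their own query languages.

```python
from collections import deque


class Graph:
    """Toy property graph stored as adjacency lists."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> [(relationship type, target id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_relationship(self, source, rel_type, target, **properties):
        self.edges[source].append((rel_type, target, properties))

    def reachable(self, start, rel_type, max_hops):
        """Breadth-first traversal over one relationship type."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for kind, target, _properties in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, hops + 1))
        return found


g = Graph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, kind="person")
g.add_relationship("alice", "FOLLOWS", "bob", since=2021)
g.add_relationship("bob", "FOLLOWS", "carol", since=2022)
g.add_relationship("carol", "FOLLOWS", "dave", since=2023)
g.add_relationship("alice", "LIKES", "dave")

# People within two FOLLOWS hops of alice.
print(g.reachable("alice", "FOLLOWS", max_hops=2))
```

Each hop follows a stored reference directly, which is why no join over a relationship table is needed.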
Use Cases:
Social networks (Facebook, LinkedIn)
Recommendation systems
Fraud detection
Media-sharing platforms (YouTube, Flickr)
Variations of NoSQL architectural patterns :
A NoSQL architectural pattern refers to how a NoSQL database system is structured or
designed to store, manage, and retrieve data efficiently — especially for big data, distributed
systems, and real-time applications.
NoSQL databases are schema-less, distributed, and horizontally scalable. But based on system
needs, the architectural design can vary. Let’s explore those variations:
Major NoSQL Data Models (Core Patterns):
1. Key-Value Store
Stores data as a pair: Key → Value
Example: Redis, Riak
Variation:
Can be distributed across multiple servers for scalability.
Federated architecture allows multiple independent key-value databases to work
together.
2. Document Store
Stores semi-structured data like JSON, XML.
Example: MongoDB, CouchDB
Variation:
Can be used in IoT systems, where sensors push data into JSON-like documents.
Data can be temporarily stored or permanently archived.
3. Column Family Store
Stores data in columns instead of rows.
Example: Apache Cassandra, HBase
Variation:
Hash table + content-addressable network to improve distribution and data
lookup.
Scalable distributed architecture using shared-nothing design and load balancers.
4. Graph Store
Stores entities as nodes and relationships as edges.
Example: Neo4j, Amazon Neptune
Variation:
Often used in social networks or enterprise collaboration platforms.
Architectural Variations Based on Implementation Style:
1. Distributed Architecture
Data is split and stored on multiple servers at different locations.
Benefits: High availability, fault tolerance, scalability.
Used in:
Global-scale apps (Netflix, Facebook)
Content delivery platforms
2. Federated Architecture
Manages independent and heterogeneous databases across various sites.
Each database is autonomous but can work together as one logical system.
Used in:
Healthcare systems
Academic research platforms
IoT-Centric NoSQL Architecture
With the rise of Internet of Things (IoT):
Data from multiple sensors needs to be processed as a single stream.
Middleware (software between database and app) helps:
Integrate streams
Temporarily store or archive data
Enable real-time querying
Example:
Using a document store to store sensor readings as JSON
Using Pub/Sub model (EventJava) for live updates
Scalable and Flexible NoSQL Patterns:
System Requirement-Based Variations:
Using NoSQL to manage Big data:
Why NoSQL for Big Data?
NoSQL databases are built to manage:
Massive data volumes
High-speed read/write operations
Flexible and changing data structures
Real-time processing needs
They achieve this by:
Distributing data using hash rings
Replicating data for faster reads
Spreading queries across multiple DataNodes
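Distribution with a hash ring can be sketched as consistent hashing: nodes and keys are hashed onto the same circular space, and each key belongs to the first node point found clockwise from it. The `HashRing` class, the node names, and the choice of 100 virtual points per node are assumptions for this example.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [position for position, _node in self._ring]

    def node_for(self, key):
        # First point clockwise from the key, wrapping around at the end.
        index = bisect.bisect(self._positions, ring_hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

# Remove node-c and see which keys change owner.
smaller_ring = HashRing(["node-a", "node-b"])
moved = [key for key in keys if smaller_ring.node_for(key) != before[key]]
print(len(moved), "of", len(keys), "keys moved")
print(all(before[key] == "node-c" for key in moved))
```

The design point is that removing a node only relocates the keys that node owned; keys on the surviving nodes stay where they are.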
Use Cases of NoSQL in Big Data
1. Recommendation Systems
Used in:
Movies, music, books
E-commerce product suggestions
Learning platforms and travel sites
NoSQL helps create personalized user experiences by managing large and fast-changing user
data in real-time. It supports flexible data models, making it ideal for building and updating
user profiles on the fly.
2. User Profile Management
NoSQL easily handles:
User preferences
Authentication
Online transactions
As user numbers grow, so does the complexity of data. NoSQL supports flexible schema and
quick read/write operations, making it perfect for managing evolving user profiles.
3. Real-Time Data Handling
For businesses needing instant insights from live transactions:
Hadoop is good for batch analytics but not real-time
NoSQL is built for real-time operational data
A combination of NoSQL and Hadoop helps manage both real-time and analytical needs
4. Content Management
Includes managing:
Images, audio, video
Reviews, ratings, and comments
Relational databases struggle with unstructured data. NoSQL supports semi-structured and
unstructured data using a flexible schema, making it ideal for content-heavy applications.
5. Catalog Management
Large companies handle product/service catalogs with many categories and updates. NoSQL:
Allows easy data aggregation
Supports multiple applications accessing the same database
Simplifies management with its schema-less model
6. 360-Degree Customer View
Enterprises often need to combine:
Structured data (from apps)
Unstructured data (from social media)
NoSQL helps by:
Storing diverse data types
Supporting APIs for integration
Allowing attribute updates without affecting other apps
This ensures a unified customer experience and better service delivery.
7. Internet of Things (IoT)
IoT generates large amounts of fast, varied data. NoSQL:
Handles volume, variety, and velocity
Supports real-time global access
Enables agile scaling for millions of connected devices
8. Fraud Detection
Financial services must detect fraud in milliseconds when processing transactions. NoSQL:
Supports low-latency data access
Integrates in-memory cache for real-time checks
Enables fast decision-making for fraud prevention
Understanding Types of Big Data Problems
1. Read-Heavy vs Read-Write Data:
Read-Mostly Data: Images, event logs, sensor data – rarely changes.
Read-Write Data: Transactions that require ACID properties (Atomicity, Consistency,
Isolation, Durability) and high availability.
2. Log File Management
Logs record all events like clicks, errors, or transactions. Earlier, storing these was expensive.
Now, NoSQL allows:
Low-cost storage
Easy access and analysis
Analyzing Big Data with Shared-Nothing Architecture
Architectural Models:
Shared RAM: CPUs share memory – good for graph stores.
Shared Disk: CPUs have their own memory but share disk – used in some enterprise
systems.
Shared-Nothing: No shared resources – best for scalability using commodity hardware.
Cache Friendliness:
Key-value and document stores are cache-friendly.
Graph and row stores are not – they can’t be easily referenced using a short key.
Choosing Data Distribution Models
NoSQL makes data distribution easier by focusing on aggregates. Two key techniques:
1. Sharding
Splits the database horizontally (row-wise)
Each partition is called a shard
Used to improve scalability and performance
2. Replication
Duplicates data across different servers
Increases data availability and fault tolerance
Some systems like Riak use both sharding and replication for optimal performance.
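Sharding and replication can be combined as in the following sketch: a hash of the key selects the shard, and every shard keeps several copies of its data. The `ShardedStore` class, the `failed_replicas` parameter, and the sample order are hypothetical; this is not the mechanism of Riak or any other specific system.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store: rows are split across shards, each shard has replicas."""

    def __init__(self, shard_count, replicas_per_shard):
        # shards[s][r] is replica r of shard s
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(shard_count)
        ]

    def put(self, key, row):
        # Replication: write the row to every copy of its shard.
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = row

    def get(self, key, failed_replicas=()):
        shard_index = shard_for(key, len(self.shards))
        for replica_index, replica in enumerate(self.shards[shard_index]):
            if (shard_index, replica_index) in failed_replicas:
                continue  # skip copies that are marked as down
            return replica.get(key)
        raise RuntimeError("all replicas of the shard are down")


store = ShardedStore(shard_count=3, replicas_per_shard=2)
order = {"item": "laptop", "quantity": 1}
store.put("order-1", order)

home_shard = shard_for("order-1", 3)
# The read still succeeds when the first copy of the shard has failed.
print(store.get("order-1", failed_replicas={(home_shard, 0)}))
```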
Introduction to MongoDB:
MongoDB is a NoSQL database designed to handle large amounts of unstructured or semi-structured
data. Unlike traditional databases that store data in rows and columns (tables), MongoDB stores
data as documents with JSON-like structures, making it highly flexible and scalable. In simple
words: MongoDB stores information in a format similar to JSON, so you can store complex data
types easily and quickly without worrying about strict table structures.
MongoDB is
1. Cross-platform
2. Open source
3. Non-relational
4. Distributed
5. NoSQL
6. Document-oriented data store
Terms used:
1. Database
A database in MongoDB is a container that holds multiple collections.
Each database gets its own set of files and is stored separately on the server.
Equivalent to a database in RDBMS.
One MongoDB server can host multiple databases.
Example: A database named collegeDB may contain collections like students, teachers, results.
2. Collection
A collection is like a table in a relational database, but more flexible.
It holds multiple documents (records).
Documents in the same collection can have different structures (unlike SQL tables which
have fixed columns).
Example: A students collection may have:
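The original example is not shown here, so the following is only an illustrative sketch of what such a collection could contain. The documents are written as Python dictionaries, and every name and field is hypothetical.

```python
# Three documents in the same collection, each with a different set of fields.
students = [
    {"_id": 1, "name": "Priya", "age": 21, "branch": "IT"},
    {"_id": 2, "name": "Rahul", "email": "rahul@example.com"},
    {"_id": 3, "name": "Sneha", "age": 19, "hobbies": ["chess", "music"]},
]

for student in students:
    print(student["name"], "->", sorted(student))

# Fields that every document shares, versus fields used by at least one.
common_fields = set.intersection(*(set(student) for student in students))
all_fields = set.union(*(set(student) for student in students))
print(sorted(common_fields))
print(sorted(all_fields))
```

In a relational table, every row would need all six columns, with empty values wherever a student has no data for a column.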
3. Document
A document is the basic unit of data in MongoDB.
It’s a JSON-like structure, stored in a binary form called BSON (Binary JSON).
Can include various data types: strings, numbers, arrays, even other documents.
Each document has a unique _id field (like a primary key).
Example Document:
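The original example document is not shown here. A hypothetical document with the features listed above (a unique _id, strings, numbers, an array, and an embedded document) might look like this when written as a Python dictionary and printed as JSON:

```python
import json

# All field names and values are made up for illustration.
student = {
    "_id": 101,                                       # unique identifier
    "name": "Priya",                                  # string
    "age": 21,                                        # number
    "subjects": ["DBMS", "Big Data"],                 # array
    "address": {"city": "Pune", "pin": "411001"},     # embedded document
}

print(json.dumps(student, indent=2))
```

When no _id is supplied, MongoDB generates an ObjectId for it; a plain number is used here only to keep the sketch free of external libraries.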
4. Support for Dynamic Queries
MongoDB provides support for dynamic queries using a rich query language based on JSON.
You can query using field values, ranges, conditions, pattern matching, etc.
No need for strict joins or SQL syntax.
Example:
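In the MongoDB shell, a query of this kind is typically written as db.students.find({ age: { $gt: 20 } }). The sketch below evaluates the same filter document against in-memory dictionaries to show what the filter means. The `matches` function is a hypothetical helper that supports only equality, $gt, and $lt; it is not part of MongoDB or its drivers.

```python
def matches(document, query):
    """Check a document against a tiny subset of MongoDB's filter syntax."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator not in ("$gt", "$lt"):
                    raise ValueError(f"unsupported operator: {operator}")
                if value is None:
                    return False  # a missing field never satisfies a range
                if operator == "$gt" and not value > operand:
                    return False
                if operator == "$lt" and not value < operand:
                    return False
        elif value != condition:
            return False
    return True


students = [
    {"name": "Priya", "age": 21},
    {"name": "Rahul", "age": 19},
    {"name": "Sneha", "age": 23},
    {"name": "Amit"},  # no age field at all
]

query = {"age": {"$gt": 20}}
older = [student["name"] for student in students if matches(student, query)]
print(older)
```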
This returns all students whose age is greater than 20. This feature is extremely useful in big
data environments, where the structure of data might change frequently.
5. Storing Binary Data
MongoDB allows storing binary data (files, images, videos) using a feature called GridFS.
Useful when you need to store files larger than 16 MB, the maximum size of a single BSON document.
MongoDB splits the files into smaller chunks and stores them across multiple documents.
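The chunking idea can be sketched as follows. GridFS uses a default chunk size of 255 KB and stores each chunk as a document carrying the file's id and a sequence number; the function names and the sample file id below are assumptions for illustration, and in practice the MongoDB driver performs this work.

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split raw bytes into chunk documents with a sequence number `n`."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by joining chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


video = bytes(600 * 1024)  # a 600 KB file filled with zero bytes
chunks = split_into_chunks("video-001", video)

print(len(chunks))                                # number of chunk documents
print([len(chunk["data"]) for chunk in chunks])   # size of each chunk
print(reassemble(chunks) == video)
```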
Used in:
Media storage systems
Backup systems
Document storage applications
6. Replication
Replication is the process of duplicating data across multiple servers.
MongoDB uses Replica Sets to handle replication.
A Replica Set has:
One primary node (accepts all writes)
One or more secondary nodes (maintain copies)
If the primary fails, a secondary is promoted to primary → Ensures high availability.
Helps in:
Data recovery
System fault-tolerance
Load balancing of read operations
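Failover in a replica set can be sketched with a small simulation. This is a deliberately simplified model: the `ReplicaSet` class is hypothetical, writes are copied to secondaries immediately, and the new primary is picked by a simple rule, whereas MongoDB replicates asynchronously and elects a primary by a majority vote among members.

```python
class ReplicaSet:
    """Toy replica set: one primary, the rest are secondaries."""

    def __init__(self, member_names):
        self.members = {name: [] for name in member_names}  # name -> writes
        self.alive = set(member_names)
        self.primary = member_names[0]

    def write(self, entry):
        # All writes go to the primary and are copied to live secondaries.
        self.members[self.primary].append(entry)
        for name in self.alive - {self.primary}:
            self.members[name].append(entry)

    def fail(self, name):
        self.alive.discard(name)
        if name == self.primary:
            self.elect()

    def elect(self):
        if not self.alive:
            raise RuntimeError("no members left to become primary")
        # Simplified rule: most up-to-date live member, ties broken by name.
        self.primary = max(sorted(self.alive), key=lambda n: len(self.members[n]))


rs = ReplicaSet(["node1", "node2", "node3"])
rs.write("insert student 1")

rs.fail("node1")            # the primary goes down
print(rs.primary)           # a secondary has been promoted

rs.write("insert student 2")
print(rs.members["node3"])  # secondaries keep receiving writes
print(rs.members["node1"])  # the failed node has missed the second write
```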
7. Sharding
Sharding is MongoDB’s method of horizontal scaling.
It splits large datasets across multiple machines (shards).
Each shard holds a portion of the data.
A query router handles requests and forwards them to the correct shard.
Sharding is essential in big data applications to:
Handle huge volumes of data
Distribute load
Maintain performance
Example: E-commerce site using sharding to manage millions of user transactions and product
records.
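The role of the query router can be sketched with range-based sharding: each shard owns a range of shard-key values, and the router forwards a request only to the shards whose ranges are involved. The `QueryRouter` class, the boundaries, and the shard names are hypothetical; in MongoDB this role is played by the mongos process.

```python
import bisect


class QueryRouter:
    """Toy router for range-based sharding on a numeric shard key."""

    def __init__(self, boundaries, shard_names):
        # boundaries [1000, 2000] define three ranges:
        # below 1000, 1000 to 1999, and 2000 and above.
        if len(shard_names) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self.boundaries = boundaries
        self.shard_names = shard_names

    def shard_for(self, shard_key_value):
        """Shard that owns a single shard-key value."""
        index = bisect.bisect_right(self.boundaries, shard_key_value)
        return self.shard_names[index]

    def shards_for_range(self, low, high):
        """Shards that must be contacted for keys in [low, high]."""
        first = bisect.bisect_right(self.boundaries, low)
        last = bisect.bisect_right(self.boundaries, high)
        return self.shard_names[first:last + 1]


router = QueryRouter([1000, 2000], ["shard-a", "shard-b", "shard-c"])

print(router.shard_for(999))                # below the first boundary
print(router.shard_for(1000))               # a boundary value starts the next range
print(router.shards_for_range(500, 1500))   # a range query touches two shards
```

A query on a single shard-key value reaches one shard only, which is how sharding keeps per-machine load bounded as the data grows.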
8. Updating Information in Place
MongoDB can update documents directly in the database.
You don’t have to retrieve the entire document, modify it, and then reinsert it.
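In the MongoDB shell, an update of this kind is typically written as db.students.updateOne({ name: "Priya" }, { $set: { age: 22 } }). The sketch below mimics the effect of $set on an in-memory list of dictionaries. The `update_one` function, the sample documents, and the new age value are hypothetical and stand in for the real driver call.

```python
def update_one(collection, query, update):
    """Apply a $set update to the first document matching `query`."""
    for document in collection:
        if all(document.get(field) == value for field, value in query.items()):
            # Only the fields named under $set are touched.
            document.update(update.get("$set", {}))
            return 1  # number of documents modified
    return 0


students = [
    {"_id": 1, "name": "Priya", "age": 21, "branch": "IT"},
    {"_id": 2, "name": "Rahul", "age": 19, "branch": "CS"},
]

modified = update_one(students, {"name": "Priya"}, {"$set": {"age": 22}})
print(modified)
print(students[0])
```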
This updates only the age field in Priya’s document—in place, without touching other fields.
Useful in real-time apps where data changes frequently (e.g., live dashboards, IoT feeds).
Datatypes in MongoDB:
1. Double:
Used to store floating point (decimal) numbers.
Example: {"price": 99.99}
2. Integer (Int32 and Int64):
Used to store whole numbers.
The type is chosen by the shell or driver (or set explicitly with NumberInt() and NumberLong() in the shell):
Int32: For 32-bit integers.
Int64: For 64-bit integers (useful in large number operations).
Example: {"quantity": 100}
3. String:
Used to store textual data.
It must be UTF-8 encoded.
Most common data type used.
Example: {"name": "Laptop"}
4. Boolean:
Stores true/false values.
Useful for flags like status, enabled, etc.
Example: {"inStock": true}
5. Date:
Stores current date or any date/time value.
Stored as number of milliseconds since Unix epoch (Jan 1, 1970).
Supports date queries.
Example: {"createdAt": new Date()}
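The millisecond representation can be illustrated with a short conversion sketch. The helper names `to_millis` and `from_millis` are assumptions for this example; MongoDB drivers perform this conversion automatically.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(moment):
    """Milliseconds between the Unix epoch and a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_millis(millis):
    """Timezone-aware UTC datetime for a millisecond count."""
    return EPOCH + timedelta(milliseconds=millis)


created_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
millis = to_millis(created_at)
print(millis)               # 1704067200000
print(from_millis(millis))  # back to the original date
```

Because dates are stored as plain numbers, range queries on dates reduce to numeric comparisons.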
6. ObjectId:
A 12-byte unique identifier for every document in a collection.
Automatically generated by MongoDB.
Example: {"_id": ObjectId("507f1f77bcf86cd799439011")}
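The 12 bytes of an ObjectId are laid out as a 4-byte creation timestamp in seconds, a 5-byte random value, and a 3-byte counter. The sketch below decodes the example id above using only the standard library; `parse_object_id` is a hypothetical helper, not a driver function.

```python
from datetime import datetime, timezone


def parse_object_id(hex_string):
    """Split a 24-character hex ObjectId into its three parts."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be exactly 12 bytes")
    seconds = int.from_bytes(raw[:4], "big")
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),
        "counter": int.from_bytes(raw[9:], "big"),
    }


parts = parse_object_id("507f1f77bcf86cd799439011")
print(parts["created"].isoformat())  # 2012-10-17T21:13:27+00:00
print(parts["random"])
print(parts["counter"])
```

Since the timestamp comes first, ObjectIds generated later sort after earlier ones, at a resolution of one second.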
7. Document (Embedded Document)
Stores nested objects (documents inside documents).
Allows complex structures.
Example: {"address": {"city": "Pune", "pin": "411001"}}
8. Array:
Stores multiple values in a single key.
Can hold multiple types (strings, numbers, documents).
Example: {"hobbies": ["reading", "coding", "music"]}
9. Binary Data:
Used to store binary files like images, PDFs, audio files.
Stored using the BSON binary (BinData) type.
Example: GridFS uses this to store large files.
10. Null:
Stores null value (i.e., no value exists).
Example: {"middleName": null}
11. Code
Used to store JavaScript code in documents.
Useful for running code logic directly in MongoDB (though used rarely).
Example:
12. Regular Expression
Used to store regex patterns for string matching.
Example: {"username": /john/i}
MongoDB Query language: {BOOK}