
How MongoDB Stores Data Internally

MongoDB stores data using a document-oriented model with a hierarchy of databases, collections, and documents, utilizing BSON for efficient storage. The default storage engine, WiredTiger, employs a B-Tree structure for data organization and includes features like journaling for durability and various compression techniques. Understanding these internals aids in schema design, performance tuning, and troubleshooting storage issues.

Uploaded by biharmesudhar
Copyright © All Rights Reserved


@ sanuj bansal
Introduction
When we use MongoDB, we usually think of simple JSON documents.

But have you ever wondered how those documents are stored?

Understanding the internals helps with:


✅Schema design
✅Performance tuning
✅Index optimization
✅Troubleshooting storage issues

MongoDB’s High-Level Data Structure
MongoDB uses a document-oriented model with the following hierarchy:
Database: Top-level container
Collection: Groups of documents (like SQL tables)
Document: JSON-like data (stored as BSON)

MongoDB is schema-flexible: documents in the same collection can have different fields.
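The hierarchy and the schema flexibility can be sketched with plain Python dictionaries. This is a toy model, not how the server is built: `ToyServer` and its methods are hypothetical names. The only real MongoDB convention it mirrors is that a collection is identified by its namespace, `<database>.<collection>`.

```python
from collections import defaultdict

class ToyServer:
    """Hypothetical in-memory model of the database -> collection -> document hierarchy."""

    def __init__(self):
        # database name -> collection name -> list of documents
        self.databases = defaultdict(lambda: defaultdict(list))

    def insert(self, db: str, coll: str, doc: dict) -> None:
        self.databases[db][coll].append(doc)

    @staticmethod
    def namespace(db: str, coll: str) -> str:
        # MongoDB identifies a collection by its namespace "<database>.<collection>"
        return f"{db}.{coll}"


server = ToyServer()
# Schema flexibility: two documents in one collection with different fields
server.insert("myDatabase", "users", {"name": "Ada", "email": "ada@example.com"})
server.insert("myDatabase", "users", {"name": "Linus", "languages": ["C", "Rust"]})

users = server.databases["myDatabase"]["users"]
print(ToyServer.namespace("myDatabase", "users"), len(users))  # myDatabase.users 2
```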

What is BSON?
MongoDB doesn’t actually store documents as raw JSON.
Instead, it uses BSON (Binary JSON).

Why BSON?
Faster to encode/decode
Supports rich types like Date, ObjectId, Decimal128
Optimized for machine parsing, though not always smaller than JSON: length prefixes and type tags add bytes

[Figure: example comparison of a JSON document and its BSON encoding]
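The BSON layout becomes concrete when you encode a tiny document by hand. This is a minimal sketch covering only four element types from the BSON specification: double `0x01`, string `0x02`, boolean `0x08` and int32 `0x10`. `encode_bson` is a hypothetical helper, and real applications should use their driver's BSON library. The output reproduces the `{"hello": "world"}` example from bsonspec.org.

```python
import struct

def _cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator
    return s.encode("utf-8") + b"\x00"

def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str/int32/float/bool values (a small subset of BSON)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            body += b"\x08" + _cstring(key) + (b"\x01" if value else b"\x00")
        elif isinstance(value, int):
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)  # little-endian int32
        elif isinstance(value, float):
            body += b"\x01" + _cstring(key) + struct.pack("<d", value)  # 64-bit double
        elif isinstance(value, str):
            data = _cstring(value)
            # string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # document: int32 total length (counting itself and the final NUL), elements, NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
print(encoded)  # b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The 4-byte length prefixes let a parser skip a whole field or document without reading it. They also cost space: the compact JSON text `{"hello":"world"}` is 17 bytes, while its BSON encoding is 22.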

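Collections and Namespace Files
In the legacy MMAPv1 storage engine (removed in MongoDB 4.2), each database had a namespace file holding collection and index metadata. The documents themselves were kept in separate numbered data files.

For example:
the myDatabase.users collection was listed in myDatabase.ns, and its documents lived in data files such as myDatabase.0, myDatabase.1, and so on.

With WiredTiger, each collection gets its own data file. MongoDB's catalog (_mdb_catalog.wt) maps every namespace (database.collection) to the file that stores it.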
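The indirection between a namespace and its data file can be sketched as a small catalog. `ToyCatalog` and the `collection-<n>.wt` names it hands out are hypothetical, and real WiredTiger file names carry additional numeric parts. The point it illustrates is that a file name need not match its collection's name. As a result, renaming a collection in this toy only rewrites a catalog entry.

```python
import itertools

class ToyCatalog:
    """Hypothetical catalog mapping namespaces ("db.collection") to data-file names."""

    def __init__(self):
        self._counter = itertools.count()
        self._entries = {}  # namespace -> file name

    def create_collection(self, namespace: str) -> str:
        # Hypothetical naming scheme; real WiredTiger file names look different
        file_name = f"collection-{next(self._counter)}.wt"
        self._entries[namespace] = file_name
        return file_name

    def rename(self, old: str, new: str) -> None:
        # Only the catalog entry moves; the data file keeps its name
        self._entries[new] = self._entries.pop(old)

    def file_for(self, namespace: str) -> str:
        return self._entries[namespace]


catalog = ToyCatalog()
catalog.create_collection("myDatabase.users")
catalog.rename("myDatabase.users", "myDatabase.customers")
print(catalog.file_for("myDatabase.customers"))  # collection-0.wt
```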

WiredTiger – The Default Storage Engine
Since MongoDB 3.2, WiredTiger has been the default storage engine.

Main Functions of WiredTiger:


Write-Ahead Logging (WAL)
Compression
Caching and memory management
Document-level concurrency control (writes to different documents in a collection don't block each other)

Each collection is backed by its own files:
one .wt file for the collection's data
one .wt file for each of its indexes

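How WiredTiger Stores Documents
WiredTiger organizes each collection and each index as a B-Tree.
Keys within every node are kept in sorted order.
Internal nodes: Keys plus pointers to child pages
Leaf nodes: The actual key-value pairs (for a collection, an internal record ID mapped to the BSON document)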

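Data is split into:
Pages (the unit WiredTiger reads and writes; each B-Tree node is a page)
Blocks (a page as written to disk, possibly compressed)
Extents (contiguous ranges of file space that the block manager tracks as allocated or free)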
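A lookup in a B-Tree descends from the root through internal nodes to a leaf and then binary-searches that leaf. The sketch below is a toy over a hand-built two-level tree. `Leaf`, `Internal` and `search` are hypothetical names. It has no insertion or page splits, and WiredTiger's real page format, in-memory update chains and eviction are far more involved.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

@dataclass
class Leaf:
    keys: list    # sorted keys (for a collection: record IDs)
    values: list  # values[i] belongs to keys[i] (for a collection: the BSON document)

@dataclass
class Internal:
    keys: list      # sorted separator keys
    children: list  # len(children) == len(keys) + 1

def search(node, key):
    """Descend from the root to a leaf, then binary-search the leaf."""
    while isinstance(node, Internal):
        # children[i] covers keys k with keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None

# Hand-built two-level tree: one internal root over three leaves
root = Internal(
    keys=[3, 5],
    children=[
        Leaf([1, 2], ["doc1", "doc2"]),
        Leaf([3, 4], ["doc3", "doc4"]),
        Leaf([5, 6], ["doc5", "doc6"]),
    ],
)
print(search(root, 4))  # doc4
print(search(root, 7))  # None
```

Because keys are sorted inside every node, each level needs only a binary search. The number of pages touched grows with the tree's height, not with the number of documents.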

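Journaling – Ensuring Durability
Before a write reaches the .wt data files, it is first recorded in the journal.

Why Journaling?
Provides durability (the D in ACID)
Allows recovery after a crash: MongoDB restores the last checkpoint and replays the journal
Implements write-ahead logging (WAL)

By default, the journal is flushed to disk every 100 ms (configurable via storage.journal.commitIntervalMs). A write with write concern j: true forces an immediate flush.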
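Group commit and crash recovery can be illustrated with a toy journal. This is a sketch under simplifying assumptions, not WiredTiger's log format:

- `ToyJournal` and `recover` are hypothetical.
- The clock is passed in explicitly.
- The interval check runs on each append instead of on a background timer.

It mirrors MongoDB's default 100 ms commit interval and the immediate flush that a `j: true` write concern triggers.

```python
class ToyJournal:
    """Toy write-ahead log with periodic group commit (hypothetical; not WiredTiger's format)."""

    def __init__(self, commit_interval_ms: int = 100):
        self.commit_interval_ms = commit_interval_ms
        self.buffer = []       # records accepted but not yet durable
        self.durable = []      # records flushed to "disk"
        self.last_flush_ms = 0

    def append(self, record, now_ms: int, j: bool = False) -> None:
        self.buffer.append(record)
        # j=True mimics a j: true write concern and forces an immediate flush.
        # Simplification: the interval is checked here rather than by a background timer.
        if j or now_ms - self.last_flush_ms >= self.commit_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms: int) -> None:
        self.durable.extend(self.buffer)  # a real log would write and fsync here
        self.buffer.clear()
        self.last_flush_ms = now_ms


def recover(checkpoint: dict, journal_records: list) -> dict:
    """Start from the last checkpoint, then replay durable journal records in order."""
    state = dict(checkpoint)
    for key, value in journal_records:
        state[key] = value
    return state


journal = ToyJournal()
journal.append(("a", 1), now_ms=10)          # buffered: only 10 ms since the last flush
journal.append(("b", 2), now_ms=50, j=True)  # forced flush: "a" and "b" become durable
journal.append(("c", 3), now_ms=60)          # buffered; lost if the process crashed now
print(recover({}, journal.durable))          # {'a': 1, 'b': 2}
```

Record `("c", 3)` was accepted but not yet flushed, so a crash at that moment would lose it. That window is the price paid for batching many writes into one flush.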

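Write Path – Step by Step
Here’s what happens when you insert a document:
1. The driver encodes the document as BSON, and the server validates it
2. The document is written to WiredTiger’s in-memory cache
3. The write is recorded in the journal (WAL)
4. Indexes are updated
5. The changes are eventually flushed to the .wt files during checkpointing

The document and index changes in steps 2 to 4 are committed together as one storage-engine transaction. A checkpoint writes a consistent snapshot of the in-memory data to the .wt files (every 60 seconds by default), giving a durable point to recover from.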
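The five insert steps can be traced in a toy storage layer. `ToyStorage` is hypothetical, and the sketch simplifies the real engine in several ways:

- JSON bytes stand in for BSON.
- The index is a plain dict on one unique field.
- Clearing the journal at a checkpoint is a simplification of how WiredTiger retires log files it no longer needs.

```python
import json

class ToyStorage:
    """Toy model of the insert path; ToyStorage is hypothetical, not a MongoDB API."""

    def __init__(self):
        self.cache = {}      # record_id -> encoded document (in-memory cache)
        self.journal = []    # write-ahead log records not yet covered by a checkpoint
        self.index = {}      # value of the indexed field -> record_id (unique index)
        self.data_file = {}  # record_id -> encoded document, written only at checkpoints
        self.next_id = 1

    def insert(self, doc: dict, indexed_field: str) -> int:
        # 1. Encode (JSON bytes stand in for BSON in this sketch)
        encoded = json.dumps(doc, sort_keys=True).encode("utf-8")
        record_id = self.next_id
        self.next_id += 1
        self.cache[record_id] = encoded                      # 2. in-memory cache
        self.journal.append(("insert", record_id, encoded))  # 3. journal (WAL)
        self.index[doc[indexed_field]] = record_id           # 4. index update
        return record_id

    def checkpoint(self) -> None:
        # 5. Persist a consistent snapshot of the cache to the data file
        self.data_file = dict(self.cache)
        # Journal records older than the checkpoint are no longer needed for recovery
        self.journal.clear()


store = ToyStorage()
rid = store.insert({"email": "ada@example.com", "name": "Ada"}, indexed_field="email")
print(rid in store.cache, rid in store.data_file)  # True False
store.checkpoint()
print(rid in store.data_file, len(store.journal))  # True 0
```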

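Read Path – Step by Step
When you query MongoDB:
1. The query is parsed, and the query planner picks an execution plan
2. An index is used if a suitable one exists; otherwise the collection is scanned
3. Documents are read from the WiredTiger cache, or from disk on a cache miss
4. Matching documents are sent back to the client as BSON
5. The driver decodes the BSON into native objects (shells and tools often display them as JSON)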
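The cache-then-disk behavior of the read path can be sketched with dictionaries. `find_by_field` is a hypothetical helper. One dict plays the index, one the WiredTiger cache and one the data file on disk, and JSON bytes stand in for BSON. A real cache also evicts pages when it fills up, which this sketch leaves out.

```python
import json

def find_by_field(value, index: dict, cache: dict, disk: dict, stats: dict):
    """Toy read path (hypothetical helper): index lookup, cache-or-disk fetch, decode."""
    record_id = index.get(value)  # step 2: index lookup (without one, every record would be scanned)
    if record_id is None:
        return None
    if record_id in cache:        # step 3: cache hit
        stats["cache_hits"] += 1
        raw = cache[record_id]
    else:                         # step 3: cache miss, read from "disk" and keep a copy in cache
        stats["disk_reads"] += 1
        raw = disk[record_id]
        cache[record_id] = raw
    return json.loads(raw)        # steps 4-5: driver-side decoding (JSON stands in for BSON)


index = {"ada@example.com": 1}
disk = {1: b'{"email": "ada@example.com", "name": "Ada"}'}
cache, stats = {}, {"cache_hits": 0, "disk_reads": 0}
find_by_field("ada@example.com", index, cache, disk, stats)        # first read goes to disk
doc = find_by_field("ada@example.com", index, cache, disk, stats)  # second read hits the cache
print(doc["name"], stats)  # Ada {'cache_hits': 1, 'disk_reads': 1}
```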

Indexes and Storage

Indexes are stored as separate B-trees in WiredTiger.


Common indexes:
Single Field
Compound Index
Multikey Index
Geospatial / Text

Each index lives in its own .wt file. It is updated on every insert and delete, and on every update that changes an indexed field.
Index updates go through the same journaling and checkpoint process as document writes.
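A secondary index can be pictured as a sorted list of (key, record ID) entries kept separately from the documents. `ToyIndex` is hypothetical and simplified: WiredTiger stores each index as its own B-tree table, while this sketch uses a sorted Python list. It shows two things. A compound index builds a tuple key from several fields, and a multikey index writes one entry per array element.

```python
from bisect import bisect_left, insort

class ToyIndex:
    """Toy secondary index (hypothetical): a sorted list of (key_tuple, record_id) entries.

    Assumes every indexed field is present and values of one field share a type,
    since Python cannot order, say, None against str.
    """

    def __init__(self, fields: list):
        self.fields = fields   # one field = single-field index; several = compound index
        self.entries = []      # kept sorted by (key_tuple, record_id)

    def add(self, doc: dict, record_id: int) -> None:
        values = [doc[f] for f in self.fields]
        # Multikey: an array value yields one entry per element.
        # Simplification: only the first field may hold an array here.
        first = values[0] if isinstance(values[0], list) else [values[0]]
        for element in first:
            insort(self.entries, ((element, *values[1:]), record_id))

    def lookup(self, *key) -> list:
        # Binary-search to the first entry whose key is >= `key`, then collect equal keys
        i = bisect_left(self.entries, (key,))
        ids = []
        while i < len(self.entries) and self.entries[i][0] == key:
            ids.append(self.entries[i][1])
            i += 1
        return ids


tags = ToyIndex(["tags"])
tags.add({"tags": ["db", "nosql"]}, record_id=1)
tags.add({"tags": ["db"]}, record_id=2)
print(tags.lookup("db"))   # [1, 2]
print(len(tags.entries))   # 3: one entry per array element
```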

Compression Techniques
MongoDB compresses data to reduce disk usage and I/O.

Supported Compression Types:
Snappy (default): fast, with moderate compression; good for general use
zlib: higher compression, at the cost of more CPU
zstd (MongoDB 4.2+): higher compression than Snappy with less CPU than zlib

These block compressors apply to collection data; indexes use prefix compression by default.
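By default, MongoDB applies block compression (Snappy) to collection data and prefix compression to indexes. Both ideas fit in a short sketch, with two substitutions:

- Python's standard library has no Snappy or zstd module, so `zlib` (one of MongoDB's supported compressors) stands in for block compression.
- `prefix_compress` and `prefix_decompress` are hypothetical helpers that show the front-coding idea behind prefix compression. They do not reproduce WiredTiger's on-disk format.

```python
import os.path
import zlib

def prefix_compress(sorted_keys: list) -> list:
    """Front-code sorted string keys as (shared_prefix_length, suffix) pairs."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))  # characters shared with previous key
        out.append((shared, key[shared:]))
        prev = key
    return out

def prefix_decompress(pairs: list) -> list:
    keys, prev = [], ""
    for shared, suffix in pairs:
        prev = prev[:shared] + suffix  # rebuild each key from its predecessor
        keys.append(prev)
    return keys

keys = ["user:1001", "user:1002", "user:1010"]
print(prefix_compress(keys))  # [(0, 'user:1001'), (8, '2'), (7, '10')]

# Block compression of repetitive data; zlib stands in for Snappy/zstd here
page = b'{"status": "active", "country": "NO"}' * 100
compressed = zlib.compress(page, level=6)
assert zlib.decompress(compressed) == page  # lossless round trip
print(len(page), len(compressed))
```

Sorted index keys often share long prefixes, so storing only each key's differing suffix saves space without any general-purpose compressor.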

Memory Management

MongoDB uses an internal cache managed by WiredTiger.


Acts like a RAM-based storage layer
Frequently accessed documents are cached
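Cache size is configurable (storage.wiredTiger.engineConfig.cacheSizeGB); the default is the larger of 50% of (RAM - 1 GB) or 256 MB

Performance can degrade when the working set (the data and indexes your workload touches most often) no longer fits in available memory.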
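MongoDB documents the default WiredTiger cache size as the larger of 50% of (RAM - 1 GB) or 256 MB. A small helper makes the arithmetic concrete. `default_wiredtiger_cache_bytes` is a hypothetical name. The sketch ignores cases such as container memory limits or an explicit `cacheSizeGB` setting.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def default_wiredtiger_cache_bytes(ram_bytes: int) -> int:
    """Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max((ram_bytes - GB) // 2, 256 * MB)

for ram_gb in (1, 4, 16, 64):
    cache = default_wiredtiger_cache_bytes(ram_gb * GB)
    print(f"{ram_gb} GB RAM -> {cache / GB:.2f} GB cache")
# 1 GB -> 0.25, 4 GB -> 1.50, 16 GB -> 7.50, 64 GB -> 31.50
```

On small machines the 256 MB floor dominates. On large ones, the cache approaches half of RAM, leaving the rest for the operating system and its filesystem cache.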

Data File Organization

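By default, all files live directly in the dbPath; with storage.directoryPerDB enabled, each database gets its own subfolder.

For WiredTiger:
collection-*.wt – Data files for collections
index-*.wt – Index files
_mdb_catalog.wt – MongoDB’s catalog of collections and indexes
WiredTiger.wt – WiredTiger’s own metadata
journal/ – Write-ahead logs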

You can see these files in your MongoDB data directory.
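To take stock of a data directory, you can group the files by name pattern. `classify` and `summarize` are hypothetical helpers built only on Python's standard library. The patterns cover collection, index, catalog and WiredTiger metadata files, and journal files are assumed to sit under `journal/`. Anything else, such as `mongod.lock`, falls through to "other".

```python
import fnmatch
import os

# (pattern, description) pairs for files commonly found in a WiredTiger dbPath
CATEGORIES = [
    ("collection-*.wt", "collection data"),
    ("index-*.wt", "index"),
    ("_mdb_catalog.wt", "MongoDB catalog"),
    ("WiredTiger.wt", "WiredTiger metadata"),
]

def classify(relative_path: str) -> str:
    """Label a path given relative to dbPath, using forward slashes."""
    if relative_path.startswith("journal/"):
        return "journal (write-ahead log)"
    name = relative_path.rsplit("/", 1)[-1]  # basename, so directoryPerDB subfolders also work
    for pattern, label in CATEGORIES:
        if fnmatch.fnmatchcase(name, pattern):
            return label
    return "other"

def summarize(db_path: str) -> list:
    """Walk db_path and return sorted (relative_path, label) pairs for every file."""
    results = []
    for root, _dirs, files in os.walk(db_path):
        for filename in files:
            rel = os.path.relpath(os.path.join(root, filename), db_path).replace(os.sep, "/")
            results.append((rel, classify(rel)))
    return sorted(results)
```

For example, `summarize("/data/db")` (mongod's default dbPath on Linux and macOS) lists each file with its label. Run it against a copy or a stopped instance rather than touching a live deployment's files.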


Sanuj Bansal
Senior Developer
