How MongoDB Stores Data Internally
@ sanuj bansal
Introduction
When we use MongoDB, we usually think of simple JSON
documents.
But have you ever wondered how those documents are stored?
Understanding the internals helps with:
✅ Schema design
✅ Performance tuning
✅ Index optimization
✅ Troubleshooting storage issues
MongoDB’s High-Level Data Structure
MongoDB uses a document-oriented model with the following
hierarchy:
Database: Top-level container
Collection: Groups of documents (like SQL tables)
Document: JSON-like data (stored as BSON)
MongoDB is schema-flexible: documents in the same collection
can have different fields.
What is BSON?
MongoDB doesn’t actually store documents as raw JSON.
Instead, it uses BSON (Binary JSON).
Why BSON?
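Fast to encode, decode, and traverse (length prefixes let a reader skip fields without parsing them)
Supports rich types like Date, ObjectId, Decimal128
Optimized for machine parsing, though not always smaller than the equivalent JSON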
Example Comparison:
Collections and Namespace Files
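Each collection is identified by a namespace: the database name and the collection name joined by a dot.
For example:
the users collection in myDatabase has the namespace:
myDatabase.users
In the legacy MMAPv1 engine (removed in MongoDB 4.2), a per-database namespace file (myDatabase.ns) stored this metadata, and documents lived in numbered data files (myDatabase.0, myDatabase.1, ...).
WiredTiger has no namespace files: each collection and each index gets its own .wt file, and the catalog maps namespaces to those files.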
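A namespace string is the database name and the collection name joined by a dot. The small helper below (a hypothetical function, not part of any driver) splits one back into its parts. It relies on the rule that database names cannot contain a dot, while collection names can.

```python
from typing import Tuple


def split_namespace(namespace: str) -> Tuple[str, str]:
    """Split 'db.collection' at the first dot; the collection part may contain dots."""
    db, sep, collection = namespace.partition(".")
    if not sep or not db or not collection:
        raise ValueError(f"not a valid namespace: {namespace!r}")
    return db, collection


print(split_namespace("myDatabase.users"))
print(split_namespace("myDatabase.system.profile"))
```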
WiredTiger – The Default Storage Engine
Since MongoDB 3.2, WiredTiger is the default storage
engine.
Main Functions of WiredTiger:
Write-Ahead Logging (WAL)
Compression
Caching and memory management
Document-level concurrency control
Each collection is stored as:
one .wt file for its data
one .wt file for each of its indexes
How WiredTiger Stores Documents
WiredTiger uses a B-Tree data structure.
Each node contains sorted key-value pairs.
Internal nodes: Pointers to children
Leaf nodes: Contain actual data
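Data is organized as:
Pages (the in-memory unit WiredTiger reads into and evicts from its cache)
Blocks (the on-disk form of a page, read and written as one unit)
Extents (contiguous ranges of a file that the block manager allocates and frees)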
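The internal-node and leaf-node split can be sketched with a tiny two-level tree. This is a simplified model, not WiredTiger's actual page layout: the class names, the integer keys, and the sample documents are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]            # sorted keys
    values: List[dict]         # values[i] belongs to keys[i]


@dataclass
class Internal:
    separators: List[int]      # child i holds keys below separators[i]
    children: List[Union["Internal", Leaf]]


def search(node: Union[Internal, Leaf], key: int) -> Optional[dict]:
    """Walk from the root to a leaf, then binary-search inside the leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


root = Internal(
    separators=[100, 200],
    children=[
        Leaf([10, 50], [{"name": "a"}, {"name": "b"}]),
        Leaf([100, 150], [{"name": "c"}, {"name": "d"}]),
        Leaf([200, 250], [{"name": "e"}, {"name": "f"}]),
    ],
)
print(search(root, 150))
print(search(root, 999))
```

Because every node keeps its keys sorted, each level needs only a binary search, so a lookup touches one node per level of the tree.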
Journaling – Ensuring Durability
Before data is committed to the .wt file, it's first written to
the journal.
Why Journaling?
Ensures ACID durability
Allows recovery after crash
Uses write-ahead logging
Journal records are flushed to disk every 100 ms by default (configurable via storage.journal.commitIntervalMs).
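The idea behind write-ahead logging can be sketched with a plain append-only file: every change is recorded in the journal first, so the state can be rebuilt by replaying it after a crash. This toy version uses JSON lines and syncs on every record for simplicity, whereas a real engine batches its syncs. The file name and record shape are assumptions for this example.

```python
import json
import os
import tempfile


def journal_append(path: str, record: dict) -> None:
    """Append one record to the journal and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(path: str, data: dict) -> dict:
    """Re-apply every journaled record on top of the last checkpointed state."""
    if not os.path.exists(path):
        return data
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            data[record["_id"]] = record["doc"]
    return data


with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "journal.log")
    journal_append(journal, {"_id": 1, "doc": {"name": "Ada"}})
    journal_append(journal, {"_id": 2, "doc": {"name": "Lin"}})

    # Simulated crash: nothing had been checkpointed to the data file yet.
    checkpointed = {}
    recovered = replay(journal, dict(checkpointed))
    print(recovered)
```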
Write Path – Step by Step
Here’s what happens when you insert a document:
1. Document is validated and converted to BSON
2. Stored in the in-memory cache (WiredTiger’s cache)
3. Logged in the journal (WAL)
4. Indexes are updated
5. Eventually flushed to .wt files during checkpointing
A checkpoint writes a consistent snapshot of the in-memory data to the .wt files (every 60 seconds by default). After a crash, recovery starts from the last checkpoint and replays the journal.
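The insert steps above can be modelled with a toy in-memory engine. This is only a sketch of the ordering: the class, its attribute names, and the single index on "name" are hypothetical and say nothing about how WiredTiger is actually implemented.

```python
class ToyEngine:
    """Hypothetical model of the insert steps; not WiredTiger's real design."""

    def __init__(self):
        self.cache = {}        # in-memory documents, dirty until a checkpoint
        self.journal = []      # write-ahead log records
        self.name_index = {}   # index on "name": value -> set of _id values
        self.data_file = {}    # stands in for the collection's .wt file

    def insert(self, doc: dict) -> None:
        if "_id" not in doc:                          # step 1: validate
            raise ValueError("this sketch expects an explicit _id")
        self.cache[doc["_id"]] = dict(doc)            # step 2: cache
        self.journal.append(("insert", dict(doc)))    # step 3: journal
        self.name_index.setdefault(doc.get("name"), set()).add(doc["_id"])  # step 4

    def checkpoint(self) -> None:
        """Step 5: write a snapshot of the cache to the data file."""
        self.data_file = {k: dict(v) for k, v in self.cache.items()}
        self.journal.clear()   # records older than the checkpoint are no longer needed


engine = ToyEngine()
engine.insert({"_id": 1, "name": "Ada"})
before = (len(engine.data_file), len(engine.journal))
engine.checkpoint()
after = (len(engine.data_file), len(engine.journal))
print(before, after)
```

Before the checkpoint, the document exists only in the cache and the journal. After it, the data file holds the document and the journal can be truncated.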
Read Path – Step by Step
When you query MongoDB:
1. Query is parsed and optimized
2. Index is used (if available)
3. Data is fetched from the cache, or from disk on a cache miss
4. Matching documents are returned to the client as BSON
5. The driver decodes BSON into native objects (or JSON)
Indexes and Storage
Indexes are stored as separate B-trees in WiredTiger.
Common indexes:
Single Field
Compound Index
Multikey Index
Geospatial / Text
Each index lives in its own .wt file and is updated during
inserts/updates.
Index updates are part of the same journaling and
checkpoint process.
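Conceptually, an index is a sorted structure that maps a field value to the location of a document. The sketch below stands in for the B-tree with a sorted Python list of (age, _id) pairs; the sample documents and the function name are invented for illustration.

```python
from bisect import bisect_left, bisect_right, insort

# Toy "collection": _id -> document
documents = {
    1: {"_id": 1, "age": 31},
    2: {"_id": 2, "age": 25},
    3: {"_id": 3, "age": 40},
    4: {"_id": 4, "age": 31},
}

# Toy single-field index on "age", kept sorted on every insert
age_index = []
for doc in documents.values():
    insort(age_index, (doc["age"], doc["_id"]))


def find_by_age_range(low: int, high: int) -> list:
    """Return documents with low <= age <= high without scanning the collection."""
    start = bisect_left(age_index, (low, float("-inf")))
    end = bisect_right(age_index, (high, float("inf")))
    return [documents[_id] for _, _id in age_index[start:end]]


print(find_by_age_range(30, 35))
```

The range query only touches the slice of the index that matches, which is why an index helps reads and why every insert or update has to maintain it.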
Compression Techniques
MongoDB compresses data to reduce disk I/O.
Supported Compression Types:
Snappy (default): Fastest, good for general use
zlib: Higher compression, slower
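zstd (MongoDB 4.2+): Balanced performance and compression
These block compressors apply to collection data. Indexes use prefix compression by default, and the journal is compressed with Snappy.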
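The speed versus size trade-off can be tried out with zlib, the only one of these compressors in the Python standard library. This is a rough sketch on made-up, repetitive documents; it compresses JSON text rather than real WiredTiger blocks, so the ratios are only indicative.

```python
import json
import zlib

# Hypothetical sample data: many documents sharing the same field names and values
docs = [{"_id": i, "status": "active", "country": "IN", "plan": "free"} for i in range(1000)]
raw = "\n".join(json.dumps(d) for d in docs).encode("utf-8")

fast = zlib.compress(raw, 1)    # level 1: favors speed
small = zlib.compress(raw, 9)   # level 9: favors size

print("raw bytes:    ", len(raw))
print("zlib level 1: ", len(fast))
print("zlib level 9: ", len(small))

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(small)
```

Repeated field names compress very well, which is one reason document data with many similar documents benefits from block compression.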
Memory Management
MongoDB uses an internal cache managed by WiredTiger.
Acts like a RAM-based storage layer
Frequently accessed documents are cached
Cache size is configurable (defaults to the larger of 50% of (RAM - 1 GB) or 256 MB)
Performance can degrade if working set > available
memory.
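The documented default for the WiredTiger cache is the larger of 50% of (RAM - 1 GB) or 256 MB. A small helper makes the arithmetic concrete; the function name is hypothetical, and the result is only the default, which can be overridden in the configuration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_cache_bytes(total_ram_bytes: int) -> int:
    """Larger of 50% of (RAM - 1 GiB) and 256 MiB."""
    return max((total_ram_bytes - GIB) // 2, 256 * MIB)


for ram_gib in (1, 4, 16):
    cache = default_cache_bytes(ram_gib * GIB)
    print(f"{ram_gib} GiB RAM -> {cache / GIB:.2f} GiB cache")
```

On small machines the 256 MB floor applies, so the cache is noticeably less than half of RAM.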
Data File Organization
By default, all databases share the dbPath directory; each database gets its own folder only when storage.directoryPerDB is enabled.
For WiredTiger:
collection-*.wt – Data files for collections
index-*.wt – Index files
WiredTiger.wt – Metadata and global state
journal/ – Write-ahead logs
You can see these files in your MongoDB data directory.
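As a quick way to make sense of a data directory listing, the sketch below labels entries by the naming patterns listed above. The sample file names are invented, and real directories contain additional files that this toy classifier simply reports as "other".

```python
from fnmatch import fnmatchcase

# Patterns taken from the list above; labels are descriptive only
PATTERNS = [
    ("collection-*.wt", "collection data"),
    ("index-*.wt", "index"),
    ("WiredTiger.wt", "metadata"),
    ("journal", "write-ahead log directory"),
]


def classify(name: str) -> str:
    """Return a label for one entry of the data directory."""
    for pattern, kind in PATTERNS:
        if fnmatchcase(name, pattern):
            return kind
    return "other"


# Hypothetical listing of a dbPath
listing = ["collection-0-123.wt", "index-1-123.wt", "WiredTiger.wt", "journal", "mongod.lock"]
for entry in listing:
    print(f"{entry:24} {classify(entry)}")
```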
Sanuj Bansal
Senior Developer