Why B-Trees and B+ Trees are better than AVL Trees for Disk-Based Storage
1. Disk Access is Slow
o Reading from disk takes much longer than reading from memory.
o So, we want to minimize the number of disk reads.
2. AVL Trees Are Binary
o Each node holds only 1 key and has at most 2 children.
o For large data, the tree becomes tall (about log2 N levels, roughly 20 for a million keys), so finding a key takes many disk reads.
3. B-Trees/B+ Trees Are Multiway
o Each node can store many keys (e.g., 100–200 keys per node).
o This makes the tree shorter, so we need fewer disk reads.
4. Block-Friendly
o Disks read data in blocks (e.g., 4 KB).
o A B/B+ tree node is sized to fit one block, so each disk read brings in many useful keys.
o An AVL node holds a single key, so storing one node per block wastes most of the block.
5. Better for Range Queries
o B+ trees store all keys in leaf nodes linked in order, making it easy to get all
keys between two values.
o An in-order traversal of an AVL tree moves up and down between nodes that may sit in different blocks, causing many disk reads.
6. Less Rebalancing
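o AVL trees rebalance with rotations on many inserts/deletes, and each rotation rewrites several nodes → more disk writes.
o B/B+ tree nodes may be anywhere from half full to full, so most inserts/deletes change a single node; splits and merges are comparatively rare → fewer writes.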
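The height argument in points 2 and 3 can be checked with a little arithmetic. The sketch below is a best-case estimate only: it assumes every node is completely full, and the function name min_levels is made up for this illustration. A real AVL tree can be somewhat taller (up to about 1.44 log2 N levels), and real B/B+ tree nodes are usually only partly full.

```python
def min_levels(n_keys: int, keys_per_node: int) -> int:
    """Fewest levels a search tree needs to hold n_keys when every node is full."""
    fanout = keys_per_node + 1            # children per node
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        capacity = fanout ** levels - 1   # keys held by a full tree with this many levels
    return levels

n = 1_000_000
print(min_levels(n, 1))    # binary tree, 1 key per node   -> 20
print(min_levels(n, 100))  # multiway tree, 100 keys/node  -> 3
```

If each level costs one disk read, a lookup among a million keys drops from about 20 reads to about 3 under these assumptions.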
1. AVL Tree on Disk
Each node has 1 key (or a few keys if optimized for memory, but still small).
Disk blocks are typically 4 KB, so storing a single AVL node per block wastes most of the block.
Searching requires traversing many levels, each possibly causing a disk read.
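To put a rough number on the wasted space, the sketch below computes how much of a 4 KB block one AVL node would occupy. The node layout (8-byte key, two 8-byte child block numbers, 4-byte height) is an assumption made for this illustration, not a standard format.

```python
import struct

BLOCK_SIZE = 4096  # bytes, the 4 KB block size used in the text

# Hypothetical on-disk AVL node: 8-byte key, two 8-byte child block numbers,
# and a 4-byte height field. "<" means little-endian with no padding.
AVL_NODE_FORMAT = "<qqqi"
node_size = struct.calcsize(AVL_NODE_FORMAT)

used_fraction = node_size / BLOCK_SIZE
print(f"node size: {node_size} bytes")       # node size: 28 bytes
print(f"block used: {used_fraction:.2%}")    # block used: 0.68%
```

Under these assumed field sizes, more than 99% of each block read carries no useful data.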
Disk Block Layout for AVL Tree:
[ Block 1 ] -> Node 50
[ Block 2 ] -> Node 30
[ Block 3 ] -> Node 70
[ Block 4 ] -> Node 20
...
(Each node may require a separate disk access)
Search example: To find key 20:
Access root (Node 50) → 1 disk read
Go to left child (Node 30) → 1 disk read
Go to left child (Node 20) → 1 disk read
Total: 3 disk reads to find 1 key, one read per level visited (not counting rebalancing writes)
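The search above can be simulated by treating a dictionary as the disk and counting each lookup in it as one disk read. This is a minimal sketch: the child links are inferred from the example (30 and 70 under 50, 20 under 30), and avl_search is a name chosen for this illustration.

```python
# Simulated disk: block number -> (key, left child block, right child block).
# Block numbers follow the layout above; None means "no child".
disk = {
    1: (50, 2, 3),
    2: (30, 4, None),
    3: (70, None, None),
    4: (20, None, None),
}

def avl_search(disk, root_block, target):
    """Return (found, disk_reads), reading one block per node visited."""
    reads = 0
    block = root_block
    while block is not None:
        key, left, right = disk[block]   # simulated disk read
        reads += 1
        if target == key:
            return True, reads
        block = left if target < key else right
    return False, reads

print(avl_search(disk, 1, 20))  # (True, 3)
```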
2. B+ Tree on Disk
Each node stores many keys (e.g., 100–200 keys) and is sized to fit in one disk block.
Internal nodes guide the search; leaf nodes store all keys and are linked in key order.
A search or range query touches very few blocks.
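How many keys fit in a node follows from the block size and the size of each entry. The sketch below assumes a simple node layout (a small header, fixed-width keys, and one more child pointer than keys); all sizes are illustrative assumptions, and max_keys_per_node is a hypothetical helper.

```python
BLOCK_SIZE = 4096    # bytes
HEADER_SIZE = 16     # assumed per-node header (key count, node type, ...)
POINTER_SIZE = 8     # assumed size of a child block number

def max_keys_per_node(block_size, header, key_size, pointer_size):
    """Largest k such that k keys and k + 1 child pointers fit in one block."""
    usable = block_size - header - pointer_size   # set aside the extra (k+1)-th pointer
    return usable // (key_size + pointer_size)

print(max_keys_per_node(BLOCK_SIZE, HEADER_SIZE, 8, POINTER_SIZE))   # 8-byte keys  -> 254
print(max_keys_per_node(BLOCK_SIZE, HEADER_SIZE, 24, POINTER_SIZE))  # 24-byte keys -> 127
```

With these assumed sizes the result lands in the low hundreds, the same order of magnitude as the per-node figure quoted above.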
Disk Block Layout for B+ Tree:
[ Block 1 ] -> Internal Node: Keys 10, 20, 30 (pointers to children)
[ Block 2 ] -> Leaf Node: Keys 1–10, next leaf pointer
[ Block 3 ] -> Leaf Node: Keys 11–20, next leaf pointer
[ Block 4 ] -> Leaf Node: Keys 21–30, next leaf pointer
...
Search example: To find key 20:
Access root internal node (Block 1) → 1 disk read
Access leaf node (Block 3) → 1 disk read
Total: 2 disk reads to find 1 key (the leaf read brings in 10 keys at once)
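The same lookup can be simulated with a dictionary standing in for the disk. This is a sketch under stated assumptions: only the three leaves listed above are modeled, the separator keys 10 and 20 are chosen to match those leaf ranges (keys up to and including a separator go to the child on its left), and bplus_search is a name chosen for this illustration.

```python
from bisect import bisect_left

# Simulated disk: block number -> node.
disk = {
    1: {"leaf": False, "keys": [10, 20], "children": [2, 3, 4]},
    2: {"leaf": True, "keys": list(range(1, 11)), "next": 3},
    3: {"leaf": True, "keys": list(range(11, 21)), "next": 4},
    4: {"leaf": True, "keys": list(range(21, 31)), "next": None},
}

def bplus_search(disk, root_block, target):
    """Return (found, disk_reads, leaf_block) for a point lookup."""
    reads = 0
    block = root_block
    while True:
        node = disk[block]               # simulated disk read
        reads += 1
        if node["leaf"]:
            i = bisect_left(node["keys"], target)
            found = i < len(node["keys"]) and node["keys"][i] == target
            return found, reads, block
        # Binary search inside the node costs no extra disk reads.
        block = node["children"][bisect_left(node["keys"], target)]

print(bplus_search(disk, 1, 20))  # (True, 2, 3)
```

All the searching inside a node happens in memory once the block has been read, which is why only the number of levels matters for disk cost.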
Range query example (keys 11–30):
Access leaf Block 3 → keys 11–20
Follow leaf pointer → Block 4 → keys 21–30
Only 2 leaf reads for 20 keys, after 1 read of the root to locate Block 3 (very efficient)
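A range query can be sketched the same way: descend once to the leaf that should hold the lower bound, then follow the next-leaf pointers. The layout, the separator keys 10 and 20, and the name bplus_range are assumptions made for this illustration, and leaves are assumed to be non-empty. The count below includes the read of the root, so the example range costs 3 reads in total: 1 internal read plus the 2 leaf reads.

```python
from bisect import bisect_left

# Simulated disk: block number -> node (only the three leaves listed above).
disk = {
    1: {"leaf": False, "keys": [10, 20], "children": [2, 3, 4]},
    2: {"leaf": True, "keys": list(range(1, 11)), "next": 3},
    3: {"leaf": True, "keys": list(range(11, 21)), "next": 4},
    4: {"leaf": True, "keys": list(range(21, 31)), "next": None},
}

def bplus_range(disk, root_block, low, high):
    """Return (keys in [low, high], disk_reads)."""
    node = disk[root_block]              # simulated disk read of the root
    reads = 1
    while not node["leaf"]:              # descend to the leaf that may hold `low`
        child = node["children"][bisect_left(node["keys"], low)]
        node = disk[child]
        reads += 1
    result = []
    while True:
        result.extend(k for k in node["keys"] if low <= k <= high)
        # Stop once this leaf reaches the upper bound or there is no next leaf.
        if node["keys"][-1] >= high or node["next"] is None:
            return result, reads
        node = disk[node["next"]]        # follow the leaf pointer: one more read
        reads += 1

keys, reads = bplus_range(disk, 1, 11, 30)
print(len(keys), reads)  # 20 3
```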