MongoDB architecture
This section offers a summary of some of the aspects of a typical MongoDB deployment. Read on to learn more about cluster architecture, sharded clusters, and collections.
Clusters
The typical MongoDB configuration is a replica set, also referred to as a cluster: a group of mongod
instances that maintain the same dataset. mongod
is the primary MongoDB daemon. By default, a replica set is a three-node configuration. Each node contains a complete copy of your database, allowing for redundancy and high availability: if one node goes down, you can still access the other two.
Nodes are generally categorized as primary or secondary nodes. The primary node receives all write operations from the client. Write operations are recorded on the oplog, which is a collection that keeps a record of all operations performed on the primary node. The secondary nodes then replicate this log and perform the operations on their datasets to ensure that they match the data on the primary node. Read operations are performed on the primary node by default. However, users can specify that operations should be performed on different nodes, such as the nearest node or explicitly secondary nodes.
If a primary node becomes unresponsive or unavailable, a healthy secondary node is promoted to primary. The replica set holds an election to choose which of the secondaries becomes the new primary. Various factors affect elections, such as responsiveness, node priority, and freshness of the data.

Figure 1.1: Replicated cluster architecture
Sharded clusters
Sharded clusters are useful when you want to further divide your data into replicated partitions while maintaining efficiency. Sharding is most useful when your application requires large datasets and high-throughput operations. Each sharded cluster consists of shards, or replica sets, over which data is distributed at the collection level. They also include mongos
instances, which act as an interface between the client and server, routing a query to the appropriate shard or shards, as well as merging resultant data. The final aspect of a sharded cluster is a config server, which is a replica set that stores replication and configuration metadata for the cluster.
Starting in MongoDB 8.0, you can now unshard collections, as well as move unsharded collections between shards on sharded clusters. This offers you more flexibility in how you want to optimize your cluster’s performance.

Figure 1.2: Sharded cluster architecture
To learn more about replication and sharding and their enhancements in MongoDB 8.0, check out Chapter 2, MongoDB Architecture. Also see the MongoDB documentation for the most up-to-date information.
Collections
A collection is a grouping of MongoDB documents. It is the equivalent of a table in a relational database. In MongoDB, collections do not enforce a schema for the data that is inserted into them.

Figure 1.3: Documents in a collection
Collections are assigned a UUID, or universally unique identifier. The UUID is especially useful in a sharded cluster when identifying which chunks of data belong to a specific collection.
When executing a query, MongoDB must scan every document in a collection to return query results. However, you can create indexes on specific fields at the collection level to more efficiently traverse through the data. For example, if you have a collection full of employee data and you often need to look up employee information based on the employee ID, you can create a collection-level index on the ID field to improve query performance.
Indexes store a small portion of the collection’s data in a B-tree data structure, which allows searches in logarithmic time (O(logn)
). The index stores the value of a specific field or set of fields ordered by the value of the field, allowing for easier traversal through the data.