Replica Set Data Synchronization

In order to maintain up-to-date copies of the shared data set, members of a replica set sync or replicate data from other members. MongoDB uses two forms of data synchronization: initial sync to populate new members with the full data set, and replication to apply ongoing changes to the entire data set.

Initial Sync

Initial sync copies all the data from one member of the replica set to another member. A member uses initial sync when the member has no data, such as when the member is new, or when the member has data but is missing a history of the set’s replication.

When you perform an initial sync, MongoDB does the following:

  1. Clones all databases. To clone, the mongod queries every collection in each source database and inserts all data into its own copies of these collections.

  2. Applies all changes to the data set. Using the oplog from the source, the mongod updates its data set to reflect the current state of the replica set.

  3. Builds all indexes on all collections.

    When the mongod finishes building all indexes, the member can transition to a normal state, i.e. secondary.
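The three phases above can be illustrated with a toy model that works on in-memory dictionaries. This is only a sketch of the sequence and not how mongod implements initial sync: the `initial_sync` function, the simplified oplog entry shape (`op`, `db`, `coll`, `doc`, `_id`), and the single-field index format are all assumptions made for illustration.

```python
from copy import deepcopy


def initial_sync(source_dbs, source_oplog, index_specs):
    """Toy model of initial sync (hypothetical helper, not a MongoDB API).

    source_dbs:   {db_name: {collection_name: {_id: document}}}
    source_oplog: list of simplified entries recorded while the clone ran
    index_specs:  list of (db_name, collection_name, field) tuples
    """
    # Phase 1: clone every collection of every source database.
    local = deepcopy(source_dbs)

    # Phase 2: replay the source's oplog so the copy reflects the current state.
    # Inserts and updates are applied as upserts so replaying is idempotent.
    for entry in source_oplog:
        coll = local.setdefault(entry["db"], {}).setdefault(entry["coll"], {})
        if entry["op"] in ("insert", "update"):
            coll[entry["doc"]["_id"]] = deepcopy(entry["doc"])
        elif entry["op"] == "delete":
            coll.pop(entry["_id"], None)

    # Phase 3: build each index as a map from field value to matching _ids.
    indexes = {}
    for db, coll, field in index_specs:
        index = {}
        for _id, doc in local.get(db, {}).get(coll, {}).items():
            index.setdefault(doc.get(field), []).append(_id)
        indexes[(db, coll, field)] = index

    # Only after all indexes are built does the member become a secondary.
    return local, indexes, "SECONDARY"
```

In this model the source data is never modified, and the returned state string stands in for the member's transition to secondary.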

To perform an initial sync, see Resync a Member of a Replica Set.

Replication

Replica set members replicate data continuously after the initial sync. This process keeps the members up to date with all changes to the replica set’s data. In most cases, secondaries synchronize from the primary. Secondaries may automatically change their sync targets if needed based on changes in the ping time and state of other members’ replication.

For a member to sync from another, the buildIndexes setting for both members must have the same value: buildIndexes must be either true for both members or false for both members.

Beginning in version 2.2, secondaries avoid syncing from delayed members and hidden members.
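The sync target rules above can be sketched as a filter over candidate members. This is a simplified illustration, not the selection algorithm mongod uses: the `eligible_sync_sources` function and the `ping_ms` field are hypothetical, members are modeled as plain dictionaries using the `buildIndexes`, `hidden`, and `slaveDelay` member settings, and ordering by ping time alone ignores the replication state that real sync target selection also considers.

```python
def eligible_sync_sources(member, candidates):
    """Return the candidates `member` could sync from, lowest ping first.

    Hypothetical helper: each member is a dict of replica set member
    settings plus an assumed `ping_ms` measurement.
    """
    eligible = [
        c for c in candidates
        # buildIndexes must have the same value on both members.
        if c.get("buildIndexes", True) == member.get("buildIndexes", True)
        # Secondaries avoid syncing from hidden members...
        and not c.get("hidden", False)
        # ...and from delayed members.
        and c.get("slaveDelay", 0) == 0
    ]
    return sorted(eligible, key=lambda c: c["ping_ms"])
```

For example, given a primary, a hidden member, a delayed member, and a nearby secondary, only the primary and the nearby secondary would be returned, with the lower-latency one first.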

Validity and Durability

In a replica set, the set can have at most one primary and only the primary can accept write operations. [1] Secondaries apply operations from the primary asynchronously to provide eventual consistency.

Journaling provides single-instance write durability. Without journaling, if a MongoDB instance terminates ungracefully, you must assume that the database is in an invalid state.

In MongoDB, clients can see the results of writes before they are made durable:

  • Regardless of write concern, other clients can see the result of the write operations before the write operation is acknowledged to the issuing client.
  • Clients can read data which may be subsequently rolled back.

Multithreaded Replication

MongoDB applies write operations in batches using multiple threads to improve concurrency. MongoDB groups batches by namespace and applies operations using a group of threads, but always applies the write operations to a namespace in order.
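The grouping idea can be sketched with a thread pool: operations are partitioned by namespace, each partition is handed to a worker, and every worker applies its partition sequentially so that per-namespace order is preserved. This is a minimal sketch under stated assumptions, not MongoDB's implementation: `apply_batch` is a hypothetical name, an operation is modeled as a `(namespace, document)` pair, and "applying" simply appends the document to an in-memory list.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_batch(batch, store, max_workers=4):
    """Apply a batch of (namespace, document) operations to `store`.

    Operations on different namespaces may run concurrently; operations
    on the same namespace are always applied in their original order.
    """
    # Group operations by namespace, keeping their order within each group.
    by_ns = defaultdict(list)
    for ns, doc in batch:
        by_ns[ns].append(doc)

    # Create the target lists up front so workers never mutate the shared dict.
    for ns in by_ns:
        store.setdefault(ns, [])

    def apply_namespace(ns, docs):
        # A single worker owns this namespace for the whole batch.
        for doc in docs:
            store[ns].append(doc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(apply_namespace, ns, docs) for ns, docs in by_ns.items()]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return store
```

Because each namespace is handled by exactly one worker per batch, no locking between workers is needed in this model.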

While applying a batch, MongoDB blocks all reads. As a result, secondaries can never return data that reflects a state that never existed on the primary.

Pre-Fetching Indexes to Improve Replication Throughput

To help improve the performance of applying oplog entries, MongoDB fetches memory pages that hold affected data and indexes. This pre-fetch stage minimizes the amount of time MongoDB holds the write lock while applying oplog entries. By default, secondaries will pre-fetch all indexes.

Optionally, you can disable all pre-fetching or only pre-fetch the index on the _id field. See the replIndexPrefetch setting for more information.

[1] In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most, one of them will be able to complete writes with {w: majority} write concern. The node that can complete {w: majority} writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary.
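The reason at most one node can complete {w: majority} writes comes down to counting: two disjoint groups of members cannot both contain more than half of the set. The check below is a toy illustration of that arithmetic only; `can_complete_majority_write` is a hypothetical function, not part of MongoDB or any driver, and it assumes every member counts equally toward the majority.

```python
def can_complete_majority_write(reachable_members, total_members):
    """Return True if a node that can reach `reachable_members` members
    (counting itself) could have a write acknowledged by a majority."""
    return reachable_members > total_members // 2
```

In a five-member set split 2/3 by a network partition, a former primary on the two-member side cannot reach a majority, while the node on the three-member side can.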