Concurrency control in databases has two main modes for handling conflicts. Traditionally, these are labeled "pessimistic" and "optimistic," but those names can be misleading. Let’s define them by what actually happens from a user point of view:
- Wait-on-Conflict: the operation blocks until the conflict is resolved, typically waiting for a concurrent transaction to either commit or abort and release its locks. For most Online Transaction Processing (OLTP) workloads, this process is usually quick, unless a long-running transaction is blocking progress or there is a deadlock in which two transactions are waiting on each other.
- Fail-on-Conflict: Instead of waiting, the operation immediately raises an error to the application. Now it’s up to the application to decide: retry after a short pause (hoping the conflict is gone), or just propagate the error. The application may also have done some non-transactional operations that need to be canceled or compensated.
In SQL databases, where multi-statement transactions are the norm because normalization spreads data across multiple tables, you see this choice as an explicit WAIT or NOWAIT on locking reads (LOCK TABLE, SELECT FOR UPDATE), or as a consequence of the isolation level: READ COMMITTED waits and relies on deadlock detection, while SERIALIZABLE can fail with a serialization error.
MongoDB’s document-oriented approach uses both modes, giving developers the freedom to choose what best fits their needs:
- Single-document updates without an explicit transaction wait on conflict, until the operation times out. The document model encourages encapsulating a business transaction within one document, keeping concurrency simple and fast.
- Multi-document operations within an explicit transaction, however, fail on conflict. Here, MongoDB expects your application to do the retry logic and error handling: since you started the transaction, you're responsible for finishing it.
The need to implement retry logic in the application is not a limitation of MongoDB. A database cannot transparently cancel a user-defined transaction and retry it because it lacks knowledge of the application's prior actions, like sending emails or writing to files. Only the application can cancel or compensate for actions taken before retrying the transaction.
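Since the application owns the retry, a small helper can wrap the transaction body and retry only on transient errors. This is an illustrative sketch, not driver code: withRetry and its backoff constants are hypothetical, and the label check mimics MongoDB's TransientTransactionError error label.

```javascript
// Hypothetical application-side retry helper for fail-on-conflict errors.
// `operation` stands for the transaction body; errors carrying MongoDB's
// TransientTransactionError label are retried, everything else propagates.
async function withRetry(operation, { maxAttempts = 5, baseDelayMs = 10 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const transient = Array.isArray(err.errorLabels)
        && err.errorLabels.includes('TransientTransactionError');
      if (!transient || attempt >= maxAttempts) throw err;
      // Compensate any non-transactional side effects here before retrying.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
}
```

With the official drivers, session.withTransaction() already implements this loop for you, retrying the callback when the error carries the TransientTransactionError label.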
In contrast to multi-statement transactions, single-statement transactions and auto-commit commands can be canceled and restarted before control returns to the application. That's how MongoDB provides lock-free wait-on-conflict behavior for single-document updates: internally it uses fail-on-conflict optimistic concurrency control (OCC) to avoid locks, with internal retries so that it appears as wait-on-conflict to the user.
Demonstration
Here’s what this looks like in practice. I start with a one-document collection:
AtlasLocalDev atlas [direct: primary] test>
db.demo.drop();
true
AtlasLocalDev atlas [direct: primary] test>
db.demo.insertOne({ _id: 'x' , value: 0 });
{ acknowledged: true, insertedId: 'x' }
I start a first transaction, in session A, that updates the document, but do not commit it yet:
//first transaction A updates one doc
AtlasLocalDev atlas [direct: primary] test>
sessionA = db.getMongo().startSession();
{ id: UUID('ac8f6741-4ebb-4145-acb2-6b861a390c25') }
AtlasLocalDev atlas [direct: primary] test> sessionA.startTransaction();
AtlasLocalDev atlas [direct: primary] test>
dbA = sessionA.getDatabase(db.getName());
test
AtlasLocalDev atlas [direct: primary] test>
dbA.demo.updateOne({ _id: 'x' }, { $inc: { value: 2 } });
{
acknowledged: true,
insertedId: null,
matchedCount: 1,
modifiedCount: 1,
upsertedCount: 0
}
Explicit transaction
While this transaction is ongoing, I start another one in session B and try to update the same document:
// second transaction B tries to update the same doc
AtlasLocalDev atlas [direct: primary] test>
sessionB = db.getMongo().startSession();
{ id: UUID('a5302cab-9688-43db-badd-e3691b30a15b') }
AtlasLocalDev atlas [direct: primary] test> sessionB.startTransaction();
AtlasLocalDev atlas [direct: primary] test>
dbB = sessionB.getDatabase(db.getName());
test
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 10:59:37 GMT+0000 (Greenwich Mean Time)
AtlasLocalDev atlas [direct: primary] test>
dbB.demo.updateOne({ _id: 'x' }, { $inc: { value: 4 } });
MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 10:59:40 GMT+0000 (Greenwich Mean Time)
AtlasLocalDev atlas [direct: primary] test> sessionB.abortTransaction();
The conflict was detected immediately: the application gets a WriteConflict error and can retry.
Single-document
This time, I'll retry without starting an explicit transaction:
// do the same without starting a transaction
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 10:59:49 GMT+0000 (Greenwich Mean Time)
AtlasLocalDev atlas [direct: primary] test>
dbB.demo.updateOne({ _id: 'x' }, { $inc: { value: 8 } });
The session blocks, waiting until the document is free. I leave the transaction in session A open to show what happens with long transactions.
After one minute, the update is successful:
{
acknowledged: true,
insertedId: null,
matchedCount: 1,
modifiedCount: 1,
upsertedCount: 0
}
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 11:00:48 GMT+0000 (Greenwich Mean Time)
My two transactions are not serializable, as they update the same document at the same time, so if one succeeds, the other must abort. This is what happened: the transaction in session A was automatically canceled after a one-minute timeout. I can see that when I try to commit:
AtlasLocalDev atlas [direct: primary] test>
sessionA.commitTransaction();
MongoServerError[NoSuchTransaction]: Transaction with { txnNumber: 2 } has been aborted.
AtlasLocalDev atlas [direct: primary] test>
sessionA.endSession();
Finally, the result is consistent, with the changes committed by session B:
AtlasLocalDev atlas [direct: primary] test>
db.demo.find();
[ { _id: 'x', value: 8 } ]
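The one-minute abort observed above comes from the transactionLifetimeLimitSeconds server parameter, which defaults to 60 seconds. It can be inspected and adjusted in mongosh (a config fragment; raising it lets long transactions hold their snapshot longer, at a resource cost):

```javascript
// Inspect the current transaction lifetime limit (defaults to 60 seconds).
db.adminCommand({ getParameter: 1, transactionLifetimeLimitSeconds: 1 });
// Raise it to two minutes (affects the whole server, use with care).
db.adminCommand({ setParameter: 1, transactionLifetimeLimitSeconds: 120 });
```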
Internals
I looked at the call stack while it was waiting and here is the Flame Graph:
It shows the update attempts. updateWithDamages is not as terrible as it sounds: it is an incremental update that logs only the changes ("damages") rather than rewriting the whole document. logWriteConflictAndBackoff is the internal retry. Each write conflict in PlanExecutorExpress increments an attempt counter and calls logWriteConflictAndBackoff → logAndBackoff → logAndBackoffImpl with the number of attempts. The code shows that the wait depends on this counter:
- First 4 attempts: no sleep, immediate retry.
- Attempts 4–9: sleep 1 ms per retry.
- Attempts 10–99: sleep 5 ms per retry.
- Attempts 100–199: sleep 10 ms per retry.
- Attempts 200 and above: sleep 100 ms per retry.
This mitigates resource contention by progressively slowing down repeated conflicts with a stepped backoff while still giving the conflict a chance to resolve. Over one minute, we can estimate roughly 4 + 6 + 90 + 100 + 585 = 785 write-conflict retries under this schedule, with most of the time spent in the 100 ms sleep phase.
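The one-minute estimate can be checked with a short script. The schedule is reconstructed from the description above; backoffMs and retriesWithin are hypothetical helper names, not the actual C++ implementation.

```javascript
// Reconstruction of the stepped backoff schedule described above.
function backoffMs(attempt) {
  if (attempt < 4) return 0;   // immediate retry
  if (attempt < 10) return 1;
  if (attempt < 100) return 5;
  if (attempt < 200) return 10;
  return 100;
}

// How many conflicting attempts fit in a given wait budget.
function retriesWithin(totalMs) {
  let elapsed = 0;
  let attempts = 0;
  while (elapsed + backoffMs(attempts) <= totalMs) {
    elapsed += backoffMs(attempts);
    attempts += 1;
  }
  return attempts;
}

console.log(retriesWithin(60000)); // 785 attempts in one minute
```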
Internally, WiredTiger enforces snapshot isolation at the storage layer. When a MongoDB operation tries to modify a document, WiredTiger checks that the current snapshot timestamp still matches what’s on disk. If another operation has modified the document since that snapshot was taken, WiredTiger detects that the update’s view is stale. Instead of acquiring a long-term lock, WiredTiger uses optimistic concurrency control and returns WT_ROLLBACK to MongoDB’s server layer to signal the conflict.
At this point, the MongoDB server recognizes the error, aborts the in-progress write attempt, and releases any snapshot or locks it has held on behalf of the operation. It then yields to allow other work, starts a new WiredTiger transaction with a fresh snapshot timestamp, and retries the update operation.
After a successful update (with a clean, current snapshot and no conflict), or if a timeout is reached, the process ends, and MongoDB will either commit the change or return an error to the application.
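The cycle above can be modeled as a toy in-memory sketch. All names here (Store, snapshot, commit) are hypothetical simplifications of what WiredTiger and the MongoDB server layer actually do.

```javascript
// A toy in-memory model of the rollback-and-retry cycle described above.
class Store {
  constructor() {
    this.doc = { value: 0, version: 0 };
  }
  snapshot() {
    // A fresh snapshot: a copy of the document plus its version at read time.
    return { ...this.doc };
  }
  commit(snap, newValue) {
    if (snap.version !== this.doc.version) {
      // Another writer committed since the snapshot was taken: signal the
      // conflict, like WiredTiger returning WT_ROLLBACK to the server layer.
      const err = new Error('WT_ROLLBACK');
      err.conflict = true;
      throw err;
    }
    this.doc = { value: newValue, version: this.doc.version + 1 };
  }
}

// Server-side loop: on conflict, drop the stale snapshot and retry with
// a fresh one (backoff omitted for brevity).
function updateWithRetry(store, fn) {
  for (;;) {
    const snap = store.snapshot();
    try {
      store.commit(snap, fn(snap.value));
      return;
    } catch (err) {
      if (!err.conflict) throw err; // only conflicts are retried
    }
  }
}
```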
Mongostat
I ran the wait-on-conflict scenario again:
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 08:48:58 GMT+0000 (Greenwich Mean Time)
AtlasLocalDev atlas [direct: primary] test>
dbB.demo.updateOne({ _id: 'x' }, { $inc: { value: 8 } });
{
acknowledged: true,
insertedId: null,
matchedCount: 1,
modifiedCount: 1,
upsertedCount: 0
}
AtlasLocalDev atlas [direct: primary] test>
Date();
Wed Jul 23 2025 08:50:18 GMT+0000 (Greenwich Mean Time)
During these 80 seconds, I displayed the writeConflicts metric with mongostat:
mongostat -o="insert,query,update,delete,metrics.operation.writeConflicts.rate(),time"
insert query update delete writeConflicts.rate() time
*0 *0 *0 *0 0 Jul 23 08:49:01.172
*0 *0 *0 *0 0 Jul 23 08:49:02.173
*0 *0 *0 *0 0 Jul 23 08:49:03.172
*0 *0 *0 *0 0 Jul 23 08:49:04.172
*0 *0 *0 *0 0 Jul 23 08:49:05.172
...
*0 *0 *0 *0 0 Jul 23 08:50:15.172
*0 *0 *0 *0 0 Jul 23 08:50:16.171
*0 *0 *0 *0 0 Jul 23 08:50:17.172
*0 *0 *0 *0 981 Jul 23 08:50:18.172
*0 *0 *0 *0 0 Jul 23 08:50:19.172
*0 *0 *0 *0 0 Jul 23 08:50:20.172
It shows 981 write-conflict retries during those 80 seconds. Early retries go fast, adding just a few milliseconds. As time goes on, the loop spends the vast majority of those 80 seconds in the late-stage 100 ms sleep phase, pushing the retry count close to elapsed_time / 100 ms (plus the faster early backoffs).
Here is the exact calculation:
- First 4 attempts: no sleep
- Attempts 4–9 (6 attempts): 6×1 = 6 ms
- Attempts 10–99 (90 attempts): 90×5 = 450 ms
- Attempts 100–199 (100 attempts): 100×10 = 1,000 ms
- Attempts 200–980 (781 attempts): 781×100 = 78,100 ms

The total sleep time is 79,556 ms, about 80 seconds, for 981 write conflicts. The measurement matches the algorithm.
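The same arithmetic can be scripted. The schedule is reconstructed from the description above, and totalSleepMs is a hypothetical helper, not MongoDB code.

```javascript
// Total backoff sleep for N conflicting attempts under the stepped schedule.
function totalSleepMs(conflicts) {
  let total = 0;
  for (let attempt = 0; attempt < conflicts; attempt++) {
    if (attempt < 4) total += 0;       // immediate retry, no sleep
    else if (attempt < 10) total += 1;
    else if (attempt < 100) total += 5;
    else if (attempt < 200) total += 10;
    else total += 100;
  }
  return total;
}

console.log(totalSleepMs(981)); // 79556 ms, close to the observed 80 seconds
```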
Note that with mongostat you won't see individual retries: the writeConflicts counter is incremented only at the moment the operation finally succeeds. If you see large spikes, it's time to look at data modeling patterns or long-running operations.
Profiling
Another way to look at it is with the MongoDB profiler:
db.setProfilingLevel(2);
Date();
dbB.demo.updateOne({ _id: 'x' }, { $inc: { value: 8 } });
Date();
db.setProfilingLevel(0);
db.system.profile.find().sort({ ts: -1 }).limit(1).pretty();
Here is the output:
[
{
op: 'update',
ns: 'test.demo',
command: {
q: { _id: 'x' },
u: { '$inc': { value: 8 } },
multi: false,
upsert: false
},
keysExamined: 1014,
docsExamined: 1014,
nMatched: 1,
nModified: 1,
nUpserted: 0,
keysInserted: 0,
keysDeleted: 0,
writeConflicts: 1013,
numYield: 1013,
locks: {
ReplicationStateTransition: { acquireCount: { w: Long('1015') } },
Global: { acquireCount: { r: Long('1'), w: Long('1015') } },
Database: { acquireCount: { w: Long('1015') } },
Collection: { acquireCount: { w: Long('1015') } }
},
flowControl: { acquireCount: Long('1014') },
readConcern: { level: 'local', provenance: 'implicitDefault' },
storage: { data: { txnBytesDirty: Long('932') } },
cpuNanos: 145071200,
millis: 82965,
planSummary: 'EXPRESS_IXSCAN { _id: 1 },EXPRESS_UPDATE',
planningTimeMicros: 71,
totalOplogSlotDurationMicros: 135,
execStats: {
isCached: false,
stage: 'EXPRESS_UPDATE',
keyPattern: '{ _id: 1 }',
indexName: '_id_',
keysExamined: 1014,
docsExamined: 1014,
nReturned: 0,
nWouldModify: 1,
nWouldUpsert: 0,
nWouldDelete: 0
},
ts: ISODate('2025-07-23T17:18:48.082Z'),
client: '172.17.0.1',
appName: 'mongosh 2.5.0',
allUsers: [],
user: ''
}
]
The single update of one document (nModified: 1) was retried 1,014 times (keysExamined: 1014, docsExamined: 1014), as the first 1,013 attempts conflicted with the concurrent transaction (writeConflicts: 1013). The entire process took about 83 seconds (millis: 82965), but resource usage remained minimal: each retry yielded control (numYield: 1013), so the CPU time consumed was about 145 ms (cpuNanos: 145071200), less than 0.2% of the elapsed time. This conflict resolution uses optimistic concurrency control and is lock-free while waiting: MongoDB only briefly acquires intent write locks for each retry (acquireCount: { w: Long('1015') }), never blocking other work for the whole duration.
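As a quick sanity check on the CPU figure, using the cpuNanos and millis values from the profiler document above:

```javascript
// cpuNanos over elapsed millis, taken from the profiler output above.
const cpuMs = 145071200 / 1e6;     // ≈ 145 ms of CPU time
const pct = (cpuMs / 82965) * 100; // fraction of the 83 s elapsed: ≈ 0.17%
console.log(cpuMs.toFixed(1) + ' ms, ' + pct.toFixed(2) + '%');
```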
Summary
In MongoDB, single-document operations do not require explicit transactions and use a "wait-on-conflict" approach, handling conflicts automatically. This is achieved through optimistic concurrency control: MongoDB retries the update on conflict with a stepped backoff and stays lock-free while waiting. With a proper document model, single-document operations are the general case, as there's no need to normalize to multiple tables like in SQL databases.
In contrast, multi-document operations in explicit transactions adopt a “fail-on-conflict” strategy. When a conflict arises, MongoDB immediately returns an error, and your application must handle cancellation and retry the transaction. This is necessary because some of the application's actions may not be visible to the database. Such transactions must be kept short, and are canceled if they take longer than one minute. The reason for this limit is that, as in other MVCC databases, long-running transactions consume more resources to maintain a consistent snapshot of the database state from the start of the transaction.