A business transaction ideally involves a single roundtrip to the database. MongoDB lets a single document hold all of a transaction's data, which simplifies sharding and scaling. In contrast, SQL normalization spreads the data across multiple tables, which forces multi-statement transactions and multiple roundtrips, unless the logic is moved into stored procedures, which are often avoided because they differ from the application code in language, tooling, and deployment. The extra roundtrips translate into higher latency, especially when the application runs close to the user and far from the database.
- PostgreSQL transaction with pgbench
- PostgreSQL single auto-commit call
- MongoDB multi-document transaction
- MongoDB single-document transaction
- Summary: where is your business logic?
PostgreSQL transaction with pgbench
I start a lab with a PostgreSQL database server running in the background and a container for the client application:
docker rm -f app db
docker run --name db -d -e POSTGRES_PASSWORD=x postgres
docker run --link db:db --rm -it --privileged postgres bash
I add a 50 millisecond latency from the application container to simulate a deployment where the application runs in an edge ___location:
apt-get update && apt-get install -y iproute2
tc qdisc add dev eth0 root netem delay 50ms
I initialize and run pgbench with the default TPC-B-like workload, displaying the per-statement latencies:
pgbench -i postgres://postgres:x@db
pgbench -r postgres://postgres:x@db
Here is the output:
root@e18764bd2f77:/# pgbench -i postgres://postgres:x@db
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 1.75 s (drop tables 0.05 s, create tables 0.21 s, client-side generate 0.94 s, vacuum 0.24 s, primary keys 0.31 s).
root@e18764bd2f77:/# pgbench -r postgres://postgres:x@db
pgbench (17.5 (Debian 17.5-1.pgdg120+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
number of failed transactions: 0 (0.000%)
latency average = 353.092 ms
initial connection time = 263.115 ms
tps = 2.832121 (without initial connection time)
statement latencies in milliseconds and failures:
0.004 0 \set aid random(1, 100000 * :scale)
0.001 0 \set bid random(1, 1 * :scale)
0.001 0 \set tid random(1, 10 * :scale)
0.001 0 \set delta random(-5000, 5000)
50.226 0 BEGIN;
50.470 0 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
50.344 0 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
50.350 0 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
50.416 0 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
50.292 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
50.981 0 END;
This ran ten transactions from one client. With the artificial 50 ms latency, the connection took 263 ms and each transaction 353 ms on average, about seven times the network roundtrip latency. The reason is visible thanks to the -r option, which shows the per-statement response times: seven statements are sent over the network to start the transaction, run the DML, and commit it.
The default pgbench script is:
root@e18764bd2f77:/# pgbench --show-script=tpcb-like
-- tpcb-like: <builtin: TPC-B (sort of)>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
In PostgreSQL, BEGIN; is equivalent to START TRANSACTION; and starts a transaction block, disabling auto-commit. END; is equivalent to COMMIT; and commits the current transaction.
This method is inefficient because of the multiple network roundtrips between the application and the database, which increase latency. Each client-server roundtrip also involves system calls and context switches, increasing CPU usage, and it leaves the database backend in the worst state for performance and scalability: idle, yet holding locks. Ideally, all operations of a business transaction should be sent in a single request, either as an anonymous block or a stored procedure, to minimize roundtrips, reduce network overhead, and keep the connection stateless between calls.
PostgreSQL single auto-commit call
I can run the same in one block with a DO command in PostgreSQL:
DO '
begin
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
PERFORM abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
end;
';
Note that begin and end have a different meaning here: they mark the boundaries of the PL/pgSQL block. Without an explicitly started transaction, the call operates in auto-commit mode; one transaction per call is the default in PostgreSQL.
I changed SELECT to PERFORM because such a call cannot return a result. In practice, you would deploy this as a stored procedure, with exception handling and all the business logic inside it, which is why this approach is rarely used in modern applications. I use the DO block here only to show that it requires a single roundtrip to the database:
root@e18764bd2f77:/# pgbench --show-script=tpcb-like 2>&1 \
| sed \
-e "s/^BEGIN;/DO ' begin/" \
-e "s/^SELECT/PERFORM/" \
-e "s/^END;/end; ';/" \
| pgbench -r -f /dev/stdin postgres://postgres:x@db
pgbench (17.5 (Debian 17.5-1.pgdg120+1))
starting vacuum...end.
transaction type: /dev/stdin
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
number of failed transactions: 0 (0.000%)
latency average = 51.763 ms
initial connection time = 263.673 ms
tps = 19.318893 (without initial connection time)
statement latencies in milliseconds and failures:
0.002 0 \set aid random(1, 100000 * :scale)
0.001 0 \set bid random(1, 1 * :scale)
0.000 0 \set tid random(1, 10 * :scale)
0.000 0 \set delta random(-5000, 5000)
51.755 0 DO ' begin
The transaction took about 50 ms to complete, one network roundtrip. It was started and committed on the database side, which is why all business logic and error handling must occur there. Using stored procedures adds complexity to development, testing, and deployment. A DO block doesn't have to be deployed to the database first, but it complicates returning information to the caller. Both stored procedures and DO blocks are sent as PL/pgSQL code in a character string, interpreted at runtime, which increases the risk of runtime errors. Developers prefer to keep code in their application, using their language of choice, ensuring it is compiled, tested, packaged, and deployed consistently.
Multi-statement transactions in SQL databases struggle to scale because of the additional client-server roundtrips when all logic resides in the application tier. Normalization was developed when applications were deployed on the database server, using embedded SQL or stored procedures. This allowed transactions to execute multiple statements and acquire locks with two-phase locking, without waiting in between. However, with the rise of client-server and three-tier architectures, this stopped scaling.
Document databases utilize a different data modeling strategy, where a single document can contain all relevant transaction information. This approach allows the business logic to reside in the application code, enabling an entire business transaction to fit into a single atomic call.
MongoDB multi-document transaction
I ran the same workload on MongoDB, using one collection per table and a multi-document transaction. Since the issue lies in the data model rather than the database engine, the response time remains about the same. Document databases only show their advantage when the document model aligns with the business transactions, and normalization breaks that alignment. This is why benchmarks from PostgreSQL vendors can be misleading: they apply a normalized model to a database built for the document model.
Still, I've done it to prove the point, doing the same as pgbench from mongosh.
I start a lab with a MongoDB database server running in the background and a container for the client application:
docker rm -f app db
docker run --name db --hostname db -d mongo mongod --replSet rs0
docker run --link db:db --rm -it --privileged mongo bash
I add a 50 millisecond latency from the application container:
apt-get update && apt-get install -y iproute2
tc qdisc add dev eth0 root netem delay 50ms
I define it as a single-node replica set and start the MongoDB shell:
mongosh "mongodb://db" --eval '
rs.initiate( {_id: "rs0", members: [
{_id: 0, priority: 1, host: "db:27017"},
]});
'
mongosh "mongodb://db?replicaSet=rs0"
Here is my equivalent to pgbench -i:
// mongosh equivalent to pgbench -i
db.accounts.drop(); db.branches.drop(); db.tellers.drop(); db.history.drop();
db.branches.insertOne({ _id: 1, bbalance: 0 });
let tellerDocs = []; for (let i = 1; i <= 10; ++i) {
tellerDocs.push({ _id: i, bid: 1, tbalance: 0 });
} ;
db.tellers.insertMany(tellerDocs);
const nAccounts = 100000; const bulk = db.accounts.initializeUnorderedBulkOp();
for (let i = 1; i <= nAccounts; ++i) {
bulk.insert({ _id: i, bid: 1, abalance: 0 });
if (i % 10000 === 0) print(`inserted ${i} accounts`);
} ;
bulk.execute();
This has created three collections and initialized them with data; "history" is empty and will be created on first use:
rs0 [primary] test> show collections
accounts
branches
tellers
Here is my equivalent to pgbench with all default options:
// mongosh equivalent to pgbench with all default options
// Measure connection time
let t0 = Date.now();
let session = db.getMongo().startSession()
let sess_db = session.getDatabase(db.getName());
let connTime = Date.now() - t0;
// Run 10 transactions
let fail = 0;
let totalTime = 0;
let nTx = 10
for (let i = 0; i < nTx; ++i) {
let t1 = Date.now();
let aid = Math.floor(Math.random() * (100000)) + 1;
let bid = Math.floor(Math.random() * (1)) + 1;
let tid = Math.floor(Math.random() * (10)) + 1;
let delta = Math.floor(Math.random() * 10001) - 5000;
// BEGIN;
session.startTransaction();
// UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
sess_db.accounts.updateOne({_id: aid}, {$inc: {abalance: delta}});
// SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
let acc = sess_db.accounts.findOne({_id: aid}, {abalance: 1, _id: 0});
// UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
sess_db.tellers.updateOne({_id: tid}, {$inc: {tbalance: delta}});
// UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
sess_db.branches.updateOne({_id: bid}, {$inc: {bbalance: delta}});
// INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
sess_db.history.insertOne({tid, bid, aid, delta, mtime: new Date()});
// END;
session.commitTransaction();
totalTime += Date.now() - t1;
}
session.endSession();
// display timings
const avgLat = totalTime / nTx;
const tps = nTx / (totalTime / 1000);
print("latency average =", avgLat.toFixed(3), "ms");
print("initial connection time =", connTime.toFixed(3), "ms");
print("tps =", tps.toFixed(6), "(without initial connection time)");
Here is the output:
...
... print("latency average =", avgLat.toFixed(3), "ms");
... print("initial connection time =", connTime.toFixed(3), "ms");
... print("tps =", tps.toFixed(6), "(without initial connection time)");
latency average = 319.100 ms
initial connection time = 1.000 ms
tps = 3.133814 (without initial connection time)
One difference with PostgreSQL is that starting a transaction doesn't require a roundtrip to the database in MongoDB, saving 50 ms of latency in this lab. Each statement takes about 2 ms more, but that's not comparable because PostgreSQL defers a lot of work to vacuum later, and the consequence is not visible when running only ten transactions. MongoDB's throughput is a bit higher than PostgreSQL's for multi-table/collection transactions, but not tremendously so, because the real benefit comes from the document model, which is not used here.
I check that the ten transactions have been recorded in "history":
rs0 [primary] test> db.history.find().sort({mtime:1}).forEach(
doc => print(JSON.stringify(doc))
);
{"_id":"687c0863fafd81ddbcbaa8b9","tid":9,"bid":1,"aid":79275,"delta":2113,"mtime":"2025-07-19T21:04:35.438Z"}
{"_id":"687c0863fafd81ddbcbaa8ba","tid":10,"bid":1,"aid":12931,"delta":-5,"mtime":"2025-07-19T21:04:35.767Z"}
{"_id":"687c0864fafd81ddbcbaa8bb","tid":7,"bid":1,"aid":73292,"delta":-2319,"mtime":"2025-07-19T21:04:36.084Z"}
{"_id":"687c0864fafd81ddbcbaa8bc","tid":2,"bid":1,"aid":74453,"delta":-2909,"mtime":"2025-07-19T21:04:36.402Z"}
{"_id":"687c0864fafd81ddbcbaa8bd","tid":8,"bid":1,"aid":25159,"delta":-1522,"mtime":"2025-07-19T21:04:36.721Z"}
{"_id":"687c0865fafd81ddbcbaa8be","tid":5,"bid":1,"aid":21455,"delta":-2985,"mtime":"2025-07-19T21:04:37.036Z"}
{"_id":"687c0865fafd81ddbcbaa8bf","tid":8,"bid":1,"aid":66059,"delta":328,"mtime":"2025-07-19T21:04:37.353Z"}
{"_id":"687c0865fafd81ddbcbaa8c0","tid":8,"bid":1,"aid":58666,"delta":-4803,"mtime":"2025-07-19T21:04:37.668Z"}
{"_id":"687c0865fafd81ddbcbaa8c1","tid":1,"bid":1,"aid":99695,"delta":-4717,"mtime":"2025-07-19T21:04:37.987Z"}
{"_id":"687c0866fafd81ddbcbaa8c2","tid":9,"bid":1,"aid":15122,"delta":-20,"mtime":"2025-07-19T21:04:38.304Z"}
My business transactions, including deposits and withdrawals, are fully recorded in this collection. In contrast, other collections only maintain the current balance to avoid aggregating all historical operations. While this approach is valid, should the client application, which is close to the user and awaits completion, be responsible for such optimization?
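To make that trade-off concrete, here is a sketch (not part of the original run) of what computing a balance by aggregating the full history would look like, using one of the account ids from the output above; maintaining a current balance in "accounts" avoids running this on every read:
// hypothetical: derive a balance by aggregating all recorded operations for one account
db.history.aggregate([
  { $match: { aid: 12931 } },                               // an account id from the run above
  { $group: { _id: "$aid", abalance: { $sum: "$delta" } } } // sum of all deposits and withdrawals
]);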
MongoDB single-document transaction
In MongoDB, achieving the same result in a single call is done not through interpreted procedural code or stored procedures, but through proper document design. The TPC-B-like benchmark records a transaction that modifies an account balance and updates summaries per teller and per branch. This workload was designed to stress the database in a non-scalable manner: the teller and branch summaries are hotspots.
In a proper application, the business transaction is recorded in the "history" collection. Summaries can be updated asynchronously by applying the transaction information, and a view can compute the same in real time if there's a need to see the current summary before it is applied to the account, teller, or branch collections. In that case, the workload on which latency must be measured is a single-document insert into "history", with an additional field to flag the records that have not yet been applied to the summaries.
Here is the code which records transactions in one atomic call to the database service:
let t0 = Date.now();
// No explicit session needed
let connTime = Date.now() - t0;
let totalTime = 0;
let nTx = 10;
for (let i = 0; i < nTx; ++i) {
let t1 = Date.now();
let aid = Math.floor(Math.random() * 100000) + 1;
let bid = 1; // for scale 1
let tid = Math.floor(Math.random() * 10) + 1;
let delta = Math.floor(Math.random() * 10001) - 5000;
db.history.insertOne({
tid: tid,
bid: bid,
aid: aid,
delta: delta,
mtime: new Date(),
to_apply: true // pending for background applier
});
totalTime += Date.now() - t1;
}
const avgLat = totalTime / nTx;
const tps = nTx / (totalTime / 1000);
print("latency average =", avgLat.toFixed(3), "ms");
print("initial connection time =", connTime.toFixed(3), "ms");
print("tps =", tps.toFixed(6), "(without initial connection time)");
There's a single round-trip, visible with my artificial network latency of 50ms:
... print("latency average =", avgLat.toFixed(3), "ms");
... print("initial connection time =", connTime.toFixed(3), "ms");
... print("tps =", tps.toFixed(6), "(without initial connection time)");
latency average = 54.400 ms
initial connection time = 0.000 ms
tps = 18.382353 (without initial connection time)
The asynchronous apply must run in a multi-document transaction, but the higher latency is not a problem as it runs in the background, and it can run closer to the database:
let session = db.getMongo().startSession()
let sess_db = session.getDatabase(db.getName());
while (true) {
try {
session.startTransaction();
let txn = sess_db.history.findOneAndUpdate(
{ to_apply: true, bid: 1 },
{ $unset: { to_apply: "" } }, // remove flag when all applied
{ sort: { mtime: 1 }, returnDocument: "after" }
);
if (!txn) { session.abortTransaction(); break; }
sess_db.accounts.updateOne( { _id: txn.aid }, { $inc: { abalance: txn.delta } } );
sess_db.tellers.updateOne( { _id: txn.tid }, { $inc: { tbalance: txn.delta } } );
sess_db.branches.updateOne( { _id: txn.bid }, { $inc: { bbalance: txn.delta } } );
sess_db.history.updateOne( { _id: txn._id }, { $set: { applied_time: new Date() } } );
session.commitTransaction();
} catch (e) {
print("APPLY ERROR:", e);
session.abortTransaction();
}
}
session.endSession();
This gives an idea, but it can be improved by processing a batch of records in a single transaction, as sketched below. I've hardcoded the branch "bid" since there's only one here, but running one job per branch can make it more scalable.
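For illustration, here is a minimal sketch of such a batched applier, mirroring the single-record version above; the batch size of 100 is an arbitrary assumption, and a production job would also retry on transient transaction errors:
// minimal sketch, assuming the same field names as above; batchSize is arbitrary
const batchSize = 100;
let session = db.getMongo().startSession();
let sess_db = session.getDatabase(db.getName());
while (true) {
  try {
    session.startTransaction();
    // read a batch of pending records for one branch, oldest first
    let batch = sess_db.history.find({ to_apply: true, bid: 1 })
                               .sort({ mtime: 1 })
                               .limit(batchSize)
                               .toArray();
    if (batch.length === 0) { session.abortTransaction(); break; }
    // apply every record of the batch to the summaries
    for (const txn of batch) {
      sess_db.accounts.updateOne({ _id: txn.aid }, { $inc: { abalance: txn.delta } });
      sess_db.tellers.updateOne({ _id: txn.tid }, { $inc: { tbalance: txn.delta } });
      sess_db.branches.updateOne({ _id: txn.bid }, { $inc: { bbalance: txn.delta } });
    }
    // flag the whole batch as applied in the same transaction
    sess_db.history.updateMany(
      { _id: { $in: batch.map(d => d._id) } },
      { $unset: { to_apply: "" }, $set: { applied_time: new Date() } }
    );
    session.commitTransaction();
  } catch (e) {
    print("BATCH APPLY ERROR:", e);
    session.abortTransaction();
  }
}
session.endSession();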
A partial index provides fast access to the records that remain to be applied, and it doesn't add significant latency to the inserts since it is maintained by the database:
db.history.createIndex(
{ bid: 1, aid: 1 },
{ partialFilterExpression: { to_apply: true } }
);
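As a quick sanity check, and as an assumption on my side rather than something shown in the original run, the query plan can confirm that a per-account lookup on pending records is eligible for this partial index (53077 is an arbitrary account id, the one used in the next example):
// inspect the winning plan for the per-account pending lookup
db.history.find(
  { to_apply: true, bid: 1, aid: 53077 }
).explain("queryPlanner").queryPlanner.winningPlan;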
If there's a need to get the real-time balance before it is applied, an aggregation pipeline can read the non-applied records for an account:
db.accounts.aggregate([
{ $match: { _id: 53077 } },
{ $lookup: {
from: "history",
let: { aid: "$_id", bid: "$bid" },
pipeline: [
{ $match: { $expr: { $and: [
{ $eq: ["$to_apply", true] }, // find the pending transactions to apply
{ $eq: ["$aid", "$$aid"] } , // for the same account
{ $eq: ["$bid", "$$bid"] } // in the same branch
] } } },
{
$group: {
_id: null,
total: { $sum: "$delta" }, // sum their withdrawal and deposits
mtime: { $max: "$mtime" } // with the last modification timestamp
}
}
],
as: "pending" // this lists all pending transactions
}
},
{ $addFields: {
abalance: {
$add: [ // add this sum to the balance
"$abalance",
{ $ifNull: [ { $arrayElemAt: [ "$pending.total", 0 ] }, 0 ] }
]
},
mtime: { $arrayElemAt: [ "$pending.mtime", 0 ] }
}
},
{ $project: { pending: 0 } } // Remove the pending array from output
])
I run it with an account that has a balance of 42 and a non-applied record with a deposit of 4335 and check that the query is correct:
...
... { $project: { pending: 0 } } // Remove the pending array from output
... ])
[
{
_id: 53077,
bid: 1,
abalance: 4377,
mtime: ISODate('2025-07-19T22:14:17.668Z')
}
]
rs0 [primary] test> db.accounts.find({ _id: 53077 })
[ { _id: 53077, bid: 1, abalance: 42 } ]
rs0 [primary] test> db.history.find({ aid: 53077 })
[
{
_id: ObjectId('687c18b9cde3d760bbbaa8fd'),
tid: 8,
bid: 1,
aid: 53077,
delta: 4335,
mtime: ISODate('2025-07-19T22:14:17.668Z'),
to_apply: true
}
]
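As mentioned earlier, a view can expose this real-time balance. Here is a sketch of that idea: the view name is my own assumption, and the $match stage is removed because a view cannot be parameterized, so the account is filtered at query time instead:
// sketch: register the real-time balance computation as a view
db.createView("accounts_realtime", "accounts", [
  { $lookup: {
      from: "history",
      let: { aid: "$_id", bid: "$bid" },
      pipeline: [
        { $match: { $expr: { $and: [
          { $eq: ["$to_apply", true] },  // pending transactions
          { $eq: ["$aid", "$$aid"] },    // for the same account
          { $eq: ["$bid", "$$bid"] }     // in the same branch
        ] } } },
        { $group: { _id: null, total: { $sum: "$delta" }, mtime: { $max: "$mtime" } } }
      ],
      as: "pending"
  } },
  { $addFields: {
      abalance: { $add: [ "$abalance", { $ifNull: [ { $arrayElemAt: ["$pending.total", 0] }, 0 ] } ] },
      mtime: { $arrayElemAt: ["$pending.mtime", 0] }
  } },
  { $project: { pending: 0 } }
]);
// query it like a collection:
db.accounts_realtime.find({ _id: 53077 });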
If you have always worked with SQL databases, and have built all your demonstrations of transactions, ACID properties, and locking on the traditional bank account transfer example, you may be surprised that you have to think differently. However, if you step back and look at the application globally, you will see that it makes sense to decouple the business logic, like recording a business transaction, from the data logic, like maintaining summaries.
If you are not convinced by the architectural argument, simply check your online banking application after using your credit card. You'll find numerous pending, billed, or refunded transactions. No payment application locks database state for the entire duration of the purchase, and no modern database locks the account when reading the current balance, so slightly stale reads always happen.
Summary: where is your business logic?
A business transaction, especially when the user is waiting for completion, should ideally complete in one round trip to the database to avoid unpredictable latency.
For latency-sensitive online applications, it is advisable to avoid executing multi-statement SQL transactions, which normalization dictates, from the application tier. Similarly, multi-document transactions in MongoDB should be reserved for background jobs, or for cases where performance is not critical.
In SQL databases, the most efficient approach is to encapsulate the business logic in a single auto-commit call to a stored procedure, minimizing network roundtrips and avoiding incremental locking. This has been promoted under the SmartDB name by very smart people at Oracle:
This is the Smart Database paradigm: select, insert, update, delete, merge, commit, and rollback are issued only from database PL/SQL. Developers and end-users of applications built this way are happy with their correctness, maintainability, security, and performance. But when developers follow the NoPlsql paradigm, their applications have problems in each of these areas and end-users suffer.
Not all enterprises want to code their applications in Oracle PL/SQL stored procedures. PostgreSQL offers PL/pgSQL, but it is interpreted and has limitations, such as the inability to commit inside a block that has an exception handler.
In contrast, when the business and data logic remain in the application code, MongoDB's document model allows business transactions to map directly to single-document operations, which are inherently ACID without additional transaction control. Correctness, maintainability, security, and performance are part of the application and follow the same development, testing, and deployment process. It requires only one atomic call per business transaction, eliminating the back-and-forth dialogue between client and database to update normalized tables. This design results in lower latency and simpler, more scalable data architectures for modern applications.