on software development (possibly other stuff)

MongoDB Transaction Across Multiple Documents using Async and Mongoose

2 comments
Unlike traditional databases, MongoDB does not support transactions. So suppose you have multiple documents and want to perform a "save all or nothing" operation, you'd have to simulate a transaction from within your application.

Implementing such a transaction in NodeJS can be tricky because IO operations are generally asynchronous. This can lead to nested callbacks as demonstrated by the code below:
var MyModel = require('mongoose').model('MyModel');

var docs = [];
MyModel.create({ field: 'value1' }, function (err, doc) {
  if (err) { console.log(err); }
  else {
    docs.push(doc);
    MyModel.create({ field: 'value2' }, function (err, doc) {
      if (err) { rollback(docs); }
      else {
        docs.push(doc);
        MyModel.create({ field: 'value3' }, function (err, doc) {
          if (err) { rollback(docs); }
          else {
            console.log('Done.');
          }
        });
      }
    });
  }
});
Despite not having a dependency on other documents, inserting each document must be performed in series. This is because we have no way of finding out when all other documents will finish saving. This approach is problematic because it leads to deeper nesting as the number of documents increase. Also, in the above example, the callback functions modify the docs variable - this is a side effect and breaks functional principles. With the help of Async, both issues can be addressed.

The parallel() function of Async allows you run multiple functions in parallel. Each function signals completion by invoking a callback to which either the result of the operation or an error is passed. Once all functions are completed, an optional callback function is invoked to handle the results or errors.

The above code can be improved by implementing document insertions as functions of parallel(). If an insert succeeds, the inserted document will be passed to the callback; otherwise, the error will be passed (these can later be used when a rollback is required). Once all the parallel functions complete, the parallel callback can perform a rollback if any of the functions failed to save.

Below is the improved version using Async:
var async    = require('async'),
    mongoose = require('mongoose');

var MyModel = mongoose.model('MyModel');

async.parallel([
    function (callback) {
      MyModel.create({ field: 'value1' }, callback);
    },
    function (callback) {
      MyModel.create({ field: 'value2' }, callback);
    },
    function (callback) {
      MyModel.create({ field: 'value3' }, callback);
    }
  ],
  function (errs, results) {
    if (errs) {
      async.each(results, rollback, function () {
        console.log('Rollback done.');
      });
    } else {
      console.log('Done.');
    }
  });

function rollback (doc, callback) {
  if (!doc) { callback(); }
  else {
    MyModel.findByIdAndRemove(doc._id, function (err, doc) {
      console.log('Rolled-back document: ' + doc);
      callback();
    });
  }
}
Lines 8, 11, and 14 inserts a new document to MongoDB then passes either the saved document or an error to the callback.

Lines 18-21 checks if any of the parallel functions threw an error, and calls the rollback() function on each document passed to callback of parallel().

Lines 28-33 performs the rollback by deleting the inserted documents using their IDs.

Edit: As pointed-out in the comments, the rollback itself can fail. In such scenarios, it is best to retry the rollback until it succeeds. Therefore, the rollback logic must be an idempotent operation.

2 comments :

  1. Thanks for this write-up. Async looks like a good option to write cleaner code when you need to atomically write multiple documents.

    But what happens if the rollback doesn't succeed? I could imagine this situation:

    - value1 is inserted in database

    - value2 is inserted in database

    - Network connection to database goes down

    - Attempt to insert value3 results in an error, rollback function is called

    - Attempt to delete value1 results in error

    - Attempt to delete value2 results in error

    - Network connection to database comes back up

    Now the database is in an inconsistent state, something we wanted to avoid. To fix this sitution, I think you'd need to add a mechanism to retry the rollback. Are there other (better? easier?) ways to fix it though?

    ReplyDelete
  2. Yes, you are correct, that is one caveat I failed to mentioned.


    The errors I had in mind were more around data validation within Mongoose, but as you pointed out, the rollback itself can fail.


    Unfortunately, I have no solution for that as of the moment other than retry the rollback as you suggested.

    ReplyDelete