Batch insert in MongoDB using node.js and Monk

As a test, I decided to use node.js and Monk to mimic a bulk import into MongoDB from a source file containing a JSON array. I say « mimic » because Monk does not implement MongoDB bulk insert per se.

My first try was to insert the whole ‘data’ array, parsed from the JSON file, into a ‘films’ collection (the initialization code is shown further down).

films.insert( data,
  function(err, doc) {
      if (err) throw err;
  });

This does insert each array item into MongoDB as an individual document, with its own id, which is fine.

The issue was that, on execution, the script got stuck somewhere and the prompt never came back until I hit Ctrl-C. Browsing MongoDB with the ‘mongo’ shell showed that the docs had nevertheless been stored in the database.

Some would just add a call that closes the database right *after* the insert directive, but that would miss the point: the insert function call is asynchronous, and the insertion could still be running when the close directive is sent, ending with an exception of type « Connection Closed By Application ».

This typically happens when you don’t import a JSON array as a block but instead have to insert a large set of individual items (e.g. from a set of individual files), which takes some time.
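The race can be reproduced without a database. The following is only a sketch: createFakeDb and closeTooEarly are hypothetical names of mine, not part of Monk, and setImmediate merely stands in for the delay of a real asynchronous insert.

```javascript
// Hypothetical stand-in for a database connection: a simulation, not Monk.
function createFakeDb() {
  var db = { closed: false, stored: [], errors: [] };
  // Simulated async insert: the write happens on a later turn of the event loop.
  db.insert = function (doc, callback) {
    setImmediate(function () {
      if (db.closed) {
        return callback(new Error('connection closed by application'));
      }
      db.stored.push(doc);
      callback(null, doc);
    });
  };
  db.close = function () { db.closed = true; };
  return db;
}

// Start every insert, then close immediately: the close wins the race.
function closeTooEarly(docs, done) {
  var db = createFakeDb();
  var pending = docs.length;
  docs.forEach(function (doc) {
    db.insert(doc, function (err) {
      if (err) db.errors.push(err.message);
      pending--;
      if (pending === 0) done(db);
    });
  });
  db.close(); // runs before any insert callback has fired
}

closeTooEarly([{ title: 'A' }, { title: 'B' }], function (db) {
  console.log('%d stored, %d failed', db.stored.length, db.errors.length);
});
```

In this simulation every insert fails, because the close call runs synchronously while the inserts are still queued.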

The good practice (*) is therefore to close the db from the insert callback, after the last item has been processed. In the following code, I use insert > count > close for the demonstration.

var mongo = require('mongodb');
var monk = require('monk');
var fs = require('fs');

// one param = path of the source json file, with docs to store
if (process.argv.length < 3)
  return console.log("missing file path");
var file = process.argv[2];

// create a Mongodb connection on the 'cinema' db
var db = monk('localhost:27017/cinema');
// access the 'films' collection
var films = db.get('films');

// read the complete file.
// no stream, as we have to modify the array on the fly
var jsondata = fs.readFileSync(file, 'utf8');

// transform json to array
var data = JSON.parse(jsondata);
console.log(data.length + ' items to store');

// save the docs, one after the other (no bulk insert with monk)
// need to count inserted docs for closing the db *after* the last doc has been saved
var doccount = 0;
for (var i=0; i<data.length; i++) {
  // insert a doc (async); here I filter the french version (.fr) of the data[i] structure
  films.insert( data[i].fr,
    function(err, doc) {
        if (err) throw err;
        // one more doc has been stored
        // (god, I'd like to use ++doccount in the condition,
        // but it is said "bad practice" in js)
        doccount++;
        // when the last doc has been stored
        if (doccount == data.length) {
          // get and display the new collection size
          films.count({}, function(err, count) {
            if (err) throw err;
            console.log('%d docs in %s', count, 'films');
            // close the mongodb database
            db.close();
          });
        }
    });
}

Note that the loop variable i cannot be used to detect the last insertion, as i may already be equal to data.length *before* the first insertion callback is called.
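A small sketch makes this visible. It assumes nothing about Monk: indexSeenByCallbacks is a hypothetical helper of mine, and setImmediate again stands in for an asynchronous insert.

```javascript
// Record the value of the loop index as seen from inside each async callback.
function indexSeenByCallbacks(items, done) {
  var seen = [];
  for (var i = 0; i < items.length; i++) {
    // setImmediate stands in for an asynchronous insert
    setImmediate(function () {
      // by the time this runs, the loop has finished and i === items.length
      seen.push(i);
      if (seen.length === items.length) done(seen);
    });
  }
}

indexSeenByCallbacks(['a', 'b', 'c'], function (seen) {
  console.log(seen); // [ 3, 3, 3 ]
});
```

Every callback sees the same final value of i, which is why a separate counter, incremented inside the callback, is needed.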

(*) Disclaimer: I’m still a newcomer to node.js, so there may be better ways to achieve what I want; suggestions for something more effective are welcome.
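One possible alternative, sketched here under assumptions, is to let promises do the counting. This presumes an insert function that returns a promise (recent Monk versions are documented as doing so, but check yours). insertAllThenClose, fakeInsert and fakeClose are hypothetical names, and the fake store only exists so that the sketch runs without a database.

```javascript
// insertOne: any function taking a doc and returning a promise (assumption).
// close: called exactly once, whether the inserts succeed or fail.
function insertAllThenClose(docs, insertOne, close) {
  return Promise.all(docs.map(function (doc) { return insertOne(doc); }))
    .then(
      function (results) { close(); return results.length; },
      function (err) { close(); throw err; }
    );
}

// Simulated store, so the sketch runs without a database
var stored = [];
var closed = false;
function fakeInsert(doc) {
  return new Promise(function (resolve) {
    setImmediate(function () { stored.push(doc); resolve(doc); });
  });
}
function fakeClose() { closed = true; }

insertAllThenClose([{ title: 'A' }, { title: 'B' }], fakeInsert, fakeClose)
  .then(function (count) {
    console.log('%d docs stored, closed: %s', count, closed);
  });
```

Promise.all resolves only once every insert has resolved, so the manual counter disappears and the close call has a single, obvious place.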