{"id":120,"date":"2015-10-26T16:40:13","date_gmt":"2015-10-26T15:40:13","guid":{"rendered":"http:\/\/www.fluxnumerique.fr\/?p=120"},"modified":"2020-12-13T10:12:43","modified_gmt":"2020-12-13T09:12:43","slug":"batch-insert-in-mongodb-using-node-js-and-monk","status":"publish","type":"post","link":"https:\/\/www.fluxnumerique.fr\/?p=120","title":{"rendered":"Batch insert in MongoDB using node.js and Monk"},"content":{"rendered":"\n<p>As a test, I decided to use node.js and Monk to mimic a bulk import in MongoDB, from a source file containing a JSON array. I say \u00ab\u00a0mimic\u00a0\u00bb, as Monk does not implement <a href=\"https:\/\/docs.mongodb.org\/manual\/core\/bulk-write-operations\/\">MongoDB bulk insert<\/a>&nbsp;per se.<\/p>\n\n\n\n<p>My first try was to insert the whole &lsquo;data&rsquo; array, parsed from the JSON file, into the &lsquo;films&rsquo; collection (the initialization&nbsp;code is shown further down).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted theme:sublime-text lang:default decode:true\">films.insert( data,\n  function(err, doc) {\n      if (err) throw err;\n  });<\/pre>\n\n\n\n<p>This does insert each array item into MongoDB as an individual document with its own id, which is fine.<\/p>\n\n\n\n<p>The issue was that the script then hung: the prompt never came back until I hit Ctrl-C, because the open database connection keeps the node.js process alive.&nbsp;Browsing&nbsp;MongoDB&nbsp;with the &lsquo;mongo&rsquo; shell showed that the docs were nevertheless stored in the database.<\/p>\n\n\n\n<p>Some would just add a &#8230;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted theme:sublime-text lang:default decode:true\">db.close();<\/pre>\n\n\n\n<p>&#8230; *after* the insert directive, but that would miss the point: the insert function call is asynchronous, and the insertion could still be running when the close directive is sent, ending with an exception of type \u00ab\u00a0Connection Closed By Application\u00a0\u00bb.<\/p>\n\n\n\n<p>This happens if you 
don&rsquo;t import a JSON array as a block, but instead have to insert a large set of individual items (e.g. from a set of individual files), which takes some time.<\/p>\n\n\n\n<p>Good practice (*) is&nbsp;therefore to close the db from the insert callback, after the last item has been processed. In the following code, I use insert &gt; count &gt; close for the demonstration.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted theme:sublime-text lang:default decode:true\">var mongo = require('mongodb');\nvar monk = require('monk');\nvar fs = require('fs');\n\n\/\/ one param = path of the source json file, with docs to store\nif (process.argv.length &lt; 3)\n  return console.log(\"missing file path\");\nvar file = process.argv[2];\n\n\/\/ create a Mongodb connection on the 'cinema' db\nvar db = monk('localhost:27017\/cinema');\n\/\/ access the 'films' collection\nvar films = db.get('films');\n\n\/\/ read the complete file.\n\/\/ no stream, as we have to modify the array on the fly\nvar jsondata = fs.readFileSync(file, 'utf8');\n\n\/\/ transform json to array\nvar data = JSON.parse(jsondata);\nconsole.log(data.length + ' items to store');\n\n\/\/ save the docs, one after the other (no bulk insert with monk)\n\/\/ need to count inserted docs for closing the db *after* the last doc has been saved\nvar doccount = 0;\nfor (var i=0; i&lt;data.length; i++) {\n  \/\/ insert a doc (async); here I filter the french version (.fr) of the data[i] structure\n  films.insert( data[i].fr,\n    function(err, doc) {\n        if (err) throw err;\n        \/\/ when the last doc has been stored\n        \/\/ (god, I'd like to use ++doccount in the condition,\n        \/\/ but it is said to be \"bad practice\" in js)\n        doccount++;\n        if (doccount === data.length) {\n          \/\/ get and display the new collection size\n          films.count({}, function(err, count) {\n            if (err) throw err;\n            console.log('%d docs in %s', count, 'films');\n            \/\/ close the 
mongodb database\n            db.close();\n          });\n        }\n    });\n}<\/pre>\n\n\n\n<p>Note that the loop variable i cannot be used to detect the last insertion: the loop runs to completion, so i is already equal to data.length *before* the first insertion callback is called.<\/p>\n\n\n\n<p>(*) Disclaimer: I&rsquo;m still a newcomer to node.js, so there may&nbsp;be better ways to achieve what I want; suggestions for something more effective are welcome&nbsp;&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As a test, I have decided to use node.js and Monk to mimic a bulk import in MongoDB, from a source file containing a JSON array. I say \u00ab\u00a0mimic\u00a0\u00bb, as Monk does not implement MongoDB bulk insert&nbsp;per se. My first try was to insert the whole &lsquo;data&rsquo; array in a &lsquo;films&rsquo; collection, parsed from the&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-120","post","type-post","status-publish","format-standard","hentry","category-nosql"],"_links":{"self":[{"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/posts\/120","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=120"}],"version-history":[{"count":10,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/posts\/120\/revisions"}],"predecessor-version":[{"id":148,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=\/wp\/v2\/posts\/120\/revisions\/148
"}],"wp:attachment":[{"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=120"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=120"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fluxnumerique.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=120"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}