{"id":6376,"date":"2014-04-17T01:13:26","date_gmt":"2014-04-17T01:13:26","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/04\/17\/using-nodejs-async-to-process-a-large-xml-file-with-relationships-collection-of-common-programming-errors-2\/"},"modified":"2014-04-17T01:13:26","modified_gmt":"2014-04-17T01:13:26","slug":"using-nodejs-async-to-process-a-large-xml-file-with-relationships-collection-of-common-programming-errors-2","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/04\/17\/using-nodejs-async-to-process-a-large-xml-file-with-relationships-collection-of-common-programming-errors-2\/","title":{"rendered":"Using nodejs async to process a large xml file (with relationships)-Collection of common programming errors"},"content":{"rendered":"<p>I have to process a large XML file (around 25 MB in size), and organize the data into documents to import into MongoDB.<\/p>\n<p>The issue is, there are around 5-6 types of elements in the XML document, each with around 10k rows.<\/p>\n<p>After fetching one XML node of type a, I have to fetch its corresponding elements of types b, c, d, etc.<\/p>\n<p>What I am trying to do in node:<\/p>\n<ol>\n<li>Fetch all the rows of type a.<\/li>\n<li>For each row, using XPath, find its corresponding related rows, and create the document.<\/li>\n<li>Insert the document into MongoDB.<\/li>\n<\/ol>\n<p>If there are 10k rows of type a, the second step runs 10k times. I am trying to run these in parallel so that the whole job doesn&#8217;t take forever. 
Hence, async.forEach seemed to be the perfect solution.<\/p>\n<p><code>async.forEach(rowsA,fetchA);<\/code><\/p>\n<p>My fetchA function is roughly like this:<\/p>\n<pre><code>var fetchA = function(rowA) {\n\/\/convert the XML row into an object\n    var obj = {};\n    for(i in rowA.attributes) {\n    attribute = rowA.attributes[i];\n    if(attribute.value === undefined) \n        continue;\n    obj[attribute.name] = attribute.value;\n    }\n    console.log(obj.someattribute);\n    \/\/find other related rows;\n    \/\/callback inserts the modified object with the subdocuments\n    findRelations(obj,function(obj){\n        insertA(obj,postInsert);\n    });\n};\n<\/code><\/pre>\n<p>When I run this, the console.log in the code fires only about once every 1.5 seconds, not in parallel for every row as I expected. I have been scratching my head and trying to figure this out for the past two hours, but I am not sure what I am doing wrong.<\/p>\n<p>I am not very adept with node, so please be patient.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have to process a large XML file (around 25 MB in size), and organize the data into documents to import into MongoDB. The issue is, there are around 5-6 types of elements in the XML document, each with around 10k rows. 
After fetching one XML node of type a, I have to fetch its [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-6376","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/6376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=6376"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/6376\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=6376"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=6376"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=6376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}