{"id":7445,"date":"2014-06-19T03:58:09","date_gmt":"2014-06-19T03:58:09","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/06\/19\/meteor-app-deployed-to-digital-ocean-stuck-at-100-cpu-and-oom-collection-of-common-programming-errors\/"},"modified":"2014-06-19T03:58:09","modified_gmt":"2014-06-19T03:58:09","slug":"meteor-app-deployed-to-digital-ocean-stuck-at-100-cpu-and-oom-collection-of-common-programming-errors","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/06\/19\/meteor-app-deployed-to-digital-ocean-stuck-at-100-cpu-and-oom-collection-of-common-programming-errors\/","title":{"rendered":"Meteor app deployed to Digital Ocean stuck at 100% CPU and OOM-Collection of common programming errors"},"content":{"rendered":"<p>I have a Meteor (0.8.0) app deployed using Meteor Up to Digital Ocean that&#8217;s been stuck at 100% CPU, only to crash with out of memory, and start up again at 100% CPU. It&#8217;s been stuck like this for the past 24 hours. The weird part is nobody is using the server and meteor.log isn&#8217;t showing much clues. I&#8217;ve got MongoHQ with oplog for the database.<\/p>\n<p>Digital Ocean specs:<\/p>\n<p>1GB Ram 30GB SSD Disk New York 2 Ubuntu 12.04.3 x64<\/p>\n<p>Screenshot showing issue:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/i.stack.imgur.com\/GsK7o.png\" \/><\/p>\n<p>Note that the screenshot was captured yesterday and it has stayed pegged at 100% cpu until it crashes with out of memory. 
The log shows:<\/p>\n<blockquote>\n<p>FATAL ERROR: Evacuation Allocation failed &#8211; process out of memory<\/p>\n<p>error: Forever detected script was killed by signal: SIGABRT<\/p>\n<p>error: Forever restarting script for 5 time<\/p>\n<\/blockquote>\n<p>Top displays:<\/p>\n<p><strong>26308 meteorus 20 0 1573m 644m 4200 R 98.1 64.7 32:45.36 node<\/strong><\/p>\n<p>How it started: I have an app that takes in a list of emails via CSV or MailChimp OAuth, sends them off to FullContact via their batch process call (http:\/\/www.fullcontact.com\/developer\/docs\/batch\/), and then updates the Meteor collections according to the response status. A snippet from a 200 response:<\/p>\n<pre><code>if (result.statusCode === 200) {\n    var data = JSON.parse(result.content);\n    \/\/ FullContact rate-limit headers\n    var rate_limit = result.headers['x-rate-limit-limit'];\n    var rate_limit_remaining = result.headers['x-rate-limit-remaining'];\n    var rate_limit_reset = result.headers['x-rate-limit-reset'];\n    console.log(rate_limit);\n    console.log(rate_limit_remaining);\n    console.log(rate_limit_reset);\n    _.each(data.responses, function(resp, key) {\n        var email = key.split('=')[1];\n        if (resp.status === 200) {\n            var sel = {\n                email: email,\n                listId: listId\n            };\n            Profiles.upsert(sel, {\n                $set: sel\n            }, function(err, result) {\n                if (!err) {\n                    console.log(\"Upsert \", result);\n                    fullContactSave(resp, email, listId, Meteor.userId());\n                }\n            });\n            RawCsv.update(sel, {\n                $set: {\n                    processed: true,\n                    status: 200,\n                    updated_at: new Date().getTime()\n                }\n            }, {\n                multi: true\n            });\n        }\n    });\n}\n<\/code><\/pre>\n<p>Locally, on my wimpy Windows laptop running Vagrant, I have no performance issues whatsoever processing hundreds of thousands of emails at a time. But on Digital Ocean, it seems it can&#8217;t even handle 15,000 (I&#8217;ve seen the CPU spike to 100% and then crash with OOM before, but after it comes back up it usually stabilizes&#8230; not this time). What worries me is that the server hasn&#8217;t recovered at all despite little to no activity on the app. I&#8217;ve verified this by looking at analytics &#8211; GA shows 9 sessions total over the 24 hours, doing little more than hitting \/ and bouncing, and MixPanel shows only 1 logged-in user (me) in the same timeframe. The only thing I&#8217;ve done since the initial failure is check the <code>facts<\/code> package, which shows:<\/p>\n<blockquote>\n<p>mongo-livedata: observe-multiplexers 13, observe-drivers-oplog 13, oplog-watchers 16, observe-handles 15, time-spent-in-QUERYING-phase 87828, time-spent-in-FETCHING-phase 82<\/p>\n<p>livedata: invalidation-crossbar-listeners 16, subscriptions 11, sessions 1<\/p>\n<\/blockquote>\n<p>Meteor APM also doesn&#8217;t show anything out of the ordinary, and the meteor.log doesn&#8217;t show any Meteor activity aside from the OOM and restart messages. MongoHQ isn&#8217;t reporting any slow-running queries or much activity &#8211; 0 queries, updates, inserts, and deletes on average, from staring at their monitoring dashboard. So as far as I can tell, there hasn&#8217;t been much activity for 24 hours, and certainly nothing intensive. 
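Side note (my own sketch, not part of the original question): the 200-response handler above issues one asynchronous upsert plus one multi-document update per email inside a single _.each pass, so a list of hundreds of thousands of emails queues that many pending Mongo writes and callbacks in one tick, which by itself can pin the CPU and exhaust a 1GB droplet. One mitigation is to process the list in bounded batches and yield to the event loop between them; `chunk` and `processInBatches` below are hypothetical helpers, not existing Meteor or FullContact APIs:

```javascript
// Hypothetical batching helpers (an illustration in plain Node.js,
// not code from the original app).

// Split a large array into fixed-size slices.
function chunk(items, batchSize) {
  var batches = [];
  for (var i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run processBatch on one slice at a time, yielding to the event loop
// between slices so pending I/O callbacks and GC get a chance to run.
function processInBatches(emails, batchSize, processBatch) {
  var batches = chunk(emails, batchSize);
  (function next() {
    if (batches.length === 0) return;
    processBatch(batches.shift());
    setTimeout(next, 0); // let the event loop drain before the next slice
  })();
}
```

With this shape, the `_.each(data.responses, ...)` loop would operate on one slice per `processBatch` call rather than on the whole response set at once.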
I&#8217;ve since tried to install newrelic and nodetime, but neither is quite working &#8211; newrelic shows no data, and the meteor.log has a nodetime debug message:<\/p>\n<p><strong>Failed loaded nodetime-native extention.<\/strong><\/p>\n<p>So when I try to use nodetime&#8217;s CPU profiler it turns up blank, and the heap snapshot returns with <strong>Error: V8 tools are not loaded.<\/strong><\/p>\n<p>I&#8217;m basically out of ideas at this point, and since Node is pretty new to me, it feels like I&#8217;m taking wild stabs in the dark here. Please help.<\/p>\n<p><strong>Update<\/strong>: The server is still pegged at 100% four days later. Even an init 6 doesn&#8217;t do anything &#8211; the server restarts, the node process starts, and it jumps right back up to 100% CPU. I tried other tools like memwatch and webkit-devtools-agent, but could not get them to work with Meteor.<\/p>\n<p>The following is the strace output:<\/p>\n<pre><code>strace -c -p 6840\nProcess 6840 attached - interrupt to quit\n^CProcess 6840 detached\n% time     seconds  usecs\/call     calls    errors syscall\n 77.17    0.073108           1    113701           epoll_wait\n 11.15    0.010559           0     80106     39908 mmap\n  6.66    0.006309           0    116907           read\n  2.09    0.001982           0     84445           futex\n  1.49    0.001416           0     45176           write\n  0.68    0.000646           0    119975           munmap\n  0.58    0.000549           0    227402           clock_gettime\n  0.10    0.000095           0    117617           rt_sigprocmask\n  0.04    0.000040           0     30471           epoll_ctl\n  0.03    0.000031           0     71428           gettimeofday\n  0.00    0.000000           0        36           mprotect\n  0.00    0.000000           0         4           brk\n100.00    0.094735             1007268     39908 total<\/code><\/pre>\n<p>So it looks like the node process spends most of its time in epoll_wait.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have a Meteor (0.8.0) app deployed using Meteor Up to Digital Ocean that&#8217;s been stuck at 100% CPU, only to crash with out of memory, and start up again at 100% CPU. 
It&#8217;s been stuck like this for the past 24 hours. The weird part is nobody is using the server and meteor.log isn&#8217;t [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7445","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7445","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7445"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7445\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7445"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}