{"id":7644,"date":"2015-09-23T05:00:53","date_gmt":"2015-09-23T05:00:53","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2015\/09\/23\/chimpler-blog-spark-streaming-log-aggregation\/"},"modified":"2015-09-23T05:00:53","modified_gmt":"2015-09-23T05:00:53","slug":"chimpler-blog-spark-streaming-log-aggregation","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2015\/09\/23\/chimpler-blog-spark-streaming-log-aggregation\/","title":{"rendered":"chimpler\/blog-spark-streaming-log-aggregation"},"content":{"rendered":"<p>Simple example consuming an adserver logs stream from Kafka.<\/p>\n<p>More information on our blog: http:\/\/chimpler.wordpress.com\/2014\/07\/01\/implementing-a-real-time-data-pipeline-with-spark-streaming\/<\/p>\n<p>In order to run our example, we\u00a0need to install the followings:<\/p>\n<ul>\n<li>Scala 2.10+<\/li>\n<li>SBT<\/li>\n<li>Apache Zookeeper<\/li>\n<li>Apache Kafka<\/li>\n<li>MongoDB<\/li>\n<\/ul>\n<p>Building the examples:<\/p>\n<pre><code>$ sbt pack\n<\/code><\/pre>\n<p>Create a topic \u201cadnetwork-topic\u201d:<\/p>\n<pre><code>$ kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic adnetwork-topic\n<\/code><\/pre>\n<p>Start Zookeeper:<\/p>\n<pre><code>$ zookeeper-server-start.sh config\/zookeeper.properties\n<\/code><\/pre>\n<p>Start Kafka:<\/p>\n<pre><code>$ kafka-server-start.sh config\/kafka-server1.properties\n<\/code><\/pre>\n<p>Run MongoDB: $ sudo mongod<\/p>\n<p>On one window, run the aggregator:<\/p>\n<pre><code>$ target\/pack\/bin\/aggregator\n<\/code><\/pre>\n<p>On the other one, run the adserver log random generator:<\/p>\n<pre><code>$ target\/pack\/bin\/generator\n<\/code><\/pre>\n<p>You can also see the messages that are sent using the Kafka console consumer:<\/p>\n<pre><code>$ kafka-console-consumer.sh --topic adnetwork-topic --zookeeper localhost:2181\n<\/code><\/pre>\n<p>After a few seconds, you should see the results in MongoDB:<\/p>\n<pre><code>$ mongoexport -d adlogdb -c impsPerPubGeo --csv -f date,publisher,geo,imps,uniques,avgBids\nconnected to: 127.0.0.1\n \ndate,publisher,geo,imps,uniques,avgBids\n2014-07-01T03:24:39.679Z,\"publisher_4\",\"CA\",3980,3248,0.50062253292876\n2014-07-01T03:24:39.681Z,\"publisher_4\",\"MI\",3958,3229,0.505213545705667\n2014-07-01T03:24:39.681Z,\"publisher_1\",\"HI\",3886,3218,0.4984981221446526\n2014-07-01T03:24:39.681Z,\"publisher_3\",\"CA\",3937,3226,0.5038157362872939\n2014-07-01T03:24:39.679Z,\"publisher_4\",\"NY\",3894,3200,0.5022389599376207\n2014-07-01T03:24:39.679Z,\"publisher_2\",\"HI\",3906,3240,0.4988378174961185\n2014-07-01T03:24:39.679Z,\"publisher_3\",\"HI\",3989,3309,0.4975347625823641\n2014-07-01T03:24:39.681Z,\"publisher_3\",\"FL\",3957,3167,0.4993339490605483\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Simple example consuming an adserver logs stream from Kafka. More information on our blog: http:\/\/chimpler.wordpress.com\/2014\/07\/01\/implementing-a-real-time-data-pipeline-with-spark-streaming\/ In order to run our example, we\u00a0need to install the followings: Scala 2.10+ SBT Apache Zookeeper Apache Kafka MongoDB Building the examples: $ sbt pack Create a topic \u201cadnetwork-topic\u201d: $ kafka-topics.sh &#8211;create &#8211;zookeeper localhost:2181 &#8211;replication-factor 1 &#8211;partitions 1 &#8211;topic adnetwork-topic [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7644","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7644","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7644"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7644\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}