{"id":7999,"date":"2015-11-16T06:02:35","date_gmt":"2015-11-16T06:02:35","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2015\/11\/16\/dobiasd-programming-language-subreddits-and-their-choice-of-words\/"},"modified":"2022-08-30T15:03:04","modified_gmt":"2022-08-30T15:03:04","slug":"dobiasd-programming-language-subreddits-and-their-choice-of-words","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2015\/11\/16\/dobiasd-programming-language-subreddits-and-their-choice-of-words\/","title":{"rendered":"Dobiasd\/programming-language-subreddits-and-their-choice-of-words"},"content":{"rendered":"<p>While reading about various programming languages, I developed a hunch about how often different languages are mentioned by other communities and about the average conversational tone used by their respective members.<\/p>\n<p>To examine whether this was just selective perception on my side, an unconscious confirmation of stereotypes, or a valid observation, I collected and analysed some data: all comments (about 300k) on submissions (about 40k) in the respective programming-language subreddits from 2013-08 to 2014-07, using PRAW and SQLite.<\/p>\n<p>In this article I will present some selected results. (If you want, you can also download the code I wrote and used, as well as the raw data it generated.)<\/p>\n<h2>Mutual mentions<\/h2>\n<p>The following chord graph shows how often a programming language is mentioned in communities (subreddits) other than its own:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/mutual_mentions.png\" \/><\/p>\n<p>(The size of a language is determined by how often all the other communities mention it in total. Each connection represents the mutual mentions between two communities. 
The width at each end of a connection is determined by how frequently the other community references the language at that end. For example, the graph shows that PHP talks more about SQL than SQL talks about PHP.)<\/p>\n<p>The \u201cbig\u201d languages are the ones most talked about, <em>yawn<\/em>.<\/p>\n<p>Sure, measuring programming-language popularity accurately is nearly impossible, but if we simply take some values from the TIOBE index anyway, it gets interesting, because we can see how much a language is talked about relative to how much it is supposedly used.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/mentions_relative_to_tiobe.png\" \/><\/p>\n<p>This was the first time I said \u201cHa! I knew it!\u201d<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/haskell_tweet.png\" \/><\/p>\n<p>(No Haskell bashing intended. I love it and its little web cousin Elm, use both in my projects, and also write articles about them.)<\/p>\n<h2>Word usage<\/h2>\n<p>If we now divide the number of comments in a subreddit that contain a chosen word by the subreddit's total comment count (and multiply by 10000 to get a convenient integer value, i.e. the number of such comments per 10,000 comments), we get more \u2026 well, diagrams. 
But most results, like the Haskell people's obsession with abstract concepts or the attention C and C++ people pay to hardware issues, are not that surprising.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/abstract_concepts.png\" \/><\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/hardware.png\" \/><\/p>\n<h2>Cursing<\/h2>\n<p>This part is quite comforting, because it confirms a conjecture many of us probably share.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/cursing.png\" \/><\/p>\n<h2>Happiness<\/h2>\n<p>To finish with something positive: the lispy guys seem to be the most cheerful people.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/unknownerror.org\/opensource\/Dobiasd\/programming-language-subreddits-and-their-choice-of-words\/img\/happy.png\" \/><\/p>\n<p>But what is up with the Visual Basic community? They are neither angry nor happy. They just \u2026 are? \ud83d\ude42<\/p>\n<h2>Disclaimer<\/h2>\n<p>As you have probably already noticed, this is not hard science. It was just a small fun project, and there are several possible sources of error. I tried to choose only big communities and frequent words so that the results have at least a bit of statistical significance. (By the way, if you remove this constraint, Elm is the happiest and coolest language. ^_-) Errors in my parser and in my interpretation (e.g. not taking negations into account) cannot be fully excluded either. \ud83d\ude09<\/p>\n<p>Also, a positive correlation (e.g. between cursing and PHP) does not imply that one causes the other. 
But if you want to repeat this experiment with fancier tools, such as NLTK, to confirm or refute the results, I would be happy if you dropped me an email.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While reading about various programming languages, I developed a hunch about how often different languages are mentioned by other communities and about the average conversational tone used by their respective members. To examine whether this was just selective perception on my side, an unconscious confirmation of stereotypes, or a valid observation, I collected and analysed some [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7999","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7999","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7999"}],"version-history":[{"count":1,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7999\/revisions"}],"predecessor-version":[{"id":8711,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7999\/revisions\/8711"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7999"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7999"},{"taxonomy":"post_tag","embeddable":true,"href":"https:
\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7999"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}