{"id":7045,"date":"2014-05-17T00:24:32","date_gmt":"2014-05-17T00:24:32","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/05\/17\/dealing-with-a-non-ascii-character-in-rspec-testing-collection-of-common-programming-errors\/"},"modified":"2014-05-17T00:24:32","modified_gmt":"2014-05-17T00:24:32","slug":"dealing-with-a-non-ascii-character-in-rspec-testing-collection-of-common-programming-errors","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/05\/17\/dealing-with-a-non-ascii-character-in-rspec-testing-collection-of-common-programming-errors\/","title":{"rendered":"Dealing with a non-ascii character in Rspec Testing-Collection of common programming errors"},"content":{"rendered":"<p>I&#8217;m using the DocSplit gem with Ruby 1.9.3 to create UTF-8 versions of Word documents. Today, while running a test against a particular piece of one of these documents, I was surprised to run into character encoding inconsistencies.<\/p>\n<p>I have tried several methods to resolve the issue, which I list below, but the most successful so far has been to strip out all non-ASCII characters. This is far from ideal, because I don&#8217;t think these characters will actually cause problems in the DB.<\/p>\n<pre><code>gsub(\/[^[:ascii:]]\/, \"\")\n<\/code><\/pre>\n<p>Here is a sample of my output versus what I&#8217;m expecting:<\/p>\n<pre><code>My CODES'S APOSTROPHE\n\nMy CODES\u2019S APOSTROPHE\n<\/code><\/pre>\n<p>The apostrophe in the second line should be a curly one (U+2019, RIGHT SINGLE QUOTATION MARK). If you paste it into irb, you get the following: \\U+FFE2<\/p>\n<p>I tried writing a regex specifically for this character, and it appears to work in Rubular. 
As soon as I put it in my model, however, I got a syntax error:<\/p>\n<pre><code>syntax error, unexpected $end, expecting ')'\nraw_title = raw_title.gsub(\/\u2019\/, \"\")\n<\/code><\/pre>\n<p>I also tried forcing the encoding to UTF-8, but everything is already UTF-8, so this has no apparent effect. When I tried forcing the output to US-ASCII instead, I got a byte sequence error.<\/p>\n<p>Finally, I tried a few of the encoding options that Ruby provides; these basically did the same thing as the regex.<\/p>\n<p>Ultimately, I&#8217;m only trying to match output for testing purposes. Should I even be concerned about these special characters? Is there a better way to match them without blindly removing them?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m using the DocSplit gem with Ruby 1.9.3 to create UTF-8 versions of Word documents. Today, while running a test against a particular piece of one of these documents, I was surprised to run into character encoding inconsistencies. 
I have tried a number of different methods to resolve the issue which [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7045","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7045"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7045\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}