{"id":580,"date":"2013-11-10T19:40:37","date_gmt":"2013-11-11T03:40:37","guid":{"rendered":"http:\/\/kagan.mactane.org\/blog\/?p=580"},"modified":"2013-11-10T19:40:37","modified_gmt":"2013-11-11T03:40:37","slug":"smart-apostrophes-theyre-a-problem-in-urls","status":"publish","type":"post","link":"https:\/\/kagan.mactane.org\/blog\/2013\/11\/10\/smart-apostrophes-theyre-a-problem-in-urls\/","title":{"rendered":"Smart Apostrophes: They\u2019re a Problem (in URLs)"},"content":{"rendered":"<p>Recently, The American Prospect published <a href=\"http:\/\/prospect.org\/article\/good-mens-rights-movement-hard-find\">an article<\/a> excoriating the &#8220;men&#8217;s rights&#8221; movement. It was a pretty good article, and well-received. Lots of people tweeted links to it&#8230; or, they tried&nbsp;to.<\/p>\n<p>Curiously, those tweets all broke <em>in the exact same way<\/em>, pointing at <a href=\"http:\/\/prospect.org\/article\/good-men\">a truncated version<\/a> of the correct URL. That&#8217;s because the character immediately after the truncation point was a &#8220;smart apostrophe&#8221;, or a <a href=\"http:\/\/www.fileformat.info\/info\/unicode\/char\/2019\/index.htm\">right single quotation&nbsp;mark<\/a>.<\/p>\n<p>And when the URL hit Twitter&#8217;s automatic URL-shortening service, t.co, that service didn&#8217;t recognize &#x2019; as a valid URL character, so it decided that must be the end of the URL. Hence the truncation.<\/p>\n<p>My reading of <a href=\"https:\/\/tools.ietf.org\/html\/rfc3986#section-2.5\">RFC 3986, &sect;2.5<\/a> is that a really good implementation should have spotted the non-ASCII character, converted it to its UTF-8 bytes (E2 80 99), and percent-encoded those bytes, giving %E2%80%99 and leading to the URL: <a href=\"http:\/\/prospect.org\/article\/good-men%E2%80%99s-rights-movement-hard-find\">http:\/\/prospect.org\/article\/good-men%E2%80%99s-rights-movement-hard-find<\/a>. 
And indeed, when <a href=\"https:\/\/twitter.com\/kmactane\/status\/393796850932670464\">I handed people that URL<\/a>, it worked beautifully!<\/p>\n<p>In short order, The American Prospect had put a redirect in place. Now, the &#8220;smart apostrophized&#8221; URL automatically redirects to a new version that simply omits the apostrophe altogether&nbsp;&mdash; much like the WordPress &#8220;slug&#8221; for this post itself. (The sharp-eyed among you may have noticed that this post&#8217;s title is a <a href=\"http:\/\/tvtropes.org\/pmwiki\/pmwiki.php\/Main\/SelfDemonstratingArticle\">self-demonstrating&nbsp;article<\/a>.)<\/p>\n<p>However, things were a little embarrassing for a while: even the &#8220;Tweet&#8221; button at the top of the article&nbsp;&mdash; a thing that looked awfully professional and well-tooled&nbsp;&mdash; still produced the truncated URL and a 404&nbsp;page.<\/p>\n<h2 id=\"soWhatCanWeLearnFromThis\">So, What Can We Learn From This?<\/h2>\n<ol>\n<li><strong>Don&#8217;t use unusual characters in your URLs<\/strong> in the first place. Seriously, avoid them. Again, WordPress has made this super easy for years; its slug-making routine strips pretty much everything that isn&#8217;t plain low-ASCII. (Of course, if your entire title is non-ASCII&nbsp;&mdash; say, you&#8217;re a Japanese site and your title is something like <span class=\"tooltip\" title=\"&quot;What does the fox say?&quot;\">\u72d0\u306f\u3001\u4f55\u3092\u8a00\u3044\u307e\u3059\u304b\u3002<\/span>&nbsp;&mdash; then the results may be idiosyncratic, at best. You&#8217;ll need some other method.)<\/li>\n<li><strong>Beware of third-party tools.<\/strong> One of the things that stymied people&#8217;s ability to share the Prospect&#8217;s article was that the &#8220;Tweet&#8221; button had code that mistook the smart apostrophe for the end of the URL. 
But The American Prospect didn&#8217;t write that code themselves; they were using <a href=\"http:\/\/www.addthis.com\/\">AddThis<\/a>&#8217; social-sharing buttons. And that&#8217;s actually a very sensible thing for them to have done: This is what third-party code providers are supposed to be for. But in this case, their code wasn&#8217;t quite ready for what the Prospect threw at it.<\/li>\n<li><strong>Stay on top of what&#8217;s happening with your site.<\/strong> All things considered, this was a pretty small problem&nbsp;&mdash; it lasted only about a day, and it was fixed quickly once the issue became clear (in my own corner of the Internet, at least, I saw a fix only a few hours after I started seeing complaints and confusion about it). This is almost certainly because <em>someone was paying attention<\/em>, whether to social media, server hit logs, emails, or some other channel.<\/li>\n<\/ol>\n<p>Those are, of course, aimed at site operators. For people like the developers at AddThis&nbsp;&mdash; or anyone else writing library code&nbsp;&mdash; I&#8217;d just reiterate the usual advice to <strong>be sure your tests cover lots of different cases<\/strong>, and especially edge cases! And <strong>read the relevant specs<\/strong> to be sure you know what you&#8217;re doing; don&#8217;t just wing&nbsp;it.<\/p>\n<p>There are probably other lessons to be learned from this. If you&#8217;ve thought of any, let me know in a&nbsp;comment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, The American Prospect published an article excoriating the &#8220;men&#8217;s rights&#8221; movement. It was a pretty good article, and well-received. Lots of people tweeted links to it&#8230; or, they tried&nbsp;to. Curiously, those tweets all broke in the exact same way, pointing at a truncated version of the correct URL. 
That&#8217;s because the next character after [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[37,149,7,150,69,35,36],"_links":{"self":[{"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/posts\/580"}],"collection":[{"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/comments?post=580"}],"version-history":[{"count":3,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/posts\/580\/revisions"}],"predecessor-version":[{"id":634,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/posts\/580\/revisions\/634"}],"wp:attachment":[{"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/media?parent=580"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/categories?post=580"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kagan.mactane.org\/blog\/wp-json\/wp\/v2\/tags?post=580"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}