Smart Apostrophes: They’re a Problem (in URLs)

Recently, The American Prospect published an article excoriating the “men’s rights” movement. It was a pretty good article, and well-received. Lots of people tweeted links to it… or, they tried to.

Curiously, those tweets all broke in exactly the same way, pointing at a truncated version of the correct URL. That’s because the character immediately after the truncation point was a “smart apostrophe”: a right single quotation mark (’, U+2019).

And when it hit Twitter’s automatic URL-shortening service, t.co, that service didn’t recognize ’ as a valid URL character. It decided that must be the end of the URL. Hence the truncation.
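
I have not seen t.co's code, so the following is only a minimal sketch of how this kind of truncation can happen. It assumes a hypothetical linkifier (`find_url` and `ASCII_URL` are names made up for illustration) that accepts only the ASCII characters RFC 3986 permits in a URI and stops at the first character outside that set.

```python
import re

# Hypothetical linkifier, NOT Twitter's actual implementation.
# The character class holds the ASCII characters RFC 3986 allows in a URI.
# The plain ASCII apostrophe (') is in the class; U+2019 (’) is not.
ASCII_URL = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_url(text):
    """Return the first URL-like run of ASCII characters in text, or None."""
    match = ASCII_URL.search(text)
    return match.group(0) if match else None


tweet = "Worth reading: http://prospect.org/article/good-men’s-rights-movement-hard-find"
print(find_url(tweet))
# prints http://prospect.org/article/good-men
```

The match ends just before the smart apostrophe, which is the same truncation the broken tweets showed.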

My reading of RFC 3986, §2.5 is that a really good implementation should have spotted the non-ASCII character, converted it to its three UTF-8 bytes, and percent-encoded those as %E2%80%99, leading to the URL: http://prospect.org/article/good-men%E2%80%99s-rights-movement-hard-find. And indeed, when I handed people that URL, it worked beautifully!
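
For the curious, here is a small sketch of that encoding step using Python's standard `urllib.parse` module. The function name `percent_encode_path` is my own, and the sketch only touches the path component. It leaves any existing `%` alone, on the assumption that a `%` in the input already begins a valid escape.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url):
    """Percent-encode non-ASCII (and other unsafe) characters in the URL path.

    quote() encodes text as UTF-8 by default, then escapes each byte.
    "/" and "%" are kept as-is so existing escapes are not double-encoded
    (assumption: any "%" in the input already starts a valid escape).
    """
    parts = urlsplit(url)
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))


original = "http://prospect.org/article/good-men’s-rights-movement-hard-find"
print(percent_encode_path(original))
# prints http://prospect.org/article/good-men%E2%80%99s-rights-movement-hard-find
```

Running the function on its own output returns the same string, so it is safe to apply to a URL that has already been encoded.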

In short order, The American Prospect had put a redirect in place. Now, the “smart apostrophized” URL automatically pushes through to a new version that simply omits the apostrophe altogether — much like the WordPress “slug” for this post itself. (The sharp-eyed among you may have noticed that this post’s title is a self-demonstrating article.)

However, it was a little embarrassing for a while, when even using the “Tweet” button at the top of the article — a thing that looked awfully professional and well-tooled — would still result in the truncated URL and a 404 page.

So, What Can We Learn From This?

  1. Don’t use unusual characters in your URLs in the first place. Seriously, avoid them. Again, WordPress has made this super-easy for years; its slug-making routine strips pretty much everything that isn’t plain low-ASCII. (Of course, if your entire title is non-ASCII — say, you’re a Japanese site and your title is something like 狐は、何を言いますか。 — then the results may be idiosyncratic, at best. You’ll need some other method.)
  2. Beware of third-party tools. One of the things that stymied people’s ability to share the Prospect’s article was that the “Tweet” button had code that misread the URL’s terminating character. But The American Prospect didn’t write that code themselves; they were using AddThis’s social-sharing buttons. And that’s actually a very sensible thing for them to have done: This is what third-party code providers are supposed to be for. But in this case, their code wasn’t quite ready for what the Prospect threw at it.
  3. Stay on top of what’s happening with your site. All things considered, this was a pretty small problem: it lasted only about a day, and not long at all once the issue became clear (in my own corner of the Internet, at least, I saw a fix only a few hours after I started seeing complaints and confusion). This is almost certainly because someone was paying attention, whether to social media, server hit logs, emails, or some other channel.
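
To make the first lesson concrete, here is a rough sketch of an ASCII-only slug routine. It is not WordPress's actual code (which is written in PHP and handles far more cases); `slugify` is a hypothetical helper that simply illustrates the strip-everything-unusual approach and its blind spot for titles with no ASCII in them.

```python
import re
import unicodedata


def slugify(title):
    """Build a lowercase, hyphen-separated, ASCII-only slug (illustrative only)."""
    # Drop straight and curly apostrophes so "They’re" becomes "Theyre".
    text = re.sub(r"['’‘]", "", title)
    # Split accented letters into base letter + combining mark,
    # then discard everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of anything other than a-z or 0-9 into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")


print(slugify("Smart Apostrophes: They’re a Problem (in URLs)"))
# prints smart-apostrophes-theyre-a-problem-in-urls
print(repr(slugify("狐は、何を言いますか。")))
# prints ''
```

The second call returns an empty string, which is the “idiosyncratic” case mentioned in the first lesson: an all-non-ASCII title needs some other method, such as a numeric ID or a percent-encoded path.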

Those are, of course, aimed at site operators. For people like the coders of AddThis — or anyone else making library code — I’d just reiterate the usual advice to be sure your tests cover lots of different cases, and especially edge cases! And read the relevant specs to be sure you know what you’re doing; don’t just wing it.
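
As one illustration of what “lots of different cases” might look like, here is a small table-driven check. It has nothing to do with AddThis's real code or test suite; the paths and the `check_round_trip` helper are invented for this sketch, and the encoder under test is simply `urllib.parse.quote` standing in for whatever your library does.

```python
from urllib.parse import quote, unquote

# Hypothetical edge cases worth feeding to any URL-handling code.
EDGE_CASE_PATHS = [
    "/article/plain-ascii-title",
    "/article/good-men’s-rights-movement-hard-find",  # smart apostrophe
    "/article/狐は、何を言いますか。",                   # no ASCII in the title
    "/article/title with spaces",
    "/article/100%-true",                              # literal percent sign
]


def check_round_trip(path):
    """True if the encoded path is pure ASCII and decodes back to the original."""
    encoded = quote(path)
    return encoded.isascii() and unquote(encoded) == path


failures = [path for path in EDGE_CASE_PATHS if not check_round_trip(path)]
print(failures)
# prints []
```

An empty list means every case survived encoding and decoding intact. Adding each newly reported bad URL to the table is a cheap way to keep the problem from coming back.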

There are probably other lessons to be learned from this. If you’ve thought of any, let me know in a comment.
