Wednesday, October 3, 2012

Three(ish) thoughts about FiveThirtyEight

By far the most entertaining campaign news of the week was that UnskewedPolls.com--which has been increasingly serving as central command for the Republican self-delusion brigade--had broken the Romney zero bound, and was now showing Obama with a two-point lead.  Whoops.

Presumably, this will lead to the creation of UnskewedUnskewedPolls.com.  (One also wonders if, should Obama's lead continue to grow, UnskewedUnskewedUnskewedPolls.com would follow, and so on, until, eventually, there existed a perfect 1-to-1 ratio between American Republicans and Republican-leaning alternative polling sites, with the number of adjectives in the URL corresponding to a perfect ordinal ranking of individual partisanship.  I digress.)

But the news also reminded me, yet again, of the greatness of FiveThirtyEight.com.  Okay, I know--this isn't exactly bleeding-edge material.  That doesn't mean it isn't worth stepping back, from time to time, and appreciating the ingeniousness of Nate Silver's operation.  Lots of people try to predict elections, after all.  But somehow, nobody does it quite as well as Silver.

Considering that it's not conducting any polling itself, or generating any new data, how can FiveThirtyEight's model possibly outperform rigorous, academic election-forecasting models?  In a word: volume.  Instead of deliberating extensively over the soundness of every variable, Silver has simply tossed everything he can think of into his model (polls, the economy, history, and, I believe, in a major statistical faux pas, the model's own output).  The result, it turns out, just works.  Add enough variables, and the flaws introduced by any particularly sloppy ones tend to get smoothed over by the sheer number of others.  Those other variables have flaws too, but since there's no real reason to expect them all to bias the model in the same direction, it all more or less evens out in the end.  Could you make a case that he should exclude, say, the effect of the economy?  Sure, but if you did that, you'd also have to make a case that he should exclude any number of other variables, and you'd end up with a statistically-unimpeachable-but-basically-useless model.  It would no doubt be described as "elegant," and it would no doubt predict the wrong outcome as often as not.
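
To see why "add enough variables" can work, here's a toy simulation.  It is emphatically not Silver's model (his weights inputs rather than averaging them equally); it just assumes each input equals the true margin plus its own bias plus some noise, with the biases independent and centered on zero.  The true margin, the bias and noise sizes, and the `shared_bias` knob are all made-up illustration parameters.

```python
import random
import statistics


def combined_error(num_variables: int, shared_bias: float = 0.0,
                   trials: int = 2000, seed: int = 538) -> float:
    """Mean absolute error of an equal-weight average of flawed inputs.

    Toy assumption: each input = true margin + its own bias + noise.
    Individual biases are independent and centered on zero; shared_bias
    is added to every input (a flaw that all inputs have in common).
    """
    rng = random.Random(seed)
    true_margin = 2.0  # hypothetical "true" lead, in points
    errors = []
    for _ in range(trials):
        estimates = [
            true_margin + shared_bias + rng.gauss(0.0, 3.0) + rng.gauss(0.0, 1.0)
            for _ in range(num_variables)
        ]
        combined = statistics.fmean(estimates)  # equal-weight average
        errors.append(abs(combined - true_margin))
    return statistics.fmean(errors)


for k in (1, 5, 20, 50):
    print(f"{k:>3} inputs: mean abs error ~ {combined_error(k):.2f} points")

# If every input leans the same way, piling on more of them doesn't help.
print(f" 50 inputs, shared 3-point bias: {combined_error(50, shared_bias=3.0):.2f} points")
```

The error shrinks as inputs are added only because the individual biases point in random directions; the last line shows that a bias common to every input survives the averaging untouched, which is exactly the caveat in the paragraph above.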

So bravo, Mr. Silver, for your function-over-form statistics.

Still, the site is not without problems.  The prediction model is as great as ever, but unfortunately, since the move to the Times, the site's actual blogging hasn't really reflected that greatness.  It's hard to pin down exactly, but something about the Times' style guidelines seems to be suffocating Silver's writing a bit.  The inverted pyramid, wholly inappropriate for blogging, has slowly crept into his prose--more and more of the articles have been dedicated to rehashing previously explained concepts.  (I swear he's explained the idea of a gradually receding "convention bump" at least twenty times in the last month.  Just write it once and link to it, man!)

But more than anything, I wish the site took a little more time to explain the use (or in some cases, uselessness) of some of the numbers it presents.  Take, for instance, the "Return on Investment Index," which the site describes as "the likelihood that an individual voter will determine the electoral college winner."  What it actually means by this, as far as I can tell, is "Assuming that an individual voter determines the electoral college winner, what is the likelihood that he or she hails from a particular state?"  (Right now, Nevada comes in first, at 11.4%.)

The problem here is that the measure's results are conditional on an event so infinitesimally improbable that it may as well be impossible.  It's like asking, "What are the chances that someone who is struck by lightning four times while playing in the Stanley Cup final will flip a coin and get heads?"  The correct answer is simultaneously "50%" and "It doesn't matter, because that will never happen."
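
Here's a small sketch of the conditional-versus-absolute distinction, under my reading of the index above (I'm not claiming this is how FiveThirtyEight actually computes it).  The states and probabilities are entirely made up: each number is a hypothetical chance that the whole electoral college comes down to a single ballot cast in that state.

```python
# Hypothetical, made-up chances that the entire electoral college comes
# down to one ballot cast in each state (NOT FiveThirtyEight's numbers).
p_decisive = {
    "State A": 3e-8,
    "State B": 1e-8,
    "State C": 1e-9,
}

# Treating these as mutually exclusive, their sum is the (tiny) chance
# that the election is decided by a single voter anywhere.
total = sum(p_decisive.values())

# Conditional share: GIVEN that one voter decided it, how likely is it
# that the voter lived in each state?  These always sum to 100%.
shares = {state: p / total for state, p in p_decisive.items()}

for state, p in p_decisive.items():
    print(f"{state}: absolute {p:.1e}, conditional share {shares[state]:.1%}")
```

The conditional shares add up to 100% and can look impressively large, while the absolute probabilities they're built from are all vanishingly small; that gap is the whole complaint.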

And then there's the now-cast.  The now-cast is simultaneously useful and opaque.  It purports to measure the probability of a particular election outcome, assuming the election were held today.  "What's the point of that?" I asked at first.  "The election isn't being held today, no matter how much Mitt Romney might want it to be over already."

But the now-cast provides a little bit of extra information about the model's election-day predictions, even if the site's own tracker doesn't make this explicit.  Because Silver is estimating the probability of a future event, there are really two potential sources of error in his predictions.  First, there's the chance that current polling might be wrong.  And then there's the chance that events might change between now and the election.  The now-cast conveniently strips out the second source of error, and tells us exactly how sure the model is that Barack Obama currently leads the race (98% sure, as it turns out).  Nonetheless, FiveThirtyEight only grants Obama about an 85% chance of winning the election, which gives us a somewhat more sophisticated view of the state of the race than would be immediately obvious: Romney can still plausibly win almost a sixth of the time, but almost entirely because, some of the time, an external event causes a shift in the polls.  For Romney, coasting and hoping for a lucky roll of the statistical dice won't do anymore.
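
One way to make the two-sources-of-error idea concrete is a toy model, assuming (my assumption, not the site's stated method) that polling error and future drift are independent and normally distributed, so their variances add.  The function names are hypothetical; the only inputs taken from the post are the 98% now-cast and 85% forecast.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def win_probability(lead: float, poll_sd: float, drift_sd: float = 0.0) -> float:
    """P(final margin > 0) if the final margin ~ Normal(lead, total_sd).

    Toy assumption: polling error and drift before election day are
    independent normals, so their variances add.  drift_sd = 0 gives
    a "now-cast"; drift_sd > 0 gives an election-day forecast.
    """
    total_sd = sqrt(poll_sd**2 + drift_sd**2)
    return 1.0 - NormalDist(lead, total_sd).cdf(0.0)


def implied_drift_ratio(nowcast: float, forecast: float) -> float:
    """drift_sd / poll_sd implied by a now-cast and a forecast probability."""
    z_now = STD_NORMAL.inv_cdf(nowcast)         # equals lead / poll_sd
    z_forecast = STD_NORMAL.inv_cdf(forecast)   # equals lead / total_sd
    # (total_sd / poll_sd)^2 - 1 = (drift_sd / poll_sd)^2
    return sqrt((z_now / z_forecast) ** 2 - 1.0)


ratio = implied_drift_ratio(0.98, 0.85)
print(f"implied drift sd / polling sd ~ {ratio:.2f}")
print(f"share of forecast variance from future events ~ {ratio**2 / (1 + ratio**2):.0%}")
```

With the post's 98% and 85% figures, the ratio comes out to roughly 1.7, meaning that, under these toy assumptions, about three-quarters of the forecast's uncertainty would come from things that haven't happened yet rather than from bad polling.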
