About 90% of an iceberg lies beneath the water’s surface. So it is with political commentary. It’s easy enough to open a national broadsheet newspaper after an election and scan the pages filled with phrases like “hostage to entrenched interests” and “politically fabled Labor heartland” and assume that all political commentary is equally and uniformly silly.

But that would be lazy. Each columnist is unique, and by careful examination of the underlying data, using only the most up-to-date and high-tech statistical techniques, we can actually explore the heterogeneity of hacks. Whose commentary is based on nothing more than vague impression and anecdote, and whose is instead based on dodgy mathematics? Today, I propose to you that we undertake such analysis. And who better to begin with than News Corp columnist John Black? We saw a little while ago how he uses a very questionable statistical “technique”, one that systematically overstates the explanatory power of the resulting model and the magnitude of the effects of variables, and understates the uncertainty in estimates of the effects of variables, among other problems, to try to determine who voted for Labor in Queensland’s recent election, just by looking at census data and the aggregate votes in Queensland seats.

(Most social scientists would be astounded if he actually could do that, since it would mean he has overcome the ecological fallacy, but he’s a smart guy, let’s give him the benefit of the doubt: he probably has solved the most intractable methodological challenge in statistical inference, probably when he had nothing better to do during Senate estimates.)

We also saw that it was possible for me, with exactly the same technique, to get models of the Queensland electorate with exactly the same level of explanatory power from entirely randomly generated numbers. But John Black thought his model of the Queensland electorate was nonetheless a pretty good guide to the New South Wales election. On March 1, about a month before the state went to the polls, he wrote:
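For anyone who wants to see the random-numbers trick for themselves, here is a minimal sketch. It is not Black’s actual procedure, which isn’t documented here; it is a generic greedy forward-selection routine, and the pool of 200 candidate variables and the choice to keep 15 of them are assumptions made purely for illustration. Only the 89 seats come from the column. Both the “vote” and the “demographics” are pure noise.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (intercept added here)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

def forward_stepwise(X, y, n_keep):
    """Greedily add whichever column raises R^2 the most, n_keep times."""
    chosen, path = [], []
    for _ in range(n_keep):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [r_squared(X[:, chosen + [j]], y) for j in remaining]
        best = int(np.argmax(scores))
        chosen.append(remaining[best])
        path.append(scores[best])
    return chosen, path

rng = np.random.default_rng(0)
n_seats = 89          # number of Queensland seats mentioned in the column
n_candidates = 200    # assumed size of the pool of candidate variables

votes = rng.normal(size=n_seats)                   # pure-noise "vote share"
noise = rng.normal(size=(n_seats, n_candidates))   # pure-noise "demographics"

chosen, path = forward_stepwise(noise, votes, n_keep=15)
print(f"R^2 after keeping {len(chosen)} noise variables: {path[-1]:.2f}")
```

Because the routine only ever keeps the variables that happen to fit best, the in-sample R² climbs with every step even though nothing here has any real relationship to anything else.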

“When we took the Queensland demographics underpinning the January 31 state result and applied them to identical demographics in current NSW seats, Labor won 46 out of 93 seats … We felt pretty comfortable carrying out this projection of the Queensland result as we’ve been doing it for 40 years and the model was statistically powerful, explaining 84 per cent of the variation in votes across the 89 seats in Queensland … With an error of estimate of 4 per cent, we don’t expect the projection to be perfect but it should be a better guide than the usual incorrect assumption that uniform swings will apply. Bearing in mind that Labor lost every second vote in some of its safest seats in the previous election, this assumption is even sillier than usual.” (Emphasis mine).

The results of the election are (more or less) in, so we can assess these claims: did Black’s “projection” really make “a better guide than the usual incorrect assumption that uniform swings will apply”?

But for a guy who has come across a revolutionary technique for forecasting elections — hey, he’s been using it for 40 years! — John Black sure is modest after the election. Just look at his weekend column on the aftermath of the NSW election. He sure knows a lot about people who don’t vote for the Greens, as in this prolier-than-thou paragraph:

“The state seats where the Greens failed to win many votes were dominated by mainstream Australian suburban families with children, who drive themselves to work daily or ride as a car passenger.

“The parents tend to have certificate qualifications in engineering for dad and hospitality for mum, with dad employed as a machine operator in manufacturing or a transport driver, and mum finding it very difficult to get a hospitality job which pays enough to earn any realistic income and has flexible hours for her to look after three kids in the local government school system.”

Maybe we should have a look at how Black’s model did, especially compared to the “uniform swing” model that he reckons it’s superior to.

So let’s put two models to the test. The first is John Black’s Queensland demographic-stepwise-bullshit model, which calculates, in a sense, the impact of demographic variables on the ALP vote in the Queensland election and uses the coefficients from that model to project NSW results from NSW demographics (Black generously provided the results on his website). The second is a uniform-swing model based on the final polls, in which the pollsters were predicting a 10-percentage-point swing to the Labor Party: that is, a model whose projection is just the 2011 results, adjusted for boundary changes, with 10 percentage points added. Comparing their projections to the actual result, what do we find?
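The uniform-swing model really is as simple as it sounds, and a rough sketch makes that plain. The seat names and 2011 shares below are made up for illustration (they are not real NSW results), and the function names are my own; the only thing taken from the polls is the 10-point swing.

```python
def uniform_swing(previous_tpp, swing):
    """Add the same statewide swing (in percentage points) to every seat."""
    return {seat: share + swing for seat, share in previous_tpp.items()}

def seats_won(projection):
    """Count seats where the projected Labor two-party-preferred share tops 50%."""
    return sum(share > 50.0 for share in projection.values())

# Hypothetical 2011 Labor TPP shares (illustrative only, not real seats).
previous = {"Seat A": 38.0, "Seat B": 44.5, "Seat C": 47.0, "Seat D": 52.0}

projection = uniform_swing(previous, swing=10.0)
print(projection)
print("Projected Labor wins:", seats_won(projection))
```

The demographic model has no equivalent of that single `swing` number: it has to get the overall level of the Labor vote right from census variables alone, which is a much bigger ask.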

Perhaps not surprisingly, John Black’s model, which had no input from polling at all, even at an overall state level, overpredicted the swing to Labor. It projected a 17% swing to Labor on the two-party preferred measure in the “traditional contest” seats (where there was a TPP fight between Labor and the Coalition); in fact, the swing was about 10%, almost exactly what the final polls were predicting.

This means that John Black’s model dramatically overpredicted how many of these “traditional contests” the ALP would win: his model predicted 44 wins for the ALP in these contests, while the uniform-swing model predicted 35 victories. In reality, there were only 32.

On average, the uniform-swing model missed the result by about 5 percentage points (either over- or under-shooting). John Black’s model, on the other hand, missed by nearly 12 points.
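The “missed by” figures here are average absolute errors: take the gap between projection and result in each seat, ignore its sign, and average. A sketch of the calculation follows, again with invented numbers (three imaginary seats, not the real NSW data) and function names of my own choosing. I have also included a signed version, which is an addition of mine, to show the difference between a model that misses in both directions and one that overshoots everywhere.

```python
def mean_absolute_error(projected, actual):
    """Average size of the miss per seat, ignoring direction."""
    return sum(abs(projected[s] - actual[s]) for s in actual) / len(actual)

def mean_signed_error(projected, actual):
    """Average miss per seat, keeping direction (positive means overshoot)."""
    return sum(projected[s] - actual[s] for s in actual) / len(actual)

# Invented Labor TPP shares for three imaginary seats (illustrative only).
actual = {"Seat A": 45.0, "Seat B": 52.0, "Seat C": 58.0}
uniform = {"Seat A": 48.0, "Seat B": 50.0, "Seat C": 62.0}
demographic = {"Seat A": 57.0, "Seat B": 63.0, "Seat C": 71.0}

for name, projected in [("uniform swing", uniform), ("demographic", demographic)]:
    mae = mean_absolute_error(projected, actual)
    bias = mean_signed_error(projected, actual)
    print(f"{name}: mean absolute error {mae:.1f}, mean signed error {bias:+.1f}")
```

In this toy example the uniform-swing projection misses in both directions, so its signed error is smaller than its absolute error, while the demographic projection overshoots in every seat and the two measures coincide.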

Slice it as you will, John, the weird Queensland-demographics-in-NSW model you’ve created actually performed much worse than the uniform-swing model. Maybe you could write a column in The Australian about that. Let me get you started:

“The pundits who got the NSW election wrong tend to be former Queensland Labor Senators with fishing hobbies and boutique ‘analytics’ firms who haven’t opened a statistics textbook since the 1960s, patronised by Boomer managers with no clue about data analysis. Often, they have obtained a vanity column in a vanity newspaper in which they cannot help but promote, at every opportunity and regardless of nominal relevance, their business interests. Their names are most often short monosyllables. And most characteristically, they tend to be enormously and, sadly, unjustifiably confident in their risibly flimsy analysis.”

*This article was originally published at Tom Westland’s blog, Apocalypse of Thomas