At the heart of the Australian government’s push for mandatory data retention lies a fiction: that there’s stuff called “metadata” — data about data — that’s separate from and less revealing of our private lives than the “content” of our communications. Stuff that’s just “billing data”. As I’ve written elsewhere, anyone still pushing that angle is either a liar or a fool.

The government has yet to say exactly what data it plans to force internet service providers to collect. But in 2012 under the previous government, the Attorney-General’s Department tabled a one-page working definition (PDF) in Senate Estimates, after much prodding by Greens Senator Scott Ludlam. It makes little sense technically, and refers vaguely to recording data about “communication” via an internet “service” without explaining what categories of communications and services would be covered.

According to iiNet’s chief regulatory officer Steve Dalby, the AGD and law enforcement agencies have floated at least three different suggestions in recent years. “We’re confused by the contradictory comments and I expect our policy makers are, too,” he wrote in a blog post that outlines just how extensive metadata can be for certain kinds of communication.

However, one consistent government line is that mandatory data retention would only involve data that service providers are already collecting. Each time your router or wi-fi dongle sets up your internet connection, your ISP will log the date and time, the internet protocol (IP) address it assigns your device, the duration of the connection, and the total amount of data transferred so it can be charged against your monthly quota. ISPs typically keep this data long enough to resolve any customer billing disputes. This is presumably the “billing data” the politicians refer to.
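To make that concrete, here is a minimal sketch of what one of those “billing data” records might look like, and how an ISP could add them up against a monthly quota. The `SessionRecord` layout, the `quota_used_gb` helper and every value in the sample data are assumptions made for illustration, not any real ISP’s system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SessionRecord:
    """One connection session, as an ISP might log it (hypothetical layout)."""
    started: datetime          # date and time the connection was set up
    assigned_ip: str           # IP address handed to the customer's device
    duration: timedelta        # how long the connection lasted
    bytes_transferred: int     # total data moved during the session


def quota_used_gb(sessions, year, month):
    """Sum the data transferred in sessions that started in the given month, in GB."""
    total = sum(
        s.bytes_transferred
        for s in sessions
        if s.started.year == year and s.started.month == month
    )
    return total / 1e9


# Invented sample data; the addresses come from a range reserved for documentation.
sessions = [
    SessionRecord(datetime(2014, 8, 1, 7, 30), "203.0.113.7", timedelta(hours=5), 2_500_000_000),
    SessionRecord(datetime(2014, 8, 2, 18, 0), "203.0.113.42", timedelta(hours=3), 1_500_000_000),
    SessionRecord(datetime(2014, 7, 30, 9, 0), "203.0.113.9", timedelta(hours=1), 4_000_000_000),
]

print(quota_used_gb(sessions, 2014, 8))
```

Note what is absent: nothing in a record like this says who you emailed or which sites you visited. That is the gap between “billing data” and what the government appears to want.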

But the government also wants to be able to track individuals’ emails, the websites they visit, and perhaps other ill-defined “communications”. Service providers usually log emails, but only long enough to investigate technical faults — a few weeks at most. The logs usually record the date and time the email was sent, the IP address of the computer sending it, the sender’s email address, the recipient’s email address, the size of the email, the IP address of the computer it was delivered to and, quite often, the subject line — as well as certain technical information for troubleshooting. Whether the subject would be considered to be “metadata” or “content” is an open question.


Your web browsing may not necessarily be logged by your ISP, but most websites log the traffic they receive. Those logs include, as a minimum, your computer’s IP address, the type and version number of your web browser and computer operating system and, if you clicked a link to reach the site, the web address (or URL, which stands for uniform resource locator) of the page you clicked from. Every page you visit is logged, along with every image and other element within those pages, so these log files can get big. That’s why they’re usually trashed once the web traffic reports are done.
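For the curious, here is a rough sketch of how one line of such a log could be pulled apart. It assumes the site writes the widely used Apache “combined” log format; a given site may well use something else. The sample line, including the address and the URLs, is invented for illustration.

```python
import re

# Pattern for the Apache "combined" log format (assumed; real sites vary).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented sample entry: one page request from one visitor.
sample = (
    '203.0.113.7 - - [05/Aug/2014:09:15:02 +1000] '
    '"GET /health/ms-support.html HTTP/1.1" 200 5120 '
    '"http://example.com/search?q=ms+symptoms" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"'
)

entry = parse_log_line(sample)
print(entry["ip"])          # the visitor's IP address
print(entry["referrer"])    # the page the visitor clicked from
print(entry["user_agent"])  # browser and operating system details
```

Even this single made-up line carries the visitor’s address, the page requested, the page they came from, and their browser and operating system.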

In the smartphone era, all of this activity can be matched to your location. Knowing the cell towers that you’re connecting through gives a rough area, on the scale of a kilometre or so in rural areas, or a hundred metres in cities. App providers might be tracking you with more precision, down to a few metres in the city. What logs they keep is up to them.

All of this data can be extremely revealing, even without knowing the content of a communication.

“Phone metadata is unambiguously sensitive, even over a small sample and short time window. We were able to infer medical conditions, firearm ownership and more, using solely phone metadata,” said Stanford University doctoral student Jonathan Mayer, who worked with the phone records of 546 volunteers, matching phone numbers against the public Yelp and Google Places directories to see who was being called.

A pattern of calls or website visits can reveal an obvious narrative. One participant in Mayer’s research called local neurology groups, a specialty pharmacy, a rare condition management service, and a pharmaceutical hotline used for multiple sclerosis. We don’t need the content to know what’s happening.
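The matching step is simple enough to sketch in a few lines. What follows is not Mayer’s actual method or data, just an illustration of the general idea: look up each number called in a public directory and count the categories. The directory, the phone numbers and the call list are all invented.

```python
from collections import Counter

# Hypothetical public directory: phone number -> business category.
DIRECTORY = {
    "555-0101": "neurology clinic",
    "555-0102": "specialty pharmacy",
    "555-0103": "rare condition management service",
    "555-0104": "pharmaceutical hotline (multiple sclerosis)",
    "555-0199": "pizza delivery",
}

# Call metadata only: (caller, number called). No audio, no content.
CALLS = [
    ("participant-A", "555-0101"),
    ("participant-A", "555-0102"),
    ("participant-A", "555-0103"),
    ("participant-A", "555-0104"),
    ("participant-A", "555-0199"),
    ("participant-A", "555-0101"),
]


def categories_called(calls, directory, caller):
    """Count the directory categories one caller contacted; unlisted numbers are skipped."""
    return Counter(
        directory[number]
        for who, number in calls
        if who == caller and number in directory
    )


profile = categories_called(CALLS, DIRECTORY, "participant-A")
for category, count in profile.most_common():
    print(count, category)
```

No call was listened to, yet the list of categories tells the story on its own.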

So how much of this information does the government want to capture? We don’t know.

“This so-called metadata is … It’s not what you’re doing on the internet, it’s the sites you’re visiting,” Prime Minister Tony Abbott said on Channel Nine this morning. Even if, as has since been clarified, Abbott doesn’t want browsing history, that Stanford research and other studies show that in practice it’s a distinction without a difference.

We also await clarification of who’s going to pay ISPs and online service providers to set up, secure and maintain the data storage. iiNet, for example, has estimated that it would cost the company up to $100 million. Noted network architect Tony Abbott disagrees. “Well, I don’t know why they would be saying that because this is information which is already kept. It’s information which is currently kept, it’s information which is currently done with, it’s embedded in the current price. It’s already factored into current pricing structures,” Abbott told AM this morning.

Except it isn’t. Much of the data is thrown out after a few weeks, not stored for years. What data is captured is chosen for technical troubleshooting, and that may not match the government’s needs.

It’d be handy if we had a definition …