Metadata and content: a distinction without difference

Anyone still pushing the angle that stuff called "metadata" is less revealing than stuff called "content" is either a fool or a liar, writes Stilgherrian.

At the heart of the Australian government’s push for mandatory data retention lies a fiction: that there’s stuff called “metadata” — data about data — that’s separate from and less revealing of our private lives than the “content” of our communications. Stuff that’s just “billing data”. As I’ve written elsewhere, anyone still pushing that angle is either a liar or a fool.

The government has yet to say exactly what data it plans to force internet service providers to collect. But in 2012 under the previous government, the Attorney-General’s Department tabled a one-page working definition (PDF) in Senate Estimates, after much prodding by Greens Senator Scott Ludlam. It makes little sense technically, and refers vaguely to recording data about “communication” via an internet “service” without explaining what categories of communications and services would be covered.

According to iiNet’s chief regulatory officer Steve Dalby, the AGD and law enforcement agencies have floated at least three different suggestions in recent years. “We’re confused by the contradictory comments and I expect our policy makers are, too,” he wrote in a blog post that outlines just how extensive metadata can be for certain kinds of communication.

However, one consistent government line is that mandatory data retention would only involve data that service providers are already collecting. Each time your router or wi-fi dongle sets up your internet connection, your ISP will log the date and time, the internet protocol (IP) address it assigns your device, the duration of the connection, and the total amount of data transferred so it can be charged against your monthly quota. ISPs typically keep this data long enough to resolve any customer billing disputes. This is presumably the “billing data” the politicians refer to.

But the government also wants to be able to track individuals’ emails, the websites they visit, and perhaps other ill-defined “communications”. Service providers usually log emails, but only long enough to investigate technical faults — a few weeks at most. The logs usually record the date and time the email was sent, the IP address of the computer sending it, the sender’s email address, the recipient’s email address, the size of the email, the IP address of the computer it was delivered to and, quite often, the subject line — as well as certain technical information for troubleshooting. Whether the subject would be considered to be “metadata” or “content” is an open question.
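To make that concrete, here is a minimal sketch of what one such log record might look like, and how little work it takes to turn a pile of them into a picture of who talks to whom. The field names, the `MailLogEntry` class, the `contact_counts` helper and the sample data are all invented for illustration; real mail servers each have their own log formats.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class MailLogEntry:
    """One hypothetical mail log record. Field names are illustrative only."""
    sent_at: datetime
    sender_ip: str
    sender: str
    recipient: str
    size_bytes: int
    delivered_to_ip: str
    subject: Optional[str] = None  # often logged, but not always


def contact_counts(entries, address):
    """Count how often `address` emailed each recipient. No message bodies needed."""
    return Counter(e.recipient for e in entries if e.sender == address)


# Invented sample data, using reserved example domains and documentation IP ranges.
LOG = [
    MailLogEntry(datetime(2014, 8, 4, 9, 15), "203.0.113.7", "alice@example.com",
                 "clinic@example.org", 18204, "198.51.100.20", "Test results"),
    MailLogEntry(datetime(2014, 8, 4, 9, 47), "203.0.113.7", "alice@example.com",
                 "lawyer@example.net", 2310, "198.51.100.31"),
    MailLogEntry(datetime(2014, 8, 5, 8, 2), "203.0.113.7", "alice@example.com",
                 "clinic@example.org", 1544, "198.51.100.20", "Re: Test results"),
    MailLogEntry(datetime(2014, 8, 5, 8, 30), "203.0.113.9", "bob@example.com",
                 "alice@example.com", 990, "198.51.100.44"),
]

counts = contact_counts(LOG, "alice@example.com")
print(counts.most_common())
```

Note that the subject line sits in the record right next to the addresses and timestamps, which is exactly why its status as "metadata" or "content" matters.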

Your web browsing may not necessarily be logged by your ISP, but most websites log the traffic they receive. Those logs include, as a minimum, your computer’s IP address, the type and version number of your web browser and computer operating system and, if you clicked a link to reach the site, the web address (or URL, which stands for uniform resource locator) of the page you clicked from. Every page you visit is logged, along with every image and other element within those pages. These log files can get big, so they’re usually trashed once the web traffic reports are done.
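For a sense of what one of those log lines holds, here is a rough sketch that pulls the fields out of a single entry. It assumes the widely used Apache "combined" log format, which is one common choice and not something any particular website is known to use; the `parse_line` helper and the sample line are invented for illustration.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample entry, using a documentation IP range and example domain.
SAMPLE = (
    '203.0.113.7 - - [05/Aug/2014:10:12:31 +1000] '
    '"GET /support/ms-treatment.html HTTP/1.1" 200 5120 '
    '"http://search.example.com/?q=ms+symptoms" '
    '"Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"'
)

entry = parse_line(SAMPLE)
for field in ("ip", "time", "request", "referer", "user_agent"):
    print(f"{field}: {entry[field]}")
```

Even this one line gives away the visitor's IP address, the page requested, the search that led there, and the browser and operating system in use.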

In the smartphone era, all of this activity can be matched to your location. Knowing the cell towers that you’re connecting through gives a rough area, on the scale of a kilometre or so in rural areas, or a hundred metres in cities. App providers might be tracking you with more precision, down to a few metres in the city. What logs they keep is up to them.

All of this data can be extremely revealing, even without knowing the content of a communication.

“Phone metadata is unambiguously sensitive, even over a small sample and short time window. We were able to infer medical conditions, firearm ownership and more, using solely phone metadata,” said Stanford University doctoral student Jonathan Mayer, who worked with the phone records of 546 volunteers, matching phone numbers against the public Yelp and Google Places directories to see who was being called.

A pattern of calls or website visits can reveal an obvious narrative. One participant in Mayer’s research called local neurology groups, a specialty pharmacy, a rare condition management service, and a pharmaceutical hotline used for multiple sclerosis. We don’t need the content to know what’s happening.
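The matching step is simple enough to sketch in a few lines. What follows is a simplified illustration, not Mayer's actual method: the directory, the phone numbers, the call records and the `categories_called` helper are all made up, standing in for a public business listing and a phone bill.

```python
from collections import Counter

# Hypothetical directory mapping phone numbers to business categories.
DIRECTORY = {
    "555-0101": "neurology clinic",
    "555-0102": "specialty pharmacy",
    "555-0103": "rare condition management service",
    "555-0104": "pharmaceutical hotline",
    "555-0199": "pizza delivery",
}

# Invented call records: (timestamp, number dialled). No call content at all.
CALLS = [
    ("2014-08-01T09:02", "555-0101"),
    ("2014-08-01T09:40", "555-0102"),
    ("2014-08-03T14:15", "555-0103"),
    ("2014-08-04T11:30", "555-0104"),
    ("2014-08-04T19:05", "555-0199"),
    ("2014-08-05T08:55", "555-0101"),
    ("2014-08-05T10:00", "555-0147"),  # not listed in the directory
]


def categories_called(call_records, directory):
    """Tally the categories of businesses called; unlisted numbers are skipped."""
    return Counter(
        directory[number] for _, number in call_records if number in directory
    )


profile = categories_called(CALLS, DIRECTORY)
for category, count in profile.most_common():
    print(f"{count} x {category}")
```

Nothing here touches what was said on any call, yet the tally alone points strongly at a health condition.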

So how much of this information does the government want to capture? We don’t know.

“This so-called metadata is … It’s not what you’re doing on the internet, it’s the sites you’re visiting,” Prime Minister Tony Abbott said on Channel Nine this morning. Even if, as has since been clarified, Abbott doesn’t want browsing history, the Stanford research and other studies show that in practice it’s a distinction without a difference.

We also await clarification of who’s going to pay ISPs and online service providers to set up, secure and maintain the data storage. iiNet, for example, has estimated it would cost it up to $100 million. Noted network architect Tony Abbott disagrees. “Well, I don’t know why they would be saying that because this is information which is already kept. It’s information which is currently kept, it’s information which is currently done with, it’s embedded in the current price. It’s already factored into current pricing structures,” Abbott told AM this morning.

Except it isn’t. Much of the data is thrown out after a few weeks, not stored for years. The data that is captured is chosen for technical troubleshooting, and that may not match the government’s needs.

It’d be handy if we had a definition …

Comments

Yclept

10 years ago

Yet again Tony displays that he doesn’t have a clue.

Maybe it’s time to go off the grid.

klewso

Abbott was never much for detail – he was always more into his “friendly’s” briefs.
Costello said as much, about his “slow eye” for (economic?) detail.
He admitted as much on The 7:30 Report (O’Brien) when asked for details about an issue (“technology”?) some years ago – something along the lines of “I haven’t read it but I have spoken to someone/people who have”?

a.lewis@unsw.edu.au

“Noted network architect Tony Abbott disagrees.”

Worth the price of admission. 🙂

Hard to argue with Tony.

The Old Bill

Tony might ask God to put all this data on a little cloud for him for free. That way there won’t be a great, pause, big, long pause, tax on internet use for the opposition to hound him with. (With ideas like this I am hoping to become a Liberal Party adviser.)

Stuart Coyle

So, two years of internet access data kept by a ramshackle mob of ISPs and Telcos, and we are to trust that it won’t get leaked to third parties, used for marketing, used for profiling the political views of individuals, used for blackmail, used for suppressing dissent, used for targeting minor offenders, used by corrupt developers and their political cronies or used by criminals for identity theft or other nefarious purposes.

At least it will stop all the terrorism that we have been seeing lately in this country…

Of course the real terrorists already use TOR, VPNs, encrypted email and throwaway mobile phones. The obvious next step is to make all these things illegal. We have a country rapidly approaching that perfect state where what is not illegal is mandatory.
