
Commenting on Data Leaks

We have an entirely new kind of thing in the world

Still life with Leeks (Carl Schuch)

Have you heard about the Panama Papers? Or seen the Panama Papers website? Or seen the #panamapapers hashtag? Or read the story about how reporters “pulled off” the Panama Papers? Or read about how the New York Times didn’t get the Panama Papers? Or about how the Panama Papers are 2.5 terabytes of data! (Dr. Evil places pinkie finger by mouth. Terabytes!) Spread over 11.5 million documents.

The Panama Papers are two things. First, they’re a giant pile of unstructured data extracted by a mole working within a Panamanian law firm. Much of this data apparently relates to the structuring and development of “offshore” corporate entities, many of them above-board, many of them places to hide cash for the global kleptocracy. Second, after much massaging and exploration, and global co-operation among journalists, they are now The Panama Papers!, a global event in journalism, touted with charts showing just how big the leak is.

This is the first time that a data leak has been treated like a product. It’s surreal and a little like any product launch, except that instead of a man from Apple standing on stage in a bright purple shirt, stroking Gorilla Glass, or Samsung insulting all womankind, it’s a global consortium of journalists and some rather well-thought-out home page work. A leak, a plan, and a brand: The Panama Papers. Brought to you by Journalism.

There’s a weird message here—throw away your garbage State Department telegrams, your stupid Ashley Madison databases, and get yourself some Panama Papers, 2.5 terabytes of pure raw corruption. Check out this graphic from The Guardian:

The gray box is this leak. So big! Leave aside for a moment that the size of the leak is not really a good proxy for anything—is this the way things are going to go? By terabytes?

So far, the Panama Papers have caused the Prime Minister of Iceland to resign. That’s a big deal, and there’s a lot more to come. But it also feels a little (maybe even a lot) like the product here is not just the news that we read and sometimes even pay for, but journalism itself. Starting today, giant global branded data leaks are the new product of journalism, and also a reason to justify that institution’s existence. Only journalists, it seems, are willing to look through all those files. Have you thanked them for their service?

I wrote about this very subject for The New Republic and filed the article last month, before all this news came out. They just put the article up this morning. But the Panama Papers are part of a pattern:

[The movie] Spotlight was set around 2001, and it features a lot of people looking at things on paper. The problem has changed greatly since then: The data is everywhere. The media has been forced into a new cultural role, that of the arbiter of the giant and semi-legal database. ProPublica, a nonprofit that does a great deal of data gathering and data journalism and then shares its findings with other media outlets, is one example; it funded a project called DocumentCloud with other media organizations that simplifies the process of searching through giant piles of PDFs (e.g., court records, or the results of Freedom of Information Act requests).

At some level the sheer boredom and drudgery of managing these large data leaks make them immune to casual interest; even the Ashley Madison leak, which I downloaded, was basically an opaque pile of data and really quite boring unless you had some motive to poke around.

We have a new kind of thing in the world—the giant global leak (Snowden, Wikileaks, etc.)—and it has become the de-facto job of the media to parse through and process that data, make it usable, and also to keep it private from the larger population. A new responsibility of the media in the 2010s is to turn leaks into products that can be consumed by the larger population and to decide which data from the leaks should be made available. In return the media deals with the complexity of the data, and manages the litigation and personal risks of exposing secrets.

For all the data there is in the world, almost none of it is in a usable, browsable form. (I’ve talked about this a little before, in an earlier post.) There existed inside the law firm Mossack Fonseca a pretty accurate map of a very large part of the corrupt portion of the world economy. The map was latent—it was absolutely not in Mossack Fonseca’s interest to draw that map; it’s totally in the interests of the press to draw it, though. It’s also in their best interest to trickle out the data for as long as possible and get as many people to subscribe or view their articles as possible.

It’s not totally clear if this leaker-journalism axis is in the best interest of the global population, but the International Consortium of Investigative Journalists would like you to think so. I don’t have a clear answer, because I don’t have access to the data. Only a few hundred people do. So it will take a long time to understand what the Panama Papers represent not to journalists, but to the rest of us.

Story published on Apr 6, 2016.