
How Four Students Took Anonymous Browsing Histories and Connected Them To Facebook Accounts

There is a Lot of Money in Convincing You that Anonymous Data is Anonymous

If you've been paying attention to Internet Service Providers, or the legislators that represent them, you've heard a consistent narrative:

"This Browser History Law isn't a big deal. All your data will be anonymous, and if you don't like it, you can always switch. Net Neutrality isn't good for the consumer. It's better to have ISPs deciding how you use the internet, because after all, they've never acted against your best interests."

Besides, does it really matter if ISPs sell a bunch of bulk, anonymized data? Would it matter if it turned out the data wasn't anonymous?

Spoiler Alert: Data Isn't Anonymous. Not Even Close.

Anonymous data isn't anonymous. It's laughably un-anonymous. In fact, it's so un-anonymous, it's basically an inside joke within the data science community.

And intuitively, that's the point, right? If your data were actually anonymous, it wouldn't be useful to marketers. They depend on data to paint pictures of actual people, and if data can't be used to represent individuals, then it isn't valuable.

Internet Service Providers strip names and addresses from their records and call the result anonymous. However, those are easy bits of info to rediscover.

For example, let's take a random John Doe. If you know that John is, say, a 32-year-old white male from southeast Cleveland who banks with Wells Fargo, drives a 2009 Honda Civic, graduated from Oklahoma State, works as a carpenter, is married with one daughter, and voted Republican in the last election, and you also have his browsing history, how hard do you think it would be to figure out his name?

None of the above info is considered "personally identifiable," so it's not filtered out of the data. In aggregate, however, it paints a pretty compelling picture of a single person's life, and if you wanted to know who he was by name, it wouldn't be too hard to figure out. Especially if you were, for example, a marketing agency with hundreds of millions of dollars available to spend on lead validation.
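To make that concrete, here is a minimal sketch of the kind of "linking" a data buyer could do. Everything in it is made up for illustration: the two tables, the field names, and the `link` helper are hypothetical, not any ISP's or marketer's actual data or tooling. The idea is simply that the "anonymous" table keeps the non-identifying attributes, a second purchased list has the same attributes plus names, and any combination of attributes that matches exactly one named person gives the game away.

```python
# Hypothetical "anonymized" records: names stripped, everything else kept.
anonymized = [
    {"id": "user_0412", "sex": "M", "age": 32, "area": "SE Cleveland",
     "car": "2009 Honda Civic", "job": "carpenter"},
    {"id": "user_0977", "sex": "M", "age": 32, "area": "SE Cleveland",
     "car": "2015 Ford F-150", "job": "carpenter"},
    {"id": "user_1530", "sex": "F", "age": 29, "area": "Lakewood",
     "car": "2009 Honda Civic", "job": "nurse"},
]

# Hypothetical auxiliary list that DOES carry names (e.g. a purchased marketing list).
auxiliary = [
    {"name": "John Doe", "sex": "M", "age": 32, "area": "SE Cleveland",
     "car": "2009 Honda Civic", "job": "carpenter"},
    {"name": "Jane Roe", "sex": "F", "age": 41, "area": "Parma",
     "car": "2012 Toyota Camry", "job": "teacher"},
]

# None of these fields is a name or address, but together they narrow things down fast.
QUASI_IDENTIFIERS = ("sex", "age", "area", "car", "job")


def link(anon_rows, aux_rows, keys=QUASI_IDENTIFIERS):
    """Map anonymous ids to names when their attributes match exactly one named record."""
    # Index the named list by the chosen combination of attributes.
    index = {}
    for row in aux_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the "anonymous" row
            matches[row["id"]] = names[0]
    return matches


print(link(anonymized, auxiliary))  # {'user_0412': 'John Doe'}
```

With all five attributes, only `user_0412` lines up with a named record. If you drop down to just sex and age, both `user_0412` and `user_0977` get linked to John Doe, so one of those matches is wrong. Every extra attribute left in the "anonymous" data makes a unique, correct match more likely.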

How To Turn Data Back Into People

One research team became famous when they figured out people's names, addresses, and political preferences. From their Netflix reviews. Not bank statements, not browsing histories: this team of grad students figured out sensitive personal information from what people watched and rated on Netflix.

Another group, out of Stanford, went even further. They took a set of anonymous browsing histories and, like someone taping shredded documents back together, linked those histories to public Facebook profiles.
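Why does this work? Public profiles expose links (things you share, things your friends share), and people tend to click on the links in their own feeds. So the links in your browsing history look a lot like the links in your profile. Below is a deliberately simplified sketch of that intuition, assuming made-up URLs, made-up profile names, and a hypothetical `rank_profiles` helper. It scores each candidate profile by the share of its exposed links that show up in an anonymous history. This is a plain overlap score for illustration, not the researchers' actual model.

```python
# Hypothetical anonymous browsing history: the set of URLs one person visited.
history = {
    "news.example.com/story-1",
    "blog.example.org/post-7",
    "shop.example.net/item-3",
    "video.example.com/clip-9",
}

# Hypothetical public profiles and the links each one exposes in its feed.
profile_links = {
    "profile_a": {"news.example.com/story-1", "blog.example.org/post-7",
                  "video.example.com/clip-9", "news.example.com/story-2"},
    "profile_b": {"news.example.com/story-1", "sports.example.com/game-4"},
    "profile_c": {"recipes.example.com/soup"},
}


def rank_profiles(history, profile_links):
    """Rank profiles by the fraction of their exposed links that appear in the history."""
    scores = {}
    for profile, links in profile_links.items():
        # Set intersection finds the feed links this person actually visited.
        scores[profile] = len(history & links) / len(links) if links else 0.0
    # Best-matching profile first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank_profiles(history, profile_links))
# [('profile_a', 0.75), ('profile_b', 0.5), ('profile_c', 0.0)]
```

A naive score like this is easy to fool: a profile that exposes a single, popular link could score a perfect 1.0 by chance. Real de-anonymization work uses more careful statistics to account for how common each link is, but the basic signal (your history overlaps your feed) is the same.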

I don't know about you, but this is literally, literally, my worst nightmare.

Protecting Yourself

Protecting yourself is simple. Opt out of your ISP's data collection programs, and your ISP is legally prohibited from selling your browser history. ISPs want this process to be hard, so we made it simple enough to do in one click. You can check it out here.