Researchers can successfully re-identify people using only their (supposedly) anonymous credit card metadata. And it’s remarkably easy, too! According to a new Science study, all it takes is a few simple bits of information, such as where you had dinner and drinks last weekend and where you returned that shirt you thought you could pull off.
The metadata that accompany any record of credit card use include sensitive information. These records are typically “simply anonymized,” that is, stripped of names, addresses, phone numbers, account numbers, and any other obvious identifiers. But how anonymous are they really?
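To make “simply anonymized” concrete, here is a minimal Python sketch. It assumes a hypothetical transaction record whose field names are invented for illustration, not the format of the dataset in the study. The sketch drops the obvious identifiers and replaces the card number with a random pseudonym, while the shop, date, and amount pass through untouched.

```python
import secrets
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical raw record; the field names are illustrative only.
    name: str
    address: str
    phone: str
    card_number: str
    shop: str
    day: date
    amount: float


@dataclass(frozen=True)
class AnonymizedTransaction:
    user_id: str  # random pseudonym shared by every purchase on one card
    shop: str
    day: date
    amount: float


def simply_anonymize(records):
    """Drop name, address, phone, and card number; keep shop, date, and amount."""
    pseudonyms = {}  # card number -> random ID, so one card maps to one ID
    result = []
    for r in records:
        if r.card_number not in pseudonyms:
            pseudonyms[r.card_number] = secrets.token_hex(8)
        result.append(
            AnonymizedTransaction(pseudonyms[r.card_number], r.shop, r.day, r.amount)
        )
    return result


raw = [
    Transaction("Ann", "1 Main St", "555-0100", "4111-0001", "Bakery", date(2015, 1, 3), 4.50),
    Transaction("Ann", "1 Main St", "555-0100", "4111-0001", "Bar", date(2015, 1, 3), 12.00),
    Transaction("Bob", "9 Elm St", "555-0199", "4111-0002", "Bakery", date(2015, 1, 4), 3.25),
]
anon = simply_anonymize(raw)
for t in anon:
    print(t.user_id, t.shop, t.day, t.amount)
```

Nothing that names Ann or Bob survives. However, the where, when, and how much of each purchase are all still there, and that is exactly the information the study went on to exploit.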
To study how well anonymization protects the privacy of credit card users, an MIT team led by Yves-Alexandre de Montjoye analyzed a dataset containing three months’ worth of credit card transactions from 1.1 million people living in an unidentified country. These financial data traces had been anonymized, but they still included the locations of the shops, the dates of the purchases, and the amounts.
As it turns out, all the researchers needed were the date and location of four purchases to re-identify 90 percent of the individuals in the dataset. Adding just one more piece of information, the price of a transaction, raised the risk of re-identification by 22 percent.
What’s worse, it didn’t really matter if the time and place of the purchases were vague. Instead of a specific store on a particular day, the researchers might have had only a geographic area visited within a 15-day window, for example. Even then, individuals could be matched to their anonymized records with just a few additional data points.
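The kind of vagueness described here can be sketched as a coarsening step that replaces each exact (shop, day) point with a broader (area, time window) point. This is a minimal illustration, not the study’s method. It assumes a hypothetical shop-to-area table and fixed 15-day windows counted from an arbitrary start date. The researchers’ actual binning may differ.

```python
from datetime import date

# Hypothetical lookup from a shop to a broader geographic area.
SHOP_AREA = {"Bakery": "North", "Bar": "North", "Tailor": "South"}

# Arbitrary reference day from which the 15-day windows are counted (an assumption).
EPOCH = date(2015, 1, 1)


def coarsen(shop, day, window_days=15):
    """Replace an exact (shop, day) point with (area, index of its 15-day window)."""
    window = (day - EPOCH).days // window_days
    return SHOP_AREA[shop], window


print(coarsen("Bakery", date(2015, 1, 3)))   # ('North', 0)
print(coarsen("Bar", date(2015, 1, 14)))     # ('North', 0)
print(coarsen("Tailor", date(2015, 1, 20)))  # ('South', 1)
```

Two different purchases, at the bakery and at the bar, collapse into the same coarse point. Each point therefore says less about a person, which is why more such points are needed before a trace becomes unique.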
And the technique isn’t limited to, say, receipts and purchase orders. The team was able to identify even more people when they included coarser data, like that Instagram photo of a misspelled name on a coffee cup or a tweet about a shiny new iPhone.
Here’s what the team did, according to the MIT News Office. First, they tagged all the purchases made on the same credit card with the same random identification number, one number for each credit card user in the dataset. Next, for each number, the team randomly selected purchases and then determined how many other customers’ purchase histories contained the same data points. Women in higher income brackets are the easiest to identify this way, the researchers found, possibly because their shopping patterns are more distinctive.
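The procedure above can be sketched in a few lines. The quantity is often called unicity: the fraction of people who are singled out by a handful of randomly chosen points from their own history. This is a minimal exact-match sketch, not the team’s code. It uses hypothetical toy data with days as plain integers, and it ignores the price and the coarsened points that the study also considered.

```python
import random
from collections import defaultdict


def build_traces(transactions):
    """Group (shop, day) points under each user's random ID number."""
    traces = defaultdict(set)
    for user_id, shop, day in transactions:
        traces[user_id].add((shop, day))
    return dict(traces)


def unicity(traces, p, rng=None):
    """Fraction of users whose p randomly chosen points match no one else's history."""
    rng = rng or random.Random(0)
    unique = 0
    eligible = 0
    for user_id, points in traces.items():
        if len(points) < p:
            continue  # too few purchases to draw p points from
        eligible += 1
        sample = set(rng.sample(sorted(points), p))
        # Every user whose history contains all of the sampled points.
        matches = [u for u, pts in traces.items() if sample <= pts]
        if matches == [user_id]:
            unique += 1
    return unique / eligible if eligible else 0.0


# Hypothetical toy data: (random ID, shop, day number).
toy = [
    ("u1", "Bakery", 3), ("u1", "Bar", 3), ("u1", "Tailor", 10),
    ("u2", "Bakery", 3), ("u2", "Bar", 4), ("u2", "Cafe", 11),
    ("u3", "Bakery", 3), ("u3", "Bar", 3), ("u3", "Cafe", 11),
]
traces = build_traces(toy)
print(unicity(traces, p=1))
print(unicity(traces, p=3))  # 1.0
```

With a single point, a purchase like the shared bakery visit on day 3 matches everyone. With three points, every toy user is unique. The same effect, at the scale of 1.1 million people, is what makes four purchases enough for most of the real dataset.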
"We're building this body of evidence showing how hard it actually is to anonymize large sets of data like credit cards, mobile phones, and browsing information," de Montjoye tells New Scientist. "We really need to think about what it means to make data truly anonymous and whether it's even possible."