Study shows problems with anonymizing data to improve security

With an increasing number of organizations looking to leverage so-called big data comes increasing risk that the datasets created could be misused by internal staff or hackers.

To combat this, many companies try to “anonymize” the data by making the values vague: in addition to deleting names, they apply “binning,” which creates discrete bins corresponding to ranges of values and assigns each record to a bin. That might change the time of a retail purchase from a day into a week, and a store location into a general region.
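As a rough illustration of binning (the field names, bin sizes, and the store-to-region lookup below are invented for the example, not taken from the study), an exact purchase date can be coarsened to its ISO week and a store identifier to a region:

```python
from datetime import date

# Hypothetical store-to-region lookup; real deployments would map many stores.
STORE_REGION = {"store_417": "northeast", "store_902": "midwest"}

def bin_record(purchase_date: date, store_id: str) -> tuple:
    """Coarsen one transaction: exact day -> ISO week, exact store -> region."""
    year, week, _ = purchase_date.isocalendar()      # day binned into a week
    region = STORE_REGION.get(store_id, "unknown")   # store binned into a region
    return (f"{year}-W{week:02d}", region)

print(bin_record(date(2015, 1, 29), "store_417"))  # → ('2015-W05', 'northeast')
```

The MIT result suggests that even records coarsened this way can remain identifiable when a person's points are combined.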

But according to a report in Science, researchers at the Massachusetts Institute of Technology (MIT) who examined three months of credit card transactions from an unnamed bank have found that this may not be enough to hide the identities of the people in the data.

The researchers, who published their results in the latest issue of Science magazine, found that the dates and locations of four recent purchases are all that is needed to identify 90 per cent of the people making them. If price information is included, only three transactions are necessary.

The study used anonymized data on 1.1 million people and transactions at 10,000 stores. The bank had stripped away names, credit card numbers, shop addresses, and even the exact times of the transactions, said the magazine’s synopsis. All that was left was the metadata: amounts spent, shop type (restaurant, gym, or grocery store, for example) and a code representing each person. More than 40 per cent of the people could be identified with just two data points, the synopsis says, while five purchases identified nearly everyone.
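The measure behind figures like these is the share of people whose trace of coarsened data points matches no one else's. A toy sketch of that calculation, with three invented traces of (week, region) points rather than anything resembling the study's 1.1 million records:

```python
from itertools import combinations

# Invented traces: each person is a set of coarsened (week, region) points.
traces = {
    "p1": {("W1", "A"), ("W2", "B"), ("W3", "C")},
    "p2": {("W1", "A"), ("W2", "B"), ("W4", "D")},
    "p3": {("W1", "A"), ("W3", "C"), ("W5", "E")},
}

def unicity(k: int) -> float:
    """Fraction of k-point subsets that match no other person's trace."""
    scores = []
    for person, trace in traces.items():
        for subset in combinations(sorted(trace), k):
            pts = set(subset)
            unique = all(not pts <= other
                         for name, other in traces.items() if name != person)
            scores.append(unique)
    return sum(scores) / len(scores)

print(unicity(1))  # → 0.222..., one point rarely singles anyone out
print(unicity(2))  # → 0.555..., two points already do much better
```

The study's headline numbers express the same idea at scale: with four date-and-location points, roughly 90 per cent of people were unique.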

How? By correlating the data with outside information.  First, researchers pulled random observations about each individual in the data: information equivalent to a single time-stamped photo. These clues were simulated, the report says, but people generate the real-world equivalent of this information day in and day out, for example through geolocated tweets or mobile phone apps that log location. A computer then used those clues to identify some of the anonymous spenders. The researchers then fed a different piece of outside information into the algorithm and tried again, and so on until every person was de-anonymized.
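A minimal sketch of that narrowing loop, using invented pseudonyms and (week, region) points rather than the study's real metadata: each outside clue (the equivalent of one time-stamped photo or geolocated tweet) removes every candidate whose trace does not contain the observed point, until one person remains.

```python
# Invented anonymized database: pseudonym -> set of coarsened purchase points.
anon_db = {
    "user_a": {("W01", "north"), ("W02", "south"), ("W03", "north")},
    "user_b": {("W01", "north"), ("W02", "north"), ("W04", "east")},
    "user_c": {("W02", "south"), ("W03", "west")},
}

def narrow(candidates: set, clue: tuple) -> set:
    """Keep only the pseudonyms whose trace contains the observed point."""
    return {uid for uid in candidates if clue in anon_db[uid]}

candidates = set(anon_db)
for clue in [("W02", "south"), ("W03", "north")]:  # two outside observations
    candidates = narrow(candidates, clue)

print(candidates)  # → {'user_a'}: two clues suffice to de-anonymize this user
```

Real traces are far richer than three points each, but the mechanics are the same: each additional observation shrinks the candidate pool multiplicatively.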

The report is a caution not only to organizations that try to de-personalize data they hold themselves, but also to companies that collect data and re-sell it to other parties.

“In light of the results, data custodians should carefully limit access to data,” Science quotes Arvind Narayanan, a computer scientist at Princeton University who was not involved with the study.

Howard Solomon
Currently a freelance writer, I'm the former editor of and Computing Canada. An IT journalist since 1997, I've written for several of ITWC's sister publications including and Computer Dealer News. Before that I was a staff reporter at the Calgary Herald and the Brampton (Ont.) Daily Times. I can be reached at hsolomon [@]
