Data in breach of 48 million records was scraped from public social network pages

A data breach of 48 million personal records first revealed to a House of Commons committee was caused by an error made by business data search service LocalBlox, according to cyber security vendor UpGuard.

A misconfigured Amazon Web Services (AWS) S3 bucket was discovered by Chris Vickery, a member of the UpGuard Cyber Risk Team, on Feb. 18. As has been the case with many misconfigured S3 buckets over the past 18 months, the data was exposed to the public Internet.

Upon examining the contents, UpGuard discovered 48 million personal records that were assembled by LocalBlox. At least in part, the data was assembled by scraping public-facing websites of social networks including Facebook, LinkedIn, Twitter, and real estate site Zillow.

On Wednesday morning, UpGuard published an article explaining the breach in fuller detail. Facebook was a victim of this breach, along with its users, because the data was scraped for purposes that no one agreed to. Facebook recently took steps to restrict data scraping from its profile pages. On Apr. 4, Facebook announced it would no longer allow account searches to be conducted using a phone number or email address.

LocalBlox collected the Facebook portion of the data by scraping the HTML of public pages, not by using Facebook’s API, says Dan O’Sullivan, report author and cyber risk analyst at UpGuard, who spoke with IT World Canada by phone. Data scraping is a very common practice and LocalBlox wasn’t trying to hide its techniques.

“They are advertising on the basis of this. That they scrape these social media accounts to give you, the paying customer, the best insights into user data,” he says.

On Tuesday, Vickery was a guest of the House of Commons Standing Committee on Access to Information, Privacy and Ethics. There, he referred to the breach of 48 million records that included Facebook data. He was responding to a question about how detailed a data breach involving Facebook data could get. He indicated that it was possible that personal messages were involved in the breach.

That’s not the case, O’Sullivan says. Vickery was merely referring to social media posts that could have been scraped.

Also published Wednesday morning was a story by ZD Net reporter Zack Whittaker, who was working with Vickery. According to the story, Vickery disclosed the breach to LocalBlox and it was secured hours later.

In an interview, LocalBlox chief technology officer Ashfaq Rahman tells ZD Net that most of the 48 million records were just made up for internal testing. He said that no individual other than Vickery is believed to have accessed the S3 bucket.

The websites affected by the data scraping all say that the practice violates their terms of service.

Securing S3 buckets

Even though organizations from Verizon to the Pentagon have been caught with S3-related data breaches, securing Amazon’s storage service should be simple enough. An S3 bucket comes password-protected by default and an administrator must configure it to be externally accessible. Amazon also recently added an orange warning indicator to its dashboard for any S3 bucket that is made public.

“If you wade through our archives of previous breach reports, most of them are S3 exposures,” O’Sullivan says. “I don’t say that to pick on Amazon because the default setting on an S3 bucket is secured and password protected.”

Because they are easy to set up, S3 buckets are broadly accessible and may sometimes be in the hands of administrators who don’t appreciate the finer points of user privacy and security, he says.

AWS customers can also use the free Trusted Advisor feature to check S3 bucket permissions, or use AWS CloudTrail to monitor account activity and actions taken on their infrastructure.
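
For administrators who want to script a permissions check of their own, the following is a minimal sketch in Python, not a tool described by UpGuard or Amazon. It assumes the bucket's access control list (ACL) has already been fetched as a dictionary shaped like the response from the boto3 S3 client's get_bucket_acl call. The function name public_grants and the sample ACL are hypothetical and for illustration only.

```python
# Group URIs that S3 uses in ACL grants to mean "everyone" and
# "any authenticated AWS account".
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return the permissions that an ACL grants to public groups.

    `acl` is assumed to have the shape of the dictionary returned by
    boto3's S3 client method get_bucket_acl(Bucket=...).
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        is_group = grantee.get("Type") == "Group"
        if is_group and grantee.get("URI") in PUBLIC_GROUP_URIS:
            found.append(grant.get("Permission"))
    return found


# Hypothetical ACL for illustration; not data from the LocalBlox bucket.
example_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

if __name__ == "__main__":
    # An empty list means no public grants were found in the ACL.
    print(public_grants(example_acl))
```

A non-empty result means the ACL opens the bucket to a public group. ACLs are only one way a bucket can become public, since a bucket policy can have the same effect, so a check like this is a starting point rather than a full audit.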



Brian Jackson
Former editorial director of IT World Canada. Current research director at Info-Tech
