Beware of big data biases

Companies large and small are rushing to corral the abundant and cheap data available on various social media outlets and dig into the readily available information on what people are thinking, feeling, doing and even intending to do, in the hopes of improving corporate decisions and campaigns and making more money.

Two computer scientists, one from Canada’s McGill University and the other from Carnegie Mellon University in the United States, are warning that these huge datasets can be “misleading.”


McGill’s Derek Ruths and Juergen Pfeffer of Carnegie Mellon cautioned, in an article they published in the Nov. 28 issue of the journal Science, that big data users need to figure out how to correct for biases inherent in information gathered from Facebook posts, tweets, and other social media output.

Ruths is an assistant professor in the School of Computer Science at McGill.

Pfeffer is an assistant research professor at the Institute for Software Research, School of Computer Science, at Carnegie Mellon.

The two pointed out that thousands of research papers based on data collected from social media are published each year.

“Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” Ruths said in an article on the McGill Web site.

“Not everything that can be labelled as ‘Big Data’ is automatically great,” said Pfeffer, who was quoted in an article on the Carnegie Mellon Web site. “...the old adage of behavioural research still applies: Know Your Data.”

Their research highlighted several issues with using big data. The McGill article listed some of the issues and ways to address them:

  • Different social media platforms attract different users – Pinterest, for example, is dominated by females aged 25-34 – yet researchers rarely correct for the distorted picture these populations can produce.
  • Publicly available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
  • The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a “dislike” button makes negative responses to content harder to detect than positive “likes”.
  • Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
  • Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65 per cent accuracy for typical users – even though studies (focusing on politically active users) have claimed 90 per cent accuracy.
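
To make the first point in the list concrete, here is a minimal sketch of one common way to correct for a skewed user population: reweighting each group's result by its share of the target population rather than its share of the sample. This is an illustration only, not the method Ruths and Pfeffer describe. The group names, shares and interest rates are all made-up assumptions.

```python
# Hypothetical illustration: every group name and number below is invented.

def weighted_mean(group_means, group_shares):
    """Average of per-group means, weighted by each group's share."""
    total = sum(group_shares.values())
    return sum(group_means[g] * group_shares[g] for g in group_means) / total

# Share of each group among the sampled platform users (assumed).
sample_shares = {"women_25_34": 0.60, "everyone_else": 0.40}
# Share of each group in the population we actually care about (assumed).
population_shares = {"women_25_34": 0.10, "everyone_else": 0.90}
# Fraction of each group expressing interest in some product (assumed).
group_means = {"women_25_34": 0.80, "everyone_else": 0.30}

# Naive estimate: treats the platform's users as if they were the population.
naive = weighted_mean(group_means, sample_shares)
# Adjusted estimate: reweights each group to its population share.
adjusted = weighted_mean(group_means, population_shares)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

With these invented numbers the naive figure is far higher than the reweighted one, because the over-represented group is also the most interested. The adjustment only works if the group membership of users is known or can be estimated reliably, which is itself one of the difficulties the researchers raise.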


Nestor E. Arellano
Toronto-based journalist specializing in technology and business news. Blogs and tweets on the latest tech trends and gadgets.
