Beware of big data biases

December 1, 2014

Companies large and small are rushing to corral the abundant and cheap data available in various social media outlets and dig into the readily available information on what people are thinking, feeling, doing and even intending to do, in the hopes of improving corporate decisions and campaigns and making more money.

Two computer scientists, one from Canada’s McGill University and the other from Carnegie Mellon University in the United States, are warning that these huge datasets can be “misleading.”

McGill’s Derek Ruths and Juergen Pfeffer of Carnegie Mellon cautioned, in an article they published in the Nov. 28 issue of the journal Science, that big data users need to figure out how to correct for biases inherent in information gathered from Facebook posts, tweets, and other social media output.

Ruths is an assistant professor in the School of Computer Science at McGill.

Pfeffer is an assistant research professor at the Institute for Software Research, School of Computer Science at Carnegie Mellon.

The two pointed out that thousands of research papers based on data collected from social media are published each year.

“Many of these papers are used to inform and justify decisions and investments among the public and in industry and government,” Ruths said in an article on the McGill Web site.

“Not everything that can be labelled as ‘Big Data’ is automatically great,” said Pfeffer, who was quoted in an article appearing on the Carnegie Mellon Web site. “…the old adage of behavioural research still applies: Know Your Data.”

Their research highlighted several issues with using big data. The McGill article listed some of the issues and the ways to address them:

Different social media platforms attract different users – Pinterest, for example, is dominated by females aged 25-34 – yet researchers rarely correct for the distorted picture these populations can produce.
Publicly available data feeds used in social media research don’t always provide an accurate representation of the platform’s overall data – and researchers are generally in the dark about when and how social media providers filter their data streams.
The design of social media platforms can dictate how users behave and, therefore, what behaviour can be measured. For instance, on Facebook the absence of a “dislike” button makes negative responses to content harder to detect than positive “likes”.
Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behaviour.
Researchers often report results for groups of easy-to-classify users, topics, and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65 per cent accuracy for typical users – even though studies (focusing on politically active users) have claimed 90 per cent accuracy.
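The first issue above, a platform whose users do not mirror the wider population, can be illustrated with a small reweighting sketch. This is only one possible correction (post-stratification), not a method attributed to Ruths and Pfeffer, and the group labels, sample values and population shares below are hypothetical numbers chosen for illustration.

```python
from collections import defaultdict


def raw_mean(samples):
    """Plain average over all sampled users, ignoring who they are."""
    return sum(value for _, value in samples) / len(samples)


def poststratified_mean(samples, population_shares):
    """Average that weights each group by its share of the target
    population instead of its share of the platform sample.

    samples: list of (group, value) pairs collected from a platform.
    population_shares: dict mapping group -> assumed population share.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in samples:
        totals[group] += value
        counts[group] += 1

    # A group with no sampled users cannot be reweighted.
    missing = [g for g in population_shares if counts[g] == 0]
    if missing:
        raise ValueError(f"no samples for groups: {missing}")

    # Sum of per-group means, each scaled by its population share.
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical data: 1 = user likes a product, 0 = does not.
# Group "a" is over-represented on the platform (8 of 10 sampled users).
samples = [("a", 1)] * 6 + [("a", 0)] * 2 + [("b", 0)] * 2
population_shares = {"a": 0.5, "b": 0.5}  # assumed real-world mix

print(raw_mean(samples))                                # 0.6
print(poststratified_mean(samples, population_shares))  # 0.375
```

In this made-up case the uncorrected figure suggests 60 per cent approval, while weighting the two groups equally gives 37.5 per cent. The correction is only as good as the assumed population shares and the ability to tell which group each user belongs to.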
