Don’t do big data analytics alone, says expert

Jumping off a diving board into a pool of big data doesn’t have to be a lonely job, says an analytics expert.

“Don’t do it alone,” Alex Mair, architecture director of Canada Health Infoway’s emerging technology group, told a Toronto conference on big data Thursday. “There’s lots of opportunities to collaborate and be successful.”

Infoway is the not-for-profit agency that advises provincial governments on creating architectures for sharing data. And while its advice may be more relevant in a sector like health, where hospitals or regional health authorities don’t compete against each other, Mair said in an interview it does apply to enterprises that don’t have large IT departments.

Mair, who has worked in analytics in the private as well as the public sector, said organizations need to understand what big data analytics could mean before starting out.

“You need to find what is the use case that’s going to provide me value,” he said. Then ask “how am I going to measure that so I can develop a business case going forward?”

As with all big projects, he also urged those interested to find senior people in the organization willing to champion it.

Big data – defined by some as more data than can be handled by traditional analytics software, and by others, like Mair, as processing unstructured near real-time data – offers the possibility of giving more insight into customers’ intentions and possibly predicting the future.

One presenter at the conference analyzes social media data to predict trends. Another has just launched software that predicts which students are at risk of failing.

The conference attracted a wide range of public and private sector analysts who wanted to learn more about how to deal with huge volumes of data.

Attendees included Dr. Eugene Wen, vice-president and chief statistician of Ontario’s Workplace Safety and Insurance Board, who said in an interview he wants to do critical data modeling better.

The agency, which pays compensation to injured workers, has 100 years of data, but isn’t using advanced techniques to get the best from it, he said.

Also attending was Ty Lunsford, senior business intelligence analyst for Denver-based SendGrid, an email platform used by developers for sending transactional email in Web apps. He admitted he was partly there to hear from a competitor making a presentation, but also to learn how analytics could be better used by his company.

SendGrid has sent 115 billion emails since 2009, he said. “With all this data, we sure could be doing a lot more” in the way of analytics.

Mihaela Smochina, a database advisor at the Ontario Agency for Health Protection, which owns public health laboratories, was there to learn how others handle integrating data from disparate platforms and improve their ability to make predictions.

What they and other attendees are getting – the conference continues Friday – are brief case studies from a number of Canadian and U.S. organizations.

Jordan Christensen, vice-president of big data for Toronto-based e-book retailer Kobo, said his company decided last year to use big data from reader searches to help make recommendations to buyers.

Their strategy was drawn from an article in an IEEE magazine that argued simple models and large datasets trump elaborate models and less data – appealing, because Kobo could only assemble a big data analytics team of five.

In addition, it was decided the unit should be a profit centre, generating at least enough revenue from its suggestions to pay for the division’s staff.

Arek Kaczmarek, data engineer at Web travel site Expedia, explained how the company uses the open source Hadoop framework to tie together hundreds of terabytes of data from several databases. This lets analysts query Hadoop directly rather than create single-purpose data marts. To improve query speed, the data is not normalized (that is, it contains redundant copies of data).
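
To illustrate the trade-off behind storing data that is not normalized, here is a minimal Python sketch. It is not Expedia's schema or code: the tables, field names and the helpers `denormalize` and `revenue_by_city` are all hypothetical, made up for this example. It shows hotel details being copied into every booking row so that a later query needs no join.

```python
# Hypothetical, made-up tables for illustration only.
# Normalized form: hotel details are stored once and referenced by id.
hotels = {
    1: {"name": "Harbour Inn", "city": "Toronto"},
    2: {"name": "Peak Lodge", "city": "Denver"},
}
bookings = [
    {"booking_id": 100, "hotel_id": 1, "price": 120.0},
    {"booking_id": 101, "hotel_id": 2, "price": 95.0},
    {"booking_id": 102, "hotel_id": 1, "price": 130.0},
]


def denormalize(bookings, hotels):
    """Copy the hotel's fields into every booking row (redundant on purpose)."""
    return [{**b, **hotels[b["hotel_id"]]} for b in bookings]


def revenue_by_city(wide_rows):
    """Aggregate over the wide rows; no join or lookup is needed at query time."""
    totals = {}
    for row in wide_rows:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["price"]
    return totals


wide = denormalize(bookings, hotels)
print(revenue_by_city(wide))
```

The cost of this layout is that "Harbour Inn" and "Toronto" are now stored once per booking rather than once overall, which is the redundancy the parenthetical above refers to.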

Rakesh Kulkarni, research scientist at Xerox Inc., recounted how his company created an algorithm for the city of Los Angeles that plows through reams of parking data to set the hourly price of parking in the city based on the number of available parking spots.
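
The article does not describe how the Los Angeles algorithm works, so the following Python sketch is only a toy illustration of the general idea of pricing by availability. The function `hourly_price`, the linear rule and every number in it (base price, floor, ceiling) are assumptions invented for this example, not details of the Xerox system.

```python
def hourly_price(available_spots, total_spots, base_price=2.0,
                 min_price=0.5, max_price=6.0):
    """Return an hourly price that rises as fewer spots remain free.

    All parameters and the linear rule are illustrative assumptions.
    """
    if total_spots <= 0:
        raise ValueError("total_spots must be positive")
    if not 0 <= available_spots <= total_spots:
        raise ValueError("available_spots must be between 0 and total_spots")

    # Share of spots currently taken: 0.0 when empty, 1.0 when full.
    occupancy = 1 - available_spots / total_spots

    # Assumed rule: half the base price when empty, twice the base when full.
    price = base_price * (0.5 + 1.5 * occupancy)

    # Keep the result inside the assumed floor and ceiling.
    return round(min(max_price, max(min_price, price)), 2)


for free in (100, 50, 0):
    print(free, "spots free ->", hourly_price(free, 100))
```

A real system would presumably draw on historical demand and far more data than a single availability count; the sketch only shows the direction of the relationship, with price going up as free spots go down.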

Charumitra Pujar, principal data scientist at Points.com, an online service that integrates loyalty programs from many companies, said his company decided big data needed a new way of data modeling that not only makes data and business teams work closely together, but also includes setting a success metric.

“Big data is fundamentally changing the way we make decisions today,” he said.

“Marketers and business owners who are making decisions based on forecasting using (traditional) models are now challenged because they have much more information available.”

The conference, put on by Enterprise Innovation Ltd., continues Friday.


