A big data ‘aha’ moment

It may be a buzz phrase, the cloud computing of 2012, but I do find big data analytics fascinating. It's just the way my mind works; give me a big enough survey sample, and I can entertain myself with pivot tables for hours on end. But I felt I needed a better grounding in the concept, so I asked the folks at SAS Canada for a schooling. They connected me with Paul Kent, vice-president of platform research and development at SAS Institute Inc. in Cary, N.C.

(I also spoke to Pat Finerty of SAS Canada about the evolution of analysis, from data mining to big data, in this video.)

It's a given that technology changes everything, but that's particularly true in the big data analytics field. The ability to analyze billions of lines of data in memory, innovations like the Hadoop MapReduce framework for distributed computing, and high-performance computing grids make it possible to perform analytics on ever-increasing amounts of data in near-real time.

On the other side of the equation, we're collecting more and more data to analyze. The evolution of data analysis is inextricably linked to the evolution of data collection. In the early days of computing, data was part of the application itself. Move along to the transactional database model, and data is collected from outside the application, but it must comply with a specific structure of fields. Now, the sources of data aren't so structured: we're dealing with documents, images, and media files, often without the appropriate metadata; geo-location data that may or may not be associated with a transaction; social media feeds wherein context is everything; metering data from electrical grids; all manner of telematics from vehicles, production machinery, etc.

I remember a story from the days of yore, when data mining was a fresh concept. A colleague of mine called out a representative of one of the vendors over the beer and diapers issue: analyze enough transactional data, and you'll find a pattern that suggests people who buy diapers also buy beer, so a retailer can organize the shelves accordingly. Said colleague's complaint was that the company rep was presenting this as a fact, rather than a theoretical example of the patterns that data mining can unlock, and factually, it wasn't true. It's an item of small relevance, but for the fact that it lodged the beer-and-diapers model of data mining in my head for the ensuing 15 years.
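For the curious, the kind of pattern-hunting in the beer-and-diapers story can be sketched in a few lines of Python. This is only a minimal illustration of co-purchase counting, not any vendor's method: the baskets are invented for the example, and the function names `pair_support` and `confidence` are my own, hypothetical ones.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_support(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


if __name__ == "__main__":
    for pair, count in pair_support(baskets).most_common(3):
        print(pair, count)
    print(confidence(baskets, "diapers", "beer"))
```

Note that everything here is derived from the transactions alone; the code knows what was bought together, and nothing about why.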

And it's a handy model to have when the skeptical say that big data analytics is just a jumped-up version of data mining. It highlights the fundamental difference, and my discussion with Kent crystallized it: data mining is transaction-focused, teasing patterns out of information of limited scope, whereas big data analytics has a behavioural focus. We're not concerned with the transaction, according to Kent, but with the behaviour that leads to the transaction. Of those many new types of data outlined a couple of paragraphs ago, almost all are related to behaviour.

That was my big data “aha” moment, and it fundamentally changed my understanding of analytics.
