SHARE
Follow this article on Twitter Facebook LinkedIn Bookmark and Share
Home >> Information Architecture

7 top tools for taming big data

7 top tools for taming big data

By:  Peter Wayner  On: 18 Apr 2012 For: InfoWorld (U.S.) 
 

Top-flight reporting, analysis, visualization, integration, and development tools that help you harness Hadoop

SAN FRANCISCO -- The floods that devastated the hard disk industry in Thailand are now half a year old, and the prices per terabyte are finally dropping once again. That means data will start piling up and people around the office will wonder what can be done with it. Perhaps there are some insights in those log files? Perhaps a bit of statistical analysis will find some nuggets of gold buried in all of that noise? Maybe we can find enough change buried in the couch cushions of these files to give us all a raise?

The industry now has a buzzword, "big data," for how we're going to do something with the huge amount of information piling up. "Big data" is replacing "business intelligence," which subsumed "reporting," which put a nicer gloss on "spreadsheets," which beat out the old-fashioned "printouts." Managers who long ago studied printouts are now hiring mathematicians who claim to be big data specialists to help them solve the same old problem: What's selling and why?

It's not fair to suggest that these buzzwords are simple replacements for each other. Big data is a more complicated world because the scale is much larger. The information is usually spread out over a number of servers, and the work of compiling the data must be coordinated among them. In the past, the work was largely delegated to the database software, which would use its magical JOIN mechanism to compile tables, then add up the columns before handing off the rectangle of data to the reporting software that would paginate it. This was often harder than it sounds. Database programmers can tell you the stories about complicated JOIN commands that would lock up their database for hours as it tried to produce a report for the boss who wanted his columns just so.

The game is much different now. Hadoop is a popular tool for organizing the racks and racks of servers, and NoSQL databases are popular tools for storing data on these racks. These mechanism can be much more powerful than the old single machine, but they are far from being as polished as the old database servers. Although SQL may be complicated, writing the JOIN query for the SQL databases was often much simpler than gathering information from dozens of machines and compiling it into one coherent answer. Hadoop jobs are written in Java, and that requires another level of sophistication. The tools for tackling big data are just beginning to package this distributed computing power in a way that's a bit easier to use.

Many of the big data tools are also working with NoSQL data stores. These are more flexible than traditional relational databases, but the flexibility isn't as much of a departure from the past as Hadoop. NoSQL queries can be simpler because the database design discourages the complicated tabular structure that drives the complexity of working with SQL. The main worry is that software needs to anticipate the possibility that not every row will have some data for every column.


Sign up for our Newsletters

 












Print |  Views: 3677   |   Rating:offoffoffoffoff  (0 votes)
Rate this article on a scale of
1 to 5 stars,5 being the best.




peter wayner Peter Wayner is a contributor to the International Data Group (IDG) News Service, which publishes global technology stories from bureaus around the world to more than 300 publications in more than 60 countries.

Recent Canadian IT Jobs




Related Content

What's the big deal about Hadoop?
What's the big deal about Hadoop?How to get started with an enterprise-ready version of Hadoop
Hadoop version of enterprise DB launched
Hadoop version of enterprise DB launchedThe database allows compressed data to be analyzed
When HANA meets Hadoop
When HANA meets HadoopSAP's in-memory database is becoming core to its long-term strategy, and an announcent at its Influencer Summit earlier this week may explain how it will gain broader support
Jaspersoft and Talend partner on BI software
jaspersoft corp. has renewed its partnership with open source data integration firm talend to help launch the latest incarnation of the jasperetl business intelligence (bi) software platform. 
blog comments powered by Disqus