Apache Hadoop has gained two big boosters: Intel Corp. and EMC Greenplum, each of which this week released its own distribution of the Hadoop software for storing and processing large amounts of data.
Chipmaker Intel said Tuesday that its version of the open source software, which comes with a deployment manager, is optimized for its Xeon processors and offers encryption accelerated by Intel’s AES New Instructions (AES-NI) on its CPUs.
Meanwhile on Monday the Greenplum division of EMC announced Pivotal HD, which integrates the Greenplum database with Apache Hadoop.
They join three other commercial Hadoop distributions -- from Cloudera, Hortonworks and MapR -- in vying for the attention of organizations.
Forrester Research analyst Mike Gualtieri found the announcements exciting. It makes sense for Intel to get into the fray because Hadoop is a storage and data processing platform, he said, and as a chipmaker Intel can help move data into Hadoop more efficiently. He also noted that Intel says it's not trying to compete with enterprise software companies that try to lock customers in to their technologies.
Gualtieri was less impressed with Greenplum's announcement, even though its Pivotal HD software includes a SQL database. One of Hadoop's shortcomings is its limited support for accessing data through SQL, but other solutions are emerging, he noted: Cloudera, for example, is working on a project called Impala to put a fast SQL layer on top of Hadoop.
The ability to process big data – broadly defined as data bigger than most analytics software can handle – could bring big benefits to business, argues Intel. But, the company says, “only a small fraction of the world is able to extract meaning from all of this information because the technologies, techniques and skills available today are either too rigid for the data types or too expensive to deploy.”
Optimizations for the networking and I/O technologies in the Xeon processor platform also enable new levels of analytic performance, Intel said in a news release.
Analyzing one terabyte of data, which previously took more than four hours to fully process, can now be done in seven minutes, the company claims, thanks to the combination of Intel hardware and its Hadoop distribution.
The proprietary management software is aimed at simplifying the deployment, configuration and monitoring of a Hadoop processing cluster, and an automatic tuner can deliver optimal performance, Intel says.