Syndicated

If you need clusters to process big data, Hadoop is the leading platform to consider.

And why not? The open source framework for processing large datasets is free to try, and it is also available through a number of distributions with paid support and added features. You have a choice of versions from Intel, Cloudera, Hortonworks and Pivotal, among others.

As an analysis on forbes.com of the recent Hadoop Summit points out, even Microsoft is jumping on the bandwagon. The software giant, along with SAP and others, has contributed to the Stinger Initiative to bring full interactive SQL query capabilities to Hadoop via a new and improved Hive.

Author John Furrier also notes that distributions are increasingly offering improved Hadoop security in an attempt to differentiate themselves from one another. For example, last week Cloudera announced it had bought Gazzang for end-to-end data encryption and key management. In May, Hortonworks bought XA Secure, which provides centralized capabilities around data security, authorization, auditing and overall governance.

This is a good thing, because as recently as last October a KPMG Canada security analyst criticized Apache Hadoop security as not being up to enterprise standards. Apache Hadoop, of course, is the framework on which most distributions build their platforms, adding capabilities like end-to-end security.

Version 2.4 of Apache Hadoop, released in April, includes support in HDFS for Access Control Lists, full HTTPS support, native support for rolling upgrades, support for automatic failover of the YARN resource manager and other capabilities. These are being added to the independent distributions.

As Hadoop distributions mature, it becomes increasingly difficult for enterprises not to consider at least testing one of the versions. Soon Hadoop may be as ubiquitous as Windows.

 

 
