Forget big data. This is bigger data

With a planned upgrade to its Hadoop distributed data processing technology, the Apache Software Foundation intends for the platform to run across much larger clusters and take on larger workloads, an Apache official said Thursday.
A key goal for the upcoming 0.23 release of Hadoop, which could eventually be numbered version 2 or 3, is to have it run across 6,000-node clusters; to date it has run on clusters of up to 4,000 nodes, said Arun Murthy, vice-president of Apache Hadoop and a founder of Hortonworks, which offers Hadoop technologies and services. Release 0.23 is currently alpha quality and is due for a more formal release later this year.

Hadoop has become popular for mining large data sets. Plans call for Hadoop 0.23 to run across 6,000-machine clusters, each machine with 16 or more cores, and to process 10,000 concurrent jobs, letting users get more work done, Murthy said in a presentation at the O’Reilly Strata conference in Santa Clara, Calif. Performance, he stressed, is something users “can never have enough of.”

Other capabilities eyed for the upgrade include HDFS (Hadoop Distributed File System) federation as well as high availability for HDFS. MapReduce, the programming model and software framework in Hadoop, will be improved as well. Called YARN, the MapReduce upgrade “is the first to take Hadoop and make it a much more general data processing system,” Murthy said. YARN is “a high-performance rewrite of MapReduce,” with twice the throughput on large clusters, said Eric Baldeschwieler, Hortonworks CTO. Also, wire protocol compatibility planned for the 0.23 release will enable server and client upgrades to be done independently.
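For readers unfamiliar with the programming model being reworked, the map, shuffle, and reduce stages that Hadoop distributes across a cluster can be sketched in plain Python. This is an illustrative, framework-free sketch of the model only; the function names are invented for this example and are not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, as Hadoop's shuffle/sort stage does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count job: the mapper emits (word, 1) per word,
# the reducer sums the counts for each word.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    return sum(counts)

lines = ["Hadoop runs MapReduce jobs", "MapReduce jobs scale out"]
result = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
print(result["mapreduce"])  # prints 2
```

In Hadoop itself, each stage runs in parallel across many machines, with the shuffle moving intermediate data over the network; YARN's contribution is to generalize the cluster resource management underneath so that models other than MapReduce can run on the same infrastructure.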

Also at Strata on Thursday, MarkLogic and Hortonworks announced integration between Hortonworks Data Platform and MarkLogic’s operational database platform. The integration will allow users to combine MapReduce with MarkLogic’s real-time interactive analysis and indexing on a single, unified platform, MarkLogic said. The arrangement is intended to help users better accommodate big data workloads. MarkLogic will certify its Connector for Hadoop against Hortonworks Data Platform.

Jim Love, Chief Content Officer, IT World Canada