Hadoop in the enterprise: IT departments are struggling with two perennial issues

Hadoop is still an emerging platform, but it’s finding its way into enterprises despite its complexity and growing pains, and a multi-tenancy approach to resources can smooth out enterprise-wide deployment.

Deployment and operational complexity remain two of the perennial issues for Hadoop, said Tony Baer, principal analyst with Ovum, in a recent webcast, and there are a number of reasons for that complexity. Even organizations that have initial success with a prototype can expect hiccups as they expand their deployments.

For one thing, it’s a distributed architecture, not a database, said Baer. “For anyone with database experience, distributed architectures have always been a very difficult hurdle to deal with.” Hadoop is a file system with database-like features, and it doesn’t have the built-in tools that come with databases, such as those for self-tuning, security or lifecycle management. “These things are just emerging.”

Another aspect of Hadoop that makes it more complex, said Baer, is that it’s not a monolithic platform but a collection of projects. Enterprises are depending on the distribution provider to rationalize those projects into a single package.

He said Hadoop has quickly evolved into a multi-purpose platform, which is both a blessing and a curse. “Hadoop is no longer just a single-purpose map/reduce machine.” In the early days, map/reduce was really the only workload that could be run on Hadoop. Now it has evolved to work in a number of different modes, including standard reporting, interactive query and operational decision support. “With frameworks like Spark, we’re starting to see Hadoop become a real-time platform, which is such a huge departure from its roots as a batch platform.”
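To make that shift concrete, here is a minimal PySpark sketch of the workload mix Baer describes running on a single cluster: a batch-style report, an interactive SQL query and a streaming job. The HDFS paths, port and column names are illustrative assumptions, not details from the webcast.

```python
# A minimal sketch, assuming Spark is available on the cluster and sample
# JSON events sit at the hypothetical HDFS path /data/events.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hadoop-workload-mix").getOrCreate()

# Batch-style reporting: aggregate events stored on HDFS and write a report.
events = spark.read.json("hdfs:///data/events")  # hypothetical path
events.groupBy("event_type").count() \
      .write.mode("overwrite").parquet("hdfs:///reports/event_counts")

# Interactive query: ad hoc SQL over the same data, no batch job required.
events.createOrReplaceTempView("events")
spark.sql("SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type").show()

# Near-real-time: the same grouping logic applied to a live stream
# (a socket source is used here purely for brevity).
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())
query = (lines.groupBy("value").count()
         .writeStream.outputMode("complete").format("console").start())
query.awaitTermination(timeout=60)  # stop after a minute in this demo
```

In the map/reduce-only era, each of these would have been a separate batch job; on a current distribution they can run side by side, which is exactly the workload diversity Baer says multiplies complexity.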

This evolution means Hadoop is capable of even more diverse workloads, but it also adds to its complexity. “It multiplies its value to organizations, but also multiplies the potential complexities you have to deal with.”

Hadoop deployments usually begin with a single use case to solve a problem; from there enterprises realize it can address other issues. “From a compute standpoint, these workloads are growing even more diverse,” said Baer. “You’re trying to do a lot of juggling and that of course multiplies your complexity. It makes prioritizing resources more difficult.”

It’s not just deployment complexity. Common growing pains for enterprises as they expand their Hadoop footprint include cluster sprawl, ecosystem sprawl and a disconnect between staffing and skills, said Baer. In addition, early success with Hadoop can lead to over-promising on future deployments.

Because of all the complexity obstacles and resource management required by Hadoop, enterprises are starting to look to the cloud for lessons, said Baer. For one thing, the cloud treats all resources as a common pool. “You manage that resource dynamically,” he said. “The core building block that makes elasticity possible is multi-tenancy.” This means a cluster is a shared resource across many entities.

Multi-tenancy can smooth out the bumps of Hadoop deployment, said Baer, because it means infrastructure is better utilized. In the early days of adoption, low utilization was common, but it’s not acceptable for enterprise-wide Hadoop deployments, he said. Multi-tenancy enables an organization to spin up separate compute and storage quickly for each tenant.
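As a rough illustration of what that sharing looks like in practice, the sketch below submits an ordinary Spark job to a tenant-specific YARN queue, so the cluster’s scheduler arbitrates capacity among tenants. The queue name, executor limit and data path are assumptions for the example; on a real cluster they would come from whatever the administrator has configured.

```python
# A minimal sketch, assuming Spark on YARN, a scheduler queue named
# "analytics" created by the cluster administrator, and the external
# shuffle service enabled so dynamic allocation can release idle executors.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tenant-analytics-job")
         .master("yarn")                                     # run on the shared cluster
         .config("spark.yarn.queue", "analytics")            # tenant-specific queue
         .config("spark.dynamicAllocation.enabled", "true")  # give back idle capacity
         .config("spark.dynamicAllocation.maxExecutors", "20")
         .getOrCreate())

# The job itself is ordinary; isolation and prioritization come from the
# scheduler queue, not from anything in the application code.
df = spark.read.parquet("hdfs:///data/shared/sales")         # hypothetical path
df.groupBy("region").count().show()
```

Other tenants submit to their own queues in the same way, which is how a single cluster can serve many groups without each one standing up separate hardware.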

Baer said Hadoop has become more valuable to the enterprise, but its versatility comes with a cost. Multi-tenancy offers an avenue for making resource juggling more manageable.

 

 

 

