taming-big-data

Hadoop is still an emerging platform, but it is finding its way into enterprises despite its complexity and growing pains. A multi-tenancy approach to resources can smooth out enterprise-wide deployment.

Deployment and operational complexity remain two of the perennial issues for Hadoop, said Tony Baer, principal analyst with Ovum, in a recent webcast. There are a number of reasons for that complexity, and even organizations that have initial success with a prototype can expect hiccups as they expand their deployments.

For one thing, it’s a distributed architecture, not a database, said Baer. “For anyone with database experience, distributed architectures have always been a very difficult hurdle to deal with.” Hadoop is a file system with database-like features, and it doesn’t have the built-in tools that come with databases, such as those for self-tuning, security or lifecycle management. “These things are just emerging.”

Another aspect of Hadoop that makes it more complex, said Baer, is that it’s not a monolithic platform but a collection of projects. Enterprises depend on the distribution provider to rationalize those projects into a single package.

He said Hadoop has quickly evolved into a multi-purpose platform, which is both a blessing and a curse. “Hadoop is no longer just a single-purpose map/reduce machine.” In the early days, map/reduce was really the only workload that could be run on Hadoop. Now it has evolved to work in a number of different modes, including standard reporting, interactive query and operational decision support. “With frameworks like Spark, we’re starting to see Hadoop become a real-time platform, which is such a huge departure from its roots as a batch platform.”

This evolution means it’s capable of even more diverse workloads. However, it further adds to Hadoop’s complexity. “It multiplies its value to organizations, but also multiplies the potential complexities you have to deal with.”

Hadoop deployments usually begin with a single use case to solve a problem; from there enterprises realize it can address other issues. “From a compute standpoint, these workloads are growing even more diverse,” said Baer. “You’re trying to do a lot of juggling and that of course multiplies your complexity. It makes prioritizing resources more difficult.”

It’s not just deployment complexity. Common growing pains for enterprises as they expand their Hadoop footprint include cluster sprawl, ecosystem sprawl and a disconnect between staffing and skills, said Baer. In addition, early success with Hadoop can lead to over-promising on future deployments.

Because of all the complexity obstacles and resource management required by Hadoop, enterprises are starting to look to the cloud for lessons, said Baer. For one thing, the cloud treats all resources as a common pool. “You manage that resource dynamically,” he said. “The core building block that makes elasticity possible is multi-tenancy.” This means a cluster is a shared resource across many entities.

Multi-tenancy can smooth out the bumps of Hadoop deployment, said Baer, because it means infrastructure is better utilized. In the early days of adoption, low utilization was common, but it’s not acceptable for enterprise-wide Hadoop deployments, he said. Multi-tenancy enables an organization to spin up separate compute and storage quickly for each tenant.
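To make the idea of a cluster as a common pool concrete, here is a minimal Python sketch. It is a toy model and not any Hadoop scheduler or API: the `SharedCluster` class, the notion of "slots," the tenant names and the guaranteed shares are all hypothetical, chosen only to illustrate how one shared pool might serve several tenants while keeping utilization high.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Toy model of one cluster shared by several tenants (hypothetical)."""

    total_slots: int                      # capacity of the common pool
    guaranteed: dict[str, int]            # tenant -> slots it can always claim
    in_use: dict[str, int] = field(default_factory=dict)

    def free_slots(self) -> int:
        return self.total_slots - sum(self.in_use.values())

    def request(self, tenant: str, slots: int) -> int:
        """Grant up to `slots`, without eating into other tenants' guarantees."""
        # Slots that must stay free so other tenants can still reach their share.
        held_back = sum(
            max(0, share - self.in_use.get(other, 0))
            for other, share in self.guaranteed.items()
            if other != tenant
        )
        grantable = max(0, self.free_slots() - held_back)
        granted = min(slots, grantable)
        self.in_use[tenant] = self.in_use.get(tenant, 0) + granted
        return granted

    def release(self, tenant: str, slots: int) -> None:
        """Return slots to the pool when a tenant's workload finishes."""
        self.in_use[tenant] = max(0, self.in_use.get(tenant, 0) - slots)

    def utilization(self) -> float:
        return sum(self.in_use.values()) / self.total_slots


# Assumed example: three tenants sharing a 100-slot cluster.
cluster = SharedCluster(
    total_slots=100,
    guaranteed={"etl": 30, "reporting": 20, "adhoc": 10},
)
print(cluster.request("etl", 80))        # a burst beyond the guaranteed 30
print(cluster.request("reporting", 20))  # still gets its guaranteed share
print(cluster.utilization())
```

In this sketch a tenant can burst past its guaranteed share when the pool has spare capacity, but never into capacity that other tenants are still entitled to claim. Real multi-tenant schedulers add concerns such as preemption, queues and security isolation that are omitted here.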

Baer said Hadoop has become more valuable to the enterprise, but its versatility comes with a cost. Multi-tenancy offers an avenue for making resource juggling more manageable.
