Thursday, May 19, 2022

Datameer adds automation to Hadoop

The open-source Hadoop framework for processing pools of big data across clusters of servers is a boon for organizations with large stocks of data. However, making efficient use of the platform for analyzing data can be a chore.

Typically raw data has to be combed and filtered before it can be scrutinized, a process that doesn’t allow large numbers of users to leverage Hadoop at once. And the smaller datasets that result from the refining are an inefficient use of a distributed cluster.

Rather than transferring filtered data to a data warehouse or other system for analytics, Datameer, a company that makes a Hadoop analytics suite, says you can have your cake and eat it too.

The company said Wednesday that its upcoming Datameer 5.0 can use Hadoop’s MapReduce, in-memory technology and single-server processing to handle data that many users can access simultaneously.

“What we’re delivering is an optimizer called Smart Execution that looks at the dataset characteristics, looks at the analytical characteristics, looks at the available resources on your Hadoop cluster,” Matt Schumpert, director of product marketing, said in an interview.

“Then Smart Execution will dispatch the different parts of work to the different engines looking at how busy is the cluster, how busy is the dataset, can we leverage some statistics we have about the data (like do the filter before the join) … to schedule and efficiently use all the different computational engines.”
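Schumpert's "filter before the join" remark refers to a common query optimization. The sketch below is only an illustration of that general idea in Python, not Datameer's implementation; the `orders` and `customers` records, the `total` field and the function names are all hypothetical. Both orderings return the same rows, but filtering first means the join handles fewer records.

```python
def join(orders, customers):
    """Pair each order with its customer record (hypothetical data model)."""
    by_id = {c["id"]: c for c in customers}
    return [(o, by_id[o["customer_id"]]) for o in orders if o["customer_id"] in by_id]


def join_then_filter(orders, customers, min_total):
    """Join everything first, then discard small orders."""
    joined = join(orders, customers)
    result = [(o, c) for o, c in joined if o["total"] >= min_total]
    return result, len(joined)  # second value: rows the join had to produce


def filter_then_join(orders, customers, min_total):
    """Discard small orders first, then join only what is left."""
    kept = [o for o in orders if o["total"] >= min_total]
    joined = join(kept, customers)
    return joined, len(joined)


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
orders = [
    {"customer_id": 1, "total": 5},
    {"customer_id": 2, "total": 50},
    {"customer_id": 1, "total": 500},
]

late_rows, late_work = join_then_filter(orders, customers, 50)
early_rows, early_work = filter_then_join(orders, customers, 50)
```

Here the results match, while the join produces three rows in the first ordering and two in the second. On a real cluster the saving would depend on how selective the filter is.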

On large datasets, Smart Execution uses Apache Tez, an optimized form of MapReduce, while small data analysis will be executed on a single Hadoop node or using in-memory technology. That selection is completely transparent to the end user and does not require IT assistance or extra hardware or software, the company said. Smart Execution can add new advances in the Hadoop ecosystem, such as Spark, as they become enterprise-ready.
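The kind of size-based routing described above can be sketched roughly as follows. This is a guess at the general shape of such a decision, not Datameer's actual logic: the `JobProfile` fields, the `choose_engine` function and the 10 GB single-node limit are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    IN_MEMORY = "in-memory"
    SINGLE_NODE = "single Hadoop node"
    TEZ = "Apache Tez"


@dataclass
class JobProfile:
    input_bytes: int        # size of the data the job will read
    free_memory_bytes: int  # memory currently available for in-memory work


# Hypothetical cutoff; a real optimizer would likely weigh many more signals.
SINGLE_NODE_LIMIT = 10 * 1024**3  # 10 GB


def choose_engine(job: JobProfile) -> Engine:
    """Pick the smallest engine that can plausibly handle the job."""
    if job.input_bytes <= job.free_memory_bytes:
        return Engine.IN_MEMORY
    if job.input_bytes <= SINGLE_NODE_LIMIT:
        return Engine.SINGLE_NODE
    return Engine.TEZ


MB, GB, TB = 1024**2, 1024**3, 1024**4
small = choose_engine(JobProfile(input_bytes=100 * MB, free_memory_bytes=1 * GB))
medium = choose_engine(JobProfile(input_bytes=5 * GB, free_memory_bytes=1 * GB))
large = choose_engine(JobProfile(input_bytes=1 * TB, free_memory_bytes=1 * GB))
```

In this sketch the caller never names an engine, which mirrors the company's point that the selection is hidden from the end user.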

The advantages are faster analytics, low latency and better utilization of the Hadoop cluster, Schumpert said.

Some organizations copy data back and forth between the Hadoop cluster and business intelligence or in-memory database tools, he said. That raises administration and security issues. Datameer 5.0 eliminates that. “It’s one job that runs through Hadoop and is audited through YARN,” a tool within Hadoop 2.0 that lets users run multiple applications with shared resource management.

It also reduces the cycle times of end users, he said. Datameer lets users define the analytical steps with a sample of data, and the software computes the full results.

“Now jobs can run faster, which means you’re going to be able to iterate faster. So if you need those accurate results before you can decide what’s going to be the next step, or you’re going to pass the result to another analyst who’s going to go down another path, you can do that quicker.”

Datameer 5.0 will be released in Q4.

Howard Solomon
Currently a freelance writer, I'm the former editor of ITWorldCanada.com and Computing Canada. An IT journalist since 1997, I've written for several of ITWC's sister publications including ITBusiness.ca and Computer Dealer News. Before that I was a staff reporter at the Calgary Herald and the Brampton (Ont.) Daily Times. I can be reached at hsolomon [@] soloreporter.com
