Hadoop, Cassandra to merge in DataStax distribution

Uniting the seemingly conflicting values of fast data access and deep analysis, open-source software company DataStax is developing a package that will combine its Cassandra non-relational database with the Apache Hadoop data processing framework, the company announced Wednesday.

The distribution, to be called Brisk, combines low-latency data storage and retrieval with the ability to do in-depth analysis of that data, said Matt Pfeil, CEO and co-founder of DataStax, formerly called Riptano.

Cassandra has traditionally been used by Web 2.0 companies that require a fast and scalable way to store simple data sets, while Hadoop has been used for analyzing vast amounts of data across many servers.

Typically, running heavy analytics against live databases has been frowned upon, because it could slow the database’s responsiveness. For this distribution, however, DataStax is taking advantage of Cassandra’s ability to be distributed across multiple nodes.

In this setup, the data is replicated: one copy stays on the transactional servers, while another is placed on servers dedicated to analytics. “The two parts of your data don’t interfere with each other,” Pfeil said.
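
The separation Pfeil describes can be pictured with a small sketch. This is a toy in-memory model, not Brisk or Cassandra code: the `Node` and `ReplicatedCluster` names and the "transactional" and "analytics" role labels are hypothetical, chosen only to show every write reaching both groups of servers while reads are routed by workload.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy server holding a full copy of the data (hypothetical model)."""
    name: str
    role: str  # assumed labels: "transactional" or "analytics"
    store: dict = field(default_factory=dict)


class ReplicatedCluster:
    """Copies each write to every node and routes reads by workload type."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, key, value):
        # Every node receives the write, so each role holds its own replica.
        for node in self.nodes:
            node.store[key] = value

    def read(self, key, workload):
        # Serve the read from the first node whose role matches the workload,
        # so analytics reads never touch the transactional servers.
        for node in self.nodes:
            if node.role == workload:
                return node.name, node.store[key]
        raise LookupError(f"no node available for workload {workload!r}")


cluster = ReplicatedCluster([
    Node("tx1", "transactional"),
    Node("an1", "analytics"),
])
cluster.write("user:42", {"clicks": 7})

served_by, value = cluster.read("user:42", workload="analytics")
print(served_by, value)
```

In this model a heavy scan issued with `workload="analytics"` is answered entirely by `an1`, which is the non-interference property the quote refers to.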

The initial customers might be Internet service companies that already use Cassandra for high-volume data capture and retrieval, Pfeil explained. The company is also marketing the package for enterprises, as a potential lower-cost and speedier alternative to databases and data warehouses.

The initial version of Brisk will use Hadoop version 0.20.2, the Hive data warehouse infrastructure version 0.7, and Cassandra 0.7.4. It will keep Hadoop’s MapReduce, job tracker and task tracker functionality, but will replace the underlying Hadoop Distributed File System (HDFS) with a Cassandra interface called CassandraFS, according to a DataStax white paper describing the technology.

DataStax plans to issue this distribution, under the Apache open-source license, within the next two months.
