The open source Hadoop framework for storing and processing large datasets is attracting a growing number of organizations that want to at least try it because their information stores are expanding rapidly.
After a recent big data conference, Ben Lorica, chief data scientist at O’Reilly Media, pulled together a detailed summary of things he heard that deserve to be followed up.
For example, many companies are struggling with how to process and mine near-real-time data streams. An Intel official talked about how the chipmaker uses Spark and Shark, the in-memory cluster computing research projects from the University of California at Berkeley.
SQL-on-Hadoop solutions are drawing interest with the release of Cloudera’s Impala query engine, which runs on top of Hadoop, and Hadapt’s data-driven schema. Both were discussed at the conference.
There was also a session about a corner of data science that I hadn’t heard of called adversarial analytics: think of behavioral models that try to detect cyber intrusions, and the black hat hackers who try to evade them.
Sometimes you don’t have to go to a conference to pick up nuggets that are worth pursuing on your own time. That makes this column worth scanning.