Many organizations today are convinced that collecting and hoarding data is their future. Without big data, how else can they get to know their customers and potential customers?
As the pool of data grows, so does the need for a place to store it. Firms often keep their data in silos, so how can all of it be leveraged? Hence the data lake: a large store of raw data, often built on Hadoop or cloud storage, that analysts can dip into to create data marts or warehouses. In theory, this saves money because data doesn’t have to be transformed into the organization’s familiar formats before it is stored.
But as an article on CSO Online reminds infosec pros, data lakes need securing. After all, what could be a sweeter target than all the valuable data in one place?
“The appeal of increased agility, reduced costs and removal of silos cause many organizations to jump head first into the data lake and ignore basic information governance best practices at their own peril,” Jonathan Steenland, principal at Zyston CISO Advisory Services, is quoted as saying.
That means standard security strategies must be top of mind. But the article quotes a Gartner analyst as saying that many of the data lake technologies currently on the market lack fine-grained security controls. Until they have them, access management, encryption, and tracking data throughout its lifecycle in the enterprise have to be the CISO’s priorities. Protection becomes even more important if the data lake is in the cloud.
Although the article doesn’t go into detail, other risk-reducing strategies must also come into play, including anonymizing data where possible and archiving (or, better yet, deleting) unnecessary data. In 2014, Gartner warned that many data lakes were being used to store data whose privacy and regulatory requirements were likely to represent a risk exposure.
Those who need a primer on incorporating privacy into business functions and data stores can start with the Privacy by Design principles.