Data warehousing and extraction, transformation and loading (ETL) tools may not be the sexiest of topics, but that apparently doesn’t stop enterprises from bragging about whose is bigger. We’re talking about the terabyte size of an organization’s data warehouse, of course. At the ETL: The Humble Champion of Data Warehousing conference, held this week in Toronto, execs talked about the changing face of data integration.
In his opening keynote, Stephen Brobst, chief technology officer (CTO) and self-described ETL guru for Teradata (a subsidiary of Dayton, Ohio-based NCR Corp.), discussed some of the current and future data warehouse challenges enterprises are up against.
- More data will be created in the next two years than in the past 40,000 years
- Real-time data accessibility is the trend
- ‘Extreme data warehousing’ gives enterprises the ability to exploit business relationships in the data
- Business need, not technology, should drive service requirements
According to Brobst, “appetite for data is outpacing Moore’s Law.” He said more data will be created in the next two years than in the past 40,000 years. The need for data integration technology is growing, particularly as organizations seek methods to build a consolidated view of information scattered across disparate internal and external systems. That said, organizations must have a strategy and best practices in place for dealing with all that data.
“Data is not good enough – information is the goal,” Brobst said, adding that while the size of an enterprise’s data warehouse does matter, more important is having a holistic and single view of the customer and eliminating all planned and unplanned downtime.
Time-based transformations of business processes are becoming competitive necessities, the Teradata CTO argued. He explained the concept of “extreme data warehousing,” where response times are measured in milliseconds and enterprises have the ability to exploit all business relationships in the data.
Traditionally, ETL tools run under the data warehouse and are used to pull out source data, transform and clean it into a required format, and load it into the warehouse. The current shift is towards the real-time data warehouse, Brobst said.
These real-time enterprise (RTE) implementations include data warehouses that provide real-time access to data, and those that acquire data in real time. Either way, organizations should avoid vendor hype, Brobst noted. Service level requirements should be driven by the business need, not the technology.
The goal is the active data warehouse, where data is accurate up to the minute, the system is always on, and large data volumes, mixed workloads and concurrent users are all supported, Brobst said.
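For contrast with the active data warehouse, the traditional batch flow described above (pull out source data, transform and clean it, load it into the warehouse) can be sketched in a few lines of Python. This is only an illustrative sketch, not anything shown at the conference: the sample records, the `sales` table and the function names are all assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records; real ones would come from operational systems.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "75"},
    {"customer": "", "amount": "10"},  # no customer name, dropped during cleaning
]


def extract():
    """Extract: pull raw records out of the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: clean names, convert amounts to numbers, drop unusable rows."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue
        cleaned.append((name, float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run one batch: extract, then transform, then load."""
    load(conn, transform(extract()))


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
run_etl(conn)
result = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

A batch job like this typically runs on a schedule, so the warehouse is only as current as the last run. The real-time approach Brobst described would instead acquire data as it is produced.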
Right now, technology is not the limiting factor, Brobst said. Rather, it is the legal and ethical questions surrounding data access.
Who will ultimately be responsible for the accuracy of the data or the reliability of the analytics? Are the proper security measures in place? Who will ultimately be held accountable? These are the issues that enterprises will be faced with moving forward, Brobst said.