eBay, which runs a gigantic data warehouse for internal business intelligence (BI) analytics, is considering taking it outside its firewall and offering it as a Web service to interested companies.
That would duplicate a move by Amazon.com Inc., which built its family of Amazon Web Services for in-house use but is now marketing them to outside customers. Most notable among these services are its Elastic Compute Cloud (EC2) application hosting service and its S3 hosted storage service.
In eBay’s case, the online auctioneer has built a 5-petabyte data warehouse that adds 50 TB of new data each day.
One terabyte every 5 seconds
The Teradata-based data warehouse can turn over a terabyte of data in just 5 seconds, and eBay has used that speed to let business analysts build their own “virtual” data marts, according to Oliver Ratzesberger, senior director of architecture and operations at the company.
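To put those figures in perspective, here is a rough back-of-envelope calculation. It uses only the numbers quoted above and assumes decimal units (1 PB = 1,000 TB) and a sustained, uniform scan rate, which a real workload is unlikely to match exactly. The variable and function names are illustrative.

```python
# Back-of-envelope arithmetic from the figures quoted in the article.
# Assumptions (not from the article): decimal units and a constant scan rate.
TB_PER_SCAN = 1.0          # terabytes turned over per measured interval
SECONDS_PER_SCAN = 5.0     # length of that interval in seconds
WAREHOUSE_TB = 5 * 1000    # 5 PB expressed in terabytes
DAILY_LOAD_TB = 50.0       # new data added each day


def scan_seconds(terabytes: float) -> float:
    """Seconds needed to turn over the given volume at the quoted rate."""
    return terabytes * SECONDS_PER_SCAN / TB_PER_SCAN


rate_gb_per_s = TB_PER_SCAN * 1000 / SECONDS_PER_SCAN      # 200 GB per second
full_scan_hours = scan_seconds(WAREHOUSE_TB) / 3600        # about 6.9 hours
daily_scan_minutes = scan_seconds(DAILY_LOAD_TB) / 60      # about 4.2 minutes
days_of_loads_in_warehouse = WAREHOUSE_TB / DAILY_LOAD_TB  # 100 days

print(f"{rate_gb_per_s:.0f} GB/s, full pass in {full_scan_hours:.1f} h, "
      f"one day's load in {daily_scan_minutes:.1f} min")
```

At that rate, a single pass over one day's new data would take a few minutes, which helps explain why prototype data marts can run directly off the central warehouse.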
The virtual data marts are used by about 5,000 business analysts in 100 groups inside eBay. These data marts run off the central data warehouse but are created without the help of central IT, said Ratzesberger.
Business analysts create and upload their own mini-data warehouses using standard Web and analytical tools such as those from Business Objects, SAS and MicroStrategy, and even Microsoft’s Excel. This lets the analysts rapidly create and test prototypes of the BI analyses they think they want. After 90 days, successful prototypes are brought to the data warehousing managers, who convert them into production data marts with a minimal level of rewriting.
“We cut the amount of time needed to build a data mart at least in half, and in some cases up to three to five times,” said Ratzesberger.
So is eBay thinking of turning its data warehouse into a utility that can be used by outside subscribing firms?
“Yes we are thinking about it,” said Ratzesberger. He acknowledged that one problem is how to minimize the time customers would need to upload large amounts of data to eBay’s data warehouse.
That can be avoided, he said, if “you couple analytics as a platform offering that has the data generating part sitting closer together. Then you totally do away with that problem.”
In other words, eBay could possibly host its BI-as-a-service on Amazon’s EC2 and store users’ data on S3.
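A quick estimate shows why upload time matters. The sketch below is illustrative only: the 10 TB data set and both link speeds are hypothetical values chosen for the example, not figures from eBay or Amazon, and the calculation ignores protocol overhead and contention.

```python
def transfer_hours(terabytes: float, megabits_per_second: float) -> float:
    """Idealized time to move a data set over a network link.

    Assumes decimal units (1 TB = 1e12 bytes) and a fully saturated link
    with no protocol overhead, so real transfers would take longer.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


# Hypothetical customer data set and link speeds, for illustration only.
DATASET_TB = 10
wan_hours = transfer_hours(DATASET_TB, 100)       # 100 Mbit/s link: about 222 hours
local_hours = transfer_hours(DATASET_TB, 10_000)  # 10 Gbit/s link: about 2.2 hours

print(f"Remote upload: {wan_hours / 24:.1f} days; co-located: {local_hours:.1f} hours")
```

Under these assumptions the same data set takes more than nine days to arrive over the slower link but only a couple of hours when the data already sits next to the analytics platform, which is the effect Ratzesberger describes.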
Amazon is using EC2 to provide its own Web-hosted database called SimpleDB. Other vendors that plan to use EC2 and S3 to offer hosted versions of their databases include EnterpriseDB Corp. and Oracle Corp.
Self-service BI capabilities have become a hot topic of late. Microsoft announced last week that through “Project Gemini” it plans to create an easy-to-use Excel-based tool that lets regular analysts build their own BI queries and dashboards.
But Ratzesberger said that much of the self-service BI capability already came built into the Teradata 5550 data warehousing software eBay runs.
“Yes, we built a Web portal and a point-and-click interface. But there was very little that we otherwise had to build,” he said. The Teradata software provides “very solid” workload management capabilities, so that “virtual” data marts can be partitioned and be automatically given lower priority than production data marts, Ratzesberger said.
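The idea of giving “virtual” data marts lower priority than production ones can be sketched with a simple priority queue. This is a generic illustration and not Teradata's workload manager, whose internals the article does not describe; the class name, the two priority levels and the sample query names are all assumptions made for the example.

```python
import heapq
from itertools import count

# Hypothetical priority classes: a lower number is served first.
PRIORITY = {"production": 0, "virtual": 1}


class WorkloadQueue:
    """Toy scheduler that always serves production queries before virtual ones."""

    def __init__(self) -> None:
        self._heap = []
        self._order = count()  # tie-breaker keeps arrival order within a class

    def submit(self, mart_type: str, query: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[mart_type], next(self._order), query))

    def next_query(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = WorkloadQueue()
queue.submit("virtual", "prototype churn report")
queue.submit("production", "daily sales rollup")
queue.submit("virtual", "ad hoc listing analysis")
queue.submit("production", "fraud dashboard refresh")

while queue:
    print(queue.next_query())
```

Both production queries are printed first, in the order they arrived, followed by the two prototype queries. A real workload manager would also cap resources per partition rather than only ordering requests.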