Scalability has been the buzzword for data warehousing vendors over the past several years, with the standout questions being: How many petabytes of data can I store? And how many servers and nodes?
In 2010, however, the watchword will be speed, as vendors start introducing flash memory storage to get around the longtime bottleneck of reading and writing to disk.
Oracle has already started shipping its flash-enabled Exadata v2 database appliance. Start-up ParAccel said last week that it will bring out a flash appliance this quarter, while Teradata is aiming for a product release by midyear.
“This is the most important hardware development of the decade,” said independent analyst Curt Monash. “Other vendors will need to rapidly follow suit.”
The leaders are taking different approaches. Oracle is using flash memory cards developed by Sun Microsystems that are connected to the Exadata server’s motherboard via fast PCI Express (PCIe) interfaces.
The four 96GB cards cache the “hottest” data. They are a key piece of the Exadata’s overall architecture for boosting storage I/O, which, until flash came along, had failed to keep pace technically with components such as software and CPUs.
Long a critic of Oracle’s database efforts, Monash said he likes what he sees.
“This doesn’t mean all of Oracle’s marketing claims are correct, or that their legacy DBMS is the best starting point for an overall system design, but with the Exadata v2, they have made some smart choices,” said Monash.
Like a proud parent after watching a child score his first goal, Oracle CEO Larry Ellison can’t help but crow about the Exadata and its “1 million random I/Os” per second, nor can he hold back from launching barbs at rivals.
“You would’ve thought IBM, because they do hardware and software, would’ve come out with a database machine many years ago, it’s so obvious,” Ellison said during a Q&A on Wednesday.
Ellison also said that Oracle recently made inroads into a longtime Teradata customer after an Exadata v2 was able to handle the same workload in one-eighth the time.
Teradata declined to comment on Ellison’s claim. But the Dayton, Ohio-based data warehousing leader respectfully — but vigorously — disagrees with Oracle’s approach.
“In the OLTP world, that makes sense. But in the data warehousing world, it does not make sense architecturally,” argued Scott Gnau, vice president of research and development at Teradata.
For one, Gnau questions the ability of the Exadata’s flash cache to store the huge data sets often crunched by analytics apps. “You have to know exactly what stuff you plan to park there, or get lucky,” he said.
That means data will be scattered across several places (RAM, flash and then disk), all with different access speeds. That will result in dependencies and bottlenecks, problems that will be exacerbated in multiserver clusters and grids, Gnau contended.
Rather than trying to play traffic cop, Teradata said its approach with its Extreme Performance Appliance 4600 (code-named Blur) is simpler: Use flash-based solid-state disks.
Storing up to 24TB of data, the 4600 connects to the SSDs over the same physical interconnects as hard disks. While that is theoretically slower than connecting straight to the motherboard via PCIe, it also makes for an easier load-balancing problem, said Gnau, and one that can be addressed using Teradata’s virtual storage software.
“This is real direct-attached storage,” he said. “We don’t use disk controllers; we do all of our data integrity inside our software.”
The result, said Gnau, is 5 million I/Os per second, fast enough to replace a complex event processing engine or an in-memory database.
Using SSDs throughout won’t be cheap, though Gnau declined to comment on that.
Still other, faster options
While Teradata has trashed Oracle’s flash-cache approach, it hasn’t ruled out using PCIe-based technology down the road.
That’s what ParAccel is doing. It’s using PCIe to connect to 640GB of flash per server appliance — about two-thirds more than the Exadata v2 — to deliver 15X performance boosts, the company said recently.
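The "two-thirds more" comparison can be checked against the figures quoted in this article: four 96GB flash cards in the Exadata v2 versus 640GB per ParAccel appliance. The short Python sketch below works through that arithmetic. It assumes the four cards are the full flash complement of a single Exadata server, and the variable names are illustrative rather than taken from either vendor.

```python
# Figures as reported in this article; names are illustrative only.
# Assumption: the four 96GB cards are the total flash in one Exadata v2 server.
EXADATA_CARDS = 4
EXADATA_CARD_GB = 96
PARACCEL_FLASH_GB = 640

# Total flash per Exadata v2 server under the assumption above.
exadata_flash_gb = EXADATA_CARDS * EXADATA_CARD_GB

# Fraction by which ParAccel's per-appliance flash exceeds the Exadata's.
extra_fraction = PARACCEL_FLASH_GB / exadata_flash_gb - 1

print(f"Exadata v2 flash per server: {exadata_flash_gb}GB")
print(f"ParAccel flash advantage: {extra_fraction:.0%}")
```

On these numbers the Exadata v2 carries 384GB of flash, and 640GB is roughly 67% larger, which matches the "about two-thirds more" description.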
While ParAccel is reportedly going with Fusion-io, there are other PCIe-based options.
One is an Israeli start-up called PetaScan that has talked to a number of data warehousing firms about its offering.
Scott Yara, president and co-founder of Greenplum, has looked at PetaScan’s technology.
“This is absolutely a good direction to go down,” Yara said, though he declined to confirm if and when a flash-based appliance might come from Greenplum.
Not everyone is jumping onto the flash bandwagon, however. Netezza said it has tested flash SSDs like Teradata’s and found them not worth using.
“Ten times the cost for four times the performance over rotating disk is not a good deal,” said Phil Francisco, vice president of marketing at Netezza. But “the jury is still out” on PCIe-based flash, he added.
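Francisco's objection is a price/performance argument. Taking his two ratios at face value (an assumption, since the article gives no underlying prices or benchmark results), the Python sketch below shows how much more a buyer would pay per unit of performance with flash SSDs than with rotating disk.

```python
# Ratios as quoted by Netezza's Phil Francisco; both are relative to rotating disk.
# Assumption: the ratios apply to the same workload and capacity.
COST_RATIO = 10.0   # flash SSD cost vs. rotating disk
PERF_RATIO = 4.0    # flash SSD performance vs. rotating disk

# Cost per unit of performance, relative to rotating disk (1.0 means break-even).
cost_per_unit_performance = COST_RATIO / PERF_RATIO

print(f"Relative cost per unit of performance: {cost_per_unit_performance:.1f}x")
```

By this measure flash SSDs cost 2.5 times as much per unit of performance as disk, so on Francisco's ratios the price premium would have to fall to about four times before the two broke even.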