SAS Institute Inc. has developed a high-performance computing model that uses HP BladeSystem Infrastructure servers configured as a private grid to process massive amounts of data. Analytics jobs that previously took an entire day to process can now finish in a few hours, or even minutes.
Jim Goodnight, CEO of SAS, and Chris Bailey, director of the Advanced Computing Lab at SAS, demonstrated the patent-pending technology on stage at the SAS Global Forum in Seattle. Goodnight was personally involved in the development of the code.
The demo used five racks of blades, with 196 blades in total. Each blade had eight CPU cores, providing a total of 1,664 cores. The model processed roughly one billion records of stock market data – 100,000 market states, two horizons and 4,000 instruments – in two minutes and 26 seconds.
This is normally an 18-hour job, said Bailey. “We are splitting it across 1,664 processors … In this particular problem, not only are we using thousands of CPU cores, we are also not using any disk I/O at all. It has totally become an in-memory problem and we have terabytes of memory to solve it with,” he said.
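Bailey's description, splitting one job across many processors with all the data held in memory, maps onto a classic data-parallel pattern. A minimal, heavily scaled-down sketch in Python (the problem sizes and the toy pricing function are purely illustrative, not SAS's model):

```python
# Illustrative sketch (not SAS code): split a valuation job across CPU
# cores, keeping all data in memory, in the spirit Bailey describes.
from multiprocessing import Pool

# Hypothetical, scaled-down problem dimensions: states x horizons x instruments.
N_STATES, N_HORIZONS, N_INSTRUMENTS = 100, 2, 40

def value_instrument(instrument_id):
    # Stand-in for a real pricing model: sum a toy payoff over every
    # (market state, horizon) pair for a single instrument.
    return sum((instrument_id + s) * h
               for s in range(N_STATES)
               for h in range(1, N_HORIZONS + 1))

if __name__ == "__main__":
    # Pool() starts one worker process per available core; each worker
    # values a slice of the instruments with no disk I/O involved.
    with Pool() as pool:
        values = pool.map(value_instrument, range(N_INSTRUMENTS))
    print(len(values))  # one result per instrument -> 40
```

Because each instrument can be valued independently, the work divides cleanly across however many cores are available, which is why adding blades shortens the wall-clock time so dramatically.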
Bailey demonstrated “getting an OLAP cube view on the fly” without actually building an OLAP cube. “When we do our reporting, we are going to take the data we’ve just generated, which is just the leaf level data … and we are not going to pre-aggregate it,” he said.
This used to take three or four hours and now it can be done before you take your fingers off the keyboard, said Goodnight. “My advice is, don’t ever build a cube if you’ve got that many dimensions … put the data in memory, build the cube on the fly, build what you need on screen,” he said.
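The idea Goodnight and Bailey describe, aggregating leaf-level data on demand instead of pre-computing every cell of a cube, can be sketched with an in-memory group-by. A minimal illustration (the columns and data are hypothetical, not from the demo):

```python
# Illustrative sketch (not SAS code): answer a "cube" query on the fly by
# aggregating leaf-level rows held in memory, rather than pre-building
# aggregates for every combination of dimensions.
from collections import defaultdict

# Hypothetical leaf-level data: (region, product, value).
leaf_rows = [
    ("east", "bonds", 10.0),
    ("east", "stocks", 25.0),
    ("west", "bonds", 5.0),
    ("west", "stocks", 40.0),
]

def rollup(rows, dims):
    """Aggregate values over only the dimensions the user asked for."""
    totals = defaultdict(float)
    for region, product, value in rows:
        key = tuple(v for d, v in (("region", region), ("product", product))
                    if d in dims)
        totals[key] += value
    return dict(totals)

# Build just the view requested on screen, nothing more.
print(rollup(leaf_rows, {"region"}))
# -> {('east',): 35.0, ('west',): 45.0}
```

With many dimensions, the number of possible aggregates in a pre-built cube explodes combinatorially, while an on-demand roll-up only ever computes the handful of views actually requested, which is the trade-off behind Goodnight's advice.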
The model is an example of “the exciting work” SAS is doing in its Advanced Computing Lab, said Goodnight. “The amount of data in the world, the amount of data we are having to deal with, continues to grow year after year. The size of the problems that we need to solve continues to grow. We’ve got to come up with a new computing paradigm we could use and we believe we’ve found it,” he said.
In an interview the next day, Goodnight told a group of Canadian press that the high performance model is probably one and a half years away from mass adoption. People will find it hard to believe you can do a 24-hour job in 15 minutes, he said.
“Adoption of a technology of this nature is always very slow. People don’t believe you can do this … We’ll see if we can convince people to give this stuff a try,” he said.
Most people would call this a grid, but the computing methods SAS is using do not treat it as a grid, said Goodnight. “It treats it as a single unit so it is extremely tightly tied together … I think of a grid as boxes scattered all over the place and we think of this as essentially one single computer,” he said.
“On these kinds of applications where we are looking for massive performance, you don’t want to be virtualized because you are going to lose 30 per cent … we want to operate the system as sitting right down on top of the metal,” said Goodnight.
The technology works across industries but targets markets like finance and retail. For one retail client with more than 1,000 stores and 200,000 to 300,000 SKUs per store, SAS cut a markdown-optimization run from 28 hours to 3.5 hours. That gives the client, which runs its analytics after stores close on Saturday night, more time to notify managers about sales before Monday morning.
SAS’s development is very advantageous from a processor perspective because enterprises can build a high-performance, robust SAS engine out of cores and components they can readily access, without necessarily having to invest in a product for platform computing or some other grid-based computing engine, said Russ Conwath, senior analyst at Info-Tech Research Group Ltd.
“There are other folks out there who do in-memory compute, but this seems like a really good addition to the SAS portfolio and something that is going to potentially differentiate them from the other folks because of its scope and scale,” he said.
Highlighting recent announcements from Advanced Micro Devices Inc. (AMD) and Intel Corp., Conwath said, “it is all about the core count and all about how much memory you can blow into these machines.” With SAS’s model, “you could scale out quite a bit,” he said.
The interesting question here is the use case for this new processing architecture, said Gareth Doherty, senior analyst at Info-Tech. “This product is very likely going to be focused in the finance industry where you’ve got extremely data-intensive predictive models that are looking to forecast where the market is going and there is a lot of information that you have to process on the fly in order to set pricing information,” he said.
A lot of organizations in these scenarios are already leveraging sophisticated statistical analysis tools like SAS, added Doherty. “I think what (SAS) is looking to do is entrench their market presence by also offering an infrastructural solution that allows them to be able to leverage their analytical capabilities, but do so in a way that also gives them a much better response time so they are no longer limited at the hardware level,” he said.
In-memory processing in the context of BI isn’t new – mega-vendors like IBM Corp. and MicroStrategy Inc. already have in-memory solutions, and even Microsoft Corp.’s SQL Server is going to offer in-memory processing capabilities, said Doherty.
What’s interesting is the way SAS is describing the technology from the infrastructure perspective, he said. “I think what they are going for is something that looks a lot more like an appliance and less like a piece of software that’s going to be the engine behind the processing,” he said.