A PC manufacturer – who wishes to remain unnamed for competitive reasons – wanted to find new ways to determine the lifespan of dies used to produce semiconductors. The problem was that the dies were expensive and predicting their life was difficult: Removing even one too early could result in millions of dollars in unnecessary expenditure, while keeping one active too long could prove even more costly in terms of decreased productivity and customer satisfaction.
The manufacturer’s team of engineers needed to look at data from their process control units – computers that monitor the production process – in new ways. What the engineers required was the ability to run ad hoc queries, something they couldn’t get from traditional databases.
“The traditional data structure is not designed to ask questions that are outside the pre-conceived reports that you already know you want to generate,” says Jerry Shattner, president of the Montreal-based Sand Technology. “That’s fine for pre-conceived ideas, like if you know you want to generate monthly sales reports.” In addition to being inflexible, traditional databases can also become extremely large and costly to maintain, adds Shattner.
NO MASSIVE OVERHEAD
With that in mind, Sand Technology has developed Nucleus – a technology able to build databases that can be mined to provide the kind of ad hoc queries that companies need quickly, but without the massive overhead of keeping a large database.
The unnamed PC manufacturer successfully used Nucleus to solve its problem of determining the lifespan of its dies: Company engineers were able to make just-in-time decisions as to the quality of the manufacturing process, and optimize their investments in equipment. According to a company spokesman, this resulted in better products, faster production and a greater return on investment.
“It’s great technology,” says Mike Schiff, director of data warehousing strategies with Current Analysis in Sterling, Va. “It allows a user to basically index everything: That’s one of the product’s strengths. However, it’s not a general database. I wouldn’t run them for the inventory, let’s say.”
Sand is specifically targeting ad hoc decision support or data exploration environments in which “short-lived” data marts are swiftly constructed without prior knowledge of specific usage patterns, or without knowing in advance which data fields need to be indexed, wrote Schiff in a recent analysis of Sand.
According to the analyst, Sand’s products surmount the difficulties of these environments, and provide high-speed access to – and storage of – data by means of a process known as tokenization.
Tokenization is a way of storing data that basically removes the restrictions of traditional databases. It stores information by column, as opposed to by row. In a row-based layout, a customer address file, for example, holds each customer’s information as a single row entry.
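The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the general idea of column-wise storage, not a description of how Nucleus is built; the sample records, the field names and the `to_columns` helper are all hypothetical.

```python
# Hypothetical customer records, laid out row by row as in a traditional table.
rows = [
    {"last_name": "Brown", "postal_code": "H3A"},
    {"last_name": "Smith", "postal_code": "H3A"},
    {"last_name": "Brown", "postal_code": "M5V"},
]


def to_columns(records):
    """Regroup row records into one list of values per column (field name)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# A column-wise question only has to read the one column it asks about.
in_h3a = sum(1 for code in columns["postal_code"] if code == "H3A")
print(in_h3a)  # 2
```

Once the data is grouped this way, a question about postal codes never has to read the name fields at all.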
“When you ask a question in data, you are typically asking a question along the column, such as: how many people live in this postal code?” explains Shattner. “You’re asking about codes, and when you get further down the cycle and (discover that) three million people meet this profile, you’re not going to send the mailing to three million people. You start to qualify questions or information on the basis of the columns, rather than the rows and the records.”
Nucleus stores the data according to the columns. For example, the last name Brown would only be stored once – no matter how many Browns there were in the phone book. After that, if record number one had the last name of Brown, a bit would be turned on (a one) to represent that last name for that record. If not, the bit would be turned off (a zero). The end result is a set of encoded bit vector representations of the data.
For example, a phone book for all of Canada would contain approximately 25 million rows in the database – with a bit vector representing the last name of Brown that is 25 million single bits long. For each bit, there would be either a one or zero present.
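A scaled-down version of the phone book example might look like the following. It is a minimal sketch that assumes a plain Python integer can stand in for a bit vector; the `build_bit_vectors` function and the five sample names are invented for illustration and say nothing about Nucleus’ internal format.

```python
def build_bit_vectors(values):
    """Map each distinct value to an integer used as a bit vector.

    Bit i (counting from the least significant bit) is 1 when
    record i holds that value, and 0 otherwise.
    """
    vectors = {}
    for i, value in enumerate(values):
        vectors[value] = vectors.get(value, 0) | (1 << i)
    return vectors


# Hypothetical five-record phone book, last-name column only.
last_names = ["Brown", "Smith", "Brown", "Lee", "Brown"]
name_vectors = build_bit_vectors(last_names)

# "Brown" is stored once as a key; its vector marks records 0, 2 and 4.
# The string is printed most significant bit first, so record 4 is leftmost.
print(format(name_vectors["Brown"], "05b"))  # 10101
```

Each distinct name appears once, however many records share it, and every vector is exactly as long as the number of records.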
These are fairly sparsely populated bit vectors, as most of their bits are zeros, and they are compressed so that they don’t take up much space. Therefore, you’ve only stored the last name once – whereas a traditional database will have stored all the names, such as Bill Brown, Jim Brown and John Brown. This is why Nucleus takes up so little space relative to the others, explains Shattner.
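The article does not say which compression scheme Nucleus uses. As an assumption, run-length encoding is one simple way to show why a vector that is mostly zeros shrinks so well; the function names and the 1,000-bit sample vector below are hypothetical.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse a sequence of 0/1 bits into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def run_length_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


# Hypothetical sparse vector: 1,000 records, only two of them set.
sparse = [0] * 1000
sparse[3] = 1
sparse[700] = 1

runs = run_length_encode(sparse)
print(runs)  # [(0, 3), (1, 1), (0, 696), (1, 1), (0, 299)]
```

Here 1,000 bits collapse into five pairs, and decoding the pairs restores the original vector exactly.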
Of course, to figure out how many Browns there are in the phone book, you simply have to add up all the ones in the Brown bit vector. To figure out how many are in a specific area code, you would simply apply a Boolean operation (AND) to the Brown vector and that area code’s vector, Shattner says.
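Both operations can be tried on a toy scale. The sketch below again assumes Python integers as bit vectors; the eight-record phone book, the 514 area code vector and the `count_ones` helper are made up for the example.

```python
def count_ones(vector):
    """Count the bits set to 1 in an integer used as a bit vector."""
    return bin(vector).count("1")


# Hypothetical eight-record phone book; bit i refers to record i,
# with record 0 as the rightmost bit.
brown = 0b10010110     # records 1, 2, 4 and 7 have the last name Brown
area_514 = 0b11000101  # records 0, 2, 6 and 7 are in area code 514

# How many Browns are there? Add up the ones.
print(count_ones(brown))  # 4

# How many Browns are in area code 514? AND the two vectors, then count.
browns_in_514 = brown & area_514
print(count_ones(browns_in_514))  # 2
```

The AND keeps only the records whose bit is set in both vectors, here records 2 and 7.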
VERY QUICK RESPONSIVENESS
“Responsiveness is very quick in this type of architecture,” Shattner says. “And because of the inherent way that Nucleus stores the data, every field, every column is in fact an indexed column, so you don’t have to pre-decide in designing your database ‘where am I going to ask the questions?’ They all become indexed and all become available for quick response queries.”
“In manufacturing, there may be information coming out of the manufacturing process as to the yield or change in thickness of a product such as a wafer, or a change in resistance data that is captured by process control units that give you quality measurements – such as the thickness of the steel plate or what the temperature is,” Shattner says. “The question then is what do I do with that? What sense do I want to make of that information? In manufacturing, there are probably trends that you can gather from the process that says the temperature is starting to creep or this is starting to vary outside some fiscal limits.”
For example, if a manufacturer wants to be able to determine if there is a correlation between temperature change and thickness creep, they can make some determinations and eventually say “there looks to be a pattern in what’s happening here, or you know, it looks pretty clear to us that our chip yields are starting to get lousy when the temperature consistently stays above X,” explains Shattner. “So you can start to understand what’s going to affect the yields in your operation. You can start to understand when, for example, certain dies may be wearing out based on the query information you got.”
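The kind of question Shattner describes, whether yields drop when the temperature stays above some value X, can be framed as a simple ad hoc query. The readings, the 355-degree threshold and the `mean_yield_by_threshold` function below are entirely hypothetical; they illustrate the shape of the question, not the manufacturer’s data or Nucleus’ query interface.

```python
from statistics import mean

# Hypothetical process-control readings: (temperature in Celsius, yield in percent).
readings = [
    (348, 94.0), (351, 93.5), (349, 94.5),
    (356, 88.0), (358, 86.5), (357, 87.5),
]


def mean_yield_by_threshold(readings, threshold):
    """Return (mean yield at or below threshold, mean yield above it).

    Assumes at least one reading falls on each side of the threshold;
    statistics.mean raises StatisticsError on an empty list.
    """
    below = [y for t, y in readings if t <= threshold]
    above = [y for t, y in readings if t > threshold]
    return mean(below), mean(above)


below, above = mean_yield_by_threshold(readings, 355)
print(below, round(above, 2))  # 94.0 87.33
```

Because the threshold is just a parameter, an engineer could rerun the same question with different values of X without anyone having planned for that report in advance.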
While Current Analysis’ Mike Schiff thinks Nucleus is a great technology, he points out that Sand’s financial performance has been less than stellar.
Shattner admits that the company has had difficulty in generating sales revenue.
“It has taken a while from an external perspective, but it’s really not off what our expectation was in terms of the business plan,” he says. “We’re in a marketplace where you’re trying to identify a customer that’s tried to do something with traditional methods and hasn’t been able to do it, and is willing to go with a relatively small company to move forward. At this stage we see we’re kind of at that knee in the curve and it’s getting to be extremely gratifying.”
“What we’re finding is that most of the initial sales, because we’re not well known, are coming as a result of customers that have tried something they’ve been told would work but doesn’t,” Shattner continues. “A lot of the larger players will typically get the first crack at providing that kind of thing, and only then does the organization realize that there are too many restrictions or it takes too long. When we were first looking to bring the product to market, we looked at it as a very high-speed query engine that was capable of letting you ask any question you wanted of your data and giving you pretty good response time, whereas before if you asked something outside the box it could take three days to get an answer to the question.”
Shattner says the company now sees Nucleus in the long term as a technology or data storage architecture embedded in almost everything produced that accesses data, because “you need this type of technology to access it,” he adds. “We foresee licensing hardware and software manufacturers to use our technology as the way they store their data.”
In August, Sand and IBM announced that Nucleus would be ported to run on IBM’s AS/400 – a move that Current Analysis’ Schiff views as a significant one for Sand in gaining access to IBM’s customers. The move also enhances the overall business intelligence of the AS/400, Schiff says.
“I think the endorsement of IBM is important to our sales initiatives in terms of how fast we’re growing,” Shattner says. “It’s really a function of any other product when you’re trying to get it off the ground – getting the first 50 to 100 testimonials is critical and then you can start to build on those.”
All tools that have an SQL (structured query language) interface – such as Cognos PowerPlay, Business Objects, Brio, Microsoft Access or Excel – can work with Nucleus to conduct searches and queries. Nucleus runs in the background. Sand also offers its own version of Brio Technology’s Enterprise Portal or Query, called Nucleus Query.