Group modelling a predictable future

Usually the adjectives “predictable” and “derivative” carry somewhat negative connotations. They imply something is not new, trendy or cutting edge. With PMML, that is not quite true.

PMML (Predictive Model Markup Language) is based on XML, hence the derivative label, yet it is also a new and evolving language that has so far reached only version 1.0. It is intended to be a standard method of defining the predictive models and algorithms (decision trees and logistic regressions, for starters) that are the foundations of data mining and business intelligence. It is cutting edge because no one had thought to apply standardization to data mining until now.

The Data Mining Group, a consortium made up of SPSS Inc., NCR Corp., Angoss Software Corp., Magnify Inc. and the National Center for Data Mining at the University of Illinois at Chicago, originated the language. The group is currently attempting to become an official working group with the W3C, one of the steps in establishing an industry standard.

According to its backers, PMML is an idea whose time has come. They say the data mining field is so complex and fragmented that nobody’s products can communicate with anybody else’s, and that causes problems for consumers and vendors alike.

“PMML provides a quick and easy way for companies to define predictive models and then share those models between compliant vendors’ applications,” explained Marc Fienberg, director of marketing for Magnify in Chicago. “Right now there is no easy way to do that. There is no open exchange of models, and PMML simplifies that process.”

He added that if the standard takes hold, users will be able to build a model with one company’s product, analyse it with another company’s offering, and visualize it with a third party’s tool. That is impossible today.

In addition, Michael Cornelison, senior member of Magnify’s technical staff, said even within an organization there is a need for standards to ensure the accuracy and usefulness of predictive modelling results.

The direction in which PMML is heading appeals to Eric Apps, president of Toronto-based Angoss Software. “We see it as a useful standard and not a hype announcement. We’ve got developers spending time on it, and we think there is a framework there.”

But, Apps said, the idea needs to gain more popularity. “We’re pragmatic enough to understand that if large organizations like IBM, Oracle or Microsoft don’t support it, it will become irrelevant.”

He added that even with their support, PMML is only at the beginning of its lifecycle. “It will take a year or 18 months before it will be out there in a useful way, and by that time it may have been changed and transformed.”

Cornelison and Fienberg said Magnify is currently offering products that conform to the PMML 1.0 specification, and added that anybody who can use XML will be able to work with PMML.
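To give a sense of what working with an XML-described model might involve, here is a minimal sketch in Python. It is not PMML itself: the `Model`, `Node` and `Test` elements, their attributes and the "churn" example are all hypothetical, made up for illustration rather than taken from the PMML 1.0 specification. The sketch only shows the general idea that a decision tree written down as XML by one tool could be read and applied by another.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified model document. The element and attribute names
# are illustrative assumptions, NOT taken from the PMML 1.0 specification.
MODEL_XML = """
<Model name="churn-sketch">
  <Node score="stay">
    <Node score="leave">
      <Test field="complaints" operator="greaterThan" value="2"/>
    </Node>
    <Node score="stay">
      <Test field="complaints" operator="lessOrEqual" value="2"/>
    </Node>
  </Node>
</Model>
"""

# Comparison operators the sketch understands (names are assumptions).
OPERATORS = {
    "greaterThan": lambda a, b: a > b,
    "lessOrEqual": lambda a, b: a <= b,
}


def matches(node, record):
    """Return True if the record satisfies the node's test (or it has none)."""
    test = node.find("Test")
    if test is None:
        return True
    compare = OPERATORS[test.get("operator")]
    return compare(record[test.get("field")], float(test.get("value")))


def predict(node, record):
    """Walk down the tree, following the first child whose test matches."""
    for child in node.findall("Node"):
        if matches(child, record):
            return predict(child, record)
    # No child matched (or this is a leaf): use this node's score.
    return node.get("score")


root = ET.fromstring(MODEL_XML)
tree = root.find("Node")
print(predict(tree, {"complaints": 5}))
print(predict(tree, {"complaints": 1}))
```

Because the model lives in a plain XML document rather than inside one vendor's file format, any tool with an XML parser could in principle load it, which is the kind of exchange the standard's backers describe.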

Ease of use will be important to the future of PMML, but so will the depth of the Data Mining Group’s resources, according to Aaron Zornes, executive vice-president and research director, application delivery strategies, at the Meta Group Silicon Valley Research Center in Burlingame, Calif.

“Usually there is a champion behind things to make them happen, and here it seems to be Bob Grossman (president of Magnify). The question is, does he have the personality and the personal bandwidth to carry this as his personal battle, as his crusade?

“I think it would be better if Bob had his chief technical officer do this, or found some other people in the industry to take it on. I don’t think he’s got the bandwidth. He’s got the eloquence, but it takes time, and running a fairly small company in a fast-moving business is very demanding.”

Still, Zornes said that the group has built a strong initial foundation. “As with any emerging technology, it is hard to be the early adopter unless you are really brave and have a lot of money and a lot of power, because you may invest part of your career and a lot of your budget into one of these emerging data mining products.”