Dirty data

There are many ways that a supplier named IBM could be entered into a supply chain database: IBM Corp., I.B.M. Corporation, International Business Machines Corp. or a host of other variations.

Any one of those monikers might work well enough for a specific transaction. But if a company wants to see how much business it’s doing with IBM overall, the name variations become a problem. The company might be doing US$100 million worth of business with IBM, yet a database query might show only US$20 million, depending on which name is used in the query.
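Here is a minimal Python sketch of that matching problem: normalize the supplier names, then roll up spend. It is illustrative only and is not how Dun & Bradstreet or any other vendor does it. The alias table, the list of legal suffixes, the function names and the dollar split across the three names are all hypothetical. Commercial matching tools typically layer fuzzy comparison and reference data on top of simple rules like these.

```python
import re
from collections import defaultdict

# Hypothetical alias table; in practice data stewards would maintain it.
ALIASES = {"international business machines": "ibm"}

# Legal-form words to ignore when matching (illustrative, not exhaustive).
SUFFIXES = {"corp", "corporation", "inc", "co", "ltd"}

def normalize_supplier(name: str) -> str:
    """Reduce a raw supplier name to a canonical matching key."""
    key = name.lower().replace(".", "")      # "I.B.M." -> "ibm"
    key = re.sub(r"[^a-z0-9 ]", " ", key)     # turn other punctuation into spaces
    words = [w for w in key.split() if w not in SUFFIXES]
    key = " ".join(words)
    return ALIASES.get(key, key)              # map known long forms to one key

def spend_by_supplier(rows):
    """Sum spend per normalized supplier key."""
    totals = defaultdict(int)
    for name, amount in rows:
        totals[normalize_supplier(name)] += amount
    return dict(totals)

# Hypothetical invoices, amounts in whole US dollars.
rows = [
    ("IBM Corp.", 20_000_000),
    ("I.B.M. Corporation", 30_000_000),
    ("International Business Machines Corp.", 50_000_000),
]
print(spend_by_supplier(rows))  # {'ibm': 100000000}
```

A query on the raw name "IBM Corp." alone would return only US$20 million. Keyed on the normalized name, the same records roll up to US$100 million.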

The result is that the company wouldn’t have the complete, accurate view of its suppliers that it needs to negotiate better deals and volume discounts.

“We see 20 per cent duplicate supplier records,” says Craig Verran, assistant vice-president for supply chain solutions at The Dun & Bradstreet Corp. in Murray Hill, N.J., which helps companies clean up their supplier files.

That’s just one small example of how uncleansed data gives a company the wrong picture of its supply chain.

“Companies are making bad operational decisions every day of the week [and losing money] because of bad data quality,” says Ted Friedman, an analyst at Gartner Inc. in Stamford, Conn.

Poor data management is costing global businesses more than US$1.4 billion per year in billing, accounting and inventory snafus, according to a survey of 599 companies by PricewaterhouseCoopers in New York. One-third of the companies say that “dirty data” forced them to delay or scrap a new system.

“We have had major [supply chain] software projects fail for lack of good data,” says Donald Carlson, director of data and configuration management at Motorola Inc.’s semiconductor products group in Austin, Tex. In a presentation at a recent data quality conference, he recalled how a new supply chain planning system had to be scrapped because bills of material were hit by a triple whammy: incomplete data, inaccurate data and different data formats in different countries.

“We could do a supply chain plan for Hong Kong, we could do one for Scotland, but we didn’t have a standard methodology,” Carlson says.

Carlson was given the task of fixing the problem in late 1998, and he’s well on his way. Many data problems have been ironed out, and now there are “data stewards,” people assigned to make sure the data stays clean.

There are two fatal mistakes users make when it comes to data quality. The first is to neglect data issues in the rush to install a software package. “Many companies are oblivious,” Friedman says. “They don’t take the time to quantify the pain that they’re experiencing as a result of poor data quality.”

The second is to clean up data for a specific IT project instead of finding the source of the data problems. Companies may turn to data-cleansing vendors, such as Vality Technology Inc. in Boston and Firstlogic Inc. in La Crosse, Wis. They may even try to identify problems by using analytical tools from data profiling vendors, such as Evoke Software Corp. in San Francisco and Metagenix Inc. in Durham, N.C.

But these efforts tend to be one-shot deals. “People put a Band-Aid on the problem by cleansing data [for a particular software project], but they’re not going back to fix the root cause of the problem,” Friedman says.

Ideally, a company would undertake a continuous campaign to make sure data that enters the system is clean and then stays that way. Some of the job can be pushed onto the suppliers that feed data into the system, by setting strict requirements for data formats.
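One way to enforce format requirements at the point of entry is to validate each incoming supplier record and reject it with specific reasons. The Python sketch below is hypothetical. The field names and rules (year-month-day dates, amounts written with two decimal places) are assumptions, and the real requirements would come from the trading partners' own agreement.

```python
import re
from datetime import datetime

# Hypothetical fields every supplier record must carry.
REQUIRED_FIELDS = ("supplier_id", "invoice_number", "invoice_date", "amount")

def validate_record(record: dict) -> list[str]:
    """Return rule violations; an empty list means the record is accepted."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing {field}")
    date = record.get("invoice_date")
    if date:
        try:
            # Require year-month-day order so the date cannot be misread.
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            errors.append(f"invoice_date not YYYY-MM-DD: {date!r}")
    amount = record.get("amount")
    if amount and not re.fullmatch(r"\d+\.\d{2}", amount):
        errors.append(f"amount not in 0.00 form: {amount!r}")
    return errors
```

Rejecting a record with a list of reasons, rather than silently repairing it, sends the problem back to the source, which is where Friedman argues the fix belongs.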

But it’s also important for trading partners to establish business rules, or what used to be called a data dictionary, about the semantics of the data, since different systems define data fields in different ways.

For example, companies need to establish rules for what to call things: Are they pants, slacks or trousers? Does 06-03-2001 mean June 3 or, as Europeans might say, March 6? Or, take the database field called Invoice Date. Does that refer to the date the supplier put on the invoice, or the date the invoice was received?
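A data dictionary can be as simple as explicit lookup tables that both parties agree on. This hypothetical Python sketch maps product-name synonyms to one canonical term and requires every source system to declare its date convention. The same string 06-03-2001 is then read as June 3 from one source and as March 6 from another, rather than guessed. The source names and formats are assumptions for illustration.

```python
from datetime import date, datetime

# Hypothetical agreed vocabulary: every synonym maps to one canonical term.
CANONICAL_TERMS = {"pants": "trousers", "slacks": "trousers", "trousers": "trousers"}

# Hypothetical source systems, each with its declared date convention.
SOURCE_DATE_FORMATS = {
    "us_erp": "%m-%d-%Y",       # month first: 06-03-2001 is June 3
    "eu_supplier": "%d-%m-%Y",  # day first: 06-03-2001 is March 6
}

def canonical_term(term: str) -> str:
    """Look up the agreed term; unknown words raise KeyError."""
    return CANONICAL_TERMS[term.strip().lower()]

def parse_date(value: str, source: str) -> date:
    """Parse a date using the convention registered for its source system."""
    return datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date()

print(parse_date("06-03-2001", "us_erp"))       # 2001-06-03
print(parse_date("06-03-2001", "eu_supplier"))  # 2001-03-06
print(canonical_term("Slacks"))                 # trousers
```

Unknown terms and unregistered sources raise `KeyError` instead of passing through. This forces someone to extend the dictionary rather than letting ambiguous data in silently.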

“Something that seems very simple is, in fact, very complex,” says Brent Habig, president of Tigris Consulting, a supply chain consultancy in New York. “And you need to be very careful when pulling data from different systems together to make sure that it’s the right field name, the right format and that semantically, it means what you want it to mean.”
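Habig's point about field names and semantics can be illustrated with a per-system field mapping. In this hypothetical Python sketch, two systems each have a column labelled as an invoice date. One records when the supplier issued the invoice and the other records when it was received, so the shared schema gives them different canonical names rather than merging them. The system names, column names and dates are invented for illustration.

```python
# Hypothetical mappings from each source system's columns to a shared schema.
# Both "InvDate" and "Invoice Date" look alike but record different events.
FIELD_MAP = {
    "supplier_portal": {"InvDate": "invoice_issued_date", "Vendor": "supplier_name"},
    "ap_system": {"Invoice Date": "invoice_received_date", "Supplier": "supplier_name"},
}

def to_canonical(system: str, row: dict) -> dict:
    """Rename a source row's fields to the shared schema; unmapped fields raise."""
    mapping = FIELD_MAP[system]
    unknown = set(row) - set(mapping)
    if unknown:
        raise KeyError(f"{system}: no agreed mapping for {sorted(unknown)}")
    return {mapping[field]: value for field, value in row.items()}

a = to_canonical("supplier_portal", {"InvDate": "2001-06-03", "Vendor": "IBM Corp."})
b = to_canonical("ap_system", {"Invoice Date": "2001-06-08", "Supplier": "IBM Corp."})
print({**a, **b})
# {'invoice_issued_date': '2001-06-03', 'supplier_name': 'IBM Corp.', 'invoice_received_date': '2001-06-08'}
```

Because the two dates keep distinct names, a report can compare them or choose the right one explicitly, instead of mixing issue dates and receipt dates under a single "invoice date" column.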