Data warehouse or data pit?

Marketing hype aside, data warehousing and data modeling have been around for a long time. Alexander the Great and Napoleon built battle models based on information from their intelligence networks, and both the German and Allied General Staffs used data modeling in World War II.

In those days it was called Operations Research. During the Cold War the various sides assembled intelligence information for analysis of “what if” situations.

More and more data was fed through computers to get faster matches or to answer the question, “How many blue-eyed carpenters are there in Murmansk?”

In the business world, operations research still exists in large organizations, and variations on the theme have existed in smaller companies since the days of punched cards and the invention of the word “database.” In the days of punched cards, the term data warehouse was much more literal.

Off-site warehouse storage became a thriving business. Now we have data storage that is remarkably cheap, computers that become faster every week and a wealth of available information. The ability to distribute data is not in question. Sorting against a wide variety of parameters and statistical analysis are both available on the desktop.

In the IS world we don’t question the data too much, nor do we challenge the need. The latest phrase I keep hearing is “Mining your data.” It may be a very appropriate phrase. Mining to me means going into dark places with a potentially explosive atmosphere. Data warehousing can be like that.

Private employee information, overly optimistic marketing information and questionable statistics can lead to serious security and modeling difficulties, and to that old favourite, the comparison of apples with watermelons.

Employee data is the most volatile. An example is a clever system designed to measure employee attendance over time to determine the productivity level of the company. Aside from measuring absences on Fridays and Mondays, the system also massaged the reported sick time to determine which employees were taking undue advantage of collective agreements and employment contracts. The problem turned out to be that if a person had minor surgery and was away for two or three weeks, the absence showed up on a list of excessive absenteeism for years. This continued until a former employee, whose references showed an attendance problem, successfully sued the company. It will be argued that this example is not true data warehousing, nor is it proper data modeling. It really doesn’t matter what you call it. If a system doesn’t provide the right information, companies can end up in court or, in the worst case, at a bankruptcy hearing.
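The attendance trap is easy to reproduce. The following is only a minimal sketch, not the system described: the absence records, the ten-day threshold and the function names are all hypothetical. The first rule counts every sick day ever recorded, while the second counts only absences that began in the past 12 months.

```python
from datetime import date, timedelta

# Hypothetical absence records: (first day absent, number of days absent).
absences = [
    (date(1995, 3, 6), 15),   # one three-week absence for minor surgery
    (date(1996, 10, 14), 1),
    (date(1997, 5, 2), 1),
]

THRESHOLD_DAYS = 10  # assumed cut-off for "excessive" sick time


def flagged_cumulative(records, threshold=THRESHOLD_DAYS):
    """Naive rule: total sick days ever recorded, with no time limit."""
    return sum(days for _, days in records) > threshold


def flagged_rolling(records, as_of, window_days=365, threshold=THRESHOLD_DAYS):
    """Windowed rule: only absences that started in the last window_days count."""
    cutoff = as_of - timedelta(days=window_days)
    recent = sum(days for start, days in records if cutoff < start <= as_of)
    return recent > threshold


review_date = date(1998, 1, 15)
print(flagged_cumulative(absences))            # True: the surgery still counts
print(flagged_rolling(absences, review_date))  # False: only one recent sick day
```

Under these assumptions, the cumulative rule keeps the employee on the list nearly three years after the surgery, while the windowed rule drops the flag once the absence ages out. Which rule is right is a business decision, and that is the point: someone has to ask what the number is supposed to measure.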

If your company is anticipating a new data warehousing approach, or considering enterprise performance measurement, find out the business objectives and how the data will be used. Work with the departments and company executives to determine key performance indicators and do some homework on activity-based costing. It goes without saying that your existing financial systems need to be stable and accurate. You also need to challenge the premise behind some of the data collection. If data doesn’t provide a realistic measurement or if it changes meaning as the business matures, you run the risk of producing faulty performance indicators or useless reports.

Remember the old formula “garbage in equals garbage out”? Now it’s “data in plus manipulation equals garbage out.”

Horner is a partner in Sierra Systems Consultants Corporate Enterprise Systems practice. He can be reached at