There are no pure data problems

While I agree with Ken Karacsony’s assessment that “Too Much ETL Signals Poor Data Management,” I have a very different opinion on what’s at the heart of the issue and what kind of solution it deserves, at least on the online transactional processing (OLTP) side of things.

Where Karacsony sees just poor data management, I see poor engineering practices in general.

There is no such thing as a pure data problem, because in any business application, data always exists within the context of the business process. Whenever data is taken out of that context (for example, when it’s stored in relational database management system tables), it loses a significant portion of its semantic significance.

Take a typical database for a financial services company. While it may be sufficient for very simple cases to have just one flavor of address in the database, business process complexity is likely to require numerous variations of the address structure: current client residence address, property address, client correspondence address, shipping address, billing address, third-party address and so on. All these address records may have identical physical structures, but semantically they are very different.

If someone does a home-appraisal search with the current client residence address instead of the property address, for example, he will get a wrong result. And, of course, giving the shipping department the billing address is probably a bad idea. One way to ensure that data is not taken out of the business context is to build cohesive systems around a logical unit of the business and expose these systems to each other only through semantically rich messages.
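As a rough illustration of what a "semantically rich" message could look like, here is a minimal Python sketch. The names `AddressRole`, `Address` and `AppraisalRequest` are hypothetical and chosen only for this example. The idea is that every address shares one physical structure but carries its business role with it, so a receiving system can refuse an address used in the wrong context.

```python
from dataclasses import dataclass
from enum import Enum


class AddressRole(Enum):
    """Business meaning of an address (a few of the roles listed above)."""
    CLIENT_RESIDENCE = "client_residence"
    PROPERTY = "property"
    BILLING = "billing"
    SHIPPING = "shipping"


@dataclass(frozen=True)
class Address:
    """Identical physical structure for every role; only the role differs."""
    role: AddressRole
    street: str
    city: str
    postal_code: str


@dataclass(frozen=True)
class AppraisalRequest:
    """Hypothetical message sent to a home-appraisal system."""
    client_id: str
    property_address: Address

    def __post_init__(self) -> None:
        # Reject an address that carries the wrong business meaning.
        if self.property_address.role is not AddressRole.PROPERTY:
            raise ValueError(
                "appraisal needs a property address, got "
                + self.property_address.role.value
            )
```

With this shape, building an `AppraisalRequest` from a client residence address raises an error at the system boundary instead of silently returning the wrong appraisal.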

One advantage that the messaging integration style has over the shared-database integration style is that it transmits not only the shared data but also the shared business-context semantics. Because it services many different owners at the same time, a shared database by its nature will rapidly lose its initial design crispness due to an inability to keep up with modification requests.

This in turn will lead to data overloading, redundancy, inconsistency and, in the end, poor data quality at the application level. (I would recommend using shared data integration within a single business unit but going to message-based integration for interdepartmental and enterprise development.)

Except for the area of ad hoc reporting, users don’t deal with databases; they deal with business applications. I would argue that too much ad hoc reporting signals problems with the business process design, application workflow design and/or user-interface design.

Too many OLTP applications are poorly designed and thus have very inadequate usability characteristics, forcing users to compensate by requesting a high volume of “canned” reports as well as sophisticated ad hoc reporting capabilities. In the world of carefully designed applications, it’s the applications and not the databases that are the centers of customer interactions.

As an example, I recently worked on a project where we were able to either migrate into an application’s human workflow process or completely eliminate more than half of the reports initially requested by the business users.

The solution to the “too much ETL” problem in the OLTP world is thus less centralization and lower coupling of the OLTP systems, not more centralization and tighter application coupling through a common data store. One can argue that it’s always possible to introduce a layer of indirection (e.g., XML) between the application logic and the common database physical schema, thus providing a level of flexibility and decoupling. While this may work for some companies, in my experience, this type of design is harder to maintain than the more robust asynchronous, middleware-based messaging because it mixes two different design paradigms.

I would be interested in hearing from Computerworld readers about any midsize to large companies that have been successful in building operational data stores that worked well with multiple interdepartmental systems through a number of consecutive releases. I predict that it will be hard to find a significant number of cases to discuss at all, and it will be especially difficult to find any examples from companies with a dynamic business process that requires constant introduction of new products and services.

The main reason for the lack of success, from my point of view, isn’t technical in nature. It’s relatively easy to build tightly coupled applications integrated via the common data store, especially if it’s done under the umbrella of a single program with a mature systems development culture.

The problem is in the “realpolitik” of a modern business environment: We work for businesses in the age of ever-accelerating global competition. When systems are built around one shared database, keeping them current means coordinating the business plans of various departments and then the deployment schedules of multiple IT projects, each working on its own group of business priorities. That is almost impossible.

When one of the interdependent development teams misses a deliverable deadline, political pressure to separate will become hard to resist. And if a commercial off-the-shelf software package is acquired or a corporate merger or an acquisition takes place, the whole idea of all applications working with one common data format is immediately thrown out the window.

So we in IT need to learn how to build systems that won’t require rigid release synchronization from the multiple OLTP systems belonging to disparate business units. Decoupling can provide us with the required flexibility to modify our systems on a coordinated but not prohibitively rigid schedule.
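One way to picture release decoupling is a message consumer that reads only the fields it depends on and ignores everything else, so the publishing system can add fields on its own schedule. The sketch below is an assumption made for illustration, not a description of any particular middleware; the message type `ClientAddressChanged` and its field names are hypothetical.

```python
import json


def read_address_changed(raw: str) -> dict:
    """Extract only the fields this consumer depends on; ignore the rest."""
    message = json.loads(raw)
    if message.get("type") != "ClientAddressChanged":
        raise ValueError("unexpected message type")
    address = message["address"]
    return {
        "client_id": message["client_id"],
        "role": address["role"],
        "postal_code": address["postal_code"],
    }


# Hypothetical first release of the publishing system.
v1_message = json.dumps({
    "type": "ClientAddressChanged",
    "client_id": "c-1",
    "address": {"role": "billing", "postal_code": "55101"},
})

# A later release adds fields; the consumer above needs no change.
v2_message = json.dumps({
    "type": "ClientAddressChanged",
    "schema_version": 2,
    "client_id": "c-1",
    "address": {"role": "billing", "postal_code": "55101", "country": "US"},
})
```

Both messages yield the same result from `read_address_changed`, which is the property that lets the two teams ship on different schedules. Removing or renaming a field the consumer reads would still break it, which is why the changes have to stay coordinated.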

Finally, it’s important to emphasize that while loose coupling gives us an opportunity to modify different systems on different schedules without corrupting the coupled systems, “loosely coupled” doesn’t mean “loosely managed.” Loose coupling provides us with a degree of flexibility in implementation and deployment.

This additional degree of flexibility gives our business partners the ability to move rapidly when they need to and at the same time provides IT with the ability to contain and manage the challenges caused by the ever-increasing rate of business change. We have to acknowledge that developing loosely coupled applications that work well together across an enterprise with well-delineated responsibilities is a very challenging engineering problem.

If not managed well, this type of systems development may turn the advantage of loose coupling into the disadvantage of “delayed-action” semantic problems. A mature IT development process is absolutely necessary to overcome this engineering problem and deliver this type of information infrastructure to our business partners.

From this perspective, it is worthwhile for any organization that is striving to build a well-integrated enterprise-level IT infrastructure to look into the SEI Capability Maturity Model. Specifically, Maturity Level 3, called the Defined Level, addresses issues of development process consistency across the whole enterprise. From my point of view, it is a prerequisite to the physical integration of the enterprise systems into a consistent whole.

The CMM manual describes this process level as when “the standard process for developing and maintaining software across the organization is documented, including both software engineering and management processes, and these processes are integrated into a coherent whole.” Unless an organization is prepared to operate at this level, it should not have high hopes for success in the integration area.

So to summarize: Successful data management begins by taking focus away from data.

Instead, the focus should be on the general level of systems engineering and its main aspects, such as requirements analysis and management, business domain modeling, configuration management and quality assurance processes.

I would argue that any midsize to large company that hasn’t reached CMM Level 3 and is trying to get “data under control” would have little chance to succeed in this undertaking, regardless of what integration style it uses.

–Semyon Axelrod is an IT architect in Minnesota. He can be contacted at
