Why data management needs a new approach

Last year, I said there was a database management crisis; the relational model (practical or theoretically pure) wouldn’t solve it, and alternative, more pragmatic ways of thinking about database management needed to be emphasized.

I’ll illustrate the point with several examples of situations in which the inability to access known information has cost large numbers of human lives.

Homeland security 1: antiterrorism. Middle Eastern men, some of a suspicious nature, were discovered seeking flight lessons. Alert FBI agents suspected that they might be planning to take over civilian aircraft. But this data was never combined with other FBI information, or with CIA knowledge of al-Qaeda interest in airplane hijackings. There just wasn’t an application that could relate keyword and concept searches across various FBI, CIA and public data banks, let alone factor in connections among various individuals and organizations.

This application need still hasn’t been met.

Health care records. The potential benefits from solving the health care record challenge are almost incalculable. Tens of thousands of lives could be saved annually, and David Brailer, national coordinator of health information technology, has estimated cost savings in the hundreds of billions of dollars.

The technical challenges are immense as well. Almost every data type is relevant — character, numeric, date, text, image, time series, genomic, maybe even geospatial.

New sources of data are invented every year. The most important data of all — physicians’ and nurses’ observations and conclusions — is subjective, incomplete, inconsistent, commonly illegible. And it’s usually missing entirely. (Just how many years of your medical records exist anymore?) Even the rules for evaluating and summarizing patient data change as a result of advances in medicine.

Nontechnical problems are also forbidding, involving cost, privacy, organizational politics and the like. This is especially true in countries that, like the U.S., have private-sector health care, but these issues are no picnic in single-payer countries, either.

Homeland security 2: intelligence analysis. In the run-up to the Iraq invasion, the U.S. loudly trumpeted various pieces of “intelligence” related to weapons of mass destruction that actually turned out to be false, specifically in the areas of mobile bioweapons labs, yellowcake uranium ore and aluminum tubing. Intelligence analysts knew each claim was highly unreliable, yet officials presented each one as a near-certain fact. Whatever one’s theories about the motives for these errors or the likely policy outcome had they not been made, one thing is clear — something in the intelligence community needs a great deal of improvement.

One thing that’s needed is technology not unlike a medical records solution — a comprehensive and accessible data bank that would let senior decision-makers directly assess the information used to support specific recommendations and conclusions. The privacy and security issues of such a system are huge, as are the challenges in computational linguistics. Other technical challenges, such as integration and data type support, are also nontrivial.

Why the answer isn’t relational. Each of these problems can and should be addressed, in part, by standard tabular data management. But each also has elements that aren’t well addressed by tables and rows, or indeed by predicate logic in general. For example, they all involve text search, and Boolean keyword search won’t suffice. Instead, users need to search on concepts, such as “interest in flying” or “possible circulatory problems,” while the system estimates relevance in complex ways.
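To make the contrast concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any real product: the concept lexicon, its weights, the sample memos and the function names are all invented for the example. A Boolean search for the literal word misses every memo, while a search on the concept ranks the memos that show indicators of it.

```python
import re

# Hypothetical concept lexicon (an assumption for this sketch): each concept
# maps indicator terms to integer weights.
CONCEPTS = {
    "interest in flying": {"flight": 3, "pilot": 3, "aircraft": 2, "cockpit": 2, "simulator": 1},
}

# Invented sample documents, keyed by a made-up id.
DOCS = {
    "memo-1": "Subject asked about flight lessons and time in a simulator.",
    "memo-2": "Subject wants pilot training but showed no interest in landing.",
    "memo-3": "Subject renewed a driver's license.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def boolean_search(docs, keywords):
    """Return ids of documents that contain every keyword (Boolean AND)."""
    wanted = {k.lower() for k in keywords}
    return [doc_id for doc_id, text in docs.items() if wanted <= set(tokenize(text))]


def concept_search(docs, concept):
    """Rank documents by the summed weight of the indicator terms they contain."""
    weights = CONCEPTS[concept]
    scored = []
    for doc_id, text in docs.items():
        tokens = set(tokenize(text))
        score = sum(w for term, w in weights.items() if term in tokens)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(boolean_search(DOCS, ["flying"]))            # no memo uses the literal word
print(concept_search(DOCS, "interest in flying"))  # memo-1 ranks above memo-2
```

A hand-built lexicon is the crudest possible stand-in for relevance estimation. The point is only that the query names a concept, and the system, not the user, decides which words count as evidence for it.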

More generally, these apps involve the search for and processing of subjective human opinions and also of unreliable machine correlations and judgments. They involve handling unforeseen data types — perhaps some kind of telemetry or graphical analysis. The need for a new kind of data may be uncovered by an end user, who must stuff it into the database before anybody figures out the best structure for handling such information on a repeated basis. Given a near-infinite staff of database designers, perhaps these needs could be met relationally. But in real life, they’ll be solved only by a more loosely coupled approach, combining multiple modeling philosophies — relational, semantic and object alike.
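One way to picture the “stuff it into the database first, design later” requirement is a table that keeps a few fixed relational columns next to a free-form attribute bag. The sketch below uses Python’s standard sqlite3 and json modules; the table name, column names and sample records are assumptions made up for the illustration, and this is just one of many possible loosely coupled designs.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory table: fixed columns plus a free-form JSON text column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation ("
        " id INTEGER PRIMARY KEY,"
        " patient_id TEXT NOT NULL,"
        " observed_on TEXT NOT NULL,"
        " extras TEXT NOT NULL DEFAULT '{}')"
    )
    return conn


def add_observation(conn, patient_id, observed_on, **extras):
    """Store the known fields relationally and anything unforeseen as JSON."""
    cur = conn.execute(
        "INSERT INTO observation (patient_id, observed_on, extras) VALUES (?, ?, ?)",
        (patient_id, observed_on, json.dumps(extras)),
    )
    return cur.lastrowid


def find_with_attribute(conn, name):
    """Return (id, patient_id, value) for rows whose extras contain the attribute."""
    rows = conn.execute("SELECT id, patient_id, extras FROM observation ORDER BY id")
    found = []
    for row_id, patient_id, extras in rows:
        bag = json.loads(extras)
        if name in bag:
            found.append((row_id, patient_id, bag[name]))
    return found


conn = open_store()
add_observation(conn, "p1", "2005-01-10", note="possible circulatory problems")
# A new kind of data shows up; no schema change is needed to record it.
add_observation(conn, "p2", "2005-01-11", telemetry=[72, 75, 71])
print(find_with_attribute(conn, "telemetry"))
```

Once a new attribute proves useful on a repeated basis, a designer can promote it to a proper column or table. Until then, the end user is not blocked.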

That amalgamation of practical data management techniques, along with their supporting technologies, is what I’m calling database management system services (DBMS2). As an explicit philosophy, this may be revolutionary — but actually it’s only making a virtue out of a necessity. This is how data management is already done today, and it’s definitely how data management must be done in the future.

– Curt A. Monash is a consultant in Acton, Mass., and also blogs regularly on Computerworld.com. You can reach him at curtmonash@monash.com.
