Mining message metadata

At InfoWorld’s CTO Forum in April, BEA Systems Inc.’s Adam Bosworth talked about missing pieces of the Web services infrastructure.

Some were usual suspects: reliable asynchronous transactions, more flexible programming techniques. But one of the items on Bosworth’s wish list – the high-performance XML message broker that he said BEA is developing – struck me as novel. If XML messages are the transaction currency of a service-oriented network, the argument runs, then there are going to be astronomical numbers of messages. While the messages are in flight, we’ll need new kinds of dynamic message stores to index, search, monitor, and correlate them.

Relational databases, as Bosworth pointed out, aren’t built to handle transient, high-volume flows of irregularly shaped data. So I wasn’t surprised when Sonic Software Corp. rolled eXcelon, a high-performance object/XML database, into its Enterprise Service Bus offering. But the picture didn’t come fully into focus until Fawcette’s Enterprise Architect Summit last month.

McKinsey partner Chris Barlow described how Delta Air Lines has radically reorganized its application portfolio. Point-to-point integration is out; event-driven communication across a common message bus is in. When you build a system this way, message queues are the first and best way to take the pulse of its real-time state.

Later I had a long talk with Blue Titan’s chairman and CTO, Frank Martinez, who defined the scope of the opportunity as “things that last from seconds to 30 days.” These are arbitrary limits, but they bound a lot of the interesting activity in a system built around coarse-grained asynchronous transactions. When a flight is delayed, triggering a cascade of events related to gates, baggage handling, and meal provisioning, the cluster of events is hot for a couple of hours. Events surrounding a financial transaction are hot for a couple of days, until the settlement cycle ends. Martinez says Blue Titan is developing “a new class of persistence backbone” to enable real-time access to these event clusters.
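To picture what a store for "things that last from seconds to 30 days" might look like, here is a minimal in-memory sketch in Python. It is not Blue Titan's design, which the conversation did not detail. The class name `EventClusterStore`, its methods, the correlation keys, and the timestamps are all hypothetical, chosen only to show events grouped into clusters that stay hot for a configurable window.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class EventClusterStore:
    """Hypothetical sketch: groups events by correlation key, each with a hot window."""

    def __init__(self):
        self._events = defaultdict(list)  # key -> [(timestamp, event), ...]
        self._hot_for = {}                # key -> timedelta the cluster stays hot

    def record(self, key, event, at, hot_for):
        """Add an event to a cluster and (re)set that cluster's hot window."""
        self._events[key].append((at, event))
        self._hot_for[key] = hot_for

    def hot_clusters(self, now):
        """Return sorted keys whose latest event is still inside the hot window."""
        return sorted(
            key
            for key, events in self._events.items()
            if now - max(ts for ts, _ in events) <= self._hot_for[key]
        )

    def evict_cold(self, now):
        """Drop clusters that have gone cold; return how many were removed."""
        hot = set(self.hot_clusters(now))
        cold = [key for key in self._events if key not in hot]
        for key in cold:
            del self._events[key]
            del self._hot_for[key]
        return len(cold)


store = EventClusterStore()
t0 = datetime(2004, 1, 1, 9, 0)  # arbitrary starting time for the example

# A flight delay: related events stay hot for a couple of hours.
store.record("flight-delay-1", "gate changed", t0, timedelta(hours=2))
store.record("flight-delay-1", "bags rerouted", t0 + timedelta(minutes=20), timedelta(hours=2))

# A financial transaction: hot for a couple of days, until settlement.
store.record("trade-7", "order filled", t0, timedelta(days=2))

hot_after_1h = store.hot_clusters(t0 + timedelta(hours=1))
hot_after_6h = store.hot_clusters(t0 + timedelta(hours=6))
evicted = store.evict_cold(t0 + timedelta(hours=6))
print(hot_after_1h, hot_after_6h, evicted)
```

An hour in, both clusters are hot; six hours in, only the trade is, and the flight cluster can be evicted. A real product would need durable storage, indexing, and querying across clusters, none of which this sketch attempts.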

Martinez makes a crucial distinction between message data and message metadata. In the realm of Web services, it’s the difference between SOAP bodies and SOAP headers. The bodies eventually land in an operational data store; the headers often don’t. Yet the headers define the context of the message: who (or what) is sending it and why.

“It’s the same message payload,” Martinez said, “but contexts are very different.” The service consumers may not even save these contexts; if they do, they’ll likely record them in separate data stores and incompatible formats. That’s not a recipe for real-time policy-directed control.
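The header/body split is easy to see in code. The sketch below uses Python's standard `xml.etree.ElementTree` to pull the context out of a SOAP 1.1 envelope separately from the payload. The envelope namespace is the standard SOAP 1.1 one; everything else (the `ctx:Context` header block, its `Sender` and `Reason` fields, the flight payload, and the `split_message` helper) is invented for illustration and does not come from any vendor's product.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical message: the header block and payload vocabulary are made up.
MESSAGE = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <ctx:Context xmlns:ctx="urn:example:context">
      <ctx:Sender>gate-service</ctx:Sender>
      <ctx:Reason>flight-delay</ctx:Reason>
    </ctx:Context>
  </soap:Header>
  <soap:Body>
    <f:FlightStatus xmlns:f="urn:example:flights">
      <f:Flight>XX100</f:Flight>
      <f:Status>DELAYED</f:Status>
    </f:FlightStatus>
  </soap:Body>
</soap:Envelope>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def split_message(xml_text):
    """Return (metadata dict from the SOAP header, payload XML from the SOAP body)."""
    envelope = ET.fromstring(xml_text)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    body = envelope.find(f"{{{SOAP_NS}}}Body")

    metadata = {}
    if header is not None:
        for block in header:        # each header block, e.g. ctx:Context
            for field in block:     # each field inside the block
                metadata[local_name(field.tag)] = (field.text or "").strip()

    payload = ""
    if body is not None and len(body) > 0:
        payload = ET.tostring(body[0], encoding="unicode")
    return metadata, payload


metadata, payload = split_message(MESSAGE)
print(metadata)  # who sent the message and why
print(payload)   # what would land in the operational data store
```

The payload is what an operational data store would keep. The `metadata` dictionary is the part that tends to get dropped, or scattered across separate stores in incompatible formats.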

Sonic Software’s Sonic XML Server, Software AG’s Tamino, and Sleepycat Software Inc.’s Berkeley DB XML are some of the products that will be used to capture and exploit the context surrounding XML messages in event-driven, service-oriented architectures. Emerging vendors of SOA fabrics, including Blue Titan Software Inc. and The Mind Electric Inc. (recently acquired by webMethods Inc.), are weaving XML persistence and querying into those fabrics.

Today, most IT shops can’t store or process massive flows of transient data. But XML message traffic is a resource that creates strategic opportunity for those who learn to manage it well. Tools for doing that are on the way.