The lifeblood of the modern enterprise is information. This isn’t news. But as organizations collect more and more information from different sources and applications, it’s increasingly difficult to deal with that information.
We know how to work with databases, data marts and data warehouses, because information in those places is carefully structured and massaged. But businesses also need to work with a wealth of unstructured information from sources such as document libraries, spreadsheets, e-mail and instant messaging archives, electronic forms and records, publicly available Web pages and commercial information services.
Two elements are key to this discussion. First is the unstructured nature of content: Organizations have to handle streams of what might seem to be random text instead of the carefully delineated and validated fields that we’re used to in “normally” managed data.
The second consideration is that companies are getting this information from multiple sources, both inside and outside the enterprise. Each data source has its own organization and format, and most were designed for a single, stand-alone purpose, not to be part of an integrated data collection. Thus, these repositories tend to be silos, independent of one another, and they don’t work well together.
We rely on a growing number of these data sources, and we need to be able to use new ones as they appear without having to rewrite our applications and tools.
The simple-minded answer to this problem is to aggregate all the data into a single, universal database or data warehouse. Unfortunately, creating such a central repository is a slow and expensive process. Maintaining and updating that repository is a job that could give any IT manager nightmares. And we haven’t even addressed the issues of scalability and who owns the information. Clearly, a better, more efficient strategy is called for.
Enterprise information integration (EII) is the general heading under which such a strategy would fall today. But approaches to solving the problem have been around for years under a variety of names. Three main factors have made the situation more manageable today:
– The growing use and acceptance of XML as a cross-platform standard.
– Cheaper and more capacious storage combined with faster, more powerful processors.
– The emergence of new tools to tackle the problem head-on.
EII products make it possible to combine data from different sources whenever you need it. They accomplish this by creating an intermediate data services layer (middleware) that provides access to the data in a standardized way, so that applications don’t have to interact directly with each separate back-end data source.
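To make the idea of a data services layer concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular EII product works: the class names (CsvConnector, RecordConnector, DataServicesLayer), the sample data and the single find operation are all assumptions made for this example. Each connector hides one back-end format, and the layer gives callers one standardized query across all of them.

```python
import csv
import io
from typing import Dict, Iterable, List


class CsvConnector:
    """Hypothetical connector for a spreadsheet exported as CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterable[Dict[str, str]]:
        # DictReader yields one dict per data row, keyed by the header line.
        return csv.DictReader(io.StringIO(self._text))


class RecordConnector:
    """Hypothetical connector for records already held in memory."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self._records = records

    def rows(self) -> Iterable[Dict[str, str]]:
        return iter(self._records)


class DataServicesLayer:
    """Single access point: callers never touch the back ends directly."""

    def __init__(self) -> None:
        self._connectors = {}

    def register(self, name: str, connector) -> None:
        self._connectors[name] = connector

    def find(self, field: str, value: str) -> List[Dict[str, str]]:
        # Ask every registered source the same question and label each hit
        # with the name of the source it came from.
        hits = []
        for name, connector in self._connectors.items():
            for row in connector.rows():
                if row.get(field) == value:
                    hits.append({"source": name, **row})
        return hits


# Sample data invented for this sketch.
layer = DataServicesLayer()
layer.register("crm", CsvConnector("customer,city\nAcme,Boston\nGlobex,Worcester\n"))
layer.register("orders", RecordConnector([{"customer": "Acme", "total": "120"}]))
matches = layer.find("customer", "Acme")
```

Adding a new data source in this sketch means writing one more connector with a rows method and registering it; the calling code that uses find does not change.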
Although named after enterprise application integration, a group of older technologies designed for linking applications, EII is more service-oriented than traditional EAI.
XML is probably the biggest single force driving the advance of EII today, because XML gives us the ability to tag data — whether for format, content or both — either at creation time or later on. And these tags can be extended and modified to accommodate almost any area of knowledge.
Also, consider that Microsoft Corp. has announced its intention to make XML the default save format for its successor to Office 2003.
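As a rough illustration of tagging content after it has been created, the following Python sketch wraps a piece of free text in XML and adds tags for any e-mail addresses and dates it can recognize. The function name tag_message, the element names (message, body, email, date) and the simple patterns used to spot addresses and dates are assumptions chosen for this example; a real system would use a vocabulary suited to its own area of knowledge.

```python
import re
import xml.etree.ElementTree as ET


def tag_message(text: str) -> ET.Element:
    """Wrap free text in XML and tag the e-mail addresses and dates found in it."""
    root = ET.Element("message")
    ET.SubElement(root, "body").text = text

    # Deliberately simple patterns, good enough for a sketch only.
    for address in re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text):
        ET.SubElement(root, "email").text = address
    for date in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text):
        ET.SubElement(root, "date").text = date
    return root


# Sample text invented for this sketch.
tagged = tag_message("Ship by 2005-03-01; questions to ops@example.com.")
xml_text = ET.tostring(tagged, encoding="unicode")
```

Once the content carries tags like these, other tools can look for the email or date elements instead of rescanning the raw text.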
Besides XML, EII applications today are generally built around metadata repositories and specific connectors to link to these repositories.
For EII to be practical, it can’t simply be another data warehouse. Instead, it must pull together information when needed, in a timely and ad hoc fashion. The simplest way for an enterprise to do this is to establish and maintain a metadata repository or detailed catalog that describes what data is available, how it’s stored, where it’s located and the relationships among data components.
Relying on metadata also helps reduce data redundancy, data movement and inappropriate data transformations, potentially saving both time and money.
Early metadata systems were file-based data dictionaries; these were superseded by metadata repositories based on relational database systems. A modern, XML-based metadata repository lets data architects work with dissimilar data sources that are distributed throughout the organization or even outside its firewalls.
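A small sketch can show what such a catalog might hold. The XML layout below (catalog, dataset, field and relationship elements), the data set names and the location strings are all invented for illustration and do not follow any product's schema. The Python code reads the catalog and answers the kinds of questions described above: what data is available, how and where it is stored, and how the pieces relate.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: it describes the data but holds none of it.
CATALOG_XML = """
<catalog>
  <dataset name="customers" store="relational" location="db://crm/customers">
    <field name="customer_id"/>
    <field name="city"/>
  </dataset>
  <dataset name="contracts" store="documents" location="file://legal/contracts">
    <field name="customer_id"/>
    <field name="signed_on"/>
  </dataset>
  <relationship from="contracts.customer_id" to="customers.customer_id"/>
</catalog>
"""


def load_catalog(xml_text):
    """Return (datasets, relationships) parsed from the catalog XML."""
    root = ET.fromstring(xml_text)
    datasets = {
        ds.get("name"): {
            "store": ds.get("store"),
            "location": ds.get("location"),
            "fields": [f.get("name") for f in ds.findall("field")],
        }
        for ds in root.findall("dataset")
    }
    relationships = [
        (rel.get("from"), rel.get("to")) for rel in root.findall("relationship")
    ]
    return datasets, relationships


def datasets_with_field(datasets, field):
    """List, in alphabetical order, the data sets that expose a given field."""
    return sorted(name for name, meta in datasets.items() if field in meta["fields"])


datasets, relationships = load_catalog(CATALOG_XML)
shared = datasets_with_field(datasets, "customer_id")
```

Because the catalog records only descriptions and locations, the data itself stays where it is, which is the point of relying on metadata rather than building another warehouse.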
Most EII products come already equipped with a set of tools for accessing some “standard” set of repositories. But integration almost always involves customization, so you should expect to either create new connectors or modify existing ones.
Also, some EII approaches focus on a one-way interaction with data (find what you need and aggregate it with other data), while others are more interactive and bidirectional in locating and dealing with information.
Finally, the type of information you’re going after (transactional documents, rich media, graphics and video, or technical data) also affects the type of interaction and connectivity needed, so EII products may have quite different sets of connectors according to the domains of knowledge they are accustomed to working in.
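The difference between one-way and bidirectional connectors can be sketched in a few lines of Python. This example assumes that "bidirectional" means the connector can write changes back to its source as well as read from it; that reading, along with every class and function name here, is an assumption made for illustration rather than a description of any product.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ReadConnector(ABC):
    """One-way connector: it can only fetch data from its source."""

    @abstractmethod
    def fetch(self, key: str) -> Optional[str]:
        ...


class ReadWriteConnector(ReadConnector):
    """Bidirectional connector: it can also send changes back."""

    @abstractmethod
    def store(self, key: str, value: str) -> None:
        ...


class ArchiveConnector(ReadConnector):
    """Hypothetical read-only source, such as an e-mail archive."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = dict(records)

    def fetch(self, key: str) -> Optional[str]:
        return self._records.get(key)


class FormsConnector(ReadWriteConnector):
    """Hypothetical read-write source, such as an electronic forms store."""

    def __init__(self) -> None:
        self._forms: Dict[str, str] = {}

    def fetch(self, key: str) -> Optional[str]:
        return self._forms.get(key)

    def store(self, key: str, value: str) -> None:
        self._forms[key] = value


def write_back(connector: ReadConnector, key: str, value: str) -> bool:
    """Store the value only if the connector supports writing."""
    if isinstance(connector, ReadWriteConnector):
        connector.store(key, value)
        return True
    return False


archive = ArchiveConnector({"msg-1": "Quarterly report attached"})
forms = FormsConnector()
archive_updated = write_back(archive, "msg-1", "edited")
forms_updated = write_back(forms, "form-7", "approved")
```

Separating the two capabilities lets an integration layer aggregate from every source while sending updates only to the sources that can accept them.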
–Kay is a Computerworld contributing writer in Worcester, Mass. You can reach him at firstname.lastname@example.org.