The odd couple: ETL and EAI

META Trend: Data management will increasingly be viewed as a part of overall integration architectures and designs (2004/05). This will cause organizations to consolidate these efforts organizationally (i.e., COE for integration services), to take advantage of common technologies and skills. Distinct data integration technologies (e.g., EII, ETL, EAI) will converge (2007), ultimately surviving only as various subsets of intermediary capabilities in the service-oriented architecture (2009).

Our research indicates that organizations are increasingly confused by the marketing messages from both the ETL and EAI vendor communities. Because most organizations have installed both ETL and EAI engines and are often in the process of rationalizing their IT portfolios, they often question why a single engine is not used to transport data for business processes, data warehousing, or data replication.

The first acquisitions in the combined ETL/EAI space have already taken place. We expect to see more mergers through 2004/05, where ETL vendors acquire EAI technology or vice versa. It will take until 2006 before a consistent platform is available that seamlessly provides both EAI and ETL capabilities. By 2007, users should also expect to see similar technologies embedded in platforms from mega-infrastructure vendors such as IBM, Microsoft, or Oracle.

Although ETL and EAI technologies seem surprisingly similar from an architectural view (so-called adapters provide access to systems and data sources, transformations standardize proprietary formats, and routing capabilities move packets of data), they serve fundamentally different purposes from an information management perspective. ETL is typically used to move bulk data from operational systems to data warehouses or data marts, in a highly scheduled environment to avoid generating bottlenecks from high-volume transactions, whereas EAI is the technology of choice for connecting systems for business process management and workflow. Much of the confusion arises from the artificially generated submarket of "real-time ETL," which is often mistaken for EAI, where small messages are sent rather than high-volume data extractions.
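The contrast can be sketched in a few lines of Python. All names here are hypothetical illustrations, not any vendor's API: the ETL function moves records in one scheduled bulk pass into a single target, while the EAI function routes each message individually using the embedded "if then else" logic the text describes.

```python
def etl_batch_load(source_rows):
    """Scheduled bulk job: extract all rows, transform, load one target."""
    transformed = [{"id": r["id"], "amount": round(r["amount"], 2)}
                   for r in source_rows]
    return {"target": "warehouse", "rows": transformed}

def eai_route(event):
    """Per-message routing: the event's content decides the next system."""
    if event["amount"] > 10_000:
        return "approval_queue"        # high-value orders need review
    elif event["region"] == "EU":
        return "eu_order_system"
    else:
        return "default_order_system"

# One bulk movement of many rows vs. one routed message:
batch = etl_batch_load([{"id": 1, "amount": 99.999}, {"id": 2, "amount": 12000.0}])
route = eai_route({"amount": 12000.0, "region": "US"})
```

The point of the sketch is that `etl_batch_load` knows nothing about individual events and `eai_route` knows nothing about bulk schedules; conflating the two is exactly the confusion the "real-time ETL" label creates.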

Whatever the marketing messages claim, users should not fall for the "real-time" label, because it is somewhat fallacious. Although ETL jobs may offer very quick turnaround, requirements for true real-time data access can be met only by querying the operational systems themselves, in which case enterprise information integration (e.g., IBM DB2 Information Integrator, BEA LiquidData) technologies should be considered.

The true value from real-time ETL (e.g., Ascential RTI Services, Business Objects Data Integration, Informatica PowerCenterRT, Embarcadero Delta Agent, Sunopsis Version 3), or SAS Institute’s more accurately worded near-real-time ETLQ, comes from being able to do bidirectional synchronization of systems and databases, as well as the reduction of full-blown and high-volume ETL jobs by using “changed data capture” capabilities to move only appropriate changes from the source to the target system. However, there is still a significant functional gap between moving data from system to system and defining business processes that may use the same data. Although they most likely have adapters to the same systems, ETL engines are not built to provide workflow-like threads across various applications with embedded “if then else” logic constructs that enable context-based routing of information.
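The "changed data capture" idea above can be illustrated with a minimal high-watermark sketch (table and column names are hypothetical): rather than re-extracting the full table, each run moves only rows modified since the last successful run and advances the watermark.

```python
def extract_changes(rows, last_watermark):
    """Return only rows changed after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["modified_at"] > last_watermark]
    new_watermark = max((r["modified_at"] for r in changed),
                        default=last_watermark)
    return changed, new_watermark

source_table = [
    {"id": 1, "modified_at": 100},   # unchanged since last run
    {"id": 2, "modified_at": 205},   # modified after watermark 200
    {"id": 3, "modified_at": 310},   # modified after watermark 200
]
delta, watermark = extract_changes(source_table, last_watermark=200)
# Only ids 2 and 3 are moved; the next run starts from watermark 310.
```

Real CDC implementations typically read database logs or triggers rather than a timestamp column, but the economics are the same: the delta is a fraction of the full-table extract.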

ETL definitions are mostly created by combining one or more data sources, applying transformations, and submitting the result to a single target, which is, in most cases, a data warehouse or data mart. By comparison, in the case of EAI, a business process typically originates in a single system (triggered by an event) and, during the process flow, touches various systems, each of which can be a target and a source.
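The structural difference described above can be made concrete with a short sketch (all system and field names are hypothetical): the ETL job fans several sources in toward one target, while the EAI process starts from a single triggering event and touches several systems in sequence.

```python
def etl_job(orders, customers):
    """Many sources -> transformation -> one target (e.g., a data mart)."""
    names_by_id = {c["id"]: c["name"] for c in customers}
    return [{"order": o["id"], "customer": names_by_id[o["cust"]]}
            for o in orders]

def eai_process(trigger_event):
    """One triggering event flows through several systems in turn."""
    touched = []
    for system in ("crm", "billing", "fulfillment"):
        touched.append(system)   # each system can act as both target and source
    return touched

mart_rows = etl_job([{"id": 7, "cust": 1}], [{"id": 1, "name": "Acme"}])
process_path = eai_process({"type": "new_order"})
```

In the ETL case the shape is a funnel ending at one warehouse table; in the EAI case it is a path through the application landscape, which is why the two are designed and tooled so differently.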

Despite all the differences between ETL and EAI, the latest market changes indicate a strong convergence of the two technologies. Ascential Software's acquisition of EAI vendor Mercator demonstrates the ETL and bulk-data integration camp's increased interest in broadening its "information transportation" capabilities into process integration. Another example of enlarging the footprint is Pervasive's acquisition of Data Junction. Pervasive now adds Data Junction's ETL and EAI functionality (claimed to originate from the same code base) to its data replication offering and is striving to become the data integration platform for ISVs, particularly in the midmarket.

Major challenges remain for vendors in standardizing ETL process definitions. Although EAI is broadly moving into standards-based environments and increasingly using Business Process Execution Language (BPEL), ETL vendors continue to use proprietary formats to describe process definitions. This makes a combined ETL/EAI offering from a single vendor both harder and less attractive for end users to leverage. Organizations should therefore be careful about investing in "to be integrated" platforms, even when provided by a single vendor, because doing so essentially means buying two separate product lines with limited cross-functional benefits. When ETL and EAI platforms eventually converge, business users will benefit from a unified data integration platform, which reduces the number of system adapters to maintain, streamlines the development of transformations, and increases the overall reliability and auditability of business rules through common metadata.

Far more complex for vendors planning an integrated offering is the introduction of life-cycle concepts into either EAI or ETL platforms. Although it is reasonably simple to create extractions, transformations, and even business processes as a one-off, managing changes in source or target systems that have an immediate impact on both ETL and EAI flows continues to challenge database administrators. Version-control concepts for extractions, transformations, loader schedules, and process definitions are still missing from vendor offerings, forcing user organizations to resort to third-party version-control systems and making it virtually impossible to implement an audit trail for changes in database schemata, extraction schedules, transformations, and business processes. Organizations should also not underestimate the ongoing lack of "diff/merge" functionality in both ETL and EAI environments, which would enable users to better manage multiple versions of individual ETL or EAI processes. Solutions that manage processes purely at the source level are not appropriate, particularly when visual development environments are used and changes must be visualized beyond the source level.

In addition, organizations should not forget that ETL and EAI end users are typically very different. Whereas ETL remains in the IT domain (it requires a deep understanding of database schemata, foreign-key relationships, and SQL), EAI is becoming more a task for the business analyst, who (after adapter customization and transformation plumbing provided by the IT department) uses components and logic elements to assemble business processes of which the IT organization generally has little knowledge.

Bottom Line: EAI and ETL continue to be separate markets undergoing mutual consolidation. Through expected vendor acquisitions, combined platform offerings with integrated EAI/ETL capabilities will slowly enter the market.

Business Impact: Organizations should continue following independent road maps for EAI and ETL until vendors offer single-stack platforms with unified user interfaces for replication, synchronization, ETL, and business process integration.
