Standalone data quality vendors: Evolution or extinction – Part 1

META Trend: Beginning in 2005, CFOs of Global 1000 companies will justify and fund data quality programs to meet compliance and corporate efficiency requirements. In addition, data quality programs will gain grassroots support as the business re-asserts data ownership by taking responsibility for data creation and maintenance. Virtual organizational structures will slowly emerge, consisting of new job roles such as data quality analysts and information stewards. Market consolidation will continue, with point solutions acquired by larger business intelligence platform vendors. By 2006/07, data quality processes will emerge from analytics as the standard for independent data quality services. Inter-organization data sharing (especially across corporations) and application development outsourcing will be the principal drivers of data quality services. By 2007, data profiling will actively contribute to data quality monitoring by tracking data distribution trends and identifying non-linear events. By 2008/09, mature data quality programs will be able to support dynamic source selection within service-oriented architecture.

The future of the data quality market is uncertain, as it is exhibiting signs similar to those of the short-lived data profiling market. The evaporation of the data profiling market raises the question, “What is the future of the standalone data quality vendor?”

Since 2002, the standalone data profiling market has dwindled, leaving only a few contenders. Yet demand for data profiling remains strong. ETL vendors such as Ascential, Informatica, Pervasive, Business Objects (Data Integrator), and SAS (DataFlux) all offer tightly integrated, though differing, levels of profiling. In addition, all these vendors either partner with a data quality vendor or have a data quality offering of their own. FirstLogic still stands out as a privately held company in the market.

Data quality is manifested in the corporate world through emerging data stewardship and governance roles, with the true context of data and data quality being derived from logical business process rules, not application processing rules.

Business context is key to implementing a successful data quality program. Harte-Hanks’ acquisition of Trillium and Pitney-Bowes’ acquisition of Group1 saw two established data quality vendors subsumed by companies offering larger business-oriented services that depend heavily on data quality. Harte-Hanks and Pitney-Bowes folded data quality into their account management, customer service, and top-line market offerings.

The implication of these data quality vendors succumbing as acquisition targets may be that the data quality market is moving toward extinction. It is more likely, however, that data quality tools are evolving to become the mechanism by which the business will resume the ownership of corporate data.

When the business community accepts the value proposition of any technology, it moves out of the realm of laboratory or pilot funding and becomes a fully funded business operation.

The business must have a true desire to resume ownership of data and information and must be ready to face punitive regulations (SOX, Basel II) that do not forgive poor management or permit alibis for deliberate malfeasance based on ignorance or information failures.

Of course, with any well-funded area of interest, existing vendors will face increased competition. The largest application and software vendors are already seeking this revenue.

Beginning in 2000 or so, ETL vendors repositioned themselves as data integration services vendors and have been pursuing the data quality market through homegrown or partnered offerings.

The data integration market is expected to exceed $8B in 2005 (and could reach $12B). With the ETL market growing organically at less than 8% per year, the data quality market could be key to continued ETL vendor growth.

ETL vendors currently partner with data quality vendors because their research and development dollars are committed to expanding their role under enterprise data integration services.

As ETL vendors face serious challenges to their data integration role from database and application server manufacturers, they cannot afford to divide their R&D dollars to include data quality offerings.

Notable exceptions to the partnering standard are well-funded SAS/DataFlux and Ascential Software, which are pursuing both the data integration services and data quality fronts. Even these tools can and do integrate with other data quality offerings. Group1 executed a reverse strategy and acquired an ETL tool when it bought Sagent in 2003.

The Harte-Hanks acquisition of Trillium paired a need for real-time data application processing with a solid real-time data quality solution.

An enterprise data quality offering will continue to receive funding from the business because it delivers value, maintains constituency interest, and answers the technical requirements of information systems. Such a solution remains independent of application-processing requirements, and instead follows business requirements and business-concept models.

Arguably, the standalone data quality vendor market is already on the path to extinction. Group1, Trillium, and DataFlux are all subsidiaries of larger companies. Smaller vendors such as Netrics, DataLever, and Similarity Systems are pursuing embedded partnerships, or have at their core a technical road map that follows the integration services framework approach.

By 2007/08, data quality will exist as prebuilt data services running in the data integration services layer. Demand for quality prebuilt services will continue to force spending of research and development funds. However, data integration vendors will either partner with quality experts (e.g., Informatica and Business Objects with FirstLogic) or attempt to replace them with a services development and deployment environment of their own.

Data integration services vendors pose a real threat to standalone data quality vendors. Integration services inherently seek to decouple data management and delivery from applications and repositories.

Data stewardship and governance will also be decoupled from applications and repositories. It is a natural progression for data integration services to enforce data quality rules at the time of data acquisition, or transport data according to enterprise rules, independent of application logic.
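The enforcement pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `Rule` class, the `ingest` function, and the sample rules are all invented for this sketch, not taken from any vendor's product): business-defined rules live in the integration layer and are applied to every record at acquisition time, independent of any application's processing logic.

```python
# Sketch: enforcing business-defined data quality rules at acquisition time,
# in the integration layer rather than inside application code.
# All names and rules here are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # a business rule, not application logic

# Rules derive from the business model and apply to every source equally.
RULES = [
    Rule("postal_code_present", lambda rec: bool(rec.get("postal_code"))),
    Rule("amount_non_negative", lambda rec: rec.get("amount", 0) >= 0),
]

def ingest(record: dict) -> tuple[bool, list[str]]:
    """Apply enterprise rules before the record reaches any repository."""
    failures = [r.name for r in RULES if not r.check(record)]
    return (not failures, failures)
```

A record that violates a rule is rejected with the names of the failed rules, so stewardship processes can act on it without consulting application code.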

Data only has value in context – it is the location of that context that must be resolved. Technology limitations originally forced data context into application design, and as those barriers are removed, so is the impetus to place that context into application logic.

Companies should ensure that data quality applications and services interoperate, either through APIs or at the command line, in a way that supports both passive batch and real-time data quality cleansing and enrichment services.
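The key architectural point here is that one cleansing routine should serve both modes. The following sketch (function names and cleansing steps are hypothetical, not any vendor's API) shows a single routine exposed as both a real-time call and a batch service:

```python
# Sketch: one cleansing routine shared by real-time and batch entry points,
# as the interoperability recommendation implies. Names are hypothetical.

def cleanse(record: dict) -> dict:
    """Apply standardization rules to a single record."""
    out = dict(record)
    if "name" in out:
        # Collapse whitespace and normalize capitalization.
        out["name"] = " ".join(out["name"].split()).title()
    if "country" in out:
        out["country"] = out["country"].strip().upper()
    return out

def cleanse_realtime(record: dict) -> dict:
    """Real-time entry point: one record per call (e.g., behind an API)."""
    return cleanse(record)

def cleanse_batch(records: list[dict]) -> list[dict]:
    """Passive batch entry point: the same rules applied to a whole file."""
    return [cleanse(r) for r in records]
```

Because both entry points call the same routine, batch and real-time results stay consistent, which is the practical payoff of the interoperability requirement.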

Data quality services should be operating-system and processing-language agnostic. Data quality tools will need to include graphical support for data relationship modeling and the ability to develop or derive rules from internal and external logical and physical data models.

Yet data quality vendors represent an experience base that is not easily replaced. Twenty-five plus years of data consistency rules, entry validation expertise, and post-acquisition cleaning will not vanish. Data quality programs in the enterprise will exist as modeled services.

A data quality vendor that combines logical and physical modeling to design and deploy data quality service objects and makes them available to enterprise processing plants will emerge as a market leader.

Some services will use commercially available databases for look-up and cross-referencing of data quality; others will train neural networks, using profiling and “seed” data. They will use logical model-based interfaces to portray data relationships among data attributes, entities, and subject areas.
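The look-up and cross-referencing style of service mentioned above can be illustrated simply. In this hypothetical sketch (the reference table and function are invented examples; a real service would consult a commercially maintained database), variant values are standardized against canonical reference entries:

```python
# Sketch: a look-up/cross-reference cleansing service that standardizes
# values against a reference table. Table contents are invented examples;
# a production service would draw on a commercial reference database.

REFERENCE = {  # canonical values keyed by common variants
    "ca": "Canada", "can": "Canada", "canada": "Canada",
    "us": "United States", "usa": "United States",
}

def standardize_country(value: str) -> str:
    """Return the canonical form, or the input unchanged if unknown."""
    return REFERENCE.get(value.strip().lower(), value)

standardize_country("USA")  # → "United States"
```

Unknown values pass through unchanged so that downstream profiling can flag them rather than silently losing data.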

Bottom Line: Organizations should pursue an enterprise data stewardship and governance center of excellence that focuses on business conceptual models to feed the business abstraction layer of the data quality services engine. Data quality program tools should read or produce logical business models that serve as the core of the design and development interface for data quality service objects.

Business Impact: Vendors that continue to focus on passive, post-acquisition data quality will retain a profitable market share for the present, and live happy but much shorter lives.
