IBM spills beans on Xperanto

XML (Extensible Markup Language) has been causing quite a splash in the database world, particularly in the last few weeks, and IBM Corp. is the latest vendor to detail plans for the standard.

In IBM’s research labs, the company is working on a project, code-named Xperanto, which will be a native XML database that acts as a subset of DB2, said Janet Perna, general manager of Armonk, N.Y.-based IBM’s data management solutions group. By using XML and relying on the XML query language XQL, Xperanto will be a critical piece of IBM’s long-term vision to marry structured and unstructured data.

“The value of this is it’s the next step beyond a federated database,” Perna said.

That step, Perna added, is information integration. IBM has application integration via its WebSphere products, business process integration from its recent CrossWorlds acquisition, and Xperanto acts as a dedicated server for data or information integration. “We have a new class of software that really is about information integration,” Perna said.

Nelson Mattos, a distinguished engineer and director of information integration at IBM’s Silicon Valley Labs, said that the customer pain point Xperanto is aimed at is how to tie together all the systems in an organization.

“Xperanto is the technology that allows customers to integrate all their systems,” Mattos said.

Mattos added that Xperanto will be the materialization of IBM's work on a number of Web services-related standards, including XQuery, XML Schema, UDDI (Universal Description, Discovery, and Integration), SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and WSFL (Web Services Flow Language).

The end goal of IBM's integration strategy is to combine structured and unstructured data, giving users access to a broader array of data sets within an organization, such as Office files. Organizations would then be able to access the content of Word files that reside on individual employees' desktop systems.

Both Microsoft and Oracle said they are working to enhance XML support in their databases and are pursuing the same goal of giving users more insight into all of the intelligence within an organization.

“The ability to search against XML data is going to be key,” said Peter Urban, an analyst at AMR Research in Boston.

Useful information also can be found in historically unorthodox data mining locales. Perna pointed to audio files with recorded conversations between customer service representatives and prospective customers as an example of data sources that potentially can be mined to glean nuggets of gold.

“The XML approach will provide the lingua franca for getting at various types of data; it is providing a sort of structure for unstructured data,” said Henry Morris, an analyst at Framingham, Mass.-based IDC. “The question is how much of this unstructured data is going to be in XML. It will be a small part relative to the total amount of unstructured data that is in a company.”

Within IBM's strategy, DB2 handles structured data, OLTP (online transaction processing), BI (business intelligence), and Web applications, while the Content Manager software takes care of unstructured information, such as rich media and flat files.

Perna said that the widespread adoption of XML has made the idea of combining structured and unstructured data come alive.

“Will the native XML database support replace the relational database? The answer is no,” Perna said. She added, however, that XML will work for certain applications. “Think of this as an application of the database,” she said.

IBM plans to post a pre-beta version of Xperanto technology to its developer Web site in the first half of next year, and the technology will be finalized toward the end of next year.

Perna said that IBM has yet to decide how Xperanto will be sold, but the options Big Blue is considering include offering it as a standalone product or folding it into either WebSphere or DB2.