Open source and free data

According to Charles Goldfarb, the difference between data processing and document processing is an artificial distinction that “on one hand made modern computing possible but on the other hand set it back a century at the same time.”

Goldfarb, a long-time IBM researcher and co-developer of SGML (Standard Generalized Markup Language), was a surprise guest speaker at Big Blue’s Solutions 2000 developer conference, held recently in Las Vegas.

In the days before computers, there was no difference between data processing and word processing, he said.

“It was the same technology. Get out a quill pen, a candle, a piece of vellum and you could rule the world.”

But now two worlds exist: the physical world, which includes humans and their presentations; and the computer world, consisting of virtual representations of information.

“Most people haven’t thought this kind of thing through – the kind of stuff I’m sharing with you is not known to the vast majority of XML practitioners. But I think it’s important to know, because it can change your whole approach,” he said.

“One of the amazing things about XML is that there really is no difference between data and documents. It frees your data from a hostage relationship with particular software because XML is a standardized data representation – it doesn’t matter what program you created it in. That is freedom indeed.”
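Goldfarb's point about escaping a "hostage relationship with particular software" can be illustrated with a minimal sketch: because XML is a standardized representation, any conforming parser can read a document no matter what program produced it. The `<invoice>` structure below is hypothetical, chosen purely for illustration.

```python
# A sketch of XML's software independence: the fragment below could
# have been written by any editor or program, and any conforming
# parser (here, Python's standard library) can read it back.
# The <invoice> element names are hypothetical examples.
import xml.etree.ElementTree as ET

doc = """<invoice>
  <customer>Acme Corp</customer>
  <total currency="USD">1250.00</total>
</invoice>"""

root = ET.fromstring(doc)
print(root.find("customer").text)          # Acme Corp
print(root.find("total").get("currency"))  # USD
```

The same bytes would parse identically in a Java, C, or browser-based XML parser; nothing about the data depends on its authoring tool.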

In most enterprises, the data is really, really important, he stressed.

“You spend a lot of money trying to acquire data: you manage it carefully in databases, dole it out to the right people in the right circumstances, send the data processing people to expensive conferences in Las Vegas. Nothing is too good for data and those who know how to manipulate it.”

Knowledge, however, is a different thing altogether. “That isn’t a corporate asset – that’s just stuff that happens to be in the heads of the people who work for you. They keep it written down in documents…stored in word processing files or some other place where very few people can get at it. And there’s little systematic connection between the knowledge and the data, even though they are intimately related.”

When we talk about data vs. documents, what we really mean is abstractions vs. renditions, he said. And capturing ideas as they exist in the human mind is a “fundamentally unnatural act.”

Ideas don’t have typefaces, or page layouts – when you first conceive an idea, you don’t see how to render it, “but with SGML and XML, we can capture the abstractions. That’s what’s new and special here,” he said.
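The abstraction-versus-rendition distinction can be sketched in a few lines: semantic markup names the idea, and any number of renditions are derived from it afterward. The `<warning>` element and both renditions below are hypothetical examples, not anything from the conference.

```python
# A sketch of abstraction vs. rendition: the XML captures the idea
# (a warning), with no typeface or layout attached. Renditions are
# computed from the abstraction, not stored in it.
# The <warning> element name is a hypothetical example.
import xml.etree.ElementTree as ET

abstract = ET.fromstring("<warning>Do not remove cover</warning>")

# Two different renditions of the same abstraction:
html = f"<b><i>{abstract.text}</i></b>"      # styled, for a browser
plain = f"WARNING: {abstract.text.upper()}"  # unstyled, for a terminal

print(html)   # <b><i>Do not remove cover</i></b>
print(plain)  # WARNING: DO NOT REMOVE COVER
```

Reversing the direction is what is hard: recovering the abstraction "this is a warning" from bold italics requires human judgment, which is why capturing the abstraction first, as SGML and XML do, is the part Goldfarb calls new.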

“But like any good revolution it’s got enemies – and there are things you’ve got to look out for.” One of those is the erosion of the standards process.

“‘Standards’ has become a marketing term. Software, for instance, claims to be standards-compliant, but is it?” Often products include proprietary enhancements, which, if used, break the standard.

And many software vendors don’t like standards, he warned. “They always embrace standards because the market insists. But the line between embracing and smothering is very thin.”

XML will be a key driver of content and searches on the Internet of the future, said John Patrick, IBM’s vice-president of Internet technology. Already, the Web has caused a “massive transfer from institutions to people” as individuals now have greater control of their transactions, he said. And, in the near future, speed and size of applications will not be as much of an issue, because we will be “awash in bandwidth.”

Irving Wladawsky-Berger, vice-president of technology and strategy for IBM’s enterprise systems group, said although the Internet has evolved quite rapidly, going on-line still “feels like a drive in the country where you have to dodge sheep and cows, and every so often something happens…and you have to reboot the damn car.” But in spite of all that, “the applications promised at the end of the wire are so compelling that we put up with it all.”

According to Patrick, Linux is “the next big thing” in the computing industry. “In my 34 years at IBM, I have witnessed three major technological events: the first was the PC in 1981, the second was TCP/IP in 1991, and the third, in 1999, was Linux.”

At the conference, IBM announced a partnership with Red Hat to distribute bundled IBM, Lotus, Tivoli and Red Hat software to customers.

“This is the first time that a commercial Linux distributor will bundle all of IBM’s Linux-enabled software into a business solution,” Wladawsky-Berger said.

“The reason we are so excited about Linux is that we really believe that Linux can do for applications what the Internet did for mail.”