There’s an alphabet soup of tools with which to create and share information. Here’s how to make sense of them all

A dictionary of markup languages

Sharing information is what the information age is all about – but sharing is rarely simple. Electronic links with partners, customers and suppliers demand ways of defining information so that applications can understand the data they share.

This has led to a proliferation of markup languages. The most familiar is HTML, the basis of the Web, which virtually every computer user encounters whether they recognize it or not. Any business not using Extensible Markup Language (XML) today probably will be soon. And there are many others with more specialized purposes.

IT professionals need at least a nodding acquaintance with the major markup languages, so here is a quick guide:

Standard Generalized Markup Language (SGML) is the granddaddy of markup languages. Based on a language developed at IBM in the 1960s, it was originally intended to simplify the sharing of electronic documents in large projects, by marking up or “tagging” elements such as body text and headlines in a generic way. One of SGML’s early uses was in publishing the second edition of the Oxford English Dictionary.

SGML is covered by ISO 8879, a standard from the International Organization for Standardization (ISO) approved in 1986.

SGML is still in use, says Tommie Usdin, president of Mulberry Technologies, Inc., a Rockville, Md., firm specializing in SGML and XML. But “I haven’t seen a new SGML application probably in five years, because there are so many cheaper and better XML tools than there are SGML tools.”

Hypertext Markup Language (HTML) has been the lingua franca of the World Wide Web since its inception. Tim Berners-Lee, the Web’s creator, developed it from SGML to provide a standard way of defining web pages. The World Wide Web Consortium (W3C) has maintained HTML specifications since 1996, and in 2000 it was standardized as ISO 15445.

HTML is still widely used in Web pages, but its dominance is gradually being eaten away by the next generation, particularly XML.

Extensible Markup Language (XML) is also derived from SGML, but is more powerful than HTML. While HTML has a fixed set of tags, XML lets you define new tags, so it is really a basis for creating special-purpose markup languages.

Tags in XML can be used not just to tell a browser how to display information, but to identify the information for other purposes. For instance, an XML tag might identify certain information on a web site as the price of an item offered for sale.
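To make the price example concrete, here is a minimal sketch in Python, assuming an invented catalogue vocabulary: the tag names <catalog>, <item>, <name> and <price>, and the sku and currency attributes, are hypothetical and chosen for illustration, not taken from any standard. Because the tags say what the data is rather than how it should look, a program can pull out the prices directly with the standard library's xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A hypothetical product listing. XML itself defines none of these tags;
# the vocabulary (<catalog>, <item>, <name>, <price>) is invented here.
LISTING = """<catalog>
  <item sku="A-100">
    <name>Desk lamp</name>
    <price currency="CAD">39.95</price>
  </item>
  <item sku="B-200">
    <name>Office chair</name>
    <price currency="CAD">149.00</price>
  </item>
</catalog>
"""


def prices_by_sku(document: str) -> dict[str, float]:
    """Return {sku: price} for every <item> that carries a <price>."""
    root = ET.fromstring(document)
    result = {}
    for item in root.iter("item"):
        price = item.find("price")
        if price is not None and price.text:
            result[item.get("sku")] = float(price.text)
    return result


print(prices_by_sku(LISTING))  # {'A-100': 39.95, 'B-200': 149.0}
```

Another application could read the same file for the product names alone; the tags identify the information once, and each program uses the parts it needs.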

XML 1.0 became a W3C recommendation in 1998. It is now in its fourth edition and is still widely used. XML 1.1 got W3C recommendation status in 2004 and is now in its second edition, but is less used than XML 1.0.

“These days there’s XML under the covers practically everywhere,” Usdin says. Many industries have defined specialized XML vocabularies. Among the most widely used are the Universal Business Language (UBL) for exchanging common business documents like invoices, and DocBook for technical documentation, Usdin says.

Extensible Hypertext Markup Language (XHTML) is what the W3C intends as the successor to HTML. Where HTML is based on SGML, XHTML is based on XML. XHTML 1.1, the current version, differs from HTML mostly in enforcing stricter rules of structure, which should make browsers more efficient by saving them the trouble of deciphering poorly formed HTML files.
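The practical difference lies in well-formedness. The minimal sketch below uses Python's lenient html.parser module and strict xml.etree.ElementTree module as stand-ins for an HTML-style and an XML-style parser (neither is a browser) to show the same content accepted in one form and rejected in the other. The strict fragment is only well-formed; a real XHTML document would also carry a DOCTYPE and the XHTML namespace, omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML lets authors omit the end tags for <p> and <li>, so browsers
# accept this markup even though its structure must be inferred.
SLOPPY = "<html><body><p>First paragraph<ul><li>One<li>Two</ul></body></html>"

# The same content under XML-style rules: every element is explicitly
# closed and properly nested.
STRICT = ("<html><body><p>First paragraph</p>"
          "<ul><li>One</li><li>Two</li></ul></body></html>")


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not object to missing end tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


lenient = TagCollector()
lenient.feed(SLOPPY)
print(lenient.tags)           # ['html', 'body', 'p', 'ul', 'li', 'li']
print(parses_as_xml(SLOPPY))  # False: </ul> arrives while <li> is still open
print(parses_as_xml(STRICT))  # True
```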

XHTML 1.1 became a W3C recommendation in 2001. The W3C is working on a draft of XHTML 2.0, which will incorporate new features such as revised ways of handling forms and frames, and won’t be backward-compatible with previous versions.

XHTML adoption has been slow so far. It is primarily used to create versions of XML documents for display on the web, Usdin says, and is of little interest to most end users.

Resource Description Framework (RDF) and Web Ontology Language (OWL) are key elements in the W3C’s vision of the semantic web. They make it possible to define sets of terms for specific purposes like accounting or medicine.

RDF is meant for describing abstract data relationships, explains Ivan Herman, the Amsterdam-based leader of the W3C’s semantic web work. It can be used to create simple ontologies, or data models, but for more complex models the W3C has also defined OWL, which relies more heavily on formal logic, Herman says.
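RDF expresses data as statements of the form subject, predicate, object, known as triples. The following minimal sketch models that idea in plain Python rather than with an RDF library; the example.org IRIs and the invoice vocabulary are hypothetical, and the output only imitates the N-Triples line format (literal values are not escaped).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    literal: bool = False  # True when obj is a plain value, not an IRI


EX = "http://example.org/"  # hypothetical namespace for this sketch

graph = {
    Triple(EX + "invoice42", EX + "issuedBy", EX + "acmeCorp"),
    Triple(EX + "invoice42", EX + "totalAmount", "1250.00", literal=True),
    Triple(EX + "acmeCorp", EX + "locatedIn", EX + "Toronto"),
}


def query(graph, s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t.subject == s)
                  and (p is None or t.predicate == p)
                  and (o is None or t.obj == o))


def to_ntriples(t: Triple) -> str:
    """Render one triple N-Triples style: IRIs in <>, literals quoted."""
    obj = f'"{t.obj}"' if t.literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Everything the graph says about invoice42.
for t in query(graph, s=EX + "invoice42"):
    print(to_ntriples(t))
```

The wildcard lookup mirrors, in miniature, the pattern matching that RDF query languages such as SPARQL perform over much larger graphs.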

Both RDF and OWL are published W3C standards; RDF was published as a recommendation in 1999 and OWL in 2004.

Topic maps are another way of defining data models, and Usdin says they could be seen as competing with RDF and OWL. Topic maps grew out of efforts in the early 1990s to exchange documentation among several makers of workstations supporting the X Window standard, explains Steve Newcomb, a consultant at Coolheads Consulting in Blacksburg, Va., and a veteran of the topic maps committee.

Topic maps are used to organize many kinds of information – sessions at a recent conference on topic maps discussed applications ranging from aircraft maintenance documentation to libraries and intelligence work.

ISO standard 13250, formalized in 2003, covers topic maps. Work is under way on standardizing Topic Map Query Language (TMQL), a tool for retrieving data from topic map databases.

VoiceXML and Call Control XML (CCXML) are designed for programming interactive voice response systems and private branch exchanges (PBXs). Both are derived from XML.

Ken Rehor, chair of the VoiceXML Forum’s conformance committee, says VoiceXML is to voice applications roughly what HTML is to web pages. It uses tags to represent the structure of a dialogue that may include prompts, voice commands, touch-tone responses and other actions. A customer calling a telephone banking line to retrieve a balance, for instance, might be using a VoiceXML application.
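As an illustration of how tags capture the structure of a dialogue, here is a minimal sketch of the telephone-banking example as a VoiceXML 2.0 document, held in a Python string and inspected with the standard library. The form, the grammar file digits.grxml and the submit URL are hypothetical placeholders; the element names (vxml, form, block, field, prompt, grammar, filled, submit) come from VoiceXML itself. The Python code lists the prompts in document order, which is not necessarily the order a voice browser would play them at run time.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical balance-inquiry dialogue: a greeting, a field that collects
# an account number by touch-tone, and a submission once it is filled.
BALANCE_DIALOGUE = f"""<vxml version="2.0" xmlns="{VXML_NS}">
  <form id="balance">
    <block>
      <prompt>Welcome to telephone banking.</prompt>
    </block>
    <field name="account">
      <prompt>Please key in your account number.</prompt>
      <grammar mode="dtmf" src="digits.grxml"/>
      <filled>
        <prompt>Retrieving your balance.</prompt>
        <submit next="http://bank.example/balance" namelist="account"/>
      </filled>
    </field>
  </form>
</vxml>
"""


def prompts_in_order(document: str) -> list[str]:
    """List the text of every <prompt>, in document order."""
    root = ET.fromstring(document)
    # ElementTree names namespaced tags as "{namespace-uri}localname".
    return ["".join(p.itertext()).strip()
            for p in root.iter(f"{{{VXML_NS}}}prompt")]


for text in prompts_in_order(BALANCE_DIALOGUE):
    print(text)
```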

VoiceXML 2.1 is nearing approval as a W3C recommendation, Rehor says. Work is under way on VoiceXML 3.0, which will be more modular and will include support for voice biometrics, but its timetable remains uncertain.

CCXML is designed to manage telephone connections in a PBX or call centre. Its capabilities include rerouting calls, adding parties to a call and so forth, Rehor says. CCXML is still a working draft, with a candidate W3C recommendation expected soon.
