Several big problems with the Web today restrict the flow of information. The resulting roadblocks to Internet-enabled commerce cost businesses money and cause headaches for you and me.
As the poster-boy for laziness, I’m a true fan of online shopping. But having broken a sweat recently while trying to put together a friend’s gift from multiple Web sites, I’m reconsidering the online-offline energy equation.
My goal was to buy a digital camera, a case roughly matching the dimensions of the camera with a couple of extra compartments for other gadgets, and a book on digital photography that would hopefully, even remotely, relate to the features of the camera. After a lot of time and trial and error through countless searches on multiple sites, the gifts were bought. Frustration arose, though, from the amount of effort required to manually piece together so much electronic information.
If all this information is electronic, what is restricting its flow so profoundly?
First and foremost, the collection of open standards on which the Web is built is made up of either network protocols (namely HTTP and TCP/IP) or languages for information display (namely HTML). The Web wasn’t built for describing and sharing information, just for shuffling it around.
In fact, the Web in the mid-’90s started off as just a bunch of online brochures where information was trapped within hand-coded HTML pages. Then came technologies such as JSP, ASP and PHP for generating dynamic pages for a better user experience. Though labelled as dynamic, these pages are mostly fixed templates that expose only a limited slice of information at any one time.
But thankfully, XML came to the rescue, allowing information to be described for better searching, reuse and repurposing, right? Not quite. XML has found its triumph as yet another layer in the networking stack, through SOAP for example, while its information management capabilities have had very limited traction.
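To make the idea of "described" information concrete, here is a minimal sketch of how my camera-case hunt might have gone if retailers published product dimensions as XML. The catalogue, its element and attribute names, and the helper functions are all invented for illustration; they follow no published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical product descriptions: element and attribute names are
# assumptions made up for this sketch, not a real retailer format.
CATALOG = """
<catalog>
  <camera id="cam-1" width-mm="110" height-mm="65" depth-mm="40"/>
  <case id="case-a" width-mm="100" height-mm="60" depth-mm="35"/>
  <case id="case-b" width-mm="130" height-mm="80" depth-mm="55"/>
</catalog>
"""

DIMENSIONS = ("width-mm", "height-mm", "depth-mm")


def size(element):
    """Read the three dimension attributes of a product as integers."""
    return tuple(int(element.get(name)) for name in DIMENSIONS)


def cases_that_fit(catalog_xml, camera_id):
    """Return ids of cases at least as large as the camera in every dimension."""
    root = ET.fromstring(catalog_xml)
    camera = next(c for c in root.iter("camera") if c.get("id") == camera_id)
    camera_size = size(camera)
    return [
        case.get("id")
        for case in root.iter("case")
        if all(inner <= outer for inner, outer in zip(camera_size, size(case)))
    ]


print(cases_that_fit(CATALOG, "cam-1"))
```

Because the dimensions are labelled rather than buried in page layout, a program can do the matching that I had to do by hand.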
Another part of the problem is that Web searching today is broken. It’s certainly far better now than the first attempts by engines such as Yahoo to stuff billions of bytes of Web content into categories reminiscent of a library catalogue system. Library catalogues are a poor way of enabling information flow on the Web.
Google’s got it about half right. On the one hand, they’re not imposing artificial categorization (also known as ontologies) over the Web but instead using page-ranking algorithms that measure popularity from factors such as links back to a page. But on the other hand, their search lacks the ability to draw connections between pieces of information, so it takes several steps to get from “Google Search” to the desired result.
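As a rough illustration of measuring popularity from links back to a page, the sketch below simply counts inbound links. It is a deliberate simplification and not Google's actual algorithm; the page names and the link graph are made up.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def inbound_link_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # ignore repeated links from one page
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


def rank_by_popularity(links):
    """Order pages by inbound links, most linked first, ties broken by name."""
    counts = inbound_link_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


print(rank_by_popularity(LINKS))
```

Note what the ranking does not capture: it says which page is popular, but nothing about how the camera on one page relates to the case on another.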
One possible solution that seemed promising at the outset was the effort by Tim Berners-Lee and the W3C (famous for specifications such as HTML and XML) to put together specifications and how-tos for moving from the siloed information of today’s Web to one in which information is richly connected, known as the Semantic Web. An example (with a working demo at http://onto.stanford.edu:8080/wino/index.jsp) shows how ontologies (hierarchies that express relationships among things) are created and used to help select specific wines to go with particular meals. Although I don’t know a thing about wines myself, the search agent came up with what seemed like a good suggestion for my meals of choice.
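For a feel of how a hierarchy of relationships can drive a suggestion, here is a toy sketch. It is not the code or data behind the demo above; the categories, the pairings and the function names are assumptions chosen only to show the mechanism.

```python
# Hypothetical "is a" hierarchy: each term points to its broader category.
IS_A = {
    "salmon": "fish",
    "fish": "seafood",
    "seafood": "meal",
    "steak": "red meat",
    "red meat": "meal",
}

# Hypothetical pairings attached to categories, not to individual dishes.
PAIRS_WITH = {
    "seafood": "dry white wine",
    "red meat": "full-bodied red wine",
}


def ancestors(term):
    """Return the term followed by each broader category above it."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def suggest_wine(dish):
    """Walk up the hierarchy until some category has a known pairing."""
    for category in ancestors(dish):
        if category in PAIRS_WITH:
            return PAIRS_WITH[category]
    return None  # nothing in the ontology covers this dish


print(suggest_wine("salmon"))
print(suggest_wine("tofu"))
```

The suggestion for salmon comes from a rule written about seafood in general, which is the appeal of the approach. The dish that is missing from the hierarchy gets no answer at all, which is its weakness.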
But the lofty goal of a Web richly defined by semantics requires that ontologies be created for myriad subjects of interest. Unless Web search narrows into separate portals, each providing an entry point for a particular subject, this model won’t fly. This is because the average Web developer or other content creator is not going to spend the time to accurately categorize and describe continuously changing information.
Though challenging, technical issues are addressable through innovations (such as RSS, Wikis and natural language processing), authoring tools, best practices and software vendor support. The larger hurdle to information flow is socio-economic, as companies need to transition from viewing the Web as an ego-system where information is hoarded to an ecosystem where automation beats frustration.
–Senf is the manager of IDC Canada’s IT business enablement advisory service. He can be reached at email@example.com.