How Big Is the World?

An undergraduate student told me last year that “if it was not on the Web then it did not exist.” The “it” she was talking about was research material.

She had a very important point (not to mention the implications a statement like that has at a research university such as Harvard, which has a breathtaking variety of available resources in its libraries and museums). Most people, and most students are people, are beginning to act as if the Web is the world’s only real data source. This is more than a bit troubling on many fronts.

The Web is now big enough to pass, at first glance, for a surrogate for the world. The Online Computer Library Center (OCLC) recently published its annual research results. OCLC projects that there are some 3.6 million Web sites (+/- a 3 per cent fudge factor) with 288 million Web pages (+/- 35 per cent). The centre classifies only 42,000 of those sites (+/- 30 per cent) as adult sites, though those sites sure do raise a political ruckus far in excess of their numbers.

OCLC has quite a good methodology, which is well explained in a document reachable from the centre’s site. So you should feel comfortable trusting the centre’s numbers as a first approximation.

There is clearly a lot of stuff out there. But what are the characteristics of what is there and what is not there?

One of the biggest problems with the ’Net is knowing the qualifications of those people creating and posting information. A particular document could have come from a future Nobel Prize winner writing in his field or it could have originated from a demented teenager spewing out fantasies. Unquestioning reliance on what you read on the ’Net is as productive as unquestioning reliance on what you read in a supermarket checkout line.

Another significant problem with using the ’Net as a primary or sole source of information is that the ’Net is woefully incomplete. Very little current information is actually on-line. Some areas are far better represented than others, with the national newspapers and some areas of scientific research leading the way.

But there is a real dearth of material from most areas. This is largely because most people like to get paid for their labours. The Web currently offers mostly no-cost access to information. People with valuable content, such as most printed books, tend to avoid putting it up lest they reduce sales of their content. Out-of-print books might seem a good target for ’Net-based access, but copyright laws get in the way.

You’re missing a lot if your world is just the Web.

Bradner is a contributing editor to Network World (US).