Lost in cyberspace

Librarians and archivists have been saving artifacts, newspapers, photographs and books for years to preserve historical records. Today, their work is made even more complicated because so much of our unfolding history is chronicled on the Internet.

That has meant changes to archiving procedures, including decisions about what electronic information should be saved and how to store it.

According to Elizabeth Adkins, global information manager at Ford Motor Co., “There is a general recognition in the archival profession that people want to do this, but technology hasn’t caught up yet.”

These issues also affect businesses, Adkins and others pointed out. But in many cases, they said, companies haven't been paying close attention to their own digital histories. Although large, established companies have for years saved much of their past on paper, it's unclear whether they have been as thorough in preserving their first forays onto the Web.

“There’s a whole lot that’s come and gone,” said Carol Baroudi, owner of research firm Baroudi and Associates in Arlington, Mass.

Some companies have made progress. Amy Fischer, corporate archivist at Procter & Gamble Co., said the Cincinnati-based consumer products maker has been saving digital records since it started its first Web site in 1994.

But there has been no archiving road map, she said, and much of the work has been done by trial and error. "There's been a lot of hand-wringing in the business archive community," Fischer said.

Procter & Gamble, established in 1837 as a small, family-operated soap and candle business, has for years collected and archived its printed ads for products ranging from Ivory soap to Pepto-Bismol. “It makes sense that we save the electronic stuff, too,” Fischer said.

The company saved its earliest Web sites only on paper printouts because no one knew how to save them digitally at the time, she said. Now all of the company’s Web and intranet sites are archived electronically to maintain a link to the company’s past.

“Not everyone is doing it,” Fischer said of other companies. “But by now, most people are aware that they need to be doing it.”

Adkins said Ford is getting there. Several years ago, the company began looking at what to do about its digital history. Next year, it will launch a formal archive using Portable Document Format (PDF) files created with Adobe Acrobat software, she said.

Ford’s first Web site was launched in 1995, but neither that version nor the updates that followed were ever officially saved, she said.

“The Internet is a tool that’s designed for moving a business forward, and the people involved don’t necessarily think of the ground they’re breaking,” Adkins said. “People are more caught up in getting the job done than about preserving their efforts.”

Last month, the Library of Congress announced it had begun collecting a massive digital record comprising gigabytes of data culled from thousands of news, personal and tribute Web sites related to the Sept. 11 terrorist attacks on the U.S. The project began when a library official realized that if the electronic history of the unfolding events wasn’t carefully preserved, it would probably be lost forever.

The official, Diane Kresh, director of public service collections at the Washington, D.C.-based library, said the agency entered into a US$100,000 contract with the nonprofit Internet Archive, which has been saving history digitally on its network of servers since 1996. Under the deal, Internet Archive continues to make daily digital captures of Web sites related to the disaster. Internet Archive did a similar project for the Library of Congress last year, creating and maintaining a digital history of the tumultuous U.S. presidential election that included news Web sites and the original campaign sites for both George W. Bush and Al Gore.

Brewster Kahle, CEO of San Francisco-based Internet Archive, said his group has been saving Web pages from approximately 1,900 sites for the Sept. 11 project. The organization's mission is to build a digital library of the Internet, which it is doing with several partner groups. The saved pages are accessed through software called the Wayback Machine, which lets users go back to sites and see them as they appeared when they were captured.

Internet Archive has about 100TB of storage space in total and has so far saved 5TB of information in the Sept. 11 archive, which can be viewed at http://september11.archive.org.

The goal is to save “material that is here today and gone tomorrow,” Kresh said. “This was an event that had a profound influence,” she said. “The Internet has become the public commons and has connected people in ways that were unimaginable.”