Cheap and plentiful data storage can be both a blessing and a curse. It creates the illusion that there is infinite space available for data on networks. This intensifies human pack-rat tendencies to hoard data instead of evaluating what’s really needed, so much so that demand for storage is growing by 50 per cent annually in some areas, far outstripping the rate of storage cost reductions.
All this accumulating data has impacts far beyond storage costs, cascading into all aspects of IT operations and management.
“That’s the trivial part of the problem,” says Roy Wiseman, CIO for the Region of Peel, pointing out there would still be a big problem even if petabytes cost pennies. Time, resources and energy are being consumed to house, back up, archive, secure and manage vast quantities of junk data along with the good. “This is unsustainable,” he says.
Waste is not the only issue. Public sector CIOs are expected to respond instantly to requests for information contained in their systems, says Wiseman.
“If someone says, I need you to produce any document in your system that refers to X, there will be billions of files to be searched to find it: e-mails, drafts, duplicates, multiple versions, all considered corporate records,” he says. “The information management component is fairly scary relative to what people are expected to do.”
Digital preservation for eternity
Hoarding data at exponentially increasing rates is a universal problem across all sectors. But what makes the problem trickier in the public sector is the fiduciary responsibility to preserve public records for very long periods of time, says John Webster, storage analyst at Nashua, N.H.-based Illuminata Inc.
“Thirty years is a long time in the private sector,” he says. Not so for government records, and some types such as legislative documents need to be preserved for centuries.
Electronic storage media are constantly evolving and being replaced with new forms. But most are magnetic-based, and magnetic media degrade after a few decades. While most public entities have a migration strategy to shift critical documents from obsolete media to the next incarnation, no one has figured out how to deal with software obsolescence, says Webster.
“If you want to read data back in the future, will there be software that understands it? The CIO of Massachusetts says this is his biggest issue.”
Wiseman agrees the hardware component of digital preservation is well understood, but not the software issue. For example, documents created in defunct software such as WordStar can’t be read by other programs today. “So you may have the information stored, but is it accessible? We haven’t come to grips with maintaining the capability to read it over time.”
Another tricky aspect is the need for public disclosure, and to be in a position to respond to any request for information, says Webster. “In the public sector, the attitude is, ‘You’ll never know what you’ll be asked for, so you may as well save it all.’”
The move to make government services available online has also increased the volume of transactional data that could be subject to public access, he adds. In the private sector, data can be deleted within reasonable timeframes if deletion policies are developed that can be legally supported, he says. “But I’m not sure they have that luxury in government.”
Tiered storage is only a stop-gap measure, adds Webster. Moving less frequently accessed data from servers to cheaper secondary storage media does reduce costs.
“It may get you bigger bang for your storage buck, but if you’re not figuring out what you can delete in that process, then all you’re doing is creating a great big back-end that keeps growing. And that’s getting bothersome not just from a technology and management standpoint, but also energy consumption,” he says, pointing out data centres everywhere are scrambling to keep up with space, cooling and power requirements as more and more computer gear gets added.
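Webster's point about tiering can be illustrated with a minimal Python sketch. The 90-day threshold, the file names and the function names are assumptions made for illustration, not anything described by the people quoted here. The sketch sorts files into primary or secondary storage by last access date and, notably, deletes nothing.

```python
from datetime import date

def assign_tier(last_accessed, today, hot_days=90):
    """Return 'primary' for recently used files, 'secondary' otherwise.

    hot_days is a hypothetical threshold chosen for illustration.
    """
    age_days = (today - last_accessed).days
    return "primary" if age_days <= hot_days else "secondary"

def plan_tiers(files, today, hot_days=90):
    """Group file names by the tier they would be moved to."""
    plan = {"primary": [], "secondary": []}
    for name, last_accessed in files.items():
        plan[assign_tier(last_accessed, today, hot_days)].append(name)
    return plan

# Hypothetical inventory: file name -> date last accessed.
files = {
    "budget.xls": date(2007, 12, 15),
    "old_memo.doc": date(2005, 3, 1),
}
plan = plan_tiers(files, today=date(2008, 1, 1))
```

Every file in the inventory still ends up in one tier or the other, so total volume is unchanged; only the cost per file drops. That is the growing back end Webster describes.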
The limits of technology
Paper records had an in-your-face visibility that imposed discipline. “There was limited room in the warehouse, and as new boxes arrived, you had to do something with the old ones,” says Wiseman, pointing out that paper enforced retention schedules. But that physical requirement is gone with digital records.
“You now have all this invisible information accumulating.”
Lack of process discipline is at the core of the problem, says Rose Langhout, head of I&IT strategy, policy and planning at the Ontario Ministry of Government Services (MGS). “We bought into a whole bunch of technology and assumed it would take care of things,” says Langhout.
Technology exists to solve the five per cent of the problem that it can, adds Wiseman, but 95 per cent of it revolves around human process and policy issues. Even software that tackles some components can get tricky if people don’t look beyond costs to management issues.
For example, network analysis tools routinely find hundreds of copies of the same file, notes Wiseman.
“De-duplication software exists that can eliminate multiple versions of files and have them all point to a common storage place in a way that’s invisible to the user. There will probably be a significant investment in these tools as an alternative to storage in the future, but it may actually be cheaper to buy storage.”
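The idea behind de-duplication can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works: it assumes files are held in memory, treats only byte-for-byte identical copies as duplicates, and uses hypothetical function names. Each distinct piece of content is stored once under its SHA-256 hash, and every file name simply points to that hash.

```python
import hashlib

def deduplicate(files):
    """Store each distinct content once and point every file name at it.

    files maps a file name to its content as bytes.
    Returns (store, index): store maps content hash -> bytes,
    index maps file name -> content hash.
    """
    store = {}
    index = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy
        index[name] = digest            # the file becomes a pointer
    return store, index

def read_file(store, index, name):
    """Read a file through its pointer, as a user would see it."""
    return store[index[name]]

# Hypothetical example: two identical copies and one distinct file.
files = {
    "budget.doc": b"2008 budget draft",
    "Copy of budget.doc": b"2008 budget draft",
    "minutes.doc": b"council minutes",
}
store, index = deduplicate(files)
```

Three file names remain visible to users, but only two pieces of content are stored. Near-identical drafts and versions would need more elaborate matching than this exact-hash comparison.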
There are also IT policy issues to settle around formal and informal documents, he says. For formal digital documents that have equivalents in the paper sphere, there are established retention policies. But e-mails, voicemails, instant messages: these new informal communications are in a gray zone today.
“At a simplistic level, you can say they’re not corporate records. However, legal cases are increasingly finding that e-mails in fact are, and cases are being won and lost based on e-mails sent years ago.” Voicemail may well become future legal fodder as storage costs drop further, suggests Wiseman.
“Few voicemail messages are stored today, but they have exactly the same characteristics as e-mail, so why are expectations different?”
Many telephony networks enforce discipline by automatically deleting voicemails after a period of time, and Wiseman says some government organizations are doing the same with e-mails. Their policy is to view e-mail as equivalent to hallway conversations, not corporate records, and to delete them periodically.
“So if it’s something that should be retained, it’s the recipient’s responsibility to ensure the e-mail is moved out of the e-mail system, which is not considered a corporate repository. But only a small number of jurisdictions have done this.”
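A periodic purge of this kind is simple to express. The Python sketch below is an assumption-laden illustration: the 90-day limit, the message fields and the function name are hypothetical, and a real mail system would apply such a rule through its own administration tools. Messages older than the limit are dropped from the mailbox, and anything already filed in a separate repository is untouched.

```python
from datetime import date, timedelta

def purge_mailbox(mailbox, today, max_age_days=90):
    """Return the messages that survive a periodic purge.

    max_age_days is a hypothetical retention limit for illustration.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [msg for msg in mailbox if msg["received"] >= cutoff]

# Hypothetical mailbox and corporate repository.
mailbox = [
    {"subject": "Lunch?", "received": date(2007, 1, 10)},
    {"subject": "Contract approval", "received": date(2007, 2, 1)},
    {"subject": "Meeting moved", "received": date(2008, 5, 20)},
]
# The recipient files the one message that is a genuine record.
repository = [mailbox[1]]

remaining = purge_mailbox(mailbox, today=date(2008, 6, 1))
```

After the purge only the recent message remains in the mailbox, while the contract approval survives because the recipient moved a copy into the repository beforehand.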
Another emerging area of concern is tracking information made available on government Web sites. “What is the information provided to the public through our Web site and what did it say at a specific point in time in the past? There are some new information requirements coming out of this,” he says.
For example, if a Web site that informs the public about road conditions reports they’re good on a particular day and someone then has an accident, IT staff may later need to reproduce the Web site’s contents as they appeared on that day.
Many government organizations are implementing content management systems to facilitate information management both offline and online, but these don’t obviate the need to hammer out policies and procedures around retention schedules, access permissions, version control, and so on, says Wiseman.
And IT has no magical ability to make people follow rules. “If I want to impose discipline on my organization around how information is stored, the technology for that is available. But getting people to use it is ultimately what it’s about,” he says.
“Even with a well articulated storage framework, it doesn’t mean people will store it that way. Having the digital equivalent of the Dewey Decimal System won’t ensure the books will be on the right shelf.”
Nor can IT make people share resources. Server virtualization and consolidation are good ways to manage storage resources more efficiently, says Wayne Jensen, executive director of Workplace Hosting Services, the shared services provider for the B.C. government. But some departments are reluctant to share servers with others, he says.
“Some customers still have a sense of ownership of assets that were transferred to Shared Services B.C. and in holding on to this ownership they are not concerned with the broader perspective of resource sharing across government. There are cultural problems around virtualization that need to be overcome.”
Information management practices in government are inconsistent, says Langhout. There are few issues around documents known to be destined for the archives. “Legislative documents, cabinet submissions: in areas where we know documents are required forever, there is a higher level of practice.”
In other areas, incentives are in place to inspire good practice. “Program records are also well kept because we know there are program audits,” she says. But there are issues in other areas. “The administrative machinery of government produces a lot of material, and we’re not good at picking the wheat from the chaff there.”
Langhout points to policy development as one example of an ambiguous area that tends to be overlooked. “We might have a dozen initiatives in a year, some of which turn into decision documents, and others are research that gets put aside.”
These files belong to the analyst, who may or may not maintain them well. “If there’s analyst turnover, the material may stay in a shared drive for a couple of years, and by then the institutional memory is gone.”
The file may be stored somewhere, but without knowledge of its contents, it’s lost for all intents and purposes. Stories abound of new staff redoing research or other work that has already been done by predecessors, she says.
Lack of standards contributes to the problem. “Take something as simple as file naming. If everybody’s doing it differently, it’s hard for people coming in later to figure out which documents are important,” she says. “Part of the solution is providing guidance so people all do it the same way, so there’s less dependence on getting information from the person who had the file before and more ability to use conventions to identify what’s important.”
While policies may need to be established in some emerging areas, there are established retention schedules for most types of documents, she says.
“But adherence to policy is not what we would like it to be.” Major cuts in the public sector in the past two decades play a role. “We have zillions of copies of stuff, but it’s less work to store it than to figure out what we need to keep,” says Langhout.
“Many ministries have good intentions, but not the resources to put better practices in place – until there’s a problem,” she says, noting that huge resources were invested in tracking the paper trail in the Walkerton contaminated water inquiry.
Langhout believes these information management issues will become an impediment to modernizing government services if they aren’t resolved. “The pressure is on us to use information across many different areas in order to respond to the public’s needs,” she says, noting Web mechanisms to communicate targeted information to citizens will be key in the future.
“It’s how we make use of the information available to us that helps us re-organize our service delivery to produce better results. In the modernization agenda, it’s imperative to recreate a Service Ontario organization that becomes the retail expert for the government.”
Amazon.com uses information successfully, she says, because it has a simple goal: profit. “They know if they get more information about books out, they’ll get more people to their site and they’ll sell more books.”
The public sector may have more complex goals around tackling social ills or promoting economic development, but Langhout believes a profit-like reward system could be built around improved information management that delivers results.
“Where efficiencies are federated and produce more effective services, there should be a provision to reinvest some of those savings, so there is an incentive system for program areas to get better. While staff may get pats on the back and good performance reviews, there’s no incentive structure for the program itself.”
The archivist’s perspective
Two camps need to join to solve the information management problem efficiently, says Illuminata’s Webster. “We need to put together IT people, who are good at automating processes, with records management people, who are adept at classifying information, so we don’t rely solely on humans to do it. This is just starting to happen in government and the private sector.”
And this is long overdue. System implementations over the past few decades have primarily focused on getting information into systems, says Paula Johnson, director of administrative records at the University of California, San Diego (UCSD). Managing the information once it’s collected has been ignored until recently, so records management experts have not been part of the process in developing infrastructures that manage information throughout its lifecycle, she says.
“Providing secure access to information over time, being able to migrate it as IT architecture changes, determining the best storage media – this is where our profession is frustrated, as we’re not called in.”
This has left IT struggling with areas outside its proper remit, such as user compliance. Johnson tackled this issue at UCSD, which includes six colleges, two hospitals and several research centres in supercomputing, oceanography and other areas.
“We enacted a policy in 2003 that made e-mail recipients the owners and custodians of the record, and provided guidelines that helped people understand what was business e-mail or incidental use,” she says. “So the IT group looks after the e-mail system’s efficiency without having to deal with retention issues.”
Johnson says she often speaks at technology conferences to educate IT folks about areas where they may be misguided. For example, many are surprised when she tells them incremental back-up tapes don’t serve as a credible archive from a business continuity perspective. The amount of data on networks is getting so huge that doing a full back-up of all the data and software becomes difficult, she explains.
Instead, many resort to storing only what has changed over a period of time, typically a day. “But if you have 2,000 back-up tapes with incremental pieces and you need to restore your system, you have to feed them all in, and this is very difficult. Everyone goes into this mode: So what do we really need on our system; can we pick and choose?” Incremental back-ups have become popular but are ultimately not fully recoverable, since systems typically need to be reconfigured after a systems failure, she says.
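The restore problem Johnson describes follows from how incremental back-ups work, which a minimal Python sketch can show. The data model here is an assumption for illustration (a dictionary per back-up, with None marking a deleted file) and does not represent any real back-up product. A restore starts from the last full back-up and replays every incremental in order, oldest first.

```python
def restore(full_backup, incrementals):
    """Rebuild system state from a full back-up plus ordered incrementals.

    full_backup maps path -> content. Each incremental maps path -> new
    content, or None if the file was deleted (a hypothetical convention).
    """
    state = dict(full_backup)
    for changes in incrementals:  # must be applied oldest first
        for path, content in changes.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state

# Hypothetical back-up set: one full back-up and two daily incrementals.
full = {"a.doc": "v1", "b.doc": "v1"}
day1 = {"a.doc": "v2"}
day2 = {"b.doc": None, "c.doc": "v1"}

complete = restore(full, [day1, day2])
missing_day1 = restore(full, [day2])
```

Leaving out a single incremental produces a different, incorrect state, which is why every tape in the chain has to be fed in and why a long chain makes a poor archive.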
Archivists are deeply concerned about these types of misconceptions, which have played a role in the obsolete storage media issue now coming to the fore, she says. Optical disks, for example, were used in the past. But once stored and put away, these are typically left out of later storage upgrades and migrations to more current storage media.
“We’ve created an environment where we think we can go back and pull the information so long as we keep the disks, but this becomes more difficult and expensive over time,” she says, echoing Wiseman’s comment about eventually losing the technology to read it. “We’ve lost so much of our recent history in government sectors. The pyramids lasted thousands of years, but we’ve already lost the knowledge to recreate the technology of NASA’s Apollo moon missions.”
A similar point about lost information was made by John Reid when he left the helm of the Information Commissioner’s office in 2005: “The 20-year period from 1978 to 1998 significantly threatened the public record and destroyed the audit trail.” And Robert Marleau, appointed as Information Commissioner in 2007, has noted that revisions to access laws may be in order, given that technology and the way government manages information have changed in the last quarter-century.
Access, security and privacy
One area that has really taken off in recent years is the public’s request for information, says Johnson. Legislation such as Public Records Acts and Freedom of Information Acts was enacted in recent years to allow public access to information collected by government. “Every U.S. state and most Canadian provinces have opened the public records required by law,” she says.
But legislation requiring the security of data and privacy of individuals has also been enacted. Without a comprehensive IT architecture that addresses all aspects of information lifecycle management (ILM), it is becoming increasingly difficult to satisfy and balance all requirements, she says. “At first, the idea was to make data exposable and access available; then we needed to compartmentalize it to make it secure; and now we’re saying we need to make decisions based on privacy.”
Johnson believes legitimate access to information is sometimes trumped by security and privacy considerations. For example, UCSD’s campus council had a problem with scanning paper records that might contain social security numbers into their system, even though the move would improve business operations and make records more accessible.
“I would be more concerned about securing access to the network once they’d scanned them, than that they’d scanned documents with personal information,” she says. “Many people are taking this a shade too far.”
The authorities need to reconcile this, as different areas of government have competing priorities, she says.
“In legislation, there’s the objective to secure information, but this also rides hand-in-hand with our sunshine laws and improving exposure to the public. Often the two don’t gel, and it affects how we develop guidelines. We don’t have one overarching policy that encompasses access, security and privacy. Nobody’s looking at all of it as one piece of data.”
These issues affect IT operations, as masses of data requiring decisions about storage, security and other aspects are piling up, says Jensen. “Providing security for information that is not defined is difficult,” he says. “We’re spending money and effort securing data that may not necessarily need to be secured.
“If the data isn’t differentiated, then we have to secure everything equally, instead of putting higher security on information that really needs it. If we had a complete philosophy around ILM, security would be exactly where it should be, and we would know what the information is, how long to keep it and who needs access to it.”
Rosie Lombardi is a Toronto-based freelance writer. She can be reached at firstname.lastname@example.org