Few organizations think about how much time they have to recover from a disaster. That could be trouble.
The typical Canadian enterprise is certain it has recovery of the business in the event of a disaster well in hand.
Few of them have ever worked out exactly how much time they have to recover. So far, only the absence of a major disaster has hidden that gap.
Disasters come in a variety of forms. Anything that corrupts data, for instance, qualifies, even though we don’t think of it as a “disaster”.
In today’s packaged software world, and with an increasing percentage of corporate assets parked in cloud-based applications, the potential points of corruption or failure are increasing. That’s not a reason to eschew the offerings — but it does suggest that it’s time for enterprises to up their game on recovery.
Let’s take an example. About a decade ago, one of Canada’s chartered banks put in an application change that started corrupting its data.
The error was caught fairly quickly, within one business day, and the requisite backups existed. So did the journalled transactions, which allowed all the work done before the error was caught and the systems were shut down to be recovered.
Few amongst us journal all inbound transactions since the last backup. We’d rather save pennies on disk space.
The problem for the bank, and it’s a problem for most enterprises, is that business doesn’t stop just because you have a problem. New transactions kept being added to the journals even as the bank restored from backup and replayed the journal.
In fact, it took a little over a month to get the bank back to running in real time. Hours of corruption — day after day of catch-up.
In fact, had the problem gone uncaught for not too many more hours, the institution would have reached the point where it could never catch up: forever into the future, it would be journalling first and processing later, because it was behind.
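The arithmetic behind that catch-up problem can be sketched in a few lines. This is a minimal illustration, not the bank’s actual figures: the function name, the 8-hour backlog and the replay rates are all hypothetical, and both rates are assumed to stay constant. The backlog shrinks only by the margin by which replay outpaces new arrivals, so a thin margin turns hours of backlog into weeks of catch-up, and no margin means never catching up.

```python
import math


def catch_up_hours(backlog_hours, replay_rate, arrival_rate=1.0):
    """Elapsed hours needed to clear a backlog while new work keeps arriving.

    backlog_hours: work waiting to be replayed, in hours of normal business.
    replay_rate:   hours of business work replayed per elapsed hour.
    arrival_rate:  hours of new business work arriving per elapsed hour.
    Returns math.inf when replay cannot outpace arrivals.
    All names and units here are illustrative assumptions.
    """
    if backlog_hours < 0 or replay_rate < 0 or arrival_rate < 0:
        raise ValueError("inputs must be non-negative")
    if backlog_hours == 0:
        return 0.0
    surplus = replay_rate - arrival_rate  # spare capacity per elapsed hour
    if surplus <= 0:
        return math.inf  # the backlog never shrinks
    return backlog_hours / surplus


# Hypothetical scenario: one business day (8 hours) of work to replay.
for rate in (2.0, 1.1, 1.01, 1.0):
    hours = catch_up_hours(8, rate)
    print(f"replay rate {rate}: {hours:.0f} hours to catch up")
```

With only 1% of spare capacity, the 8-hour backlog in this sketch takes roughly 800 hours to clear, which is on the order of a month. With no spare capacity, the result is infinite.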
Now, add the requirements for compliance with Bill 198 or Sarbanes-Oxley to the mix. How does an executive sign their life away saying they know the operational status of their enterprise when the quarter can’t be balanced on time? How does their compliance auditor sign off?
Or think about publicly traded organizations: financial statements must be filed, or trading is halted. (This, in turn, sets off a flurry of credit-related issues: suppliers stop extending credit, and don’t ship goods.) If you can’t close the books because you haven’t “completed” the period yet, you can’t file.
In the case of the banks, adequate reserves are dictated by the Bank of Canada, and must be maintained. Likewise, instruments (e.g. cheques) drawn on your bank but presented for payment at another bank must be exchanged and the net positions made whole, transferring funds to do so. It would be easy to deplete a bank’s capital simply from being unable to provide matching data, forcing its reserves to be raised unnecessarily.
Every business — not just a bank — has these issues. We all run on a thread of continuity. Break it, and pieces start falling down all over the place.
Business continuity, for far too long, has been seen as a “disaster recovery problem for the data centre”. Or as a “where do we send the troops if the office can’t be used” problem.
Only a very few companies have done the analyses required to figure out just what their recovery window is, how to deal with missing or corrupted data, and how to restore confidence so that that thread of continuity can be kept.
If your enterprise still sees this as an “IT problem”, then you don’t have effective IT governance. For it’s really a business problem, and needs to be handled that way.
The day we decided to use IT to automate was the day this problem became real. Stop talking about “mission-critical” applications. Start talking about how the business keeps its thread of confidence in continuity.