Anatomy of a royal snafu

As far as corporate nightmares go, the recent IT meltdown at the Royal Bank Financial Group has to rank among the worst to come true.

From many perspectives it is a lesson in what not to do when a crisis occurs. But, to give credit where credit is due, the bank also did some things right.

It all started when an error in a new bit of code “manifested itself on the Monday/Tuesday (May 31/June 1) overnight system runs,” explained Chris Pepper, an RBC spokesperson, during a recent conversation with ComputerWorld Canada. This caused random occurrences of duplicate withdrawals and deposits to show up on client accounts. This did not affect account balances but it did cause a lot of calls to RBC. In fact, ComputerWorld called RBC the night it started and was assured all would be cleared up by Thursday, June 3. It was not, as duplicate withdrawals were still showing up as late as Monday, June 7.

The exact problem, defined as “human error,” was “incorrect pieces of code in the key banking software,” Pepper said. It was only a few lines, thereby reinforcing the notion that a few misplaced bytes can cause terabytes of problems downwind.

The erroneous code affected both the main and backup systems.

Ironically, the actual code problem was fixed within hours of its occurrence. At this point the recovery was delayed because the bank did not want to launch into end-of-day production based on incomplete information. “Until we could be sure this wouldn’t compromise other systems, (the) decision was to stop production on Tuesday, June 1 and restart later the same day,” Pepper explained. “But the verification process took longer than expected because of two days of transactions needed to be processed on the same date.”

From a technology perspective, the bank’s IT department was on top of things, although overzealous in its predictions on how quickly it could catch up. “On Wednesday, June 2, it was our belief that we could catch up to Tuesday’s processes by late that evening,” Pepper said.

However, by the weekend, paycheques from the previous week had still not shown up in some customer accounts. Horror stories surfaced of clients, strapped for cash, being told by the bank they could take out a loan, although the bank, in subsequent interviews, was adamant that it would float interest-free loans and cover any NSF charges.

During much of this, CEO Gordon Nixon was in Europe. For many, this was the biggest mistake the bank made. Nixon left the country Wednesday after he was told things would be back to normal by Thursday, Pepper said.

“There is a temptation on people who hate to be bearers of bad news to hold it at a lower level,” said John Layne, managing partner of the Orinda, Calif.-based crisis management company Contingency Management Consultants. “The ancient kings used to kill messengers who bore bad news (and) I don’t think that has entirely died out.” Though Layne did not suggest that RBC’s IT department intentionally painted a brighter picture than they knew to be true, he said with IT failings there is a need to view worst-case scenarios as a reasonable likelihood. “These are foreseeable events…if it goes beyond a certain amount of time; let’s assume that there is more going on.” Nixon should have stayed put until the fires were out, he said. “If I were CEO, I’d want to be on the scene.”

Carolyn Burke said it looks as though RBC’s contingency plan “hadn’t been run through before.”

“In a technical incidence…you’re going to have an escalation hierarchy,” said the CEO of Toronto-based Integrity Incorporated. “You have a bucks-stops-here kind of person, and if within two hours that person hasn’t dealt with the problem, that is where it will escalate to the next person up.”
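The escalation rule Burke describes can be sketched in a few lines of Python. This is only an illustration of the idea, not RBC's or Integrity Incorporated's actual procedure: the role names and the `current_owner` helper are hypothetical, and the two-hour window is taken from Burke's example. The cap of five levels is the figure she cites later in this article.

```python
from dataclasses import dataclass


@dataclass
class Level:
    role: str                   # hypothetical role name, for illustration only
    hours_allowed: float = 2.0  # time this level has before the incident moves up


def current_owner(chain, hours_elapsed):
    """Return the role that should own the incident after hours_elapsed hours."""
    remaining = hours_elapsed
    for level in chain[:-1]:
        if remaining < level.hours_allowed:
            return level.role
        remaining -= level.hours_allowed
    # Top of the hierarchy: there is no one left to escalate to.
    return chain[-1].role


# Assumed four-level chain; Burke suggests no more than five levels.
chain = [Level("on-call engineer"), Level("IT manager"), Level("CIO"), Level("CEO")]
assert len(chain) <= 5

print(current_owner(chain, 1))   # on-call engineer
print(current_owner(chain, 5))   # CIO
```

Under these assumed windows, an unresolved incident reaches the top of a four-level chain after six hours.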

According to Pepper, RBC’s IT people thought they had the problem under control. But Burke isn’t too sure how they could have come to this conclusion. “It sounds like they miscalculated, not what the problem was, but what the recovery problems were going to be,” she said. “What they (apparently) didn’t take into account was that they were dealing with another live working day.”

Pepper said the bank was fully aware of this. “The verification process took longer than expected because two days of transactions needed to be processed on the same day; this created additional complexity,” he said. Because IT had to manually override automated scheduling systems, it “significantly slowed down processing times.” Burke’s hypothesis proved true; the bank was not back to normal until almost a week later.

It is the first 24 hours of any crisis that are most important, said Lisa Lewis, president of Winnipeg-based media relations firm Beyond Excellence Inc. “The important message is not what happened, but it is what (a company) is doing to resolve it and make sure it is not going to happen again.”

“You need to communicate with all of your customers…and I think that there was a bit of a delay on that,” Burke said.

The bank had to be aware of the scope of the customer-relations problem just from the volume of traffic to its call centres, Layne said, so a “CIO type should have been front and centre.”

During this critical 24-hour period the bank, officially at least, was silent. In fact, it didn’t publicly acknowledge that there was a problem until June 2, with a release issued just before 4 p.m. The release contained quotes from Rod Pennycook, an executive vice-president.

This was part of the second big mistake the bank made. “One has to send out a single spokesperson,” Layne said. Pennycook pretty much disappeared from the scene and was replaced by Gay Mitchell, another executive vice-president. Pepper and his cohort Judi Levita were also frequently quoted. “The greatest (positive) impact would be having a single point of contact or a single face,” Lewis said.

Suspiciously absent was CIO Marty Lippert, the one RBC executive actually capable of fixing the problem. But Pepper said this was intentional. Lippert was “most useful” solving the problem, not talking to the press, and since it was a problem affecting customers, Mitchell was a better fit.

Regardless, Burke said the top of the escalation hierarchy, which should not have more than five levels, should have been hit sooner. As it stood, it was days before Nixon spoke to the public.

Though the IT problem is behind it, RBC still has a lot of work left to do. “We have to re-earn the trust of clients and re-earn the trust of, frankly, a lot of Canadians,” Pepper said. The bank has started to review the problem, how it occurred and how it was dealt with.

In a move all the experts said was wise, RBC has brought in perennial IT heavyweight IBM Corp. to offer an objective look at what went wrong and how to avoid similar problems in the future.
