The cost of systems downtime for CIBC

The Canadian Imperial Bank of Commerce (CIBC) is a full-service financial institution that boasts more than six million individual customers, 350,000 small business customers and 7,000 corporate and investment banking customers. John Lahey, CIBC’s chief operations officer of e-commerce, recently talked with IT World Canada about the effects of systems downtime on such a large financial institution.

IT World Canada: What does CIBC do to eliminate systems downtime?

Lahey: First of all, you can’t ensure that there’s never any downtime. I think that consumers, including internal users, treat many of today’s electronic delivery channels like they would a utility. When you come to work in the morning and flick the light switch, you expect the light to come on. It doesn’t matter that the hydro company may have had 40 people working through the night to make that happen. If you’ve paid for the service, you expect it to be there no matter what. The same is true with information technology systems. Yet in spite of the fact that the world has changed for the better from a systems reliability perspective, technology departments of large corporations still tend to measure how they do things based on an old model created around largely unreliable systems.

IT World Canada: In what way?

Lahey: What they measure is percent of uptime, percent of items processed accurately and that type of thing. That was very useful in the past but it is no longer an accurate measurement of performance. The built-in assumption when you’re measuring systems availability is that 99.6 is better than 99.5. That assumes that every minute of downtime is worth the same as the next minute of downtime. Now that business is using technology beyond the mere automation of routine manual transactions, things are different. In some cases these systems are the business. It also means that all downtime isn’t the same.
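To put the availability figures Lahey mentions into concrete terms, the short sketch below converts an uptime percentage into hours of downtime per year. It is an illustration only: the function name is hypothetical, and it assumes a 365-day year of round-the-clock service, which the interview does not specify.

```python
# Assumption: the service is expected to run 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 365 * 24


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system is down at the given availability percentage."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


for pct in (99.5, 99.6):
    print(f"{pct}% uptime allows about {annual_downtime_hours(pct):.1f} hours of downtime a year")
```

Under that assumption, the gap between 99.5 and 99.6 percent is a little under nine hours a year. The figure says nothing about when those hours fall, which is the weakness Lahey describes.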

IT World Canada: Can you provide some examples?

Lahey: Absolutely. Let’s take the ATM system as an example. If it’s down at 2 a.m., there’s no effect at all because customers aren’t typically standing in line waiting to use it. But if it happens at 3 p.m. Friday afternoon, it has a significant impact. You can actually calculate how many customers are likely to have been inconvenienced because they walk into the branch. The branches are all staffed with the assumption that customers are using ATMs for 90 percent of their transactions. When the system is down, all of a sudden you have a customer expectation problem. The customer flicked the switch and the light didn’t come on, so he’s upset. There’s also a fallback problem because the customer will automatically move to another channel. Consequently, you have to build a lot of redundancy into the other channels because of how critical uptime has become. Debit is another good example of that. If the debit system goes down, lines of shoppers with full carts are standing around with no way to pay for their groceries.

I really think that the calculation of uptime as a performance measurement is a big problem for companies in that it causes them to think inaccurately about it.

IT World Canada: What kind of measurement would work more effectively?

Lahey: What you really need to look at is the number of customers that could be affected if you experienced downtime and actively manage the situation. You don’t want the debit system down on December 24, so you organize it to ensure that you have peak load capability and that you’re not running systems changes on that date. You might even want to take down the system all night on December 23 to make sure it’s ready. An organization that’s measuring uptime would never take it down for eight hours to make sure that it’s up for eight hours at a critical time, even though that might be the best thing to do from a business perspective.
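One way to picture the measurement Lahey proposes is to weight each outage by the number of customers likely to be trying to use the system at that time. The sketch below is a rough illustration, not CIBC's method: the class, the function and every traffic figure in it are hypothetical, chosen only to contrast an eight-hour planned overnight shutdown with a short failure at a peak period.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    label: str
    minutes: int
    customers_per_minute: float  # hypothetical estimate of demand during the outage


def customers_affected(outage: Outage) -> float:
    """Estimated number of customers who hit the system while it was down."""
    return outage.minutes * outage.customers_per_minute


# Illustrative numbers only; none of them come from the interview.
planned = Outage("overnight maintenance, December 23", minutes=480, customers_per_minute=0.5)
unplanned = Outage("peak-hour failure, December 24", minutes=30, customers_per_minute=400.0)

for outage in (planned, unplanned):
    print(f"{outage.label}: {outage.minutes} min down, "
          f"about {customers_affected(outage):.0f} customers affected")
```

With these made-up inputs, an uptime measure scores the planned shutdown as 16 times worse than the peak-hour failure, while the customer-weighted measure scores the peak-hour failure as 50 times worse.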

IT World Canada: Why not?

Lahey: Because the bonus programs in some technology groups are actually partly based on meeting systems availability objectives. Sometimes you have the technology and the businesses almost at odds with each other even though they’ve agreed on the performance metrics in advance. I think that we’re in a whole different world now, and I’m not sure that the measurement that drives business-appropriate behavior and investment has truly caught up.

IT World Canada: So far we’ve been discussing mainstream banking systems. What happens when systems fail in e-commerce applications?

Lahey: I think that the situation becomes even more difficult, because customers don’t have a bricks-and-mortar alternative in the event of systems failure. They may have a call center option but that’s about it. A company that relies exclusively on electronic payment has no alternative when the system is down. Chances are that it never had direct contact with its customers. It’s extending goods and services on receipt of payment, and if there isn’t one, the company has only two choices: Don’t ship the product without payment, or accept the risk and provide the product. For companies that are truly electronically based, the risks are a lot higher.

IT World Canada: What about the special systems demands made by on-line trading applications?

Lahey: All of the major financial institutions that run discount brokerage platforms would likely say that they get a disproportionately large amount of the pressure and headaches from those particular applications. I personally wonder how we make money at it. It really depends on volume because you’re giving away a lot of your margin, and in view of all the technological problems, it’s challenging to make a profit. When they log on, customers have an expectation of systems availability. What happens if at 5 p.m. today the customer puts in a sell order for shares of a particular stock and our system is down, and by tomorrow morning when our system is back up those shares are worth a fraction of their earlier value? The consumer has acted in good faith and has naturally assumed that certain things would happen. Although the fine print and the courts may say that the bank is not at fault, the damage is done. In the end, the relationship between the customer and the bank is all about confidence. If customers use something as important as a trading platform and they’re not satisfied with the service, it’s easy to switch to another one, and they will.

IT World Canada: In your estimation, what is the right way to plan?

Lahey: From a business perspective, the technology people must have a very open dialogue with the business people. They also must have contingency plans and backup plans. I wonder what the major discount brokers would do if the market crashed. What would they do if they received 10 times the number of orders anticipated? Because of planning for Y2K they may be better prepared than before, but they need to be aware of where their big risks are. From a business perspective, customer expectations are what are driving technology initiatives, although I’m not sure that the connection between them is as close as it should be.

IT World Canada: What has the bank done to prepare for the special demands of e-commerce applications?

Lahey: It’s had to implement the required communications capability. In fact, we could stand up to any financial institution in the world in terms of the capacity of our customers to communicate with us. For example, we bought exclusive rights to a thread of communication between our Streetsville and Markham sites, and today, if we loaded all of our bank traffic on it, we’d probably consume about 10 percent of it. Our customers’ expectations in the market continue to escalate, and we’ve purchased an enormous amount of bandwidth capacity to meet them.

Pat Atkinson is a freelance writer and editor based in Oakville.