A co-CEO of Research In Motion has apologized for the massive three-day failure of the BlackBerry email and messaging service, said full service around the world is now back to normal and promised a thorough investigation of the problem. However, he gave no assurances it won’t happen again.
“I want to apologize to all the BlackBerry customers we’ve let down,” Mike Lazaridis told a conference call with reporters this morning, shortly after releasing the same apology on YouTube. Lazaridis has been in charge of overseeing the restoration of service, while co-CEO Jim Balsillie has been reassuring enterprise customers and carriers.
“You expect better of us. I expect better of us. Our inability to quickly fix this has been frustrating. We will take every action feasible to address this quickly, efficiently to minimize the risk of something of this magnitude happening again. We value your trust and commitment to BlackBerry. We are committed to restoring the trust we worked so hard to earn over the years.”
How bad this will be for RIM as it fights Apple’s iPhone and the multitude of Android smart phones is the question.
RIM won’t lose many enterprise customers because of service problems, predicts Brownlee Thomas, a Montreal-based analyst for Forrester Research who covers enterprise telecommunications and networking services issues. BlackBerry users who need RIM to meet regulatory obligations can’t choose other providers. Business and government users are more likely to switch to other handsets because their organizations increasingly allow them to bring their own devices than because of this week’s troubles.
Besides, she said, “these things happen.” The real test for any company with a systems failure, she said, is how fast it restores service.
The real threat to RIM, she added, is the defection of consumer subscribers because of the outages.
Iain Grant, managing director of SeaBoard Group, a Montreal-based telecommunications consultancy, said the week is “a disaster for the company’s faltering credibility” — particularly because it claims to have a messaging service that is reliable.
“The fact that this is a repeat of earlier problems, a year or so ago, suggests that the fixes weren’t as solid and robust as they needed to be,” he said in an email message. “That RIM didn’t heed the last wake-up call, and that the problem has re-occurred to bite the company comes at what, arguably is the worst time possible: When the company needed something to point-to to assure investors, customers, partners that it is still in the game. It is really sad. Sad for the company, sad for the leadership, and sad too for the country as its tech star loses what remains of its lustre.”
“We run frequent tests of our system,” Lazaridis told reporters. In the last 18 months, he stressed, it has been up 99.97 per cent of the time. “We are taking aggressive steps to minimize risk of this happening again,” he said, including working with equipment manufacturers to correct the particular failure mode in the core and backup switches blamed for the problem, auditing the RIM infrastructure and conducting root cause analysis to find out why the system took longer to restore than expected.
Asked if he’s worried that the incident, the worst in the company’s history, could damage its reputation and cause customers to leave, Lazaridis replied, “We’re very concerned. It’s a great concern.”
“We’ve worked 12 years since the launch of BlackBerry to win the trust of our 70 million subscribers, and we’re going to fully commit to win that trust back. One hundred per cent.”
RIM [Nasdaq: RIMM; TSX: RIM] still isn’t clear on what happened. “On Monday we had a hardware failure that caused a ripple effect in our systems,” Lazaridis said. A dual-redundant, high-capacity core switch in RIM’s European network operations centre in Slough, Britain, which is designed to protect the infrastructure, failed and caused outages and delays for some customers in Europe, the Middle East, Africa, India, Brazil, Chile and Argentina.
Then a backup switch “didn’t perform as intended,” which led to a cascade failure that rippled across RIM’s systems and backlogged messages. RIM was able to restart the system in Europe, but the processing of the backlogged messages “took longer than expected.”
The initial switch failure wasn’t preceded by a network change such as a software upgrade, he said.
Lazaridis wouldn’t reveal the names of the equipment makers whose gear is in the redundant switches. But there was a hint of frustration in his voice when he said, “Systems like this don’t fail this way. They’re designed not to fail this way.”
RIM has several network operations centres around the world that securely process and forward BlackBerry messages, but the nature of the problem prevented the company from merely switching traffic to them. In fact, Balsillie said part of the difficulty was making sure the problem that started in Europe didn’t bring the entire system down, and he stoutly backed the company’s strategy.
“You have to be cautious that in a global system you don’t infect the global system with your approaches,” he told reporters. “So quite frankly it was about being prudent and cautious on the global system for BlackBerry. That was the decision we took and we stand by it.”