The outage last week at JPMorgan Chase’s online banking site is an example of how pushing to maintain absolute data integrity can end up creating big problems for companies, a veteran database analyst cautioned yesterday.
The financial services firm suffered through intermittent problems on the site for three days earlier this month. At one point, Chase customers could not carry out any online banking transactions for a period of more than 24 hours.
The bank initially blamed the disruption on a ‘technical issue,’ but later said the problems were tied to a third-party database product used to authenticate customer log-ins.
Curt Monash, an analyst at Monash Research, said a source with knowledge of the incident told him that the outage was traced to an Oracle database used by Chase to store user profiles and authentication data. According to Monash, the source, whom he would not identify, said that four files in the Oracle database were corrupted and that the error had been replicated in the mirror copy of the database that Chase maintained for backup and recovery purposes.
In all, Automated Clearing House transactions valued at about $132 million were held up by the disruption. In addition, about 1,000 auto loan applications and another 1,000 student loan applications were lost due to the outage, Monash said in a blog post detailing his conversation with the source.
In an interview Thursday, Monash said Chase’s problems appear to have been exacerbated by the design of the database. He said the incident was likely prolonged because the bank had to restore a lot of data that didn’t have to be stored in the user authentication database in the first place.
Bank databases such as the one that was affected typically ensure that stored transaction data is ACID-compliant. ACID stands for atomicity, consistency, isolation and durability, a set of properties designed to guarantee transaction integrity and data recoverability in the event of a system failure.
Monash said much of the data stored in the database that crashed appears to have been customer Web usage data that was ACID-compliant but didn’t have to be. Most of that data could have been stored in a separate system, he said. The excess ACID-compliant data affected the bank’s ability to recover quickly from the problem, he added.
“Not everything in the user profile database needed to be added via ACID transactions,” he said. It’s likely that even if some of the Web usage data had been lost, it would not have impinged on the integrity of the bank’s financial dealings, he said. “At a minimum recovery would have been much shorter had the data not been there,” he said.
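The separation Monash describes can be illustrated with a minimal sketch. It is not a description of Chase's system: the table name, function names and the use of SQLite are all assumptions made for illustration. Authentication records go through an atomic transaction, while Web usage events go to a separate, best-effort store that would not need to be restored before log-ins could resume.

```python
import sqlite3

def open_auth_db():
    """Create a small transactional store for log-in data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE credentials ("
        "user_id TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
    )
    conn.commit()
    return conn

def save_credentials(conn, rows):
    """Insert all rows atomically: either every row is stored or none is."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO credentials VALUES (?, ?)", rows)

def count_credentials(conn):
    return conn.execute("SELECT COUNT(*) FROM credentials").fetchone()[0]

# Web usage events live outside the transactional database (assumption:
# losing some of them is acceptable, so no ACID guarantees are needed).
usage_events = []

def record_page_view(user_id, page):
    usage_events.append((user_id, page))
```

In this sketch, a failed batch of credential inserts leaves the credentials table unchanged, while page views never touch it at all, so the data that must be recovered after a failure stays small.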
The fact that problems with a single database affected the main Web portal, its Automated Clearing House functions and loan applications suggests that the product was a single point of failure for too many applications, Monash said.
“This was a large and complex database that when it went bad brought down many applications,” he said. It’s not clear if the benefits of tying so many applications to a single database exceeded the risks in this case, he added.
Oracle did not immediately respond to a request for comment.
A Chase spokesman also did not respond specifically to Monash’s comments. In an e-mail message, he said that the “long recovery process” was caused by a corruption of systems data that disabled the bank’s “ability to process customer log-ins to chase.com.”