This is the first part of a multi-part series that will hopefully get you thinking about how to better leverage your own data.
There is no shortage of pressure facing the banking world today. You have customers expecting personalization and innovation, talent is getting more expensive, and technical debt is rising. Add to that the introduction of complicated legislation like IFRS 9, combined with the pressure from shareholders, and the need for analytics to help make important decisions increases.
It seems everyone is part of some big or small analytics and digital transformation initiative that is meant to help address these issues. From what we’re hearing from our clients, these initiatives are bearing some fruit, but the prevailing sentiment is that even better results can be achieved.
Currently, the conversation around analytics is focused almost exclusively on “interesting” topics, like the capabilities of artificial intelligence and machine learning. To be sure, these are important topics, but modelling techniques are often not proprietary. People working with data are often using similar tools and skillsets.
Let’s take a moment and review a less appreciated component of the analytics lifecycle: the data that you have stored away in your various warehouses. A lot more attention should be paid to this treasure trove of data because it’s proprietary and where your competitive advantage comes from.
The data you’re sitting on is vast
Research suggests that Canadian banks are collectively sitting on 98,000 terabytes of data. To give you a sense of the scale, according to Dropbox, one terabyte is the equivalent of 6.5 million document pages. Consider all the different ways your customers interact with you through your digital and brick-and-mortar channels, in addition to the information associated with the types of devices they use, where they shop, how much they spend, who pays them and how much, the investments they make, and the bills they pay. It’s not hard to see how that adds up to 98,000 terabytes.
Broadly speaking, this data falls into three categories: transaction data, identifying data, and other data that is more internal in nature.
The data you’re sitting on is under-used
A 2022 Publicis Sapient survey found that 86 per cent of the Canadian bank executives surveyed felt they had a clearly articulated digital transformation strategy, while only 34 per cent felt their organization had made significant progress in executing its plans.
One of the major reasons is that not enough attention is being paid to existing data assets.
Relatively speaking, we see a lot more news about the latest and greatest modelling advances, like ChatGPT, than about ways to get data into a quantified state where it can be used as input for modelling.
To most people, data in itself isn’t very interesting. But data, and recognizing there are layers to data, is the starting point for any analytics exercise or model that needs to be built.
If you think of a long process where the end result is a delicious cake, the data I am talking about today represents the grain and the sugar cane that must be processed into the raw materials that serve as the cake’s ingredients. Sticking with the same analogy, badly processed or poor-quality grain yields a bad-tasting cake. Garbage in, garbage out.
The task of getting the data into a state where it is set up for modelling seems daunting. Usually, the end goal is a 360-degree view of a customer, because the customer is the entity that we interact with, and serve. When we try to make predictions, it is the customer that we try to make those predictions about. Is the customer going to default? Is the customer a fraudster? Is the customer going to attrite? Is the customer likely to accept an offer?
From the survey, we see that only 34 per cent of executives are focusing on unifying their customer data into that 360-degree view to really enable those kinds of predictions and insights.
Uncovering valuable insights
You can uncover clues about a customer being a fraudster by dissecting their e-mail address. For example, a customer whose name does not appear in their e-mail address is more likely to be a fraudster, so creating a very simple binary variable that tracks this can be beneficial in fraud models.
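As a rough illustration of that binary variable, here is a minimal Python sketch. The function name `email_missing_name`, the three-letter minimum, and the letters-only matching are assumptions made for this example, not a description of any production fraud model.

```python
import re


def email_missing_name(first_name: str, last_name: str, email: str) -> int:
    """Return 1 if neither name appears in the e-mail's local part, else 0.

    Hypothetical helper: the matching rule is deliberately simple.
    """
    # Take the part before "@" and keep ASCII letters only,
    # so "jane.doe84" becomes "janedoe".
    local_part = email.split("@", 1)[0].lower()
    letters_only = re.sub(r"[^a-z]", "", local_part)

    for name in (first_name, last_name):
        token = re.sub(r"[^a-z]", "", name.lower())
        # Assumption: skip very short tokens, which match by accident too easily.
        if len(token) >= 3 and token in letters_only:
            return 0  # a name was found in the address
    return 1  # no name found: set the flag


print(email_missing_name("Jane", "Doe", "jane.doe84@example.com"))
print(email_missing_name("Jane", "Doe", "xk9_trader@example.com"))
```

A real implementation would also need to handle nicknames, initials, and names containing non-ASCII characters, all of which this sketch ignores.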
Fraudsters also tend to be lazy, so when they create e-mail addresses they intend to use on fraudulent applications, those e-mail addresses tend to differ only by a number, and that number is regularly incremented. So, if you see something like sas_guy_1, sas_guy_2, etc., it could be an indication of fraud.
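One way to look for that pattern is to strip the trailing number from each address and check whether the remaining "stem" shows up with consecutive numbers. The sketch below assumes that approach; the function name `find_incrementing_emails` and the `min_run` threshold are illustrative choices, not an established rule.

```python
import re
from collections import defaultdict


def find_incrementing_emails(emails, min_run=2):
    """Return {stem: sorted numeric suffixes} for stems that appear with
    at least `min_run` consecutive numbers (hypothetical helper)."""
    suffixes_by_stem = defaultdict(set)
    for email in emails:
        local_part = email.split("@", 1)[0].lower()
        # Split "sas_guy_12" into the stem "sas_guy_" and the number 12.
        match = re.fullmatch(r"(.*?)(\d+)", local_part)
        if match and match.group(1):
            suffixes_by_stem[match.group(1)].add(int(match.group(2)))

    flagged = {}
    for stem, numbers in suffixes_by_stem.items():
        ordered = sorted(numbers)
        longest = current = 1
        # Track the longest run of consecutive integers (e.g. 1, 2, 3).
        for previous, following in zip(ordered, ordered[1:]):
            current = current + 1 if following == previous + 1 else 1
            longest = max(longest, current)
        if longest >= min_run:
            flagged[stem] = ordered
    return flagged


applications = [
    "sas_guy_1@example.com",
    "sas_guy_2@example.com",
    "sas_guy_3@example.com",
    "jane.doe84@example.com",
]
print(find_incrementing_emails(applications))
```

In practice you would probably restrict the comparison to applications received within a short time window, since a common stem such as "john" can collect consecutive numbers innocently over many years.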
On the credit risk side, dissecting actual transactions is really valuable. For example, for a particular group of clients, charitable donations were correlated with higher future credit risk.
Other interesting relationships hidden in your data have to do with how many people are using one device. If many people use one device to log into their accounts, the likelihood that the accounts are associated with fraud is high. This makes sense because fraudsters aren’t going to buy different devices to access different accounts.
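The device relationship can be turned into a feature by counting the distinct customers seen on each device. The following Python sketch assumes login events are available as (customer ID, device ID) pairs; the function names and the threshold of three customers are hypothetical and would need tuning against your own data.

```python
from collections import defaultdict


def customers_per_device(logins):
    """Map each device ID to the number of distinct customers seen on it."""
    customers = defaultdict(set)
    for customer_id, device_id in logins:
        customers[device_id].add(customer_id)
    return {device: len(ids) for device, ids in customers.items()}


def shared_device_flags(logins, threshold=3):
    """Return {customer_id: 1 or 0}, where 1 means the customer has used a
    device shared by at least `threshold` distinct customers.

    The threshold is an assumption chosen for illustration.
    """
    logins = list(logins)
    counts = customers_per_device(logins)
    flags = {}
    for customer_id, device_id in logins:
        shared = int(counts[device_id] >= threshold)
        # A customer is flagged if any of their devices is heavily shared.
        flags[customer_id] = max(flags.get(customer_id, 0), shared)
    return flags


events = [("c1", "d1"), ("c2", "d1"), ("c3", "d1"), ("c4", "d2"), ("c1", "d3")]
print(customers_per_device(events))
print(shared_device_flags(events))
```

Legitimately shared devices, such as a family computer or a branch kiosk, will also trip this flag, so it is better treated as one model input than as a rule on its own.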
These are just a few examples of what I like to call “zero-day data,” which allows you to predict customer behaviour from the very first couple of interactions that you have with a customer. When your goal is to grow in the new-to-credit segment, this type of data is valuable and often much more accurate than, for example, credit bureau data. It’s also proprietary to you.
Your data is more powerful than 3rd party data
When compared to 3rd party data, your data is much timelier.
The data that credit bureaus serve is reported to them once a month, or in some cases less frequently. That data then gets aggregated, checked, and served back to you. So, in most cases the data that banks buy from them is two months old or older, which isn’t very useful when you are trying to prevent fraud or detect attrition.
Bank data is much more up to date because transactions, or interactions with your web page, provide up to the minute data points that can be leveraged to, for example, detect attrition, or job loss, or some other event that is important to you.
Bank data is also much more expansive. By contrast, whatever you get from 3rd parties is one or two lines per customer, with features that are derived from a very small number of “raw” variables that are themselves aggregated to reflect a customer’s state at the end of the reporting period. It doesn’t factor in the ways customers interact with you through your website or physical store.
Also, you have to pay for those one or two lines of data every time you ask for them. This is not an insignificant cost as you try to develop the digital part of your business. Once your own data is understood, accessing it is a lot easier and less costly – if not free.
To reiterate, modelling techniques are most likely not proprietary, but your own data is. Your own data is also much timelier, more voluminous, and specific to your customers. Watch out for part 2, where I will share ways to get a handle on your data without overwhelming yourself.