The game of banking

The ATB AI lab is working in partnership with the University of Alberta on applications of AI and machine learning to create better banking solutions for ATB customers.

We are leveraging Alberta’s expertise in reinforcement learning for first-in-the-world applications. Deep reinforcement learning, famously used to solve games with defined rules and bounds such as Atari, is being applied by our team of researchers to effectively turn ‘banking’ into a closed game.

Guided by our North Star of zero dissatisfied customers, our AI Lab is finding novel, real-world applications of advanced machine learning, with the end goal of improving our customer strategy.

We recently sat down with one of ATB’s data scientists dedicated to the partnership, Mark Sebestyen, who is on a mission to gamify banking and will explain how his team applies reinforcement learning to their data sets.

Why gamify banking?

Reinforcement learning techniques work well for solving Atari games, for example, because games have strict rules. At any given moment, in any given state, there is a finite number of available actions, clear end targets or rewards, and a set of future scenarios to run through in order to select an optimal move.

What we can do is come up with a strategic algorithm for banking. The ‘game’ we are trying to optimize is a multi-objective optimization problem: essentially, maximizing customer value while simultaneously maximizing value for ATB, given a specific set of constraints.

Banking is complex. After all, we’re looking at the financial wellness of individuals and their businesses. To simplify this complicated context into a well-defined one, we start by identifying the three key variables of any reinforcement learning problem: states, actions, and rewards.

Source: Mark Sebestyen, ATB

The state is the financial context vector, augmented with information relevant to financial decisions, banking needs and personal behavioural data. For example, it can include a time series of account balances, which banking products a customer has, how long they’ve been banking with ATB and what specific interaction patterns they exhibit. As in the game of Monopoly, each customer is a ‘player’. At any given time, they occupy a certain location on the board, have a certain amount of money, and have certain features. And at each moment they have a number of options in working towards their goal of winning the game. That is where the action part comes into play.

The action could be a customer action or an action taken by ATB: for example, offering a product, changing an interest rate, or consulting on financial goals, all the way down to the granularity of adjusting and personalizing the UI for that customer while they are logged into their online banking platform. Our goal was to systematically go through all the actions that we as ATB can digitally adjust for our customers, and let our engine use them to come up with the best strategy for these clients. As an example, a feature product that can be powered by our engine is a digital banking platform that continuously learns from and adapts to varying customer needs, learning more from ATB customers the more they use the platform. This is the most state-of-the-art solution to deep personalization (this algorithm is literally fresh out of the oven of RL research) and, to our knowledge, ATB could be the first bank using anything like this in application.

The reward. Once an action has been taken in a certain state, the Monopoly piece moves to a different position on the board. Here, we can evaluate whether financial value (e.g., investment portfolios, customer lifetime value) and other customer experience factors have increased, decreased, or remained the same. We do this through our own internal algorithms, combined with customer feedback and survey results. As an example, after every Google Hangouts session you get a mini-survey asking about the smoothness of the call. Similarly, for us, these micro-opinions can be easily incorporated into the platform and can provide huge value in tuning and deeply personalizing the UI/UX. For changes to our digital experience, a measure of reward may be the number of clicks it took to complete an action. For customer satisfaction and happiness, we may measure whether customers have moved closer to their indicated financial goals, or look at how their engagement scores have changed.
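To make the three ingredients concrete, here is a minimal Python sketch of how a state, an action set and a reward could be represented. It is an illustration only, not ATB's engine: the field names, the `Action` members and the weights in `reward` are all assumptions chosen to mirror the examples in the text (goal progress and number of clicks).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    # Hypothetical action set, mirroring the kinds of actions named in the text.
    OFFER_PRODUCT = "offer_product"
    CHANGE_INTEREST_RATE = "change_interest_rate"
    FINANCIAL_CONSULTATION = "financial_consultation"
    ADJUST_UI = "adjust_ui"


@dataclass
class CustomerState:
    # All fields are illustrative stand-ins for the "financial context vector".
    balances: List[float]    # time series of account balances
    products: List[str]      # banking products currently held
    tenure_years: float      # how long the customer has banked with ATB
    goal_progress: float     # fraction of the stated financial goal reached (0 to 1)
    clicks_last_task: int    # clicks needed to complete the last digital task


def reward(before: CustomerState, after: CustomerState,
           goal_weight: float = 1.0, click_weight: float = 0.1) -> float:
    """Assumed reward: progress toward goals, minus a cost for extra clicks."""
    goal_gain = after.goal_progress - before.goal_progress
    extra_clicks = after.clicks_last_task - before.clicks_last_task
    return goal_weight * goal_gain - click_weight * extra_clicks
```

With these assumed weights, a transition that moves a customer closer to their goal and needs fewer clicks yields a positive reward, while one that adds clicks without any goal progress yields a negative one.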

Source: Mark Sebestyen, ATB


Finally, problem constraints map out the boundary conditions. For example, everyone would be happy with a low- or no-interest MasterCard, but that is not something that we can necessarily offer.

We know that banking is not a game where you can experiment boundlessly with clients, which is why we are also committed to developing what are called robust models in RL. For example, our decision engine won’t offer a MasterCard to a client who already has that very product. In our models, this can be achieved by enforcing strict penalties for approaching bounds or violating game rules.
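The penalty idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production engine: the `offer:` action naming, the `VIOLATION_PENALTY` value and the helper names are all hypothetical.

```python
from typing import Dict, Set

# Assumed magnitude: large relative to any ordinary reward.
VIOLATION_PENALTY = -100.0


def violates_rules(action: str, held_products: Set[str]) -> bool:
    """Hypothetical rule: never offer a product the customer already holds."""
    prefix = "offer:"
    return action.startswith(prefix) and action[len(prefix):] in held_products


def penalized_reward(base_reward: float, action: str,
                     held_products: Set[str]) -> float:
    """Replace the reward with a strict penalty when a game rule is broken."""
    if violates_rules(action, held_products):
        return VIOLATION_PENALTY
    return base_reward


def best_action(estimated_rewards: Dict[str, float],
                held_products: Set[str]) -> str:
    """Pick the action with the highest reward after penalties are applied."""
    return max(
        estimated_rewards,
        key=lambda a: penalized_reward(estimated_rewards[a], a, held_products),
    )
```

In this sketch, a customer who already holds a MasterCard is never offered one again, even if the raw estimate for that offer is the highest, because the penalty pushes it below every rule-abiding action.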

Specifically, the K of N learner is a state-of-the-art reinforcement learning technique for improving model robustness. We borrow it from researchers on our team (Dr. Bowling and Dustin Morrill), who employed similar techniques in solving the game of poker.

Cepheus, a poker-playing program, was developed by a team led by Dr. Michael Bowling, one of the lead researchers on our ATB AI team. The algorithm is able to play a nearly perfect game of heads-up limit Texas hold'em. To quote Dr. Bowling, “It is so close to perfect that even after an entire human lifetime of playing against it, you couldn't be statistically certain it wasn't perfect. We call such a game essentially solved.”

In our “game of banking”, we sample N function distributions of the model at random from infinitely many choices, while an antagonist fixes K of these distributions. The purpose is to train the protagonist to outplay the antagonist no matter how disadvantageous the antagonist’s fixed choices are. We still need to be able to choose the best solution regardless of what the adversarial fixed factors were.
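One common way to read the K of N idea is as a scoring rule: sample N candidate models, let the antagonist keep the K on which our policy does worst, and score the policy by its average value on those K. The Python sketch below shows that reading only. It assumes the antagonist picks the worst cases, and the names `policy_value` and `sample_model` are hypothetical placeholders, not part of our engine.

```python
import random
from typing import Callable, TypeVar

Model = TypeVar("Model")


def k_of_n_value(
    policy_value: Callable[[Model], float],
    sample_model: Callable[[random.Random], Model],
    n: int,
    k: int,
    rng: random.Random,
) -> float:
    """Average value of a fixed policy over the k worst of n sampled models."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Sample n plausible models of the environment and evaluate the policy on each.
    values = [policy_value(sample_model(rng)) for _ in range(n)]
    # The antagonist keeps the k models on which the policy performs worst.
    worst_k = sorted(values)[:k]
    return sum(worst_k) / k
```

Under this reading, setting K equal to N gives the plain average over sampled models, while K equal to 1 scores the policy by its single worst sampled case. Values in between trade off average performance against protection from unlucky scenarios.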

Photo by: Dustin Morrill, PhD Student - AI


Additional robustness is provided by testing our engines on certain data sets where states, actions, and rewards are very well defined, allowing us to closely monitor results and test robustness of our systems in the iteration process.

While we are applying advanced machine learning models and computer systems to solve the problem of banking, we haven’t lost touch with the humans behind it. Our front-line team members are a critical part of the system, helping us quantify human factors and provide insights that allow us to monitor and adjust our models.

We are using the same information we’ve always used about our customers, but we are learning to use it in a better way. These powerful AI algorithms are capable of solving games by coming up with the best strategies, called policies in RL, to play them. In a similar sense, we are going to be able to provide customer strategies for ATB at a scale that humans simply can’t comprehend, because of the magnitude and complexity of the data involved. This gives our team members deep insights from data that they wouldn’t otherwise have, and frees their time to provide the best personal service to ATB customers.

Interested in learning more about ATB’s AI efforts? Subscribe to alphaBeta below to stay informed as we continue to explore real-world applications of AI through machine learning and share peer-reviewed publications related to gamifying banking and reinforcement learning applied to complex problems. Want to build with us? Visit and apply today.

We are ATB transformation - innovating at the forefront
of robotics, AI, blockchain and the future.