
Innovations in the Delivery of AI-Powered Financial Services

Paul Peeling, MathWorks

Explore the latest advancements for accelerating the delivery of financial services, and discover how cloud-connected and cloud-hosted software, including private and public cloud options, can bring AI-powered financial services to market. In the midst of the hype, AI, deep learning, and large language models are rarely acceptable for production use unless their development and implementation have been carried out in a transparent, explainable, and reproducible manner.

See how a ModelOps platform with data governance, continuous integration, and testing is built from reference architectures to leverage the growing plethora of technologies in a coherent, reproducible, and transparent workflow. This will include:

  • MATLAB® in JupyterHub and data science platforms.
  • Verification, documentation, and management of deep learning networks.
  • Deployment and monitoring of containerized models.

Published: 22 Oct 2024

Yes, so thank you for the introduction. I'm a consultant at MathWorks. And we're going to be discussing innovations in how we deliver and bring AI-powered financial services to market. Hopefully, in this talk, you'll get some understanding of making AI ready for production. And I'll cover some key principles of developing these models with explainability and reproducibility in mind.

So I'm going to start off with this AI services roadmap. It's a cycle that outlines the critical stages in delivering AI-powered financial services. And it's intended to ensure that each step, from problem identification on the left all the way around the cycle to continuous improvement, is addressed systematically. You may have seen the DevOps infinity diagram. It's quite similar.

That also emphasizes the continuous nature of, and iteration between, developing these models and then operating them in production, and there are some ideas around collaboration and integration in these teams. So I want to talk through various points of this roadmap as we go through the talk, but I want to take it on a particular theme. And it's around confidence.

So if you're ever going to bring an AI model into production, you need to build confidence in that model. And one way to build that confidence is through continuous feedback. So the green arrows you can see in this diagram are going backwards, mostly indicating that each later stage can inform and improve previous stages. But also notice the area I've highlighted at the AI system verification level. The information you get back from this feeds back into all sorts of different design stages. I'm going to spend a bit of time on that in particular.

So today, I'm going to focus on five areas-- not everything, but five key areas here. And the first one, as mentioned, is data, which is critical for building these models. So how can you ensure your data is high quality? How can you put robust management around it? And how can that data be validated?

The second one is AI model development. And I specifically want to talk about the current state of the art in AI, which is LLMs and generative AI, and give some ideas of how you would develop models using those techniques. Then I'll move into the verification portion. Hopefully, you'll find some new and interesting techniques in these areas, which apply generically to AI models.

And finally, we're going to be talking about the deployment and monitoring of those models-- that is, where you verify the model behavior after its deployment. And underpinning all of this is the concept of an AI governance framework. So that's going to form the five parts of the talk.

I'm going to dive straight in with data, which is, I'm sure, of interest to many of you. So let's talk about data first. And then we'll talk about Jupyter. Jupyter is one of the most popular, de facto standard environments for data science, and there are several reasons why.

First is the use of notebooks to experiment with and analyze data. Second is support for multiple languages, so you're not restricted to a single language-- MATLAB is one of the key supported languages in the Jupyter environment. And then there is a lot of integration with data platforms and libraries, and we'll see some examples of this later.

So let's talk about MATLAB running in Jupyter. Through your existing Jupyter-based infrastructure-- JupyterLab, JupyterHub-- you can access MATLAB. You can use this integration with any environment running on premises or in the cloud. And when you then open up a MATLAB-based notebook, you can edit and run MATLAB code and see visualizations, and you don't need to switch between multiple environments.

You're not limited to the notebook interface, however. You can also open up a browser-based version of MATLAB directly from here. This will contain the apps and the other interactive capabilities that you're already familiar with from the MATLAB desktop.

An example of that would be the Data Cleaner app. What's quite interesting here is that we've got the data that we loaded in the Jupyter environment, and because MATLAB is co-located with that data, we can start using MATLAB capabilities alongside other tools for cleaning, standardizing, normalizing, and ensuring the quality of this data.
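As a rough illustration, the same kind of cleaning the Data Cleaner app performs interactively can also be scripted. This is a minimal sketch assuming a hypothetical timetable named prices with a numeric Close variable:

    % Minimal programmatic cleaning sketch; "prices" is a hypothetical timetable.
    prices = fillmissing(prices, "linear");              % interpolate missing values
    prices = rmoutliers(prices, "movmedian", days(30));  % drop outliers vs. a 30-day moving median
    prices.Close = normalize(prices.Close, "zscore");    % standardize for modeling
    summary(prices)                                      % inspect the cleaned result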

So that's MATLAB in Jupyter. But what about a larger scale? Much of our data, and the way we run our models and algorithms, is now based in the cloud. And although we can run MATLAB in fairly low-level scenarios, many of you out there will now be looking at data management platforms such as Databricks, which might be running on Azure or AWS.

And so we're going to look specifically at the MATLAB interface to Databricks, which enables users to connect to data and compute managed by Databricks, so you can query big data sets remotely or deploy your MATLAB code to run on Databricks.

I've got a couple of videos from one of my colleagues showing this in action, which I'm going to show you a couple of screenshots from. On the left here you see MATLAB, and on the right is a set of Databricks clusters. From MATLAB, I am able to spin up a Databricks cluster dynamically, and I can even set the scale for it. You can see that cluster is now starting up on the right-hand side in Databricks. It's very simple to connect up there, and it will obey all of your organizational permissions and so on.

From there, we can then go and work with a data set that's located completely remotely. This data set happens to be in some S3 buckets in Amazon, but I'm able to connect to it through Databricks and work with it as a Spark DataFrame. Here, I'm able to filter, test, plot, and visualize the data stored there. And only at the point where I actually want to visualize the data do I need to bring any of it back into my MATLAB session, as you can see here.
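The demo uses the dedicated MATLAB interface for Databricks; as a simpler sketch of the same remote-query idea, you can also reach Databricks tables from MATLAB through Database Toolbox over a configured ODBC/JDBC data source. The data source name and table below are hypothetical:

    % Minimal remote-query sketch via Database Toolbox; "DatabricksDSN" and the
    % "default.trades" table are hypothetical placeholders.
    conn   = database("DatabricksDSN", "", "");   % credentials handled by the DSN
    trades = fetch(conn, "SELECT symbol, price, trade_time FROM default.trades LIMIT 1000");
    close(conn)
    head(trades)                                  % only this small result set comes back to MATLAB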

And you can take this even further. Here is an example of looking at some of the data for a maintenance problem, and also building some particular functions and visualizations while that data stays located in the cloud. So I just wanted to give you a snapshot of the ability these days to access your data and bring your MATLAB code closer to your data. Now that we've got managed, connected data, we can start doing some modeling with it. And we're going to particularly talk about large language models and generative AI.

OK, so I'm not going to talk about developing a large language model-- that is out of reach of pretty much all of us here. But there are ways you can use existing models to build on. The first way of using large language models is the familiar prompt engineering interface, which means you're directly querying the model with your data and getting results back from it.

The second one is retrieval-augmented generation. This is a concept you may have heard of which enables you to pass your own data as context to the model, so the results are grounded in your own data. And then fine-tuning is the ability to take some additional data and train the model to the particular problems and the characteristics of the data that you need.

For the first one, most of us will have now used ChatGPT for coding and modeling problems. I can ask ChatGPT, for example, to generate some MATLAB code, and that can form part of my model. This can then be refined to get more realistic and established results back. So you can use prompt engineering to say, I want to have data in this format and my results returned to me in this format, and it will use its language and code processing to achieve that. That's probably one of the simplest use cases.

But typically, when we're going to operationalize something, we need to use this a bit more programmatically. The manual workflow looks like this: you have a bunch of documents that you're going to summarize. Each one of those you need to copy and paste into the chat interface, assuming it doesn't run out of space. You ask for a summary, and then you copy and paste that back.

If you want to automate this process, you effectively read those documents in a loop, use an API instead of the chat interface to create the answers, and then automatically get the results back from ChatGPT and put them into a file. MATLAB comes with a range of APIs to allow you to connect to LLMs, and I'll provide links to those at the end of the slides. So this is the case for automating all of these processes in order to build modeling and data applications using an LLM through an API that we provide.
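As a rough sketch of that automated loop, assuming the open-source llms-with-matlab add-on (its openAIChat and generate interface), an API key set in the environment, and a hypothetical folder of plain-text reports:

    % Minimal automated-summarization sketch; folders and prompts are hypothetical.
    chat  = openAIChat("You summarize financial reports in three bullet points.");
    files = dir(fullfile("reports", "*.txt"));
    for k = 1:numel(files)
        doc = fileread(fullfile(files(k).folder, files(k).name));
        txt = generate(chat, "Summarize this report:" + newline + doc);
        writelines(txt, fullfile("summaries", files(k).name));   % one summary file per report
    end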

Let's talk about retrieval-augmented generation. This enables you to ask questions about your own data. One typical problem here: we have an existing model, and it's got some missing data in it-- maybe one of the fields is missing. What should I do?

In order to build an answer to that, you're going to need some existing model documentation, maybe some financial reports. On the basis of those, you then search through the reports to find information relevant to your questions, such as: which column of data was missing? What is the meaning of that column? And what should I do with that information?

And we have the MATLAB Text Analytics Toolbox, which can perform that search very quickly in memory and then add that information directly to your query. You then call the model exactly as before. So although retrieval-augmented generation may sound quite complicated and involved, it isn't particularly. It does involve some aspects of searching and retrieval, which you can do very quickly.

And it also involves the prompt engineering concepts that I talked about before. So this is a very accessible mechanism to take existing data and use a large language model to help answer more relevant questions, with fewer hallucinations than you would get if you were just working with data that the large language model had already seen.
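To make that concrete, here is a minimal retrieval-augmented generation sketch, assuming Text Analytics Toolbox for the in-memory search and the llms-with-matlab add-on for generation; the string array docs of documentation passages and the question are hypothetical:

    % Minimal RAG sketch: retrieve the most relevant passages, then pass them
    % to the LLM as context. "docs" is a hypothetical string array of passages.
    question = "The LTV column is missing from the input data. What should I do?";
    scores   = bm25Similarity(tokenizedDocument(docs), tokenizedDocument(question));
    [~, idx] = maxk(scores, 3);                          % three most relevant passages
    context  = strjoin(docs(idx), newline);
    prompt   = "Answer using only this context:" + newline + context + ...
               newline + "Question: " + question;
    answer   = generate(openAIChat("You answer questions about model documentation."), prompt);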

Finally, in this section, I just want to talk about fine-tuning. If you've got a more well-scoped task, you can fine-tune these existing models. A very popular model is BERT, which is a language transformer, and you should be able to modify and tune these kinds of models for your own purposes, even with desktop computing capabilities.

BERT itself is pre-trained on very large data sets with self-supervised training, which means those data sets weren't fully labeled. But you can then add an additional, completely labeled data set in order to fine-tune BERT to help you perform a specific task. Some of the applications could be document and word classification, and again, some lower-level question answering than you can achieve with ChatGPT, for example.

So I hope that was helpful-- some ideas and thoughts on how you can create new modeling and application capabilities using large language models as the basis. Now, let's go into the explanation and verification of those models. Many of you, when you've used things like ChatGPT, will be very well aware of their limitations. They are capable of hallucinating, which is a fancy way of saying they will just generate nonsense or things that are factually incorrect.

ChatGPT also struggles to provide citations for where it got its information from; sometimes it isn't able to construct that information. So when using any of these models, you should clearly verify the output.

Now, the techniques I'm going to cover in this section aren't specific to large language models. They're more generally applicable to any deep learning network, which is a model that is far more complex than a traditional ML model and therefore needs some more advanced techniques to verify.

One area I'm really excited to present to you today is an area where MathWorks is doing some advanced research and publishing the results on GitHub as we go, and that is constrained deep learning. This is the first type of model explainability, which we call intrinsic explainability: the models themselves have some properties that lend themselves to being explainable and verifiable. I've got three of them here: monotonicity, boundedness, and robustness. And we'll go through each one in turn. I'm not going to cover the maths, but I want to give you a little bit of intuition and insight behind each one, so that when you look for these and study them more, you've got some idea of what you're looking for.

The first one is monotonicity. In many financial models, it's pretty important that the model behaves in certain ways with respect to its inputs. A simple example might be an econometric model where a higher GDP should result in a more positive outcome, whatever that model's output is. So even if that model is trained on very large amounts of data with very complex decision-making going on, you'd still expect to see that linear or monotonically increasing relationship in the output of the model.

There are methods that can be applied to deep learning which can give you convexity in your outputs. As you can see in this diagram, the red line is the convex fit, even though the training data, and a standard fitted model, decreases and increases over smaller scales. The convexity is a desired property of this particular type of network. This means that you can drop in deep learning equivalents to replace existing models if they have these properties.
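The constrained architectures themselves come from the research linked at the end; as a complementary, much simpler sanity check, you can empirically probe monotonicity by sweeping one input while holding the rest fixed. This sketch assumes a hypothetical trained model mdl with a predict method, a representative feature row xBaseline, and the index of the GDP feature:

    % Empirical monotonicity check -- a sanity test, not the formal constrained approach.
    % "mdl", "xBaseline" and "gdpIndex" are hypothetical placeholders.
    gdpGrid = linspace(-0.05, 0.05, 101)';            % sweep GDP growth from -5% to +5%
    X = repmat(xBaseline, numel(gdpGrid), 1);
    X(:, gdpIndex) = gdpGrid;                         % vary only the GDP feature
    y = predict(mdl, X);
    assert(all(diff(y) >= -1e-8), "Output is not monotonically increasing in GDP")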

The second one is boundedness. This is particularly important for forecasting models. Time-series forecasts built with neural networks such as long short-term memory (LSTM) networks can have the property that their outputs will always remain within a certain interval over time. This is just a plot where the two outputs of this network are superimposed on their time vector. Without boundedness, they might go negative or produce erroneous values. So this is another property you can expect.

The third one, which I'll go into in quite a lot of depth, is robustness. You'll be able to see the slides later and study this, but the overall picture is that we're talking about the sensitivity of the output with respect to small changes in the input data, and we're able to formally verify how stable the network is. The outputs of this formal verification are shown here: this is the set of verified properties, and this is the set of unproven or non-verifiable properties.

You can see that the pattern changes based on this criterion, the Lipschitz upper bound, which is a measure of the stability of the network. So this can be used to assess the robustness of networks-- that is, given small changes to the inputs, they won't create unusual or spurious results. Those were three aspects of constrained deep learning, which you'll see links to later.
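For the robustness part, a minimal sketch, assuming the Deep Learning Toolbox Verification Library and a hypothetical classifier net with a formatted dlarray of observations X and their labels:

    % Minimal formal robustness check: can a perturbation of size epsilon flip the class?
    % "net", "X" and "labels" are hypothetical placeholders.
    epsilon = 0.01;                                   % perturbation radius per input
    result  = verifyNetworkRobustness(net, X - epsilon, X + epsilon, labels);
    summary(result)                                   % counts of verified / violated / unproven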

I also want to talk about post-hoc explainability, which is where you look at the outputs of networks and try to understand why a certain decision was made. In this case, we've actually taken a network that isn't particularly robust. We've got a time series with three channels, and you can see that we've modified a small window of the input data. This has led to a complete flip in the output of the network and a misclassification.

There is a technique called Grad-CAM, which you typically see on images: if you're doing the usual cat-dog image classification, you can see what parts of the image contributed to making that decision-- usually the ears and the facial features of the puppy, but not the background. Well, it's quite useful for time series as well.

Here, the color map highlights the areas where the gradients of the input affect the decision, superimposed on the changes we made. You can see it has done a pretty good job in this case of mapping the network's decision back to the regions of the input that contributed to it. So this is a very good example of being able to explain a time-series-based decision directly superimposed on the input data.
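A minimal Grad-CAM sketch for this kind of time-series explanation, assuming Deep Learning Toolbox; the trained network net, the input sequence X (channels by time steps), and the predicted class are hypothetical placeholders:

    % Per-time-step Grad-CAM attribution for a sequence classifier.
    % "net", "X" and "predictedLabel" are hypothetical placeholders.
    map = gradCAM(net, X, predictedLabel);   % importance of each time step for that class
    figure
    plot(X')                                 % the input channels
    yyaxis right
    plot(map, "LineWidth", 2)                % overlay the Grad-CAM importance map
    title("Grad-CAM attribution over time")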

So that's a range of different techniques for both building explainable AI models and verifying them. I hope you find them useful, and there's plenty more you can look into here. Now, let's say we've got a verified model and we want to take it into production and monitor what's going on when it's live. The idea here is that the verification that happened pre-production continues into production and beyond. And later on, when we get to governance, we'll see how a governance framework helps us establish what we should do when a model starts to diverge from its expected behavior.

First, I'm not going to talk a huge amount about this, but it's key to know that any MATLAB code or model can effectively be containerized and deployed into Docker containers. At that point, they provide a REST API, so you can deploy them as microservices.

You can deploy them into MATLAB Production Server and expose them that way, or you can deploy them into custom frameworks in Java or Python, for example, and expose them that way. In all these cases, this means that the model can then be connected up to production data and production decision-making systems.
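As a minimal sketch of the microservice route, assuming MATLAB Compiler SDK and Docker, with a hypothetical scoring function scoreModel.m:

    % Package a scoring function as a containerized microservice image.
    % "scoreModel.m" and the image name are hypothetical placeholders.
    results = compiler.build.productionServerArchive("scoreModel.m", ...
        "ArchiveName", "scoremodel");
    compiler.package.microserviceDockerImage(results, ...
        "ImageName", "scoremodel-microservice");

The resulting container exposes the function over a REST endpoint, so production systems can call it like any other microservice.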

I'll talk briefly about one of the use cases first. The most common and easiest thing to start with is monitoring data drift. Data drift refers to the fact that most of the models we are building are effectively, intrinsically stateless: if you give them the same data over time, they'll give you the same outputs over time. But this doesn't happen in practice.

In practice, you will see that the model outputs do vary, and they can vary on different time scales as well. I've got a link here to some different types of drift that you see in practice. Sudden drift changes at a point in time, which could be due to some extrinsic event. Gradual drift is a slow shift in the distribution. Incremental drift is another type of shift. And sometimes, the model goes back to where it was.

So when you're thinking about monitoring models over time, don't just think about sudden drift, although that may be the easiest to detect. Also think about all of these different types of drift that can take place. One example we've seen recently is negative interest rates, which caused all sorts of modeling problems. Can your model handle this? Could you have caught it with monitoring first?
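As a starting point for this kind of check, a minimal sketch assuming Statistics and Machine Learning Toolbox and two hypothetical tables with matching variables-- the training-time baseline and recent production inputs:

    % Minimal drift-monitoring sketch comparing baseline and production inputs.
    % "baselineData" and "productionData" are hypothetical tables of model inputs.
    dd = detectdrift(baselineData, productionData);
    summary(dd)                        % per-variable drift status and p-values
    plotDriftStatus(dd)                % visualize which inputs have drifted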

Also, going back to the large language model example, there are limitations, for example, in the size of the input that a large language model can accept. And when the input gets truncated, users can get very peculiar results that they're not expecting. How might we detect these?

Going back to the diagram of how we deploy these models, the way that you answer these questions is through a technique called instrumentation. Instrumentation enables external systems to access the internals of the models in a coherent way. This is not just information about the model inputs and outputs, but also information from inside the model, or how long the model took to execute, and so on-- things that are not directly exposed to users.
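To illustrate the idea only-- this is a conceptual sketch, not any particular product's API-- here is a wrapper around a hypothetical scoring function that records latency and simple input/output statistics for an external monitor to consume:

    % Conceptual instrumentation sketch; "scoreModel" and the log file are hypothetical.
    function y = instrumentedScore(x)
        t0 = tic;
        y  = scoreModel(x);                            % the wrapped model
        rec = struct("timestamp", string(datetime("now")), ...
                     "latencySeconds", toc(t0), ...
                     "inputMean", mean(x, "all"), ...
                     "output", y);
        fid = fopen("model_telemetry.log", "a");
        fprintf(fid, "%s\n", jsonencode(rec));         % one JSON record per call
        fclose(fid);
    end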

So we at MathWorks have built a solution called Modelscape Monitor. What this does is enable you to monitor everything external to the model, but model developers can also add their own instrumentation within their code. All of this is captured into a database, and that database is then exposed through dashboards.

The dashboards can provide diagnostics of how the model is doing, both visually in terms of the drift concepts that I was showing before, and by indicating whether or not some of those statistical measures have exceeded certain thresholds, so that the model should be stopped or a warning flag should be raised and the model investigated.

The last part of the talk that leads on from this is: how do you deal with these results in practice? How do you deal with misbehaving models that are subject to data drift or other forms of unexpected behavior? And this is about AI governance. Now, there's increasing focus on the use of AI. The EU AI Act is having some impact on businesses that are using AI models in any part of their business, but especially anything that is consumer-facing.

And so companies, even outside the finance industry, are now having to grapple with some of the questions that are being raised here. They're starting to realize that they need to build up an AI governance capability for their models where they had nothing before. Now, at MathWorks-- and actually in the industry-- the use of models for business decision-making is nothing new at all. We have SR 11-7, the Federal Reserve letter from 2011, and, of course, SS1/23 and other regulatory papers.

These provide great guidance on how to build a model risk management capability, including governance, that can be applied to AI models. In our discussions with the regulators, there is a very strong feeling that there is no reason why AI models can't be managed in the same way as other models, with just some differences in the data.

And so for this, we have Modelscape governance, which is built for managing model risk for all sorts of models. Here, we're showing it applied to SS1/23. This covers, among other things, understanding the metadata associated with these models.

And here's another screenshot showing you a typical workflow. This is a workflow going from submission of a developed model, through its validation stages, to final approval. And that is where the explainability techniques we were talking about earlier could be used, for deciding whether or not a model is ready to go into production.

So all of this comes together in a solution we call Modelscape, which supports the entire workflow that MathWorks has built for AI models. Today, we've just shown you a few snapshots of the governance, monitoring, and validation of such models. We've been running a series of webinars over the last year on various capabilities that we don't have time to go into today. But please do speak to the MathWorks team if you'd like to learn more and understand how you could apply this workflow to your own modeling.

And with that, I'd just like to conclude that we've covered these five specific areas in the AI roadmap. We could spend many, many more sessions and many more hours on it, but I think these are some of the key areas worth highlighting to give you some ideas. The talk doesn't really stop there, though-- you will get these links here, and there's quite a lot to explore.

I have touched upon every single one of these links, which point both to GitHub, where a lot of our open source code contributions are, and to MathWorks.com capabilities. So you can try out and look at any of these things and the examples that I've shown you today.

And with that, I would like to thank you for your time and attention. I hope we may have some questions in the chat; if not, then we'll definitely hear from you later. Thank you.