JPMorgan: data is fuel for the business

December 14, 2023 by Chris Skinner

Building on yesterday's blog, I just stumbled across a podcast called Tech Trends by JPMorgan Chase – everything to do with tech, digital, data and fintech, broadcast by a bank. Fantastic.

The first show I downloaded is all about organising data to fuel growth. Interestingly, their Chief Data Officer defines data as the fuel for the business and argues that too much data is treated as exhaust: data that is collected after it has been used, rather than predictive data patterns that can be used as the fuel for growth. I liked this analogy, and it builds upon my analogy about data as the air that we breathe.

Anyway, I enjoyed the podcast so much that I got it transcribed and highlighted the best bits, so here they are (skip to end of blog if you want to hear it or watch it) …

Anish Bani, Chief Product Officer for Commercial Banking, JPMorgan Chase:

Welcome to Tech Trends. Tech Trends is a podcast series that provides perspective on the latest trends in technology, FinTech, and digital. On today’s episode, we’re going to talk about what it means to be a data-driven business and how data can help you to grow and retain business. I’m Anish Bani, Chief Product Officer for Commercial Banking, and joining me today is Steve Turk, Chief Data Officer for Commercial Banking here at JPMorgan Chase. Steve, welcome to Tech Trends.

Steve Turk, Chief Data Officer for Commercial Banking, JPMorgan Chase:

Happy to be here, Anish.

Anish:

So Steve, data is at the core of any successful business. It enables informed decisions, enhances business performance, streamlines operations, strengthens customer relationships. What are the key components to building a data-driven business?

Steve:

First of all, let’s talk about what data means. Data literally means all data, whether it’s structured data that sits in relational databases; unstructured data, for instance output from your operating systems; data generated from the public outside your firm; machine learning models; all of it. When we talk about data, our strategy needs to incorporate data from everywhere. The second piece is it’s really a cultural change. When you think about data, historically there have been data experts who knit together information from systems and put it into a data warehouse. We call that data as exhaust because data is really a secondary function from the core underlying business. Where we want to go is data as fuel. So from all of the systems, we bring data into a central place, normalize it, and use that information not only to represent individuals, a single version of truth, but also to generate valuable insights about where the business is going.

Because we now have data in the context in one place, this really requires a top-down direction from the executive team and a commitment to move to a world in which all decisions are made, supported by data, and the results of those decisions are then reported on using data as well. The second piece is that it’s a multi-year commitment. This stuff does not happen overnight. You don’t go from multiple data warehouses to a single data-driven business in 30 days. This is a multi-year effort that requires continuous executive support and investment. And the final piece is with that data team, making sure that they’re always looking to three years down the road at what the emerging technologies are that can solve some of the data problems that they’re encountering today. As an example, when we started this process many years ago, some of the capabilities that we’re now incorporating, things that are enabled by the public cloud, weren’t available to us in year one, but we structured our strategy so that when technologies became available, we could then fold them into our overall strategy and improve our data-driven capabilities.

Anish:

I love that analogy of data as exhaust versus data as fuel. Can you explain that a little bit better? What do you mean by companies treating data as exhaust?

Steve:

Well, if you look around historically, if you looked around a typical organisation or an enterprise, data sits in a couple of places. First of all, it sits in applications. So applications that are used for sales or to provide customer service or to keep track of financial reporting. Those systems house their data in data islands and they help to perform an individual function. But what they don’t do is, because they’re in islands, they don’t build a central knowledge set that can be used to not only better understand how the business is working today, but also to become much more predictive about where the business can go in the future. So we call that first situation, in which data is in different systems and it’s brought in secondarily to a single data warehouse to understand what’s been going on, exhaust. What we mean by data as fuel for the business is we bring together information in a consistent way, but we also use that same information in those same applications. So people using those applications can make better decisions using all the data available within our organisation.

Anish:

Got it. So are you saying basically that you take data from different places, you bring it all together, you normalize it, you correlate it, and then you’re able to sort of see patterns when you put all that data together that you wouldn’t otherwise see if they’re sitting in those islands or those individual databases or things? Is that it?

Steve:

That’s it. And a couple of great things happen when you bring data together to see patterns. First of all, those patterns enable people to think about what else can we do with this data, now that it’s all put together. So you get innovation out of pulling together a much more complete view of what the data in the business looks like. The second is it becomes fuel for all the advanced data capabilities that you hear about in the business publications, how to use machine learning and how to use AI to become much more predictive about where the business is going.

Anish:

So let’s talk about some of that, the technology as well. Technology’s rapidly changing, keeps getting better. You mentioned data warehouses. We’ve been talking for the last couple of years about data lakes and, increasingly now, you hear about lakehouses and things like this as well. What trends and technologies do you see emerging in the data space?

Steve:

Well, I think trends and technologies in the data space, I mean, it’s a constant subject area. The lakehouse terminology is a relatively new one, at least within the enterprise. A couple of years ago people would talk about data lakes and they would talk about data warehouses, and they wouldn’t consider them part of the same subject area. To me, what a lakehouse structure means is moving to much more of, I’m going to call it, almost a factory-type approach to bringing data together, normalizing and interpreting it, and then using that as fuel for insights to drive the business going forward.

Anish:

Well, you talked about machine learning. We’ve done a couple episodes on machine learning and it’s all the rage I’ve said for the last couple of years. Nobody knows what the question is, but the answer is definitely machine learning. But you’ve also been very vocal that, hey, before you get to really take advantage of machine learning algorithms and models and artificial intelligence and things like that, there’s a lot of work to be done on the data foundation and getting your data ready. You don’t really earn the right to do machine learning until you get your data in a form that you can take full advantage of it. Can you talk a little bit more about that?

Steve:

Yeah, so to that point Anish, moving from data called architecture to data enablement through AI/ML is a journey that does not happen overnight. If other organisations are like our own, initially people leapt feet first into solving problems using these fancy machine learning techniques. We were able to answer a one-time question. But what was difficult was getting adoption of those models across the firm for a couple of different reasons. Number one, it was difficult to get the actual model output for end users to use. Second of all, which is probably most important, is we found that individual machine learning initiatives were having to solve exactly the same thing. They were having to solve the context of the problem. They were having to solve the data wrangling, bringing the data together in a central place. They were also having to figure out how to get to an audience.

So what we did is say if you look at that problem, to get to higher order machine learning type algorithms in the business, what we really needed to do is build a very strong foundation. The foundation is bringing together information in a consistent way with a consistent context. As we used data to drive our machine learning models, the context was reusable and those models began to complement each other on a use case by use case basis.

That idea of a foundation for context, to then earn the right to be able to start building machine learning algorithms that were complementary, was a big breakthrough moment for us.

The final piece is that last mile, getting the right information and the right algorithm output to the right person is a very difficult one. We had to build that capability of a channel to get our audience, not only that first insight but also all the other insights that we could offer against a given client or a given business process. So solving the foundational piece, and solving the ability to get distribution in a consistent way for our machine learning insights, was a key part of our journey. That sequence, to go back to your original question, the sequence and the order of doing things is very, very important if you’re going to achieve scale in your data initiatives.

Anish:

I heard a couple very important points in there. Number one is people have a tendency to jump feet first into machine learning just because it is this hot new thing that everybody wants to go into. But to really build a repeatable evergreen process, you have got to spend the time on that data foundation. You have got to get to the point where the data and the models are both in the same environment, and you can learn and train on data and other things like that. I heard that once you have models in that evergreen process, they can start building up on top of each other and you get an exponential network effect, where each model learns from the last. The third thing, which is an important one, is that you can build all the models but unless you can distribute the insights from those models in context, how people consume things in their day-to-day work life or other things like that, it’s not going to have the same value. Did I get that right?

Steve:

You got that exactly right Anish, and I think the most important thing is that the experience you learn by working in that sequence not only compounds itself, in terms of the models that you’re able to do, but you build muscle memory in the organisation for the right way to do things. So the time to that next insight is reduced the more experience you get working within a common data platform and dataset, as well as a known, available distribution channel to get to the end users.

Anish:

So many corporations, and we’re no different here at JPMorgan Chase, are in the process of a large digital transformation. A big part of that is modernisation of the technology stack, the technology environment, moving to public cloud and leveraging microservices architectures and building autonomous systems. With all that as background, how do you deliver more value through data in that kind of an environment?

Steve:

So I think that first of all, the cloud itself is the single biggest enabler that I’ve seen within financial services in my twenty-year career. And when I say that, it’s because it benefits in multiple ways how we think about the business. So first of all, clearly cloud and microservices architecture and all the great things from a technology perspective mean a lower cost of ownership of technology. You get efficiencies in that place. We’ve certainly seen that in our firm. That’s a major reason for the growth in cloud adoption across enterprises and businesses of all sizes.

But when we go back to the original point, how do you build a data-driven business? A key part of the data-driven business is to get everybody operating off the same set of facts, and also to continue to extend the ability to add new facts and new insights in a way that accelerates the capabilities of a business.

What allows us to do that is the architecture of cloud. If you think about how to move data assets and how you can perform certain value added services, including machine learning in the cloud, you get the economies of scale, you get the consistency, you get the ability to incorporate new capabilities and new technologies from a road-mapping perspective, all in one place versus having to manage that across multiple different data environments and data warehouses. So truly a breakthrough in terms of not only efficiency, but the ability to create compounding knowledge and value.

Anish:

So we’re obviously a very big place. We get a lot of benefit from moving to the public cloud. How else do we optimize data to deliver greater value to our customers and to our business in a global firm the size of JPMorgan Chase?

Steve:

So the idea of creating a central data dictionary, the idea to come up with a common set of ways to calculate certain measures or certain attributes, and making that information available in a consistent, traceable way throughout the enterprise is a huge enabler because now you don’t have to have as many subject matter experts around the firm who know how those old systems work. You can continue to use those systems but convert their output to a normalised language, so that people throughout the firm can now collaborate using the same set of facts. It takes a lot of, I’m going to call it elbow grease and rolling up your sleeves, but once you get to that normalised dataset and agree to a common set of definitions, that’s where you really get the foundation to accelerate your higher value-add ML and AI capabilities on top of that same set of facts.

Anish:

And just to drive that point home, I mean that’s a lot of the data cleanup stuff we were talking about before. So you might look at one system and they reference a (US) state as a two digit or two character state, and another system over here they reference a (US) state with the full name of that state and you can’t correlate things together. So, to what you said before about data lakes and lakehouses to be able to bring all this data together, that normalisation is important. I heard somebody describe it once as, in the absence of that, you spend your whole life doing what they call archaeology, which is just getting into where does this data come from and how did I get it? And it’s a massive amount of work, but it’s really, really crucial and foundational to getting the value out of this, right?
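To make the state example concrete, here is a minimal sketch in Python of that kind of normalisation. It is not from the podcast: the lookup table is deliberately tiny and the function name `normalize_state` is invented for illustration. It assumes the agreed common form is the two-character code.

```python
# Illustrative only: a tiny lookup covering three states. A real pipeline
# would load the full list from a maintained reference source.
STATE_NAME_TO_CODE = {
    "new york": "NY",
    "california": "CA",
    "texas": "TX",
}
VALID_CODES = set(STATE_NAME_TO_CODE.values())


def normalize_state(value):
    """Return the two-character state code, or None if unrecognised."""
    cleaned = value.strip()
    # Already a known two-character code (in any letter case)?
    if cleaned.upper() in VALID_CODES:
        return cleaned.upper()
    # Otherwise try to match the full state name.
    return STATE_NAME_TO_CODE.get(cleaned.lower())


# Two hypothetical systems describing the same client's state differently.
crm_state = normalize_state("NY")
billing_state = normalize_state(" New York ")
print(crm_state == billing_state)
```

Once both systems map to the same code, their records can be correlated; values that come back as `None` are the ones that still need the "archaeology".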

Steve:

Yeah, and I’d say from a scalability perspective, when you think about what we were trying to achieve in our foundational data platform, what we tried to do is find all of those processes that were being done either manually or differently across the enterprise and come up with a single way to do them. This would mean that we can scale that and free up resources to do value add across the enterprise. It’s not something that happens overnight, as it’s something that we continue to work on, but the benefits are not only from an efficiency standpoint and ability to do higher order analytics, but also having the firm work in much more of a networked way, being able to communicate across organisations and across functions. You can’t underestimate the value of that.

Anish:

And it also feels like something that you’re never done doing, right? You have to have that evergreen process and always there’s a lot of care and feeding of the environment around this as well. So we talked about how we deal with something at a large global firm. Let’s look at the other side of that. What about smaller firms? A lot of our clients, a lot of our listeners may not have the same resources of a JPMorgan Chase. What advice would you give them for handling the growing demands of data management?

Steve:

Get an understanding of what is happening today in your firm around data. I think one thing that’s common across smaller firms as well as enterprises is that there are people in your firm that are doing your data work, and there may be more than one of them. They’re some of the most valuable people in your firm for helping senior management and helping the organisation work today. Find out who those people are, get them together and have them come to you and say, if we can only do this, we can solve these problems and make our company work better.

Data tends to be stored in systems we’re all familiar with, but moving information to the public cloud opens up a world of opportunity and helps de-risk your operations. With the people that know where the data that helps make decisions today comes from, what you’re able to do is convert what they’re doing today into something that is much more automated, and you also gain the ability to control that information, both from a risk standpoint as well as a privacy standpoint. You get the benefits of that too. So it’s really about elevating the people that work with your data today, listening to them and helping them build that data-driven future for your company. And I think you’ll see benefits not only from an efficiency standpoint, but also from a data risk standpoint and in addressing some of the privacy regulations, which we know will continue to come, along with a much more stable environment and the ability to extend it.

Anish:

Yeah, I think one thing to add to that as well, as you mentioned the concept of data dictionaries, and I think there’s the old story. About a decade ago we were talking to somebody and said, Hey, where do we keep all of this data? And the answer of course is there’s copies of things all over the place. I think one of the most important things that you and the team have done is define all those data elements and the definitive system of record that is the one true source of all of that data where everybody should get it from. Otherwise, inevitably in any firm of any size, you have somebody and they go into a database and they make a copy of the data and it goes over here, or they manipulate it a little bit and next thing you know you have got eight copies of the same set of data, all of which are just a little bit different. So I think that data dictionary work, that system of record work and that traceability, as you said, is super important.

Steve:

Investing in a data dictionary, and this is really important, you have to not worry about perfection, because using the best available data in a consistent way is going to get you along that journey and also point you to areas in which, hey, we need better data here. Let’s invest in getting better data so that we can continue to build our data-driven capability. It’s only in getting to that kind of strategy, in which there’s a commitment to build that single version of the truth of the data dictionary and always looking for the best source of information, the best data, and being able to augment what you’ve built. That’s how you move from a data archipelago approach, in which data exists in multiple places, to a data-driven business using the same set of facts.
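As a rough illustration of what a data dictionary with a system of record per element could look like, here is a minimal Python sketch. Nothing here comes from JPMorgan Chase: the element names, definitions and system names (`crm`, `finance_ledger`) are hypothetical, and a real dictionary would live in a governed catalogue rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str              # agreed business name
    definition: str        # agreed meaning or calculation
    system_of_record: str  # the one source everyone should read from


# Hypothetical entries, invented purely for illustration.
DATA_DICTIONARY = {
    "client_state": DataElement(
        name="client_state",
        definition="Two-character US state code of the client's head office",
        system_of_record="crm",
    ),
    "annual_revenue": DataElement(
        name="annual_revenue",
        definition="Client revenue for the last full fiscal year, in USD",
        system_of_record="finance_ledger",
    ),
}


def source_for(element_name):
    """Look up which system is the definitive source for a data element."""
    if element_name not in DATA_DICTIONARY:
        raise KeyError(f"{element_name!r} is not defined in the data dictionary")
    return DATA_DICTIONARY[element_name].system_of_record


def is_authoritative(element_name, system):
    """True if `system` is the system of record, False if it holds a copy."""
    return source_for(element_name) == system


print(source_for("client_state"))
print(is_authoritative("annual_revenue", "crm"))
```

The point of the sketch is the shape rather than the detail: every element has one agreed definition and one named source, so a copy sitting in another database can be recognised as a copy.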

Anish:

Yeah, I love that as well. Don’t let the perfect be the enemy of the good. Keep iterating and it gets better as you go. Right. So Steve, it’s impossible to have a conversation around data without also talking about privacy and security. Data, especially customer data, can be very sensitive. We have the same issue with non-public information that has to be handled with care, and there’s lots of regulatory requirements and controls required and handling procedures and things like that. What are some of the biggest security and privacy challenges that companies face around data and what do we do to address it?

Steve:

Security and privacy are incredibly important now, and will continue to be important in the future. So starting the process of understanding how to reduce your risk around data is really important. We’re not going to talk about the cyber risk, those bad actors that are trying to break into your company, but rather let’s talk about internal things you can do to protect your data and also make sure you keep up with the ever-growing regulation around data, in particular privacy.

The first step is to understand where your data is. If you’re like most companies, a lot of very important data is in spreadsheets that are spread around various data stores, local hard drives, shared drives around the firm, collaborative tools. There’s information that’s stored in collaborative tools and, if you look into this, the protection schemes, if there are any, are pretty rudimentary.

So we really have to take a step back and say, okay, we have to protect our data and, if people are moving things into these local data stores, we need to know where those are. We need to put adequate controls around that information and where it’s stored. The second piece is that if we are bringing our data out from things like data warehouses that individuals can access and pull information from, what we really need to know is what’s in those data warehouses. We need to be able to segregate data by data risk. Personal information, things like individual names and social security numbers, is the highest risk information. It’s really what the privacy regulation is all about. How do we protect that information and not use it in ways that are detrimental to the individual? Segregating that information, knowing where it’s at and limiting access to what we call PI, personal information, is critically important.

The second is if there’s information that is confidential, either from a company standpoint and agreements that they may have with your company or from an internal operations perspective: segregating that, knowing where it’s located and making sure that only the right people have access to that information. And then finally, there is data out there that is not risky and may be important from a data-driven standpoint to make sure that your firm can continue to grow and operate using things like a single version of the truth. Understanding what the relative risk is of information and building the appropriate controls, and also enabling individuals to operate with non-risk or less risky information, is also very important. And one of the things that we discovered around this is that less sensitive information can be used as fuel, if the information is available publicly.

And if that’s a key part of, for instance, sales and marketing efforts, then there is no sense in making that information under lock and key. We need to make sure that as many people can use that information as possible, especially if they love data, because new insights, new capabilities can come out of that information that can help your overall operation.

It’s really about democratizing data in a way that is also highly controlled as to who the people are and how they use the information. In summary, it’s about making sure you know where your data’s at and building controls around it. If you do have a data warehouse or are building a data lake platform, it’s about making sure you protect the riskiest, most sensitive data, personal information and what the company considers to be confidential information, and understand who’s accessing that information and how they can use that information. And then finally, if there is low risk information that would help your business run better, making that much more available to individuals so that people are not only working off the same version of the truth, but can also develop innovations off that information.
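The three tiers Steve describes (personal, confidential, low risk) can be sketched as a simple classification with role-based access. The Python below is an assumption-laden illustration, not a description of any real control framework: the column names and roles are hypothetical, and it assumes a strict hierarchy in which a higher clearance includes everything below it.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1           # e.g. publicly available information
    CONFIDENTIAL = 2  # company or client confidential information
    PERSONAL = 3      # PI such as names or social security numbers


# Hypothetical column classifications and role clearances.
COLUMN_RISK = {
    "industry_sector": Risk.LOW,
    "credit_limit": Risk.CONFIDENTIAL,
    "social_security_number": Risk.PERSONAL,
}
ROLE_CLEARANCE = {
    "marketing_analyst": Risk.LOW,
    "relationship_manager": Risk.CONFIDENTIAL,
    "privacy_officer": Risk.PERSONAL,
}


def visible_columns(role):
    """Columns a role may read; unknown roles see nothing (default deny)."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return []
    return sorted(col for col, risk in COLUMN_RISK.items() if risk <= clearance)


print(visible_columns("marketing_analyst"))
print(visible_columns("relationship_manager"))
```

Low risk columns are open to every known role, which is the "democratizing" half of the argument, while the most sensitive columns stay restricted to the few roles cleared for them.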

Anish:

Finally, Steve, I’m going to ask you to get your crystal ball out. So if you look forward three to five years down the road, what do you see as emerging trends in data and data science? What’s going to be in the headlines?

Steve:

Well, it’s a great question because I think, going back to where we started, one of the key parts of being data-driven is trying to predict where the world could go based on elements of data problems that we face today. One of the data problems that we face is that there are some repeatable machine learning tasks, whether it be trying to extract using NLP techniques, trying to extract information from documents or information from things like voice or even from video or pictures. That takes a heavy investment in a machine learning specialist to extract that value. Three to five years from now, that is going to be a commodity that we will be able to buy. We’ll be able to buy services at a very economical price that will interpret all of this information and extract the information from documents, from pictures, from videos that today takes a heavy investment internally.

The second piece is that the idea that data is one way, that you take a platform that brings together information, develops insights, and then pushes it out to individual users, is going to change: the individual users of that information are going to become the most important data source. So it’s not a one-way communication; it’s about what happens when we push information to individuals: what they do with that information; how they respond to it; how they improve upon that information; what the business results are. Those feedback loops will become much more real time. Today, we do get feedback loops, but it’s a very arduous process to bring that information back and say, what does it mean? How should we change models?

The ability to continually improve models based on individual user interactions in near real time is something that we certainly expect to happen, but it requires thinking about what’s the right architecture now to enable that to happen.

The capabilities around AI and ML, and being predictive and automated, will only continue to get better. And so those things that really get us to more real-time insights, more real-time feedback, more ability to understand the intent and interpret any type of data stored, whether it be documents, voice, email, video, pictures, those things are going to be part and parcel to how a data-driven organisation works three to five years from now.

Anish:

It’s fascinating. So let’s see. We’ll keep looking at the crystal ball and see how many of those things come true going forward. Steve, thanks very much for joining us and providing your insights on this fascinating topic.

Steve:

My pleasure. I really enjoyed the conversation.

If you would rather watch the show, you can do that here …

https://www.youtube.com/watch?v=XGrKa-hyico

… and there are six podcasts out there to date. Find them at JPM’s Tech Trends page.

Chris M Skinner

Chris Skinner is best known as an independent commentator on the financial markets through his blog, TheFinanser.com, as author of the bestselling book Digital Bank, and Chair of the European networking forum the Financial Services Club. He has been voted one of the most influential people in banking by The Financial Brand (as well as one of the best blogs), a FinTech Titan (Next Bank), one of the Fintech Leaders you need to follow (City AM, Deluxe and Jax Finance), as well as one of the Top 40 most influential people in financial technology by the Wall Street Journal's Financial News.
