Last week I blogged about Chime’s problems when its third-party partner Galileo went down, blocking payments and cards for its five million users for a short time. I made the comment:
I would still claim that using cloud native third party partners in a technology ecosystem is far better than trying to do everything yourself
So it amused me when, yesterday, I got a call from BBC Radio 4 asking me to contribute to their consumer programme You & Yours on the new government report into bank IT failures. The mainstream (old) UK banks average ten outages a month, with Barclays Bank the worst:
- Barclays 33 (4)
- NatWest 25 (7)
- Lloyds Bank 23 (2)
- RBS 22 (7)
- Santander 21 (4)
- Bank of Scotland/Halifax 19 (2)
- HSBC 14 (5)
- TSB 12 (1)
- Metro Bank 8 (3)
- Co-operative Bank 7 (1)
- Virgin Money 5 (3)
- Nationwide 5 (0)
Figures cover 1 July 2018 to 30 June 2019 (figures in brackets cover the three months from 1 April to 30 June 2019)
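For anyone who wants to check the arithmetic, here is a minimal Python sketch that tallies the figures exactly as listed above. It is only an illustration: the names `OUTAGES` and `tally` are hypothetical, and it simply treats each brand's count as separate.

```python
# Outage counts per bank, as listed above:
# (12 months to 30 June 2019, 3 months from 1 April to 30 June 2019)
OUTAGES = {
    "Barclays": (33, 4),
    "NatWest": (25, 7),
    "Lloyds Bank": (23, 2),
    "RBS": (22, 7),
    "Santander": (21, 4),
    "Bank of Scotland/Halifax": (19, 2),
    "HSBC": (14, 5),
    "TSB": (12, 1),
    "Metro Bank": (8, 3),
    "Co-operative Bank": (7, 1),
    "Virgin Money": (5, 3),
    "Nationwide": (5, 0),
}


def tally(outages):
    """Return (12-month total, 3-month total, monthly rate for each period)."""
    year_total = sum(year for year, _ in outages.values())
    quarter_total = sum(quarter for _, quarter in outages.values())
    return year_total, quarter_total, year_total / 12, quarter_total / 3


if __name__ == "__main__":
    year_total, quarter_total, year_rate, quarter_rate = tally(OUTAGES)
    print(f"12 months: {year_total} outages, {year_rate:.1f} per month")
    print(f"3 months:  {quarter_total} outages, {quarter_rate:.1f} per month")
    # The bank with the highest 12-month count
    worst = max(OUTAGES, key=lambda bank: OUTAGES[bank][0])
    print(f"Most outages over 12 months: {worst}")
```

Summed as listed, that gives 194 incidents across all twelve brands over the year (about 16 a month) and 39 in the final quarter (13 a month). This is a raw sum across brands, so the headline average quoted above may have been calculated differently.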
You can hear what we said on the BBC programme here. The gist of it is that, after several major IT failures in the past year, the government is telling the UK financial regulators to get a grip on the banks. Putting this in proper business speak, here’s the report’s summary:
The current level and frequency of disruption and consumer harm is unacceptable
With bank branches and cash machines disappearing, customers are increasingly expected to rely on online banking services. These services, however, have been significantly disrupted due to IT failures, harming customers left without access to their financial services. While completely uninterrupted access to banking services is not achievable, prolonged IT failures should not be tolerated. The current level and frequency of disruption and consumer harm is unacceptable.
The Treasury Committee’s report has made a series of recommendations to overcome this and improve operational resilience, including ensuring accountability of individuals and firms, increasing financial sector levies to ensure that the regulators (which are the Financial Conduct Authority, Prudential Regulation Authority, and Bank of England) are sufficiently staffed, and ensuring that firms resolve complaints and award compensation quickly.
Key conclusions and recommendations
As an increasing number of people rely on accessing their banking online, the resilience and availability of digital channels is brought into sharper focus. The ability of firms to prevent, adapt and respond to, and recover and learn from, operational incidents such as IT failures is known as operational resilience. The number of IT failures is increasing, with the impact ranging from inconvenience or harm to customers through to threats to a firm’s viability. However, the lack of consistent and accurate recording of data on such incidents is concerning.
- The regulators must intervene to improve the operational resilience of the financial services (FS) sector, as has been required recently with financial resilience. To do so, they must also ensure that they have the appropriate skills and experience. If this proves challenging, the regulators should increase the financial sector levies to ensure that they can hire the staff with the expertise and experience required. While the role of regulators in supervising operational resilience is still developing, they must ensure that their approach is agile to adapt to changing risks. They must maintain a very low tolerance for service disruption by providing guidance on what level of impact should be tolerated. The regulators cannot allow firms to set their own tolerance for disruption too high, to avoid lax operational resilience.
- The regulators must use the tools at their disposal to hold individuals and firms to account for their role in IT failures and poor operational resilience. The Senior Managers Regime should be expanded to include Financial Market Infrastructure firms, such as payment systems. To ensure accountability for failures, regulators must have teeth and be seen to have teeth. However, we have yet to see a successful enforcement case under the Senior Managers Regime against an individual following an IT failure, which may be evidence of an ineffective enforcement regime. If future incidents occur without sanction, Parliament should consider whether the regulators’ enforcement powers are fit for purpose. The regulators must provide us with the outcome of their investigation into the TSB IT failure as soon as possible.
- Firms are not doing enough to mitigate the operational risks that they face from their own legacy technology, which can often lead to IT incidents. Regulators must ensure that firms cannot use the cost or difficulty of upgrades as excuses to not make vital upgrades to legacy systems. Given the potential for short-sightedness by management teams, if improvements in firms’ management of legacy systems are not forthcoming, the regulators must intervene to ensure that firms are not exposing customers to risks due to legacy IT systems. When firms do embrace new technology, poor management of such change is one of the primary causes of IT failures. As time and cost pressures may cause firms to cut corners when implementing change programmes, the regulators must adopt a proactive approach to ensure that customers are protected.
- There are many cases where FS firms use the same third-party providers, such as cloud services. The regulators should highlight potential concentration risks and consider whether mitigating action is required. Where common providers are systemic, the Financial Policy Committee should consider recommending regulation to HM Treasury. The cloud service provider market stood out as such a source of systemic risk. The consequences of a major operational incident at a large cloud service provider, such as Microsoft, Google or Amazon, could be significant. There is, therefore, a considerable case for the regulation of these cloud service providers to ensure high standards of operational resilience.
- As the impact on customers when IT failures occur can be harmful, firms are right to adopt a ‘when not if’ approach, ensuring that they have robust procedures in place in the event of an incident. When incidents do occur, poor customer communications can exacerbate the situation. Clear, timely and accurate communications must ensure that customers are aware of the incident and that they receive advice on remediation timelines and alternative access. When customers complain, the time taken for some customers to hear an answer is shocking and unacceptable. Firms must resolve complaints and award any compensation quickly.
The report also suggests that the three major regulators (the Financial Conduct Authority, the Prudential Regulation Authority and the Bank of England) lack the staff and experience to deal with the growing number of computer failures. As a result, it advises that the financial levies on banks will need to rise so that the regulators are adequately funded and resourced. Furthermore, on the cloud issue, it argues that a major operational incident at a large cloud service provider such as Microsoft, Google or Amazon could have significant consequences, and that there is therefore “a considerable case for the regulation of these cloud service providers to ensure high standards of operational resilience.”
Interesting, and something that will no doubt be reflected in regulation in many other jurisdictions.
My own view is that third party providers fall into two categories:
- Major cloud providers offering platform-as-a-service, such as Amazon and Microsoft (which just beat Amazon to a massive Pentagon contract); and
- Specialist cloud providers offering software-as-a-service for specific activities such as APIs for checkout (Stripe) or card payment processing (GPS)
On the former, of course there is a systemic risk. However, it is far more likely that Amazon, Google and Microsoft will look after their technology better than most banks, as they invented it, manage it, run it for companies worldwide, have disaster recovery planning more exhaustive than The Hitchhiker’s Guide to the Galaxy, and know what they are doing. In other words, I have more faith in cloud providers getting their platform operations right than in any bank.
On the latter, specialist firms provide a single, highly specialised processing service to thousands of clients. They also know what they are doing and know their specialist activity inside out. They can usually fix problems within hours, regularly do so, and can push updates maybe twice a day if needed. A bank doing the same thing has hundreds of generalists doing a specialist job that has to be integrated into half a century of legacy spaghetti. I know which one I have more confidence in.
So, the crux is that the government’s report is timely and notable. However, if you read it in full, it says little about the risk of IT failures in challenger banks, focusing instead on the big old legacy banks. In fact, I can’t find any mention of a challenger bank anywhere in this report. Take note.