The Reserve Bank of Australia (RBA) has lifted the lid on a crippling outage that halted transactions flowing through the New Payments Platform (NPP) in mid-October.
A report into the incident reveals hundreds of thousands of supposedly real-time payments sent by the public were delayed by anywhere from four hours to “more than five days”.
The official autopsy paints a bleak picture of the highly touted resilience of real-time processing between the central bank and the NPP, the relatively recently created consumer- and business-facing payments platform that has struggled to generate volumes on par with payment card schemes.
The outage’s post-mortem has major implications for Commonwealth agencies like Centrelink, Tax and Medicare, which use the RBA’s real-time infrastructure for their transactions, not least delivering welfare benefits and pensions directly to bank accounts.
Self-roasting regulator
The RBA is supposed to be the regulator and disciplinarian of payments outages hitting banks and payment schemes, but on Monday it instead found itself apologising to the very institutions it is meant to police.
“The RBA acknowledges the seriousness of this incident and sincerely apologises to industry participants and their customers for the widespread repercussions it caused,” the RBA said in the incident report, adding it had created a list of “action items”, including a review to “clarify communication roles between RBA and NPPA”.
The incident has cast a spotlight on whether the RBA is the most appropriate regulator of the payments system, given it remains a major infrastructure stakeholder and provider that previously helped forcibly create the NPP, which was controversially merged with BPAY and EFTPOS last year to create Australian Payments Plus (AP+).
It will also generate questions as to whether the NPP’s and RBA’s architecture is yet stable enough to absorb any shuttering of essentially fraud- and outage-free BPAY in an effort to consolidate platforms under AP+.
The oligopoly strikes back
Having been gradually forced to adopt the NPP by the RBA over the past decade, banks are now openly questioning whether efforts to expand NPP functionality using the Consumer Data Right, dubbed “action initiation”, are worth the risk.
In a submission to Treasury in October, the Australian Banking Association recommended “a full strategic assessment and a cost/benefit analysis be undertaken by the government to determine whether the cost of building for an action type is outweighed by the consumer benefit.
“Work should be undertaken to understand potential use cases, the scams, fraud and cyber risks, the utility to customers compared with alternative options, and the regulatory or technology barriers that need addressing ahead of implementing any action type,” the ABA said.
That’s before the consequences of the October outage hit.
Transactions aborted
According to the RBA, its systems and the NPP’s were hit when planned system work went awry.
“On 12 October at around 19:00, an operational error occurred during a planned Bank wide change using the software that provisions the RBA’s virtual servers. This error triggered a process that disrupted a significant number of servers in a random pattern over a period of approximately 25 minutes.
“The scale of servers affected was caused by a failure to comply with the RBA’s Technology Change Management policy and control gaps associated with the virtual server solution design contributed to the rapid propagation of the error. The incident affected multiple systems across the RBA,” the central bank said.
Timelines published by the RBA reveal that its Fast Settlement Service (FSS) started to croak just after 7pm. Successful settlement notifications stopped working, with the NPP then advising at 8.33pm “that some aborted transactions were occurring” but that the “number and extent of aborts was reasonably low”.
Then, at 9.21pm, the NPP “advised the RBA that there had been 408,000 aborted transactions in the previous two hours”, with system recovery teams then directed to put all of their attention into investigating the cause.
The timeline also reveals that despite the incident starting at around 7pm, an NPP Incident Response Group (NPP IRG) meeting wasn’t held until 11.15pm, at which it was made clear “that the disruption to NPP processing was widespread with a far greater percentage of transactions aborting”.
“Full redundancy for the FSS was restored by the afternoon of Saturday, 15 October,” the RBA incident report said.
Mind the gap
Perhaps the biggest eye-opener in the incident report is that while transactions may flow in real-time, transaction monitoring does not.
“There is a gap in the RBA’s ability to rapidly monitor and assess the business impact of FSS incidents on the broader NPP ecosystem. While settlement of FSS transactions was confirmed to be uninterrupted, there was no available internal option to check the end to end flows as part of the business impact assessment to identify settlement aborts.
“It also took the RBA too long to determine the extent of settlement aborts occurring. The RBA will investigate, and where necessary implement, improvements to its monitoring that could have detected this and discuss options with NPPA as to whether its participant communication options can assist,” the RBA incident report said.
It also noted system pings for failed transactions are not appearing as they should.
“There is also a potential gap in centralised monitoring of settlement aborts and timeouts. Unlike clearing message aborts, SWIFT (and therefore NPPA) does not currently have centralised alerting in place for settlement aborts, or the associated NPP payment message timeouts, that can be monitored by SWIFT or NPPA (aborts are sent to both sender and receiver).
“The RBA will seek guidance from NPPA and SWIFT about whether additional centralised alerting, or any potential alternatives, should be considered.”
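For a rough sense of what centralised alerting on settlement aborts could involve, the sketch below counts aborts inside a sliding time window and flags when a threshold is reached. Everything in it is hypothetical: the class name, the 10-minute window and the three-abort threshold are illustrative assumptions, and nothing here reflects how the RBA, NPPA or SWIFT actually monitor the FSS.

```python
from collections import deque


class AbortRateMonitor:
    """Hypothetical sliding-window counter for settlement aborts."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds  # how far back to look
        self.threshold = threshold            # aborts in the window that trigger an alert
        self._abort_times = deque()           # timestamps of recent aborts, oldest first

    def record_abort(self, timestamp):
        """Record one abort; return True if the window count has reached the threshold."""
        self._abort_times.append(timestamp)
        self._evict(timestamp)
        return len(self._abort_times) >= self.threshold

    def count(self, now):
        """Return the number of aborts still inside the window at time `now`."""
        self._evict(now)
        return len(self._abort_times)

    def _evict(self, now):
        # Drop aborts that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._abort_times and self._abort_times[0] <= cutoff:
            self._abort_times.popleft()


# Illustrative run: timestamps are seconds since an arbitrary start.
monitor = AbortRateMonitor(window_seconds=600, threshold=3)
alerts = [monitor.record_abort(t) for t in (0, 100, 200, 900)]
# The third abort falls within 10 minutes of the first two, so it raises the flag;
# by the fourth, the earlier three have aged out of the window.
```

A real deployment would need to aggregate aborts reported by every participant, which is precisely the centralised view the report says is missing.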
A billion-dollar, real-time payments platform that immediately lets you know when transactions aren’t flowing? Now there’s an idea.
This article was first published by The Mandarin.