How can data science support model management? A reverse stress test illustration using the STAMP€ IT platform


Financial stress testing generally involves a team of modellers working closely together, applying complex mathematical models to large and granular datasets. So, how can data science support the management of complex mathematical models, such as those used in the financial sphere?

STAMP€ IT is a data-science platform developed by Dr Jerome Henry and his colleagues at the European Central Bank. It can be used to manage the models employed, in the case at hand, for financial stress testing.

Read the original research:

Read more background research:

Handbook on financial stress testing:

Guide on reverse stress testing:

Disclaimer: The podcast builds on work by Jerome Henry that does not necessarily reflect the views of the ECB.




Image Credit: Adobe Stock / Sergey




Hello and welcome to Research Pod! Thank you for listening and joining us today.


In this episode we look at how data science can support the management of complex mathematical models, such as those used in the financial sphere. By way of example, we investigate STAMP€ IT, a data-science platform developed by Dr Jerome Henry and his colleagues at the European Central Bank that can be used to manage the models employed, in the case at hand, for financial stress testing.


Financial stress testing generally involves a team of modellers working closely together, applying complex mathematical models to large and granular datasets. The team must be able to adjust their models, as well as share and run multiple simulations swiftly and securely under strict confidentiality. Performing these tests effectively and efficiently requires an innovative IT infrastructure. Such an infrastructure can also help evaluate processes and controls within the organisation, as well as mitigate operational risk.


STAMP€ IT is a data-science platform where models can be stored and shared. Users can create and change models, and run and analyse simulations, in a secure environment. STAMP€ is flexible, adapting to changing needs, and scalable, so it can handle expanding workloads.


The platform can be used for both model execution and development. Team members can run a model and analyse the results, or they can change the equations and trial different versions of the model. Multiple models can be hosted, developed, and run, using programming languages including Python, R, and Matlab, covering many modellers’ standard needs.


Embedded in the platform is a project manager function that defines and monitors three types of users: Basic Users, who just want to press the buttons to run simulations and get the results; Advanced Users, who are allowed to make changes to the model assumptions and some equations, but only to a limited extent; and Developers, who have the autonomy to amend and fix both the models and the simulation designs.
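To make the three roles concrete, here is a minimal Python sketch of such a permission scheme. The role names follow the episode, but the action names and data structure are purely illustrative assumptions, not STAMP€ IT's actual implementation:

```python
# Hypothetical permission map for the three user types described above;
# the action names are invented for illustration, not taken from STAMP€ IT.
PERMISSIONS = {
    "basic":     {"run_simulation", "view_results"},
    "advanced":  {"run_simulation", "view_results", "edit_assumptions"},
    "developer": {"run_simulation", "view_results", "edit_assumptions",
                  "edit_model", "edit_simulation_design"},
}

def is_allowed(role: str, action: str) -> bool:
    """Check whether a user role may perform a given action."""
    return action in PERMISSIONS.get(role, set())

print(is_allowed("basic", "run_simulation"))  # True
print(is_allowed("advanced", "edit_model"))   # False
```

In a real platform these checks would of course be enforced server-side, but the nesting of permissions captures the hierarchy the episode describes.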


The project manager function also provides a framework for ‘stamping’ the models employed. Any version of the model has to be validated through an integrated governance process to ensure that both the basic and advanced users carry out simulations and generate results based on the “stamped” version at any point in time. This versioning and tracking of executions means that there is a record of which set of equations generated which results from which data, so any team member can safely and accurately reproduce a specific simulation outcome using a management-approved toolkit.
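The stamping logic can be illustrated with a small Python sketch: a toy registry, not the platform's actual code, in which a model version is identified by a hash of its code, only stamped versions may be executed, and every run is logged against its version and dataset so it can be reproduced later. All names and the one-line "model" are hypothetical:

```python
import hashlib

class ModelRegistry:
    """Toy registry mimicking the 'stamping' idea: only validated
    ('stamped') model versions may run, and every run is logged."""

    def __init__(self):
        self.stamped = set()  # hashes of approved model versions
        self.runs = []        # audit trail: version, dataset, result

    @staticmethod
    def version_of(model_code: str) -> str:
        return hashlib.sha256(model_code.encode()).hexdigest()[:12]

    def stamp(self, model_code: str) -> str:
        """Governance approval: record this exact code as a valid version."""
        version = self.version_of(model_code)
        self.stamped.add(version)
        return version

    def run(self, model_code: str, data_id: str, inputs: dict) -> float:
        version = self.version_of(model_code)
        if version not in self.stamped:
            raise PermissionError("model version not stamped")
        result = eval(model_code, {}, dict(inputs))  # toy 'execution'
        self.runs.append({"version": version, "data": data_id,
                          "result": result})
        return result

registry = ModelRegistry()
v = registry.stamp("capital - shock * exposure")  # hypothetical toy model
out = registry.run("capital - shock * exposure", "banks_2024Q4",
                   {"capital": 10.0, "shock": 0.5, "exposure": 4.0})
print(out)  # 8.0, logged against version v
```

Because the version is derived from the code itself, any edit to the equations produces a new, unstamped version, which is the property the governance process relies on.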


The STAMP€ toolkit is made up of three building blocks: Jupyter, Bitbucket, and MLflow. JupyterHub generates notebooks; notebooks are lines of standardised code that users can run to load data, read the model, perform the simulation, and then process, format, and store the results. Bitbucket is a code repository used to store master copies and variants of the models, while MLflow facilitates the comparison of different simulation outcomes under a variety of assumptions.


When users run simulations with different assumptions, codes, and banks, they naturally want to compare the results. For that purpose, they can use MLflow to prepare tables and charts in an easy and interactive way to, for instance, see the outcome of maybe 20 simulations at once. This particular functionality, namely to run and jointly report and assess a host of executions of a given model, proves especially relevant when considering a large set of alternative assumptions. This is specifically required for the conduct of so-called “reverse” stress tests.
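As a flavour of what such a comparison involves, the following Python sketch runs a small grid of simulations under a toy impact model and tabulates the outcomes, which is the kind of side-by-side view MLflow provides interactively. The model, parameter names, and numbers are all invented for illustration:

```python
from itertools import product

def simulate(shock: float, lgd: float) -> float:
    """Toy impact model: resulting capital ratio (%) for a shock of a
    given size and a given loss-given-default. Numbers are hypothetical."""
    return round(12.0 - 40 * shock * lgd, 2)

# Run a grid of simulations, one per assumption combination,
# much as each execution would be logged as a separate run in MLflow.
runs = [{"shock": s, "lgd": l, "capital_ratio": simulate(s, l)}
        for s, l in product([0.10, 0.20, 0.30], [0.4, 0.6])]

# Sorting by outcome stands in for MLflow's interactive comparison table.
for r in sorted(runs, key=lambda r: r["capital_ratio"]):
    print(r)
```

With real models the grid would be far larger, but the pattern is the same: one logged run per assumption set, then a joint view over all of them.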


The European Union’s regulatory agency for banks, the European Banking Authority, has published generic guidelines on stress testing for banks. It recommends, among other things, that all banks include reverse stress testing scenarios in their stress testing programmes. Reverse stress tests are used to find out what can really go wrong in a bank. Searching for worst-case scenarios, reverse stress tests work out what event, or combination of events, might lead to the failure of the bank at hand. Reverse stress tests can then identify weaknesses and vulnerabilities in the individual bank, with a view to developing resolution and recovery plans and mitigation strategies.



In contrast, there is no commonly accepted macroprudential approach to reverse stress testing for the whole banking sector. This prompted Dr Henry to investigate how a macroprudential Reverse Stress Test could in effect be defined and performed for a system of banks. His recent publications demonstrate how a platform such as STAMP€ IT is particularly suitable for computation-intensive applications such as Macroprudential Reverse Stress Testing.


Considering the whole system is key to adapting the concept from micro to macro. Simply combining individual banks’ results will provide system-wide results in regular stress tests, such as those of the EBA or ECB, but this is not feasible for reverse stress testing. The specific events that cause a particular bank to fail do not necessarily have the same effect on other banks. Identifying the subset of factors that are common to all, or nearly all, banks and that could put the whole system at risk of failure is a challenging exercise.


Conducting a reverse stress test at a macroprudential level for a system of banks involves complex processes and a suitable model, but given the model and the STAMP€ IT toolkit, the user can experiment with numerous scenarios. Scenarios are the keystone of any stress test, but even more so for reverse stress testing, as modellers are looking for scenarios that haven’t been considered before but would endanger the whole system’s stability. Finding one or more of these failure scenarios is the end objective, hence the need to embed a scenario search facility in any Macroprudential Reverse Stress Test framework.


Regular stress tests for the banking sector usually build on narrative-based scenarios that combine systemic risk assessment with a storyline. To design these narrative-based scenarios, first the risks are identified and the macroeconomic events that would trigger the scenario are defined. Next, the financial and economic shocks are calibrated, considering in particular their severity with respect to previous crises or history. Then the system responses to each of these shocks have to be specified. Finally, a derived impact assessment model can be run to check whether the examined banking system can withstand the envisaged adverse shocks or not.


Designing scenarios for reverse stress testing, however, involves some notable differences from this standard multi-step approach. The standard risk identification step is either only used to derive a shortlist of risk factors or skipped altogether. Likewise, the severity of the shock is not a key ex ante consideration, as the impact of a very broad class of events needs to be assessed. Then the scenario generation and impact analysis models have to be run for all suitable scenarios, compiling many results for a wide range of shocks, instead of the standard couple of scenarios. These simulation results are then screened to identify those causing the failure of the system – assuming a metric has been set to characterise such systemic failures.
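The screening step described above can be sketched in a few lines of Python. The impact model, its sensitivities, the scenario grid, and the 8% failure threshold are all invented for illustration; they stand in for whichever model and systemic failure metric the exercise actually uses:

```python
def system_capital_ratio(gdp_shock: float, rate_shock: float) -> float:
    """Toy impact model: aggregate capital ratio (%) of the banking system
    after a GDP shock (pp of growth) and an interest rate shock (pp).
    The sensitivities are hypothetical."""
    return 11.0 + 0.6 * gdp_shock - 0.8 * rate_shock

# Assumed systemic failure metric: aggregate ratio below an 8% requirement.
REQUIREMENT = 8.0

# Run the impact model over all candidate scenarios, then screen for
# the ones that cause the system to fail.
scenarios = [(g, r) for g in (-4.0, -2.0, 0.0) for r in (0.0, 2.0, 4.0)]
failures = [s for s in scenarios if system_capital_ratio(*s) < REQUIREMENT]
print(failures)
```

The reverse stress test output is precisely this `failures` list: the subset of scenarios that breach the chosen systemic metric.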


While this process sounds fairly straightforward, it still raises a number of methodological issues. After reviewing the limited literature available, Dr Henry decided on a strategy that involves grouping the large set of relevant macro-financial risk factors into manageable subsets. In an example application, he classifies risk factors into three groups. For instance, investment and house prices are domestic factors, competitors’ prices and world demand are external factors, while equity prices and interest rates are categorised as financial factors. A number of alternative shocks are then applied to each group of factors, reflecting various selected probability assumptions for each of the shocks. This approach makes it possible to run a parsimonious set of simulations that suffices to provide the material for completing a reverse stress test exercise.


Reverse stress testing helps identify the risks that are more likely to adversely affect the set of banks considered in the analysis. It also determines the more damaging combinations of factors, showing that multiple risks can be relevant, beyond those derived from a standard risk identification process. Reverse stress test results can also help set priorities, indicating which risks should be mitigated first as well as highlighting which parts of the system regulators should be most concerned with.


Analysing the simulations run on the STAMP€ IT platform, it appears that, as expected for such exercises, a variety of configurations across shock types can cause the system to collapse, meaning that many banks may end up under serious stress, with capital ratios below solvency requirements. These configurations are, however, shown to exhibit a lower likelihood than that of the scenarios underlying regulatory stress tests such as those conducted by the EBA. In addition, along the various risk factor dimensions, a domestic shock appears relatively more damaging for the hypothetical sample of banks analysed. Increasing the severity of domestic shocks has a faster and greater impact on the system than when similar changes in the probability level are applied to either the external or financial shocks. In other words, for a given shock likelihood, the banks considered are more at risk when facing domestic shocks.


While this is only an illustration, it shows that modellers can carry out reverse stress tests for systems of banks, and characterise which type of risks can more likely lead to the materialisation of a worst-case scenario. Still, Dr Henry points out that there are limitations that would have to be addressed to carry out a real-life system-wide reverse stress test. For example, the illustration had only three groups of factors; in reality, there would be many more. In addition, the shocks would probably be correlated, so a more sophisticated probabilistic approach would be required. A multivariate approach would be computationally heavy, but it would still be possible to run Monte Carlo simulations, provided the model doesn’t have too many variables – though this is rarely the case for stress testing models.
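A Monte Carlo treatment of correlated shocks can be sketched with just two factors; everything here, the correlation, the sensitivities, and the failure threshold, is an illustrative assumption rather than anything from the actual exercise:

```python
import math
import random

random.seed(0)
RHO = 0.6  # assumed correlation between the two shocks

def correlated_shocks() -> tuple[float, float]:
    """Draw two standard-normal shocks with correlation RHO,
    via the two-variable Cholesky decomposition."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return z1, RHO * z1 + math.sqrt(1 - RHO ** 2) * z2

def system_fails(x1: float, x2: float) -> bool:
    """Toy failure metric: a joint adverse move depletes the aggregate
    capital ratio below an 8% requirement. Sensitivities are hypothetical."""
    return 10.0 - 1.5 * x1 - 1.0 * x2 < 8.0

n = 100_000
failure_rate = sum(system_fails(*correlated_shocks()) for _ in range(n)) / n
print(failure_rate)  # estimated probability of systemic failure
```

With many factors the Cholesky step becomes a full matrix decomposition and each draw feeds a heavy impact model, which is exactly the computational burden the episode alludes to.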


Building and using the STAMP€ IT platform has shown how to make both model management and model computations far more straightforward. It has opened up opportunities to carry out demanding simulations and proven its suitability for complex and computationally intensive tasks such as Reverse Stress Testing. Moving forward, further developments in the field can take inspiration from this work, considering for instance extensions of the IT set-up to a Cloud-based instance within the ECB Virtual Lab, as well as, on the modelling side, generalising the search for worst-case scenarios in macro-financial stress testing.


That’s all for this episode – thanks for listening. Links to the original research can be found in the show notes for this episode. And, stay subscribed to Research Pod for more of the latest science.


See you again soon.
