When it comes to predictions, artificial intelligence techniques are much more powerful than simple statistical approaches such as extrapolating trends. However, these methods initially provide only point predictions, while it is often desirable to also predict risks, i.e. improbable (and often adverse) possible events. Several approaches exist for this purpose, but they are "appended" ex post to the actual AI procedure, are applicable only in special cases, or both.

Here, an approach is presented with which AI procedures are generically able to "learn" risks directly. This is demonstrated with various American stock prices.

**Why predict risks?**

Why does it make sense to predict risks at all rather than be content with point forecasts? Simply because point predictions are either impossible or impractical in many situations, so that only probabilistic predictions can be made. Risks are thus omnipresent, and the reasons for this are manifold.

According to the fundamental principles of quantum physics, accurate predictions are impossible at the microscopic scale.

This is compounded by the fact that even a very small initial lack of knowledge can lead to large differences in the results; this is described by chaos theory and is readily observed in the weather. In addition, highly complex systems are characterized by feedback. A particularly complex system is, for example, the stock market, in which the future expectations of many different players are reflected.

In this context, it is of enormous importance to quantify risks as accurately as possible, as they often have catastrophic effects. Examples include natural disasters (e.g. storms, earthquakes), economic crises, financial market turbulence (stock market crashes) and even traffic jams, operational risks, hacker attacks and terrorism.

**Advantages and shortcomings of AI**

Artificial intelligence methods are, of course, the contemporary means of choice for quantifying risks. After all, in contrast to conventional statistics, they are very good at finding patterns in data. In particular, very precise point predictions are possible. Furthermore, once trained, these methods are very fast compared to Monte Carlo simulations and can partially replace them.

However, there is, in general and a priori, no automatic handling of risks, i.e. of outcomes that occur with different probabilities. Moreover, the nomenclatures and definitions of risks are inconsistent, and there is often confusion between model risk, uncertainty and the inherent risk considered here; this is discussed in more detail below.

**Excursion: Necessity of an appropriate nomenclature**

At this point, it is appropriate to address the different types of unknowns. This is important because both the concepts about them and the procedures for dealing with them differ fundamentally. In general, one can distinguish between

- **Uncertainty:** These are "unknown unknowns", i.e. unknowns that can neither be quantified nor predicted by an AI.
- **Model risk:** This refers to the AI process itself. It decreases with better models and data and is subject to model validation. The literature also uses the term "epistemic uncertainty" for it.
- **Inherent risk:** This risk is "real-world" and can be quantified, so it concerns "known unknowns". Better models provide better predictions; however, the risk itself does not decrease. This risk, also known in the literature as "aleatoric uncertainty", is the subject of this post.

**Procedures used so far**

A variety of procedures have been designed to mitigate the problems described; they vary widely in application and complexity. The following is a brief, incomplete and subjective list of these procedures.

**Self-Aware AI:** Especially in novel situations, it is important to have a measure of the AI model's uncertainty. This method provides that measure based on Bayesian statistics. However, it does not provide risk predictions in the strict sense and lies in the gray area between unknown unknowns and known unknowns.

**Restricted Boltzmann Machines:** These are neural networks with a special architecture that makes it possible to "learn" distributions. These can then be reproduced and corresponding forecasts made. Inherent risks can thus be predicted directly. However, the procedure is bound to the special architecture with its related peculiarities.

**Residual error analysis:** The predictions are compared with real events in backtesting and the deviations (residuals) are determined. Risks are thus considered as historical model errors. As a result, model and inherent risks are sometimes mixed. Moreover, this procedure is only performed ex post and is not part of the AI model.

**Probabilistic Forecasting:** Here, probability scores are compared with real probabilities and the model is calibrated if necessary. A special case of this is the calibration to binary possible outcomes, as is done, for example, in the case of default probabilities when determining credit ratings. However, distribution assumptions must be made for this; moreover, this procedure is also carried out only ex post and is not part of the AI model.

**Variational Autoencoders:** Here, a special neural network "learns" the essential characteristics of the data that are necessary to reproduce them. By varying these characteristics, probability distributions can be generated. However, distribution assumptions must be made for this as well. In addition, the procedure is bound to the special architecture with its peculiarities in this respect.

**Generative Adversarial Networks:** Here, one neural network analyzes the errors of another and "learns" to reproduce the possible result space. In this approach, too, model errors are essentially mixed with inherent risks. Moreover, probabilities are only "learned" indirectly. In addition, this procedure is also bound to the special architecture with its relevant peculiarities.

**Excursion: What are risks?**

The method mentioned at the beginning, with which risks can be learned directly and independently of the model, is discussed next. As has become apparent, it is of enormous importance to have a practicable definition of what risks are in the first place. The approach is accordingly based on the following definition:

**"Risk is when different outcomes are possible with the same information (initial basis)".**

Thus, for example, one has a risk that it will be colder tomorrow if one only knows what the temperature is today. No matter how good the model is, if it only has this information, it will only be able to provide an inaccurate – risky – forecast. The risk is thus considered to be part of the available information.

**Chosen method: Probabilities as model parameters**

In the procedure for the direct and model-independent prediction of risks, probabilities or quantiles were considered as **direct** model parameters, i.e., as part of the data. This allows AI models to learn probabilities – and thus risks – directly. It does not matter whether the probabilities are defined as input or output data.

Thus, no novel models or procedures are required. Instead, risks are mapped into the structure of the training data as follows:

- Given identical (or made identical by rounding) input data and multiple possible output data, the quantile serves as an additional ordering parameter, i.e., an additional data field.
- The input-output relation is then learned as usual in classification or regression methods. The quantile (or the probability) is just an additional variable.
- The learned distribution can then be reproduced for new data and a given probability.
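The data preparation described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the results in this post: the function name `add_quantile_feature`, the rounding precision `decimals`, and the mid-point convention `(rank + 0.5) / n` for the empirical quantile are all choices made here for illustration.

```python
import numpy as np

def add_quantile_feature(X, y, decimals=2):
    """Append an empirical quantile column to X (hypothetical helper).

    Rows whose inputs coincide after rounding form one group. Within each
    group the outputs are ranked and the rank is mapped to a quantile in (0, 1).
    """
    X = np.round(np.asarray(X, dtype=float), decimals)
    y = np.asarray(y, dtype=float)
    # Identify groups of identical (rounded) input rows.
    _, group_ids = np.unique(X, axis=0, return_inverse=True)
    group_ids = group_ids.ravel()  # inverse shape differs across NumPy versions
    q = np.empty(len(y))
    for g in np.unique(group_ids):
        idx = np.flatnonzero(group_ids == g)
        # Rank of each output within its group (0 = smallest output).
        ranks = np.argsort(np.argsort(y[idx]))
        # Assumed convention: mid-point plotting positions.
        q[idx] = (ranks + 0.5) / len(idx)
    # The quantile becomes one more input column; y stays the target.
    return np.column_stack([X, q]), y

# Two distinct inputs, each observed with several different outputs.
X_demo = [[0.1], [0.1], [0.1], [0.5], [0.5]]
y_demo = [3.0, 1.0, 2.0, 10.0, 20.0]
Xq_demo, _ = add_quantile_feature(X_demo, y_demo)
```

Any regression model trained on `Xq_demo` and `y_demo` then sees the quantile as an ordinary input, so the same input with a different quantile maps to a different output.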

**Example: American stocks**

The goal of this example was to predict the distribution of the returns for the following day for shares of different large American companies on the basis of only a few input data (returns of the past 5 days).

The daily stock prices of the 20 largest American companies between 2008 and 2020 were used as the data basis. Returns were calculated from these prices, and windowing was used to form five input variables and one output variable for each data point, i.e. the return of the following day was to be forecast from the returns of the last five days. Up to this point, the procedure corresponded to the usual setup for a regression.
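As a rough illustration of the windowing step, the following sketch turns a price series into five lagged returns and one next-day return per row. The post does not say whether simple or logarithmic returns were used; simple returns are assumed here. The function name `make_windows` and the column order (oldest return first) are also assumptions.

```python
import numpy as np

def make_windows(prices, n_lags=5):
    """Build (inputs, target) pairs from a daily price series (hypothetical helper)."""
    prices = np.asarray(prices, dtype=float)
    # Assumption: simple daily returns r_t = p_t / p_(t-1) - 1.
    returns = prices[1:] / prices[:-1] - 1.0
    n_rows = len(returns) - n_lags
    # Column i holds the return i days after the start of each window,
    # so each row lists n_lags consecutive returns, oldest first.
    X = np.column_stack([returns[i:i + n_rows] for i in range(n_lags)])
    # The target is the return of the day following each window.
    y = returns[n_lags:]
    return X, y

# Synthetic prices built from known returns, for illustration only.
r_demo = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07])
prices_demo = 100.0 * np.cumprod(np.concatenate([[1.0], 1.0 + r_demo]))
X_win, y_win = make_windows(prices_demo)
```

With eight prices there are seven returns and therefore two windows: the first uses returns one to five to predict the sixth, the second uses returns two to six to predict the seventh.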

Subsequently, however, the input data sets were grouped into 10 clusters by k-means clustering. For each cluster, the input values were then replaced by the mean values of the respective cluster. This revealed that different output data were possible for identical input data, consistent with the risk definition above. Each cluster was now sorted based on the output data and the corresponding quantile was calculated as an additional – sixth – input variable.
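The clustering and ranking step might look roughly like the sketch below. It is not the original implementation: a plain NumPy Lloyd iteration stands in for whichever k-means implementation was actually used, the names `kmeans_labels` and `cluster_and_rank` are hypothetical, and the quantile convention `(rank + 0.5) / n` is an assumption.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd iteration (stand-in for a library k-means); returns labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels

def cluster_and_rank(X, y, k=10, seed=0):
    """Replace inputs by cluster means and append the within-cluster quantile."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = kmeans_labels(X, k, seed=seed)
    X_new = np.empty((len(X), X.shape[1] + 1))
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        # All rows of a cluster receive the same (mean) input values ...
        X_new[idx, :-1] = X[idx].mean(axis=0)
        # ... and are told apart only by the quantile of their output.
        ranks = np.argsort(np.argsort(y[idx]))
        X_new[idx, -1] = (ranks + 0.5) / len(idx)
    return X_new, y, labels

# Synthetic data for illustration only (not the stock data from the post).
rng_demo = np.random.default_rng(1)
X_raw = rng_demo.normal(scale=0.01, size=(200, 5))
y_raw = rng_demo.normal(scale=0.01, size=200)
X_clustered, y_clustered, cluster_ids = cluster_and_rank(X_raw, y_raw, k=10)
```

After this step, every cluster contains identical input rows with different outputs, which is exactly the situation described by the risk definition, and the sixth column orders those outputs.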

This data set was then used to train a five-layer neural network with dropout layers and Adam optimization over 30 cycles, following common practice.

The resulting network was then able to predict a corresponding return of the following day based on given returns of the last five days and a quantile. The results were all plausible. An example over several quantiles is shown in the figure below. One can clearly see that with smaller quantiles the outliers become more extreme, as is also known from real distributions. **The AI model has thus learned to predict risks.**
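Querying such a model for a whole distribution amounts to repeating the same five returns once per quantile and appending the quantile as the sixth input. The sketch below shows this with the input values from the figure caption. The function `predict_distribution` is hypothetical, and `toy_predict` is only a stand-in so that the example runs: it is not the trained network from this post, and its outputs are not the results shown in the figure.

```python
import numpy as np
from statistics import NormalDist

def predict_distribution(predict_fn, last_returns, quantiles):
    """Evaluate a quantile-conditioned model for one input window."""
    last_returns = np.asarray(last_returns, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    # One row per requested quantile: the same returns plus the quantile.
    X = np.column_stack([np.tile(last_returns, (len(quantiles), 1)), quantiles])
    return np.asarray(predict_fn(X)).ravel()

def toy_predict(X):
    """Stand-in model (assumption): normal quantiles scaled by the input spread."""
    center = X[:, :5].mean(axis=1)
    spread = X[:, :5].std(axis=1)
    z = np.array([NormalDist().inv_cdf(p) for p in X[:, 5]])
    return center + spread * z

# Input returns taken from the figure caption, expressed as fractions.
window_demo = [-0.008, -0.007, -0.001, 0.002, 0.008]
quantiles_demo = [0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99]
preds_demo = predict_distribution(toy_predict, window_demo, quantiles_demo)
```

In practice `toy_predict` would be replaced by the prediction method of the trained network; the surrounding code does not depend on the model type, which reflects the model-independence of the approach.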

*Figure 01: Returns of the following day for different quantiles with the input data [-0.8%, -0.7%, -0.1%, 0.2%, 0.8%] for the days [-1,…,-5]*

Moreover, the risk increased with the time horizon considered. Finally, "shaky" (volatile) inputs yielded broader distributions, which is also to be expected for riskier stocks.

**Perspective**

It has been demonstrated that, after appropriate data preparation, AI methods are generically capable of directly predicting risks. In the future, this could open up opportunities for rapid and accurate risk prediction, for example in the financial sector. Conventional complex methods such as Monte Carlo simulations could be replaced and, after appropriate extension, correlations could also be detected in the multivariate case.

**Author Dr. Dimitrios Geromichalos**

Founder / CEO | RiskDataScience GmbH