It is not just risk managers who occasionally confuse correlation with causality, neglect variability and sampling error or multiply the probability of occurrence by the extent of loss and forget that this only applies to risks that have a particular distribution (for example a Bernoulli distribution). They might also apply a normal distribution to risks that are not normally distributed. When it comes to risk measures, the catalogue of misunderstandings and misinterpretations is long and varied. Occasionally, people forget that the Value at Risk measure so popular among risk managers has nothing at all to do with a "maximum" loss. The list of misconceptions and application errors goes on and on. But it is not just risk managers who should have a fundamental understanding of statistics. Everyone who reads a newspaper or listens to the news should be familiar with the basic tools and pitfalls of statistics. We spoke to Katharina Schüller, a statistics graduate, statistics expert at DRadio Wissen, a lecturer at various universities and recipient of the "Statistician of the Week" award from the American Statistical Association.
She highlighted to us the fact that statistics is an essential skill for classifying, evaluating and understanding the world in which we live, and for making decisions when faced with uncertainty.
Is statistical thinking in-built?
Katharina Schüller: Yes, definitely. Every day, each of us has to find our way in a complex world and to do that we have to process data and recognise patterns: Are the people around me friends or enemies? Will I make the train on time? Do I have enough money for my weekend shopping? That is one side of statistical thinking. The other side is a critical attitude that enables us to be aware that our day-to-day perception is not representative but highlights just a small section of reality. It appears so – but there could also be another explanation. We can adopt this critical attitude but it is very taxing and therefore we don't like to do it.
What do you think of the phrase "the only statistics you can trust are those you falsified yourself"?
Katharina Schüller: Every statistic is a summary of data and thus a compressed version of reality. Just as MP3 compression brings together the key elements of a piece of music so that it does not take up so much memory, statistics mean that we can get a quick overview without having to study interminable data tables. The mean is a good example here. But this compression means that some information is necessarily lost. Ideally, it will be the relatively unimportant information, but there is no guarantee of this. There is a great example where dentists use their median income to argue that they earn very little, while the health insurance providers take the mean value and say that dentists earn far too much. This is because the mean is pulled upwards by isolated extremely high values. Objectively, they have both put forward correct statistics, but who should we believe?
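The dentist example can be made concrete with a small sketch. The income figures below are invented purely for illustration (they are not from the interview or from any survey); the point is only that a single very high value pulls the mean far above the median, so both sides can quote a "correct" number.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands of euros (assumed values,
# chosen only to show the effect of one extreme earner).
incomes = [60, 65, 70, 75, 80, 85, 90, 95, 100, 1200]

mean_income = mean(incomes)      # pulled upwards by the single 1200
median_income = median(incomes)  # middle of the sorted values

print(f"mean:   {mean_income}")
print(f"median: {median_income}")
```

With these assumed numbers, nine of the ten incomes lie below the mean, which is why the median is often the more honest summary of a "typical" value in skewed data.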
In the book "Statistics and Intuition", you listed a range of statistical misconceptions that you have come across in recent years. What is the most original or most absurd misinterpretation you can remember?
Katharina Schüller: One example that I used towards the end of my book occupied me for a long time, and that was the Edathy case. A very expert-sounding article in "Spiegel" used Bayes' theorem to "prove" that investigators are statistically on the safe side if they conclude that someone who looks at legal naked pictures will also possess child pornography. The Spiegel writer compares Edathy with O.J. Simpson ("Someone who has killed his wife has normally beaten her previously") and a bank robber ("Someone running out of the bank after a robbery is usually the robber"). But what the writer overlooks is that these conclusions are only valid if we actually look at all the possibilities (wife is killed or not, wife was beaten or not, husband is the murderer or not) and can evaluate the probability of each. In the Edathy case, this does not apply. Because the state prosecutor only acts if there is a justified suspicion, investigators cannot evaluate the pornography consumption of respectable citizens.
In other words, the judiciary does not take random samples. That may be regrettable for statistics fanatics but it is a good sign for our constitutional state. I find this case so alarming because throwing around specialist vocabulary gives the impression that everything can be calculated. But in my role as an expert witness in criminal trials, I see the serious consequences this can have in individual cases. In murder cases, if the argument is that a DNA sample matches the suspect with a random-match probability of one in a billion, a calculation based on that data is generally sound. What is frequently left unmentioned is that the data is often not entirely clean: DNA is a natural material, it has to be copied for analysis using the technique known as PCR (polymerase chain reaction), and errors can occur during this process. And that brings us back to statistics expertise. Only clean data delivers clean results and therefore we have to take a critical look at every step in the process, from obtaining the data through to the final interpretation.
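A rough sketch of why data quality matters for the DNA argument: once a processing error is possible, the chance of a false reported match is dominated by the error rate, not by the one-in-a-billion figure. The error rate and the prior below are hypothetical values assumed only for illustration, the two error sources are assumed to be independent, and the true source is assumed always to produce a match; none of these numbers come from the interview.

```python
def false_match_probability(random_match: float, process_error: float) -> float:
    """Chance that a non-source is reported as a match, either by
    coincidence or through a processing error (assumed independent)."""
    return 1 - (1 - random_match) * (1 - process_error)


def posterior_source(prior: float, false_match: float) -> float:
    """Bayes' theorem, assuming the true source always matches."""
    return prior / (prior + (1 - prior) * false_match)


RANDOM_MATCH = 1e-9   # the "one in a billion" figure from the text
PROCESS_ERROR = 1e-4  # hypothetical error rate, assumed for illustration
PRIOR = 0.01          # hypothetical prior probability, assumed

fp_clean = false_match_probability(RANDOM_MATCH, 0.0)
fp_real = false_match_probability(RANDOM_MATCH, PROCESS_ERROR)

print(f"false match, clean data:  {fp_clean:.3e}")
print(f"false match, with errors: {fp_real:.3e}")
print(f"posterior, clean data:    {posterior_source(PRIOR, fp_clean):.6f}")
print(f"posterior, with errors:   {posterior_source(PRIOR, fp_real):.6f}")
```

Under these assumptions the calculation itself stays sound, but the input it depends on changes by several orders of magnitude, which is the sense in which only clean data delivers clean results.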
The Indian statistician C.R. Rao has said that certain knowledge results from a new kind of thinking based on a combination of uncertain knowledge and awareness of the extent of its uncertainty. What can be done to increase knowledge of the extent of uncertainty? How can we learn to "think statistically"?
Katharina Schüller: The main way to learn to think statistically is by asking questions. Where does the data come from? Does anyone have a vested interest in the data not being gathered objectively and representatively? Was there missing or unusable data and how was this handled? For example, it is common – but should be viewed very critically – for missing data to simply be ignored or replaced with the average value. In the first case, we overlook the fact that the absence of data can itself be significant, for example because people with very high or very low incomes are less willing to respond in income surveys. In the second case, uncertainty is lost – in an extreme situation, 99 missing values would be replaced with a single available value. It only makes sense to look at the actual statistical methods once all questions regarding the data and its processing have been resolved. The uncertainty at this stage is known as the modelling error, and we could ask ourselves why someone has used this specific method to analyse the data and not an alternative. But this calls for greater specialist knowledge. In day-to-day life it is much more useful to not just read press releases, but to consult the underlying study wherever possible, or at least its summary and the critical discussion at the end. If the result of a study sounds sensational, I tend to be very sceptical.
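The remark about replacing missing values with the average can be illustrated with a short sketch. The numbers are invented for illustration only: filling gaps with the mean leaves the mean unchanged but shrinks the measured spread, and in the extreme case mentioned above (99 missing values filled from a single observation) the spread disappears entirely.

```python
from statistics import mean, pvariance

# Hypothetical observed values; five further values are assumed missing.
observed = [20, 30, 40, 50, 60]
n_missing = 5

# Mean imputation: every missing value is replaced by the observed mean.
imputed = observed + [mean(observed)] * n_missing

var_observed = pvariance(observed)
var_imputed = pvariance(imputed)

print(f"mean before / after:     {mean(observed)} / {mean(imputed)}")
print(f"variance before / after: {var_observed} / {var_imputed}")

# Extreme case: one available value stands in for 99 missing ones.
single_value = [42]  # hypothetical
extreme = single_value + [mean(single_value)] * 99
var_extreme = pvariance(extreme)
print(f"variance in the extreme case: {var_extreme}")
```

The data set looks larger and more precise after imputation, although no new information has been added; that apparent precision is the lost uncertainty.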
Predictive Analytics is one of the latest bandwagons in the world of big data. What potential do you believe huge data mountains offer? Where are the risks we should have on our radar?
Katharina Schüller: Data – sometimes called the "oil of the 21st century" – is flowing everywhere in huge quantities and costs next to nothing. Nevertheless, I think these data mountains are simultaneously under- and over-valued. The data is under-valued because countless data-generating systems produce massive volumes of bits and bytes, yet most of it remains unused because many people appear to be satisfied with simply having the data. But it's not that simple. At the same time, the data is over-valued because the whole world is talking jubilantly about the "power of data", while hardly anyone is clear about where the boundaries of that power lie. After all, the majority of the data is not linked or organised. This means that it cannot be converted into knowledge, at least not without further effort. Data is just the first step – the crude oil if you like – but we have to refine it and convert it into information – the fuel. This fuel has to be provided to filling stations and then put into cars – this is done by linking data to knowledge. Finally, we want to be able not just to take a test drive but to use the car continuously – this would give us the power to act. To do this, we have to apply the knowledge, in other words create an organisational (and where necessary legal) environment in which new business models based on data can be established. Perhaps we need new traffic rules to ensure mobility for everyone. This relates to the impact on civil society, and there are numerous open questions in this area. Who does the data actually belong to, and how do we protect the people who cannot defend themselves against misuse?
Recently someone was enthusiastically explaining that in the future everyone will be able to travel by underground free of charge if they look at advertisements on their phones. However, taking this to its natural conclusion it would mean that those people who cannot afford the wonderful things being advertised would have to walk. Is that what we want? Mastercard has patented a process by which they estimate customers' size and weight from credit card data and then sell this information to airlines. The airlines can then offer us individual prices – more expensive the heavier we are, or they can even refuse us a ticket altogether. Is that what we want? We need to consider questions like these now, not in the future. That's where I see the biggest dangers – we pay too much attention to technologies but fail to properly assess their consequences.
Occasionally, you meet risk managers who are convinced that a sheet of paper and a pencil are sufficient for managing risks. What relevance do statistics and quantitative management have in risk management for companies?
Katharina Schüller: Statistics and quantitative methods are very powerful instruments and, in my experience, they are often used far too little. For example, the rapidly developing area of Visual Analytics, where visualisation methods are used to represent data and relationships between that data, is something I think is extremely useful. It brings statistics to life and – especially when combined with simulations – enables us to immediately see what the impact of particular decisions will be. This is particularly useful because I continuously find that managers are much easier to convince if they can actually see something, rather than being presented with just the dry results of incomprehensible formulae. We have set up a very exciting project at Frankfurt Airport combining a classical statistical method – generalised regression analysis – with descriptive visualisations and a simulation tool. The aim was to answer the question of how aircraft can be optimally positioned so that passengers buy as much as possible at the airport. Ultimately, we were able to not only determine the incredible potential of targeted positioning, but also the extent to which specific restrictions and possible developments such as future exchange rate fluctuations affect the result.
You have four children. Do schools teach skills in basic statistics and mathematics to promote "statistical thinking"?
Katharina Schüller: Looking at my children, I think they do quite well. For example, I was really proud when my daughter Valentina was telling me about a radio programme. It was discussing the question of whether people had ever shown moral courage, and only a small percentage of people gave a positive answer. She said that the response was worthless, as long as we do not know what percentage had actually been in a situation that called for moral courage. This demonstrates that observation alone cannot be used to draw conclusions. In the same way I can't say whether my children learn statistical thinking at school or because we talk about it so much at home. I think that critical thinking and questioning are definitely taught in school, just not in connection with mathematics and statistics. There is always a "correct" answer, and that is precisely what we need to learn to question.
What skills should risk managers have in the context of statistics?
Katharina Schüller: Of course, a risk manager should be able to call on the tools you learn on any statistics course: What is a normal distribution, how do I calculate a variance, and so on. Ideally, they should also have a basic understanding of the latest methods in data mining: for example, decision trees or neural networks. However, I believe it is much more crucial for them to have an idea about what method is appropriate in what situation, and when it is not such a good idea to use a technique because it could lead to false conclusions. This depends on having a good understanding of the data, which means that a risk manager has to talk to the people who understand the data generating processes. How are the returns actually achieved on my new financial product? What markets influence each other and to what extent? What role does pricing play in my customers' willingness to buy my product? As strange as it might sound, I believe that a good risk manager must be skilled at what a good statistician should also be able to master – communication.
Katharina Schüller, born 1977 in Rosenheim (Bavaria). Studied psychology at the TU Dresden, studied statistics at the LMU Munich, doctorate at the TU Dortmund, scholarship holder at the Bavarian Elite Academy and the Nobel Prize committee in Lindau. In 2003, she founded the company STAT-UP Statistical Consulting & Data Science in Munich, which now has offices in Madrid and London and works for companies, research institutions and public authorities across Europe. She is publicly known for her regular radio and TV appearances, and for her academic and popular science publications.
During the American Statistical Association's International Year of Statistics, Katharina Schüller was presented as a "Statistician of the Week". She lectures at various universities and, as an expert in digitalisation and data analytics, is a member of the advisory councils of Deutsche Bank and Burda Forward. Her book "Statistics and Intuition: Everyday Examples Critically Evaluated" was published by Springer in January 2016.