Big Data and the world of algorithms and analyses

Big data is in fashion. Business in particular has been calling for wider use of new analytical methods for years. Some people are convinced that this will let them keep track of everything and always make far-sighted decisions. But caution is advisable: alongside the opportunities the big data world supposedly offers, there are also risks lurking. gis.Business discussed both sides of the coin with two experts who should know: Ernest McCutcheon of DDS in Karlsruhe and Frank Romeike of the RiskNET competence portal.

Mr. McCutcheon, location intelligence, and with it geospatial data, is part of your core business. Can you give our readers a brief insight into precisely what you do in this area and what added value you expect for your company?

E. McCutcheon: Location intelligence and spatial data are not just part of my core business; they are my entire core business. More than 20 years ago my company, DDS Digital Data Services GmbH, began specialising in spatial data. Our main focus is on using such data together with customer data in order to optimise our customers’ business processes. Nowadays we call this "location intelligence". The data required differs greatly depending on the customer and the application. We support and advise our customers in the selection, licensing and integration of this data and, increasingly, of cloud-based data services.

Our customers are frequently solution providers themselves, who in turn are looking for supplementary data (or services) for the users of their solutions. If customers also need software or tools for their applications, we can support them there as well. We offer components and/or services from PTV (xServer), Pitney Bowes (MapInfo, Spectrum Spatial), Integeo (MapIntelligence), Microsoft (Bing Maps) and Here (Here Location Platform).

Our added value lies in our many years of experience with data in business applications such as location planning and analysis, marketing optimisation, and customer analysis and segmentation.

Mr. Romeike, wherever there are opportunities and benefits, there are also risks. Where do you see the critical points of comprehensive data analysis and evaluation?

F. Romeike: The modern oracles of our digital, networked age are called big data, data analytics and predictive analytics. But many self-appointed, starry-eyed big data advocates are familiar neither with the fundamental numerical algorithms for solving systems of linear equations nor with linear optimisation, dimensionality reduction or interpolation. They naively see only the commercial opportunities of this brave new data world.

Can you be more precise?

F. Romeike: One of the biggest risks of big data is that correlation, which may be pure coincidence, is confused with causality. It is possible to calculate an almost perfect positive correlation between the number of campsite employees and the quantity of cucumbers harvested outdoors, or between the number of McDonald’s locations in Germany and the installed wind energy capacity in Germany. But is there any causality here? In the starry-eyed world of pro-big-data propaganda we are told that this doesn’t matter. Mathematicians, on the other hand, call this kind of thing a spurious relationship.
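To make the spurious-relationship point concrete, here is a minimal Python sketch on synthetic data; the numbers are not real figures for McDonald’s locations or wind power, and the names `restaurants` and `wind_capacity` are purely illustrative. It assumes two series that are generated independently and share nothing except an upward trend over time. The Pearson correlation of the raw levels comes out very high, while the correlation of the year-on-year changes, which removes the shared trend, is typically close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trending_series(rng, n, slope, noise):
    """A series that grows linearly over time plus independent Gaussian noise."""
    return [slope * t + rng.gauss(0, noise) for t in range(n)]


def first_differences(xs):
    """Period-to-period changes; this removes a common linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
years = 20

# Two hypothetical series generated independently of each other:
# the only thing they have in common is that both grow over time.
restaurants = trending_series(rng, years, slope=30.0, noise=10.0)
wind_capacity = trending_series(rng, years, slope=2.5, noise=1.0)

r_levels = pearson(restaurants, wind_capacity)
r_changes = pearson(first_differences(restaurants), first_differences(wind_capacity))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Differencing is only one simple way to strip out a shared trend; it illustrates why a high correlation alone says little, but it does not by itself establish or rule out causality.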

Above all, big data carries the risk of extremely bad decisions, because we no longer critically question the underlying laws. Newton did not arrive at the law of gravity by incessantly letting apples fall from trees. In other words, bits and bytes must be complemented by the ability not only to evaluate the accumulating data but also to interpret it. And this is precisely where many experts fail in practice, because a pattern can only exist if it arose in the past.

That does not necessarily mean that a conclusion drawn from this pattern is also valid for the future, a topic we will discuss extensively at the upcoming RiskNET summit at the end of October 2017.

I see further risks in the fact that people are reduced to data trails. We need a constructive public debate about data protection, ethics and informational "self-determination". Do we want to surrender to a dictatorship of data and live in a world in which big data knows more about our risks and our past, present and future than we ourselves can remember? These drawbacks of big data should lead to transparent and binding rules and to a broad discussion of the dangers of unfettered data technologies and the increasingly unchecked power of those who control the data.

Let us dwell for a moment on the negative consequences: data analyses are not a universal remedy per se, as the recent misjudgements of the US election and the Brexit referendum show. Are data evaluations not falling into the trap of false causality here?

E. McCutcheon: Since this discussion is about big data, I will try to go into some detail on the growing role of big data in such events. To do so, I first need to go back a little. The huge spread of mobile phones means that by 2016 fewer than half of all households in the USA had a landline. Since calling mobile phones for surveys is restricted in the USA (and mobile phone users there also have to pay for incoming calls), it is becoming increasingly difficult to assemble a truly representative sample. Life is getting ever harder for pollsters, and in the long term they will have to adapt their methods if they want to produce realistic forecasts again. One important source of big data is social media platforms such as Facebook and Twitter. The election of Donald Trump has sparked a discussion about the role of fake news and "alternative facts" spread via such media. Studies show that the targeted transmission of information can influence the behaviour of "connected influencers", including their voting behaviour.

Of course, the two phenomena are connected. Since most users of such platforms access them via their mobile phones, geospatial data is often involved too, whether directly via the phone’s GPS or indirectly via a Wi-Fi MAC address or an IP address. This makes it possible to detect regional trends and to deploy targeted local measures based on them.

A classic data processing saying is "garbage in, garbage out". With big data it would rather be "lots of garbage in, lots of garbage out". Data scientists therefore have to make sure they have a good understanding of the input data; otherwise they produce results that have nothing to do with reality. Since many decisions can have negative effects on people, for example in the insurance and financial sectors, caution is definitely required.

F. Romeike: I fully agree with the statement about "GIGO" (garbage in, garbage out). But in the specific cases of Brexit and the US election, I see neither the risk of a spurious relationship (in other words, a pseudo-correlation) nor "GIGO" as the main issue. Here it is more important to critically appraise the method of voter targeting. Using big data analytics, the data people leave on Facebook or other social networks and blogs, for example, can be analysed to produce a psychological profile, which can then be used to send voters precisely targeted election messages and fake news. Cambridge Analytica carried out such analyses both for the Trump campaign in the last US election and for Brexit supporters (the Leave.eu campaign). How far the big data experts actually influenced the outcome is highly controversial among experts. Cambridge Analytica has also since denied influencing the Brexit vote, after the UK data protection authority opened an investigation.
However, it is a fact that such voter manipulation is possible in principle with the help of big data analytics and fake news, and we need a critical discourse about it.

How can such undesirable developments in the big data environment be prevented?

F. Romeike: If the interconnections and hypotheses are not understood, the patterns and correlations found in big data remain largely arbitrary. We should beware of immediately seeing causality in every statistical correlation. Kant’s "Critique of Judgement" distinguishes between determining and reflecting judgement. Determining judgement subsumes the particular under a given law or rule, whereas reflecting judgement has to find the universal for a given particular. Transferred to the world of big data and predictive analytics, this means that we must link the massive flood of data with theories and laws. Remember Newton, whom I mentioned earlier.

And even more important: as individuals and as a society, we must ask ourselves how much (supposed) security and predictability on the one hand, and how much freedom and risk on the other, we actually want. Do we really want such an acceleration of human cognition, without emotional and social intelligence?

E. McCutcheon: In my last answer I mentioned caution. However, caution is not the same as prohibition. Big data analyses can also support decisions with positive effects. A few examples: real-time traffic information and traffic forecasts are constantly improving through the analysis of mobility data. Energy providers can ramp up electricity production exactly when electricity is needed. Food companies can ensure that the right quantity of fresh products is available in shops at the right time, so that less has to be thrown away. We are offered products and services that genuinely interest us. All in all, I believe that big data can generally improve quality of life, both for us as individuals and for society as a whole.

However, I think that training for data scientists should point out the risks of misuse as well as possible sources of error. For example, they should learn how to avoid so-called closed feedback loops when designing their analyses. In studies and analyses, people also often forget to form control groups.
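Why a forgotten control group distorts results can be shown with a minimal Python sketch using made-up sales figures; the numbers and variable names are hypothetical and do not come from the interview. It assumes that all stores share the same seasonal upswing and that only one group also received a marketing campaign. Subtracting the control group’s growth is a simple difference-in-differences style comparison.

```python
def growth(before: float, after: float) -> float:
    """Relative change from one period to the next."""
    return (after - before) / before


# Hypothetical average weekly sales per store, before and after a campaign.
# Every store benefits from the same seasonal upswing; only the treated
# group additionally received the campaign.
treated_before, treated_after = 1000.0, 1070.0
control_before, control_after = 1000.0, 1050.0

# Without a control group, the whole increase is credited to the campaign.
naive_effect = growth(treated_before, treated_after)

# With a control group, the shared background trend is subtracted out.
control_growth = growth(control_before, control_after)
effect_vs_control = naive_effect - control_growth

print(f"naive estimate:            {naive_effect:.1%}")
print(f"estimate vs control group: {effect_vs_control:.1%}")
```

In this toy setup the naive before/after comparison credits the campaign with the full 7% increase, while the comparison against the control group attributes only about 2% to it; the remainder is the background trend that affected all stores.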

In addition, not everything that is possible should be done, especially where human livelihoods are concerned. Special caution is necessary here, and statutory bans are certainly conceivable. Most big data analyses, however, do not deal with such topics. Nature alone offers many opportunities for big data analyses. The findings can be used to improve the environment and to increase agricultural yields while at the same time reducing the amount of artificial fertiliser used.

Does this mean that organisations lack methodical approaches for handling the flood of data in an orderly way?

E. McCutcheon: That’s one way of looking at it, but I see the situation in a more nuanced way. On the one hand, there are organisations that are already very far advanced. In most organisations, however, it is still a matter of understanding the new possibilities and challenges and developing methodical approaches. In my opinion, big data is not yet a big thing for most companies today. Many organisations equate "lots of data" with big data. But the common definition of big data refers to the "5 Vs": volume, velocity, variety, veracity and value. I think the most important of these is velocity, meaning that the data changes significantly within short periods of time.

However, since we will be facing a veritable flood of genuine big data in the coming years and an increasing number of organisations will be affected, now is a good time to devote some attention to this topic. Ever more sensors are being built into ever more devices, and these sensors report back "home" via the Internet. Apart from purely methodical approaches, data protection (especially the protection of personal data) and data storage will have to be addressed. The challenge for organisations will be to find suitable people who can understand the concepts, design the processes and set up the necessary checks. Such people are currently rather thin on the ground.

F. Romeike: Yes, expertise in these areas is not very widespread. But it is not only methodical approaches that are missing; I would go a step further. If data is to be the currency of the 21st century, companies must ask themselves how they can turn the knowledge latent in the data into a competitive advantage. How, for example, can an insurance company or a bank use customer data to hold its own against the new competitors from the world of data (Google, Facebook, Amazon & Co.)? Even if big data is both a blessing and a curse, many companies that fail to position themselves strategically will fall by the wayside in competition with the born globals and data leeches.

Companies first need a big data strategy before they think about how to analyse and profitably use the data. This strategy should cover the following points: linking the business strategy to the big data strategy, data governance, data culture, big data management, and methods and tools. Companies should then publish a big data policy that makes clear to customers, employees and the public how data is collected and what happens to it.

Is it not rather up to business and science to expand the knowledge needed for handling digitalisation and big data, and to produce fewer empty phrases?

F. Romeike: Constructive critics have a hard time with a topic surrounded by euphoric hype. But in both business and science there are many who take a critical view of the new calculation of the world with the help of big data. Big data protagonists proclaim the "end of theory", while the other side of the spectrum points out that we should reflect on the theories, laws and history that have led to the world we live in today. In my view, one decisive step is missing: we must carry the discussions taking place in science into society.

Critical journalists and politicians play a decisive role here. We also need greater transparency about algorithms. And we should all beware of elevating big data to the status of an all-purpose weapon. Finally, I would give a copy of the film "Democracy – Im Rausch der Daten" to every pupil, student and citizen, so that we all learn more about the current state of democracy and about tilting at windmills in the digital world.

E. McCutcheon: Of course it is up to business and science, but rapid development in this and other areas is in part overwhelming the entire system. Companies cannot invest in every new idea and technology, and not every trend contributes to entrepreneurial success. It is the same story with science. The term "data scientist" has not been around in Germany for long, and some universities are only now setting up degree courses in data science. So yes, in my opinion there are a lot of empty phrases at first. Many readers will be familiar with Gartner’s Hype Cycle; many big data topics can still be found there in the Innovation Trigger and Peak of Inflated Expectations phases.

The "InGeoForum" has held a number of events on big data, and there too it can be seen that the ideas and approaches exist but have not yet actually been implemented. In future, the "InGeoForum" will focus on the possibilities of big data for the geo sector itself and less on implemented customer projects, as there are simply too few of these in Germany.

At the DDS Data Days in Heidelberg we will offer a keynote on this topic entitled "Geo intelligence – how big data can be turned into local success", followed by a discussion forum on "Geomarketing approaches with big data: the current state of affairs". To put it casually: even people who are only now starting to look at big data have not yet missed the boat. It is still moored at the quayside, and there is still time to get on board.

Looking ahead: how will big data methods develop in the coming years, and how can companies profit from them?

F. Romeike: There can be no digital economy without trust. Before companies even start making money from big data analytics, they should first create transparent and clear rules so that people know who is doing what with their data. Some day, people will understand that with every click and every download, every Facebook entry, every book they order and every online credit card payment, they leave behind a digital fingerprint and become ever more transparent. Are they happy for this data to be concentrated in the hands of just a few companies, giving those companies unprecedented informational power? Especially in the Internet of Everything we need a clear legal framework, because the right to a private sphere is not obsolete. I am convinced that a new data awareness will emerge over the next few years, for we are only slowly beginning to understand the monitoring practices of the data collectors. One person’s big data is another’s stolen goods. Do we really want artificial intelligence to replace humans? Do we really want an uncontrollable surveillance machinery to accompany us at every turn? Big data, nudging, behaviour control: do we want our lives and thoughts to be defined by algorithms? Do we want to communicate with things and merge with them into one single super-organism? Do we want criminals, terrorists or extremists to seize the digital magic wand of big data? We are at a digital crossroads and should remember Immanuel Kant: "Tutelage is man’s inability to make use of his understanding without direction from another."

E. McCutcheon: I have already touched on this question in my previous answers. But first of all, in my view the question is not quite correctly formulated. I don’t believe that the exciting developments will come from the methods themselves. In the coming years, companies and organisations will profit much more from the sheer quantity and availability of "genuine" big data and from applying existing methods to it.

Once products with built-in sensors, including autonomous vehicles, have become widespread, we may be talking less about a flood of data than about a tidal wave. It will then be really interesting to see what insights can be gained from the data and where all this data is stored. And precisely that is the challenge for companies and organisations.

Based on this data, business and decision processes can be optimised and even automated. For example, maintenance and repair jobs can be ordered before customers even know they are necessary. The right replacement parts have already been ordered and are waiting at the right service centre. This saves resources and costs. Provided customer expectations do not exceed what is actually possible, customer satisfaction will also increase.

To profit from this, it will be important to take people’s fears into account. When big data methods are used, people can very quickly get the impression that they are constantly being spied on. Even today, many people spontaneously think of the NSA when big data is mentioned. If this is not addressed at an early stage, it could lead to negative developments (prohibitions and controls), which in turn could restrict what is possible. This must be avoided right from the start.

Thank you very much for the interview!

Our interviewees

Ernest McCutcheon

Following his degree in Economics at the University of North Carolina at Chapel Hill, USA, Ernest McCutcheon came to Germany in 1981 and completed an advanced degree course at Düsseldorf University. In 1982 he took a job with Kaypro Computervertriebs GmbH and soon took over sales management. After moving in 1986 to the company that later became DAT Informationssysteme AG, Ernest McCutcheon was head of marketing and international sales before taking over the management of the Technical Applications business unit in 1990. As the distributor of the MapInfo GIS software, DAT was one of the pioneers in this sector. Drawing on this experience, Ernest McCutcheon founded Desktop Data Services in 1993, which became DDS Digital Data Services GmbH after a joint venture with the PTV Group. Today, DDS is one of the leading providers of geospatial data and software for desktop mapping applications. In addition, Ernest McCutcheon served on the board of directors of Map&Guide GmbH from January 2001 to 2007, and today he is regarded as an expert in geodata with more than 20 years’ market experience in a wide variety of geoinformation sectors.

Frank Romeike

Frank Romeike is the founder, managing director and owner of RiskNET GmbH, the competence centre "The Risk Management Network". Internationally, he is regarded as one of the most renowned experts in risk and opportunity management. Previously he was chief risk officer at IBM Central Europe, where he was involved in introducing IBM’s global risk management process and led several international projects. He completed an economics degree (with a focus on actuarial mathematics) in Cologne and Norwich, UK. He then studied political science, psychology and philosophy. He has also completed an executive Master’s degree in risk and compliance management.

[Source: gis.Business 5/2017, pp. 34-42 | We would like to thank the gis.Business editorial team for permission to publish this interview on RiskNET]

Quo vadis Big Data?