August 1, 2018
By Charl Lategan, Head of Consulting, Qarar
Tired of hearing buzzwords like big data, data lakes, Hadoop, machine learning, artificial intelligence and blockchain? With a myriad of definitions and explanations available, it is important to have a clear understanding of these topics. So let’s focus on one of them: big data.
Big data has been around for many years, but until recently its meaning, use and value have remained fuzzy, and many explanations of the topic are in circulation. First, we have to acknowledge that data availability has increased exponentially over the last 15 years, and that the need to derive intelligence from it is growing at an equal or even faster pace. According to the Gartner IT Glossary (2014), big data refers to high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. In other words, the term describes data sets so large that they are almost impossible to capture, manage, store or analyse using traditional tools.
The complexity and variety of the data sources now available, including social media, email, sensor data, business apps, archives and documents, together with the speed required for retrieval and analysis, demand fresh approaches. The term data lake also pops up in this context. A data lake stores both structured and unstructured data, such as social media, email and text documents, so that the data can be used far more rapidly and flexibly. It allows users to run ad hoc queries, perform cross-source navigation and make analytical decisions, all based on real-time information.
One of the giants of the technology world, IBM, defines the ‘four Vs’ behind big data. The first, volume, refers to the sheer quantity of data, often a massive number of records. Velocity refers to the accelerating speed at which data is being generated today; think of fitness wearables like the Apple Watch transmitting real-time health data 24/7 without any user intervention. Variety refers to the increasing diversity in the types and sources of data requiring management and analysis; think of traditional text fields giving way to voice recognition, and of visuals captured by high-resolution cameras on mobile phones and shared over various social platforms. Lastly, veracity refers to the reliability of data. It is often the most neglected of the four but is probably the most important.
Big data has also become relevant in risk management, where, combined with analytics, it has been identified as a critical enabler and a key component of the future of the discipline. According to McKinsey & Company (2015), five key trends are expected to significantly change the future of financial risk management and to require big data-driven decisions.
First, the regulatory environment requires more data-supported evidence; for example, IFRS 9 requires a much higher degree of analytics in quantifying expected credit losses, including forward-looking macroeconomic projections. Second, the use of technology and mathematics in risk management enables automated decisions using quantified, auditable decision frameworks. Third, changing customer expectations require data-driven actions that appeal to specific needs and wants. Fourth, customers demand unbiased, transparent decision making. Fifth, emerging risk types in the form of unbanked and informal income sectors require the use of unstructured data.
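To make the IFRS 9 point concrete, here is a minimal sketch of how forward-looking macroeconomic scenarios can feed an expected credit loss figure. It assumes the widely used simplification ECL = PD x LGD x EAD, weighted by scenario probability. The article itself does not prescribe a formula, the scenario names and all numbers are hypothetical, and a real IFRS 9 model would also deal with staging, lifetime horizons and discounting.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str      # hypothetical macroeconomic scenario label
    weight: float  # probability assigned to the scenario
    pd: float      # probability of default under the scenario


def expected_credit_loss(scenarios, lgd, ead):
    """Probability-weighted ECL = sum(weight * PD * LGD * EAD).

    lgd: loss given default, as a fraction of the exposure
    ead: exposure at default, in currency units
    """
    total_weight = sum(s.weight for s in scenarios)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("scenario weights must sum to 1")
    return sum(s.weight * s.pd * lgd * ead for s in scenarios)


# Illustrative inputs only; none of these figures come from the article.
scenarios = [
    Scenario("upside", weight=0.25, pd=0.01),
    Scenario("base", weight=0.50, pd=0.02),
    Scenario("downside", weight=0.25, pd=0.05),
]
ecl = expected_credit_loss(scenarios, lgd=0.4, ead=100_000)
print(f"Scenario-weighted ECL: {ecl:.2f}")
```

Keeping the scenario weights and default probabilities as explicit inputs is what makes such a calculation auditable: each number can be traced back to the data and the projection that produced it.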
Big data has already improved risk management practice in several specific ways. The first is more predictive power and stability in risk models: empirical research shows that more data, and more diverse data, can improve predictive power and, more importantly, make provision for population shifts and so extend the longevity of risk models. The second is more extensive coverage of real-time risk intelligence: optimisation tools and machine learning provide the ability to respond quickly to changing macro and micro market conditions, which improves risk monitoring and reduces the noise that comes from having a lot of data with little business value. The third is stronger evidence-based decisions across a wider population, which have led to growth opportunities and financial inclusion. Together, these improvements are leading to significant cost savings in risk management over the long term.
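One common way to check whether a population has shifted since a risk model was built is the Population Stability Index (PSI). The article does not name a metric, so treat the sketch below as an illustrative assumption: it bins a baseline sample into equal-width bins, compares the share of a recent sample in each bin, and sums the differences on a log scale. The function name, bin count and sample scores are hypothetical.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one variable."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if the
    # baseline is constant so we never divide by zero.
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range land in the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0).
        return [max(c / total, eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical scores: the recent population has drifted upwards by 30 points.
baseline_scores = list(range(100))
recent_scores = [s + 30 for s in baseline_scores]

psi_same = population_stability_index(baseline_scores, baseline_scores)
psi_shifted = population_stability_index(baseline_scores, recent_scores)
print(psi_same, psi_shifted)
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a material shift, but these thresholds are conventions rather than standards and should be set to suit the portfolio.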
Adopting big data in an organisation also comes with its challenges, and it is therefore critical for organisations to plan with a long-term mindset. The main challenges that have been identified are as follows. Massive volumes of data must be stored and organised, because many organisations need to keep historic data for many years for trend prediction and other complex analytics. More diverse analytical skills are needed to deal with various data types, since most big data today is generated from different sources and is thus unstructured. The use of long-term and short-term memory storage has to be optimised. Decisions must be explainable rather than produced by black-box designs; a key challenge of machine learning is to explain to the business how decisions are made when the framework is dynamically adjusted. Finally, answering even simple hypothetical questions using vast amounts of data can be overwhelming.