Insurance companies venturing into digital insurance encounter typhoon-force volumes of data, challenging even the most adept to extract any business value. How can we weather these data storms? More importantly, how can we turn the storms into information that generates more revenue or reduces expenses?
Historically, companies have established data stores and data warehouses to contain data for searching. These are proving to be increasingly costly to maintain, and they are unable to handle the large volumes and unstructured nature of media and Internet-related data.
Some companies have begun to use open-source Hadoop clusters to store and analyze the media and Internet-related data. A few companies have started to expand the use of these clusters into more operational areas with increasing success. Challenges to this expansion include security and technical skills. To address these challenges some firms offer these systems and their operations as a service, called Big Data Platform as a Service (BDPaaS).
Digital Insurance = Data Abundance
Digital insurance is marked by an abundance of data, from many sources. Primary sources of data include internal systems; Internet, social media, and the like associated with customers and prospects; and market data.
Insurance companies begin by using the data from their existing policy, commission, claims and finance systems. Too often, even this data has been fragmented within the different systems, yielding little added value.
Introducing digital channels increases the amount of digitized data coming from customers. Beginning with online sales, companies obtain more of the customer sales and policy-initiation data in digital format. Allowing customer and agency self-service increases this digital data while simultaneously improving the company's efficiency by decreasing the amount of paper handled. Most insurance companies in Asia have already begun the journey in these areas.
Claims processing currently offers a great opportunity to improve the use of digital channels for Asian insurance companies. The volume of paper handling and scanned documents means much of this data is separated into document management systems that are not well integrated into other applications. The goal of straight-through claims processing will result in increased digital data volumes, and also provide opportunities for decreasing claims processing costs and fraud.
With digital insurance comes the flood of associated data, such as:
• How someone actually uses the company’s website and reacts to online advertising
• How far along in a sales process a potential customer progresses
• What someone chooses to eat, how much the person exercises or how healthy the person is (from personal fitness devices, phones, watches, and other wearables)
• How a customer drives (from usage-based insurance systems)
• What is happening in the life of a customer or identified prospect (from Facebook, other social media, etc.).
Much of this data is still unavailable and unused by Asian insurance companies, despite the value it can potentially deliver to the business.
With the Internet comes an abundance of data on how different people think, feel, buy and interact with companies and each other over electronic media. This data can yield insights for product development, pricing, means for approaching certain customer types, and more. Combining this Internet data with internal and external data can provide insights into existing customers, driving better customer service and increased sales to those customers. Too often today, this data is processed separately from the rest, and the additional insights cannot be attained.
Converting Data to Information
So what does it take to obtain these insights from data to add value to the business, for sales as well as other processes such as customer service?
Insurance companies have been using internal data for many years, primarily by querying it through each of the deployed application systems (policy admin system, claims, document management system, customer relationship management system, etc.). Historically the data has been in different databases associated with particular processes, and even ascribing related data for the same customer across the various systems has been difficult.
To bring the data together from the diverse systems and gain a common view for business reporting, insurance companies have been building data stores and data warehouses. Additionally, some insurance companies have recently started performing complex analytics on Internet related data using “Big Data” capabilities, but this data generally is not connected to the enterprise database. Very few companies have moved to an integrated use of the various kinds of data – and yet this could be the best way to derive the value that companies desire.
To store and process the data from the various sources companies have used evolving technologies, as depicted in Figure 1.
Companies began with operational data stores (ODS), usually built on SQL databases, where data from the operational applications could be transferred and stored with minimal transformation. This data could be used to build operational reports; originally printed, these have become tabular and sometimes visual, and are available in electronic format. This approach tends to have some critical drawbacks:
• Often tied directly to one or more underlying application systems, but with little linking of the data
• Limited cleansing and processing of the data
• Limited retention of historical data, especially detailed data.
Data warehouse/data mart using SQL database
A number of years ago companies moved to combining the data from various applications or ODSs into a single data warehouse (usually with data marts for each major department). The data warehouse was generally structured and built with some form of SQL database. Data was cleansed before being placed into the warehouse, usually as part of an ETL (Extract, Transform, Load) process. Originally built on attached hard disks or network storage, these warehouses increasingly moved to data storage appliances that offered faster access and extra tools for managing the data.
This approach has worked reasonably well. Companies are able to query related data from different systems, and obtain tabular and visual reports across the enterprise (generally using reporting and analytical tools that interact with the data in the database).
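The ETL pattern described above can be sketched in a few lines. This is a minimal illustration using Python's standard library with an in-memory SQLite database standing in for the warehouse; the source records, field names, and cleansing rules are hypothetical.

```python
import sqlite3

# Hypothetical raw records extracted from a source policy system;
# field names and values are illustrative only.
policy_rows = [
    {"customer_id": "C001", "policy_no": "P-100", "premium": " 1200 "},
    {"customer_id": "C002", "policy_no": "P-101", "premium": "950"},
]

def transform(row):
    # Cleanse: trim whitespace and normalise the premium to a number.
    return (row["customer_id"], row["policy_no"], float(row["premium"].strip()))

# Load the cleansed rows into a warehouse table (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (customer_id TEXT, policy_no TEXT, premium REAL)")
conn.executemany("INSERT INTO policies VALUES (?, ?, ?)",
                 [transform(r) for r in policy_rows])

# Enterprise-wide reporting then queries the cleansed, combined data.
total = conn.execute("SELECT SUM(premium) FROM policies").fetchone()[0]
print(total)  # 2150.0
```

In a real warehouse the transform step also links records across systems and conforms them to a common data model, which is where most of the ongoing maintenance cost arises.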
But this approach also has some serious limitations:
• Need for constant revision as underlying systems change along with changing business requirements – intensive and costly to maintain
• Inappropriate for unstructured data such as media and much of the data from the Internet
• Hard to scale network storage systems for very large data volumes
• Expensive to use data appliances for very large data volumes such as those from the Internet.
Open source Hadoop cluster
An approach that has grown over the last six years is storing data on groups, or “clusters”, of low-cost processors and storage, using open source Hadoop software to manage the distributed data. These open source Hadoop clusters have been particularly effective for storing the vast and generally unstructured data from social media and the Internet.
Concerns about data reliability, integrity and security have been addressed to varying degrees, with more capabilities under development. Data integrity and reliability are achieved by replicating each data block across three different nodes (Hadoop's default replication factor). Data security is delivered with some open source tools, at least at the access-authorization level, and increasingly by additional tools that secure data down to the element level by user account.
Most early implementations of Hadoop clusters have been in corporate data centers, either built by the organizations themselves or by third-party providers. More recently, as uses have extended beyond individual departments into the broader enterprise, companies have started running Hadoop clusters on cloud providers and services, in the same way that they use the cloud for other infrastructure and application services.
Newer implementations also generally use “schema on read” rather than “schema on write”, to gain more flexibility and derive value faster. Schema on write means applying the structure to the data when it is written to the database, the approach most common with SQL databases. Schema on read means applying the structure, and ensuring quality, when the data is extracted from storage for use in analytics, reports, and applications. In both cases a company needs to adopt a common data model that forms the basis for the schema.
Hadoop Clusters for Multiple Business Purposes
To date Hadoop clusters have primarily been used for Internet-related data, especially for analytical purposes by data scientists. By combining the low-cost storage capability of Hadoop clusters with new analytical tools such as Tableau, organizations have gained deep insights into Internet users, as well as the market in general. Insights have driven outcomes ranging from higher sales through better product marketing to the successful positioning of political campaigns.
Recently some insurance companies have started to venture into additional operational uses, with excellent results:
• A large U.S. life insurer combined the data from over 70 systems across the U.S. onto a Hadoop cluster using a NoSQL database, and linked it to their CRM system so that the customer service representatives could have access to all the policies, claims data, etc. for each customer. They were able to build the system and start deriving value within six months. This has proven especially effective because they found that they were unable to script the service interaction for life insurance customers, and having all the data available gave the service agents the information to better address customers’ queries.
• A mid-tier UK general insurance company decided to forgo a central SQL data store and jump straight to using Hadoop for the data from its various application systems. The company paid for the development within six months through increased sales, and saved over GBP 5 million on claims fraud over eighteen months.
• A U.S. auto insurer decided to use Hadoop to handle the storage and analytics associated with telemetry data from their UBI auto insurance customers. This has allowed them to store more detailed data than they otherwise would, and to use that data to assess customer use and revise the products.
Issues Encountered with Hadoop
Hadoop clusters offer considerable potential, but also come with their own issues to overcome. Notably, this is still developing technology, so the capabilities are constantly improving and people who understand it well are still scarce. Interestingly, a recent industry survey found that many companies are overcoming the skills issue by training their own staff to use the technology, in addition to hiring and using service providers.
Security is a major concern for companies considering Hadoop clusters, and may inhibit their use of the technology for enterprise data. Increasingly, however, tools are available to securely manage and access even the most sensitive data on Hadoop clusters. For instance, CSC has taken security technology developed for the intelligence community and turned it into a highly secure solution for the commercial community. This tool, called Stronghold, wraps around the entire big data system and enables companies to assign detailed rules to every data object, determining who can view, manage, and modify it.
BDPaaS as Option, even for Insurance Data Warehouse
Where in-house skills are scarce or already committed to other opportunities, companies such as CSC provide Big Data Platform as a Service (BDPaaS) solutions. Companies can install BDPaaS solutions in their own data centers, in private clouds, or in public clouds such as AWS and Azure. CSC can ensure security across all these environments, and has achieved PCI and HIPAA accreditation for financial and health institutions.
Companies using BDPaaS leverage the technology platforms, skills, and analytical expertise of the supplier. They can also leverage the cloud, using a mix of public and private infrastructure depending on the data. In addition, they gain from a usage-based commercial model. BDPaaS works well for a company that wants to start with targeted uses, or as a means to grow its own capability while gaining value quickly.
Hadoop/BDPaaS as Data Warehouse Alternative
Many insurance companies in Asia are now implementing, or considering implementing, data warehouses using SQL data stores. The question they should ask is whether this is the right approach. Building and maintaining SQL databases, even on data appliances, is costly for large volumes of data, and implementation takes a long time, normally close to a year. During this period the company gains no value from the data warehouse.
So why not consider an approach where at least some value can be gained within a matter of months? Hadoop clusters, possibly through BDPaaS, provide this option. And combined with new visualization and analytical tools that can extract from multiple sources, companies can start analysis against internal and external data from the Hadoop cluster, as well as any existing data stores or databases. Just some examples of the benefits from this combination include:
• Low cost storage solution
• Fast delivery of business value, often less than six months
• Rapid, phased implementation to get to overall capability
• Easily modified to accommodate business and technical changes
Asian companies have leapfrogged in other technology areas; why not in data management and analytics as well?
Dr. Michael Kelly is a Consulting Partner with CSC, based in Singapore.