AY1516 T2 Team Hew - Overview/Interim Review

From Analytics Practicum


Revised Project Background

The proposal stated that Tokio Marine Life Insurance Singapore would provide us with their data for analysis. However, due to unforeseen circumstances, they were unable to extract and anonymize the data in time. After discussing with our Project Sponsor (Benito Mable), we will focus on another dataset, which was supplied by Tokio Marine Insurance Indonesia (TMI) as of the end of January 2016. The objectives were subsequently revised with our Project Sponsor to reflect the different nature of this new dataset.

Revised Motivation

Tokio Marine’s Group Companies (GCs) collect a lot of data required for underwriting products only at the time of sale. Over time, many data points have been captured, but few insights have been derived from them other than for underwriting purposes. This data is stored on multiple platforms. Although some customers hold multiple products, the captured data is currently little used to understand the profile of the customers, what they bought, their channel preferences, etc.

Tokio Marine’s Asian GCs have been participating in a large-scale regional project in which the various GCs are undergoing a phase of digital transformation to stay updated with current technologies. Tokio Marine Asia seeks to convince staff of the value of analytics and to implement its usage among the GCs. This project serves as one of the pilot initiatives, with one Asian GC participating: Tokio Marine Insurance Indonesia (TMI). It aims to use the insights gathered to formulate new marketing initiatives or product ideas.



Data

There will be 2 original datasets provided by Tokio Marine Insurance Indonesia (TMI). The first dataset (“motor_policy30”) contains about 2 million motor insurance policy transaction records, and the second (“motor_claim7_combined”) consists of about 600,000 motor insurance claims transaction records. Both datasets span from 2003 to 2015, and are specific to customers residing in Indonesia only.

The first dataset (“motor_policy30”) has about 152 variables, and the second (“motor_claim7_combined”) has 66, all of which will be included in Appendix A of the Interim Report for reference. Full disclosure of the dataset will only be available to parties which have signed a Nondisclosure Agreement.

The data is transactional and follows a hierarchy. In the “motor_policy30” dataset, each Policy can have one or more Risk_NO. These in turn correspond to the “motor_claim7_combined” dataset, where each Risk_NO has zero or more Claim_NO, and each Claim_NO has at least one Transaction_NO. Policies that had no claim made during their tenure are not recorded in the “motor_claim7_combined” dataset.

Example Data:

[Figure: Example data]


Each Risk_NO represents a vehicle insured under the same Policy number, which can be referenced to a particular customer. However, the dataset does not include any customer details.
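The hierarchy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made up, and the choice of (Policy, Risk_NO) as the key linking the two datasets is an assumption rather than something confirmed by the data dictionary. The point it shows is that a risk with no claims must be picked up from the policy table, because it never appears in the claims table.

```python
from collections import defaultdict

# Hypothetical rows: these identifiers are made up for illustration and are
# not taken from the TMI datasets.
policy_rows = [
    {"Policy": "P001", "Risk_NO": 1},
    {"Policy": "P001", "Risk_NO": 2},
    {"Policy": "P002", "Risk_NO": 1},
]
claim_rows = [
    {"Policy": "P001", "Risk_NO": 1, "Claim_NO": "C10", "Transaction_NO": 1},
    {"Policy": "P001", "Risk_NO": 1, "Claim_NO": "C10", "Transaction_NO": 2},
    {"Policy": "P001", "Risk_NO": 1, "Claim_NO": "C11", "Transaction_NO": 1},
]


def claims_per_risk(policy_rows, claim_rows):
    """Count distinct Claim_NO per (Policy, Risk_NO), keeping risks with no claims."""
    claims = defaultdict(set)
    for row in claim_rows:
        # Several transactions can share one Claim_NO, so collect claims in a set.
        claims[(row["Policy"], row["Risk_NO"])].add(row["Claim_NO"])
    # Iterate over the policy table so that risks without claims still appear
    # with a count of zero (the equivalent of a left join).
    return {
        (row["Policy"], row["Risk_NO"]): len(
            claims.get((row["Policy"], row["Risk_NO"]), ())
        )
        for row in policy_rows
    }


counts = claims_per_risk(policy_rows, claim_rows)
```

Here `counts` maps each insured vehicle to its number of claims, with zero for the vehicles that are absent from the claims rows.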



Claims Process

The multiple Transaction_NO values for each Claim_NO serve to record the transactions corresponding to the various steps in the claims-making process.

[Figure: Claims process]


The ClaimOS, ClaimPaid and ClaimInc columns are related in the same way as the items of a balance sheet, and the relationship can be represented by the equation below:

ClaimPaid + ClaimOS = ClaimInc


Each row of these columns shows the transaction applied, but does NOT give the current value of that column. To derive the total ClaimPaid, we can simply sum up the transactions, or use the value in the very last transaction (shown by the cell highlighted in green). This is best illustrated by the example below:

[Figure: Claims transaction]


Transaction_NO = 1 corresponds to Step 3 in the Claims Process chart, where the ClaimOS is an estimate of the ClaimPaid, recorded once Tokio Marine staff have assessed the damage. ClaimInc is therefore 500,000, as calculated by the equation given. Transaction_NO = 2 corresponds to an adjustment of the estimated figure, to reflect the actual damage more accurately. Transaction_NO = 3 and 4 are the transactions recording the customer receiving the claim amount from Tokio Marine.
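The summation of transactions can be sketched as below. This is a minimal illustration, not the team's actual processing: only the initial estimate of 500,000 comes from the text, while the other amounts are hypothetical. It also assumes that each row stores a change (delta) and that a payment reduces ClaimOS by the amount paid.

```python
# Hypothetical transaction deltas for a single Claim_NO. Only the initial
# estimate of 500,000 comes from the text; the other amounts are made up.
transactions = [
    {"Transaction_NO": 1, "ClaimPaid": 0, "ClaimOS": 500_000},         # initial estimate
    {"Transaction_NO": 2, "ClaimPaid": 0, "ClaimOS": 100_000},         # estimate adjusted
    {"Transaction_NO": 3, "ClaimPaid": 300_000, "ClaimOS": -300_000},  # first payment
    {"Transaction_NO": 4, "ClaimPaid": 300_000, "ClaimOS": -300_000},  # second payment
]


def running_totals(transactions):
    """Return cumulative ClaimPaid, ClaimOS and ClaimInc after each transaction."""
    paid = outstanding = 0
    totals = []
    for tx in sorted(transactions, key=lambda t: t["Transaction_NO"]):
        paid += tx["ClaimPaid"]
        outstanding += tx["ClaimOS"]
        totals.append({
            "Transaction_NO": tx["Transaction_NO"],
            "ClaimPaid": paid,
            "ClaimOS": outstanding,
            # The identity from the text: ClaimPaid + ClaimOS = ClaimInc
            "ClaimInc": paid + outstanding,
        })
    return totals


totals = running_totals(transactions)
final = totals[-1]  # current position of the claim after the last transaction
```

With these made-up figures, the claim ends fully paid: the final ClaimOS is zero and the final ClaimPaid equals ClaimInc.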


Revised Objectives

  1. Motor Insurance overall profitability and profitability by brand
  2. Motor Insurance and claims trends
  3. Analyse characteristics of Top Agents by Loss Ratio and profitability
  4. Marketing recommendations to improve business performance based on our findings
  5. Time series forecasting of profitability (which will only be discussed more fully in the final report)
  6. Other predictive models (e.g. Multiple Linear Regression), as a bonus



Scope of Work
Due to the sheer number of records in the dataset, we will restrict our scope to policies that were underwritten from 2012 to 2015. Product lines like Shariah/Takaful Islamic motor insurance were discontinued in 2013, so their data will be excluded from the analysis.

Research & Methodology

Review of Similar Work

Typically, Affinity Analysis has been applied to consumer products like groceries, and thus there exists little literature on its use in insurance. Recently, however, Affinity Analysis has been used to analyze different customer segments, which has helped insurers support their decisions to target certain demographic groups[1].

Kamakura’s[2] work on Market Basket Analysis and Path Analysis is valuable. He demonstrates that Path Analysis can be more useful than Market Basket Analysis in some cases, as the sequence in which products are purchased is more insightful than the static final basket of goods. Path Analysis can clearly illuminate whether goods are substitutes or complements of each other, which provides decision support for the cross-selling and bundling of grocery products.

Survival Analysis[3] models the factors or variables that affect the time to an event. Two common methods are used. First, Nonparametric methods provide a simple and quick look at the survival experience. Second, the Cox Proportional Hazards Regression Model relates the time that passes before some event occurs to one or more covariates that may be associated with that quantity of time. It remains the dominant analysis method for Survival Analysis.


Methodology

Many methodologies are discussed as this group is still in the initial stage of scoping the project. It is estimated that only 1 or 2 objectives will be taken for the project eventually.

Objective 1 (Develop a database analysis to formulate a demographic and psychographic profile of customers):

For this objective, we will investigate how the demographics of the customers impact their psychographic/behavioral profile, and examine the correlation using exploratory data analysis and clustering techniques.

Some of the demographics we will investigate are:
  • Age
  • Gender
  • Nationality
  • Profession
  • Pay
  • Marital Status

Some of the psychographic/behavioral attributes we will investigate are:
  • Channel purchased
  • Policy purchased
  • Subsequent Purchase
  • Profitability of customer
  • Policy payment default rate
  • Likelihood of recommending policies to others


We can explore demographic segmentation by nationality to see if it plays an important role in customer behaviour. An example is to investigate whether Japanese customers are more likely than local customers to make subsequent purchases of policies from Tokio Marine.

The investigation of the demographic and psychographic profiles of the customers will translate into more effective marketing strategies. The company can concentrate its marketing efforts on the customer demographics that are more profitable and give loyal customers a greater incentive to stay with the firm. Alternatively, the company can incorporate upselling strategies for customers that are demographically more likely to purchase more expensive policies.

We will also conduct a time series analysis to examine how demographic/psychographic profiles change with time.

Objective 2 (Analyse average product holdings per customer): Default rate

This objective investigates the length of time a customer continues to hold a product. It examines the likelihood and rate at which customers default on periodic policy payments, and the rate at which they cancel policies. A high rate of policy cancellation might indicate a flaw with the product that the product management team might need to investigate. The primary technique to be used is Survival Analysis, where Nonparametric methods and the Cox Proportional Hazards Regression Model will be applied using the SAS software.

The Nonparametric method will be used as a descriptive technique to provide the rate at which consumers drop a product. In SAS, a graph of the Kaplan-Meier estimate allows users to see how the survival function of the policy changes over time:

[PICTURE GOES HERE]
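For readers without SAS, the Kaplan-Meier estimate can be sketched in plain Python. This is an illustrative stand-in, not the SAS procedure the project plans to use, and the durations and cancellation flags are hypothetical. At each time with at least one cancellation, the survival probability is multiplied by (1 - cancellations / policies still at risk); policies that are censored (still in force or matured when observation ends) leave the risk set without counting as cancellations.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time with a cancellation.

    durations: observed time for each policy (e.g. months in force)
    events:    1 if the policy was cancelled at that time, 0 if censored
    """
    cancelled = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # cancellations plus censored policies per time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = cancelled.get(t, 0)
        if d:
            # Policies censored at time t still count as at risk at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical data: six policies, three cancellations, three censored.
durations = [3, 5, 5, 8, 12, 12]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(durations, events)
```

Plotting `curve` as a step function gives the kind of survival graph described above.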

The Cox Proportional Hazards Regression Model, fitted using SAS, investigates how different variables affect the policy cancellation rate (survival rate) of the policyholders. The graph below shows the survival function segmented by age group:

[PICTURE GOES HERE]

One potential problem is that most customers hold their policies until maturity, which means that there will be little variance among different groups. We might need to transform the data in order to obtain a more meaningful analysis.

Objective 3 (Determine which customer segments and products are more profitable): Profitability

In this objective, we will first have to estimate the customer lifetime value of each customer. The customer lifetime value is estimated using some of these variables:

  • Premium paid
  • Administration cost
  • Claim rate
  • Customer acquisition cost
  • Retention rate
  • Policy length
  • Type of insurance (Whole, Universal, Variable, Term)


Using these variables, we will be able to calculate the margins for each customer. We will then assign each customer to an appropriate segment using clustering techniques. From these segments, we will be able to see which are the most profitable and apply the appropriate marketing policy.

Tokio Marine should strategically target campaigns at its most profitable customers, offering them cheaper deals and incentives for renewal. The company should also try to avoid holding policies with customers that eat into margins, or charge these customers a higher premium.

Objective 4 (Which channels are more profitable, direct online or through agents): Channels and profitability

This objective investigates the profitability of products sold through different sales channels, such as direct, online or agent. The techniques used will be based on exploratory data analysis and clustering. We will be able to investigate the products each channel typically sells. On the whole, profitability can be investigated by segmenting the data by channel and looking at the profitability of each segment.

To take an in-depth look at profitability, we could break down the profitability of the different policies sold by each channel. The difference in profitability might lie in policy cancellation or in the different costs of the channels. From these insights, Tokio Marine can have each channel specialise in different products based on profitability. For example, the online channel can specialise in travel insurance if it is found to be the most profitable channel for that product.

Objective 5 (Propensity to buy or assess next best offer for customers to enhance effectiveness of marketing campaigns):

Our group will use Affinity Analysis to analyse customers’ propensity to buy a certain basket of products. After segmenting customers according to demographics, we will conduct an Affinity Analysis to determine which basket of products each segment usually buys. This will then provide guidance to Tokio Marine on which group of products to market more aggressively to a segment of customers, or which products to bundle together.

This can be visualized using a heatmap of co-occurrence as seen in the picture shown below, generated by Tableau:

[PICTURE GOES HERE]
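The counts behind such a co-occurrence heatmap can be sketched as follows. This is an illustration in plain Python rather than the Tableau or SAS workflow, and the product names and baskets are hypothetical. For each pair of products it computes the co-occurrence count, the support (share of customers holding both) and the lift (support divided by what would be expected if the two products were held independently).

```python
from collections import Counter
from itertools import combinations

# Hypothetical product holdings per customer (product names are placeholders).
baskets = [
    {"motor", "travel"},
    {"motor", "home"},
    {"motor", "travel", "home"},
    {"travel"},
]


def pair_metrics(baskets):
    """Return co-occurrence count, support and lift for every product pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Sorting each basket gives every pair a single canonical ordering.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        metrics[(a, b)] = {
            "count": count,
            "support": support,
            "lift": support / expected,
        }
    return metrics


metrics = pair_metrics(baskets)
```

A lift above 1 suggests the two products are held together more often than chance would predict, which is the kind of pair a bundling decision would look at first.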

To suggest the next best offer for customers, an Event Sequence Analysis can be used (SAS Enterprise Miner terms it Path Analysis). This will determine which products customers usually buy next, given that they have purchased a specific product. This will provide guidance to Tokio Marine on which products to cross-sell and how to price the products more effectively to maximize revenue.

This can be visualized using a Sankey Diagram as shown below.

[PICTURE GOES HERE]
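The flows drawn in a Sankey Diagram are counts of transitions from one purchase to the next. The sketch below is a simplified stand-in for a full Event Sequence Analysis, not the SAS Enterprise Miner implementation: it only looks at consecutive purchases, and the purchase histories, product names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories, each ordered by purchase date.
histories = [
    ["motor", "travel", "home"],
    ["motor", "travel"],
    ["motor", "home"],
    ["travel", "motor"],
]


def transition_counts(histories):
    """Count how often one product is immediately followed by another."""
    return Counter(
        (prev, nxt)
        for history in histories
        for prev, nxt in zip(history, history[1:])
    )


def next_best_offer(histories, product):
    """Return (most frequent next product, its share), or None if none observed."""
    counts = transition_counts(histories)
    followers = {nxt: c for (prev, nxt), c in counts.items() if prev == product}
    total = sum(followers.values())
    if total == 0:
        return None
    best = max(followers, key=followers.get)
    return best, followers[best] / total


flows = transition_counts(histories)  # the link weights of a Sankey diagram
offer = next_best_offer(histories, "motor")
```

In this made-up data, customers who bought "motor" most often bought "travel" next, so "travel" would be the suggested next offer for a new "motor" customer.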

Both these analytical methods can be done using SAS Enterprise Miner with several defined macros from the community. For further visualization and presentation, the results can be ported over to JMP for its interactivity and range of visualization options.



References

  1. Roodpishi, M. & Nashtaei, R. (2015). Market basket analysis in insurance industry. Management Science Letters, 5(4), 393-400.
  2. Kamakura, W. (2012). Sequential Market Basket Analysis. Springer Science, Business Media, 23, 15-15. doi:10.1007/s11002-012-9181-6
  3. Introduction to SAS. UCLA: Statistical Consulting Group. Retrieved from http://www.ats.ucla.edu/stat/sas/notes2/ (accessed November 24, 2007)