AY1516 T2 Team Hew - Documentation

From Analytics Practicum
Jump to navigation Jump to search
Exploratory Analysis (Interim) Prediction


Data processing

Dataset Merging

Initially, we tried merging the 2 provided datasets directly but were unsuccessful as our computers ran out of memory, since the end result was a combined dataset of over 6GB in file size and 218 variables.

Our Approach:
1. We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015.
2. Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.

Deleted Variables (not exhaustive):
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name

3. Merged both datasets by matching Policy and Risk_NO in both datasets
4. Grouped by the Policy, Risk_NO and Claim_NO to sum up the transactional-level data for each group to derive the total for the quantitative variables of GWP, ClaimPaid, Disc and Comm.

The final merged dataset contains 92 columns and 349,648 rows, with duplicate rows removed.

Further Cleaning & Processing

  • We hypothesized that there would be differences in claim amounts for different vehicle types, so we split the dataset into Motorcycle policies VS Vehicle policies for further analysis.
  • Likewise, we segregated the data into Corporate Accounts VS Personal Accounts as we felt that Corporate customers would have different demographics and behaviours.
  • We deleted Driver Ages from 0 -16 (630 entries) as this was below the legal age of 17 to drive a vehicle or ride a motorcycle in Indonesia. For high Driver Age limits, we used a boxplot to identify obvious outliers of age 103 and 114, which were excluded.
  • We studied data above 2 standard deviations away from the mean for Driver Age.
  • We excluded Policy “SBD/AORF/12-A0374288”, as its claim amount is more than 900 Million Rupiah (extreme outlier in dataset), resulting from the vehicle insured being sunk alongside the ship which was carrying it.
  • We excluded rows where SUM(Claim Paid)=0 and Sum(Claim OS)=0, as these are rejected claims and are excluded from the main dataset and saved as a separate table for further analysis (~1000 rows).
  • We removed several rows where SUM(GWP) was negative, which probably arose from data entry errors



Interim Findings

Dataset Summary

After data cleaning and processing, our cleaned dataset was reduced to around 350,0000 entries. Here is the overview of our dataset:

  • Motorcycle policies comprised of 44% (153,000) while vehicle policies comprised of 56% (196,000) of the dataset
  • Distinct corporate policies comprised of 18% (50,000) while personal policies comprised of 82% (230,000) of the dataset
  • Distinct Japanese clients comprised of 11%(30,000) while local clients comprised of 89% (250,000) of the dataset
  • New Business comprised of 89% (250,000) while Renewal Business comprised of 11% (30,000) of the dataset
  • Top Motorcycle brands distribution: Suzuki (49%), Honda (35%), Yamaha (14%), Kawasaki (1%)
  • Top Vehicle brand distribution: Mitsubishi (44%), Toyota (21%), Daihatsu (9%), Honda (7%), Hino (4%), Nissan (4%), Suzuki (3%), Isuzu (3%)

Claims Rate Overview

  • Overall claim rate is 15% (52,000 entries). Motorcycle claims rate is 1.5% while Vehicle claims rate is 25%.
  • Corporate claims rate is 22% while Personal claims rate is 11%.
  • Total cost ratio [(Claims + Discount + Commission)/GWP]: Motorcycle is 289% [Making a loss], Vehicle is 66.5%.

Sales Channels Breakdown

  • The main sales channels are leasing(76.4%), agent(8.7%), direct(8.4%), broker(3.2%), dealer(2.8%). This sales distribution will be further broken down to vehicle, motorcycle, personal, corporate
  • Vehicle sales channels: leasing (62.2%), agent (14.1%), direct (13.3%), broker (4.8%), dealer (4.9%)
  • Motorcycle sales channels: leasing (94.7%)
  • Corporate sales channels: leasing (42.8%), agent (19.7%), direct (19.7%), broker (8.8%), dealer (8.4%)
  • Personal sales channels: leasing (93.3%), agent (3.3%), broker (2.8%)


Profitability Analysis

Profitability Overview

Profitability.png

The graph plots the total value by month of variables “Sum(GWP Total)” [Revenue], “Sum(Claims Total)” [Claims Cost], “Sum(Desc Total)” [Cost of Discount], “Sum(Comm Total)” [Commision cost] against the policy inception date. From the graph, we can see that the total revenue collected by Tokio Marine is on a general downward trend, with a spike in December 2014. However, the profitability of the policies issued is generally healthy since the total cost is generally below the total revenue.

There are several interesting observations we found in the graph. First, There are consistent spikes in GWP at the end of each year. This is due to renewals from Corporate clients that generally takes place at year end. Second, there is a sharp decrease in Discount and a sharp increase in Commission on February 2014. After consultation with the client, it is suspected that this is due to the Tariffication law introduced in March 2014 that controlled the price of policies. In order to remain competitive, Tokio Marine changed its cost structure to manage the changing demands. Third, there is a decrease in claims amount from the start of 2015. This is because the policies are not fully matured, hence, not all impeding claims have been filed. Fourth, there is a sharp spike in GWP in December 2014, and subsequently a sharp drop. This observation will be further elaborated later.

Profitability2.png

As mentioned previously, the profitability of the policies tend to be healthy. The average total cost to GWP ratio before December 2014 tended at around 0.7. This is a healthy profitability since the ratio is below 1 (a ratio of above 1 signifies that the policies are loss making).

Our group further split this graph into Vehicle and Motorcycle segments. The vehicle segment tends to be similar to the graph above and had healthy profitability. However, the Motorcycle segment was making consistent losses after December 2012, with the average total cost to GWP ratio at 2.89. This means that Tokio Marine is paying 2.89 times the amount they collect from the Motorcycle segment. However, the Motorcycle segment constitutes of a small segment of the total sales and cost. Furthermore, the management seems to be agreeable to the lost in the Motorcycle segment due to the maintenance of corporate relations with several major clients.

Profitability3.png

Next, our group investigates the drop in GWP as mentioned earlier. In the above graph, the GWP is split between the Personal and Corporate accounts. For the Personal accounts, it can be seen that there is a drop in GWP but an increase in the number of policies sold. Upon further investigation, it is seen that there is a drop is average price of personal policy in 2013.

For the Corporate accounts, it is seen that the GWP and number of policies sold tend to be relatively stable. The GWP of the Corporate clients has a general increasing trend. However, there is a large spike in GWP in December 2014.

Agent = Orix Finance

Agent = NULL

In further analysis, it is observed that policies with the Agent of BII Finance, Orix Finance Indonesia or a NULL value were the largest contributor to the spike in GWP in December 2014. For Orix Finance, the spike was largely due to its leasing channel. For the Agents with NULL value, its spike was largely attributed to a significant increase in the broker channel, which decreased after the spike.

Vehicle models when Agent = NULL



Upon further analysis, it can be seen that the spike from the broker channels was due to this increase policy agreement for Avanza, Kijang and L200 model. We noticed the there are spikes in December 2012, December 2013 and June 2014. Our group suspect that TMI might have some agreement that increase their sales of policies with broker at the end of the year.

GWP Overview

First, we will look at the total GWP of each segment to get a general sense of what are the main source of revenue in the 3 year period of the dataset from 2012 - 2015:

  • Corporate clients attributed to 1000 billion INR while personal clients attributed to 1500 billion of total GWP in the 3 year period.
  • New business attributed to 2300 billion INR while Renewal business attributed to 200 billion INR of total GWP in the 3 year period.
  • Local clients attributed to 2300 billion INR while Japanese clients attributed to 200 billion INR.


Next, we investigated the Mean GWP by account types of vehicles, motorcycle, corporate and personal:

  • For vehicles, mean GWP for a corporate client is 9.5 million INR while personal is 16 million INR
  • For motorcycles, mean GWP paid for corporate and personal clients are around 410,000 INR.

Hence, from the above, it is seen that personal clients, new business, local clients and vehicles are the main revenue drivers for TMI.

Loss Ratio Overview

Loss Ratio is defined as the total claims / total GWP of the insured segment. This measure helps determine if a particular segment is profitable or making losses from claims. Here are some of the segments analysed:

  • The loss ratio of corporate clients are 0.29 while personal clients are 0.40
  • The loss ratio of japanese clients are 0.22 while local clients are 0.39
  • The loss ratio of renewal business is 0.17 while new business is 0.39


In the analysis of claims rate it is seen that corporate, japanese and renewal business has higher claims rate than personal, local and new business. However, the loss ratio of corporate, japanese and renewal business tend to be lower. This is attributed to the lower average claim paid for japanese, corporate and renewal business.

Upon further analysis, it is seen that Japanese, corporate and renewal business tend to be in a cluster. Our group hypothesize that Japanese clients tend to be corporate business which in turn tend to conduct renewal businesses. Hence, it can be concluded that the Japanese, corporate client tend to be the most profitable client profile.

Other Findings

Claims Causes

The top claims causes for vehicles are grazed (vehicle) [25%], Impact (vehicle) [24%], grazed (non-vehicle) [19%], impact (non-vehicle) [17%], collision [7%], theft and burglary [3%]. On the other hand, for motorcycles, the main cause for claims are theft and burglary [93%]. This is highly abnormal since theft and burglary only attribute to 4% of overall claims causes. This might be the reason that their motorcycle segment has a high total claims cost and contribute to the large losses made in the motorcycle segment. Management of Tokio Marine is aware of this situation for motorcycles in Indonesia and have placed measures Q3 2014.

Cover Type

For the vehicle segment, more customers buy comprehensive policies [125,000] rather than total loss only policies [70,000], becuase repair cost is much more significant. For the motorcycle segment, most customers buy the total loss only policies (being the cheaper option) [150,000] while a negligible number of people buys the comprehensive policies. This might be because repair cost of motorcycles is significantly lower.. Thus, many customers do not see the need for a comprehensive policy.

Driver Age Distribution

It is observed that the mean age for motorcycle clients tend to be lower than vehicle clients. For vehicle clients, the mean age is 37.8 years with a standard deviation of 9.5. For the motorcycle clients, the mean age is 32.7 years with a standard deviation of 10.7. Usually Motorcycle drivers are younger and thus have lower income and cannot afford cars.

Branch Region by Channels

For the vehicle segment, the region that sold the most policies is Jakarta. Most policies are obtained through leasing channel. For the motorcycle segment, region Medan, Jakarta and Surabaya had the most sale of policies. Most policies are also obtained through leasing channel as explained above that buying coverage is a condition of the lease. This supports the idea that Indonesians still do not see insurance coverage as essential.

Treemap Graphical Representation of Region, Corppers and Claim Rate

Using a treemap in JMP, we observed that Jakarta, Medan and Palembang is the highest number of claims collected. Of which, Palenbang and Batam is seen to have to highest claims rate since a darker shade of colour in the treemap indicates high claims rate. Palenbang and Batam are very small markets and thus having one of the highest claims rate is surprising and worrying.

Branch Region by Channels

In the patterns analysis, it is seen that majority of flood claims came in January and February. This might be due to a major flood that happened in a specific year in Jan/ Feb. The incidence of fire claims tend to be more stable, with a majority of the incidence happening in September.

In addition, it is seen that there are drops in claims during the period of August to September every year. We hypothesised that it might be due to Hari Raya which falls mostly on that period as people do not usually travel this festive period. However, the period before Hari Raya is a period where people are rushing back to their hometowns for this festive season and this might have caused more accidents.