Difference between revisions of "AY1516 T2 Team Hew - Documentation"

From Analytics Practicum
Jump to navigation Jump to search
Line 56: Line 56:
  
 
== <p style="font-family:Trebuchet MS; border-left: 6px solid #daad25; padding-left:10px; line-height:40px; height:40px"><b>Findings </b></p>==
 
== <p style="font-family:Trebuchet MS; border-left: 6px solid #daad25; padding-left:10px; line-height:40px; height:40px"><b>Findings </b></p>==
 +
===Dataset Summary===
 +
After data cleaning and processing, our cleaned dataset was reduced to around 350,0000 entries. Here is the overview of our dataset:
 +
* Motorcycle policies comprised of 44% (153,000) while vehicle policies comprised of 56% (196,000) of the dataset
 +
* Distinct corporate policies comprised of 18% (50,000) while personal policies comprised of 82% (230,000) of the dataset
 +
* Distinct Japanese clients comprised of 11%(30,000) while local clients comprised of 89% (250,000) of the dataset
 +
* New Business comprised of 89% (250,000) while Renewal Business comprised  of 11% (30,000) of the dataset
 +
* Top Motorcycle brands distribution: Suzuki (49%), Honda (35%), Yamaha (14%), Kawasaki (1%)
 +
* Top Vehicle brand distribution: Mitsubishi (44%), Toyota (21%), Daihatsu (9%), Honda (7%), Hino (4%), Nissan (4%), Suzuki (3%), Isuzu (3%)
 +
<br/><br/>
 +
====Claims Rate Overview====
 +
* Overall claim rate is 15% (52,000 entries). Motorcycle claims rate is 1.5% while Vehicle claims rate is 25%.
 +
* Corporate claims rate is 22% while Personal claims rate is 11%.
 +
* Total cost ratio [(Claims + Discount + Commission)/GWP]: Motorcycle is 289% [Making a loss], Vehicle is 66.5%.
 +
<br/><br/>
 +
====Sales Channels Breakdown====
 +
* The main sales channels are leasing(76.4%), agent(8.7%), direct(8.4%), broker(3.2%), dealer(2.8%). This sales distribution will be further broken down to vehicle, motorcycle, personal, corporate
 +
* Vehicle sales channels: leasing (62.2%), agent (14.1%), direct (13.3%), broker (4.8%), dealer (4.9%) 
 +
* Motorcycle sales channels: leasing (94.7%)
 +
* Corporate sales channels: leasing (42.8%), agent (19.7%), direct (19.7%), broker (8.8%), dealer (8.4%)
 +
* Personal sales channels: leasing (93.3%), agent (3.3%), broker (2.8%)
  
 
+
<br/><br/>
 
+
===Profitability Analysis===
 +
====Profitability Overview====
 
<br/><br/>
 
<br/><br/>
 
</div>
 
</div>

Revision as of 22:15, 28 February 2016


Data processing

Dataset Merging

Initially, we tried merging the 2 provided datasets directly but were unsuccessful as our computers ran out of memory, since the end result was a combined dataset of over 6GB in file size and 218 variables.

Our Approach:
1. We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015.
2. Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.

Deleted Variables (not exhaustive):
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name

3. Merged both datasets by matching Policy and Risk_NO in both datasets
4. Grouped by the Policy, Risk_NO and Claim_NO to sum up the transactional-level data for each group to derive the total for the quantitative variables of GWP, ClaimPaid, Disc and Comm.

The final merged dataset contains 92 columns and 349,648 rows, with duplicate rows removed.

Further Cleaning & Processing

  • We hypothesized that there would be differences in claim amounts for different vehicle types, so we split the dataset into Motorcycle policies VS Vehicle policies for further analysis.
  • Likewise, we segregated the data into Corporate Accounts VS Personal Accounts as we felt that Corporate customers would have different demographics and behaviours.
  • We deleted Driver Ages from 0 -16 (630 entries) as this was below the legal age of 17 to drive a vehicle or ride a motorcycle in Indonesia. For high Driver Age limits, we used a boxplot to identify obvious outliers of age 103 and 114, which were excluded.
  • We studied data above 2 standard deviations away from the mean for Driver Age.
  • We excluded Policy “SBD/AORF/12-A0374288”, as its claim amount is more than 900 Million Rupiah (extreme outlier in dataset), resulting from the vehicle insured being sunk alongside the ship which was carrying it.
  • We excluded rows where SUM(Claim Paid)=0 and Sum(Claim OS)=0, as these are rejected claims and are excluded from the main dataset and saved as a separate table for further analysis (~1000 rows).
  • We removed several rows where SUM(GWP) was negative, which probably arose from data entry errors



Findings

Dataset Summary

After data cleaning and processing, our cleaned dataset was reduced to around 350,0000 entries. Here is the overview of our dataset:

  • Motorcycle policies comprised of 44% (153,000) while vehicle policies comprised of 56% (196,000) of the dataset
  • Distinct corporate policies comprised of 18% (50,000) while personal policies comprised of 82% (230,000) of the dataset
  • Distinct Japanese clients comprised of 11%(30,000) while local clients comprised of 89% (250,000) of the dataset
  • New Business comprised of 89% (250,000) while Renewal Business comprised of 11% (30,000) of the dataset
  • Top Motorcycle brands distribution: Suzuki (49%), Honda (35%), Yamaha (14%), Kawasaki (1%)
  • Top Vehicle brand distribution: Mitsubishi (44%), Toyota (21%), Daihatsu (9%), Honda (7%), Hino (4%), Nissan (4%), Suzuki (3%), Isuzu (3%)



Claims Rate Overview

  • Overall claim rate is 15% (52,000 entries). Motorcycle claims rate is 1.5% while Vehicle claims rate is 25%.
  • Corporate claims rate is 22% while Personal claims rate is 11%.
  • Total cost ratio [(Claims + Discount + Commission)/GWP]: Motorcycle is 289% [Making a loss], Vehicle is 66.5%.



Sales Channels Breakdown

  • The main sales channels are leasing(76.4%), agent(8.7%), direct(8.4%), broker(3.2%), dealer(2.8%). This sales distribution will be further broken down to vehicle, motorcycle, personal, corporate
  • Vehicle sales channels: leasing (62.2%), agent (14.1%), direct (13.3%), broker (4.8%), dealer (4.9%)
  • Motorcycle sales channels: leasing (94.7%)
  • Corporate sales channels: leasing (42.8%), agent (19.7%), direct (19.7%), broker (8.8%), dealer (8.4%)
  • Personal sales channels: leasing (93.3%), agent (3.3%), broker (2.8%)



Profitability Analysis

Profitability Overview