Difference between revisions of "AY1516 T2 Team Hew - Documentation"

From Analytics Practicum
Jump to navigation Jump to search
Line 27: Line 27:
 
<br/><br/>
 
<br/><br/>
 
'''Our Approach'''
 
'''Our Approach'''
# We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015.
+
# We tried a different approach by first filtering both datasets to only select Policies with ''Orig_InceptionDate'' between 2012 to 2015.
 
# Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.
 
# Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.
  
 
{| class="wikitable"
 
{| class="wikitable"
 +
|-
 +
! Deleted Variables (not exhaustive):
 
|-
 
|-
 
| VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
 
| VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
 
|}
 
|}
  
 +
# Merged both datasets by matching Policy and Risk_NO in both datasets
 +
# Grouped by the ''Policy'', ''Risk_NO'' and ''Claim_NO'' to sum up the transactional-level data for each group to derive the total for the quantitative variables of ''GWP'', ''ClaimPaid'', ''Disc'' and ''Comm''.
 
<br/><br/>
 
<br/><br/>
 
</div>
 
</div>

Revision as of 22:00, 28 February 2016


Data processing

Dataset Merging

Initially, we tried merging the 2 provided datasets directly but were unsuccessful as our computers ran out of memory, since the end result was a combined dataset of over 6GB in file size and 218 variables.

Our Approach

  1. We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015.
  2. Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.
Deleted Variables (not exhaustive):
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
  1. Merged both datasets by matching Policy and Risk_NO in both datasets
  2. Grouped by the Policy, Risk_NO and Claim_NO to sum up the transactional-level data for each group to derive the total for the quantitative variables of GWP, ClaimPaid, Disc and Comm.



Findings