Difference between revisions of "AY1516 T2 Team Hew - Documentation"

From Analytics Practicum
Line 23: Line 23:
  
 
== <p style="font-family:Trebuchet MS; border-left: 6px solid #62b762; padding-left:10px; line-height:40px; height:40px"><b>Data processing </b></p>==
 
 +
===Dataset Merging===
 +
Initially, we tried merging the two provided datasets directly, but our computers ran out of memory: the merge would have produced a combined dataset of over 6 GB with 218 variables.
 +
<br/><br/>
 +
'''Our Approach'''
 +
# We took a different approach, first filtering both datasets to keep only policies with an Orig_InceptionDate between 2012 and 2015.
 +
# We then deleted variables in both datasets that were either duplicate columns or of little use to our analysis.
  
+
{| class="wikitable"
 +
|-
 +
| VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
 +
|}
 +
 
 +
<br/><br/>
 
</div>
 
</div>
  

Revision as of 21:58, 28 February 2016


Data processing

Dataset Merging

Initially, we tried merging the two provided datasets directly, but our computers ran out of memory: the merge would have produced a combined dataset of over 6 GB with 218 variables.

Our Approach

  1. We took a different approach, first filtering both datasets to keep only policies with an Orig_InceptionDate between 2012 and 2015.
  2. We then deleted variables in both datasets that were either duplicate columns or of little use to our analysis.
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
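
The filter-then-trim-then-merge idea above could be sketched roughly as follows. This is only an illustration, not the team's actual workflow or tool: it assumes the datasets are CSV files, that Orig_InceptionDate is stored in ISO format (YYYY-MM-DD), and that both datasets share a join key. The key name `Policy_No` and the columns `Premium` and `Claim_Amount` are hypothetical placeholders. Rows are streamed one at a time so that only the smaller, already-filtered dataset is held in memory.

```python
import csv
import io
from datetime import date


def filter_and_trim(rows, drop_columns, start_year=2012, end_year=2015):
    """Yield rows whose Orig_InceptionDate year is in range, without the dropped columns."""
    drop = set(drop_columns)
    for row in rows:
        # Assumption: dates are ISO formatted (YYYY-MM-DD).
        year = date.fromisoformat(row["Orig_InceptionDate"]).year
        if start_year <= year <= end_year:
            yield {k: v for k, v in row.items() if k not in drop}


def merge_on_key(left_rows, right_rows, key):
    """Inner-join two row iterables on `key`; right-hand values win on name clashes."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    for row in left_rows:
        for match in index.get(row[key], []):
            yield {**row, **match}


# Tiny in-memory stand-ins for the two provided datasets (hypothetical data).
policies_csv = io.StringIO(
    "Policy_No,Orig_InceptionDate,VEH_TYPE,Premium\n"
    "P1,2011-05-01,Car,100\n"
    "P2,2013-07-15,Van,200\n"
)
claims_csv = io.StringIO(
    "Policy_No,Orig_InceptionDate,CLM_NO,Claim_Amount\n"
    "P1,2011-05-01,C9,50\n"
    "P2,2013-07-15,C7,75\n"
)

# Filter and trim each dataset first, then merge the reduced results.
policies = filter_and_trim(csv.DictReader(policies_csv), ["VEH_TYPE"])
claims = filter_and_trim(csv.DictReader(claims_csv), ["CLM_NO"])
merged = list(merge_on_key(policies, claims, key="Policy_No"))
print(merged)
```

Filtering and dropping columns before the join keeps the intermediate data small, which is the point of the approach described above; the join itself is a plain inner join and may need adjusting if unmatched policies should be kept.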



Findings