Difference between revisions of "AY1516 T2 Team Hew - Documentation"
Jump to navigation
Jump to search
Line 27: | Line 27: | ||
<br/><br/> | <br/><br/> | ||
'''Our Approach''' | '''Our Approach''' | ||
− | # We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015. | + | # We tried a different approach by first filtering both datasets to only select Policies with ''Orig_InceptionDate'' between 2012 to 2015. |
# Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis. | # Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis. | ||
{| class="wikitable" | {| class="wikitable" | ||
+ | |- | ||
+ | ! Deleted Variables (not exhaustive): | ||
|- | |- | ||
| VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name | | VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name | ||
|} | |} | ||
+ | # Merged both datasets by matching Policy and Risk_NO in both datasets | ||
+ | # Grouped by the ''Policy'', ''Risk_NO'' and ''Claim_NO'' to sum up the transactional-level data for each group to derive the total for the quantitative variables of ''GWP'', ''ClaimPaid'', ''Disc'' and ''Comm''. | ||
<br/><br/> | <br/><br/> | ||
</div> | </div> |
Revision as of 22:00, 28 February 2016
Data processing
Dataset Merging
Initially, we tried merging the 2 provided datasets directly but were unsuccessful as our computers ran out of memory, since the end result was a combined dataset of over 6GB in file size and 218 variables.
Our Approach
- We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015.
- Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.
Deleted Variables (not exhaustive): |
---|
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name |
- Merged both datasets by matching Policy and Risk_NO in both datasets
- Grouped by the Policy, Risk_NO and Claim_NO to sum up the transactional-level data for each group to derive the total for the quantitative variables of GWP, ClaimPaid, Disc and Comm.
Findings