Difference between revisions of "AY1516 T2 Team Hew - Documentation"
Revision as of 21:58, 28 February 2016
Data processing
Dataset Merging
Initially, we tried to merge the two provided datasets directly, but our computers ran out of memory because the combined dataset was over 6 GB in file size and had 218 variables.
Our Approach
- We took a different approach: we first filtered both datasets to keep only policies with Orig_InceptionDate between 2012 and 2015.
- We then deleted variables in both datasets that were either duplicate columns or not useful for our analysis. The deleted variables are listed below.
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name
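The two steps above can be sketched in Python as a single streaming pass over each dataset. This is an illustrative sketch only and not necessarily the tool we used. It assumes the data is available as CSV, that Orig_InceptionDate is stored as YYYY-MM-DD, and that "between 2012 and 2015" includes both end years. The function name filter_and_trim, the columns Policy_No and Premium, and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime

# A few of the deleted variables listed above, used here for illustration.
DROP_COLUMNS = {"VEH_TYPE", "VEH_Reg_NO", "AGENT_Code", "Payer_name"}


def filter_and_trim(src, dst, date_col="Orig_InceptionDate",
                    date_format="%Y-%m-%d",  # assumed date format
                    first_year=2012, last_year=2015,
                    drop=DROP_COLUMNS):
    """Copy rows from src to dst one at a time, keeping only policies whose
    inception year is in [first_year, last_year] and omitting unwanted columns.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    kept_fields = [f for f in (reader.fieldnames or []) if f not in drop]
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(dst, fieldnames=kept_fields, extrasaction="ignore")
    writer.writeheader()

    written = 0
    for row in reader:
        year = datetime.strptime(row[date_col], date_format).year
        if first_year <= year <= last_year:
            writer.writerow(row)
            written += 1
    return written


# Hypothetical sample data standing in for one of the provided datasets.
sample = io.StringIO(
    "Policy_No,Orig_InceptionDate,VEH_TYPE,Premium\n"
    "P1,2011-06-30,Car,100\n"
    "P2,2013-01-15,Van,200\n"
    "P3,2015-12-31,Car,300\n"
)
out = io.StringIO()
rows_written = filter_and_trim(sample, out)
```

Because rows are read and written one at a time, memory use stays small regardless of file size. Applying the same pass to both datasets before merging reduces both the row count and the column count of the merged result.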
Findings