AY1516 T2 Team Hew - Documentation

Home

Team

Overview

Project Management

Documentation

Data processing

Dataset Merging

Initially, we tried merging the 2 provided datasets directly but were unsuccessful as our computers ran out of memory, since the end result was a combined dataset of over 6GB in file size and 218 variables.

Our Approach 1. We tried a different approach by first filtering both datasets to only select Policies with Orig_InceptionDate between 2012 to 2015. 2. Deleted variables in both datasets which were either duplicate columns or not very useful in our analysis.

Deleted Variables (not exhaustive):
VEH_TYPE, VEH_Reg_NO, VEH_Chassis, VEH_Engine, Category, AGENT_Code, Client_CODE, Person_in_Charge, DEPT, INWARD, IND, ACT_Policy Duration, LongTERM_POL, Inception_Date, Expiry_Date, UW_YR, UW_Quarter, Eff_Yr, EFF_Month, EFF_Quarter, Last_EffDate, VEH_Code, VEH_CLASS, Contract_Type, Anniversary, Account_Type, Intermediaries, Intermediares_Area, Intermediares_Island, Vehicle_SI_Acceptance, Cause_Code, Loss_Code, Repairer_Code, Transaction_Code, CLM_NO, Repairier_Status2, Transaction_Desc, TransCode_Desc, Payer_name

3. Merged both datasets by matching Policy and Risk_NO in both datasets 4. Grouped by the Policy, Risk_NO and Claim_NO to sum up the transactional-level data for each group to derive the total for the quantitative variables of GWP, ClaimPaid, Disc and Comm.

The final merged dataset contains 92 columns and 349,648 rows, with duplicate rows removed.

AY1516 T2 Team Hew - Documentation

Data processing

Dataset Merging

Findings

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools