Difference between revisions of "Be Customer Wise or Otherwise - Findings"

Revision as of 23:43, 2 March 2015

Data Collection and Preparation

Merging and Cleaning

GLC provided us with 3 datasets, namely Metadata, CRM, and Sales, collected over the 12 months of 2008. Metadata contains fields such as ‘Industry Description’ that help to interpret the industry codes used in the other files. The CRM dataset contains information about the accounts, including fields such as ‘New Ac Number’ and ‘Date Opened’. Finally, the Sales dataset contains 2.5 million rows of individual transactions that occurred within the 12 months of 2008. This file contains fields such as ‘Local Revenue’, ‘Destination’, ‘Origin’, ‘Billed Weight’ and ‘Sales Channel’.

Since we require variables from all three datasets, the first step we took was to merge them by joining the tables based on common fields such as ‘New Ac Number’ for the convenience of our subsequent analysis. Thereafter, we derived certain variables that, from our literature review, would be used in our later analysis, such as ‘Date of Last Transaction’ (i.e. Recency), ‘No. of Transactions’ per account (i.e. Frequency) and ‘Total 2008 Revenue’ per account (i.e. Monetary).

We then removed unnecessary and duplicate fields such as those that, for confidentiality purposes, were given an ‘XXX’ value for all rows.

Missing Data

Finally, we did a missing data analysis to examine how significant the missing data were and decided how to deal with them. A summary can be seen in the table below.

Variable	Count (Missing)
description of activities	2424057 (95.42%)
inbound_contract_code	2099942 (82.66%)
outbound_contract_code	328756 (12.94%)
industry_code_CRM	265198 (10.44%)
site_grouping	214763 (8.45%)
local revenue	24970 (0.98%)
zip_code	3466 (0.136%)

@@ Line 21: / Line 21: @@
 <!--Content Start-->
 <div align="left">
+==<div style="background: #FFBC35; padding: 10px; font-family:Segoe UI; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 13px; border-left: #ff8a18 solid 30px;"><font color="black">Data Collection and Preparation</font></div>==
+<div style="border-left: #ffffff solid 12px; padding: 0px 30px 0px 13px;">
+===<div style="margin-top: 10px; text-align:left; font-size: 16px; font-weight: bold;">Merging and Cleaning</div>===
+GLC provided us with 3 datasets, namely Metadata, CRM, and Sales, collected over the 12 months of 2008. Metadata contains fields such as ‘Industry Description’ that help to interpret the industry codes used in the other files. The CRM dataset contains information about the accounts, including fields such as ‘New Ac Number’ and ‘Date Opened’. Finally, the Sales dataset contains 2.5 million rows of individual transactions that occurred within the 12 months of 2008. This file contains fields such as ‘Local Revenue’, ‘Destination’, ‘Origin’, ‘Billed Weight’ and ‘Sales Channel’.
+Since we require variables from all three datasets, the first step we took was to merge them by joining the tables based on common fields such as ‘New Ac Number’ for the convenience of our subsequent analysis. Thereafter, we derived certain variables that, from our literature review, would be used in our later analysis, such as ‘Date of Last Transaction’ (i.e. Recency), ‘No. of Transactions’ per account (i.e. Frequency) and ‘Total 2008 Revenue’ per account (i.e. Monetary).
+We then removed unnecessary and duplicate fields such as those that, for confidentiality purposes, were given an ‘XXX’ value for all rows.
+===<div style="margin-top: 10px; text-align:left; font-size: 16px; font-weight: bold;">Missing Data</div>===
+Finally, we did a missing data analysis to examine how significant the missing data were and decided how to deal with them. A summary can be seen in the table below.
+{| border="1" cellpadding="5" cellspacing="0"
+! style="width=100%; text-align: center; font-weight: bold; background: #FFA66B;" | Variable
+! style="width=100%; text-align: center; font-weight: bold; background: #FFA66B;" | Count (Missing)
+|-
+|| description of activities
+| 2424057 (95.42%)
+|-
+|| inbound_contract_code
+| 2099942 (82.66%)
+|-
+|| outbound_contract_code
+| 328756 (12.94%)
+|-
+|| industry_code_CRM
+| 265198 (10.44%)
+|-
+|| site_grouping
+| 214763 (8.45%)
+|-
+|| local revenue
+| 24970 (0.98%)
+|-
+|| zip_code
+| 3466 (0.136%)
+|}
+</div>
 ==<div style="background: #FFBC35; padding: 10px; font-family:Segoe UI; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 13px; border-left: #ff8a18 solid 30px;"><font color="black">Exploratory Data Analysis</font></div>==
 <div style="border-left: #ffffff solid 12px; padding: 0px 30px 0px 13px;">

Difference between revisions of "Be Customer Wise or Otherwise - Findings"

Revision as of 23:43, 2 March 2015

Contents

Data Collection and Preparation

Merging and Cleaning

Missing Data

Exploratory Data Analysis

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools