Difference between revisions of "Uncovering Market-Insights for Charles & Keith: Data Preparation"

From Analytics Practicum
 
(9 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
<div style=background:#F2D0C7 border:#F2D0C7>
 +
[[Image:DATAPREP.jpg|800px|center]]
 +
</div>
 +
 
<!--Banner-->
 
<!--Banner-->
 
{|style="background-color:#FFFFFF; color:#24c7b1; padding: 6px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#FFFFFF; color:#24c7b1; padding: 6px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
| style="padding:0.3em; font-size:100%; background-color:#35383c;  border-bottom:3px solid #35383c; text-align:center; color:#828282" width="11%" | [[Uncovering Market-Insights for Charles & Keith |<font face = "Trebuchet MS" color="#FFFFFF" size=2><b>HOME</b></font>]]
+
| style="padding:0.3em; font-size:100%; background-color:#FFFFFF;  border-bottom:3px solid #35383c; text-align:center; color:#828282" width="11%" | [[Uncovering Market-Insights for Charles & Keith |<font face = "Trebuchet MS" color="#000000" size=2><b>HOME</b></font>]]
  
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
Line 7: Line 11:
  
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
| style="padding:0.3em; font-size:100%; background-color:#FFFFFF;  border-bottom:3px solid #35383c; text-align:center; color:#828282" width="11%" | [[Uncovering Market-Insights for Charles & Keith: Data Preparation | <font face = "Trebuchet MS" color="#000000" size=2><b>DATA PREPARATION</b></font>]]
+
| style="padding:0.3em; font-size:100%; background-color:#35383c;  border-bottom:3px solid #35383c; text-align:center; color:#828282" width="11%" | [[Uncovering Market-Insights for Charles & Keith: Data Preparation | <font face = "Trebuchet MS" color="#FFFFFF" size=2><b>DATA PREPARATION</b></font>]]
  
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:3px solid #35383c; background:none;" width="1%" | &nbsp;
Line 21: Line 25:
  
  
<div style="border-style: solid solid none; border-color: #35383c; border-width: 1px 1px; padding: 5px; font-size: 120%; font-weight: bold; background-color: #{{LibreOfficeColor2}}; color: #{{LibreOfficeColor3}}; border-radius: 3px 3px 0 0;">Initial Dataset</div>
 
<div style="border: 1px solid #35383c; padding: 15px 15px 20px; border-radius: 0 0 3px 3px;">
 
[[File:DataPreparation.jpg|none| ]]<br />
 
Our dataset has 15 columns and 4,374,674 rows.
 
  
</div>
+
<center>
 +
{| style="background-color:#ffffff ; margin: 3px 11px 3px 11px;" width="80%"|
 +
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border-top:solid #f5f5f5; background-color: #fff" width="200px" |
 +
[[Uncovering Market-Insights for Charles & Keith: Data Preparation|<font color="#3c3c3c"><strong>INITIAL DATASET</strong></font>]]
  
 +
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #f5f5f5; background-color: #f5f5f5" width="200px" | 
 +
[[Uncovering Market-Insights for Charles & Keith: Data Cleaning|<font color="#3c3c3c"><strong>DATA CLEANING </strong></font>]]
  
 +
| style="font-family:Trebuchet MS; font-size:11px; text-align: center; border:solid 1px #f5f5f5; background-color: #f5f5f5" width="200px" | 
 +
[[Uncovering Market-Insights for Charles & Keith: Final Dataset|<font color="#3c3c3c"><strong>FINAL DATASET</strong></font>]]
  
<div style="border-style: solid solid none; border-color: #35383c; border-width: 1px 1px; padding: 5px; font-size: 120%; font-weight: bold; background-color: #{{LibreOfficeColor2}}; color: #{{LibreOfficeColor3}}; border-radius: 3px 3px 0 0;">Recoding of Columns</div>
+
|}
<div style="border: 1px solid #35383c; padding: 15px 15px 20px; border-radius: 0 0 3px 3px;">
+
</center>
<b>Country</b>
 
 
 
[[File:Country.jpg|none| ]]
 
Since every record in the dataset comes from China, the attribute “Country” carries no additional information, so our group decided to remove it.
 
 
 
<b>TransactionId</b>
 
[[File:TID1.jpg|none| ]][[File:TID2.jpg|none| ]]
 
Because “TransactionId” is an identifier rather than a quantity, our group changed its data type from Numeric to Character.
 
 
 
<b>Date</b>
 
[[File:Date1.jpg|none| ]][[File:Date2.jpg|none| ]]
 
To prepare the data correctly, the Date attribute had to be converted. Using JMP Pro, we changed its Data Type from “Character” to “Numeric” and set the date format to “m/d/y”.
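The same conversion can be sketched outside JMP Pro. The snippet below is only an illustration in Python, not the team's actual procedure; the function name <code>parse_mdy</code> and the sample values are hypothetical, and it assumes the raw strings use a four-digit year.

```python
from datetime import date, datetime


def parse_mdy(text: str) -> date:
    """Parse a month/day/year string (e.g. '3/14/2015') into a date.

    Assumption: the year is written with four digits.
    """
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


# Hypothetical sample values, not taken from the sponsor's data.
raw_dates = ["1/5/2013", "12/31/2015"]
parsed = [parse_mdy(d) for d in raw_dates]
```

Once the values are real dates rather than text, they sort and filter chronologically instead of alphabetically.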
 
 
 
<b>Materials</b>
 
[[File:Mat1.jpg|none| ]][[File:Mat2.jpg|none| ]]
 
As the figures show, even though Ankle Boot and ANKLEBOOT are the same name, they are treated as different values. The same applies to Ballerina and BALLERINA. Hence, our group recoded the attribute “Material” into a separate column named “Material 2” so that materials with the same name are grouped together.
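One way to picture this recoding is to normalise every label before grouping. The sketch below is an assumption about how such a rule could look in Python (the recoding itself was done in JMP Pro); <code>normalise_label</code> is a hypothetical helper that drops spaces and punctuation and upper-cases the rest.

```python
def normalise_label(label: str) -> str:
    """Return a canonical key: letters and digits only, upper-cased."""
    return "".join(ch for ch in label if ch.isalnum()).upper()


# The four spellings mentioned in the text above.
materials = ["Ankle Boot", "ANKLEBOOT", "Ballerina", "BALLERINA"]

# Keep the original values and write the cleaned ones to a new column,
# mirroring the separate "Material 2" column.
material_2 = [normalise_label(m) for m in materials]
```

Writing the result to a new column rather than overwriting the original keeps the raw values available for checking.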
 
 
 
<b>Subclass</b>
 
[[File:SC1.jpg|none| ]][[File:SC2.jpg|none| ]]
 
Besides Materials, similar recoding was also done for the attributes “Subclass”, “Class” and “Size”. For “Subclass”, the PF in PF COVERED, PF OPEN TOE and PF PEEP TOE refers to PLATFORM. Hence, we replaced every PF with PLATFORM.
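A minimal Python sketch of the PF replacement follows. It is illustrative only; the helper name <code>expand_pf</code> is hypothetical, and matching PF as a whole word is our assumption so that other values containing the letters "PF" stay untouched.

```python
import re

# Match "PF" only when it stands alone as a word.
PF_PATTERN = re.compile(r"\bPF\b")


def expand_pf(subclass: str) -> str:
    """Replace the abbreviation PF with PLATFORM in a Subclass value."""
    return PF_PATTERN.sub("PLATFORM", subclass)


subclasses = ["PF COVERED", "PF OPEN TOE", "PF PEEP TOE"]
recoded = [expand_pf(s) for s in subclasses]
```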
 
 
 
<b>Class</b>
 
[[File:Class1.jpg|none| ]][[File:Class2.jpg|none| ]]
 
For attribute “CLASS”, upon further investigation, our group realised that PASSPORT HOLDER and PP HOLDER are the same thing. The same holds for SHOULDER and SHOULDER BAG, as well as SLING and SLING BAG. Hence, we recoded these names to ensure that our later analysis is accurate.
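Synonym recoding of this kind can be expressed as a lookup table. The sketch below is an assumption, not the team's JMP recode: the text does not say which spelling was kept, so the longer form is chosen here purely for illustration, and <code>CLASS_ALIASES</code> and <code>recode_class</code> are hypothetical names.

```python
# Assumption: the longer spelling is treated as the canonical one.
CLASS_ALIASES = {
    "PP HOLDER": "PASSPORT HOLDER",
    "SHOULDER": "SHOULDER BAG",
    "SLING": "SLING BAG",
}


def recode_class(value: str) -> str:
    """Map a known alias to its canonical name; leave other values as-is."""
    return CLASS_ALIASES.get(value, value)


classes = ["PP HOLDER", "PASSPORT HOLDER", "SHOULDER", "SLING BAG"]
recoded_classes = [recode_class(c) for c in classes]
```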
 
 
 
<b>Size</b>
 
[[File:S1.jpg|none| ]][[File:S2.jpg|none| ]]
 
For attribute “SIZE”, all the numerical values are shoe sizes, while the rest refer to sizes of accessories such as necklaces, bags and wallets. To prevent any confusion, our group recoded the size from “340” to “34” for all shoe sizes.
 
 
 
<b> TransactionStoreID</b>
 
[[File:TSID.jpg|none| ]]
 
To ensure that our market basket analysis is accurate in the next phase of our practicum, our group created a new variable that is unique. The attribute “TransactionId” is not a unique identifier for each row of data because different StoreName values in different regions could have used the same TransactionId. Hence, our group concatenated StoreName and TransactionId to create a unique identifier.
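The concatenation can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name, the sample rows and the underscore separator are all hypothetical, since the text does not say whether a separator was used.

```python
def transaction_store_id(store_name: str, transaction_id: str, sep: str = "_") -> str:
    """Combine StoreName and TransactionId into one key.

    Assumption: a separator is inserted so that, for example,
    ("Store1", "23") and ("Store12", "3") cannot collide.
    """
    return f"{store_name}{sep}{transaction_id}"


# Hypothetical rows: two stores that issued the same TransactionId.
rows = [("Store A", "1001"), ("Store B", "1001")]
ids = [transaction_store_id(store, tid) for store, tid in rows]
```

Here the two rows share a TransactionId but receive different combined keys, which is what the market basket analysis needs in order to treat them as separate baskets.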
 
 
 
</div>
 
  
 +
[[Image:AYEInitialDatset.jpg|900px|center|AYE InitialDataset]]
  
<div style="border-style: solid solid none; border-color: #35383c; border-width: 1px 1px; padding: 5px; font-size: 120%; font-weight: bold; background-color: #{{LibreOfficeColor2}}; color: #{{LibreOfficeColor3}}; border-radius: 3px 3px 0 0;">Final Dataset</div>
 
 
<div style="border: 1px solid #35383c; padding: 15px 15px 20px; border-radius: 0 0 3px 3px;">
 
<div style="border: 1px solid #35383c; padding: 15px 15px 20px; border-radius: 0 0 3px 3px;">
[[File:F1.jpg|none| ]][[File:F2.jpg|none| ]]
+
Our project sponsor, CHARLES & KEITH GROUP, has provided us with a dataset that consists of three years' worth of in-store transaction data from all of its C&K retail stores in Mainland China, together with the order quantity for each article, covering the period from 2013 to 2015. Our dataset has 15 columns and 4,374,674 rows.
At the end of data preparation, the dataset has 19 columns rather than the original 15.
 
 
</div>
 
</div>

Latest revision as of 17:18, 17 April 2016


Our project sponsor, CHARLES & KEITH GROUP, has provided us with a dataset that consists of three years' worth of in-store transaction data from all of its C&K retail stores in Mainland China, together with the order quantity for each article, covering the period from 2013 to 2015. Our dataset has 15 columns and 4,374,674 rows.