Uncovering Market-Insights for Charles & Keith: Data Preparation
Revision as of 15:46, 29 February 2016
Invalid Transactions
The attribute “SaleQty” contains non-positive values ranging from 0 down to -25. Our sponsors confirmed that these sales quantities were recorded wrongly and that the sales did take place. Hence, our group removed all records with a sales quantity of less than 1.
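The filter described above can be sketched in a few lines of Python. This is only an illustration of the rule "keep rows where SaleQty is at least 1", not the JMP Pro steps our group actually ran; the sample rows and the helper name `drop_invalid_quantities` are hypothetical.

```python
# Hypothetical sample rows; only the column name "SaleQty" comes from the dataset.
rows = [
    {"TransactionId": 1001, "SaleQty": 2},
    {"TransactionId": 1002, "SaleQty": 0},
    {"TransactionId": 1003, "SaleQty": -25},
    {"TransactionId": 1004, "SaleQty": 1},
]


def drop_invalid_quantities(records):
    """Keep only the records whose SaleQty is at least 1."""
    return [r for r in records if r["SaleQty"] >= 1]


valid_rows = drop_invalid_quantities(rows)
print([r["TransactionId"] for r in valid_rows])
```

The original list is left untouched, so the removed rows can still be counted or inspected afterwards.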
Country
Since all records in the dataset are from China, the attribute “Country” carries no additional information, so our group decided to remove it.
TransactionId
Since the attribute “TransactionId” is an identifier rather than a quantity, our group changed its data type from Numeric to Character.
Date
To prepare the data correctly, the Date attribute had to be converted. Using JMP Pro, we changed the Data Type of Date from “Character” to “Numeric” and set the date format to “m/d/y”.
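To show why the conversion matters, here is a minimal Python sketch of the same idea: dates stored as text sort alphabetically, while parsed dates sort chronologically. It assumes a four-digit year in the “m/d/y” strings; the sample dates and the helper name `parse_mdy` are hypothetical and this is not the JMP Pro procedure itself.

```python
from datetime import datetime


def parse_mdy(text):
    """Parse a month/day/year string such as '2/29/2016' into a date.

    Assumes a four-digit year; a two-digit year would need '%y' instead.
    """
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


# Hypothetical sample values in the "m/d/y" layout.
raw_dates = ["10/1/2015", "9/1/2015", "2/29/2016"]

sorted_as_text = sorted(raw_dates)                  # alphabetical order
sorted_as_dates = sorted(raw_dates, key=parse_mdy)  # chronological order

print(sorted_as_text)
print(sorted_as_dates)
```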
Materials
In the figure below, even though Ankle Boot and ANKLEBOOT are the same name, they are treated as different values. The same applies to Ballerina and BALLERINA. Hence, our group recoded the attribute “Material” into a separate column named “Material 2” so that materials of the same name are grouped together.
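One way to express this kind of recoding in code is to compare labels on a normalised key (upper case, whitespace removed) and map each key to a single canonical label. The sketch below is an assumption about how this could be done outside JMP Pro; the canonical spellings and the helper names `normalise_label` and `recode` are hypothetical.

```python
def normalise_label(label):
    """Build a comparison key: upper case with all whitespace removed."""
    return "".join(label.upper().split())


def recode(values, canonical):
    """Map each value to the canonical label that shares its normalised key.

    Values with no matching canonical label are returned unchanged.
    """
    lookup = {normalise_label(c): c for c in canonical}
    return [lookup.get(normalise_label(v), v) for v in values]


# The original column is kept and the recoded values go into a new one,
# mirroring the separate "Material 2" column.
material = ["Ankle Boot", "ANKLEBOOT", "Ballerina", "BALLERINA"]
material_2 = recode(material, canonical=["ANKLE BOOT", "BALLERINA"])
print(material_2)
```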
Subclass
Besides Materials, similar recoding was done for the attributes “Subclass”, “Class” and “Size”. For “Subclass”, the PF in PF COVERED, PF OPEN TOE and PF PEEP TOE refers to PLATFORM.
Hence, we replaced every PF with PLATFORM.
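The PF replacement can be sketched as a whole-word substitution, so that only the standalone abbreviation is expanded and longer words that merely contain the letters PF are left alone. The helper name `expand_pf` is hypothetical and the snippet is an illustration only, not the recode we performed in JMP Pro.

```python
import re

# \b marks a word boundary, so only the standalone token "PF" matches.
PF_PATTERN = re.compile(r"\bPF\b")


def expand_pf(subclass):
    """Replace the abbreviation PF with PLATFORM in a Subclass value."""
    return PF_PATTERN.sub("PLATFORM", subclass)


for value in ["PF COVERED", "PF OPEN TOE", "PF PEEP TOE"]:
    print(expand_pf(value))
```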
Class
For the attribute “Class”, upon further investigation, our group realised that PASSPORT HOLDER and PP HOLDER are the same thing. The same can be said of SHOULDER and SHOULDER BAG, as well as SLING and SLING BAG.
Hence, we recoded these names so that each class is counted under a single label in our later analysis.
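An explicit lookup table is a simple way to record such merges. In the sketch below, the choice of the longer spelling as the label to keep is an assumption made for illustration, and the names `CLASS_RECODE` and `recode_class` are hypothetical.

```python
# Assumption: the longer spelling is kept as the canonical label.
CLASS_RECODE = {
    "PP HOLDER": "PASSPORT HOLDER",
    "SHOULDER": "SHOULDER BAG",
    "SLING": "SLING BAG",
}


def recode_class(value):
    """Return the canonical Class label, or the value itself if unmapped."""
    return CLASS_RECODE.get(value.strip().upper(), value)


classes = ["PP HOLDER", "PASSPORT HOLDER", "SLING", "SLING BAG", "SHOULDER"]
print(sorted(set(recode_class(c) for c in classes)))
```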
Size
For the attribute “Size”, all numerical values are shoe sizes, while the remaining values refer to the sizes of accessories such as necklaces, bags and wallets. To prevent any confusion, our group recoded the size from “340” to “34” for all shoe sizes.
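The text gives only one example of this recode (“340” to “34”), so the sketch below generalises it under a stated assumption: a three-digit numeric size ending in 0 is a shoe size with an extra trailing zero. Any other value, including non-numeric accessory sizes, is left unchanged. The helper name `recode_size` and the sample values other than “340” are hypothetical.

```python
def recode_size(size):
    """Drop the trailing zero from three-digit numeric shoe sizes.

    Assumption: values such as '340' encode shoe size 34. Non-numeric
    accessory sizes and all other values are returned unchanged.
    """
    text = size.strip()
    if text.isdigit() and len(text) == 3 and text.endswith("0"):
        return text[:-1]
    return size


for value in ["340", "370", "M", "ONE SIZE"]:
    print(value, "->", recode_size(value))
```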
TransactionStoreID
To ensure that our market basket analysis is accurate in the next phase of our practicum, our group created a new variable that is unique for each transaction. The attribute “TransactionId” alone does not identify a transaction uniquely, because stores with different StoreName values in different regions could have used the same TransactionId. Hence, our group concatenated StoreName and TransactionId to create a unique identifier.
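The effect of the concatenated key on basket grouping can be sketched as follows. The sample rows, the “Item” column, the underscore separator and the helper name `make_transaction_store_id` are all assumptions for illustration; the text only states that StoreName and TransactionId were concatenated.

```python
from collections import defaultdict


def make_transaction_store_id(store_name, transaction_id, sep="_"):
    """Concatenate StoreName and TransactionId into one key.

    The separator is an assumption; it keeps the two parts distinguishable.
    """
    return f"{store_name}{sep}{transaction_id}"


# Hypothetical rows: two stores reuse the same TransactionId.
rows = [
    {"StoreName": "Store A", "TransactionId": "1001", "Item": "SLING BAG"},
    {"StoreName": "Store A", "TransactionId": "1001", "Item": "BALLERINA"},
    {"StoreName": "Store B", "TransactionId": "1001", "Item": "ANKLE BOOT"},
]

# Group items into baskets by the combined key rather than TransactionId.
baskets = defaultdict(list)
for row in rows:
    key = make_transaction_store_id(row["StoreName"], row["TransactionId"])
    row["TransactionStoreID"] = key
    baskets[key].append(row["Item"])

print(dict(baskets))
```

Grouping on TransactionId alone would have merged all three rows into a single basket, which would distort the item associations found by the market basket analysis.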