ANLY482 AY1516 G1 Team Skulptors - Data Analysis
Before embarking on the Exploratory Data Analysis (EDA) stage, the team took time to understand the data and removed irrelevant records that should not be included in the analysis. The data cleaning change log is shown below:
Data Cleaning Log
S/N | Date | Data Cleaning Steps | Rationale & Justification |
---|---|---|---|
01 | 06 Jan | Removed duplicate rows. | Duplicate rows are commonly caused by accidental double scanning in the warehouse. |
02 | 06 Jan | Removed rows with missing data. | Rows with incomplete data were excluded as they could not be used in the analysis. |
03 | 06 Jan | Removed rows that fell outside the time frame. | As our project focuses on the inflow and outflow rates of SKU movement over one year (Jan 2015 to Dec 2015), rows dated outside this time frame were removed. |
04 | 23 Jan | Removed data rows with transcode “IRT” for inbound dataset. | Sponsors felt that rows with transcode “IRT” were not valuable to them as they represented failed inbound transactions. |
05 | 23 Jan | Removed data rows with transcode “OCP, ORP, ORV, OSA, OSD” for outbound dataset. | Sponsors felt that rows with transcode “OCP, ORP, ORV, OSA, OSD” were not valuable to them as they represented failed outbound transactions. |
06 | 23 Jan | Amendments to outbound dataset. | The team found discrepancies in the outbound data. The outbound data came as two Excel files, (Jan - July) and (Aug - Dec). However, the (Jan - July) file contained some transaction records dated Aug, Sep and Oct, and these records did not appear in the (Aug - Dec) file. After clarifying with the sponsors, the team moved these Aug, Sep and Oct records from the (Jan - July) file to the (Aug - Dec) file. |
07 | 05 Feb | Re-added data rows with transcode “IRT” for inbound dataset. | Done on the professor's advice. This gives sponsors the option to analyze failed inbound transactions in the future if they wish. |
08 | 05 Feb | Re-added data rows with transcode “OCP, ORP, ORV, OSA, OSD” for outbound dataset. | Done on the professor's advice. This gives sponsors the option to analyze failed outbound transactions in the future if they wish. |
09 | 05 Feb | Merged the two separate Excel files into a single CSV file. | Done on the professor's advice. This removes the need to work on two separate datasets and makes analysis more efficient, as there is only one CSV file to work on. |
10 | 18 Feb | Added a column to the existing dataset to represent each SKU's “A, B, C” classification. | Done on the professor's advice. This allows the team to analyze how the “A, B, C” classification of each SKU correlates with its outbound rate and put-away location. |
11 | 26 Feb | Creation of recoded columns. | To ease analysis using JMP and SAS Enterprise Miner. |
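
For readers who want to reproduce steps 01 to 03 and the later decision to retain failed transactions (steps 07 and 08), the logic can be sketched in plain Python. This is only an illustrative sketch, not the team's actual workflow, which used Excel, JMP and SAS Enterprise Miner. The column names (`sku`, `date`, `transcode`, `qty`), the ISO date format, the `failed` flag column and the `OUT` transcode in the sample are all assumptions made for the example; only the failed transcodes and the Jan 2015 to Dec 2015 window come from the log above.

```python
from datetime import date

# Failed-transaction codes taken from the cleaning log (steps 04 and 05).
FAILED_INBOUND = {"IRT"}
FAILED_OUTBOUND = {"OCP", "ORP", "ORV", "OSA", "OSD"}

# Project time frame from step 03.
START, END = date(2015, 1, 1), date(2015, 12, 31)


def clean_rows(rows, failed_codes):
    """Apply steps 01-03 and flag (rather than drop) failed transactions.

    `rows` is a list of dicts with hypothetical keys:
    "sku", "date" (ISO format, assumed), "transcode", "qty".
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Step 02: skip rows with any missing value.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Step 03: keep only rows dated within Jan 2015 - Dec 2015.
        txn_date = date.fromisoformat(row["date"])
        if not (START <= txn_date <= END):
            continue
        # Step 01: drop exact duplicates (e.g. accidental double scans).
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Steps 07-08: keep failed transactions, but mark them so they
        # can be filtered in or out later. The "failed" column is an
        # assumption, not something described in the log.
        cleaned.append({**row, "failed": row["transcode"] in failed_codes})
    return cleaned


# Made-up sample rows for illustration only.
sample = [
    {"sku": "S1", "date": "2015-03-02", "transcode": "OUT", "qty": "5"},
    {"sku": "S1", "date": "2015-03-02", "transcode": "OUT", "qty": "5"},  # double scan
    {"sku": "S2", "date": "2015-04-10", "transcode": "", "qty": "1"},     # missing value
    {"sku": "S3", "date": "2016-01-05", "transcode": "OUT", "qty": "2"},  # outside time frame
    {"sku": "S4", "date": "2015-08-15", "transcode": "OSA", "qty": "3"},  # failed outbound
]
result = clean_rows(sample, FAILED_OUTBOUND)
```

Flagging failed transactions instead of deleting them mirrors the reasoning behind steps 07 and 08: the rows stay available if the sponsors later want to study them, while the main analysis can still exclude them with a simple filter on the flag.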