ANLY482 AY2017-18 Group9: Project Overview/ Methodology

From Analytics Practicum

Revision as of 16:45, 25 February 2018

DATA COLLECTION / PREPARATION

Upon understanding the problems faced by KOI and coming up with potential solutions to them, we requested the datasets that we would require to perform our analysis. In particular, we will target sales and wastage data to optimize inventory reordering.

To facilitate our analysis, KOI has kindly provided our team with data from 47 outlets, covering one to two years of collected data (January 2016 to December 2017). As the client wishes to focus on the latest business fiscal year, we will be provided with the latest data available. The types of data obtained are summarized in the table below.

Data Summary

[Image: DataSummary Group9.PNG]

Delivery Data
A row in this table represents a specific delivery ordered by a branch. The main columns in this table are described as follows:

[Image: Deliverydata group9.PNG]

We have also created additional columns to aid our analysis, as follows:

[Image: Deliverydata group9 columnscreated.PNG]

Promotion Data
A row in this table represents a specific promotional campaign held in a given period. The main columns in this table are described as follows:

[Image: Promotiondata group9.PNG]

Outlet Data
A row in this table represents the information for one KOI branch outlet. The main columns in this table are described as follows:

[Image: Outletdata group9.PNG]

EXPLORATORY DATA ANALYSIS

Our main project aim is to optimize the inventory reorder point with a 15-20% safety surplus. Using EDA, we will identify differences between orders from 2016 and 2017, taking into account factors such as the launch of new outlets. Additionally, we will analyze reordering frequency per outlet to identify the outlets with the most reorders, both in terms of quantity and number of orders. Subsequently, we will perform a cluster analysis to group products with similar trends, which will allow us to provide business recommendations for outlets in similar regions with similar reordering frequency.
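
As a rough illustration of the EDA steps described above, the sketch below totals ordered quantity per outlet per year, computes the 2016-to-2017 change, and ranks outlets by number of orders and by total quantity. It assumes delivery records have been reduced to hypothetical (outlet, date, quantity) tuples; the function names, outlet names and figures are made up for illustration and are not KOI data.

```python
from collections import defaultdict

def summarise_orders(records):
    """Aggregate hypothetical (outlet, 'YYYY-MM-DD', quantity) delivery records."""
    qty_by_year = defaultdict(lambda: defaultdict(int))  # outlet -> year -> quantity
    order_count = defaultdict(int)                       # outlet -> number of orders
    order_qty = defaultdict(int)                         # outlet -> total quantity
    for outlet, date, qty in records:
        qty_by_year[outlet][int(date[:4])] += qty
        order_count[outlet] += 1
        order_qty[outlet] += qty
    return qty_by_year, order_count, order_qty

def yoy_change(qty_by_year, outlet, start=2016, end=2017):
    """Percentage change in ordered quantity from start to end year, or None if
    the outlet has no orders in the start year (e.g. an outlet opened in 2017)."""
    yearly = qty_by_year.get(outlet, {})
    before, after = yearly.get(start, 0), yearly.get(end, 0)
    if before == 0:
        return None
    return (after - before) * 100 / before

# Made-up records for two outlets, for illustration only.
records = [
    ("Outlet A", "2016-03-01", 100), ("Outlet A", "2017-03-01", 120),
    ("Outlet B", "2017-05-02", 50), ("Outlet B", "2017-05-09", 40),
    ("Outlet B", "2017-05-16", 30),
]
qty_by_year, order_count, order_qty = summarise_orders(records)
print(yoy_change(qty_by_year, "Outlet A"))  # 20.0
print(yoy_change(qty_by_year, "Outlet B"))  # None (no 2016 orders)
print(sorted(order_count, key=order_count.get, reverse=True))  # ['Outlet B', 'Outlet A']
print(sorted(order_qty, key=order_qty.get, reverse=True))      # ['Outlet A', 'Outlet B']
```

Ranking by number of orders and by total quantity can give different orderings, which is why both views are worth keeping.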

DATA CLEANING

Missing values and outliers observed during our exploratory data analysis may introduce inaccuracy and skew our analysis. For missing values, we will examine each blank identified and decide whether to replace it with an estimate or remove it. For outliers, we will attempt to understand the underlying reason for each occurrence and decide whether it is important and significant enough to be included in our analysis.

Our group identified two redundant columns in the Delivery Data that were not needed for our analysis, and removed them: Time (the time at which the order was placed by staff) and Type (the type of order placed).

The delivery data also contained records from several overseas branches, such as KOI TE (Thailand), Cambodia Karanak KOI Café, Jakarta Koi, KOI Myanmar and Vietnam, as well as from closed outlets such as Espalande (closed in May 2017). These records were treated as noise and removed. We also removed data from 2015 and 2018, as our analysis covers only the two years 2016 and 2017.
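
The cleaning rules above can be sketched as a simple row filter. This is a minimal sketch that assumes each delivery row is a dictionary with hypothetical keys such as "Outlet" and "Date" (an ISO date string); the excluded outlet labels are copied from the list above and may not match the exact strings in KOI's export, and the sample rows are made up.

```python
# Hypothetical field names; KOI's actual export may label them differently.
REDUNDANT_COLUMNS = {"Time", "Type"}
EXCLUDED_OUTLETS = {  # overseas and closed outlets named in the text above
    "KOI TE (Thailand)", "Cambodia Karanak KOI Café", "Jakarta Koi",
    "KOI Myanmar", "Vietnam", "Espalande",
}
KEEP_YEARS = {2016, 2017}

def clean_delivery_rows(rows):
    """Drop excluded outlets, rows outside 2016-2017, and redundant columns."""
    cleaned = []
    for row in rows:
        if row["Outlet"] in EXCLUDED_OUTLETS:
            continue
        if int(row["Date"][:4]) not in KEEP_YEARS:
            continue
        cleaned.append({k: v for k, v in row.items() if k not in REDUNDANT_COLUMNS})
    return cleaned

# Made-up rows: one valid, one overseas outlet, one from 2015, one from 2018.
rows = [
    {"Outlet": "Outlet A", "Date": "2016-01-04", "Time": "10:15", "Type": "Regular", "Qty": 10},
    {"Outlet": "Jakarta Koi", "Date": "2016-01-04", "Time": "09:00", "Type": "Regular", "Qty": 8},
    {"Outlet": "Outlet A", "Date": "2015-12-30", "Time": "11:30", "Type": "Regular", "Qty": 12},
    {"Outlet": "Outlet A", "Date": "2018-01-02", "Time": "14:45", "Type": "Regular", "Qty": 9},
]
print(clean_delivery_rows(rows))
# [{'Outlet': 'Outlet A', 'Date': '2016-01-04', 'Qty': 10}]
```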

MODEL SELECTION

Next, we will carry out a seasonality analysis on past data to determine the optimal restock amount for each day. We will use quantitative forecasting to help identify the optimal restock amount; this approach uses historical data to predict future demand for goods, and the more historical data is available, the more accurate the resulting picture of demand. Although this provides a basis for forecasting, demand may also be affected by seasonality. Hence, our team is considering two time series models: the Naive Approach and the Seasonal Naive Approach. In our opinion, the Seasonal Naive Approach is more appropriate for our analysis; however, because the data provided is limited, we are also considering the more generic Naive Approach to ensure that we take into account the full picture available.
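
A minimal sketch of the two candidate models, assuming daily demand for one item at one outlet and a weekly season (season_length=7 is an assumption, not a finding from our data). The restock_amount helper is hypothetical: it simply applies the 15-20% safety surplus from our project aim to a forecast and rounds up to a whole unit. The demand figures are made up.

```python
import math

def naive_forecast(history, horizon):
    """Naive approach: every future value equals the last observed value."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, season_length=7):
    """Seasonal naive approach: each future value repeats the observation from
    the same position in the most recent full season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

def restock_amount(forecast, buffer=0.15):
    """Hypothetical helper: add a safety surplus (0.15-0.20) and round up to a
    whole unit. round(..., 6) strips float noise such as 114.99999999999999."""
    return math.ceil(round(forecast * (1 + buffer), 6))

# Illustrative two weeks of daily demand (made-up numbers, not KOI data).
history = [30, 32, 28, 35, 50, 60, 55, 31, 33, 29, 36, 52, 62, 58]
print(naive_forecast(history, 3))           # [58, 58, 58]
print(seasonal_naive_forecast(history, 3))  # [31, 33, 29]
print(restock_amount(62, 0.15), restock_amount(62, 0.20))  # 72 75
```

The naive forecast ignores the weekly pattern entirely, whereas the seasonal naive forecast carries the weekday shape forward, which is why it suits demand that peaks on particular days.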

MODEL VALIDATION

To validate our model, we will split our data into two sets: 70% of the data will be used as the training set and the remaining 30% as the test set. After validating the model, we will refine it and validate it again, repeating this process until the model's performance on the test set is satisfactory.
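
One way to carry out the 70/30 validation is sketched below. Because the data is a time series, this sketch assumes the split is chronological (the earliest 70% for training, the latest 30% for testing) rather than random, and it uses mean absolute error as an illustrative metric; neither choice is fixed in our methodology, and the function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.7):
    """Split without shuffling so the test period follows the training period."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def evaluate(series, season_length=7, train_fraction=0.7):
    """Score naive and seasonal naive forecasts on the held-out 30%."""
    train, test = chronological_split(series, train_fraction)
    naive = [train[-1]] * len(test)              # repeat the last training value
    last_season = train[-season_length:]         # most recent full season
    seasonal = [last_season[h % season_length] for h in range(len(test))]
    return {
        "naive": mean_absolute_error(test, naive),
        "seasonal_naive": mean_absolute_error(test, seasonal),
    }

# Synthetic series: the same weekly pattern repeated for three weeks.
series = [30, 32, 28, 35, 50, 60, 55] * 3
train, test = chronological_split(series)
print(len(train), len(test))  # 14 7
print(evaluate(series))       # {'naive': 15.0, 'seasonal_naive': 0.0}
```

On this synthetic series, which repeats the same weekly pattern exactly, the seasonal naive forecast reproduces the test week while the naive forecast does not; real demand data will be noisier, so the gap between the two models is expected to be smaller.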