Difference between revisions of "ANLY482 AY2017-18 Group9: Project Overview/ Methodology"

From Analytics Practicum
Jump to navigation Jump to search
(create methodology page)
 
(update model validation)
 
(15 intermediate revisions by 3 users not shown)
Line 5: Line 5:
 
{|style="background-color:#40403E; color:#FFA500; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#40403E; color:#FFA500; padding: 10 0 10 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#FFA500" width="10%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#FFA500" width="10%" |  
[[Image:TeamInsured_Home.png| 30px |link= ANLY482_AY2017-18T2_Group09 ]]  
+
[[Image:TeamInsured_Home.png| 20px |link= ANLY482_AY2017-18T2_Group09 ]]  
[[ANLY482_AY2017-18T2_Group09 |<font color="#F5F5F5" size=2><b>HOME</b></font>]]
+
[[ANLY482_AY2017-18T2_Group09 |<font color="#F5F5F5" size=1.5><b>HOME</b></font>]]
  
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;  
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;  
 
| style="padding:0.3em; font-size:100%; background-color:#FFA500;  border-bottom:0px solid #FFA500; text-align:center; color:#40403E" width="12%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#FFA500;  border-bottom:0px solid #FFA500; text-align:center; color:#40403E" width="12%" |  
[[Image:TeamInsured_About_Icon.png|30px|link=ANLY482 AY2017-18 Group9: Project Overview ]] &nbsp;
+
[[Image:TeamInsured_About_Icon.png|20px|link=ANLY482 AY2017-18 Group9: Project Overview ]] &nbsp;
[[ANLY482 AY2017-18 Group9: Project Overview|<font color="#F5F5F5" size=2><b>PROJECT OVERVIEW</b></font>]]
+
[[ANLY482 AY2017-18 Group9: Project Overview|<font color="#F5F5F5" size=1.5><b>PROJECT OVERVIEW</b></font>]]
  
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#40403E" width="12%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#40403E" width="12%" |  
[[Image:TeamInsured_Findings.png|30px|link=ANLY482 AY2017-18 Group9: Findings]] &nbsp;
+
[[Image:TeamInsured_Findings.png|20px|link=ANLY482 AY2017-18 Group9: Findings]] &nbsp;
[[ANLY482 AY2017-18 Group9: Project Findings |<font color="#F5F5F5" size=2><b>PROJECT FINDINGS</b></font>]]
+
[[ANLY482 AY2017-18 Group9: Project Findings |<font color="#F5F5F5" size=1.5><b>PROJECT FINDINGS</b></font>]]
  
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #3D9DD7; text-align:center; color:#40403E" width="15%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #3D9DD7; text-align:center; color:#40403E" width="15%" |  
[[Image:TeamInsured_PM.png|30px|link=ANLY482 AY2017-18 Group9: Project Management]] &nbsp;
+
[[Image:TeamInsured_PM.png|20px|link=ANLY482 AY2017-18 Group9: Project Management]] &nbsp;
[[ANLY482 AY2017-18 Group9: Project Management |<font color="#F5F5F5" size=2><b>PROJECT MANAGEMENT</b></font>]]
+
[[ANLY482 AY2017-18 Group9: Project Management |<font color="#F5F5F5" size=1.5><b>PROJECT MANAGEMENT</b></font>]]
  
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#40403E" width="10%" |  
 
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#40403E" width="10%" |  
[[Image:TeamInsured_Documentation.png|30px|link=ANLY482 AY2017-18 Group9: Documentation]] &nbsp;
+
[[Image:TeamInsured_Documentation.png|20px|link=ANLY482 AY2017-18 Group9: Documentation]] &nbsp;
[[ANLY482 AY2017-18 Group9: Documentation | <font color="#F5F5F5" size=2><b>DOCUMENTATION</b></font>]]
+
[[ANLY482 AY2017-18 Group9: Documentation | <font color="#F5F5F5" size=1.5><b>DOCUMENTATION</b></font>]]
 +
 
 +
| style="border-bottom:0px solid #40403E; background:none;" width="1%" | &nbsp;
 +
| style="padding:0.3em; font-size:100%; background-color:#40403E;  border-bottom:0px solid #40403E; text-align:center; color:#40403E" width="10%" |
 +
[[Image:TeamInsured_Documentation.png|20px|link=ANLY482_AY2017-18_Term_2]] &nbsp;
 +
[[ANLY482_AY2017-18_Term_2 | <font color="#F5F5F5" size=1.5><b>MAIN PAGE</b></font>]]
 
|}  
 
|}  
 
<!--Header End-->
 
<!--Header End-->
Line 40: Line 45:
  
 
<!--Body Start-->
 
<!--Body Start-->
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA COLLECTION</strong></font></div></div>==
+
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA COLLECTION / PREPARATION</strong></font></div></div>==
 +
Upon understanding the problems faced by KOI and coming up with potential solution for their problems, we requested a list of datasets that we will required to perform our analysis. In particular, we will target sales and wastage data to optimize the reorder inventory.
 +
<br><br>
 +
To facilitate our analysis, KOI has kindly provided our team with a data of 47 outlets with 1-2 years (from Jan 2016 - Dec 2017) worth of data collected. The client wish to focus on the latest business fiscal year, henceforth we will be provided with the latest data obtained. The type of data obtained are summarize in the table below.
 +
<br><br>
 +
<b>Data Summary</b>
 +
[[Image:DataSummary_Group9.PNG|left|450px|]]
 +
<br><br><br><br><br><br><br><br>
 +
<b>Delivery Data</b><br>
 +
A row in this table represents a specific delivery ordered by a branch. The detailed description of the main columns in this table are as follow:
 +
<br><br>
 +
[[Image:Deliverydata_group9.PNG|left|450px|]]
 +
 
 +
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
 +
Additional columns have also been created to aid us in our analysis as follow:
 +
<br><br>
 +
[[Image:Deliverydata_group9_columnscreated.PNG|left|450px|]]
 +
<br><br>
 +
[[Image:Koinewcolumns.PNG|left|450px|]]
 +
<br><br><br><br><br><br><br><br><br><br><br>
 +
<br><br><br><br><br><br><br><br>
 +
<b>Promotion Data</b><br>
 +
A row in this table represent a specific promotional campaign held in a period. The detailed description of the main columns in this table are as follow:
 +
<br><br>
 +
[[Image:Promotiondata_group9.PNG|left|450px|]]
 +
<br><br><br><br><br><br><br><br><br>
 +
<!--
 +
<b>Outlet Data</b><br>
 +
A row in this table represents the KOI branch outlet information. The detailed description of the main columns in this table are as follow:
 +
<br><br>
 +
[[Image:Outletdata_group9.PNG|left|450px|]]
 +
-->
 +
<br><br><br><br><br><br><br><br><br>
 +
 
 +
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>EXPLORATORY DATA ANALYSIS</strong></font></div></div>==
 +
Our main project aim is to optimize inventory reorder point with a 15-20% safety net surplus. Utilising EDA, we will identify differences between orders from 2016 and 2017, taking into factors such as launch of new outlets. Additionally, we will analyze reordering frequency per outlets to identify outlets with the most reorders in terms of quantity and number of orders. Henceforth, we would perform a cluster analysis on products with similar trends as well as be able to provide a business recommendation on outlets with similar reordering frequncy in similar regions.
 +
<br>
 +
 
 +
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA CLEANING AND PREPARATION</strong></font></div></div>==
 +
<b>Data Cleaning</b><br>
 +
Our group have identified a number of <b><U>redundant columns</U></b> in Delivery Data which were deemed not important in our analysis, and have subsequently removed it. <br>
 +
These columns are <b><U>Time</U></b> (The time where the order is being placed by the staff) and <b><U>Type</U></b> (The different type of order being placed).<br>
 +
Next, there were presence of several overseas and closed branches in the delivery data as well, such as KOI TE (Thailand), Cambodia Karanak KOI Café, Jakarta Koi, KOI Myanmar, Vietnam and closed outlets which include Espalande, close date of May 2017, and these were deemed as <b><U>data noises</U></b>. In addition to removing these data,we removed year 2015 and 2018 data as well, as our analysis on 2 years analysis - 2016 and 2017.
 +
<br><br>
 +
<b>Data Preparation</b><br>
 +
As identified previously in data summary, we have identified <b><U>several new columns</U></b> created to aid in our analysis. Under delivery data, 5 new columns were actually created, mainly 'Created Date', 'Date Diff', 'Actual Data Diff', 'Product Category' and 'Include'.
 +
<br>
 +
<br>
 +
We have also created our own data based on the outlets of Koi, <b><U>Outlet Info</U></b>. This data consists of segment and regions, segments to group outlets of similar reordering frequency and region where outlet is located in Singapore. With this, we will be able to provide a better business recommendation to our sponsor regarding deliveries.
 
<br>
 
<br>
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA PREPARATION</strong></font></div></div>==
 
 
<br>
 
<br>
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>EXPLORATORY DATA ANALYSIS</strong></font></div></div>==
+
Under reordering frequency, we could clearly identify KOI HQ and KOI PLUS as <b><U>outliers</U></b>, and have thus decided to remove these outlets. For KOI HQ as it is not a customer facing branch, we have deemed it insignificant for our analysis. Secondly, as consulted with our sponsor, we have confirmed that KOI PLUS relies on KOI Singapura for its goods ordering. Furthermore, our results also show that the restock products of KOI Plus are mainly non-perishable goods such as cleaning supplies, cups and seal. Henceforth, the reorder frequency for KOI PLUS tends to stretch for a long period before their next order.
 
<br>
 
<br>
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA CLEANING</strong></font></div></div>==
 
 
<br>
 
<br>
 +
Last but not least, undergoing our preliminary EDA, we have identified <b><U>several items which have been discontinued</U></b> and thus, decide to remove it from our dataset as it is considered insignificant for future predictions.
 +
The discontinued ingredients are: mango syrup, mango sticker, passion fruit syrup, passion fruit sticker and coffee mate 1kg. 
 +
<!--
 
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA NORMALISATION & TRANSFORMATION</strong></font></div></div>==
 
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>DATA NORMALISATION & TRANSFORMATION</strong></font></div></div>==
 
<br>
 
<br>
 
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>CLUSTER ANALYSIS</strong></font></div></div>==
 
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>CLUSTER ANALYSIS</strong></font></div></div>==
 
<br>
 
<br>
 +
-->
  
 +
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>MODEL SELECTION</strong></font></div></div>==
 +
Next, <b><U>multilinear regression analysis</U></b> will be carried out to determine the significant variables affecting the quantity to be ordered for each product and store. This will be done on specific products deemed to have a time-series pattern. <b><U>Monte Carlo simulations</U></b> will then be carried with these significant dependent variables to determine how much quantity employees should order by adjusting the values of dependent variables.
 +
<br>
 +
 +
==<div style="background: #40403E; line-height: 0.3em; font-family:helvetica;  border-left: #FFA500 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>MODEL VALIDATION</strong></font></div></div>==
 +
To validate our model, we will be separating our data into three different data sets, testing, training and validation. The last month of the data (Dec 2017) will be used as our testing data set, while the second last month (Nov 2017) will be used as our validation data set and the rest of the data will be our training data used for learning patterns based on supervised learning techniques to measure our outcome – reorder quantity.
 +
<br>
 
<!--Body End-->
 
<!--Body End-->

Latest revision as of 00:19, 15 April 2018

Fablogo.png

TeamInsured Home.png   HOME

 

TeamInsured About Icon.png   PROJECT OVERVIEW

 

TeamInsured Findings.png   PROJECT FINDINGS

 

TeamInsured PM.png   PROJECT MANAGEMENT

 

TeamInsured Documentation.png   DOCUMENTATION

 

TeamInsured Documentation.png   MAIN PAGE

DATA COLLECTION / PREPARATION

Upon understanding the problems faced by KOI and coming up with potential solution for their problems, we requested a list of datasets that we will required to perform our analysis. In particular, we will target sales and wastage data to optimize the reorder inventory.

To facilitate our analysis, KOI has kindly provided our team with a data of 47 outlets with 1-2 years (from Jan 2016 - Dec 2017) worth of data collected. The client wish to focus on the latest business fiscal year, henceforth we will be provided with the latest data obtained. The type of data obtained are summarize in the table below.

Data Summary

DataSummary Group9.PNG









Delivery Data
A row in this table represents a specific delivery ordered by a branch. The detailed description of the main columns in this table are as follow:

Deliverydata group9.PNG
















Additional columns have also been created to aid us in our analysis as follow:

Deliverydata group9 columnscreated.PNG



Koinewcolumns.PNG




















Promotion Data
A row in this table represent a specific promotional campaign held in a period. The detailed description of the main columns in this table are as follow:

Promotiondata group9.PNG



















EXPLORATORY DATA ANALYSIS

Our main project aim is to optimize inventory reorder point with a 15-20% safety net surplus. Utilising EDA, we will identify differences between orders from 2016 and 2017, taking into factors such as launch of new outlets. Additionally, we will analyze reordering frequency per outlets to identify outlets with the most reorders in terms of quantity and number of orders. Henceforth, we would perform a cluster analysis on products with similar trends as well as be able to provide a business recommendation on outlets with similar reordering frequncy in similar regions.

DATA CLEANING AND PREPARATION

Data Cleaning
Our group have identified a number of redundant columns in Delivery Data which were deemed not important in our analysis, and have subsequently removed it.
These columns are Time (The time where the order is being placed by the staff) and Type (The different type of order being placed).
Next, there were presence of several overseas and closed branches in the delivery data as well, such as KOI TE (Thailand), Cambodia Karanak KOI Café, Jakarta Koi, KOI Myanmar, Vietnam and closed outlets which include Espalande, close date of May 2017, and these were deemed as data noises. In addition to removing these data,we removed year 2015 and 2018 data as well, as our analysis on 2 years analysis - 2016 and 2017.

Data Preparation
As identified previously in data summary, we have identified several new columns created to aid in our analysis. Under delivery data, 5 new columns were actually created, mainly 'Created Date', 'Date Diff', 'Actual Data Diff', 'Product Category' and 'Include'.

We have also created our own data based on the outlets of Koi, Outlet Info. This data consists of segment and regions, segments to group outlets of similar reordering frequency and region where outlet is located in Singapore. With this, we will be able to provide a better business recommendation to our sponsor regarding deliveries.

Under reordering frequency, we could clearly identify KOI HQ and KOI PLUS as outliers, and have thus decided to remove these outlets. For KOI HQ as it is not a customer facing branch, we have deemed it insignificant for our analysis. Secondly, as consulted with our sponsor, we have confirmed that KOI PLUS relies on KOI Singapura for its goods ordering. Furthermore, our results also show that the restock products of KOI Plus are mainly non-perishable goods such as cleaning supplies, cups and seal. Henceforth, the reorder frequency for KOI PLUS tends to stretch for a long period before their next order.

Last but not least, undergoing our preliminary EDA, we have identified several items which have been discontinued and thus, decide to remove it from our dataset as it is considered insignificant for future predictions. The discontinued ingredients are: mango syrup, mango sticker, passion fruit syrup, passion fruit sticker and coffee mate 1kg.

MODEL SELECTION

Next, multilinear regression analysis will be carried out to determine the significant variables affecting the quantity to be ordered for each product and store. This will be done on specific products deemed to have a time-series pattern. Monte Carlo simulations will then be carried with these significant dependent variables to determine how much quantity employees should order by adjusting the values of dependent variables.

MODEL VALIDATION

To validate our model, we will be separating our data into three different data sets, testing, training and validation. The last month of the data (Dec 2017) will be used as our testing data set, while the second last month (Nov 2017) will be used as our validation data set and the rest of the data will be our training data used for learning patterns based on supervised learning techniques to measure our outcome – reorder quantity.