ANLY482 AY2017-18 Group9: Project Overview/ Methodology

From Analytics Practicum
Jump to navigation Jump to search
Fablogo.png

TeamInsured Home.png   HOME

 

TeamInsured About Icon.png   PROJECT OVERVIEW

 

TeamInsured Findings.png   PROJECT FINDINGS

 

TeamInsured PM.png   PROJECT MANAGEMENT

 

TeamInsured Documentation.png   DOCUMENTATION

 

TeamInsured Documentation.png   MAIN PAGE

DATA COLLECTION / PREPARATION

Upon understanding the problems faced by KOI and coming up with potential solution for their problems, we requested a list of datasets that we will required to perform our analysis. In particular, we will target sales and wastage data to optimize the reorder inventory.

To facilitate our analysis, KOI has kindly provided our team with a data of 47 outlets with 1-2 years (from Jan 2016 - Dec 2017) worth of data collected. The client wish to focus on the latest business fiscal year, henceforth we will be provided with the latest data obtained. The type of data obtained are summarize in the table below.

Data Summary

DataSummary Group9.PNG









Delivery Data
A row in this table represents a specific delivery ordered by a branch. The detailed description of the main columns in this table are as follow:

Deliverydata group9.PNG
















Additional columns have also been created to aid us in our analysis as follow:

Deliverydata group9 columnscreated.PNG



Koinewcolumns.PNG




















Promotion Data
A row in this table represent a specific promotional campaign held in a period. The detailed description of the main columns in this table are as follow:

Promotiondata group9.PNG



















EXPLORATORY DATA ANALYSIS

Our main project aim is to optimize inventory reorder point with a 15-20% safety net surplus. Utilising EDA, we will identify differences between orders from 2016 and 2017, taking into factors such as launch of new outlets. Additionally, we will analyze reordering frequency per outlets to identify outlets with the most reorders in terms of quantity and number of orders. Henceforth, we would perform a cluster analysis on products with similar trends as well as be able to provide a business recommendation on outlets with similar reordering frequncy in similar regions.

DATA CLEANING AND PREPARATION

Data Cleaning
Our group have identified a number of redundant columns in Delivery Data which were deemed not important in our analysis, and have subsequently removed it.
These columns are Time (The time where the order is being placed by the staff) and Type (The different type of order being placed).
Next, there were presence of several overseas and closed branches in the delivery data as well, such as KOI TE (Thailand), Cambodia Karanak KOI Café, Jakarta Koi, KOI Myanmar, Vietnam and closed outlets which include Espalande, close date of May 2017, and these were deemed as data noises. In addition to removing these data,we removed year 2015 and 2018 data as well, as our analysis on 2 years analysis - 2016 and 2017.

Data Preparation
As identified previously in data summary, we have identified several new columns created to aid in our analysis. Under delivery data, 5 new columns were actually created, mainly 'Created Date', 'Date Diff', 'Actual Data Diff', 'Product Category' and 'Include'.

We have also created our own data based on the outlets of Koi, Outlet Info. This data consists of segment and regions, segments to group outlets of similar reordering frequency and region where outlet is located in Singapore. With this, we will be able to provide a better business recommendation to our sponsor regarding deliveries.

Under reordering frequency, we could clearly identify KOI HQ and KOI PLUS as outliers, and have thus decided to remove these outlets. For KOI HQ as it is not a customer facing branch, we have deemed it insignificant for our analysis. Secondly, as consulted with our sponsor, we have confirmed that KOI PLUS relies on KOI Singapura for its goods ordering. Furthermore, our results also show that the restock products of KOI Plus are mainly non-perishable goods such as cleaning supplies, cups and seal. Henceforth, the reorder frequency for KOI PLUS tends to stretch for a long period before their next order.

Last but not least, undergoing our preliminary EDA, we have identified several items which have been discontinued and thus, decide to remove it from our dataset as it is considered insignificant for future predictions. The discontinued ingredients are: mango syrup, mango sticker, passion fruit syrup, passion fruit sticker and coffee mate 1kg.

MODEL SELECTION

Next, multilinear regression analysis will be carried out to determine the significant variables affecting the quantity to be ordered for each product and store. This will be done on specific products deemed to have a time-series pattern. Monte Carlo simulations will then be carried with these significant dependent variables to determine how much quantity employees should order by adjusting the values of dependent variables.

MODEL VALIDATION

To validate our model, we will be separating our data into three different data sets, testing, training and validation. The last month of the data (Dec 2017) will be used as our testing data set, while the second last month (Nov 2017) will be used as our validation data set and the rest of the data will be our training data used for learning patterns based on supervised learning techniques to measure our outcome – reorder quantity.