Difference between revisions of "ANLY482 AY2017-18T2 Group10 Project Overview: Data"
Line 55: | Line 55: | ||
It is noted that the sponsor only performs stock taking once a month and hence only monthly inventory data is available. This means that there are no daily or weekly stock levels recorded. However, the objective of the project is to be able to forecast the demand for each ingredient on a daily basis. The proposed idea is to first forecast the demand for a particular month. Subsequently, we will use the PLU data which contains the number of pax daily to breakdown the forecasted monthly usage into a daily value. | It is noted that the sponsor only performs stock taking once a month and hence only monthly inventory data is available. This means that there are no daily or weekly stock levels recorded. However, the objective of the project is to be able to forecast the demand for each ingredient on a daily basis. The proposed idea is to first forecast the demand for a particular month. Subsequently, we will use the PLU data which contains the number of pax daily to breakdown the forecasted monthly usage into a daily value. | ||
− | |||
− | |||
</div> | </div> | ||
− | ==<div style="background: #ffffff; padding: 17px;padding:0.3em; letter-spacing:0.1em; line-height: 0.1em; text-indent: 10px; font-size:17px; text-transform:uppercase; font-weight: light; font-family: 'Century Gothic'; border-left:8px solid #1b96fe; margin-bottom:5px"><font color= #000000><strong> | + | ==<div style="background: #ffffff; padding: 17px;padding:0.3em; letter-spacing:0.1em; line-height: 0.1em; text-indent: 10px; font-size:17px; text-transform:uppercase; font-weight: light; font-family: 'Century Gothic'; border-left:8px solid #1b96fe; margin-bottom:5px"><font color= #000000><strong>Purchase Data Preparation Process</strong></font></div>== |
− | |||
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px"> | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px"> | ||
− | |||
− | + | [[File:Tennet inventory processing diagram.png|border|center|600px]]] | |
− | |||
</div> | </div> | ||
Line 79: | Line 74: | ||
<div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px"> | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px"> | ||
+ | Sales data refers to the daily sales that XYZ Company everyday for each outlet broken down to various categories, like Card, Nets, cash, etc. For the sales data, we have the data between December 2015 to November 2017. </br> | ||
+ | |||
+ | Since the sales data is relatively clean and ready to be analysed, there is very little to do. Our group only prepared the data by adding another calculated column called “revenue”, which is the sum of column “nett_sales” and “service”, and we concatenated all 4 excel files into one single file. Using Python scripts, we standardised the column names and compiled them into a giant CSV data file. | ||
+ | </div> | ||
+ | |||
+ | ==<div style="background: #ffffff; padding: 17px;padding:0.3em; letter-spacing:0.1em; line-height: 0.1em; text-indent: 10px; font-size:17px; text-transform:uppercase; font-weight: light; font-family: 'Century Gothic'; border-left:8px solid #1b96fe; margin-bottom:5px"><font color= #000000><strong>PLU Data Preparation Process</strong></font></div>== | ||
+ | <div style="margin:0px; padding: 10px; background: #f2f4f4; font-family: Century Gothic, Open Sans, Arial, sans-serif; border-radius: 7px; text-align:left; font-size: 15px"> | ||
+ | Daily PLU refers to the number of patrons that each outlet has daily, in various categories (Adult, Child, Student, Tourist, Senior and FOC Pax) There were several issues with the data. Firstly, all the days are in the same row, which makes it hard to analyse the data. In addition, there is a column “Dept_Type2” in the sample data which is irrelevant to us, for example “Soup Base”. According to our sponsor, we are only interested in 6 categories (Adult, Student, Senior, Tourist, Child and FOC PAX). There are also many different rows of the same category due to the irrelevant description. For example, an adults from an outlet can have different descriptions, such as “Wkend Adult Dinner Buffet” or ‘Wkday Adult Dinner Buffet”, since this is irrelevant to us in the analysis, we would be grouping all the categories into one row for each outlet. After, cleaning all the data issues, we concatenate the various Daily PLU data into one file to make it easier to analyse the daily patrons. | ||
+ | |||
+ | [[File:Tennet DailyPLU Cleaning diagram.png|border|center|300px]]] | ||
</div> | </div> | ||
<!-- End Body ---> | <!-- End Body ---> |
Revision as of 16:52, 25 February 2018
Due to confidentiality, we will not be able to upload any charts onto the wiki. The fully disclosed analysis report is available in our Interim Report submission.
Contents
Data Overview
The data provided by the sponsor is in Microsoft Excel format for each outlet by month. The team has used Python for the cleaning and preparation of the data. For now, they have provided the data for a total of 24 months from Dec 2015 to Dec 2017. The data that was given to us are Inventory Data, Monthly PLU (Programmable Logic Unit) and Sales Data. One limitation is that the company has recently changed the format of the inventory data, and thus we would be working with 2 different formats of inventory data. Below is a short description of the each dataset:
Dataset Name | Dataset Description |
---|---|
Inventory | Describes the inventory order for each outlet the data is updated daily. |
Sales | Describes the sales for each outlet for each month daily. |
MonthPLU | Describes the number of patrons for each outlet according to the type of meal daily. |
It is noted that the sponsor only performs stock taking once a month and hence only monthly inventory data is available. This means that there are no daily or weekly stock levels recorded. However, the objective of the project is to be able to forecast the demand for each ingredient on a daily basis. The proposed idea is to first forecast the demand for a particular month. Subsequently, we will use the PLU data which contains the number of pax daily to breakdown the forecasted monthly usage into a daily value.
Purchase Data Preparation Process
Inventory Data Preparation Process
As there was a change in format of the Monthly Inventory Data from October 2017 onwards, there are two main different types of formats for the Monthly Inventory Data. The two different formats have different column names and different number of columns. Hence, to perform our analysis and EDA, we had to process the two formats separately. Using Python scripts, we extracted the necessary columns from each file, standardised the column names and compiled them into a giant CSV data file - ‘Inventory_Processed_2016-2017.csv’
]Sales Data Preparation Process
Sales data refers to the daily sales that XYZ Company everyday for each outlet broken down to various categories, like Card, Nets, cash, etc. For the sales data, we have the data between December 2015 to November 2017.
Since the sales data is relatively clean and ready to be analysed, there is very little to do. Our group only prepared the data by adding another calculated column called “revenue”, which is the sum of column “nett_sales” and “service”, and we concatenated all 4 excel files into one single file. Using Python scripts, we standardised the column names and compiled them into a giant CSV data file.
PLU Data Preparation Process
Daily PLU refers to the number of patrons that each outlet has daily, in various categories (Adult, Child, Student, Tourist, Senior and FOC Pax) There were several issues with the data. Firstly, all the days are in the same row, which makes it hard to analyse the data. In addition, there is a column “Dept_Type2” in the sample data which is irrelevant to us, for example “Soup Base”. According to our sponsor, we are only interested in 6 categories (Adult, Student, Senior, Tourist, Child and FOC PAX). There are also many different rows of the same category due to the irrelevant description. For example, an adults from an outlet can have different descriptions, such as “Wkend Adult Dinner Buffet” or ‘Wkday Adult Dinner Buffet”, since this is irrelevant to us in the analysis, we would be grouping all the categories into one row for each outlet. After, cleaning all the data issues, we concatenate the various Daily PLU data into one file to make it easier to analyse the daily patrons.
]