ANLY482 AY2017-18 T1 Group1/Final Progress





Item No. | Final Deliverables
01 | Conference Paper (submitted on eLearn)
02 | Presentation Poster (submitted on eLearn)
03 | Project Artefacts (submitted on eLearn)


Abstract

Supply chain management is a complex process filled with information that can be mined using data analytics techniques. From the procurement of raw materials, to manufacturing the different compositions of a finished good, to distribution and sales, there is a wealth of information that a company can use to improve the efficiency of its supply chain. This can be done by uncovering discrepancies in procurement and distribution costs, or by creating demand forecasting models to anticipate future demand for a finished good.

This is the first iteration of a data project undertaken by our sponsor, so the objective is to surface data issues and create an overview of our client's supply chain. The data and insights obtained can then be used for more complex analysis in the future. For the project, we will use Python and R, together with their respective data manipulation packages (pandas, dplyr, etc.), to combine the client's disparate Excel files into a single master table. Finally, Tableau will be used to visualize cost comparisons of the constituents of a product throughout its manufacturing and distribution.


Literature Review

The Supply Chain Operations Reference (SCOR) model developed by the Supply Chain Council (APICS, 2017) provides a good framework for conducting analytics on J&J’s supply chain.

Figure 1 (JYC).PNG
Figure 1. Analytics techniques used in supply chains

As seen in the table above (Souza, 2014), we will first approach our sponsor's business problem with descriptive analytics, mapping and visualizing the constituent costs of their products. Since this is the first iteration of the project, descriptive analytics will give a good overview of the sponsor's supply chain and expose any weaknesses in their data. We therefore aim to build a solid foundation on which future projects can build predictive and prescriptive models, enabling our sponsor to make strategic and tactical decisions on how to source, make and deliver, as laid out in the SCOR model.


Project Overview

Sponsor's Current Approach
Currently, our sponsor spends a lot of time on manual data cleaning and harmonization across multiple Excel files to reach the insights they want about their products.

Figure 2 (JYC).png
Figure 2. Sample product analysis output generated by the sponsor

For example, our client had to produce multiple static bar chart diagrams in Excel to show two views: an overall outlook on manufacturing costs across different plants, and a deeper dive into specific plants’ manufacturing cost components. This process is time-consuming, as it requires our sponsor to create many different calculations and data tables for each level of cost comparison (e.g. two data tables are required to display the two bar chart diagrams shown above). The static diagrams also do not allow users to explore different levels of cost comparison on demand.

For our project, twelve data files were required to provide complete information on the different costs involved for each SKU, from the manufacturing plant all the way to the distribution centres. As we aim to automate the process of data cleaning, data harmonization and master table generation, a large amount of data manipulation is required. We also initially planned to produce a static report of cost statistics on top of the Tableau dashboard, so we decided to use R, as it has more libraries available for data cleaning, data table manipulation and statistical analysis.

Group's Proposed Approach
At the end of the project, we propose that the process be automated in four phases, with each phase being performed by one R code file. (Unless stated otherwise, all functions mentioned in this paper are from the dplyr library in R.)
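To illustrate how the four phases could fit together, the sketch below chains four R scripts, one per phase. The script names are hypothetical placeholders, and the breakdown of phases B to D is our assumption based on the cleaning, harmonization, master table and dashboard goals described above; only phase A is detailed in this paper.

# Hedged sketch of the proposed four-phase pipeline (script names are illustrative only)
source("phase1_merge_data_files.R")        # A. merge the raw BOM files into one table
source("phase2_clean_and_harmonize.R")     # B. data cleaning and harmonization (assumed)
source("phase3_generate_master_table.R")   # C. master table generation (assumed)
source("phase4_prepare_tableau_extract.R") # D. output tables for the Tableau dashboard (assumed)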

A. Merge Data Files

First, we use a self-defined function to standardize the column types and names of the sixty-three BOM files we have received for a product line. Then, using the bind_rows function, we will merge the sixty-three files into a single file, before using write_xlsx to save the newly combined data table into an .xlsx file (refer to Figure 3).

Figure 3 (JYC).PNG
Figure 3. Usage of self-defined function f1, bind_rows from dplyr, and write_xlsx from writexl.
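The snippet below is a minimal sketch of this merge step. The folder path and the internal behaviour of the self-defined standardization function f1 are illustrative assumptions; only the use of bind_rows (dplyr), read_excel (readxl) and write_xlsx (writexl) reflects the approach described above.

library(dplyr)
library(readxl)
library(writexl)

# f1: illustrative stand-in for the self-defined standardization function.
# Here it trims column names and coerces every column to character so that
# all sixty-three BOM files share one schema before merging.
f1 <- function(path) {
  read_excel(path) %>%
    rename_with(trimws) %>%
    mutate(across(everything(), as.character))
}

# "bom_files" is a hypothetical folder holding the sixty-three BOM files
bom_paths <- list.files("bom_files", pattern = "\\.xlsx$", full.names = TRUE)

merged_bom <- bom_paths %>%
  lapply(f1) %>%   # standardize each file with f1
  bind_rows()      # stack the sixty-three tables into a single data table

write_xlsx(merged_bom, "merged_bom.xlsx")  # save the combined table as .xlsx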