ANLY482 AY2017-18 T1 Group1/Final Progress



Item No.  Final Deliverables
01        Conference Paper (submitted on eLearn)
02        Presentation Poster (submitted on eLearn)
03        Project Artefacts (submitted on eLearn)


Abstract

Supply chain management is a complex process filled with information that can be mined using data analytics techniques. From the procurement of raw materials, to the manufacturing of the different components of a finished good, to distribution and sales, there is a great deal of information that a company can use to improve the efficiency of its supply chain. This can be done by uncovering discrepancies in procurement and distribution costs, or by creating demand forecasting models to anticipate future demand for a finished good.

This is the first iteration of a data project our sponsor has undertaken, and thus the objective is to identify data issues and create an overview of our client's supply chain. The data and insights obtained can then be used in the future for more complex analysis. For the project, we will be utilizing Python and R with their respective data manipulation packages (Pandas, dplyr, etc.) to combine the client's disparate Excel files into a single master table. Finally, Tableau will be used to visualize cost comparisons of the constituents of a product throughout its manufacturing and distribution.


Literature Review

The Supply Chain Operations Reference (SCOR) model developed by the Supply Chain Council (APICS, 2017) is a good framework for conducting analytics with J&J’s supply chain.

Figure 1 (JYC).PNG
Figure 1. Analytics techniques used in supply chains

As seen in the table above (Souza, 2014), we will first approach our sponsor's business problem by performing descriptive analytics via mapping and visualization of the constituent costs of their products. Since this is the first iteration of the project, descriptive analytics will give a good overview of the sponsor's supply chain and expose any weaknesses in their data. We thus aim to build a solid foundation that future projects can use to build predictive and prescriptive models, so that our sponsor can make strategic and tactical decisions on how to source, make and deliver, as laid out in the SCOR model.


Project Overview

Sponsor's Current Approach
Currently, our sponsor spends a lot of time on manual data cleaning and harmonization across multiple Excel files to reach the insights they want about their products.

Figure 2 (JYC).png
Figure 2. Sample product analysis output generated by the sponsor

For example, our client had to produce multiple static bar chart diagrams in Excel to show two views: one giving an overall outlook on manufacturing costs across different plants, and another giving a deeper dive into a specific plant's manufacturing cost components. This process is time-consuming, as it requires our sponsor to create many different calculations and data tables for each level of cost comparison (e.g. two data tables are required to display the two bar chart diagrams shown above). The static diagrams also do not allow users to explore different levels of cost comparison on demand.

For our project, twelve data files were required to provide complete information on the different costs involved for each SKU, from the manufacturing plant all the way to the distribution centres. As we aim to automate the process of data cleaning, data harmonization and master table generation, a large amount of data manipulation is required. We also initially planned to produce a static report of statistical analysis of costs on top of the Tableau dashboard; we therefore decided to use R, as it has more libraries available for data cleaning, data table manipulation and statistical analysis.

Group's Proposed Approach
At the end of the project, we propose that the process be automated in four phases, with each phase being performed by one R code file. (Unless stated otherwise, all functions mentioned in this paper are from the dplyr library in R.)

A. Merge Data Files
First, we use a self-defined function to standardize the column types and names of the sixty-three BOM files we have received for a product line. Then, using the bind_rows function, we will merge the sixty-three files into a single file, before using the write_xlsx function from the writexl library to save the newly combined data table as an .xlsx file (refer to Figure 3).

Figure 3 (JYC).PNG
Figure 3. Usage of self-defined function f1, bind_rows from dplyr, and write_xlsx from writexl.
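
To make this step concrete, below is a minimal R sketch of the merge phase. The folder path, output path, and the column handling inside f1 are hypothetical placeholders; only the overall pattern (a standardizing helper, bind_rows from dplyr, write_xlsx from writexl) reflects what Figure 3 describes.

library(dplyr)
library(readxl)
library(writexl)

# Hypothetical stand-in for the self-defined function f1: read one BOM file,
# standardize its column names, and coerce every column to character so that
# all sixty-three files can be stacked without type conflicts.
f1 <- function(path) {
  read_excel(path) %>%
    rename_with(~ tolower(gsub("[ .]+", "_", .x))) %>%   # e.g. "Std Cost" -> "std_cost"
    mutate(across(everything(), as.character))
}

# Apply f1 to every BOM file, merge the results with bind_rows,
# and save the combined data table as a single .xlsx file.
bom_files <- list.files("data/bom", pattern = "\\.xlsx$", full.names = TRUE)
combined  <- bind_rows(lapply(bom_files, f1))
write_xlsx(combined, "output/bom_combined.xlsx")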

B. Clean Data Files
For data cleaning, we will first extract all relevant columns from the raw files using the base R subset function. Then, depending on the data, we will use either the mutate function to change or replace values, the filter function to extract rows that fulfill certain conditions, or the rename function to standardize column names across different files. In cases where we only want to store a substring of a value, we will also use the str_extract function from the stringr library.

Figure 4 (JYC).PNG
Figure 4. Usage of multiple functions in the dplyr library to perform data cleaning.
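
A hedged sketch of this cleaning step is shown below. The file path, column names, and filter conditions are hypothetical placeholders; only the functions named above (subset, mutate, filter, rename, str_extract) are taken from the description.

library(dplyr)
library(stringr)
library(readxl)

# Hypothetical example: read one raw cost file, then apply the cleaning
# functions described above. Column names and filter values are placeholders.
raw_costs <- read_excel("data/raw/plant_costs.xlsx")

cleaned <- raw_costs %>%
  subset(select = c(material, plant, cost_type, amount, currency)) %>%  # extract relevant columns
  rename(material_code = material) %>%                                  # standardize column names
  filter(!is.na(amount), cost_type != "Obsolete") %>%                   # keep rows meeting the conditions
  mutate(amount = as.numeric(amount),                                   # change/replace values
         plant  = str_extract(plant, "^[A-Z0-9]{4}"))                   # keep only the plant code substring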

C. Reshape Data Table for Easier Manipulation
We used the gather function from the tidyr library to transpose the twenty-one cost type columns into a pair of key-value columns. By reshaping the data table, we can more easily apply calculations to standardize costs across different currencies into US Dollars (USD). The reshaped data table is also a more desirable input for our Tableau dashboard, because we can easily sum up the multiple components that are related to one finished good without having to pivot the data table in Tableau's import settings.


Figure 5 (JYC).PNG
Figure 5. Usage of gather function to reshape the data table.
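
For illustration, here is a minimal sketch of the reshaping and currency standardization, using a small made-up table with only three of the cost type columns and a hypothetical USD exchange rate lookup; none of the values or column names come from the sponsor's data.

library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per finished good, cost type columns
# (only three of the twenty-one shown), with costs in local currency.
wide_costs <- tibble(
  material_code = c("FG-001", "FG-002"),
  currency      = c("EUR", "SGD"),
  labour_cost   = c(10.0, 12.5),
  material_cost = c(55.0, 48.0),
  freight_cost  = c(4.2, 3.8)
)

# Hypothetical exchange rate lookup used to standardize costs into USD.
usd_rates <- tibble(currency = c("EUR", "SGD"), usd_rate = c(1.18, 0.74))

# gather() transposes the cost type columns into a key-value pair; a single
# join and mutate then converts every cost amount into USD in one pass.
long_costs <- wide_costs %>%
  gather(key = "cost_type", value = "amount", ends_with("_cost")) %>%
  left_join(usd_rates, by = "currency") %>%
  mutate(amount_usd = amount * usd_rate)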

Figure 6 (JYC).PNG
Figure 6. Data table structure before transposition.

Figure 7 (JYC).PNG
Figure 7. A leaner data table structure after transposition.