Difference between revisions of "ANLY482 AY2017-18 T1 Group1/Project Overview"

From Analytics Practicum
Jump to navigation Jump to search
Line 77: Line 77:
  
 
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:16px; font-family:Arial"><font color= #000000>
 
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:16px; font-family:Arial"><font color= #000000>
3.1 Objectives
+
Objectives
  
 
To aid in their roadmap to optimize their supply chain, our team will be developing a proof-of-concept model to aid in identifying redundant costs.
 
To aid in their roadmap to optimize their supply chain, our team will be developing a proof-of-concept model to aid in identifying redundant costs.
Line 91: Line 91:
  
  
3.2 Analysis Methods: Exploratory Analysis, logistic regression
+
Analysis Methods: Exploratory Analysis, logistic regression
  
 
Exploratory analysis: We will be doing a horizontal analysis to understand what is the ratio of each cost component in a particular unit of finished good across different locations.
 
Exploratory analysis: We will be doing a horizontal analysis to understand what is the ratio of each cost component in a particular unit of finished good across different locations.
Line 105: Line 105:
  
 
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:16px; font-family:Arial"><font color= #000000>
 
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:16px; font-family:Arial"><font color= #000000>
4.1 Data Gathering and Scoping
+
Data Gathering and Scoping
 
Our sponsor has given us the data in Excel file format for the product family Johnson’s Baby Oil in the year 2017. Geographically, the data involves three manufacturing plants in China, Thailand and India, and multiple distribution centres across over ten cities. From the supply chain perspective, the project will investigate the total costs associated with a finished good from the manufacturing plant to the wholesalers.
 
Our sponsor has given us the data in Excel file format for the product family Johnson’s Baby Oil in the year 2017. Geographically, the data involves three manufacturing plants in China, Thailand and India, and multiple distribution centres across over ten cities. From the supply chain perspective, the project will investigate the total costs associated with a finished good from the manufacturing plant to the wholesalers.
 
   
 
   
4.2 Research on Tools and Proposal Preparation
+
Research on Tools and Proposal Preparation
 
Our team has researched and discussed with the sponsor regarding the tools used in analyzing and presenting the data.
 
Our team has researched and discussed with the sponsor regarding the tools used in analyzing and presenting the data.
  
Line 115: Line 115:
 
We will also research R libraries to find suitable methods that perform analysis and data wrangling.
 
We will also research R libraries to find suitable methods that perform analysis and data wrangling.
  
4.3 Data Wrangling and Restructuring
+
Data Wrangling and Restructuring
  
 
The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations. For example, a 500ml Johnson’s Baby Oil is recorded as “JBOil Reg 500ml (KR0110) COSMOS” in the Thailand Bill of Materials (BOM) file, but is recorded differently as “JB Oil 500ml Monsoon - B” in the India BOM file.
 
The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations. For example, a 500ml Johnson’s Baby Oil is recorded as “JBOil Reg 500ml (KR0110) COSMOS” in the Thailand Bill of Materials (BOM) file, but is recorded differently as “JB Oil 500ml Monsoon - B” in the India BOM file.

Revision as of 19:35, 26 August 2017

Home Team Project Overview Midterm Progress Final Progress Project Management Documentation


Motivation

Our group comprises of 2 business students and 1 accountancy student, all having a second major in analytics. As final year students, we realise that there is a need to familiarise ourselves with analytics in the working world. This module serves as an opportunity for us to undergo a project with XYZ which would give us more experience on a real-life working world situation. These hands on experience are invaluable that cannot be taught in a classroom environment. We hope this internship-like experience with XYZ will prepare us for our jobs in the future.


Objective & Goals

XYZ wants to increase profits by reducing their costs. However, the company is plagued by data problems which makes the task of optimizing their supply chain very difficult.

Their data are stored in many locations and are often transferred using excel files. This results in inconsistency and incoherence in their data sources which makes analysis extremely time-consuming. On average, the company takes 30 days to consolidate the data on excel and produce a report.

There is also no standardized way of labelling certain items. For example, raw materials such as bottles are labelled as ‘botl’ or ‘btl’ in other regions. Thus, consolidating the data requires manual override of the labels in excel, prolonging the process unnecessarily.

To aid in their roadmap to optimize their supply chain, our team will be developing a proof-of-concept model to aid in identifying redundant costs. We will begin this by analyzing the supply chain data for a product (e.g. baby oil). This product has relatively fewer levels of abstraction which makes modelling easier. The model can then be scaled to add additional levels for more complex products.

We will identify factors that results in the cost complexity such as:

  • Freight cost
  • Regional labour cost
  • Raw material cost
  • Distribution cost
  • Storage cost
  • Tax laws


Provided Data

It comes in the form of 14 Excel sheets, notably the Bill of Materials (BOM) files for the manufacturing plants in China, India and Thailand. The BOM composes of the individual components used to XYZ Baby oil as well as the cost. The cost include: Raw & Pack, Packaging, Labour, O/H direct. O/H indirect, depreciation, freight, duty, other PFG, toll fee, IC Mark up, Other OH). Other data also include plant description, material master as well as procurement information records.

The data provided is mainly in string and numerical format.


Methodology

Objectives

To aid in their roadmap to optimize their supply chain, our team will be developing a proof-of-concept model to aid in identifying redundant costs.

We will begin this by analyzing the supply chain data for a product (e.g. baby oil). This product has relatively fewer levels of abstraction which makes modelling easier. The model can then be scaled to add additional levels for more complex products.

We will identify factors that results in the cost complexity such as: Freight cost Labour cost Overhead cost Depreciation cost Raw material cost


Analysis Methods: Exploratory Analysis, logistic regression

Exploratory analysis: We will be doing a horizontal analysis to understand what is the ratio of each cost component in a particular unit of finished good across different locations.

Correlation: We will be doing a correlation analysis to identify variables that correlate with one another. We will also be identifying the variables that affect the total cost of goods produce which affects the total gross profit, such as the correlation between different cost components and the finished good cost. We will also be checking for multicollinearity between different cost components for complementary materials.

Regression: We will see if the total cost to produce a finished good can be expressed as a linear/polynomial equation of certain influential cost components. This is to identify which materials are the bottlenecks as well as weightage importance.


Project Scope

Data Gathering and Scoping Our sponsor has given us the data in Excel file format for the product family Johnson’s Baby Oil in the year 2017. Geographically, the data involves three manufacturing plants in China, Thailand and India, and multiple distribution centres across over ten cities. From the supply chain perspective, the project will investigate the total costs associated with a finished good from the manufacturing plant to the wholesalers.

Research on Tools and Proposal Preparation Our team has researched and discussed with the sponsor regarding the tools used in analyzing and presenting the data.

We will be using JMP Pro to perform exploratory data analysis. For visualization, we will be presenting our analytical results and dashboard using Tableau as J&J is comfortable with using this software. It also fulfills their requirement of a dashboard that is highly interactive and allows them to drill down from the overview to the root causes.

We will also research R libraries to find suitable methods that perform analysis and data wrangling.

Data Wrangling and Restructuring

The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations. For example, a 500ml Johnson’s Baby Oil is recorded as “JBOil Reg 500ml (KR0110) COSMOS” in the Thailand Bill of Materials (BOM) file, but is recorded differently as “JB Oil 500ml Monsoon - B” in the India BOM file.

Therefore, the team will need to reconcile these name differences, either through manual reparsing or by using scripts that perform approximate string matching. Our preliminary research points us towards the R library stringdist, which can match two approximately similar strings based on their string distance. (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf)

The team will also use R to write a script that will extract and join the relevant fields from the different Excel files into a consolidated spreadsheet, which will then be used as input for analysis and report generation.