Difference between revisions of "ANLY482 AY2017-18T2 Group03 Project Overview"

From Analytics Practicum
Jump to navigation Jump to search
Line 21: Line 21:
 
|}  
 
|}  
 
<br/>
 
<br/>
 +
 +
<br>
 +
 +
<div style="border-style: solid; border-width:0; background: #b3ccff; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #000000>Motivation</font></div>
 +
 +
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
 +
Our group comprises of 2 business students and 1 accountancy student, all having a second major in analytics. As final year students, we realise that there is a need to familiarise ourselves with analytics in the working world. This module serves as an opportunity for us to undergo a project with XYZ which would give us more experience on a real-life working world situation. These hands on experience are invaluable that cannot be taught in a classroom environment. We hope this internship-like experience with XYZ will prepare us for our jobs in the future.
 +
</font></div>
 +
<br>
 +
 +
<div style="border-style: solid; border-width:0; background: #b3ccff; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #000000>Provided Data</font></div>
 +
 +
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
 +
It comes in the form of 14 Excel sheets, notably the Bill of Materials (BOM) files for the manufacturing plants in China, India and Thailand. The BOM includes the individual components and costs involved to manufacture XYZ Baby oil. The costs include but are not limited to: packaging, labour, direct overhead, indirect overhead, depreciation, freight, duty, toll fees, markup. Other data also include plant description, material master as well as procurement information records.
 +
 +
The data provided is mainly in string and numerical format.
 +
 +
</font></div>
 +
<br>
 +
 +
<div style="border-style: solid; border-width:0; background: #b3ccff; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #000000>Objective and Goals</font></div>
 +
 +
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
 +
The team aims to help XYZ increase gross profit from their products through these two goals:
 +
 +
1. To provide XYZ with an end-to-end perspective of all the costs associated with a finished good from the manufacturing plants to the distribution centers, so that they can investigate redundant costs or discover opportunities to optimize the material flow in their supply chain. We will be identifying factors that results in the cost complexity such as:
 +
* Freight cost
 +
* Labour cost
 +
* Overhead cost
 +
* Depreciation cost
 +
* Raw material cost
 +
 +
2. To provide XYZ a script that can automate the data consolidation between different Excel files, so that time required to generate a report is shortened, aiding XYZ in making more timely decisions.
 +
 +
</font></div>
 +
<br>
 +
 +
<div style="border-style: solid; border-width:0; background: #b3ccff; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #000000>Analysis Methods</font></div>
 +
 +
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
 +
'''<big>1. Exploratory analysis</big>'''
 +
 +
We will be doing a horizontal analysis to understand what is the ratio of each cost component in a particular unit of finished good across different locations.
 +
 +
'''<big>2. Correlation</big>'''
 +
 +
We will be doing a correlation analysis to identify variables that correlate with one another. We will also be identifying the variables that affect the total cost of goods produce which affects the total gross profit, such as the correlation between different cost components and the finished good cost. We will also be checking for multicollinearity between different cost components for complementary materials.
 +
 +
'''<big>3. Regression</big>'''
 +
 +
We will see if the total cost to produce a finished good can be expressed as a linear/polynomial equation of certain influential cost components. This is to identify which materials are the bottlenecks as well as weightage importance.
 +
 +
</font></div>
 +
<br>
 +
 +
<div style="border-style: solid; border-width:0; background: #b3ccff; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #000000>Project Scope</font></div>
 +
 +
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
 +
'''<big>1. Data Gathering and Scoping</big>'''
 +
 +
Our sponsor has given us the data in Excel file format for the baby oil product family in the year 2017. Geographically, the data involves three manufacturing plants in China, Thailand and India, and multiple distribution centres across over ten cities. From the supply chain perspective, the project will investigate the total costs associated with a finished good from the manufacturing plant to the wholesalers.
 +
 +
'''<big>2. Tools to be Used</big>'''
 +
 +
Our team has researched and discussed with the sponsor regarding the tools used in analyzing and presenting the data.
 +
 +
We will be using JMP Pro to perform exploratory data analysis. For visualization, we will be presenting our analytical results and dashboard using Tableau as XYZ is comfortable with using this software. It also fulfills their requirement of a dashboard that is highly interactive and allows them to drill down from the overview to the root causes.
 +
 +
We will also research R libraries to find suitable methods that perform analysis and data wrangling.
 +
 +
'''<big>3. Data Cleaning, Wrangling and Restructuring</big>'''
 +
 +
The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations.
 +
 +
Therefore, the team will need to reconcile these name differences, either through manual reparsing or by using scripts that perform approximate string matching. Our preliminary research points us towards the R library stringdist, which can match two approximately similar strings based on their string distance. (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf)
 +
 +
The team will also use R to write a script that will extract and join the relevant fields from the different Excel files into a consolidated spreadsheet, which will then be used as input for analysis and report generation.
 +
 +
'''<big>4. Data Visualization and Reporting</big>'''
 +
 +
Once the data is cleaned and collated, we can begin to extract the data and load onto a data visualization template in R that will facilitate reporting and upper management review of the company’s performance. The template will accept certain excel file inputs regarding XYZ’s yearly, monthly or daily operations and will produce an output, either in the form of an interactive dashboard or a visual presentation. There will also be statistical relations so that management can more easily discern trends and discrepancies in their operations.
 +
 +
Armed with these functions, XYZ can determine where to first look into to find redundant costs in their supply chain.
 +
 +
</font></div>
 +
<br>
 
</center>
 
</center>
 
<!--End of Navigation Bar-->
 
<!--End of Navigation Bar-->

Revision as of 15:34, 2 January 2018

AY2017-18T2 Group03 Team Logo.png


AY2017-18T2 Group03 Home Icon.png   HOME AY2017-18T2 Group03 About Us Icon.png   ABOUT US AY2017-18T2 Group03 Project Overview Icon.png   PROJECT OVERVIEW AY2017-18T2 Group03 Data Analysis Icon.png   DATA ANALYSIS AY2017-18T2 Group03 Project Manage Icon.png   PROJECT MANAGEMENT AY2017-18T2 Group03 Documentation Icon.png   DOCUMENTATION



Motivation

Our group comprises of 2 business students and 1 accountancy student, all having a second major in analytics. As final year students, we realise that there is a need to familiarise ourselves with analytics in the working world. This module serves as an opportunity for us to undergo a project with XYZ which would give us more experience on a real-life working world situation. These hands on experience are invaluable that cannot be taught in a classroom environment. We hope this internship-like experience with XYZ will prepare us for our jobs in the future.


Provided Data

It comes in the form of 14 Excel sheets, notably the Bill of Materials (BOM) files for the manufacturing plants in China, India and Thailand. The BOM includes the individual components and costs involved to manufacture XYZ Baby oil. The costs include but are not limited to: packaging, labour, direct overhead, indirect overhead, depreciation, freight, duty, toll fees, markup. Other data also include plant description, material master as well as procurement information records.

The data provided is mainly in string and numerical format.


Objective and Goals

The team aims to help XYZ increase gross profit from their products through these two goals:

1. To provide XYZ with an end-to-end perspective of all the costs associated with a finished good from the manufacturing plants to the distribution centers, so that they can investigate redundant costs or discover opportunities to optimize the material flow in their supply chain. We will be identifying factors that results in the cost complexity such as:

  • Freight cost
  • Labour cost
  • Overhead cost
  • Depreciation cost
  • Raw material cost

2. To provide XYZ a script that can automate the data consolidation between different Excel files, so that time required to generate a report is shortened, aiding XYZ in making more timely decisions.


Analysis Methods

1. Exploratory analysis

We will be doing a horizontal analysis to understand what is the ratio of each cost component in a particular unit of finished good across different locations.

2. Correlation

We will be doing a correlation analysis to identify variables that correlate with one another. We will also be identifying the variables that affect the total cost of goods produce which affects the total gross profit, such as the correlation between different cost components and the finished good cost. We will also be checking for multicollinearity between different cost components for complementary materials.

3. Regression

We will see if the total cost to produce a finished good can be expressed as a linear/polynomial equation of certain influential cost components. This is to identify which materials are the bottlenecks as well as weightage importance.


Project Scope

1. Data Gathering and Scoping

Our sponsor has given us the data in Excel file format for the baby oil product family in the year 2017. Geographically, the data involves three manufacturing plants in China, Thailand and India, and multiple distribution centres across over ten cities. From the supply chain perspective, the project will investigate the total costs associated with a finished good from the manufacturing plant to the wholesalers.

2. Tools to be Used

Our team has researched and discussed with the sponsor regarding the tools used in analyzing and presenting the data.

We will be using JMP Pro to perform exploratory data analysis. For visualization, we will be presenting our analytical results and dashboard using Tableau as XYZ is comfortable with using this software. It also fulfills their requirement of a dashboard that is highly interactive and allows them to drill down from the overview to the root causes.

We will also research R libraries to find suitable methods that perform analysis and data wrangling.

3. Data Cleaning, Wrangling and Restructuring

The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations.

Therefore, the team will need to reconcile these name differences, either through manual reparsing or by using scripts that perform approximate string matching. Our preliminary research points us towards the R library stringdist, which can match two approximately similar strings based on their string distance. (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf)

The team will also use R to write a script that will extract and join the relevant fields from the different Excel files into a consolidated spreadsheet, which will then be used as input for analysis and report generation.

4. Data Visualization and Reporting

Once the data is cleaned and collated, we can begin to extract the data and load onto a data visualization template in R that will facilitate reporting and upper management review of the company’s performance. The template will accept certain excel file inputs regarding XYZ’s yearly, monthly or daily operations and will produce an output, either in the form of an interactive dashboard or a visual presentation. There will also be statistical relations so that management can more easily discern trends and discrepancies in their operations.

Armed with these functions, XYZ can determine where to first look into to find redundant costs in their supply chain.