Difference between revisions of "ANLY482 AY2017-18 T1 Group2 Project EZLin Project Data"

From Analytics Practicum
Jump to navigation Jump to search
Line 42: Line 42:
 
<br/>
 
<br/>
 
<div style="font-size: 16px;">
 
<div style="font-size: 16px;">
The dataset is provided by XXX’s supply chain and data team in excel format. The timeframe of the data being provided is for the first few months in 2017.
+
We received various data files for our exploration and analysis. There are three main internal manufacturing plants and other plants are subcontracting plants that do not produce any goods in house.
 
</div>
 
</div>
  

Revision as of 17:56, 15 October 2017


HOME

 

ABOUT US

 

PROJECT OVERVIEW

 

PROJECT MANAGEMENT

 

DOCUMENTATION

 

 


We received various data files for our exploration and analysis. There are three main internal manufacturing plants and other plants are subcontracting plants that do not produce any goods in house.

Mapping of Data

The initial 20 excels sheets have different format and include various information. There are sheets on the exchange rate between different countries, BOM of the plants and markets in various APAC countries, and etc. In order to identify:
1) the flow of manufacturing and distributing,
2) the location of the manufacturing plant and markets and
3) the cost spending on raw materials and overhead,
we need to map all these sheets together, through identifying primary and secondary key in each sheet before linking all the sheets together.
Combine all excel files for BOM into one csv file

  • Necessary for data extraction like getting data for different plant at one time.

Combine Purchase Information excel sheet with Purchase Information Linkage

  • As there are too many useless columns in these two files, we decided to select the useful ones and combine them together

Combine material master excel sheet with Special procurement type description

  • These two files are closely related. Adding the description of special procurement type to material master can help us to understand the data better.


Variable Transformation

The excel sheets lack a standard format and they are generated by different people. Same variable could have different names in different excel sheet. Currencies used in different sheets for different countries are also different - with Chinese RMB, Singapore Dollars, and etc. As such, we will need to standardise the format of all the sheets such as standardising the naming format of the variables before performing any analysis.
Transform all currencies into United States Dollars (USD).

  • Different plants in different regions have different currency. In order to compare all of them, we transform currencies to USD.


Missing Variable

In order to track the relationship between excel sheets and help with further analysis.
Add Bom_Id, SerielCode column to all excel files about BOM.

  • Adding Bom_Id: easier to extract one BOM for future analysis
  • Adding SerialCode: To keep track of the BOM level relationship & For converting to JSON (explain later)

Add SubconOrNot column to all excel files about BOM

  • Adding SubconOrNot: Easier to filter data for future analysis


Multi Entry Data

We identified that some excel sheets include multiple entry of the same product, with different variable values.
This occurs due to various issues such as human errors, or generating reports in new data format using the old systems in the company. After talking to the sponsors, it is believed that the multi entry of the data need to be combined before we start working on any analysis.


Data Visualization Transformation

Transformation of data from excel format to JSON format is needed in order to perform data visualization

  • To use d3.js for data visualization, we need to transform the data to JSON format.
  • JSON format is also good for dealing with hierarchical data.