ANLY482 AY2017-18T2 Group03 Project Overview

From Analytics Practicum
 
<br/>

<!--Main Navigation-->
<center>

<!--Insert Group Image later-->

{| style="background-color:#ffffff; color:#000000;" width="100%" cellspacing="0" cellpadding="10" border="0"
|style="font-size:88%; border-left:1px solid #ffffff; border-right:1px solid #ffffff; text-align:center; background-color:#149de7;" width="13%" | [[ANLY482_AY2017-18T2_Group03 About Us|<font color="#FFFFFF" size=3><b>ABOUT US</b></font>]]

|style="font-size:88%; border-left:1px solid #149de7; border-right:1px solid #149de7; text-align:center; border-bottom:1px solid #149de7; border-top:1px solid #149de7;" width="15%" |[[ANLY482_AY2017-18T2_Group03 Project Overview|<font color="#149de7" size=3><b>PROJECT OVERVIEW</b></font>]]

|style="font-size:88%; border-left:1px solid #ffffff; border-right:1px solid #ffffff; text-align:center; background-color:#149de7;" width="18%" |[[ANLY482_AY2017-18T2_Group03 Data Analysis|<font color="#FFFFFF" size=3><b>DATA ANALYSIS</b></font>]]

|style="font-size:88%; border-left:1px solid #ffffff; border-right:1px solid #ffffff; text-align:center; background-color:#149de7;" width="15%" |[[ANLY482_AY2017-18T2_Group03 Project Management|<font color="#FFFFFF" size=3><b>PROJECT MANAGEMENT</b></font>]]

|style="font-size:88%; border-left:1px solid #ffffff; border-right:1px solid #ffffff; text-align:center; background-color:#149de7;" width="17%" |[[ANLY482_AY2017-18T2_Group03 Documentation|<font color="#FFFFFF" size=3><b>DOCUMENTATION</b></font>]]

|style="font-size:88%; border-left:1px solid #ffffff; border-right:1px solid #ffffff; text-align:center; background-color:#149de7;" width="17%" |[[ANLY482_AY2017-18_Term_2|<font color="#FFFFFF" size=3><b>MAIN PAGE</b></font>]]

|}
<br/>

<!---------------START of sub menu ---------------------->
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
|-
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[ANLY482_AY2017-18T2_Group03 Project Overview Old| <span style="color:#149de7">Previous</span>]]
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #149de7" width="150px"| [[ANLY482_AY2017-18T2_Group03 Project Overview| <span style="color:#149de7">Current</span>]]
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
|}
<!---------------END of sub menu ---------------------->
<br/>
==<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Motivation</font></div>==
  
<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
The company has stayed ahead of its competition by embracing technology early on, and it is now starting to adopt analytics to make use of the large amount of data generated over the course of its business. Currently, this information is stored in its proprietary Warehouse Management System (WMS), which captures transaction information whenever goods move into, out of, or within the warehouse. The amount of information managed by the system is massive and complicated, so managers have to spend time building pivot tables in Excel to visualize the information and get a better view of how current operations are run. This uses up valuable time that could have been spent on more important operational matters.
</font></div>
One piece of information that is very useful to a warehouse manager is an ABC analysis, an inventory categorization technique. The purpose of this analysis is to classify the inventory into 3 categories.
<br>
The most general way of categorization is:
*‘A’ items – 20% of the items account for 80% of the annual consumption value of the items.
*‘B’ items – 30% of the items account for 15% of the annual consumption value of the items.
*‘C’ items – 50% of the items account for 5% of the annual consumption value of the items.
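The 20/30/50 split above can be sketched in a few lines of Python. This is only an illustrative sketch: the SKU codes and consumption values are invented, and it classifies by cumulative share of item count, as described above.

```python
# Illustrative ABC classification: rank products by annual consumption
# value, then assign categories by cumulative share of items seen.
# The 20% / 30% / 50% thresholds follow the split described above.

def abc_classify(consumption_by_product):
    """Return {product: 'A'|'B'|'C'} using the 20/30/50 item-count split."""
    ranked = sorted(consumption_by_product, key=consumption_by_product.get, reverse=True)
    n = len(ranked)
    categories = {}
    for i, product in enumerate(ranked):
        share_of_items = (i + 1) / n          # cumulative fraction of items seen
        if share_of_items <= 0.20:
            categories[product] = "A"
        elif share_of_items <= 0.50:          # next 30% of items
            categories[product] = "B"
        else:                                 # remaining 50% of items
            categories[product] = "C"
    return categories

# Hypothetical consumption values for 10 SKUs
demo = {"SKU-01": 9000, "SKU-02": 4000, "SKU-03": 900,
        "SKU-04": 500, "SKU-05": 300, "SKU-06": 200,
        "SKU-07": 60, "SKU-08": 30, "SKU-09": 6, "SKU-10": 4}
print(abc_classify(demo))
```

On this made-up data, the top two SKUs land in ‘A’, the next three in ‘B’, and the remaining five in ‘C’.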
  
Not only that, in order to provide even better insights for their customers, they would also like to have a better view of the <b>seasonality</b> patterns of the products, as well as an <b>affinity analysis</b> of the products, to further improve the way products are organized in the warehouse and increase the efficiency of warehouse operations.

=<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Objective and Goals</font></div>=

<div style="background: #ffffff; text-align:left; line-height: wrap_content; text-indent: 0px; font-size:14px; font-family:Trebuchet MS"><font color= #000000>
The main objectives of the project are the following:

1. Create an Operations Dashboard to visualize the following information:
*Inbound / Outbound Volume Time Series Chart
**The purpose of this chart is to visualize the operations performance at various time slices, so as to give operations a clear view of the inventory movement patterns for the various SKUs.
**X-axis: Time unit (day, week, month, quarter, year)
**Y-axis: No. of units of product
**Lines: Each line would symbolize the trend for a different SKU.

*Product Ranking Chart
**The purpose of this chart is to see the seasonality factors of the different products.
**X-axis: Month of the year
**Y-axis: Number of units of product
**Lines: Each line would symbolize the trend for a different SKU.

2. Conduct Association Rule learning to see whether there are affinities among the products within the warehouse, so that inventory can be better organized.
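The idea behind this objective can be sketched as follows: treat each outbound order as a "basket" of product codes and look for product pairs with high support and confidence. A real analysis would typically use a library such as mlxtend's apriori; the document baskets and product codes below are invented for illustration.

```python
# Minimal sketch of association rule mining on outbound orders:
# count how often product pairs co-occur across baskets, then keep
# pairs whose support clears a minimum threshold.
from itertools import combinations
from collections import Counter

# Hypothetical baskets: products shipped together under one outbound Doc_Num
baskets = [
    {"CABLE-A", "CONN-X"},
    {"CABLE-A", "CONN-X", "CABLE-B"},
    {"CABLE-A", "CONN-X"},
    {"CABLE-B", "CONN-Y"},
]

n = len(baskets)
item_count = Counter(i for b in baskets for i in b)
pair_count = Counter(p for b in baskets for p in combinations(sorted(b), 2))

rules = []
for (a, b), cnt in pair_count.items():
    support = cnt / n                        # fraction of baskets containing both
    if support >= 0.5:                       # minimum support threshold
        conf_ab = cnt / item_count[a]        # confidence of rule a -> b
        rules.append((a, b, support, conf_ab))

print(rules)   # the CABLE-A / CONN-X pair appears in 3 of 4 baskets
```

Pairs that survive the thresholds are candidates for being slotted near each other in the warehouse.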
  
</font></div>
=<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Provided Data</font></div>=

The data provided by our sponsor is obtained from their proprietary Warehouse Management System (WMS), which is developed in-house. The WMS is used to keep track of goods coming into, moving out of, and moving within the warehouse. The process of goods coming into the warehouse is known as the Inbound process, while the process of goods moving out of the warehouse is known as the Outbound process.

For this project, our sponsor has provided us with data for one of their clients, a company selling cables and connectors. We are provided with 3 years' worth of data, from 2015 to 2017. There are 2 Excel workbooks, one for the Inbound goods and the other for the Outbound goods. A more in-depth explanation of the Excel workbooks is given below.
===Inbound Report===
There is a total of 308,730 rows of data spanning January 2015 to December 2017 for the Inbound orders for this client. In 2015, there were 99,586 rows of data, and this increased to 102,379 in 2016. In 2017, the number of rows of data was 106,765.

Upon further clarification with our sponsor, we realized that they moved warehouse at the beginning of September 2016, and the move lasted around 2 weeks. Therefore, for the analysis of the products' locations in the warehouse, we can only use the data from October 2016 to December 2017, which amounts to 133,620 rows of data. However, for the seasonality analysis, we will still use the data from January 2015 to December 2017 to find trends in the volume of goods.

The image below shows a sample Inbound Report Excel spreadsheet.

[[Image:Sample Inbound Report.png|800px]]

<b><i>Metadata Dictionary</i></b>
 
{| class="wikitable"
|-
! Terminology !! Description
|-
| Doc_Num || Unique identifier for an inbound order.
|-
| ASN Date || Advanced Shipping Notice. The ASN date represents the date when the ASN is downloaded.
|-
| GRN Date || Goods Received Note. The date when the GRN job is created.
|-
| GRN Post Date || The date when the GRN job is completed.
|-
| PA Created Date || The date when the putaway job is created.
|-
| PA Post Date || The date when the putaway job is completed.
|-
| Owner_Short_Name || The short name of the client.
|-
| Loc_Short_Name || The identifier of the location at which the product is placed.
|-
| Product_Code || Unique identifier of a product.
|-
| Sum (D.Actual_Movement_Qty) || The quantity of products received.
|-
| Conv_Factor || The number of products in each carton.
|-
| Full Carton || The number of full cartons received.
|-
| Loose Carton || The number of loose cartons received (if any).
|-
| Remarks || Remarks to inform of any loose cartons and how they are handled.
|}
  
===Outbound Report===
There is a total of 657,578 rows of data for the Outbound orders for this client, spanning January 2015 to December 2017. In 2015, there were 220,387 rows of data, and this increased to 221,498 in 2016. In 2017, the number of rows of data was 215,667. From October 2016 to December 2017, there are 270,815 rows of data.

The image below shows a sample Outbound Report Excel spreadsheet.

[[Image:Sample Outbound Report.png|800px]]

<b><i>Metadata Dictionary</i></b>
 
{| class="wikitable"
|-
! Terminology !! Description
|-
| Doc_Num || Unique identifier for an outbound order.
|-
| Trans_Code || There are several trans codes:
* OSA (Outbound Stock Adjustment – used when there is a shortage or excess of physical goods in the warehouse; this is performed in the system to ensure that the quantity tallies between the system and the physical stock)
* OSO (Outbound Shipment)
* OOT (Outbound Ownership Transfer – when there is a change of plant code, for example from plant code 1201 to plant code 1402)
* OTA (Outbound Transaction Adjustment – used when there is a data entry error; this transaction code reverses the wrong transaction out)
* OVT (Outbound Vendor Transfer – when there is a short shipment)
However, we will only focus on OSO.
|-
| Product_Code || Unique identifier of a product.
|-
| Qty || The total quantity of products needed for the order.
|-
| No of CTN || Number of cartons for the order.
|-
| SO Created Date || The date when the sales order is downloaded.
|-
| SO Post Date || The date when the SO is posted. Most SO Created Dates equal the SO Post Date; however, when users go into the WMS to make changes, the SO Post Date changes accordingly.
|-
| Expected Delivery Date || The date on which the goods are expected to be shipped out.
|-
| PlannedPGIDate || Planned DP post date.
|-
| DP Creation Date || The date when the picking list is printed for the picker to pick the goods in the warehouse.
|-
| DP Commit Date || The date when the picker has finished picking.
|-
| DO Created Date || The date on which the delivery order is created. It indicates that the goods are ready for shipment and are waiting for the transporter to load.
|-
| DO Commit || The date when the delivery order is committed. This happens after the goods have left the warehouse.
|-
| LocName || The name of the location in the warehouse.
* Locations like 2R15C13 are locations in the previous warehouse, which has a different layout from the current warehouse.
|-
| Ship_To / TP_Full_Name || Outbound customer name.
|}
  
=<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Limitations</font></div>=

As with any other real-life project, there are limitations which we face, especially with regard to the availability of data.
{| class="wikitable"
|-
! S/N !! Limitation !! Conclusion
|-
| 1 || There is no way to accurately link an Outbound shipment to an Inbound shipment. We can only link the Outbound product to the Inbound product, but not to the exact shipment in which the product was brought into the warehouse. || We will assume a First In First Out (FIFO) order of shipment out of the warehouse, as that is usually how logistics companies operate. This means that the first Outbound in 2015 will be linked to the first Inbound in 2015, unless there are excess goods brought over from 2014. If so, calculations will be done based on the quantity to determine the number of each product left after the Outbound shipment.
|-
| 2 || The current data provided only consists of information from 1 company, which sells cables and connectors, that our sponsor company is handling. || We assume that the same analysis can be replicated for the logistics warehousing of other companies' products which are similar in nature.
|-
| 3 || The data provided from the WMS might not contain all the transactions that happened, due to human errors such as missed barcode scanning. || With a large enough dataset, it can be assumed that the results found by our team are representative and can be considered accurate.
|}
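The FIFO assumption in limitation 1 can be sketched as follows: for a given product, outbound quantities are assumed to consume the earliest inbound lots first. The dates and quantities below are invented for illustration.

```python
# Sketch of the FIFO assumption used to link outbound quantities back to
# inbound receipts for one product: consume the earliest inbound lots first.
from collections import deque

def fifo_link(inbounds, outbound_qty):
    """inbounds: list of (date, qty) in chronological order.
    Returns a list of (date, qty_taken) showing which inbound lots
    the outbound quantity is assumed to have come from."""
    lots = deque(inbounds)
    taken = []
    remaining = outbound_qty
    while remaining > 0 and lots:
        date, qty = lots.popleft()
        use = min(qty, remaining)
        taken.append((date, use))
        if qty > use:                      # put back the unused part of the lot
            lots.appendleft((date, qty - use))
        remaining -= use
    return taken

inbounds = [("2016-10-03", 100), ("2016-11-15", 80), ("2016-12-01", 50)]
print(fifo_link(inbounds, 150))   # [("2016-10-03", 100), ("2016-11-15", 50)]
```

Any quantity still unmatched after the lots run out would correspond to stock carried over from before the data window, which is handled by the quantity-based calculation described above.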
  
=<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Important Information about the Business Process</font></div>=

The image below shows the business workflow for the Inbound process of inventory. It also states when each date attribute is collected throughout the entire process.

[[Image:Inbound Business Process.png|800px]]

The image below shows the business workflow for the Outbound process of inventory. It also states when each date attribute is collected throughout the entire process.

[[Image:Outbound Business Process.png|800px]]
  
=<div style="border-style: solid; border-width:0; background: #149de7; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Trebuchet MS;border-bottom:5px solid white; border-top:5px solid black"><font color= #FFFFFF>Project Scope</font></div>=

'''<big>1. Data Gathering and Scoping</big>'''<br/>
Our sponsor has given us various Excel spreadsheets, both Inbound and Outbound, with data from 2015 to 2017, for a client selling cables and connectors.

Our project scope includes analysing the product rankings, product seasonality, and product Inbound and Outbound volumes. Additionally, we will also look at the affinity between products to see if there are trends of any products frequently being shipped together. With the data, we will perform Exploratory Data Analysis using Python.

'''<big>2. Tools to be Used</big>'''<br/>
We will be utilising several tools to aid us in the development of our dashboard. These include:
{| class="wikitable"
|-
! Technology !! Description
|-
| d3.js, dc.js and JavaScript || d3.js and dc.js are simple-to-use charting libraries written in JavaScript which allow us to add interactive charts to our web application for the dashboard. They will be used to develop the various visualizations mentioned above.
|-
| Python and Jupyter || Python is the programming language we will use, together with Jupyter, to create a script for cleaning the dataset when it is uploaded. It is also used to handle missing and duplicated data.
|-
| Tableau Software || Tableau will be utilized for Exploratory Data Analysis. It will also be used to create a dashboard for our sponsor.
|-
| PostgreSQL || PostgreSQL will be used as the database for the dashboard and web service.
|}

'''<big>3. Data Cleaning, Wrangling and Restructuring</big>'''<br/>
Our team aims to combine the 3 years of data for Inbound products into 1 Excel spreadsheet, and the same for the Outbound report. We will also try to combine the Inbound and Outbound reports by using their unique identifiers to link the data together. After integrating the data, data cleaning will be done using Python to handle missing values, duplicate data and non-uniform data.

However, as the company shifted their warehouse in September 2016 and the move lasted 2 weeks, we will also use another dataset which only consists of data from October 2016 to December 2017 for both Inbound and Outbound.

The team will also use Python to write a script which will combine the various Excel spreadsheets and extract the relevant fields before consolidating them into a single spreadsheet. This spreadsheet will be used as an input for the analysis and for the Operations Dashboard, which will be built using Tableau or as a web application using JavaScript.
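The consolidation step above can be sketched with pandas: stack the yearly extracts into one table, drop duplicate rows, and carve out the post-move subset from October 2016 onwards. The column names and rows below are invented stand-ins; the real extracts would come from the sponsor's WMS Excel workbooks (for example via pandas.read_excel).

```python
# Sketch of combining yearly inbound extracts and splitting off the
# post-warehouse-move subset. Data is in-memory and hypothetical.
import pandas as pd

inbound_2015 = pd.DataFrame({"Doc_Num": ["I1"], "GRN Date": ["2015-03-01"], "Qty": [10]})
inbound_2016 = pd.DataFrame({"Doc_Num": ["I2"], "GRN Date": ["2016-11-20"], "Qty": [5]})
inbound_2017 = pd.DataFrame({"Doc_Num": ["I3"], "GRN Date": ["2017-06-02"], "Qty": [8]})

# Stack the three years into one table and remove duplicate rows
combined = pd.concat([inbound_2015, inbound_2016, inbound_2017], ignore_index=True)
combined = combined.drop_duplicates()
combined["GRN Date"] = pd.to_datetime(combined["GRN Date"])

# Second dataset: only rows after the September 2016 warehouse move
post_move = combined[combined["GRN Date"] >= "2016-10-01"]
print(len(combined), len(post_move))
```

The same pattern applies to the Outbound workbook, with the two consolidated tables then joined on the product code.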
  
'''<big>4. Data Visualization and Reporting</big>'''<br/>
Once the data is cleaned and combined, we will extract the data and create a dashboard with the various visualizations mentioned earlier. This will be done using d3.js, dc.js and JavaScript. The interactive dashboard will allow managers to have easy access to visualizations showing the performance of current operations as compared to past operations. For the visualisations, we are also considering using Tableau to build the dashboard. Both options have been communicated to our sponsor, but the final deliverable format has yet to be finalised.

On top of the visualizations, we will also perform Market Basket Analysis to find the affinity between products. We will report all our findings to our sponsor and propose how the goods should be arranged in the warehouse based on their categories (A, B or C).
  
===Software Used===
We used Python as the tool for data cleaning, as well as for loading the data into the database after processing. The specific Python libraries used can be seen in the table below.
{| class="wikitable"
|-
! Python Library !! Purpose
|-
| pandas || pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It helps to convert the data that is uploaded into dataframes for easier processing.
|-
| datetime || The datetime module supplies classes for manipulating dates and times in both simple and complex ways. It helped to convert certain data into the correct date and time format for the creation of new variables that deal with dates and times.
|-
| math || It provides access to the mathematical functions defined by the C standard. In our project, it is used to round values.
|-
| glob || The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. It is used during file reading when files are uploaded to the database.
|-
| re || This module provides regular expression matching operations similar to those found in Perl. It is used for file reading when files are uploaded to the database.
|-
| configparser || This module provides the ConfigParser class, which implements a basic configuration language with a structure similar to what is found in Microsoft Windows INI files. It can be used to write Python programs which end users can customize easily. In our case, it is used to connect the Python scripts to the PostgreSQL database by setting the configurations in .ini files.
|-
| psycopg2 || Psycopg is a PostgreSQL database adapter for the Python programming language.
|-
| os || This module provides a portable way of using operating-system-dependent functionality. In our project, it is used by the scripts to delete files after they have been uploaded to the database.
|}
  
'''<big>3. Data Cleaning, Wrangling and Restructuring</big>'''
+
For the operational dashboard, we made use of various JavaScript libraries including d3.js, dc.js, crossfilter.js and sigma.js. It is a node.js application which is a server used to host the Angular web dashboard application and serves as the web service layer where the front-end communicates with the back-end to retrieve data from the database. Python programming language is also used for bootstrapping of data and the data cleaning process.
 
 
The team wants to perform horizontal analysis on the cost of one material across the three locations. However, the same material have different material numbers in different manufacturing location. Thus, we can only compare based on the material’s textual description, which are recorded differently in the different locations.
 
 
 
Therefore, the team will need to reconcile these name differences, either through manual reparsing or by using scripts that perform approximate string matching. Our preliminary research points us towards the R library stringdist, which can match two approximately similar strings based on their string distance. (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf)
 
 
 
The team will also use R to write a script that will extract and join the relevant fields from the different Excel files into a consolidated spreadsheet, which will then be used as input for analysis and report generation.
 
 
 
'''<big>4. Data Visualization and Reporting</big>'''
 
 
 
Once the data is cleaned and collated, we can begin to extract the data and load onto a data visualization template in R that will facilitate reporting and upper management review of the company’s performance. The template will accept certain excel file inputs regarding XYZ’s yearly, monthly or daily operations and will produce an output, either in the form of an interactive dashboard or a visual presentation. There will also be statistical relations so that management can more easily discern trends and discrepancies in their operations.
 
 
 
Armed with these functions, XYZ can determine where to first look into to find redundant costs in their supply chain.
 
 
 
</font></div>
 
<br>
 
</center>
 
<!--End of Navigation Bar-->
 

Latest revision as of 19:46, 11 April 2018


Motivation

The company has stayed ahead of their competition by embracing technology early, and they are now starting to explore analytics to make use of the large amount of data generated over the course of their business. Currently, this information is stored in their proprietary Warehouse Management System (WMS), which captures transaction information whenever goods move in, out of or within the warehouse.

The system manages a massive and complex amount of information, so managers have to spend time building pivot tables in Excel to visualize the data being generated and get a better view of how current operations are run. This uses up valuable time that could be spent on other, more important operational matters.

One piece of information that is very useful to a warehouse manager is an ABC analysis, an inventory categorization technique that classifies the inventory into three categories. The most common categorization is:

  • ‘A’ items – 20% of the items account for 80% of the annual consumption value of the items.
  • ‘B’ items – 30% of the items account for 15% of the annual consumption value of the items.
  • ‘C’ items – 50% of the items account for 5% of the annual consumption value of the items.
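The categorization above can be sketched in code. The following is a minimal illustration (not our sponsor's implementation), assuming each SKU's annual consumption value is already known:

```python
# Minimal sketch of the A/B/C split described above: rank SKUs by annual
# consumption value and label them by cumulative share (80% / 95% cut-offs).
def abc_classify(consumption):
    """consumption: dict mapping SKU -> annual consumption value."""
    total = sum(consumption.values())
    ranked = sorted(consumption.items(), key=lambda kv: kv[1], reverse=True)
    labels, cumulative = {}, 0.0
    for sku, value in ranked:
        cumulative += value
        share = cumulative / total
        # A: top items making up 80% of value; B: next 15%; C: remaining 5%
        labels[sku] = "A" if share <= 0.80 else ("B" if share <= 0.95 else "C")
    return labels
```

In practice the consumption values would come from the WMS transaction data rather than a hand-built dict.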

In addition, to provide even better insight for their customers, they would also like a clearer view of the seasonality patterns of the products, as well as an affinity analysis of the products, to further improve how products are organized in the warehouse and increase the efficiency of warehouse operations.

Objective and Goals

The main objectives of the project are the following:

1. Create an Operations Dashboard to visualize the following information

  • Inbound / Outbound Volume Time Series Chart
    • The purpose of this chart is to visualize operational performance at various time slices, giving operations a clear view of the inventory movement patterns for the various SKUs
    • X-axis: Time unit (day, week, month, quarter, year)
    • Y-axis: No. of units of product
    • Lines: Each line would symbolize the trend for a different SKU.
  • Product Ranking Chart
    • The purpose of this chart is to find out which are the best-selling items and which are the least popular items.
    • Treemap view of the products filtered by month / year
  • Product Seasonality Chart
    • The purpose of this chart is to see the seasonality factors of the different products.
    • X-axis: Month of the year
    • Y-axis: Number of units of product
    • Lines: Each line would symbolize the trend for a different SKU.
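As a sketch of the aggregation behind these charts (the column names Product_Code, SO Created Date and Qty are taken from the outbound report; the function itself is illustrative, not our final implementation):

```python
import pandas as pd

# Illustrative sketch: total units per SKU per time unit for the volume
# time-series chart. freq can be "D", "W", "M", "Q" or "Y".
def volume_series(df, freq="M"):
    dates = pd.to_datetime(df["SO Created Date"])
    period = dates.dt.to_period(freq)  # bucket each row into a time unit
    return df.groupby(["Product_Code", period])["Qty"].sum()
```

Each (SKU, period) total then becomes one point on that SKU's line in the chart.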

2. Conduct association rule learning to see whether there are affinities between products, so that inventory can be better organized within the warehouse.
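A minimal illustration of the affinity idea (not a full Apriori implementation): count how often each pair of products appears in the same outbound order and express it as support.

```python
from itertools import combinations
from collections import Counter

# Sketch: support of each unordered product pair, i.e. the fraction of
# orders in which both products were shipped together.
def pair_support(orders):
    """orders: list of sets of product codes shipped in the same order."""
    counts = Counter()
    for order in orders:
        for pair in combinations(sorted(order), 2):
            counts[pair] += 1
    n = len(orders)
    return {pair: count / n for pair, count in counts.items()}
```

Pairs with high support are candidates for being stored near each other in the warehouse.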

Provided Data

The data provided by our sponsor is obtained from their proprietary Warehouse Management System (WMS), which is developed in-house. The WMS is used to keep track of goods coming in, moving out and moving within the warehouse itself. The process of goods coming into the warehouse is known as the Inbound process, while the process of goods moving out of the warehouse is known as the Outbound process.

For this project, our sponsor has provided us with data for one of their clients, a company selling cables and connectors. We have been provided with three years' worth of data, from 2015 to 2017, in two Excel workbooks: one for the Inbound goods and the other for the Outbound goods. A more in-depth explanation of the workbooks is given below.

Inbound Report

There is a total of 308,730 rows of data spanning January 2015 to December 2017 for this client's Inbound orders. In 2015, there were 99,586 rows; this increased to 102,379 in 2016 and to 106,765 in 2017.

Upon further clarification with our sponsor, we learnt that the company moved warehouses at the beginning of September 2016, and the move lasted around two weeks. Therefore, for the analysis of the products' locations in the warehouse, we can only use data from October 2016 to December 2017. For the seasonality analysis, however, we will still use data from January 2015 to December 2017 to find trends in the volume of goods. From October 2016 to December 2017, there are 133,620 rows of data.
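This split into two analysis windows can be sketched in pandas as follows (the date column name GRN Date is an assumption; any of the report's date fields could serve):

```python
import pandas as pd

# Sketch: derive the two analysis windows described above from one frame.
def analysis_windows(inbound, date_col="GRN Date"):
    dates = pd.to_datetime(inbound[date_col])
    # Full range, used for the seasonality analysis
    seasonality = inbound[(dates >= "2015-01-01") & (dates <= "2017-12-31")]
    # Post-move range, used for the warehouse-location analysis
    location = inbound[(dates >= "2016-10-01") & (dates <= "2017-12-31")]
    return seasonality, location
```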

The image below shows a Sample Inbound Report Excel spreadsheet. Sample Inbound Report.png

Metadata Dictionary

  • Doc_Num: Unique identifier for an inbound order.
  • ASN Date: Advanced Shipping Notice date; the date when the ASN is downloaded.
  • GRN Date: Goods Received Note date; the date when the GRN job is created.
  • GRN Post Date: The date when the GRN job is completed.
  • PA Created Date: The date when the putaway job is created.
  • PA Post Date: The date when the putaway job is completed.
  • Owner_Short_Name: The short name of the client.
  • Loc_Short_Name: The identifier of the location at which the product is placed.
  • Product_Code: Unique identifier of a product.
  • Sum (D.Actual_Movement_Qty): The quantity of products received.
  • Conv_Factor: The number of products in each carton.
  • Full Carton: The number of full cartons received.
  • Loose Carton: The number of loose cartons received (if any).
  • Remarks: Remarks to inform of any loose cartons and how they are handled.
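Reading the dictionary above, Full Carton and Loose Carton appear to derive from the received quantity and Conv_Factor; assuming that relation holds (it is our reading of the metadata, not confirmed by the sponsor), the split is simple integer arithmetic:

```python
# Assumed relation: full cartons are the whole multiples of Conv_Factor in
# the received quantity, and any remainder travels as a loose carton.
def carton_split(qty, conv_factor):
    return qty // conv_factor, qty % conv_factor
```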

Outbound Report

There are approximately 657,578 rows of Outbound order data for this client from January 2015 to December 2017. In 2015, there were 220,387 rows; this increased to 221,498 in 2016, while 2017 had 215,667 rows. From October 2016 to December 2017, there are 270,815 rows of data.

The image below shows a Sample Outbound Report Excel spreadsheet. Sample Outbound Report.png

Metadata Dictionary

  • Doc_Num: Unique identifier for an outbound order.
  • Trans_Code: There are several transaction codes:
    • OSA (Outbound Stock Adjustment): performed in the system when there is a shortage or excess of physical goods in the warehouse, to ensure the quantity tallies between the system and the physical stock.
    • OSO (Outbound Shipment).
    • OOT (Outbound Ownership Transfer): when there is a change of plant code (for example, from plant code 1201 to plant code 1402).
    • OTA (Outbound Transaction Adjustment): used to reverse a wrong transaction after a data entry error.
    • OVT (Outbound Vendor Transfer): when there is a short shipment.
    However, we will only focus on OSO.
  • Product_Code: Unique identifier of a product.
  • Qty: The total quantity of products needed for the order.
  • No of CTN: Number of cartons for the order.
  • SO Created Date: The date when the sales order is downloaded.
  • SO Post Date: The date when the SO is posted. Most SO Created Dates equal the SO Post Date; however, when users go into the WMS to make changes, the SO Post Date changes accordingly.
  • Expected Delivery Date: The date on which the goods are expected to be shipped out.
  • PlannedPGIDate: Planned DP post date.
  • DP Creation Date: The date when the picking list is printed for the picker to pick the goods in the warehouse.
  • DP Commit Date: The date when the picker has finished picking.
  • DO Created Date: The date when the delivery order is created. It indicates that the goods are ready for shipment and waiting for the transporter to load them.
  • DO Commit: The date when the delivery order is committed, after the goods have left the warehouse.
  • LocName: The name of the location in the warehouse. Locations like 2R15C13 belong to the previous warehouse, which has a different layout from the current one.
  • Ship_To / TP_Full_Name: Outbound customer name.
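Since only OSO rows represent actual shipments, a natural first step in any outbound analysis is to filter on Trans_Code; a sketch:

```python
import pandas as pd

# Sketch: keep only outbound shipment rows (Trans_Code == "OSO"), since the
# other codes are adjustments and transfers that we exclude.
def shipments_only(outbound):
    return outbound[outbound["Trans_Code"] == "OSO"].copy()
```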

Limitations

As with any real-life project, we face limitations, especially with regard to the availability of data.

  1. Limitation: There is no way to accurately link an Outbound shipment to an Inbound shipment. We can only link an Outbound product to an Inbound product, not to the exact shipment in which the product was brought into the warehouse.
     Conclusion: We will assume a First In First Out (FIFO) order of shipment out of the warehouse, as that is usually how logistics companies operate. This means that the first Outbound in 2015 will be linked to the first Inbound in 2015, unless there are excess goods brought over from 2014. If so, calculations will be done based on the quantity to determine the number of each product left after the Outbound shipment.
  2. Limitation: The data provided only covers one company, which sells cables and connectors, among those that our sponsor handles.
     Conclusion: We assume that the same analysis can be replicated for the warehousing of other companies' products that are similar in nature.
  3. Limitation: The data from the WMS might not contain all the transactions that happened, due to human errors such as missed barcode scanning.
     Conclusion: With a large enough dataset, we assume that the results found by our team are representative and can be considered accurate.
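The FIFO assumption in limitation 1 can be sketched as consuming inbound shipments in arrival order (illustrative only; the shipment IDs and quantities are hypothetical):

```python
from collections import deque

# Sketch of the FIFO assumption: an outbound quantity is drawn from the
# oldest inbound shipments first, splitting a shipment where needed.
def fifo_link(inbound, outbound_qty):
    """inbound: list of (shipment_id, qty) in arrival order."""
    queue = deque(inbound)
    consumed, remaining = [], outbound_qty
    while remaining > 0 and queue:
        sid, qty = queue.popleft()
        take = min(qty, remaining)
        consumed.append((sid, take))
        remaining -= take
        if take < qty:  # partially consumed shipment stays at the front
            queue.appendleft((sid, qty - take))
    return consumed
```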

Important Information about the Business Process

The image below shows the business workflow for the Inbound process of inventory. It also states when each date attribute is collected throughout the entire process. Inbound Business Process.png

The image below shows the business workflow for the Outbound process of inventory. It also states when each date attribute is collected throughout the entire process. Outbound Business Process.png

Project Scope

1. Data Gathering and Scoping
Our sponsor has given us various Excel spreadsheets, both Inbound and Outbound, with data from 2015 to 2017, for a client selling cables and connectors.

Our project scope includes analysing product rankings, product seasonality, and product Inbound and Outbound volumes. Additionally, we will look at the affinity between products to see whether any products are frequently shipped together. With the data, we will perform Exploratory Data Analysis using Python.

2. Tools to be Used
We will be utilising several tools to aid us in the development of our dashboard. This includes:

  • d3.js, dc.js and JavaScript: d3.js and dc.js are simple-to-use charting libraries written in JavaScript that allow us to add interactive charts to our web application. They will be used to develop the dashboard visualizations mentioned above.
  • Python and Jupyter: Python is the programming language we will use, with Jupyter, to create a script for cleaning the dataset when it is uploaded. It is also used to handle missing and duplicated data.
  • Tableau Software: Tableau will be utilized for Exploratory Data Analysis. It will also be used to create a dashboard for our sponsor.
  • PostgreSQL: PostgreSQL will be used as the database for the dashboard and web service.

3. Data Cleaning, Wrangling and Restructuring
Our team aims to combine the three years of Inbound data into one Excel spreadsheet, and to do the same for the Outbound reports. We will also try to combine the Inbound and Outbound reports by using their unique identifiers to link the data together. After integrating the data, we will use Python to clean it, handling missing values, duplicate data and non-uniform data.
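A minimal sketch of that cleaning pass (the column names are assumptions; normalising product codes before deduplication catches rows that differ only in whitespace or case):

```python
import pandas as pd

# Sketch: drop rows missing key fields, normalise product codes,
# then remove duplicate rows.
def clean(df):
    df = df.dropna(subset=["Product_Code", "Qty"])
    df = df.assign(Product_Code=df["Product_Code"].str.strip().str.upper())
    return df.drop_duplicates()
```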

However, as the company shifted warehouses in September 2016 and the move lasted two weeks, we will also use a second dataset consisting only of data from October 2016 to December 2017 for both Inbound and Outbound.

The team will also use Python to write a script that combines the various Excel spreadsheets and extracts the relevant fields, consolidating them into a single spreadsheet to be used as input for the analysis and for the Operations Dashboard, which will be built in Tableau or as a web application using JavaScript.
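The consolidation step can be sketched as reading each report file and concatenating the frames (for the real workbooks the reader would be pd.read_excel; the paths and columns here are hypothetical):

```python
import pandas as pd

# Sketch: read each report file and stack them into one frame.
# reader defaults to pd.read_csv for this illustration; pass pd.read_excel
# for the actual .xlsx workbooks.
def consolidate(paths, reader=pd.read_csv):
    frames = [reader(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```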

4. Data Visualization and Reporting
Once the data is cleaned and combined, we will extract it and create a dashboard with the various visualizations mentioned earlier, using d3.js, dc.js and JavaScript. The interactive dashboard will give managers easy access to visualizations comparing the performance of current operations against past operations. We are also considering using Tableau to build the dashboard. Both options have been communicated to our sponsor, but the final deliverable format has yet to be finalised.

On top of the visualizations, we will also perform Market Basket Analysis to find the affinity between products. Finally, we will report all our findings to our sponsor and propose how the goods should be arranged in the warehouse based on their categories (A, B or C).

Software Used

We used Python for the data cleaning, as well as for loading the data into the database after processing. The specific Python libraries used are listed below.

  • pandas: An open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. It converts uploaded data into dataframes for easier processing.
  • datetime: Supplies classes for manipulating dates and times. It converts certain data into the correct date and time format when creating new date- and time-related variables.
  • math: Provides access to the mathematical functions defined by the C standard. In our project, it is used to round values.
  • glob: Finds all pathnames matching a specified pattern according to the rules used by the Unix shell. It is used when reading files being uploaded to the database.
  • re: Provides regular expression matching operations similar to those found in Perl. It is used when reading files being uploaded to the database.
  • configparser: Provides the ConfigParser class, which implements a basic configuration language similar to Microsoft Windows INI files, letting end users customise Python programs easily. Here, it connects the Python scripts to the Postgres database via configuration set in INI files.
  • psycopg2: A PostgreSQL database adapter for Python.
  • os: Provides a portable way of using operating-system-dependent functionality. In our project, the scripts use it to delete files after they are uploaded to the database.
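As an illustration of how configparser ties the scripts to the database (the section and key names here are assumptions, not our actual configuration), the INI file is parsed into keyword arguments for psycopg2.connect:

```python
import configparser

# Sketch: read connection settings from an INI file such as
#   [postgresql]
#   host = localhost
#   dbname = wms
# and return them as a dict suitable for psycopg2.connect(**params).
def load_db_config(path, section="postgresql"):
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser.items(section))
```

With this in place, a script can call psycopg2.connect(**load_db_config("database.ini")) without hard-coding credentials.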

For the operational dashboard, we used various JavaScript libraries, including d3.js, dc.js, crossfilter.js and sigma.js. The dashboard is a Node.js application that hosts the Angular web dashboard and serves as the web service layer through which the front end retrieves data from the database via the back end. Python is also used for bootstrapping the data and for the data cleaning process.