Difference between revisions of "ANLY482 AY2017-18T2 Group03 Data Analysis"

Revision as of 20:37, 26 February 2018


Methodology

In this section, we explain the methodology our team plans to use to analyse the data provided by our sponsor.

We will use Python for Exploratory Data Analysis (EDA) to better understand the dataset and its characteristics. As part of data preprocessing, our team will perform the steps below to obtain a clean dataset. These steps will eventually be converted into a script that cleans each data file uploaded into the dashboard we will develop for our sponsor.

Data Preprocessing

With every new dataset, we must first clean the data to remove records that should not be included in our analysis. The data cleaning steps include:

  • Handling missing values. If a record has any missing values, the entire row will be excluded, because including an incomplete record would make the analysis inaccurate.
  • Handling duplicate data. Duplicates can occur when employees scan the same barcode twice during inbound of goods. As with missing values, the entire duplicate row will be removed.
  • Resolving redundancies caused by data integration.
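The first two cleaning steps above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the team's actual script. The sample rows and the choice of required columns are assumptions for illustration, and the sketch assumes that "removing the duplicate row" means keeping the first occurrence and dropping later repeats.

```python
def clean_rows(rows, required):
    """Drop rows with missing required fields, then drop repeated rows.

    Assumption: a duplicate is a row whose required fields all match an
    earlier row; the first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for row in rows:
        values = tuple(row.get(col) for col in required)
        # Treat None and blank strings as missing values.
        if any(v is None or str(v).strip() == "" for v in values):
            continue
        if values in seen:
            continue  # duplicate scan, skip it
        seen.add(values)
        cleaned.append(row)
    return cleaned


# Hypothetical sample records; only the column names GRN Date,
# GRN Post Date and LocName come from the report itself.
sample_rows = [
    {"GRN Date": "2017-04-03 09:00", "GRN Post Date": "2017-04-03 10:30", "LocName": "A-01"},
    {"GRN Date": "2017-04-03 09:00", "GRN Post Date": "2017-04-03 10:30", "LocName": "A-01"},
    {"GRN Date": "2017-04-04 08:00", "GRN Post Date": "", "LocName": "B-02"},
    {"GRN Date": "2017-04-05 08:00", "GRN Post Date": "2017-04-05 09:00", "LocName": "C-03"},
]

cleaned_rows = clean_rows(sample_rows, ["GRN Date", "GRN Post Date", "LocName"])
print(len(cleaned_rows))
```

If the data is loaded with pandas instead, `DataFrame.dropna()` followed by `DataFrame.drop_duplicates()` would perform the same two steps.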

With the clean dataset, we will explore the data further and identify visualizations and analyses that can give our sponsor a more in-depth and useful dashboard.

S/N Data Cleaning Steps Justification & Rationale
1 ... ...

Exploratory Data Analysis

Inbound Report

The following diagrams show the basic summary statistics for the Inbound Report for each of the years 2015 to 2017.

1. Year 2015 Summary Statistics
Summary Statistics for Inbound Report 2015.png

2. Year 2016 Summary Statistics
Summary Statistics for Inbound Report 2016.png

3. Year 2017 Summary Statistics
Summary Statistics for Inbound Report 2017.png
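
Per-year summary statistics of this kind could be produced along the following lines. This is a sketch under assumptions: the report does not say which columns were summarised, so the `quantity` field, the date format, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
import statistics


def summarize_by_year(records, date_key, value_key):
    """Group records by calendar year and compute basic statistics."""
    groups = defaultdict(list)
    for rec in records:
        year = datetime.strptime(rec[date_key], "%Y-%m-%d").year
        groups[year].append(rec[value_key])

    summary = {}
    for year, values in sorted(groups.items()):
        summary[year] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical inbound records; "quantity" is an illustrative column.
inbound = [
    {"date": "2015-08-01", "quantity": 10},
    {"date": "2015-08-15", "quantity": 20},
    {"date": "2016-07-02", "quantity": 5},
    {"date": "2017-04-10", "quantity": 8},
    {"date": "2017-04-11", "quantity": 12},
    {"date": "2017-05-01", "quantity": 40},
]

yearly_summary = summarize_by_year(inbound, "date", "quantity")
for year, stats in yearly_summary.items():
    print(year, stats)
```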

Because the warehouse changed in October 2016, and with it the way LocName was stored, part of our analysis focuses only on the data from October 2016 to December 2017. We therefore also computed summary statistics separately for the period before October 2016 and for October 2016 onwards, so that the old and new warehouses can be compared if needed.

1. Before October 2016 Summary Statistics
Summary Statistics for Inbound Report Before October 2016.png

2. October 2016 onwards Summary Statistics
Summary Statistics for Inbound Report October 2016 onwards.png
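
The split into the two periods can be illustrated with a short sketch. It assumes the boundary is 1 October 2016, which is implied by "October 2016 onwards" but not stated as an exact date in the report; the sample records and date format are hypothetical.

```python
from datetime import datetime

# Assumed boundary between the old and new warehouse periods.
CUTOFF = datetime(2016, 10, 1)


def split_by_cutoff(records, date_key, cutoff=CUTOFF):
    """Return (before, onwards): records dated before the cutoff,
    and records dated on or after it."""
    before, onwards = [], []
    for rec in records:
        when = datetime.strptime(rec[date_key], "%Y-%m-%d")
        (before if when < cutoff else onwards).append(rec)
    return before, onwards


# Hypothetical records for illustration only.
records = [
    {"date": "2016-09-30", "LocName": "OLD-1"},
    {"date": "2016-10-01", "LocName": "NEW-1"},
    {"date": "2017-12-31", "LocName": "NEW-2"},
]

old_period, new_period = split_by_cutoff(records, "date")
print(len(old_period), len(new_period))
```

Each of the two lists can then be summarised separately with the same statistics used for the yearly figures.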

We also conducted Exploratory Data Analysis on the Inbound dataset.

The chart below plots GRN Date against GRN Post Date. The difference between GRN Post Date and GRN Date is the time taken to scan all the inbound goods. On average, scanning takes 1.42 hours to complete. The peak also falls in a different month each year: August in 2015, July in 2016, and April in 2017.

Average GRN Date vs GRN Post Date Chart.png
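
The scanning duration and the yearly peaks could be derived as sketched below. This is an illustration under assumptions, not the team's code: the timestamp format and sample rows are hypothetical, and the sketch assumes that "peak" means the month with the highest average scanning duration in each year.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def scan_hours(rec):
    """Hours elapsed between GRN Date and GRN Post Date."""
    start = datetime.strptime(rec["GRN Date"], FMT)
    end = datetime.strptime(rec["GRN Post Date"], FMT)
    return (end - start).total_seconds() / 3600


def peak_month_by_year(records):
    """Map each year to (month, average hours) for its slowest month."""
    buckets = defaultdict(list)
    for rec in records:
        start = datetime.strptime(rec["GRN Date"], FMT)
        buckets[(start.year, start.month)].append(scan_hours(rec))

    peaks = {}
    for (year, month), hours in buckets.items():
        avg = sum(hours) / len(hours)
        if year not in peaks or avg > peaks[year][1]:
            peaks[year] = (month, avg)
    return peaks


# Hypothetical GRN records for illustration only.
grn_records = [
    {"GRN Date": "2016-07-05 08:00", "GRN Post Date": "2016-07-05 10:00"},
    {"GRN Date": "2017-03-01 09:00", "GRN Post Date": "2017-03-01 10:00"},
    {"GRN Date": "2017-04-01 09:00", "GRN Post Date": "2017-04-01 11:30"},
    {"GRN Date": "2017-04-02 09:00", "GRN Post Date": "2017-04-02 10:30"},
]

durations = [scan_hours(r) for r in grn_records]
average_hours = sum(durations) / len(durations)
peaks = peak_month_by_year(grn_records)
print(average_hours, peaks)
```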

Outbound Report

...