Difference between revisions of "G7 Confirmatory Analysis"

Revision as of 15:44, 15 April 2018

Note: Due to the confidential nature of our project, we will not be able to reveal all the missing values/fields on this wiki.

Operational Performance/BUs
First, we wanted to determine whether the difference in operational performance across BUs observed in our Exploratory Analysis was statistically significant. Performance is measured by Shipment Status, a binary variable (1 if Delayed, 0 otherwise), and the independent variable, Business Unit, is a nominal categorical variable, so we used the Chi-square test of independence to assess the relationship between the two variables. The table below shows that the p-value is less than 0.01. We therefore conclude, at the 99% confidence level, that there is a statistically significant relationship between Shipment Status and Business Unit for all three years in our data set.

We also computed Phi and Cramér's V, which measure the strength of the association between the two variables. The statistic of about 0.29 suggests a strong relationship.

DHL CABU.PNG
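
As an illustration, the Chi-square test and Cramér's V described above could be reproduced in Python roughly as follows. This is a minimal sketch, not the tool or data used for the results shown: the function name `chi_square_with_cramers_v` and the counts in `example_table` are hypothetical, and it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi_square_with_cramers_v(table):
    """Chi-square test of independence plus Cramér's V for a contingency table.

    `table` has one row per category (e.g. Business Unit) and one column
    per Shipment Status value (on time, delayed).
    """
    observed = np.asarray(table, dtype=float)
    # correction=False: the Yates correction only applies to 2x2 tables anyway.
    chi2, p_value, dof, _expected = chi2_contingency(observed, correction=False)
    n = observed.sum()
    # Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
    min_dim = min(observed.shape) - 1
    cramers_v = float(np.sqrt(chi2 / (n * min_dim)))
    return chi2, p_value, dof, cramers_v


# Hypothetical counts, NOT the project data: rows are BUs,
# columns are (on time, delayed).
example_table = [
    [400, 100],
    [300, 200],
    [150, 350],
]

chi2, p_value, dof, cramers_v = chi_square_with_cramers_v(example_table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3g}, Cramér's V={cramers_v:.3f}")
```

With one binary variable, the denominator term `min(rows, cols) - 1` equals 1, so Cramér's V reduces to the square root of chi-square divided by the sample size.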

We also wanted to test whether the distribution of delay days differs across Business Units, which would help us identify whether certain Business Units are delayed by more days than others. We cannot use parametric tests for Delay Days, so we used a non-parametric test to compare the distribution of delay days across BUs. Because our independent variable is a nominal categorical variable with more than two categories, we used the K-Independent-Samples Kruskal-Wallis test.

DHL CABU2.png

As can be seen above, the null hypothesis was rejected because the p-value is less than 0.01, which allows the team to conclude, at the 99% confidence level, that the distribution of delay days is not the same across BUs.
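
The Kruskal-Wallis comparison above could be sketched in Python as shown here. This is only an illustration under stated assumptions: the helper `compare_delay_distributions`, the BU labels, and the delay values are hypothetical, and SciPy's `kruskal` stands in for whatever statistical package produced the output shown.

```python
from scipy.stats import kruskal


def compare_delay_distributions(delays_by_group, alpha=0.01):
    """Kruskal-Wallis H test across k independent groups of delay days.

    `delays_by_group` maps a group label (e.g. a Business Unit) to a
    sequence of delay days. Returns (H statistic, p-value, reject_null).
    """
    samples = [list(values) for values in delays_by_group.values()]
    h_stat, p_value = kruskal(*samples)
    return h_stat, p_value, bool(p_value < alpha)


# Hypothetical delay days per Business Unit, NOT the project data.
example_delays = {
    "BU A": [0, 1, 1, 2, 2, 3, 1, 0, 2, 1],
    "BU B": [5, 6, 7, 8, 6, 7, 9, 5, 6, 8],
    "BU C": [12, 14, 15, 13, 16, 11, 12, 15, 14, 13],
}

h_stat, p_value, reject_null = compare_delay_distributions(example_delays)
print(f"H={h_stat:.2f}, p={p_value:.3g}, reject null at 1%: {reject_null}")
```

The test works on ranks rather than raw values, which is why it does not require the delay days to follow a normal distribution.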
Operational Performance/Flight Types
We wanted to determine whether the difference in operational performance across Flight Types observed in section 6.4 was statistically significant. Performance is measured by Shipment Status, a binary variable (1 if Delayed, 0 otherwise), and the independent variable, Flight Type, is a nominal categorical variable, so we used the Chi-square test of independence to assess the relationship between the two variables. The table below shows that the p-value is less than 0.01. We therefore conclude, at the 99% confidence level, that there is a statistically significant relationship between Shipment Status and Flight Type for all three years in our data set.

We also computed Phi and Cramér's V, which measure the strength of the association between the two variables. The statistic of about 0.26 suggests a moderately strong relationship.

DHL CAFT.PNG

As in the analysis for Business Units, we wanted to test whether the distribution of delay days differs across Flight Types, which would help us identify whether certain Flight Types are delayed by more days than others. Again, we used the K-Independent-Samples Kruskal-Wallis test.

DHL CAFT2.png

As the results show, the null hypothesis was rejected because the p-value is less than 0.01, which allows the team to conclude, at the 99% confidence level, that the distribution of delay days is not the same across Flight Types. However, Flight Type also contains the category ‘OTHERS’, which can cause the test to show a statistically significant result even if there is no difference between the Cargo and Passenger flight types alone. We therefore conducted a two-independent-samples test (Mann-Whitney) on the Passenger and Cargo flight types only. The results are shown below:

DHL CAFT3.PNG
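
For reference, the two-sample follow-up restricted to Passenger and Cargo flights could be sketched in Python as follows. The helper name `compare_two_flight_types` and the delay values are hypothetical and chosen only for illustration; the sketch assumes SciPy is installed and is not the procedure that generated the table above.

```python
from scipy.stats import mannwhitneyu


def compare_two_flight_types(passenger_delays, cargo_delays, alpha=0.01):
    """Two-sided Mann-Whitney U test on delay days for two flight types.

    Shipments in any other category (e.g. 'OTHERS') are assumed to have
    been filtered out before calling this function.
    Returns (U statistic, p-value, reject_null).
    """
    u_stat, p_value = mannwhitneyu(
        passenger_delays, cargo_delays, alternative="two-sided"
    )
    return u_stat, p_value, bool(p_value < alpha)


# Hypothetical delay days, NOT the project data.
passenger = [0, 1, 1, 2, 2, 3, 1, 0, 2, 1]
cargo = [5, 6, 7, 8, 6, 7, 9, 5, 6, 8]

u_stat, p_value, reject_null = compare_two_flight_types(passenger, cargo)
print(f"U={u_stat:.1f}, p={p_value:.3g}, reject null at 1%: {reject_null}")
```

Dropping the third category before testing isolates the Passenger versus Cargo comparison, so a significant result cannot be driven by ‘OTHERS’.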