G7 Confirmatory Analysis
Operational Performance/BUs
First, we wanted to determine whether the difference in operational performance across BUs observed in our Exploratory Analysis was statistically significant. Performance is measured by Shipment Status, a binary variable (1 if Delayed, 0 otherwise), and the independent variable, Business Unit, is a nominal categorical variable, so we used a Chi-square test of independence to assess the relationship between the two variables. The table below shows that the p-value is less than 0.01. We can therefore conclude, at the 99% confidence level, that there is a statistically significant relationship between Shipment Status and Business Unit for all three years in our data set.
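As a rough illustration of the test described above, the sketch below runs a Chi-square test of independence on a Business Unit by Shipment Status contingency table using `scipy.stats.chi2_contingency`. The counts and the BU labels are hypothetical placeholders, not the project's data, and the original analysis may have been run in a different tool.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts, NOT the project's data.
# Rows: Business Units; columns: Shipment Status (0 = not delayed, 1 = delayed).
observed = np.array([
    [820, 180],  # hypothetical BU A
    [640, 360],  # hypothetical BU B
    [450, 550],  # hypothetical BU C
])

# Chi-square test of independence between Business Unit and Shipment Status.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.01  # corresponds to the 99% confidence level used in this section
reject_null = bool(p_value < alpha)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
print("Reject independence at the 99% level:", reject_null)
```

The degrees of freedom equal (rows - 1) x (columns - 1), so a table with three BUs and a binary status has 2 degrees of freedom.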
We also computed Phi and Cramer’s V, which measure the strength of the association between the two variables. The statistic of about 0.29 suggests a strong relationship.
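For reference, Cramer’s V can be derived from the Chi-square statistic as V = sqrt(chi2 / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. The following is a minimal sketch of that formula in plain Python; the function name `cramers_v` and the example table are illustrative assumptions, not part of the original analysis.

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp

    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical BU x Shipment Status counts, NOT the project's data.
example_table = [[820, 180], [640, 360], [450, 550]]
print(f"Cramer's V = {cramers_v(example_table):.3f}")
```

Because Shipment Status has only two levels, k - 1 equals 1 here, so V reduces to sqrt(chi2 / n). The value ranges from 0 (no association) to 1 (perfect association).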
We also wanted to test whether the distribution of delay days differs across Business Units, which can help us identify whether certain Business Units are delayed by more days than others. Because parametric tests cannot be used for Delay Days, we used a non-parametric test to compare the distribution of delay days across BUs. Since our independent variable is a nominal categorical variable with more than two categories, we used the Kruskal-Wallis test for K independent samples.
As can be seen above, the null hypothesis was rejected because the p-value is less than 0.01, which allows the team to conclude, at the 99% confidence level, that the distribution of delay days is not the same across BUs.
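The Kruskal-Wallis comparison described above could be sketched as follows with `scipy.stats.kruskal`. The delay-day samples are simulated from Poisson distributions purely for illustration, and the BU names and sample sizes are assumptions rather than values from the data set.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Simulated, hypothetical delay days per Business Unit (NOT the project's data).
delay_days_by_bu = {
    "BU_A": rng.poisson(2, 200),
    "BU_B": rng.poisson(3, 200),
    "BU_C": rng.poisson(5, 200),
}

# Kruskal-Wallis H test: null hypothesis is that all groups share the same distribution.
h_stat, p_value = kruskal(*delay_days_by_bu.values())

alpha = 0.01  # 99% confidence level
print(f"H = {h_stat:.2f}, p = {p_value:.3g}")
print("Reject the null hypothesis:", bool(p_value < alpha))
```

The test works on ranks rather than raw values, which is why it does not require delay days to follow a normal distribution.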
Operational Performance/Flight Types
We wanted to determine whether the difference in operational performance across Flight Types observed in Section 6.4 was statistically significant. Performance is measured by Shipment Status, a binary variable (1 if Delayed, 0 otherwise), and the independent variable, Flight Type, is a nominal categorical variable, so we used a Chi-square test of independence to assess the relationship between the two variables. The table below shows that the p-value is less than 0.01. We can therefore conclude, at the 99% confidence level, that there is a statistically significant relationship between Shipment Status and Flight Type for all three years in our data set.
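One way to check the relationship year by year is to repeat the Chi-square test on a separate Flight Type by Shipment Status table for each year. The sketch below assumes that reading of "all three years"; the year labels and counts are hypothetical, and the minimum expected count is printed only as a common sanity check on the test's assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts per year, NOT the project's data.
# Rows: Passenger, Cargo, OTHERS; columns: not delayed, delayed.
tables_by_year = {
    "Year 1": np.array([[700, 300], [500, 500], [90, 60]]),
    "Year 2": np.array([[720, 280], [540, 460], [100, 50]]),
    "Year 3": np.array([[680, 320], [480, 520], [85, 65]]),
}

alpha = 0.01  # 99% confidence level
results = {}
for year, observed in tables_by_year.items():
    chi2, p_value, dof, expected = chi2_contingency(observed)
    results[year] = (chi2, p_value, dof)
    # Very small expected counts would make the chi-square approximation unreliable.
    print(f"{year}: chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}, "
          f"min expected count = {expected.min():.1f}")

all_significant = all(p < alpha for _, p, _ in results.values())
print("Significant in every year:", all_significant)
```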
We also computed Phi and Cramer’s V, which measure the strength of the association between the two variables. The statistic of about 0.26 suggests a moderately strong relationship.
As in the analysis for Business Units, we wanted to test whether the distribution of delay days differs across Flight Types, which can help us identify whether certain Flight Types are delayed by more days than others. Again, we used the Kruskal-Wallis test for K independent samples.
As seen from the results obtained, the null hypothesis was rejected because the p-value is less than 0.01, which allows the team to conclude, at the 99% confidence level, that the distribution of delay days is not the same across Flight Types. However, Flight Type also contains the category ‘OTHERS’, which could make the test statistically significant even if there were no difference between the Cargo and Passenger flight types alone. We therefore conducted a two-independent-samples test (Mann-Whitney) comparing only the Passenger and Cargo flight types. The results are shown below:
Since the p-value is less than 0.01, the test results show, at the 99% confidence level, that the distribution of delay days is different for Cargo and Passenger flight types.
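The follow-up comparison can be sketched by dropping the ‘OTHERS’ category and running a two-sided Mann-Whitney U test on the remaining two groups with `scipy.stats.mannwhitneyu`. The data below are simulated and hypothetical; the array names `flight_type` and `delay_days` are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

# Simulated, hypothetical shipments (NOT the project's data).
flight_type = np.array(["Passenger"] * 150 + ["Cargo"] * 150 + ["OTHERS"] * 50)
delay_days = np.concatenate([
    rng.poisson(2, 150),  # Passenger
    rng.poisson(4, 150),  # Cargo
    rng.poisson(3, 50),   # OTHERS, excluded from the test below
])

# Keep only the two flight types of interest.
passenger = delay_days[flight_type == "Passenger"]
cargo = delay_days[flight_type == "Cargo"]

# Two-sided Mann-Whitney U test for two independent samples.
u_stat, p_value = mannwhitneyu(passenger, cargo, alternative="two-sided")

alpha = 0.01  # 99% confidence level
print(f"U = {u_stat:.1f}, p = {p_value:.3g}")
print("Reject the null hypothesis:", bool(p_value < alpha))
```

Restricting the test to two groups isolates the Passenger versus Cargo comparison, so a significant result cannot be driven by the ‘OTHERS’ category.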