ANLY482 AY2017-18T2 Group19 Data Exploration Findings Final

ANALYSIS

As established, there are 2 distinct groups of library users: those who keep borrowed books past the due time until they are done with them, and those who borrow in succession until they are done with them. These 2 groups will be analysed separately. Insights from the 2016 transactional data will be compared against the 2017 transactional data to investigate whether the 1-hour extension in loan policy made the loan period more sufficient for each profile of undergraduate students.


Users Who Overdue

For this group of users, insufficiency would be measured through the frequency of overdue transactions and the length of the overdue periods.


Frequency of Overdue Transactions

With the 1-hour extension in loan policy from 2016 to 2017, a lower frequency of overdue transactions would be expected, assuming that undergraduate students maintain the same definition of sufficiency. In this segment, the frequency of overdue transactions is derived for both 2016 and 2017 before potential insights are followed up.

Figure 1: Distribution of Frequency of Overdue Transactions in 2016
Figure 2: Distribution of Frequency of Overdue Transactions in 2017

35.31% of loan transactions were overdue in 2016 (Figure 1), compared with 11.16% of loan transactions in 2017 (Figure 2). This decrease in the percentage of overdue transactions could indicate that the 1-hour extension in loan policy had a positive impact on such users. It was therefore of interest whether this difference had statistical backing.

A Contingency Analysis was conducted to determine whether the difference in the frequency distribution of overdue transactions between 2016 and 2017 is statistically significant. The null hypothesis states that the frequency distributions of overdue transactions in 2016 and 2017 are equal. The Contingency report was derived and the results are described below.

Figure 3: Mosaic Plot on the Frequency of Transactions that were Overdue across years 2016 and 2017
Figure 4: Contingency Table on the Frequency of Transactions that were Overdue across years 2016 and 2017
Figure 5: Tests Report on the Frequency of Transactions that were Overdue across years 2016 and 2017

At α=0.05, the p-values of both the Likelihood Ratio and Pearson tests are <.0001, so the null hypothesis is rejected. This shows that there is a difference between the frequency distributions of overdue transactions in 2016 and 2017. The analysis was extended with Fisher’s Exact Test, with the alternative hypothesis that the proportion of overdue transactions is greater in 2016 than in 2017. At α=0.05, the p-value is <0.0001, so the null hypothesis is rejected, allowing the conclusion that the probability of a transaction being overdue was higher in 2016 than in 2017.
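The three tests named above can be sketched for a 2x2 table (year by overdue status) using only the Python standard library. The cell counts below are hypothetical: the source reports percentages only, so counts of 353 out of 1,000 and 112 out of 1,000 were chosen merely to sit close to 35.31% and 11.16%. The function names are likewise illustrative and do not come from the software used in the study.

```python
import math

def chi_square_tests_2x2(a, b, c, d):
    """Pearson and likelihood-ratio chi-square tests for the table
    [[a, b], [c, d]] (rows = years, columns = overdue / not overdue)."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    pearson = 0.0
    likelihood_ratio = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed[i][j] - expected) ** 2 / expected
            if observed[i][j] > 0:
                likelihood_ratio += 2 * observed[i][j] * math.log(observed[i][j] / expected)
    likelihood_ratio = max(likelihood_ratio, 0.0)  # guard against rounding below zero

    def upper_tail(x):
        # Chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2))

    return pearson, upper_tail(pearson), likelihood_ratio, upper_tail(likelihood_ratio)

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value: P(top-left cell >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(math.comb(row1, k) * math.comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / math.comb(n, col1)

# Hypothetical counts: (overdue, not overdue) for 2016, then for 2017
pearson, p_pearson, lr, p_lr = chi_square_tests_2x2(353, 647, 112, 888)
p_fisher = fisher_right_tail(353, 647, 112, 888)
print(f"Pearson={pearson:.2f} (p={p_pearson:.3g}), LR={lr:.2f} (p={p_lr:.3g}), Fisher={p_fisher:.3g}")
```

With the 2016 row placed first, the right tail of Fisher's test corresponds to the alternative that the overdue proportion is greater in 2016.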


Duration of Overdue Period

In this segment, the distributions of the overdue period observed in 2016 and 2017 are first derived and explored before potential insights are followed up. With a 1-hour increment in loan policy, it is expected that the overdue period be shorter and its distribution skew to the left.

Figure 6: Distribution of Overdue Period in 2016

50% of the borrowings were overdue for more than 2.45 hours and 25% of the borrowings were overdue for more than 13.87 hours. In the ‘Summary Statistics’ report, the skewness statistic is -2.99. A negative skewness value indicates that the distribution is negatively skewed. This could be due to instances where individuals borrowed books with the 2-hour loan policy for more than 100 hours.

Figure 7: Distribution of Overdue Period in 2017

50% of the borrowings were overdue for up to 0.53 hours and 75% of the borrowings were overdue for up to 1.48 hours. In the ‘Summary Statistics’ report, the skewness statistic is -7.97. The distribution of ‘overdue_period’ in 2017 is therefore more negatively skewed than in 2016. This shows that in 2017, transactions tend to have an overdue period closer to 0, which points to an increased sufficiency level for this user profile.
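A minimal sketch of how quartiles and a skewness statistic of this kind can be computed. The values are hypothetical, and it is an assumption (not stated in the source) that ‘overdue_period’ is stored as a negative offset from the due time, so that very long overdue loans form a left tail. The skewness used here is the simple moment-based form, which may differ slightly from the adjusted statistic a statistics package reports.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, with no small-sample adjustment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical overdue periods in hours; more negative = overdue for longer (assumed encoding)
overdue_period = [-0.2, -0.4, -0.5, -0.6, -0.9, -1.5, -2.0, -6.0, -30.0, -110.0]

q1, median, q3 = statistics.quantiles(overdue_period, n=4)
skew = sample_skewness(overdue_period)
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, skewness={skew:.2f}")
```

Under this encoding, a few very long overdue loans pull the skewness below zero while the median stays close to 0, which is the pattern the text describes.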

Following this discovery, the Goodness-of-Fit Test was conducted to determine if the distribution of ‘overdue_period’ in both years follow a normal distribution. The null hypothesis for such an analysis would state that the data follows the Normal distribution. The following figures detail the results:

Figure 8: Goodness-of-Fit Test of Overdue Period in 2016
Figure 9: Goodness-of-Fit Test of Overdue Period in 2017

As can be seen in Figures 8 and 9, at α=0.05, both p-values are <0.0001, so the null hypothesis is rejected in each case. It can be concluded that the overdue periods in both 2016 and 2017 do not follow the Normal distribution.
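The source does not name the specific goodness-of-fit statistic behind Figures 8 and 9. As an illustration only, the sketch below swaps in the Jarque-Bera test, a simpler normality test built from skewness and kurtosis, because its p-value has a closed form. The data are hypothetical, and the chi-square approximation is only reasonable for fairly large samples.

```python
import math

def jarque_bera(values):
    """Jarque-Bera normality test. Null hypothesis: the data are Normal.
    Returns (statistic, p_value) using the chi-square approximation with 2 df."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

# Hypothetical, strongly left-skewed overdue periods (hours, negative offsets)
overdue_hours = [-(i ** 3) / 100.0 for i in range(1, 201)]

statistic, p_value = jarque_bera(overdue_hours)
print(f"JB={statistic:.1f}, p={p_value:.3g}, reject normality at 0.05: {p_value < 0.05}")
```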

Given the non-normality of the data, a nonparametric statistical test, the Wilcoxon Rank-Sums Test, was conducted to assess whether there is a significant difference between the distributions of overdue periods observed in 2016 and 2017. The null hypothesis states that the distributions of the overdue periods observed in 2016 and 2017 are equal, and the alternative hypothesis states that they are not equal.

Figure 10: Wilcoxon Rank Sums Test Results for Overdue Period across Years 2016 and 2017

Both the normal and the chi-square approximations for the Wilcoxon test statistic indicate significance, with p-values of <.0001 (Figure 10). As such, at α=0.05, the null hypothesis is rejected in favour of the alternative hypothesis: the distributions of the overdue period in 2016 and 2017 are significantly different. Given that the median overdue period in 2017 is smaller than in 2016, it can be inferred that, after the 1-hour extension in loan policy, transactions in 2017 were overdue for an amount of time closer to 0 than in 2016.
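For reference, the normal approximation to the Wilcoxon rank-sum test can be sketched as follows. This is a simplified illustration under stated assumptions: the two samples are hypothetical, ties receive midranks with the usual variance correction, and no continuity correction is applied, so the figures will not match a statistics package exactly.

```python
import math

def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.
    Returns (rank sum of sample_a, z statistic, two-sided p-value)."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the ranks i+1 .. j
        group_size = j - i
        tie_term += group_size ** 3 - group_size
        rank_sum_a += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    mean = n_a * (n + 1) / 2.0
    variance = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_a - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum_a, z, p_value

# Hypothetical overdue periods in hours for each year
overdue_2016 = [2.4, 3.1, 0.8, 14.0, 5.5, 2.9, 110.0, 1.7]
overdue_2017 = [0.5, 0.3, 1.4, 0.6, 0.2, 2.0, 0.9, 0.4]

rank_sum, z, p_value = rank_sum_test(overdue_2016, overdue_2017)
print(f"rank sum (2016)={rank_sum}, z={z:.3f}, p={p_value:.4f}")
```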


Users Who Borrow Successively

In the following discussion, the 2-hour and 3-day loan transactions were appended together for 2016 and 3-hour and 3-day loan transactions were appended together for 2017. SMU Libraries believed that this would provide a more accurate depiction as course reserve materials with the same titles could have different loan policies. Past studies and observations conducted by SMU Libraries have shown that undergraduate students prefer borrowing course reserves with the 3-day loan policy but should those books be unavailable, they would settle for those with 2-hour or 3-hour loan policy instead.

For this group of users, insufficiency would be measured through the frequency of succession borrows and the length of total hours borrowed after accounting for successive borrows.


Frequency of Succession Borrows


With a 1-hour extension in the loan policy from 2-hour in 2016 to 3-hour in 2017, a decrease in the frequency of succession borrows would be expected if the undergraduate students’ definition of sufficiency remained constant. Thus, in this segment, the paper plans to investigate if the distribution of the frequency of succession borrows changed significantly between 2016 and 2017.

The distributions of the frequency of succession borrows observed in 2016 and 2017 are first derived and explored before potential insights are followed up.

Figure 11: Distribution of the Frequency of Successive Borrows in 2016
Figure 12: Distribution of the Frequency of Successive Borrows in 2017

Comparing the distributions for 2016 and 2017 shown in Figures 11 and 12 respectively, the proportion of undergraduate students borrowing only once and not requiring a subsequent borrow was higher in 2016 than in 2017. 89.10% of transactions were one-time borrows in 2016, while in 2017 such transactions make up only 85.13% of the total recorded transactions. In addition, in both years, 99% of the transactions involve 3 or fewer borrowings in succession. It was then of interest whether this finding is statistically significant, as it might serve as a potential indication of the futility of the 1-hour extension.

A contingency analysis was conducted to confirm if there is a difference in the frequency of succession borrows observed in 2016 and 2017. The null hypothesis hence states that the distribution of the frequency of succession borrows in 2016 and 2017 are equal and the alternative hypothesis states that the distribution of the frequency of succession borrows in 2016 and 2017 are not equal.

Figure 13: Mosaic Plot of the Number of Successive Borrows
Figure 14: Contingency Table of the Number of Successive Borrows
Figure 15: Tests of the Number of Successive Borrows

At α=0.05, the p-values of both the Likelihood Ratio and Pearson tests are <.0001, so the null hypothesis is rejected. This shows that there is a difference between the frequency distributions in 2016 and 2017 that is unlikely to be due to chance alone. Judging from the Mosaic Plot and Contingency Table in Figures 13 and 14, it can be inferred that the probability of undergraduate students borrowing only once is higher in 2016 than in 2017. This means that the probability of an undergraduate student finding the loan policy insufficient and needing a succeeding borrow is higher in 2017 than in 2016.


Duration of Hours Borrowed with Successions

After accounting for the successive borrowing behaviour that undergraduate students exhibit, this segment aims to obtain a more accurate depiction of the distribution of the hours for which undergraduate students use the course reserve materials. Previously, each transaction was treated as independent of the others. This field considers the underlying possibility that undergraduate students return the books only to check them out again because the loan period is insufficient.
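One way such a field could be derived is sketched below. The record layout (user, title, checkout hour, return hour), the 1-hour gap used to decide that a new checkout continues the previous loan, and the choice to sum only the time the book was actually held are all assumptions made for illustration; the source does not state how successions were identified.

```python
from collections import defaultdict

def chain_successive_borrows(loans, max_gap_hours=1.0):
    """Merge loans of the same title by the same user when the next checkout
    follows the previous return within max_gap_hours (hypothetical threshold).
    Returns a list of (user, title, number_of_borrows, total_hours_borrowed)."""
    by_key = defaultdict(list)
    for user, title, checkout, returned in loans:
        by_key[(user, title)].append((checkout, returned))

    chains = []
    for (user, title), spans in by_key.items():
        spans.sort()
        count, hours, last_return = 0, 0.0, None
        for checkout, returned in spans:
            # A gap longer than the threshold closes the current chain
            if last_return is not None and checkout - last_return > max_gap_hours:
                chains.append((user, title, count, hours))
                count, hours = 0, 0.0
            count += 1
            hours += returned - checkout
            last_return = returned
        chains.append((user, title, count, hours))
    return chains

# Hypothetical loans: (user, title, checkout hour, return hour)
loans = [
    ("u1", "t1", 0.0, 2.0),
    ("u1", "t1", 2.1, 4.0),    # re-borrowed within minutes: same chain
    ("u1", "t1", 30.0, 31.5),  # a day later: new chain
    ("u2", "t1", 1.0, 2.5),
]
for chain in chain_successive_borrows(loans):
    print(chain)
```

The number of borrows per chain gives the frequency of succession borrows, and the summed hours give the hours borrowed with successions.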

In this segment, the paper will first explore the distribution of the hours borrowed before following up on potential insights.

Figure 16: Distribution of Hours Borrowed with Successions in 2016

In 2016, after accounting for the possibility of user successions, 50% of the transactions are observed to have borrowed for at least 2.23 hours while 25% of the transactions observed borrowings of at least 8.53 hours long. These findings can be observed from Figure 16.

Figure 17: Distribution of Hours Borrowed with Successions in 2017

In 2017, after accounting for the possibility of user successions, 50% of the transactions are observed to have borrowed for at least 2.46 hours while 25% of the transactions observed borrowings of at least 3.86 hours long. These findings can be observed from Figure 17.

Following these discoveries, both the data from 2016 and 2017 are then tested for normality using the Goodness-of-Fit Test. The null hypothesis states that the data follows the Normal distribution.

Figure 18: Goodness-of-Fit Test for Hours Borrowed with Successions in 2016
Figure 19: Goodness-of-Fit Test for Hours Borrowed with Successions in 2017

As can be seen in Figures 18 and 19, at α=0.05, both p-values are <0.0001, so the null hypothesis is rejected in each case. It can be concluded that the hours borrowed with successions in both 2016 and 2017 do not follow the Normal distribution.

Given that the data from both 2016 and 2017 lack normality, a nonparametric test, the Wilcoxon Rank Sums Test, was conducted to assess the significance of the difference in hours borrowed with successions between 2016 and 2017. The null hypothesis, in this case, is that the distributions of the hours borrowed with successions in 2016 and in 2017 are equal.

Figure 20: Wilcoxon Rank Sums Test Results for Hours Borrowed with Successions

Both the normal and the chi-square approximations for the Wilcoxon test statistic indicate significance, with p-values of <.0001 as shown in Figure 20. As such, at α=0.05, there is sufficient evidence to reject the null hypothesis in favour of the alternative, which is that the distributions of the hours borrowed with successions in 2016 and 2017 are different. It can be inferred that the distribution of the hours borrowed with successions became less negatively skewed in 2017 than in 2016.

The other main objective of this paper was to find out whether there was sufficient evidence of a need to extend the loan policy further. Thus, a one-sample test of location using the nonparametric Wilcoxon Signed-Rank Test was conducted to aid this analysis. Given that the current loan period is set at 3 hours, the probability of undergraduate students underutilising, fully utilising or overutilising the loan period was of much interest. Looking at just the 2017 data, the null hypothesis of the Wilcoxon Signed-Rank Test states that the median is equal to the postulated value of 3.

Figure 21: Wilcoxon Signed-Rank Test Results for Hours Borrowed with Successions

At α=0.05, the test results show a p-value of less than 0.001, as can be seen in Figure 21, so the null hypothesis is rejected. It can be concluded at the 5% significance level that the population median differs from 3.0 hours. In fact, when a lower-tailed test is conducted, the p-value remains <0.001, so the null hypothesis is again rejected, allowing the conclusion that the median of the hours borrowed with successions is less than 3 hours.
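The two-sided and lower-tailed versions of the signed-rank test can be sketched with a normal approximation. The sample of hours is hypothetical; observations equal to the postulated median are dropped, tied absolute differences receive midranks, and no continuity correction is applied, so this is an approximation rather than a reproduction of the reported test.

```python
import math

def signed_rank_test(values, hypothesized_median):
    """Wilcoxon signed-rank test via the normal approximation.
    Returns (W+, z, two-sided p-value, lower-tail p-value)."""
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    n = len(diffs)
    ordered = sorted(diffs, key=abs)
    w_plus = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the ranks i+1 .. j
        group_size = j - i
        tie_term += group_size ** 3 - group_size
        w_plus += midrank * sum(1 for k in range(i, j) if ordered[k] > 0)
        i = j
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z): alternative "median < value"
    return w_plus, z, p_two_sided, p_lower

# Hypothetical hours borrowed with successions in 2017
hours_2017 = [0.5, 1.0, 1.5, 2.0, 2.25, 2.5, 2.5, 2.75, 3.5, 4.0, 1.75, 2.0]

w_plus, z, p_two_sided, p_lower = signed_rank_test(hours_2017, 3.0)
print(f"W+={w_plus}, z={z:.3f}, two-sided p={p_two_sided:.4f}, lower-tail p={p_lower:.4f}")
```

A small lower-tail p-value supports the alternative that the median lies below the postulated value, which is the direction tested in the text.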


DISCUSSION

The library patrons can be profiled into 2 categories: users who overdue and users who exhibit succession borrowing behaviour. SMU Libraries implemented a 1-hour extension in loan policy in hopes of the loan policy becoming more aligned with the undergraduate students’ expectations. The impact of the implementation of this update in the loan policy on the user profiles was analysed and the following conclusions were obtained:

1. Users who overdue

  • The probability of a transaction being overdue is higher in 2016 than in 2017. With a 1-hour extension in the loan policy, a higher proportion of the transactions were returned on time and hence, it could be inferred that the loan policy became more sufficient for this group of users. This is a potential indication of an increasing alignment with undergraduate students’ loan policy expectations.
  • The distribution of the overdue period in 2017 is more negatively skewed than in 2016 (skewness of -7.97 against -2.99), and the median overdue period is shorter. It can be inferred that undergraduate students were overdue for a shorter amount of time, closer to 0, in 2017 than in 2016. This is a potential indication of an increasing alignment with undergraduate students’ loan policy expectations.

2. Users who borrow in succession

  • It was established with statistical significance that there is a difference in the distribution in the hours borrowed with successions in 2016 and 2017. It was inferred that the distribution in the hours borrowed with successions became less negatively skewed in 2017 than in 2016, thereby indicating towards the potential alignment with undergraduate students’ loan policy expectations.
  • In 2017, it was established with statistical significance that the median of hours borrowed with successions is less than 3 hours. In other words, at least 50% of transactions in 2017 do not fully utilise the 3-hour loan period. This could be a potential indication of an alignment with undergraduate students’ loan policy expectations.
  • It was established with statistical significance that there is a difference between the frequency distribution of succession borrows in 2016 and 2017. Judging from the distribution, the probability that a transaction sees a succeeding borrow is potentially higher in 2017 than in 2016. This could be a potential indication towards a deviation from undergraduate students’ loan policy expectations.

Further analysis concerning the users who borrow in succession is required. When it was found that the probability that a transaction sees a succeeding borrow is potentially higher in 2017 than in 2016, this claim was investigated further. It was found that there were many more transactions involving the 3-day loans in 2016 than in 2017. As can be seen in Figure 22, there was a decrease in the number of transactions involving the loan of 3-day course reserve collection in 2017.

Figure 22: Frequency Distribution of Transactions Involving 3-Day Course Reserve Collection across 2016 and 2017

The distribution of the frequency of succession borrows involving the 3-day course reserve collection was further investigated.

Figure 23: Distribution of the Frequency of Succession Borrow in 2016
Figure 24: Distribution of the Frequency of Succession Borrow in 2017

For transactions involving the 3-day course reserve collection, undergraduate students typically do not require a successive borrowing. In 2016, 94.73% of the transactions did not observe a subsequent borrow (Figure 23). This trend was similar in 2017 as well where 93.51% of transactions did not observe a subsequent borrow (Figure 24). On this note, with at least 3 times more transactions involving 3-day course reserve collection in 2016 than in 2017, it is natural that the proportion of one time borrows becomes higher in 2016 than in 2017.

Upon this finding, new information came to light. In 2017, another major change took place that could have influenced these results. Since the implementation of the 1-hour extension in loan policy, SMU Libraries had shifted the 3-day course reserve collection to another location in the library, away from the 3-hour course reserve collection. In 2016, the 2-hour course reserve collection was shelved together with the 3-day course reserve collection, while in 2017 the collections were apart. This move went largely unnoticed by SMU undergraduate students; the client found that students were unaware that, in another part of the library, the same course reserve title with a longer loan policy was available for circulation. Further studies should investigate the impact this change in location had on the borrowing pattern of undergraduate students in 2017.

When the distribution of the frequency of succession borrows was derived from datasets that include only the course reserves with 2-hour and 3-hour loan policies, the distributions were not as different, as can be seen in Figures 25 and 26 respectively.

Figure 25: Distribution of Frequency of Succession Borrow in 2016
Figure 26: Distribution of Frequency of Succession Borrow in 2017

Once the 3-day loans are excluded from the analysis, the proportion of transactions in which a book title was borrowed only once dropped from 86.17% in 2016 (Figure 25) to 84.70% in 2017 (Figure 26). In addition, 99% of the transactions involve 4 or fewer borrowings in succession in 2016, compared to 3 or fewer borrowings in succession in 2017.
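The effect of the collection mix on the pooled proportion can be illustrated with the percentages reported in this paper. The sketch assumes that the pooled one-time-borrow rate is simply a weighted average of the 3-day rate and the 2-hour or 3-hour rate, with successions counted the same way in each subset. That assumption is not confirmed by the source, and the resulting shares are implied by arithmetic only, not reported findings.

```python
def implied_three_day_share(pooled_pct, three_day_pct, short_loan_pct):
    """Solve pooled = w * three_day + (1 - w) * short_loan for the weight w,
    i.e. the share of 3-day transactions implied by the three percentages."""
    return (pooled_pct - short_loan_pct) / (three_day_pct - short_loan_pct)

# One-time-borrow percentages taken from the text: pooled, 3-day only, short loans only
share_2016 = implied_three_day_share(89.10, 94.73, 86.17)
share_2017 = implied_three_day_share(85.13, 93.51, 84.70)

print(f"Implied 3-day share of transactions: 2016 = {share_2016:.1%}, 2017 = {share_2017:.1%}")
```

Under these assumptions the 3-day loans would account for roughly a third of pooled transactions in 2016 but only about 5% in 2017, which is enough on its own to lower the pooled one-time-borrow proportion in 2017 even though the rates within each collection changed little.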

A Contingency Analysis was conducted to determine whether the difference in the frequency of successive borrows between 2016 and 2017 is statistically significant. The null hypothesis states that the frequency distributions of successive borrows in 2016 and 2017 are equal. The Contingency report was derived, and the results are described below.

Figure 27: Mosaic Plot of Successive Borrows by Year
Figure 28: Contingency Table of Successive Borrows by Year
Figure 29: Tests of the Successive Borrows

At α=0.05, the p-values of both the Likelihood Ratio and Pearson tests are <0.05, so the null hypothesis is rejected. This shows that there is a difference between the frequency of succession borrowings in 2016 and 2017.

This analysis was extended with Fisher’s Exact Right-Tail Test, with the alternative hypothesis that the proportion of succession borrows is greater in 2017 than in 2016. At α=0.05, the p-value is <0.05, so the null hypothesis is rejected, allowing the conclusion that the probability of a succession borrow occurring was higher in 2017 than in 2016.

Since the change in location seems to be a major contributing factor, the accuracy of the claim that the update in loan policy did not benefit the users who perform succession borrowing could have been compromised. As such, all the above considerations should be taken into account for a proper conclusion towards whether this is indeed an indication towards an insufficiency of the 3-hour loan policy for the users who borrow in succession.


CONCLUSION

On the whole, the 1-hour extension in loan policy from 2-hour to 3-hour in 2017 brought an improvement for the users who overdue: the probability of an overdue transaction decreased and the overdue period became shorter. The impact of the extension on the users who borrow in succession is inconclusive. Although the distribution of hours borrowed with successions became less negatively skewed given the smaller spread, the 3-day course reserve collection was moved in 2017, and this might have compromised the insights found regarding the frequency of succession borrows and the hours borrowed with successions. This should be investigated further before a conclusion is drawn for this group.

Given the increased sufficiency observed for the users who overdue, and given that overdue transactions still exist, there is no evidence against SMU Libraries extending the loan policy by another hour. Based on the overdue period in 2017, 50% of overdue transactions needed up to 0.53 hours more. Based on the hours borrowed with successions, 25% of transactions needed at least 3.86 hours. It is proposed that SMU Libraries consider implementing a 4-hour loan policy and conduct another analysis after a 12-month cycle of implementation. The same steps should be taken and, should the results show that the circumstances improve, the further 1-hour extension in loan policy would be justified. In the 2017 data, the median of the hours borrowed with successions was shown with statistical significance to be less than 3 hours. This test should be conducted once again against a stipulated value of 3. Should there be significant evidence that the median remains less than 3, the further 1-hour extension in loan policy may not be justified and SMU Libraries can consider reverting to the 3-hour loan policy.