ANLY482 AY2017-18T2 Group19 Data Exploration Findings

From Analytics Practicum

3-Hour Data Exploration

We first wanted to understand how library users borrow course reserve materials. As a start, we looked at the number of hours borrowed per loan by plotting the distribution of the ‘hours_borrowed’ variable as a histogram and a boxplot (Figure 1). Because the loan policies stipulated by SMU Libraries assign different loan periods depending on the hour of day at which an item is borrowed, we performed the same analysis on the ‘sufficiency_measure’ variable (Figure 2).

Figure 1 shows the distribution of the 'hours_borrowed' variable for the 3-hour dataset
Figure 2 shows the distribution of the 'sufficiency_measure' variable for the 3-hour dataset

After analyzing the distributions in JMP, we find that 75% of library users borrow the 3-hour course reserve materials for 2.86 hours or less. The ‘Summary Statistics’ section further shows that we can be 95% confident that the mean hours borrowed lies between 2.76 and 2.99 hours. For the ‘sufficiency_measure’ variable, 75% of users were satisfied with the 3-hour loan period and still had at least 0.18 hours to spare before the borrowed book became overdue. In addition, we are 95% confident that library users have, on average, between 1.12 and 1.37 hours to spare when borrowing a 3-hour loan course reserve. This evidence suggests that the 3-hour loan is sufficient for at least 75% of library users.

On a separate note, 10% of library users borrow for between 3.4 and 329 hours. With the lowest 10% of transactions showing a sufficiency level of -0.06 or below, the 3-hour loan period is insufficient for 10% of library users, who return the books at least 0.06 hours past the assigned loan hours.
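The percentile and confidence-interval figures above were read off JMP. As a rough illustration of what they represent, the sketch below computes the same kinds of summaries with the Python standard library. The function name `summarize` and the sample values are made up for illustration, and the interval uses a normal approximation, which may differ slightly from the interval JMP reports.

```python
from statistics import NormalDist, mean, quantiles, stdev

def summarize(values):
    """Return quartiles, the 10th/90th percentiles and an approximate
    95% confidence interval for the mean (normal approximation)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    deciles = quantiles(values, n=10, method="inclusive")
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
    half_width = z * stdev(values) / len(values) ** 0.5
    m = mean(values)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "p10": deciles[0],
        "p90": deciles[-1],
        "mean_ci": (m - half_width, m + half_width),
    }

# Made-up 'hours_borrowed' values, NOT the project's data.
hours_borrowed = [0.5, 1.2, 1.8, 2.1, 2.4, 2.6, 2.8, 2.9, 3.0, 3.4, 12.0]
summary = summarize(hours_borrowed)
print(summary)
```

Read this way, "75% of users borrow for 2.86 hours or less" corresponds to `q3`, and the 95% interval describes the mean rather than any individual loan.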


Days of Week

We extended our analysis by comparing the ‘hours_borrowed’ variable against other variables, for example to check whether ‘hours_borrowed’ differs between days of the week. Figure 3 shows that weekends have a higher sufficiency measure than weekdays; hence, users who borrow during the weekends tend not to fully utilize the assigned loan period.

Figure 3 shows the average sufficiency measure by day of the week

Following up with Tukey’s HSD test, we found statistically significant evidence that loans made on Friday and Saturday last longer than loans made on Monday, Tuesday and Wednesday. Likewise, for the ‘sufficiency_measure’ variable, books borrowed on weekends have a statistically significantly higher sufficiency level than those borrowed on weekdays. These findings are possibly attributable to the longer assigned loan period that users are allowed given the later opening hours on weekends.

In addition, the Chinese New Year holidays began on Saturday, 28 January 2017. Borrowings on Friday, 27 January were therefore affected, as borrowed books only had to be returned the following week when the library resumed operations, and the ‘hours_borrowed’ variable takes larger values during this period. We also found that course reserves borrowed on Tuesdays and Wednesdays are statistically significantly more sufficient than those borrowed on Fridays. The holiday resulted in users returning their books 1 day late, as the library reopened on the 3rd day of the Lunar New Year. This relatively large negative sufficiency level could have pulled down the Friday mean, which would explain the observation.


Patron Group

We speculated that the background of library users influences borrowing patterns, so we looked into the ‘patron_group’ variable to see whether borrowing patterns differ between patron groups. As seen in Figure 4, the ‘Adjunct’, ‘Faculty’ and ‘Admin Staff’ patron groups have clearly negative sufficiency measures. This means that they generally borrow for longer hours than the other patron groups.

Figure 4 shows the sufficiency measure for 3-hour loans by school and patron group

After performing the relevant tests, we found that the abovementioned patron groups borrow for statistically significantly longer hours than the rest, which is consistent with Figure 4. This could be because they are not subject to the same penalties as students when borrowing the materials. The finding is further supported by the sufficiency measure: books borrowed by the remaining groups are statistically significantly more sufficient than those borrowed by the 3 patron groups with a negative mean sufficiency measure. In addition, the differences in hours borrowed between PhD, Master and Undergraduate students were not statistically significant.


Academic Term

Given the marked differences in library traffic between academic terms, we looked into the ‘term’ variable to see whether borrowing patterns differ between academic terms.

The graph below shows that the ‘Break’ and ‘AY16-17 T3B’ terms have negative average sufficiency measures.

Figure 5 shows the average sufficiency measure by academic terms

Following up with Tukey’s HSD test, we found that library users borrow for statistically significantly longer hours during the ‘Break’, ‘AY16-17 T3A’ and ‘AY16-17 T3B’ academic terms than during ‘AY16-17 T2’ and ‘AY17-18 T1’. Books borrowed during ‘AY16-17 T2’, ‘AY17-18 T1’ and ‘AY16-17 T3A’ are also statistically significantly more sufficient than those borrowed during ‘AY16-17 T3B’ and ‘Break’. Since library users borrow for longer hours during ‘Break’ and ‘AY16-17 T3B’, it follows that the 3-hour loan policy is relatively less sufficient during these periods. A possible reason is that during the ‘Break’, ‘T3A’ or ‘T3B’ periods, library users may not come to school as regularly, which results in longer borrowing times and lower sufficiency measure values.


Hour of Day

We looked into the ‘hour’ variable to see whether borrowing patterns differ between hours of the day, and whether library users tend to borrow during certain hours. Borrowings from 9pm to 12am last statistically significantly longer than those made during the rest of the day. As with the ‘day_of_week’ variable, this could be a result of the loan policy allowing overnight loans after 9pm on weekdays and 6pm on weekends. One might therefore expect the ‘sufficiency_measure’ to be lower for these hours because of the longer borrowing hours. However, the ‘sufficiency_measure’ results show that books borrowed from 6pm to 11pm are statistically significantly more sufficient than those borrowed at most other times (Figure 6).

Figure 6 shows the heatmap of sufficiency measure by day of week and hour of day

For example, library users who borrow after 9pm on a Monday are entitled to 10 borrowing hours for the course reserve, so longer borrowing hours naturally ensue. But even if they return the book at 12am, they have a ‘sufficiency_measure’ of 7 hours, which is more than the usual 3 hours given to patrons. This may mean that patrons who borrow during these hours do not take the book home to make use of the overnight policy and instead simply return it before leaving.
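The arithmetic behind the Monday 9pm example can be sketched as follows. The definition used here, sufficiency measure = assigned loan hours minus hours borrowed, is inferred from the figures quoted in this report rather than taken from a stated formula, and the function names and dates are hypothetical.

```python
from datetime import datetime

def hours_between(checkout, checkin):
    """Elapsed loan time in hours."""
    return (checkin - checkout).total_seconds() / 3600

def sufficiency_measure(assigned_hours, hours_borrowed):
    """Assumed definition: hours left unused (negative means overdue)."""
    return assigned_hours - hours_borrowed

# Illustrative overnight loan: borrowed Monday 9pm, returned at midnight.
checkout = datetime(2017, 2, 6, 21, 0)  # a Monday
checkin = datetime(2017, 2, 7, 0, 0)
borrowed = hours_between(checkout, checkin)

# 10 assigned hours is the overnight entitlement quoted in the text.
overnight = sufficiency_measure(10, borrowed)

# A standard 3-hour loan returned slightly late gives a negative value.
late = round(sufficiency_measure(3, 3.06), 2)

print(borrowed, overnight, late)
```

Under this definition a patron can borrow for the same 3 hours and still record a higher sufficiency measure, simply because the overnight policy assigns a longer loan period.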


Exam Week

Given that we are examining course reserves that professors classify as compulsory readings, we assumed that students would need them most in the period approaching the exam weeks. We therefore expected usage of these course reserves to intensify during the exam weeks. Looking into the ‘exam_week’ variable, we found that non-exam weeks have a higher average sufficiency measure.

Figure 7 shows the average sufficiency measure by ‘exam_week’

With the HSD test, we found that borrowings during exam week do last statistically significantly longer. This is consistent with the test on the ‘sufficiency_measure’, where books borrowed during exam week are statistically significantly less sufficient than those borrowed during non-exam weeks.


Break Week

We thought that students would step up their studying, and their usage of course reserve materials for revision, during the week before the final exams. However, there is insufficient evidence to suggest that the hours borrowed during break weeks differ from the hours borrowed outside break weeks. Patrons are not observed to hog the course reserve materials for longer than 3h: the mean borrowing hours during break weeks and non-break weeks are 2.77h and 2.89h respectively.

Figure 8 shows the average sufficiency measure by ‘break_week’

However, borrowings during break week are more sufficient than those made during non-break weeks (Figure 8). This is probably because adjuncts borrowed books during non-break weeks and, as shown earlier, they tend to borrow for longer hours and are more likely to return items overdue.


Number of borrowings against hour of day

We plotted the ‘number of borrows’ as a heatmap (Figure 9) with hour of day on the y-axis and day of week on the x-axis. The colors appear more concentrated from 1200 to 1700 on weekdays, which suggests that borrowing peaks during these hours.

Figure 9 shows the heatmap of number of borrows by day of week and hour of day

This is consistent with the HSD test, which shows that the number of borrows between 1200 and 1700 is significantly different from the rest of the day. There are indeed more loan transactions during this period.
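The counts behind such a heatmap are a tally of loans per day-of-week and hour-of-day cell. A minimal sketch of that aggregation is shown below; the timestamps are hypothetical, the function name `borrow_counts` is ours, and treating "1200 to 1700" as the hours 12 to 16 inclusive is an assumption.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def borrow_counts(checkout_times):
    """Count loans per (day of week, hour of day) cell."""
    return Counter((DAY_NAMES[t.weekday()], t.hour) for t in checkout_times)

# Hypothetical checkout timestamps, NOT the project's data.
checkouts = [
    datetime(2017, 2, 6, 12, 15),   # Monday
    datetime(2017, 2, 6, 12, 40),   # Monday
    datetime(2017, 2, 7, 15, 5),    # Tuesday
    datetime(2017, 2, 11, 10, 30),  # Saturday
]
counts = borrow_counts(checkouts)

# Total weekday loans checked out from 1200 up to (not including) 1700.
afternoon_weekday = sum(
    n for (day, hour), n in counts.items()
    if day not in ("Saturday", "Sunday") and 12 <= hour < 17
)
print(counts, afternoon_weekday)
```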


3-Day Data Exploration

The following figures describe the distributions of the ‘hours_borrowed’ and ‘sufficiency_measure’ variables for the 3-day transaction dataset. To standardize the units, we express the 3-day loan period as a 72-hour loan period. We are 95% confident that the mean hours borrowed lies between 50.11 and 53.60 hours and that the mean sufficiency measure lies between 18.40 and 21.88 hours. In addition, 50% of library users borrow the course reserve materials for less than 54.96 hours, while 25% borrow for more than 72.48 hours. This suggests that the 3-day loan policy is insufficient for about 25% of library users. It is consistent with the ‘sufficiency_measure’ distribution, where the loan is insufficient for slightly more than 25% of library users, which amounts to about 350 borrows.

Figure 10 shows the distribution of the 'hours_borrowed' variable for the 3-day dataset
Figure 11 shows the distribution of the 'sufficiency_measure' variable for the 3-day dataset
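The two confidence intervals quoted for the 3-day dataset are closely related. Assuming every 3-day loan is assigned exactly 72 hours and that the sufficiency measure is the assigned hours minus the hours borrowed (both are our assumptions, inferred from the figures in this report), the interval for mean sufficiency is the interval for mean hours borrowed subtracted from 72. The sketch below checks this; `LOAN_HOURS` and `to_sufficiency` are illustrative names.

```python
LOAN_HOURS = 72  # 3-day loan expressed in hours

def to_sufficiency(hours_borrowed):
    """Assumed definition: unused loan hours, rounded to 2 decimals."""
    return round(LOAN_HOURS - hours_borrowed, 2)

# 95% confidence interval for mean hours borrowed, as reported in the text.
mean_ci_hours = (50.11, 53.60)

# Subtracting from 72 flips the bounds: the longest mean borrowing time
# corresponds to the smallest mean sufficiency.
mean_ci_sufficiency = (
    to_sufficiency(mean_ci_hours[1]),
    to_sufficiency(mean_ci_hours[0]),
)

# The reported third quartile of hours borrowed, expressed as sufficiency.
q3_sufficiency = to_sufficiency(72.48)

print(mean_ci_sufficiency, q3_sufficiency)
```

The lower bound reproduces the reported 18.40, and the upper bound comes out within 0.01 of the reported 21.88, which would be consistent with rounding of the underlying values. The third quartile of 72.48 hours maps to a sufficiency of -0.48, matching the statement that roughly a quarter of loans run past the 72-hour period.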


Day of Week

Friday has the lowest average sufficiency measure of all days of the week (Figure 12). Users who borrow their books on Fridays tend to utilize more of the assigned loan period.

Figure 12 shows the average sufficiency measure by 'day_of_week'

The HSD test shows that library users who check out course reserve materials on Friday borrow for statistically significantly longer hours than those who borrow on Monday, Wednesday, Thursday and Sunday. The same holds for the sufficiency measure: borrowings on these days are statistically significantly more sufficient than those on Friday. This suggests that books borrowed on Friday are usually kept over the weekend and returned on Monday, which uses almost the full 3-day loan period, whereas books borrowed on other days are usually returned after closer to 2 days. Monday, Wednesday, Thursday and Sunday have a sufficiency measure of about 22 or 23 hours, which means that books borrowed on these days are used for slightly over 2 days on average.


Patron Group

The ‘Alumni’ patron group borrows for statistically significantly more hours than the ‘Undergraduate Students’, ‘PhD’, ‘Library Staff’, ‘Adjunct’ and ‘Others’ patron groups, and it has a clearly negative sufficiency measure (Figure 13).

Figure 13 shows the sufficiency measure for 3-day loans by school and patron group

The ‘Alumni’ patron group had the highest mean borrowing hours, at 136.44. However, in the whole of 2017 the ‘Alumni’ patron group borrowed only twice, for 125.8 and 147.6 hours respectively, which explains the high mean. ‘Master’ students borrowed for significantly longer hours than ‘Undergraduate’ students. This could be due to the 2 out of 134 ‘Master’ students who borrowed for 320.4 and 265.44 hours each; these relatively high values could have skewed the mean. The same pattern appears in the sufficiency measure, where ‘Master’ students and ‘Alumni’ have the lowest values at 8.85 and -64.44 respectively.


Academic Term

As seen from the graph below, ‘AY16-17 T2’, ‘AY17-18 T1’ and ‘Break’ are significantly more sufficient than ‘AY16-17 T3A’ and ‘AY16-17 T3B’.

Figure 14 shows the sufficiency measure for 3-day loans by academic term

Testing these results shows that borrowings during the ‘AY16-17 T3B’ academic term last statistically significantly longer than those during the ‘AY16-17 T2’, ‘AY17-18 T1’ and ‘Break’ academic terms. Correspondingly, borrowings during ‘AY16-17 T2’, ‘AY17-18 T1’ and ‘Break’ are significantly more sufficient than those during ‘AY16-17 T3B’, which is consistent with the graph. This is possibly due to the 2 borrowing instances by Master students with abnormally long durations of 320.4 and 265.44 hours, each of which has a large negative sufficiency measure. Since only 19 transactions occurred in total during ‘AY16-17 T3B’, 2 such high observations would naturally skew the mean. In addition, borrowings during ‘AY17-18 T1’ are statistically significantly more sufficient than those during ‘AY16-17 T3A’ and ‘AY16-17 T3B’. This is probably because patrons visit the library more often during the regular term and hence have more opportunities to return their books without using the full 72 hours.
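To illustrate how strongly 2 extreme loans can move the mean of only 19 transactions, the sketch below combines the two reported durations with 17 made-up loans of 50 hours each. The 17 values are purely hypothetical placeholders and are not the project's data; only the two outliers and the count of 19 come from the text.

```python
from statistics import mean, median

LOAN_HOURS = 72  # 3-day loan expressed in hours

# 17 hypothetical "typical" loans (made up for illustration only).
typical = [50.0] * 17
# The 2 abnormally long loans reported in the text.
outliers = [320.4, 265.44]

term_hours = typical + outliers  # 19 transactions in total

mean_hours = mean(term_hours)
median_hours = median(term_hours)

# Assumed definition: sufficiency = assigned hours - hours borrowed.
mean_sufficiency = LOAN_HOURS - mean_hours
median_sufficiency = LOAN_HOURS - median_hours

print(round(mean_hours, 2), median_hours)
print(round(mean_sufficiency, 2), median_sufficiency)
```

In this toy data the median loan still leaves 22 hours to spare, yet the mean sufficiency turns negative, which is the kind of skew described above. Comparing medians alongside means would be one way to check how much of a term-level difference is driven by a handful of loans.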


Hour of Day

When working with the 3-hour data, it was found that borrowings from 9pm to 12am observe longer hours than the rest of the day. In contrast, there is insufficient evidence to suggest that library users follow a similar pattern when borrowing 3-day loan course reserves (Figure 15).

Figure 15 shows the heatmap of number of borrows by day of week and hour of day

There are no significant differences in borrowing hours or sufficiency measure between hours of the day. We propose that this is because library users get a 3-day borrowing period regardless of the hour at which they check out the book. Given that users need a book title for only 50 to 53 hours on average (95% confidence interval for the mean), they would not need a deliberate strategy for when to borrow 3-day course reserves.


Exam Week

In contrast to the 3-hour findings, there is insufficient evidence to suggest that the hours borrowed by library users during the exam weeks are different from those not during exam weeks. Figure 16 shows that the average sufficiency measures for exam week and non-exam weeks are very similar.

Figure 16 shows the sufficiency measure for 3-day loans by ‘exam_week’

This could indicate that a 72-hour loan period is sufficient for library users, who ultimately require only approximately 50 to 53 hours on average each time they borrow a book. The same holds for the sufficiency measure, where there is no statistically significant difference between exam weeks and non-exam weeks.


Break Week

Similar to the 3-hour findings, there is insufficient evidence to suggest that the hours borrowed by library users during the week before the finals week are different from those not during break week since the average sufficiency measures do not show any significant differences (Figure 17).

Figure 17 shows the sufficiency measure for 3-day loans by ‘break_week’

It was found that library users do not intensify or prolong their course reserve materials usage during the week nearing the final exams. This is the same for sufficiency measure where there is no statistical significance between break week and non-break week.