ANLY482 AY2017-18T2 Group19 Project Findings
The same fields were added into the 2-hour, 3-hour and 3-day datasets with the sole intention of facilitating our data exploratory. All calculations were performed in JMP. The following fields were included:
No. | Name of Field | Description |
---|---|---|
01 | Loan_Timestamp | This variable is a result of concatenating the ‘loan_date’ and ‘loan_time’ variables into a single field. This enhances the ease-of-use and aesthetic appeal. |
02 | Return_Timestamp | This variable is a result of concatenating the ‘return_date’ and ‘return_time’ variables into a single field. This enhances the ease-of-use and aesthetic appeal. |
03 | Term | This allows us to segment the ‘loan_timestamp’ into academic terms to allow us to breakdown the analysis later on. We got the various academic terms start and end dates from the SMU’s official academic calendar document and applied an IF() logical statement to classify the various loan timestamps into the different academic terms. |
04 | Hours_borrowed | This variable refers to the number of hours borrowed per transaction. It is derived by calculating the date difference between the ‘return_timestamp’ and ‘loan_timestamp’ in terms of hours. This could potentially help us in understanding the usage patterns per borrow. |
05 | Assigned_loan_period | This variable refers to the amount of hours a user is entitled to when borrowing a book. The library’s policy is as follows: Depending on the hour in which the transaction occurs, the hours of usage allowed to the user differs. As such, we created a calculated field in which follows the above-mentioned rules by utilizing an IF() logical statement. |
06 | Overdue_period | This variable was created with intentions to further our analysis on the extent of user usage patterns. Given the varying assigned loan periods, we believed that it was necessary to take this variable into account when analyzing if a loan period is currently sufficient for users. As such, ‘overdue_period’ is calculated by deducting ‘assigned_loan_period’ from ‘hours_borrowed’. A positive value would indicate that the current loan period assigned is adequate for the users while a negative sufficiency measure would indicate otherwise. |
07 | Overdue? | This is a binary variable that classifies if a transaction is overdue or not. A value of 1 would indicate that the transaction was overdue and a value of 0 would indicate otherwise. This variable will be used primarily for investigation pertaining the users who overdue. |
08 | Time_Elapsed | This variable calculates the time elapsed (in hours) between the borrowing of the same book from the previous time by the same user. It is derived through subtracting the ‘loan_timestamp’ of the current transaction with the previous ‘return_timestamp’ if the transaction is observed to be involving the same ‘email’ and ‘title’. This variable will be used primarily for investigation pertaining the users who borrow in succession. |
09 | Transaction_Group | This variable was created with the purpose to serve as an identifier for transactions belonging to the same group. To be considered the same group, the fields ‘email’ and ‘title’ must remain the same while observing a ‘time_elapsed’ value of not more than 4. This means that transactions with the same user borrowing the same title within 4 hours of his preceding ‘return_timestamp’ would be considered as a single transaction. This variable will be used primarily for investigation pertaining the users who borrow in succession. |
10 | Hours_Borrowed_With_Successions | This variable sums up the ‘hours_borrowed’ that belongs to the same ‘transaction group’. Duplication will be removed during the analysis. This is a field that is updated from ‘hours_borrowed’ to account for the succession borrowing behaviour that library users exhibit. This variable will be used primarily for investigation pertaining the users who borrow in succession. |
We performed a ‘Summary Statistics’ for all the columns in our 3-hour and 3-day datasets in order to identify the number of observations with missing values. Missing data have the potential to influence our findings and conclusions drawn from the data, and as such, it was essential that we performed this analysis to sieve out how much missing data we have in the datasets and subsequently, decide on how we would like to proceed on from there.
Dataset | Before | After |
---|---|---|
3-Hour Transaction Dataset |
|
|
3-Day Transaction Dataset |
|
Given our client’s stake in the project, we consulted them on the rows that had ‘return_timestamp’ missing. Seeing that it is not possible to find the ‘return_timestamp’ that was attached to these transactions, we then proceeded on to exclude those rows with our client’s permission. In addition, without the ‘return_timestamp’, it would not be possible to analyze the ‘hours_borrowed’ and ‘sufficiency_measure’ of these observations. This exclusion was performed on both transaction datasets.
The other fields with missing values, including ‘active cc’ and ‘isbn’, were not used in any of our analysis thus far, and hence remains untouched.
For both loan policies, we removed outliers where the hours borrowed is less than 0. A negative value under the ‘hours_borrowed’ variable would indicate that the books were returned even before the time they were borrowed. Seeing that this circumstance is not possible, we voided such observations.
3-Hour loan
Due to the library loan policy, books under 3-hour loan are able to be returned the next day depending on the time of borrow. This means that there is a maximum loan time of 19.5 hours which is from Sat, 6pm to Sun, 1.30pm. This results in a maximum of 19.5 hours of sufficiency measure where book is borrowed and returned immediately. This sets the sufficiency measure upper bound for our analysis.
At the same time, the library does not accept overdue books that are returned more than 2 weeks. Therefore, we have restricted our maximum overdue timing to be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.
3-Day loan
Books with 3-day loans do not have the overnight policy like the 3-hour loan policy does. Hence the upper bound for the sufficiency measure is merely set to 72 hours.
The same policy applies for the books overdue for more than 2 weeks. Therefore, the maximum overdue timing will be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.
In conclusion, the following table summarizes the boundaries set for our datasets.
Dataset | Variable | Lower Bound | Upper Bound | Number of Outliers Removed |
---|---|---|---|---|
2/3-Hour Transaction Dataset | Hours Borrowed | 0 | 355.5 | 6 |
Overdue Period | -336 | 19.5 | 6 | |
3-Day Transaction Dataset | Hours Borrowed | 0 | 408 | 0 |
Overdue Period | -336 | 72 | 0 |
As SMU Libraries’ main group of users is the undergraduates, this paper would focus on the Year 1 to Year 4 undergraduates. As such, the following patron groups and years of study are filtered out:
- Patron Group: Adjunct, Admin Staff, Alumni, Faculty, Master, PhD, Others, Research Staff
- Year of Study: Year 0, Year 5 and above
A total of 2164 rows are being removed from the 2-hour, 3-hour and 3-days datasets.