Difference between revisions of "ANLY482 AY2017-18T2 Group19 Project Findings"

From Analytics Practicum
Jump to navigation Jump to search
Line 108: Line 108:
 
<td width="35%" border="1" align="left">[[Image:G19_Missing_Values_3D_Before.png|center|500x300px]]
 
<td width="35%" border="1" align="left">[[Image:G19_Missing_Values_3D_Before.png|center|500x300px]]
 
<center>Total = 1401</center></td></tr>
 
<center>Total = 1401</center></td></tr>
 +
</table>
 +
 +
Given our client’s stake in the project, we consulted them on the rows that had ‘return_timestamp’ missing. Seeing that it is not possible to find the ‘return_timestamp’ that was attached to these transactions, we then proceeded on to exclude those rows with our client’s permission. In addition, without the ‘return_timestamp’, it would not be possible to analyze the ‘hours_borrowed’ and ‘sufficiency_measure’ of these observations. This exclusion was performed on both transaction datasets.
 +
 +
The other fields with missing values, including ‘active cc’ and ‘isbn’, were not used in any of our analysis thus far, and hence remains untouched.
 +
 +
&nbsp;
 +
 +
<div style="background: #FFFFFF; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px;letter-spacing:-0.03em;font-size:16px;id:UT1"><font face='Century Gothic' color=#000000 ><u>REMOVING OUTLIERS</u></font></div>
 +
 +
[[Image:G19_Lost_Damaged_Items_Policy.png|center|600x300px]]
 +
 +
For both loan policies, we removed outliers where the hours borrowed is less than 0. A negative value under the ‘hours_borrowed’ variable would indicate that the books were returned even before the time they were borrowed. Seeing that this circumstance is not possible, we voided such observations. 
 +
 +
<u>3-Hour loan</u>
 +
 +
Due to the library loan policy, books under 3-hour loan are able to be returned the next day depending on the time of borrow. This means that there is a maximum loan time of 19.5 hours which is from Sat, 6pm to Sun, 1.30pm. This results in a maximum of 19.5 hours of sufficiency measure where book is borrowed and returned immediately. This sets the sufficiency measure upper bound for our analysis.
 +
 +
At the same time, the library does not accept overdue books that are returned more than 2 weeks. Therefore, we have restricted our maximum overdue timing to be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.
 +
 +
<u>3-Day loan</u>
 +
 +
Books with 3-day loans do not have the overnight policy like the 3-hour loan policy does. Hence the upper bound for the sufficiency measure is merely set to 72 hours.
 +
 +
The same policy applies for the books overdue for more than 2 weeks. Therefore, the maximum overdue timing will be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.
 +
 +
In conclusion, the following table summarizes the boundaries set for our datasets.
 +
 +
<table rules="all" width="80%" cellpadding="6" cellspacing="3" style="margin: 1em auto 1em auto; font-weight: normal; border-style: solid">
 +
<tr style="background-color:white; color: black;"><th>Dataset</th><th>Variable</th><th>Lower Bound</th><th>Upper Bound</th><th>Number of Outliers Removed</th></tr>
 +
 +
<tr><td width="20%" border="1" align="center" rowspan = "2">3-Hour Transaction Dataset</td>
 +
<td width="20%" border="1" align="center" >Hours_borrowed</td>
 +
<td width="20%" border="1" align="center">0</td>
 +
<td width="20%" border="1" align="center">355.5</td>
 +
<td width="20%" border="1" align="center">6</td></tr>
 +
 +
<tr><td width="20%" border="1" align="center">Sufficiency Measure</td>
 +
<td width="20%" border="1" align="center">-336</td>
 +
<td width="20%" border="1" align="center">19.5</td>
 +
<td width="20%" border="1" align="center">6</td></tr>
 +
 +
<tr><td width="20%" border="1" align="center" rowspan = "2">3-Day Transaction Dataset</td>
 +
<td width="20%" border="1" align="center" >Hours_borrowed</td>
 +
<td width="20%" border="1" align="center">0</td>
 +
<td width="20%" border="1" align="center">408</td>
 +
<td width="20%" border="1" align="center">0</td></tr>
 +
 +
<tr><td width="20%" border="1" align="center">Sufficiency Measure</td>
 +
<td width="20%" border="1" align="center">-336</td>
 +
<td width="20%" border="1" align="center">72</td>
 +
<td width="20%" border="1" align="center">0</td></tr>
 
</table>
 
</table>

Revision as of 16:27, 27 February 2018

G19 Logo.png


G19 Home.png   HOME

 

G19 Overview Icon.png   PROJECT OVERVIEW

 

G19 Findings Icon.png   PROJECT FINDINGS

 

G19 Management Icon.png   PROJECT MANAGEMENT

 

G19 Documentation Icon.png   DOCUMENTATION

 

G19 To Main Page icon.png   BACK TO MAIN PAGE


 


 

ADDITIONAL COLUMNS

The same fields were added into the 3-hour and 3-day datasets with the sole intention of facilitating our data exploratory. All calculations were performed in JMP. The following fields were included:

No.Name of FieldDescription
01 Loan_Timestamp This variable is a result of concatenating the ‘loan_date’ and ‘loan_time’ variables into a single field. This enhances the ease-of-use and aesthetic appeal.
02 Return_Timestamp This variable is a result of concatenating the ‘return_date’ and ‘return_time’ variables into a single field. This enhances the ease-of-use and aesthetic appeal.
03 Term This allows us to segment the ‘loan_timestamp’ into academic terms to allow us to breakdown the analysis later on. We got the various academic terms start and end dates from the SMU’s official academic calendar document and applied an IF() logical statement to classify the various loan timestamps into the different academic terms.
04 Hours_borrowed This variable refers to the number of hours borrowed per transaction. It is derived by calculating the date difference between the ‘return_timestamp’ and ‘loan_timestamp’ in terms of hours. This could potentially help us in understanding the usage patterns per borrow.
05 Assigned_loan_period This variable refers to the amount of hours a user is entitled to when borrowing a book. The library’s policy is as follows:
G19 Assigned Loan Period Policy.png

Depending on the hour in which the transaction occurs, the hours of usage allowed to the user differs. As such, we created a calculated field in which follows the above-mentioned rules by utilizing an IF() logical statement.
06 Sufficiency_measure This variable was created with intentions to further our analysis on the extent of user usage patterns. Given the varying assigned loan periods, we believed that it was necessary to take this variable into account when analyzing if a loan period is currently sufficient for users. As such, ‘sufficiency_measure’ is calculated by deducting ‘assigned_loan_period’ from ‘hours_borrowed’. A positive value would indicate that the current loan period assigned is adequate for the users while a negative sufficiency measure would indicate otherwise.
07 Exam_week This variable classifies ‘loan_timestamp’ into whether they occur during exam periods as stipulated in SMU’s official academic calendar. We believe that usage patterns may be influenced during examinations. This variable is a binary variable, whereby the observations only displays ‘Y’ for ‘Yes’ or ‘N’ for ‘No’.
08 Break_week This variable classifies ‘loan_timestamp’ into whether they occur during break weeks (week 14), the week before the exam weeks as stipulated in SMU’s official academic calendar. We believe that usage patterns may be influenced during the week before examinations. This variable is a binary variable, whereby the observations only displays ‘Y’ for ‘Yes’ or ‘N’ for ‘No’.
09 Part_of_day This variable refers to the part of day the ‘loan_timestamp’ occurs, whether the book loan occurs in the ‘morning’, ‘afternoon’, or ‘night’.

 

MISSING VALUES

We performed a ‘Summary Statistics’ for all the columns in our 3-hour and 3-day datasets in order to identify the number of observations with missing values. Missing data have the potential to influence our findings and conclusions drawn from the data, and as such, it was essential that we performed this analysis to sieve out how much missing data we have in the datasets and subsequently, decide on how we would like to proceed on from there.

DatasetBeforeAfter
3-Hour Transaction Dataset
G19 Missing Values 3H Before.png
Total = 13281
G19 Missing Values 3H After.png
Total = 12958
3-Day Transaction Dataset
G19 Missing Values 3D Before.png
Total = 1401

Given our client’s stake in the project, we consulted them on the rows that had ‘return_timestamp’ missing. Seeing that it is not possible to find the ‘return_timestamp’ that was attached to these transactions, we then proceeded on to exclude those rows with our client’s permission. In addition, without the ‘return_timestamp’, it would not be possible to analyze the ‘hours_borrowed’ and ‘sufficiency_measure’ of these observations. This exclusion was performed on both transaction datasets.

The other fields with missing values, including ‘active cc’ and ‘isbn’, were not used in any of our analysis thus far, and hence remains untouched.

 

REMOVING OUTLIERS
G19 Lost Damaged Items Policy.png

For both loan policies, we removed outliers where the hours borrowed is less than 0. A negative value under the ‘hours_borrowed’ variable would indicate that the books were returned even before the time they were borrowed. Seeing that this circumstance is not possible, we voided such observations.

3-Hour loan

Due to the library loan policy, books under 3-hour loan are able to be returned the next day depending on the time of borrow. This means that there is a maximum loan time of 19.5 hours which is from Sat, 6pm to Sun, 1.30pm. This results in a maximum of 19.5 hours of sufficiency measure where book is borrowed and returned immediately. This sets the sufficiency measure upper bound for our analysis.

At the same time, the library does not accept overdue books that are returned more than 2 weeks. Therefore, we have restricted our maximum overdue timing to be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.

3-Day loan

Books with 3-day loans do not have the overnight policy like the 3-hour loan policy does. Hence the upper bound for the sufficiency measure is merely set to 72 hours.

The same policy applies for the books overdue for more than 2 weeks. Therefore, the maximum overdue timing will be 336 hours. This sets the sufficiency measure lower bound for our analysis at -336.

In conclusion, the following table summarizes the boundaries set for our datasets.

DatasetVariableLower BoundUpper BoundNumber of Outliers Removed
3-Hour Transaction Dataset Hours_borrowed 0 355.5 6
Sufficiency Measure -336 19.5 6
3-Day Transaction Dataset Hours_borrowed 0 408 0
Sufficiency Measure -336 72 0