Difference between revisions of "ANLY482 AY2017-18T2 Group19 Project Findings"

From Analytics Practicum

Revision as of 15:18, 27 February 2018


ADDITIONAL COLUMNS

The same fields were added to both the 3-hour and 3-day datasets solely to facilitate our exploratory data analysis. All calculations were performed in JMP. The following fields were added:

No. Name of Field Description
01 Loan_Timestamp This variable concatenates the ‘loan_date’ and ‘loan_time’ variables into a single field, which is easier to work with and to read.
02 Return_Timestamp This variable concatenates the ‘return_date’ and ‘return_time’ variables into a single field, which is easier to work with and to read.
03 Term This variable segments ‘loan_timestamp’ into academic terms so that the analysis can be broken down by term later on. We took the start and end dates of each academic term from SMU’s official academic calendar and applied an IF() logical statement to classify each loan timestamp into its academic term.
04 Hours_borrowed This variable is the number of hours borrowed per transaction, calculated as the difference between ‘return_timestamp’ and ‘loan_timestamp’ in hours. It could help us understand usage patterns per loan.
05 Assigned_loan_period This variable is the number of hours a user is entitled to when borrowing a book. The library’s policy is as follows:
G19 Assigned Loan Period Policy.png

The number of hours allowed differs depending on the hour in which the transaction occurs. As such, we created a calculated field that follows the above-mentioned rules using an IF() logical statement.
06 Sufficiency_measure This variable was created to further our analysis of user usage patterns. Because assigned loan periods vary, we believed it was necessary to take the assigned period into account when analyzing whether the current loan period is sufficient for users. As such, ‘sufficiency_measure’ is calculated by deducting ‘assigned_loan_period’ from ‘hours_borrowed’. With this definition, a zero or negative value indicates that the book was returned within the assigned loan period, so the period was adequate for the user, while a positive value indicates that the book was held longer than the assigned period.
07 Exam_week This variable flags whether ‘loan_timestamp’ falls within an exam period as stipulated in SMU’s official academic calendar. We believe that usage patterns may differ during examinations. It is a binary variable that takes only the values ‘Y’ for ‘Yes’ or ‘N’ for ‘No’.
08 Break_week This variable flags whether ‘loan_timestamp’ falls within a break week (week 14), the week before the exam weeks as stipulated in SMU’s official academic calendar. We believe that usage patterns may differ during the week before examinations. It is a binary variable that takes only the values ‘Y’ for ‘Yes’ or ‘N’ for ‘No’.
09 Part_of_day This variable is the part of the day in which ‘loan_timestamp’ falls: ‘morning’, ‘afternoon’, or ‘night’.
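
The derived fields above were built in JMP; as a rough illustration only, the same logic can be sketched in Python. The term dates, exam dates, part-of-day cut-offs and the 3-hour assigned period used below are hypothetical placeholders, not values from SMU's academic calendar or the library's policy, and the function names are our own.

```python
from datetime import date, datetime, time

# Hypothetical placeholders; real boundaries come from SMU's academic calendar.
TERMS = [
    ("Term 1", date(2016, 8, 15), date(2016, 12, 4)),
    ("Term 2", date(2017, 1, 9), date(2017, 4, 30)),
]
EXAM_PERIODS = [(date(2017, 4, 17), date(2017, 4, 30))]  # hypothetical


def make_timestamp(d: date, t: time) -> datetime:
    """Combine separate date and time fields into a single timestamp."""
    return datetime.combine(d, t)


def classify_term(ts: datetime) -> str:
    """Return the name of the term whose date range contains ts."""
    for name, start, end in TERMS:
        if start <= ts.date() <= end:
            return name
    return "Outside term"


def in_period_flag(ts: datetime, periods) -> str:
    """Return 'Y' if ts falls inside any (start, end) period, else 'N'."""
    return "Y" if any(s <= ts.date() <= e for s, e in periods) else "N"


def hours_borrowed(loan_ts: datetime, return_ts: datetime) -> float:
    """Difference between return and loan timestamps, in hours."""
    return (return_ts - loan_ts).total_seconds() / 3600


def sufficiency_measure(hours: float, assigned_loan_period: float) -> float:
    """hours_borrowed minus assigned_loan_period, as described in the table."""
    return hours - assigned_loan_period


def part_of_day(ts: datetime) -> str:
    """Bucket the loan hour; the cut-off hours are assumptions."""
    if 6 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "night"


# One illustrative transaction (made-up values).
loan_ts = make_timestamp(date(2017, 2, 1), time(10, 30))
return_ts = make_timestamp(date(2017, 2, 1), time(14, 0))
hours = hours_borrowed(loan_ts, return_ts)
record = {
    "Term": classify_term(loan_ts),
    "Hours_borrowed": hours,
    "Sufficiency_measure": sufficiency_measure(hours, 3.0),
    "Exam_week": in_period_flag(loan_ts, EXAM_PERIODS),
    "Part_of_day": part_of_day(loan_ts),
}
print(record)
```

Keeping each field as its own small function mirrors the one-formula-per-column approach of a calculated field, so any single rule can be swapped out once the real dates and policy are plugged in.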

 

MISSING VALUES

We ran ‘Summary Statistics’ on all the columns in our 3-hour and 3-day datasets to identify the number of observations with missing values. Missing data can influence the findings and conclusions drawn from the data, so it was essential to determine how much data is missing in each dataset before deciding how to proceed.

Dataset / Before / After
3-Hour Transaction Dataset
Before: G19 Missing Values 3H Before.png (Total = 13281)
After: G19 Missing Values 3H After.png (Total = 12958)
3-Day Transaction Dataset
Before: (empty)
After: G19 Missing Values 3D Before.png (Total = 1401)
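
As a rough illustration of the missing-value check, the sketch below counts missing entries per column in Python rather than JMP. The sample rows and column names are made up, and dropping incomplete rows is only one possible way to get from a "Before" total to an "After" total; the page does not state which treatment was actually applied.

```python
def is_missing(value) -> bool:
    """Treat None and empty strings as missing."""
    return value is None or value == ""


def count_missing(rows, columns):
    """Return {column: number of rows where that column is missing}."""
    return {c: sum(is_missing(r.get(c)) for r in rows) for c in columns}


def drop_incomplete(rows, columns):
    """Keep only rows with no missing value in any of the given columns."""
    return [r for r in rows if not any(is_missing(r.get(c)) for c in columns)]


# Made-up sample rows for illustration only.
columns = ["loan_date", "loan_time", "return_date", "return_time"]
rows = [
    {"loan_date": "2017-02-01", "loan_time": "10:30",
     "return_date": "2017-02-01", "return_time": "14:00"},
    {"loan_date": "2017-02-02", "loan_time": "09:00",
     "return_date": None, "return_time": ""},
    {"loan_date": "2017-02-03", "loan_time": "",
     "return_date": "2017-02-03", "return_time": "12:15"},
]

missing_counts = count_missing(rows, columns)
complete_rows = drop_incomplete(rows, columns)
print(missing_counts)
print(len(rows), "rows before,", len(complete_rows), "rows after")
```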