Difference between revisions of "ANLY482 AY2017-18T2 Group19 Methodology"

From Analytics Practicum
 
<div style="background: #FFFFFF; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px;letter-spacing:-0.03em;font-size:16px;id:UT1"><font face='Century Gothic' color=#000000 ><u>DATA COLLECTION</u></font></div>
 
  
When users access online course reserves through the proxy browser, their individual actions are recorded as they search for the materials they require. Usage of printed course reserves is recorded as users borrow and return the books. In-house usage of the books is also recorded when users return the books to the library counter instead of to the bookshelves.
  
This information will be provided by the client.
  
  
<div style="background: #FFFFFF; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px;letter-spacing:-0.03em;font-size:16px;id:UT1"><font face='Century Gothic' color=#000000 ><u>DATA CLEANING AND TRANSFORMATION</u></font></div>
  
The data will need to be transformed into the required formats using techniques such as rule- and pattern-based transformations, so that the necessary processing can be performed later. Duplicate and irrelevant data will be removed.
  
<div style="background: #FFFFFF; padding: 15px; font-weight: bold; line-height: 0.3em; text-indent: 15px;letter-spacing:-0.03em;font-size:16px;id:UT1"><font face='Century Gothic' color=#000000 ><u>EXPLORATION OF DATA</u></font></div>
 
 
JMP, Tableau and JavaScript will be used for data exploration and visualization. We set out to design a dashboard that answers the following questions:
 
# What proportion of the school is using the course reserve materials?
# Are all the course reserve materials fully utilized?
# When are the course reserve materials being utilized?
# Are we acquiring course materials that students are not using?
 
 
 
To start off, we would like a graph that can visualize usage over time. Given the large number of course reserve materials, we settled on horizon graphs, which use position and color to reduce vertical space while retaining the functionality of a simple line graph. A horizon graph displays a metric's behavior over time relative to a baseline. Ideally, this graph will let us identify when the course reserve materials are most in use, and which materials are the most and least used.
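As a rough illustration of how a horizon graph saves vertical space, the sketch below shows the usual folding step: each value's deviation from the baseline is clipped and sliced into a fixed number of equal-height bands, which are then drawn on top of each other in deepening shades, with negative deviations mirrored upward in a second color. This is a minimal sketch, not the team's dashboard code; the function name `horizon_bands`, its parameters and the sample loan counts are hypothetical.

```python
from typing import List, Tuple


def horizon_bands(values: List[float], baseline: float,
                  band_height: float, n_bands: int = 3) -> List[Tuple[int, List[float]]]:
    """Fold each point's deviation from the baseline into stacked bands.

    Returns, per time step, (sign, fills): sign is +1 above the baseline,
    -1 below it and 0 on it; fills[i] is how much of band i (0 = lightest)
    is filled, between 0 and band_height. Deviations larger than
    n_bands * band_height are clipped (hypothetical helper, for illustration).
    """
    result = []
    for v in values:
        d = v - baseline
        sign = (d > 0) - (d < 0)
        magnitude = min(abs(d), n_bands * band_height)
        # Band i holds the part of the magnitude between i*h and (i+1)*h.
        fills = [float(max(0.0, min(band_height, magnitude - i * band_height)))
                 for i in range(n_bands)]
        result.append((sign, fills))
    return result
```

For example, hypothetical weekly loan counts of 10, 25 and 4 against a baseline of 10 with bands of height 5 give an empty row, a fully shaded positive row, and a negative row with one full and one partly filled band. Because every band shares the same height, each course reserve item needs only one band's height of vertical space.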
 
 
 
In addition, we require a graph that lets users easily see how a single measure of interest compares against a target value, so we chose to visualize it with bullet graphs. Bullet graphs convey the same information as a bar graph while taking up less space. The following picture demonstrates how to read a bullet graph:
 
 
 
[[Image:G19_Bullet_Chart.png|700px|center]] &nbsp;
 
 
 
 
 
Ideally, this will allow us to identify and analyze the utilization rate of each course reserve material.
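The reading rules shown in the bullet graph picture can be expressed as a small data structure: a featured measure, a comparative target and a set of qualitative ranges. The sketch below assumes the measure is a utilization rate in percent and that the ranges are labelled poor, satisfactory and good; these labels, the thresholds and the names `Bullet` and `read_bullet` are illustrative assumptions, not values from the project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bullet:
    """Data behind one bullet graph (hypothetical structure).

    measure: featured measure, e.g. a utilization rate in percent (assumed).
    target: comparative measure, drawn as the marker line.
    ranges: qualitative bands as (upper_bound, label), in ascending order.
    """
    measure: float
    target: float
    ranges: List[Tuple[float, str]]


def read_bullet(b: Bullet) -> Tuple[str, bool]:
    """Return the qualitative band the measure falls in and whether it meets the target."""
    for upper, label in b.ranges:
        if b.measure <= upper:
            return label, b.measure >= b.target
    # Measure lies beyond the last band: report the top band.
    return b.ranges[-1][1], b.measure >= b.target
```

With hypothetical ranges of 0 to 30 (poor), 30 to 70 (satisfactory) and 70 to 100 (good) and a target of 60, an item at 45% reads as satisfactory but below target.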
 
 
 
On top of these, we need information on the student users to supplement the analysis. Students are mainly classified by the school they belong to, so for each course reserve material we would like to visualize the background of the students using it. This answers the first question of who is using these resources. For this, we chose a simple bar graph, which makes it easy to see how the schools rank in their use of each resource.
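A minimal sketch of the aggregation behind such a bar graph, assuming each loan transaction can be reduced to an (item, school) pair; the field layout, the function name `rank_schools` and the sample school names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_schools(loans: Iterable[Tuple[str, str]], item_id: str) -> List[Tuple[str, int]]:
    """Count loans of one course reserve item per borrower school, most frequent first.

    loans: (item_id, school) pairs, one per loan transaction (hypothetical layout).
    """
    counts = Counter(school for item, school in loans if item == item_id)
    return counts.most_common()
```

`Counter.most_common()` returns schools in descending order of loan count, which maps directly onto the bar order; schools with equal counts keep the order in which they first appear.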
 
 
 
By combining the above-mentioned graphs with interactive filters and other functionality, we aim to answer the questions above with the following dashboard:
 
  
  
[[Image:G19_Proposed_Schedule.png|1000px|center]] &nbsp;

Revision as of 19:52, 15 April 2018



DATA COLLECTION

SMU Libraries provided us with datasets extracted from their system. Figure 1 shows the fields provided in each dataset.

Figure 1: Fields provided in each dataset (G19_Datasets.png)

As shown in the figure above, the transaction records come from 2 different time periods: 12 months of data from 2016 and 12 months of data from 2017. In the 2016 dataset, the loan periods are 2 hours and 3 days, while in the 2017 dataset they are 3 hours and 3 days. The transaction data amounts to 48,832 records in total, while the master data has 528 records.
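Since the loan periods differ between the 2 datasets, a due time has to be derived using the policy in force when the item was borrowed. The sketch below assumes that a transaction's calendar year identifies its dataset and that each loan is tagged as either a short (hour-based) or a long (3-day) loan; the `LOAN_PERIODS` keys and the `due_time` function are hypothetical names.

```python
from datetime import datetime, timedelta

# Loan periods stated for each dataset year. The "short"/"long" keys are
# hypothetical labels for the hour-based and 3-day loans.
LOAN_PERIODS = {
    2016: {"short": timedelta(hours=2), "long": timedelta(days=3)},
    2017: {"short": timedelta(hours=3), "long": timedelta(days=3)},
}


def due_time(borrowed_at: datetime, loan_type: str) -> datetime:
    """Due time under the policy of the borrow year (assumes calendar year = dataset year)."""
    return borrowed_at + LOAN_PERIODS[borrowed_at.year][loan_type]
```

For example, a short loan taken at 10:00 falls due at 12:00 under the 2016 policy and at 13:00 under the 2017 policy.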

We also conducted informal primary research, which identified 2 distinct library user profiles. When undergraduate students find the loan period insufficient, they act in one of the following 2 ways:

  1. They keep the books past the due date and return them only when they have finished with them. In this case, the loan period is insufficient because the users cannot finish using the books within the loan period.
  2. They borrow in succession: these users may borrow the same title from the course reserves collection again immediately after returning it. In this case, the loan period is insufficient because the users cannot finish using the books within a single loan.

These observations will be taken into account when cleaning and preparing the data for analysis.
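The 2 user profiles translate into simple rules on the transaction records. The sketch below flags overdue loans (returned after the due time) and successive borrowing (the same user borrowing the same title again shortly after returning it). It assumes each record carries user, title, borrow, due and return timestamps; the `Loan` fields are hypothetical, and the 15-minute gap used to define "immediately after returning it" is an assumed threshold, not one stated by the library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Loan:
    """One loan transaction (hypothetical field names)."""
    user_id: str
    title_id: str
    borrowed_at: datetime
    due_at: datetime
    returned_at: datetime


def is_overdue(loan: Loan) -> bool:
    """Profile 1: the book was returned after its due time."""
    return loan.returned_at > loan.due_at


def successive_borrows(loans: List[Loan],
                       gap: timedelta = timedelta(minutes=15)) -> List[Loan]:
    """Profile 2: loans starting within `gap` after the same user returned the same title.

    The 15-minute default is a hypothetical threshold.
    """
    ordered = sorted(loans, key=lambda loan: (loan.user_id, loan.title_id, loan.borrowed_at))
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        same_user_title = (prev.user_id, prev.title_id) == (cur.user_id, cur.title_id)
        if same_user_title and timedelta(0) <= cur.borrowed_at - prev.returned_at <= gap:
            flagged.append(cur)
    return flagged
```

Sorting by user, title and borrow time puts each user's consecutive loans of a title next to each other, so a single pass over adjacent pairs is enough to find re-borrows.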

DATA CLEANING AND TRANSFORMATION

The data will need to be transformed into the required formats by adding calculated fields. Duplicate and irrelevant records will be removed.
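A minimal sketch of this cleaning step using only the Python standard library: exact duplicate records are dropped and a calculated loan-duration field is added. The field names (`borrowed_at`, `returned_at`, `loan_hours`) and the choice to treat only identical rows as duplicates are assumptions; which records count as irrelevant is not specified, so that filter is left out.

```python
from datetime import datetime
from typing import Any, Dict, List


def loan_hours(borrowed_at: datetime, returned_at: datetime) -> float:
    """Calculated field: length of a loan in hours."""
    return (returned_at - borrowed_at).total_seconds() / 3600


def clean_transactions(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop exact duplicate rows and add a 'loan_hours' calculated field.

    Rows are assumed (hypothetically) to hold hashable values, including
    'borrowed_at' and 'returned_at' datetimes. Input rows are not modified.
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        out = dict(row)
        out["loan_hours"] = loan_hours(row["borrowed_at"], row["returned_at"])
        cleaned.append(out)
    return cleaned
```

Copying each row before adding the calculated field keeps the raw extract untouched, so the cleaning rules can be re-run or adjusted later.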