ANLY482 AY2017-18T2 Group19 Methodology

DATA COLLECTION

SMU Libraries provided us with the datasets extracted from their system. Figure 1 details the fields provided for each dataset.

Figure 1: Fields provided for each dataset (G19 Datasets.png)

As shown in the figure above, the transaction records were drawn from two time periods: 12 months of data from 2016 and 12 months of data from 2017. In the 2016 dataset the loan policies are 2-hour and 3-day loans, while in the 2017 dataset they are 3-hour and 3-day loans. The transaction data amounts to 48,832 records in total, and the master data has 528 records.
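
The following is a minimal sketch of how the two transaction extracts could be combined with the master data. It assumes pandas as the tool and uses illustrative file and column names (e.g. item_barcode, loan_datetime); the actual field names are those shown in Figure 1.

  import pandas as pd

  date_cols = ["loan_datetime", "due_datetime", "return_datetime"]  # assumed names

  # Load the two 12-month transaction extracts and tag each with its policy year.
  txn_2016 = pd.read_csv("transactions_2016.csv", parse_dates=date_cols)
  txn_2016["policy_year"] = 2016   # 2-hour and 3-day loan policies
  txn_2017 = pd.read_csv("transactions_2017.csv", parse_dates=date_cols)
  txn_2017["policy_year"] = 2017   # 3-hour and 3-day loan policies

  # Stack the two periods into one transaction table (48,832 records in total).
  transactions = pd.concat([txn_2016, txn_2017], ignore_index=True)

  # Join the item master data (528 records) to enrich each loan with item details.
  master = pd.read_csv("course_reserves_master.csv")
  transactions = transactions.merge(master, on="item_barcode", how="left")

  print(transactions.shape)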

Informal primary research was also conducted. Through this, it was found that there are two distinct library user profiles. When undergraduate students find the loan policy insufficient, they act in one of the following two ways:

  1. They keep the book past its due date and return it only when they are done with it. In this case the loan policy duration is considered insufficient because the user is unable to finish using the book within the loan period.
  2. They borrow in succession. These users borrow the same book title from the course reserves collection again immediately after returning it. In this case the loan policy duration is considered insufficient because the user is unable to finish using the book within a single loan.

This observation will be taken into account when cleaning and preparing the data for analysis.
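
The sketch below illustrates one way these two behaviours could be flagged in the combined transaction table during data preparation. The column names (user_id, title, loan_datetime, due_datetime, return_datetime) and the re-borrow window are assumptions for illustration, not the library system's actual fields or the thresholds used in the project.

  import pandas as pd

  def flag_behaviours(transactions: pd.DataFrame,
                      reborrow_window: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
      # Assumes the datetime columns are already parsed (see the earlier sketch).
      df = transactions.sort_values(["user_id", "title", "loan_datetime"]).copy()

      # Profile 1: the book is returned after its due time (overdue loan).
      df["is_overdue"] = df["return_datetime"] > df["due_datetime"]

      # Profile 2: the same user borrows the same title again shortly after
      # returning it (borrowing in succession).
      prev_return = df.groupby(["user_id", "title"])["return_datetime"].shift(1)
      gap = df["loan_datetime"] - prev_return
      df["is_successive"] = gap.notna() & (gap <= reborrow_window)

      return df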


DATA CLEANING AND TRANSFORMATION

Data cleaning and preparation involves the following steps, with a code sketch after the list:

  1. Removal of missing data values and outliers
  2. Standardization of duplicates
  3. Redefinition of scope to targeted groups
  4. Addition of calculated variables
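
The sketch below walks through the four steps on the combined transaction table, again assuming pandas and the same illustrative column names; the outlier threshold and patron-type filter are placeholders, not the values used in the actual analysis.

  import pandas as pd

  def clean_transactions(df: pd.DataFrame, master: pd.DataFrame) -> pd.DataFrame:
      # 1. Remove records with missing key values and extreme outliers.
      df = df.dropna(subset=["user_id", "item_barcode",
                             "loan_datetime", "return_datetime"])
      duration = df["return_datetime"] - df["loan_datetime"]
      df = df[(duration > pd.Timedelta(0)) &
              (duration <= duration.quantile(0.99))].copy()

      # 2. Standardize duplicates, e.g. inconsistent casing of the same title.
      df["title"] = df["title"].str.strip().str.upper()
      df = df.drop_duplicates()

      # 3. Redefine the scope to the targeted group: undergraduate loans of
      #    items in the course reserves master list.
      df = df[df["patron_type"] == "UNDERGRADUATE"]
      df = df[df["item_barcode"].isin(master["item_barcode"])].copy()

      # 4. Add calculated variables used in the analysis.
      df["loan_duration_hours"] = (
          (df["return_datetime"] - df["loan_datetime"]).dt.total_seconds() / 3600
      )
      df["is_overdue"] = df["return_datetime"] > df["due_datetime"]
      return df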