ANLY482 AY2017-18T2 Group19 Methodology

DATA COLLECTION

SMU Libraries provided us with datasets extracted from their system. Figure 1 shows the fields provided for each dataset.

[Figure 1: Fields provided for each dataset (G19 Datasets.png)]

As shown in the figure above, the transaction records cover 2 time periods: 12 months of data from 2016 and 12 months of data from 2017. In the 2016 dataset, the loan policies are 2 hours and 3 days long, while in the 2017 dataset they are 3 hours and 3 days long. The transaction data amounts to 48,832 records in total, while the master data has 528 records.

Informal primary research was also conducted. Through this, it was found that there are 2 distinct library user profiles. Should undergraduate students find the loan period insufficient, they act in one of the following 2 ways:

  1. They keep the book past its due time and return it only when they are done with it. The loan period is considered insufficient in this case because the user is unable to finish using the book within it.
  2. They borrow in succession. These users borrow the same book title from the course reserves collection immediately after returning it. The loan period is considered insufficient in this case because the user is unable to finish using the book within a single loan.

This observation will be taken into account when cleaning and preparing the data for analysis.


DATA CLEANING AND PREPARATION

Data cleaning and preparation involves:

  1. Removal of missing data values and outliers
  2. Standardization of duplicates
  3. Redefinition of scope to targeted groups
  4. Addition of calculated variables
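
As an illustration of step 4, the two calculated variables suggested by the user profiles (hours overdue, and a flag for borrowing in succession) could be derived roughly as below. This is a minimal sketch, not the project's actual preparation script: the record keys (`user`, `title`, `checkout`, `due`, `checkin`), the timestamp format, and the 30-minute `gap` used to decide what counts as "immediately after returning" are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed timestamp format; the real export format is not given in the text.
FMT = "%Y-%m-%d %H:%M"


def parse(ts):
    """Parse a timestamp string in the assumed format."""
    return datetime.strptime(ts, FMT)


def overdue_hours(due, checkin):
    """Hours a loan was returned past its due time (0.0 if returned on time)."""
    late = (checkin - due).total_seconds() / 3600
    return max(late, 0.0)


def flag_succession(loans, gap=timedelta(minutes=30)):
    """Pair each loan with True if the same user borrowed the same title
    again within `gap` of returning it. `gap` is a hypothetical threshold."""
    ordered = sorted(loans, key=lambda r: (r["user"], r["title"], r["checkout"]))
    flagged = []
    prev = None
    for rec in ordered:
        same_pair = (
            prev is not None
            and prev["user"] == rec["user"]
            and prev["title"] == rec["title"]
        )
        # A loan is "in succession" if it follows the previous return closely.
        in_succession = same_pair and rec["checkout"] - prev["checkin"] <= gap
        flagged.append((rec, in_succession))
        prev = rec
    return flagged
```

The succession flag depends heavily on the chosen gap, so it would be worth checking how the counts change under a few different thresholds before fixing one.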


ANALYSIS AND TESTING

The 2 user profiles (i.e. users who return books overdue and users who borrow in succession) will be analysed separately. Whether the loan policy is sufficient for each group will be assessed through measures of sufficiency. For users who return books overdue, sufficiency across 2016 and 2017 will be measured through the frequency of overdue transactions and the distribution of the overdue period. For users who borrow in succession, sufficiency across 2016 and 2017 will be measured through the frequency of successive borrowing and the distribution of hours borrowed in succession.

To confirm whether the difference in frequencies observed between 2016 and 2017 is statistically significant, contingency analyses will be performed, as the data are nominal. Fisher's exact test will be conducted where appropriate.
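
To make the contingency analysis concrete, the sketch below computes a two-sided Fisher's exact test p-value for a 2x2 table, for example overdue versus on-time transactions in 2016 versus 2017. It uses only the Python standard library; the function name `fisher_exact_2x2` and the table layout are our own choices for illustration and are not taken from the project.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows could be years (2016, 2017) and columns outcomes
    (overdue, not overdue); that layout is an assumption of this example.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small relative tolerance keeps exact ties included.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
```

Fisher's exact test is usually reserved for tables with small expected cell counts. With tens of thousands of transactions, a chi-square test on the same table would normally be the practical choice, and this pure-Python version would be slow.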

To confirm whether the difference in the distributions across the years is statistically significant, a means or medians test will be conducted. The choice depends on whether the continuous data follow a normal distribution, which a goodness-of-fit test will determine. If the data follow a normal distribution, the Tukey-Kramer test will be used. Otherwise, the Wilcoxon Signed Rank test will be used.
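
For the non-normal branch, a rank-based comparison of two years could look like the sketch below. Note the substitution: the text names the Wilcoxon Signed Rank test, which expects paired observations, whereas this example implements the Wilcoxon rank-sum (Mann-Whitney U) form, on the assumption that loans from 2016 and 2017 are independent samples of different sizes. It uses a normal approximation with tie correction and no continuity correction, and the function name `rank_sum_test` is hypothetical.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (U statistic for x, approximate p-value). Uses the normal
    approximation, so it is only reasonable for moderately large samples.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and share their average rank.
        midrank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        # All pooled values are identical, so there is no evidence of a shift.
        return u, 1.0
    z = (u - mean) / var ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p
```

Here `x` and `y` would be, for instance, the overdue periods in hours for 2016 and 2017 respectively. Overdue durations tend to contain many ties, which is why the variance includes a tie correction.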