ANLY482 AY2016-17 T2 Group09: Project Findings







About Data

Our project analyzes library entry records taken from the card reader logs for 01/01/2016 to 30/06/2016. The card readers are located at the main entrance of LKSLIB and at the linkbridge side entrance. Each log entry contains a timestamp and basic information about the student. The raw data comes as monthly CSV files, with each card tap recorded as one data entry, so we have six monthly files for the main entrance and another six for the linkbridge. The original columns include:

  • Date: the date on which the student enters the library (d/m/y)
  • Time: the time at which the student enters (h:m:s)
  • Device Name: the gantry the student uses to enter
  • Email: the hashed email of the student
  • User group: the user group the student belongs to (undergrad, master, PhD)
  • Statistical Category 1: the school of the student
  • Statistical Category 2: the major of the student
  • Statistical Category 3: the admission year of the student
  • Statistical Category 4: the graduation year of the student
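As an illustration of this layout, the sketch below reads CSV text with the column names listed above and combines the Date and Time fields into a single timestamp. The sample row, the function name `read_taps`, and the four-digit year format are assumptions made for the example; the actual files may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the column layout listed above; all values are invented.
RAW = (
    "Date,Time,Device Name,Email,User group,Statistical Category 1,"
    "Statistical Category 2,Statistical Category 3,Statistical Category 4\n"
    "15/02/2016,09:30:05,Main Gantry 1,3f2a9c,undergrad,School A,Major B,2014,2018\n"
)

def read_taps(text):
    """Parse CSV text and attach a combined datetime to each tap."""
    taps = []
    for row in csv.DictReader(io.StringIO(text)):
        stamp = f"{row['Date']} {row['Time']}"
        # Assumes d/m/Y dates and H:M:S times, as described in the column list.
        row["Timestamp"] = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        taps.append(row)
    return taps

taps = read_taps(RAW)
print(taps[0]["Timestamp"], taps[0]["User group"])
```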

Data Preparation

1. Concatenate the 6 months of data into one data file

  • The date format needs to be standardized before the files are combined
  • This gives us a total of 469,949 data entries

2. Rename columns
3. Remove missing values
4. Recode column values

  • Missing values in the major column were grouped into 'unknown'

5. Detect outliers

  • Removed 39 records that fall outside library operating hours
  • Removed 2 records whose admission year is later than 2016

6. Derive new columns

  • Hour
  • Month
  • Day of Week
  • IsAlumni
  • Year of Study
  • Academic Week
  • Term
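The cleaning steps above (standardizing dates before concatenation, recoding missing majors, and removing outliers) can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the date format variants, the `OPEN` and `CLOSE` operating hours, the field names, and the sample records are all assumptions.

```python
from datetime import datetime, time

# Assumed date variants in the monthly files; the real formats are not stated.
DATE_FORMATS = ("%d/%m/%Y", "%d/%m/%y", "%Y-%m-%d")
# Placeholder operating hours; the actual library hours are not given here.
OPEN, CLOSE = time(8, 0), time(22, 0)

def standardize_date(text):
    """Try each assumed format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

def combine(monthly_files):
    """Concatenate per-month record lists, standardizing dates on the way."""
    combined = []
    for rows in monthly_files:
        for row in rows:
            combined.append({**row, "date": standardize_date(row["date"])})
    return combined

def recode_major(value):
    """Group blank or missing majors under 'unknown'."""
    value = (value or "").strip()
    return value if value else "unknown"

def keep(record):
    """Keep taps within operating hours whose admission year is 2016 or earlier."""
    in_hours = OPEN <= record["time"] <= CLOSE
    return in_hours and record["admission_year"] <= 2016

# Invented records for demonstration only.
january = [{"date": "05/01/2016", "time": time(9, 15), "major": "", "admission_year": 2014}]
february = [
    {"date": "2016-02-03", "time": time(3, 0), "major": "Finance", "admission_year": 2015},
    {"date": "4/2/16", "time": time(10, 0), "major": "Finance", "admission_year": 2017},
]

records = combine([january, february])
for record in records:
    record["major"] = recode_major(record["major"])
cleaned = [record for record in records if keep(record)]
print(len(records), len(cleaned))
```

In this toy input, one February record falls outside the assumed hours and the other has an admission year after 2016, so only the January record survives.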

After cleaning, there are 468,891 entries used for exploratory analysis.
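The derived columns in step 6 might be computed along the following lines. The definitions here are assumptions made for illustration: the page does not say how IsAlumni, Year of Study, or Academic Week were defined, so this sketch treats a student as an alumnus when the graduation year precedes the tap year, assumes an academic year starting in August, and counts academic weeks from a hypothetical `term_start` date. Term is omitted because it depends on an academic calendar not given here.

```python
from datetime import date, datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")

def derive(tap, admission_year, graduation_year, term_start):
    """Return the derived columns for one tap (assumed definitions)."""
    # Assumption: the academic year begins in August.
    ay_start = tap.year if tap.month >= 8 else tap.year - 1
    return {
        "Hour": tap.hour,
        "Month": tap.month,
        "Day of Week": DAYS[tap.weekday()],
        # Assumption: alumni are students who graduated before the tap year.
        "IsAlumni": graduation_year < tap.year,
        "Year of Study": ay_start - admission_year + 1,
        # Week 1 is the week beginning on term_start.
        "Academic Week": (tap.date() - term_start).days // 7 + 1,
    }

# Hypothetical tap and term start date, for demonstration only.
row = derive(datetime(2016, 2, 15, 9, 30), 2014, 2018,
             term_start=date(2016, 1, 4))
print(row)
```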

Data Exploration

Findings from the data exploration can be found in our interim report.