ANLY482 AY2016-17 T2 Group7: Project Overview

From Analytics Practicum
Revision as of 07:16, 7 January 2017 by Jx.wang.2013 (talk | contribs)


Business Problem & Motivation

The analytics team at Li Ka Shing Library (part of Learning & Information Services) aims to discover meaningful insights about user behaviour so that it can provide the necessary assistance in the form of e-resources training, helpdesk services and support. However, the team does not currently know how to make use of the logging data collected from the library’s main web page, http://library.smu.edu.sg/. As a result, the log files have been neglected, and the team wishes to collaborate with us to realise the full potential of this data.


Project Objective

This project analyses the log files to:

  1. Understand user behaviour using a data-driven approach
  2. Understand the relationships between the different search queries made by different users
  3. Examine the event sequence for each unique user (e.g. which articles User A searched for together or one after another)
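
As a rough illustration of the third objective, the sketch below groups events by user and orders them by time. It assumes the log has already been reduced to (user ID, timestamp, URL) tuples; the function name `build_event_sequences`, the user IDs and the example.org URLs are all made up for illustration and are not taken from the library's data.

```python
from collections import defaultdict
from datetime import datetime


def build_event_sequences(events):
    """Group (user_id, timestamp, url) tuples into time-ordered URL lists per user."""
    sequences = defaultdict(list)
    # Sort by user first, then by timestamp, so each user's list is chronological.
    for user_id, _timestamp, url in sorted(events, key=lambda e: (e[0], e[1])):
        sequences[user_id].append(url)
    return dict(sequences)


# Made-up events for illustration only; not taken from the library's logs.
events = [
    ("user_a", datetime(2016, 1, 1, 0, 5), "http://example.org/article/2"),
    ("user_b", datetime(2016, 1, 1, 0, 2), "http://example.org/article/9"),
    ("user_a", datetime(2016, 1, 1, 0, 1), "http://example.org/article/1"),
]

sequences = build_event_sequences(events)
print(sequences["user_a"])
```

In practice the user ID would be the hashed student ID, and a session ID could be added to the grouping key to separate visits by the same user.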


Datasets

Proxy log data and student information data (student IDs are hashed)

Data Dictionary

Sample proxy log entry:

59.189.71.33 tDU1zb0CaV2B8qZ 65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba [01/Jan/2016:00:01:39 +0800] "GET http://heinonline.org:80/HOL/VMTP?base=js&handle=hein.journals/fchlj23&div=7&collection=journals&input=(The%20Great%20Peace)&set_as_cursor=19&disp_num=20&viewurl=SearchVolumeSOLR%3Finput%3D%2528The%2520Great%2520Peace%2529%26div%3D7%26f_size%3D600%26num_results%3D10%26handle%3Dhein.journals%252Ffchlj23%26collection%3Djournals%26set_as_cursor%3D19%26men_tab%3Dsrchresults%26terms%3D%2528The%2520Great%2520Peace%2529 HTTP/1.1" 200 2121 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"

Proxy Log Data:

Parameter | Description | Example
IP address | The IP address of the client that made the request | 59.189.71.33
Session ID | A unique ID identifying one session by a single user | tDU1zb0CaV2B8qZ
Unique Student ID (Hashed) | The student ID, hashed by the SMU Library to protect the identity of users | 65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba
Timestamp | The time at which the log entry was recorded; an entry is recorded whenever the user performs an action. The time is in 24-hour format and in local Singapore time (SGT, UTC+08:00). | [01/Jan/2016:00:01:39 +0800]
HTTP method | The HTTP request method. The requested URL, which typically contains the user's search query, follows it. | GET
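
The fields described above could be extracted with a regular expression along the following lines. This is a minimal sketch, not the team's actual parser: the pattern and the function name `parse_log_line` are hypothetical, and the names given to the trailing fields (`protocol`, `status`, `size`, `user_agent`) are assumptions based on common web-server log conventions, since the data dictionary does not describe them. The sample URL is shortened for readability.

```python
import re
from datetime import datetime

# Field names after the method (url, protocol, status, size, user_agent)
# are assumptions based on common web-server log conventions.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) (?P<session_id>\S+) (?P<student_id>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<user_agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Example timestamp: 01/Jan/2016:00:01:39 +0800
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return record


# Sample entry from the data dictionary, with the URL shortened for readability.
SAMPLE_LINE = (
    '59.189.71.33 tDU1zb0CaV2B8qZ '
    '65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba '
    '[01/Jan/2016:00:01:39 +0800] '
    '"GET http://heinonline.org:80/HOL/VMTP?base=js&handle=hein.journals/fchlj23&div=7 HTTP/1.1" '
    '200 2121 '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/47.0.2526.106 Safari/537.36"'
)

record = parse_log_line(SAMPLE_LINE)
print(record["session_id"], record["method"], record["timestamp"].isoformat())
```

Lines that do not fit the pattern return `None` rather than raising, so malformed entries can be counted and inspected separately.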

Student Information Data:

“feb0e4d05b236c0bcc0c7331dc754921cf9189c4c1317b0b112696fcf68cd2f8, MASTER School of Accountancy, MSc in CFO Leadership, AY_2014, GY_2015”

Parameter | Description | Example
Unique Student ID (Hashed) | Provided so that each student can be matched to the corresponding hashed student ID in the proxy log data | feb0e4d05b236c0bcc0c7331dc754921cf9189c4c1317b0b112696fcf68cd2f8
Level of Education | The level of education the user is in, typically a Master's or Bachelor's programme | MASTER
School | The school that the user is from | School of Accountancy
Type of Programme | The specific programme the user is undertaking | MSc in CFO Leadership
Admission Year | The year in which the user was admitted into SMU | AY_2014
Graduating Year | The year in which the user graduates from SMU | GY_2015
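
Matching the two datasets on the hashed student ID might look like the sketch below. It parses the sample record exactly as printed above, which means assuming that the level of education is the first word of the second comma-separated part; the actual file layout may differ. The function names `parse_student_record` and `lookup_student` are hypothetical.

```python
def parse_student_record(line):
    """Split one student record into the six fields of the data dictionary."""
    # Strip surrounding straight or curly quotes, then split on commas.
    cleaned = line.strip().strip('"\u201c\u201d')
    parts = [part.strip() for part in cleaned.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated parts, got {len(parts)}")
    student_id, level_and_school, programme, admission, graduation = parts
    # Assumption: the level of education is the first word of the second part.
    level, _, school = level_and_school.partition(" ")
    return {
        "student_id": student_id,
        "level": level,
        "school": school,
        "programme": programme,
        "admission_year": int(admission.split("_", 1)[1]),   # "AY_2014" -> 2014
        "graduating_year": int(graduation.split("_", 1)[1]),  # "GY_2015" -> 2015
    }


def lookup_student(students_by_id, hashed_id):
    """Return the student record for a hashed ID from the proxy log, or None."""
    return students_by_id.get(hashed_id)


SAMPLE_RECORD = (
    "feb0e4d05b236c0bcc0c7331dc754921cf9189c4c1317b0b112696fcf68cd2f8, "
    "MASTER School of Accountancy, MSc in CFO Leadership, AY_2014, GY_2015"
)

student = parse_student_record(SAMPLE_RECORD)
students_by_id = {student["student_id"]: student}
print(lookup_student(students_by_id, student["student_id"])["programme"])
```

A dictionary keyed by the hashed ID keeps each lookup cheap when many log entries are matched; IDs that appear in the log but not in the student data simply return `None`.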