ANLY482 AY2016-17 T2 Group16: PROJECT OVERVIEW

From Analytics Practicum
Revision as of 17:42, 8 January 2017 by Rui.song.2013 (talk | contribs)


BUSINESS PROBLEM & MOTIVATION

The library subscribes to eBook platforms with content from a range of publishers. These databases have greatly enriched the library’s resources and form an integral part of the library repository. When a student requests content from the databases, the request goes through the library’s proxy server. The proxy server captures a digital trace for each user request, which contains the request URL, user ID, and user agent. With the aim of making its services easier to use, management hopes to better understand the usage patterns of the databases. The challenge is to programmatically extract the meaningful user inputs from billions of request records, as most records are irrelevant to the project objectives. The main focus of the project is to understand how useful the library eBook databases are in fulfilling student queries. By analysing the proxy trace, we can characterise the users and examine the usage rate of the databases. The success rate of student queries is also among the target findings. Since the dataset is not static, we aim to provide a processing pipeline that helps the sponsor look for new findings as new students enrol in the future.
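As a rough illustration of the extraction step, the sketch below streams proxy records and keeps only those that carry user input. It rests on several assumptions that are not taken from the actual dataset: each record is a tab-separated line of request URL, user ID, and user agent; a "meaningful" record is one whose URL has a search parameter; and the parameter names in `SEARCH_PARAMS` are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical: query parameters assumed to carry what the user typed.
SEARCH_PARAMS = ("q", "query")


@dataclass
class ProxyRecord:
    url: str
    user_id: str
    user_agent: str


def parse_line(line: str) -> Optional[ProxyRecord]:
    """Parse one line, assuming the layout: url <TAB> user ID <TAB> user agent."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None  # malformed line, skip it
    return ProxyRecord(*parts)


def extract_query(record: ProxyRecord) -> Optional[str]:
    """Return the user's search text if the URL carries one, else None."""
    params = parse_qs(urlparse(record.url).query)
    for name in SEARCH_PARAMS:
        values = params.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None


def meaningful_queries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Stream (user_id, query) pairs, dropping records without user input."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        query = extract_query(record)
        if query is not None:
            yield record.user_id, query


sample = [
    "https://ebooks.example.com/search?q=data+mining\tu001\tMozilla/5.0",
    "https://ebooks.example.com/static/logo.png\tu001\tMozilla/5.0",
    "not a valid record",
]
results = list(meaningful_queries(sample))
print(results)
```

Because the function is a generator, it never holds the whole log in memory, which matters when the input runs to billions of lines. The real filter would depend on the URL conventions of each eBook platform.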

PROJECT OBJECTIVE

  1. Understand the characteristics of the databases and user requests.

Help management understand and act on each database by profiling the databases. The profiling dimensions include user profiles (faculty, program, year), number of requests, and popular requested items. Profiling will be done on multiple timescales (e.g. by hour of the day, by day of the week, or by week) to identify chronological patterns. The results will help decision makers understand who requested which items from each database. By slicing the data on multiple timescales, the team will be able to identify the peak periods and the general trend of requests for each database.

Help management better understand user behaviour by profiling users. Identify distinctive e-book usage patterns for students from different faculties or with different academic performance. The usage patterns cover chronological patterns, devices used, and sites requested. This would help management better understand the users and devise better approaches to improve service for each user type.

The timing of student access to the databases is another focus, intended to dig deeper into the patterns of e-material users. In particular, we will compare when dean's-list students access the material against when other students do. One possible question is whether either group tends towards last-minute work.

The consistency of student access to the resources extends the previous objective, as it is useful to know how students need the materials across the entire semester: whether access is spread out or concentrated in a certain period. This can give insights to the resources manager.

Within the sessions of all users, we can carry out ‘Market Basket Analysis’ to uncover popular combinations of e-books being viewed and downloaded. The purpose is to potentially create a foundation for e-resource recommendations in the future.
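Two of the analyses described above can be sketched with the Python standard library. The first function counts requests by hour, weekday, and ISO week. The second computes the support of each pair of e-books that appear together in a session, which is only the first step of a market basket analysis and not a full frequent-itemset algorithm. Both are illustrative only: the sketch assumes each request carries a timestamp (a field not listed in the description of the proxy trace) and that requests have already been grouped into sessions; the function names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations
from typing import Dict, Iterable, Tuple


def requests_by_timescale(timestamps: Iterable[datetime]) -> Dict[str, Counter]:
    """Count requests per hour of day, weekday (0 = Monday), and ISO week."""
    by_hour: Counter = Counter()
    by_weekday: Counter = Counter()
    by_week: Counter = Counter()
    for ts in timestamps:
        by_hour[ts.hour] += 1
        by_weekday[ts.weekday()] += 1
        by_week[ts.isocalendar()[1]] += 1
    return {"hour": by_hour, "weekday": by_weekday, "week": by_week}


def pair_support(sessions: Iterable[Iterable[str]]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of sessions in which each pair of items co-occurs."""
    baskets = [set(session) for session in sessions]  # ignore repeat views
    if not baskets:
        return {}
    counts: Counter = Counter()
    for basket in baskets:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Hypothetical sample data.
stamps = [
    datetime(2017, 1, 9, 14, 30),
    datetime(2017, 1, 9, 14, 45),
    datetime(2017, 1, 10, 9, 0),
]
profile = requests_by_timescale(stamps)

sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A"]]
support = pair_support(sessions)
```

Running the first function separately for each student group (for example dean's-list students and other students) would give the per-group timing profiles to compare. Pairs with high support are candidates for the recommendation foundation mentioned above.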

DATASETS



METHODOLOGY

Data exploration

Data preparation

Data analysis

Reporting and data processing pipeline