Maximum Project Overview

Project Motivation

The library currently aims to optimise its resource availability and distribution channels to maximise the learning effectiveness of its students. This could mean increasing the resources available for highly searched topics, adjusting current trainings and workshops to address common mistakes that students make while using the assets, or finding unexpected trends in the user journey through digital and physical touchpoints. The library also wants to know whether usage patterns vary between students by attributes such as Programme, Year of Graduation and Education Level. For this purpose, it conducted an initial survey of the 2017 freshman batch to evaluate the difference in their confidence in various research skills before and after joining SMU, factoring in considerations such as modules taken and library workshops attended. The library would like us to determine whether this survey contains any actionable insights.

Project Objectives

We had an initial discussion with our project sponsor, who would like us to create a visual dashboard to ascertain the relationship between the library's initiatives and resources on the one hand, and student performance (in terms of confidence and optimal usage of resources) on the other.

The objectives of the project are as follows:

Business objective: To identify factors that relate to and predict student confidence in performing library research tasks and help improve library training initiatives.
Technical objective: To use data analytics tools and statistical methods to study the data and obtain insights that would facilitate the business objective.

To achieve our two primary objectives, we will need to:

Understand the data domains
Understand the current library training process
Identify students who report high or low confidence, and the factors contributing to it
Create a dashboard that gives the client an automated way to understand the effectiveness of its trainings and the confidence levels of the students

Data

The sponsor has provided us with five datasets: student data, pre-survey and post-survey data, request log data, and turnstile data.

The student dataset contains information about the current students of SMU across all batches. The record attributes are the following:

email (hashed to a 64-digit hexadecimal number for non-disclosure reasons)
education level
faculty
admission year
graduation year
degree program

The pre- and post-survey datasets contain students' responses, before and after the first semester of freshman year, on their confidence in various research skills. Some of the record attributes are as follows:

email (hashed to a 64-digit hexadecimal number for non-disclosure reasons)
school
modules taken
library workshops attended

The request log dataset contains records captured by the library’s URL rewriting proxy server throughout 2017. This dataset captures all user requests to external databases. The record attributes are the following:

user ID
session ID
search database
timestamp
search query

The turnstile dataset contains records captured by the library’s gantries throughout 2017. This dataset captures physical taps on the gantries of the library. The record attributes are the following:

date
time
device name
email (hashed to a 64-digit hexadecimal number for non-disclosure reasons)

Project Methodology

The work that we have completed so far can be divided into:

Data cleaning to remove redundancies, handle missing data points, and make sense of the data that has been provided. We also merged the pre- and post-survey data so that the two could be compared.
Exploratory Data Analysis (EDA) to make some initial discoveries about the relationships between the changes in the freshmen's confidence and the library trainings that they underwent during the semester.

Moving on, we plan to:

Conduct correlation analysis to quantify the relationships between the trainings and students' confidence levels.
Analyse the proxy log data to gather further insights into students' search habits, with the goal of combining these findings with our correlation findings to craft salient recommendations.

Project Scope

Phase 0: Learning about the Case Context

We gathered information about the trainings conducted by the library to learn about the case context. This included:

Mapping out the workshops and trainings that the library conducts for freshmen across the semester
Reviewing the content of these trainings and how they relate to the courses taken by freshmen from the different schools

Phase 1: Data Cleaning

As we were given several datasets by our sponsor, in the first phase we studied each dataset's variables and values to discern which ones would be useful given our project scope. Following that, we further studied the variables and values of the datasets that we chose to use. This involved the following steps:

Recording the description and range for each variable and its values
Identifying irrelevant or duplicate fields
Resolving missing and invalid values
Cross-checking related variables to verify accuracy
Transforming variables for ease of analysis
Recording the assumptions made
Converting data values appropriately, by removing null values or filling in appropriate values
Combining related datasets on key variables
Documenting all of the above
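
As an illustration of the "combining related datasets on key variables" step, the following is a minimal sketch of joining the pre- and post-survey records on the hashed email and computing the change in confidence. The field name `confidence`, the shortened placeholder hashes and all values are assumptions made for this example; the actual survey columns are not listed in this overview.

```python
# Hypothetical records: "confidence" is an assumed field name, and the
# email hashes are shortened placeholders (real ones have 64 hex digits).
pre_survey = [
    {"email": "a3f1", "confidence": 2},
    {"email": "b772", "confidence": 4},
    {"email": "c9d0", "confidence": None},  # missing response
]
post_survey = [
    {"email": "a3f1", "confidence": 4},
    {"email": "b772", "confidence": 3},
    {"email": "ffff", "confidence": 5},     # no matching pre-survey row
]


def merge_surveys(pre, post, key="email", field="confidence"):
    """Inner-join pre- and post-survey rows on `key` and compute the change."""
    post_by_key = {row[key]: row for row in post}
    merged = []
    for row in pre:
        match = post_by_key.get(row[key])
        # Skip students absent from the post-survey or with a missing score.
        if match is None or row[field] is None or match[field] is None:
            continue
        merged.append({
            key: row[key],
            "pre": row[field],
            "post": match[field],
            "change": match[field] - row[field],
        })
    return merged


merged = merge_surveys(pre_survey, post_survey)
for row in merged:
    print(row)
```

Dropping unmatched or incomplete rows is only one possible treatment of missing values; whichever rule is applied should be recorded alongside the other assumptions.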

Phase 2: Data Exploration

In the second phase, we conducted exploratory data analysis.

Exploratory data analysis steps include:

Studying the distributions of variables
Identifying and treating outliers/anomalies
Checking assumptions about the relationships between the variables
Developing hypotheses based on literature

This analysis should go through a number of iterations, as we will continually compare our findings to existing literature as well as what we know of student behaviour.
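
The first two exploration steps can be sketched as follows: a frequency count to inspect the distribution of a variable, and the common 1.5 × IQR rule to flag candidate outliers. The values are made up for illustration, and the IQR rule is one conventional choice rather than the method fixed for this project.

```python
from collections import Counter
from statistics import quantiles

# Made-up confidence changes (post minus pre); not taken from the survey.
confidence_change = [2, -1, 0, 1, 1, 0, 2, 1, 0, 9]

# Distribution: how many students show each amount of change.
distribution = Counter(confidence_change)


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(sorted(distribution.items()))
print(iqr_outliers(confidence_change))
```

A flagged value is a prompt for inspection, for example a data-entry error, rather than an automatic deletion.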

Phase 3: Dashboard Creation

In the final phase, with a good understanding of the data and the case, we will perform statistical analysis to show which factors are related to confidence. From this, we will be able to develop recommendations for SMU Library.

Steps include:

Conduct statistical analysis to show correlation between training and confidence
Interpret the analysis to develop strategies that SMU Library can adopt
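
As a sketch of the correlation step, the Pearson coefficient between the number of workshops attended and the change in confidence could be computed as below. Both variables and all values are hypothetical, and Pearson's r is used only as a familiar starting point; because survey confidence scores are typically ordinal, a rank-based measure may turn out to be more appropriate.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-student values, not taken from the survey.
workshops_attended = [0, 1, 2, 3, 4]
confidence_change = [0, 1, 1, 2, 3]

r = pearson(workshops_attended, confidence_change)
print(f"r = {r:.3f}")
```

A high coefficient would indicate an association between attendance and confidence gains, not that the workshops caused the gains, so the recommendations should be worded accordingly.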
