Difference between revisions of "Maximum Project Overview"

From Analytics Practicum
 
(10 intermediate revisions by 3 users not shown)
Line 39: Line 39:
 
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Motivation</font></div>
 
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Motivation</font></div>
 
<br/>
 
<br/>
 
+
The SMU Libraries Analytics and Research Department strives to develop a data-informed approach for achieving strategic objectives related to library operations and user needs. For this purpose, they conducted an initial survey of the freshman batch of 2017 to evaluate the difference in their confidence level in various research skills before and after their first semester at SMU. Based on the survey findings, the library aims to develop its trainings (content, methodology and availability) to address specific student problems associated with those skills. Considering the importance of using library resources efficiently, it is essential to understand the trends and patterns students demonstrate in their usage of library resources and how these relate to attributes such as modules undertaken and trainings attended. This will help us provide the library with specific problems and targeted solutions based on schools and modules, and eventually make the research process of an SMU student more effective and efficient.
The library currently aims to optimise its resource availability and distributions channels to maximise the learning effectiveness of its students. This could be in terms of increasing resources available for certain highly searched topics, altering current trainings and workshops to focus on any common mistakes committed by students while using the assets or finding any unexpected trends in user journey through digital and physical touch points. They further want to know if usage patterns vary between students based on certain attributes like Programme, Year of Graduation and Education Level. For this purpose, they have conducted an initial survey for the freshman batch of 2017 to evaluate the difference in their confidence level in various research skills before and after joining SMU, factoring in several considerations like modules taken, library workshops attended and so on and so forth. They wish for us to understand if this survey contains any actionable insights.
 
  
 
<br/><div align="left">
 
<br/><div align="left">
 
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Objectives</font></div>
 
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Objectives</font></div>
 
<br/>
 
<br/>
 +
The objectives of the project are the following:
 +
# Business objective: To discover the current confidence level of freshmen across different faculties and identify trends. Moreover, to explain, with clear visuals, how students have responded to different trainings for each skill at the end of the semester
 +
# Technical objective: To use data analytics tools and statistical methods to study the data and obtain insights to facilitate the business objective
  
We had an initial discussion with our project sponsor and they would like us to create a visual dashboard to ascertain the relationship between the initiatives and resources of the library, and student performance (in terms of confidence and optimal usage of resources).
+
To achieve our two primary objectives, we will need:
 
 
The objectives of the project would be of the following:
 
 
 
# Business objective: To identify factors that relate to and predict student confidence in performing library research tasks and help improve library training initiatives.
 
# Technical objective: To use data analytics tools and statistical methods to study the data and obtain insights that would facilitate the business objective.
 
 
 
To achieve our two primary objectives, we will need to:
 
 
* To understand the data domains
 
* To understand the data domains
* To understand the current library training process
+
* To understand the library training process
 
* To identify if there exist any students who experience high or low confidence and its contributing factors
 
* To identify if there exist any students who experience high or low confidence and its contributing factors
* To create a dashboard to provide the client with an automated solution for understanding the effectiveness of their trainings and confidence level of the students
+
* To create a visual representation of the effectiveness of the trainings conducted during the semester, and provide recommendations.
  
 +
<br/><div align="left">
 +
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Data</font></div>
 +
<br/>
  
 +
Our sponsor conducted two surveys with the freshman batch of 2017. Pre-survey was conducted before the start of the semester (Aug 2017) and post-survey at the end of the semester (Nov 2017). The pre and post survey datasets contain responses of students before and after the first semester on their confidence level in research skills.
  
 
<br/><div align="left">
 
<br/><div align="left">
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Data</font></div>
+
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Methodology</font></div>
 
<br/>
 
<br/>
 +
Our methodology can be summarised as:
 +
# We began our analysis by understanding the data domains that were provided along with secondary research
 +
# We then cleaned the data and transformed it according to research requirements
 +
# We carried out exploratory data analysis using Tableau 10.0. This is where we did the visualisation analysis using the divergent stacked bar graphs
 +
# From the initial insights, we sought to statistically prove the relationships that were observed. For this, we used JMP Pro 13 to carry out the chi-squared tests
 +
# We conducted the text analysis to find out if students had any major issues with the trainings conducted
 +
# Lastly, we used all the analysis done to give recommendations to the library
  
The sponsor has provided us with five datasets - student data, request log data, turnstile data, and pre and post survey data.
+
<br/><div align="left">
 +
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Scope</font></div>
 +
<br/>
 +
'''Phase 1: Learning about the Case Context'''
  
The student dataset contains information about the current students of SMU across all batches. The record attributes are the following:
+
We gathered information about the trainings conducted by the library to learn about the case context. This includes:
* email (hashed to a 64-digit- long hexadecimal number for non-disclosure reasons)
+
* Mapping out the workshops and trainings conducted by the library across the semester targeted for freshmen
* education level
+
* Reviewing the content of the trainings conducted and how they relate to the courses taken by freshmen from the different schools
* faculty
 
* admission year
 
* graduation year
 
* degree program
 
  
The request log dataset contains records captured by the library’s URL rewriting proxy server throughout the year of 2017. This dataset captures all user requests to external databases. The record attributes are the following:
+
'''Phase 2: Data Cleaning'''
* user ID
 
* session ID
 
* search database
 
* timestamp
 
* search query
 
  
The turnstile dataset contains records captured by the library’s gantries throughout the year of 2017. This dataset captures physical taps on the gantries of the library. The record attributes are the following:
+
As our sponsor gave us several datasets, we began this phase by studying each dataset's variables and values to discern which ones would be useful given our project scope. We then studied the variables and values of the chosen datasets in further detail.
* date
+
The steps include:
* time
+
# Recording the description and range for each variable and its values
* device name
+
# Identifying irrelevant or duplicate fields
* email (hashed to a 64-digit- long hexadecimal number for non-disclosure reasons)
+
# Resolving missing and invalid values
 +
# Cross-check related variables to verify accuracy
 +
# Transform variables for ease of analysis
 +
# Record assumptions made
 +
# Convert data values appropriately by removing null values, filling appropriate values
 +
# Combining related datasets on key variables
 +
# Documenting all of the above
  
The pre and post survey dataset contains responses of students before and after the first semester of freshman year on their confidence level in various research skills. Some of the record attributes are as follows:
+
'''Phase 3: Data Exploration'''
* email (hashed to a 64-digit- long hexadecimal number for non-disclosure reasons)
 
* school
 
* modules taken
 
* library workshops attended
 
  
<br/><div align="left">
+
In this phase, we conducted exploratory data analysis.
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Methodology</font></div>
+
The steps include:
<br/>
+
* Studying the distributions of variables
 +
* Identifying and treating outliers/anomalies
 +
* Checking of assumptions about the relationships between the variables
 +
* Develop hypotheses based on literature
  
As we have not obtained the data until the NDA is signed, we will only share our initial thought process of how we will tackle the project. We shall adopt closely to the Data Analytics Lifecycle approach.
+
This analysis was iterated a number of times, and we continually compared our findings to existing literature as well as what we knew of student behaviour.
  
Our plan of action is to discern the effectiveness of the library eBook databases in meeting the research needs of students. By analysing the proxy entries, we can define the usage pattern of its users and divide them into distinct clusters based on demographic and behavioural traits. Furthermore, we intend to track the student user journey as students interact with the physical and digital touch points in sequence. To this end, we have also conducted secondary research on articles published by various universities to gain a broad understanding of how turnstile and proxy data could be used to draw insights.
+
'''Phase 4: Statistical Analysis'''
  
At this phase of the project, we will focus on understanding the given dataset and clean the data. Concurrently, we will decide on the analytical model and prepare the data accordingly.
+
With a good understanding of the data and case, we performed statistical analysis.
 
+
The steps include:
<br/><div align="left">
+
* Conduct statistical analysis to test the relationship between training and confidence
<div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #4AB6A6 solid 32px;"><font color="#4AB6A6">Project Scope</font></div>
+
* Interpret the analysis to develop strategies that SMU Library can adopt
<br/>
 
  
While the project will be primarily focussed on answering the questions mentioned above, our client has been supportive enough to let us experiment with different analytical tools and present any other significant insights we derive.
+
'''Phase 5: Text Analysis'''
  
* We will be unable to conduct a yearly or seasonal analysis as the dataset is limited to records from 2017 only.
+
The post survey contains a column with the comments of the respondents. We conducted word frequency analysis on the comments to derive insights about how the students feel about the library trainings.
* The dataset pertains to all students of SMU who used the library resources in the said time-period. However, the survey was only conducted for the freshman batch.
+
The steps include:
 +
* Identify relevant words for analysis (adjectives, nouns and verbs)
 +
* Determine minimum frequency for a word to be considered commonly used in the comments
  
 
<br/>
 
<br/>

Latest revision as of 12:44, 12 April 2018


Project Motivation


The SMU Libraries Analytics and Research Department strives to develop a data-informed approach for achieving strategic objectives related to library operations and user needs. For this purpose, they conducted an initial survey of the freshman batch of 2017 to evaluate the difference in their confidence level in various research skills before and after their first semester at SMU. Based on the survey findings, the library aims to develop its trainings (content, methodology and availability) to address specific student problems associated with those skills. Considering the importance of using library resources efficiently, it is essential to understand the trends and patterns students demonstrate in their usage of library resources and how these relate to attributes such as modules undertaken and trainings attended. This will help us provide the library with specific problems and targeted solutions based on schools and modules, and eventually make the research process of an SMU student more effective and efficient.


Project Objectives


The objectives of the project are the following:

  1. Business objective: To discover the current confidence level of freshmen across different faculties and identify trends. Moreover, to explain, with clear visuals, how students have responded to different trainings for each skill at the end of the semester
  2. Technical objective: To use data analytics tools and statistical methods to study the data and obtain insights to facilitate the business objective

To achieve our two primary objectives, we need to:

  • Understand the data domains
  • Understand the library training process
  • Identify students who report high or low confidence, and the factors that contribute to it
  • Create a visual representation of the effectiveness of the trainings conducted during the semester, and provide recommendations

Data


Our sponsor conducted two surveys with the freshman batch of 2017. The pre-survey was conducted before the start of the semester (Aug 2017) and the post-survey at the end of the semester (Nov 2017). The pre- and post-survey datasets contain the students' responses on their confidence level in research skills before and after the first semester.


Project Methodology


Our methodology can be summarised as:

  1. We began our analysis by understanding the data domains that were provided along with secondary research
  2. We then cleaned the data and transformed it according to research requirements
  3. We carried out exploratory data analysis using Tableau 10.0. This is where we did the visual analysis using diverging stacked bar graphs
  4. From the initial insights, we sought to test statistically the relationships that were observed. For this, we used JMP Pro 13 to carry out the chi-squared tests
  5. We conducted the text analysis to find out if students had any major issues with the trainings conducted
  6. Lastly, we used all the analysis done to give recommendations to the library

Project Scope


Phase 1: Learning about the Case Context

We gathered information about the trainings conducted by the library to learn about the case context. This includes:

  • Mapping out the workshops and trainings for freshmen that the library conducted across the semester
  • Reviewing the content of the trainings conducted and how they relate to the courses taken by freshmen from the different schools

Phase 2: Data Cleaning

As our sponsor gave us several datasets, we began this phase by studying each dataset's variables and values to discern which ones would be useful given our project scope. We then studied the variables and values of the chosen datasets in further detail. The steps include:

  1. Recording the description and range for each variable and its values
  2. Identifying irrelevant or duplicate fields
  3. Resolving missing and invalid values
  4. Cross-checking related variables to verify accuracy
  5. Transforming variables for ease of analysis
  6. Recording assumptions made
  7. Converting data values appropriately by removing null values or filling in appropriate values
  8. Combining related datasets on key variables
  9. Documenting all of the above
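
The cleaning and combining steps above can be sketched in a few lines of Python. This is only an illustration, not the workflow we actually used: the field names (`email`, `school`, `confidence`), the 1 to 5 confidence scale and all of the sample rows are hypothetical stand-ins for the survey data.

```python
# Hypothetical survey rows: field names, values and the 1-5 scale are assumptions.
pre_rows = [
    {"email": "a1f3", "school": "School A", "confidence": "3"},
    {"email": "b7c9", "school": "School B", "confidence": ""},   # missing value
    {"email": "c2d4", "school": "School A", "confidence": "2"},
    {"email": "c2d4", "school": "School A", "confidence": "2"},  # duplicate row
]
post_rows = [
    {"email": "a1f3", "confidence": "4"},
    {"email": "c2d4", "confidence": "9"},                        # out of range
    {"email": "e5f6", "confidence": "5"},
]


def clean(rows, low=1, high=5):
    """Keep the first valid row per email; drop missing or out-of-range scores."""
    cleaned = {}
    for row in rows:
        raw = row.get("confidence", "").strip()
        if not raw.isdigit():
            continue  # missing or non-numeric response
        score = int(raw)
        if not low <= score <= high:
            continue  # invalid value outside the assumed scale
        cleaned.setdefault(row["email"], {**row, "confidence": score})
    return cleaned


def combine(pre, post):
    """Inner-join the pre and post surveys on the hashed email key."""
    merged = []
    for email in sorted(pre.keys() & post.keys()):
        before = pre[email]["confidence"]
        after = post[email]["confidence"]
        merged.append({
            "email": email,
            "school": pre[email].get("school"),
            "pre": before,
            "post": after,
            "change": after - before,
        })
    return merged


merged = combine(clean(pre_rows), clean(post_rows))
for record in merged:
    print(record)
```

Only students with a valid response in both surveys survive the join, which is one possible way to treat missing values; filling in values instead would be an equally valid choice depending on the analysis.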

Phase 3: Data Exploration

In this phase, we conducted exploratory data analysis. The steps include:

  • Studying the distributions of variables
  • Identifying and treating outliers/anomalies
  • Checking assumptions about the relationships between the variables
  • Developing hypotheses based on literature

This analysis was iterated a number of times, and we continually compared our findings to existing literature as well as what we knew of student behaviour.
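
As an illustration of how the response distributions feed a diverging stacked bar graph, the sketch below computes the percentage share of each confidence level and the start and end position of each bar segment, centred on the neutral level. We built our graphs in Tableau, so this Python version is only a rough equivalent; the 5-point scale with 3 as neutral and the sample responses are assumptions.

```python
from collections import Counter

LEVELS = [1, 2, 3, 4, 5]  # assumed 5-point confidence scale, 3 = neutral


def diverging_segments(responses):
    """Return (level, start, end) positions in percent for one stacked bar.

    The bar is shifted left so that the neutral segment straddles zero:
    low-confidence shares fall on the negative side, high-confidence
    shares on the positive side.
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    shares = {lvl: 100.0 * counts.get(lvl, 0) / total for lvl in LEVELS}

    start = -(shares[1] + shares[2] + shares[3] / 2)
    segments = []
    for lvl in LEVELS:
        end = start + shares[lvl]
        segments.append((lvl, start, end))
        start = end
    return segments


# Made-up responses for a single research skill.
responses = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5]
segments = diverging_segments(responses)
for level, start, end in segments:
    print(f"level {level}: {start:6.1f} to {end:6.1f}")
```

Computing one such bar per skill, for the pre-survey and the post-survey, lets the two be compared side by side.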

Phase 4: Statistical Analysis

With a good understanding of the data and case, we performed statistical analysis. The steps include:

  • Conducting statistical analysis to test the relationship between training and confidence
  • Interpreting the analysis to develop strategies that SMU Library can adopt
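
For readers unfamiliar with the chi-squared test of independence, the sketch below shows the calculation on a small contingency table. Our tests were run in JMP Pro 13; this Python version is only a minimal illustration without a continuity correction, and the counts in the table are made up.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a table
    of observed counts (rows = groups, columns = categories)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = attended training (yes, no),
# columns = post-survey confidence (high, low).
table = [
    [30, 20],
    [10, 40],
]
statistic, dof = chi_squared(table)
print(f"chi-squared = {statistic:.3f}, degrees of freedom = {dof}")

# 3.841 is the 5% critical value of the chi-squared distribution with 1 degree
# of freedom; a larger statistic suggests attendance and confidence are related.
print("significant at 5%:", dof == 1 and statistic > 3.841)
```

A significant result indicates an association between training and confidence in the sample; it does not by itself show that the training caused the change.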

Phase 5: Text Analysis

The post survey contains a column with the comments of the respondents. We conducted word frequency analysis on the comments to derive insights about how the students feel about the library trainings. The steps include:

  • Identifying relevant words for analysis (adjectives, nouns and verbs)
  • Determining the minimum frequency for a word to be considered commonly used in the comments
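
The two steps above can be sketched as a simple word frequency count with a minimum-frequency cut-off. This is a simplified illustration: the comments are invented, the threshold of 2 is arbitrary, and a small stop-word list stands in for the selection of adjectives, nouns and verbs, which would need a part-of-speech tagger in practice.

```python
import re
from collections import Counter

# Assumed stop-word list; a crude stand-in for keeping only
# adjectives, nouns and verbs.
STOPWORDS = {
    "the", "a", "an", "and", "to", "of", "is", "was", "were",
    "it", "i", "in", "for", "too", "very", "but", "more",
}


def frequent_words(comments, min_count=2):
    """Count words across all comments and keep those used at least
    min_count times, most frequent first."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


# Made-up comments in the style of survey free-text responses.
comments = [
    "The training was useful",
    "Useful training but too fast",
    "Database search is confusing",
]
common = frequent_words(comments)
print(common)
```

Raising `min_count` narrows the list to the words that recur most often, which is the trade-off behind the second step.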