Difference between revisions of "Maximum Project Overview"
Revision as of 01:27, 10 April 2018
The SMU Libraries Analytics and Research Department strives to develop a data-informed approach for achieving strategic objectives related to library operations and user needs. For this purpose, it conducted an initial survey of the freshman batch of 2017 to evaluate the difference in their confidence level in various research skills before and after their first semester at SMU. The library currently aims to develop its trainings (content, methodology & availability) to address the specific student problems associated with those skills, based on findings from the survey. Because efficient use of library resources matters, it is important to understand the trends and patterns in students' usage of library resources and how these relate to attributes such as modules undertaken and trainings attended. This will help us present the library with specific problems and targeted solutions by school and module, and eventually make the research process of an SMU student more effective and efficient.
The objectives of the project are the following:
- Business objective: To discover the current confidence level of freshmen across different faculties and identify trends. Moreover, to explain, with clear visuals, how students have responded to different trainings for each skill at the end of the semester
- Technical objective: To use data analytics tools and statistical methods to study the data and obtain insights to facilitate the business objective
To achieve our two primary objectives, we will need:
- To understand the data domains
- To understand the library training process
- To identify whether any students experience particularly high or low confidence, and the factors contributing to it
- To create a visual representation of the effectiveness of the trainings conducted during the semester, and provide recommendations.
The sponsor conducted two surveys with the freshman batch of 2017: a pre-survey before the start of the semester (Aug 2017) and a post-survey at the end of the semester (Nov 2017). The pre- and post-survey datasets contain students' responses on their confidence level in research skills before and after their first semester. After the two sheets were cleaned and compiled, the record attributes are as listed in the data dictionary (image: Team20_Data_Dictionary.jpg).
Our methodology can be summarised as:
- We began our analysis by transforming the data that was provided
- We carried out exploratory data analysis using Tableau 10.0. This is where we did the visual analysis using diverging stacked bar charts
- From the initial insights, we sought to test statistically the relationships that were observed. For this, we used JMP Pro 13 to carry out chi-squared tests
- We conducted text analysis to find out if students had any major issues with the trainings conducted
- Lastly, we used all the analysis done to give recommendations to the library
Phase 1: Learning about the Case Context
We gathered information about the trainings conducted by the library to learn about the case context. This includes:
- Mapping out the workshops and trainings conducted by the library across the semester targeted for freshmen
- Reviewing the content of the trainings conducted and how they relate to the courses taken by freshmen from the different schools
Phase 2: Data Cleaning
As we were given several datasets by our sponsor, in this phase we first studied the datasets to understand each of their variables and values and to discern which ones would be useful given our project scope. Following that, we further studied the variables and values of the datasets that we chose to use. The steps include:
- Recording the description and range for each variable and its values
- Identifying irrelevant or duplicate fields
- Resolving missing and invalid values
- Cross-checking related variables to verify accuracy
- Transforming variables for ease of analysis
- Recording assumptions made
- Converting data values appropriately by removing null values or filling in appropriate values
- Combining related datasets on key variables
- Documenting all of the above
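As an illustration of the "resolving missing and invalid values" and "combining related datasets on key variables" steps, here is a minimal Python sketch. It is not the cleaning procedure actually used in the project: the field names (`student_id`, `school`, `confidence`), the 1 to 5 rating scale and the sample rows are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical records: field names, the 1-5 scale and the values are
# assumptions for illustration, not the sponsor's actual survey schema.
pre_survey = [
    {"student_id": "S1", "school": "School A", "confidence": "4"},
    {"student_id": "S2", "school": "School B", "confidence": ""},   # missing answer
    {"student_id": "S3", "school": "School A", "confidence": "9"},  # out of range
]
post_survey = [
    {"student_id": "S1", "confidence": "5"},
    {"student_id": "S3", "confidence": "3"},
]


def parse_confidence(raw: str) -> Optional[int]:
    """Return the rating as an int, or None if it is missing or invalid."""
    raw = raw.strip()
    if not raw.isdigit():
        return None
    value = int(raw)
    return value if 1 <= value <= 5 else None


def combine(pre, post):
    """Join pre and post responses on student_id, keeping matched students only."""
    post_by_id = {row["student_id"]: row for row in post}
    combined = []
    for row in pre:
        match = post_by_id.get(row["student_id"])
        if match is None:
            continue  # no post-survey response for this student
        combined.append({
            "student_id": row["student_id"],
            "school": row["school"],
            "pre": parse_confidence(row["confidence"]),
            "post": parse_confidence(match["confidence"]),
        })
    return combined


records = combine(pre_survey, post_survey)
for record in records:
    print(record)
```

Invalid ratings are kept as `None` rather than dropped, so the decision on how to treat them can be recorded as an explicit assumption later.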
Phase 3: Data Exploration
In this phase, we conducted exploratory data analysis. The steps include:
- Studying the distributions of variables
- Identifying and treating outliers/anomalies
- Checking assumptions about the relationships between the variables
- Developing hypotheses based on literature
This analysis was iterated a number of times, and we continually compared our findings to existing literature as well as what we knew of student behaviour.
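The exploration itself was done in Tableau, but the data preparation behind a diverging stacked bar chart can be sketched in a few lines of Python. The sketch below assumes a 1 to 5 confidence scale with 3 as the neutral level (an assumption, not a detail from the survey) and uses made-up ratings. It computes the percentage share of each level and the left offset at which the bar should start so that the neutral segment is centred on zero.

```python
from collections import Counter

LEVELS = [1, 2, 3, 4, 5]  # assumed rating scale; 3 is treated as neutral


def diverging_segments(ratings):
    """Return per-level percentage shares and the bar's left starting offset."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    shares = {level: 100.0 * counts[level] / total for level in LEVELS}
    # Low-confidence levels plus half of the neutral level sit left of zero.
    left = shares[1] + shares[2] + shares[3] / 2
    return {"shares": shares, "left_offset": -left}


# Made-up ratings for one skill in one group of students.
example = diverging_segments([1, 2, 2, 3, 3, 4, 4, 4, 5, 5])
print(example["shares"])
print(example["left_offset"])
```

Each bar is then drawn from `left_offset`, stacking the segments in level order, so bars for different schools or skills line up on the neutral midpoint.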
Phase 4: Statistical Analysis
With a good understanding of the data and case, we performed statistical analysis. The steps include:
- Conducting statistical analysis to test for a relationship between training and confidence
- Interpreting the analysis to develop strategies that SMU Library can adopt
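The chi-squared tests were run in JMP Pro 13. As a rough illustration of what such a test computes, the following Python sketch calculates the Pearson chi-squared statistic (without continuity correction) and the degrees of freedom for a contingency table. The table layout (training attended or not, against confidence improved or not) and the counts are hypothetical, not results from the survey.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = attended training / did not attend,
# columns = confidence improved / did not improve.
observed = [
    [30, 10],
    [20, 20],
]
statistic, dof = chi_squared_statistic(observed)
print(f"chi-squared = {statistic:.3f}, degrees of freedom = {dof}")
```

The statistic is compared against the chi-squared distribution with `dof` degrees of freedom to obtain a p-value. Statistical packages report this directly; in Python, `scipy.stats.chi2_contingency` does the same calculation.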