IS428 AY2019-20T2 Assign CHOY YU MIN JUSTIN
Background and Motivation
SMU Libraries strives to achieve the following mission and vision:
Mission:
- To enable a culture of life-long learning through collaboration, engagement and outreach. It aims to provide seamless access to information using innovative and leading edge technology. The Library is committed to delivering exceptional services and building dynamic relationships within the SMU community and beyond.
Vision:
- To be a leading research library providing ubiquitous access to information using innovative strategies to drive intellectual exchange and the creation of knowledge.
As such, SMU Libraries has built its service offerings around 4 main areas: Communication, Facilities and Equipment, Information & Resources, and Service Delivery. These offerings cater to the Library's 4 main stakeholder groups: Faculty, Graduates, Staff, and Undergraduates (the biggest group). These stakeholders mainly fall under SMU's schools (Law, Information Systems, Social Sciences, Economics, Business (the largest in number), and Accountancy) or Others.
To ensure that its offerings remain relevant and meet the demands of its stakeholders, SMU Libraries conducts a survey every two years to gather important KPIs for improving its services. However, this data is high in dimensionality and largely opinion-based. This makes it difficult to draw truthful insights, so careful analytics is required to clean, process, and visualise the data. Given the richness of the 2018 Library survey, it is all the more exciting to carefully unpack the data and visualise it to help SMU Libraries draw key insights to improve its offerings.
Objectives
The objective of this project is to build a clear and truthful data visualisation that helps gather key insights about the library's four main stakeholder groups:
- The undergraduate students,
- The postgraduate students,
- The faculty,
- The staff.
The project is successful if users are able to:
- Acquire clear insight on how the library is doing overall.
- Acquire clear insight on how satisfied each stakeholder group is with each offering.
- Identify specific areas that contribute to dissatisfaction/satisfaction through survey questions.
- Back up the numbers with relevant comments that provide more concrete feedback for actionable improvement.
Dataset Analysis
Raw Data Set
The data provided was a set of 3 files:
- Raw data 2018-03-07 SMU LCS data file - KLG.xlsx
- 2018-02-16 SMU Library Survey Comments MAC.xls
- SMULibraries_BeHeardSurvey_FullReport.pdf
Of these, I only used "Raw data 2018-03-07 SMU LCS data file - KLG.xlsx" (referred to as "RAW DATA" from here on), as it contained all the fields and data points needed for the analysis. Nonetheless, as the first step of the project, I reviewed all the provided documents and read through the BeHeardSurvey Full Report to get a baseline understanding of the fields available and how they are used to evaluate the Library's performance. This gave me a clear picture of the fields I needed to pre-process for my data visualisation. I also worked through the report to get some initial ideas of what was done well and what was not. The RAW DATA contains two sheets: the encoded raw data and a Legend (the mapping for the encoding). As these are confusing, I give a high-level summary of the relevant fields and their rough descriptions.
Below is a breakdown of the key fields available in the RAW DATA:
No. | Field | Description |
---|---|---|
1. | Campus | An encoding of the library the respondent used most frequently. Options were 1: Li Ka Shing Library, 2: Kwa Geok Choo Law Library. |
2. | Position | An encoding of the respondent's stakeholder group in high detail (e.g. Year 1-4 undergraduate, Graduate: doctoral, etc.). |
3. | Study Area | An encoding of the respondent's field of study, e.g. Accounting/Law. |
4. | Frequency of Visit | This data comes from the survey question "How frequently do you visit the library" and is key to getting an idea of the "loyalty" of stakeholders. |
5. | Likelihood of recommendation | This follows the market practice of collecting data to calculate a Net Promoter Score (NPS). This field, with answers from 1-10, needs to be pivoted and pre-processed. |
6. | Communication Offering Survey Questions (Importance & Performance) | This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Communication offering. |
7. | Facilities and Equipment Offering Survey Questions (Importance & Performance) | This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Facilities and Equipment offering. |
8. | Information & Resources Offering Survey Questions (Importance & Performance) | This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Information & Resources offering. |
9. | Service Delivery Offering Survey Questions (Importance & Performance) | This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Service Delivery offering. |
10. | Comments | This data is free text and should be made accessible in the data visualisation where relevant, to help users go deeper than a purely numeric evaluation of opinions. |
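The "Likelihood of recommendation" field in the table above feeds a Net Promoter Score. As a minimal sketch of that calculation, the Python below uses the conventional NPS banding (9-10 are promoters, 6 and below are detractors). Whether the dashboard uses exactly this banding is an assumption, and the sample scores are made up for illustration.

```python
def net_promoter_score(scores):
    """Return NPS as a value between -100 and 100.

    Assumes the conventional banding: promoters score 9-10,
    detractors score 6 or below, everyone else is passive.
    """
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Made-up answers on the survey's 1-10 scale, for illustration only.
sample_scores = [10, 9, 8, 7, 6, 3, 9, 10]
nps = net_promoter_score(sample_scores)
print(nps)
```

Passives (7-8) count towards the total but not towards either group, which is why they still pull the score towards zero.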
Dataset Transformation
As the data existed in two sheets (1. Encoded raw data, 2. Legend mapping), joining it into one clean data source was a big challenge that required a mix of pivots and calculated fields. During data cleaning, I also realised that some of the visualisations I wanted would require carefully performed inner joins to ensure that the relevant dependencies existed for my later analysis. Lastly, as many questions were based on a Likert scale, the answers are opinions, and it would not be accurate to keep the Library's current mapping, which ranks them by number. As such, I re-mapped all the Likert answers into categorical fields. These steps are outlined below with their problems and solutions.
Handling NA Responses
Problem:
- The library survey allowed NA (Not Applicable) responses, and I did not want these to count towards the total percentage calculations later on. This can be seen in the raw data set (shown below).
Solution:
- In my Tableau Prep pre-processing step, I used a simple remove-fields step before any pivoting so that no NA records would be included in my later analysis.
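The effect of this step can be sketched outside Tableau Prep in plain Python. The snippet below is only an illustration: the marker `"NA"`, the question names, and the `drop_na_answers` helper are hypothetical, since the actual encoding of NA in the raw data is not reproduced here.

```python
NA_CODE = "NA"  # hypothetical marker; the real file may encode NA differently


def drop_na_answers(row, na_code=NA_CODE):
    """Return a copy of one respondent's answers without NA or missing values."""
    return {
        question: answer
        for question, answer in row.items()
        if answer != na_code and answer is not None
    }


# Made-up respondent row, for illustration only.
respondent = {"Q1": 6, "Q2": "NA", "Q3": 7, "Q4": None}
cleaned = drop_na_answers(respondent)
print(cleaned)
```

Because NA answers are removed before any counting, they never enter the denominator of the percentage calculations.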
Pivoting Encoded Survey Data
Problem:
- Survey data was stored in encoded fields in the raw data sheet.
- Different sets of survey questions used different sets of responses (e.g. Likert 1-7, NPS 1-10, etc.). This made the pivoting a lot trickier, as seen below.
Solution:
- In my Tableau Prep pre-processing step, I set up pivots for both the raw data sheet and the "Legend" sheet, to be carefully inner joined later.
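To make the pivot concrete, here is a minimal Python sketch of a columns-to-rows pivot, where each survey question column becomes its own row. The field names (`ResponseID`, `I01`, `P01`) and the `pivot_longer` helper are hypothetical and only stand in for the encoded columns of the raw sheet.

```python
def pivot_longer(rows, id_field, question_fields):
    """Turn one wide row per respondent into one row per (respondent, question)."""
    long_rows = []
    for row in rows:
        for question in question_fields:
            if question in row:  # skip questions the respondent has no value for
                long_rows.append(
                    {id_field: row[id_field], "question": question, "code": row[question]}
                )
    return long_rows


# Made-up wide data: one importance (I01) and one performance (P01) column.
wide = [
    {"ResponseID": 1, "I01": 7, "P01": 5},
    {"ResponseID": 2, "I01": 6, "P01": 6},
]
long_rows = pivot_longer(wide, "ResponseID", ["I01", "P01"])
for r in long_rows:
    print(r)
```

Keeping a separate pivot per response type (Likert, NPS, and so on) avoids mixing codes that share numbers but mean different things.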
Encoding of Data (Inner Joining Legend to Raw Data)
Problem:
- After pivoting, the data was still encoded and needed to be inner joined back to the legend mapping for analysis in Tableau.
- Such inner joining is tricky as it spans two sheets, and we can expect many Null values to appear after each inner join.
Solution:
- I used multiple inner joins with remove-data steps to slowly build the master data set correctly. One such example is shown below.
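The decoding idea can be sketched in Python as a small inner join between pivoted answers and the legend. This is not the Tableau Prep flow itself; the `inner_join` helper and the field names `field`, `code`, and `label` are hypothetical, while the two library names come from the Campus field described earlier.

```python
def inner_join(left, right, keys):
    """Inner join two lists of dicts on the given key fields."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    joined = []
    for l in left:
        # Rows without a match in the legend are dropped, as in an inner join.
        for r in index.get(tuple(l[k] for k in keys), []):
            joined.append({**l, **r})
    return joined


answers = [
    {"ResponseID": 1, "field": "Campus", "code": 1},
    {"ResponseID": 2, "field": "Campus", "code": 2},
    {"ResponseID": 3, "field": "Campus", "code": 9},  # made-up code with no legend entry
]
legend = [
    {"field": "Campus", "code": 1, "label": "Li Ka Shing Library"},
    {"field": "Campus", "code": 2, "label": "Kwa Geok Choo Law Library"},
]
decoded = inner_join(answers, legend, ["field", "code"])
for row in decoded:
    print(row["ResponseID"], row["label"])
```

Joining on both the field name and the code matters because the same number can mean different things in different fields.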
Eventually, after all the pivoting, filtering, and grouping, I exported the .hyper data set to Tableau as the baseline. The above steps are outlined in the Tableau Prep flow shown below (click for a clearer view).
Handling Likert (ordinal to categorical)
Problem:
- Even after building the source data, I still needed to re-map the Likert options into categorical values.
- Treating Likert options as numbers should be avoided, as the answers are opinions and the gaps between the numbers are not objective.
Solution:
- To fix this issue, I did some research on good Likert scales and settled on the following 7-point scale: Very Low, Low, Below Moderate, Moderate, Above Moderate, High, Very High.
- To perform the mapping, I created a simple Tableau calculated field, as shown below.
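The same re-mapping can be expressed as a lookup table. The Python sketch below mirrors the logic of such a calculated field rather than reproducing it; the pairing of code 1 with "Very Low" through code 7 with "Very High" is an assumption based on the scale order listed above.

```python
# Assumed pairing of the 1-7 codes with the seven labels, lowest to highest.
LIKERT_LABELS = {
    1: "Very Low",
    2: "Low",
    3: "Below Moderate",
    4: "Moderate",
    5: "Above Moderate",
    6: "High",
    7: "Very High",
}


def likert_label(code):
    """Map a 1-7 Likert code to its categorical label."""
    try:
        return LIKERT_LABELS[code]
    except KeyError:
        raise ValueError(f"unexpected Likert code: {code!r}") from None


print([likert_label(c) for c in (1, 4, 7)])
```

Raising on unknown codes makes stray values such as leftover NA markers visible instead of silently mislabelling them.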
Interactive Visualization
The interactive visualization can be accessed here: https://public.tableau.com/profile/justin.choy#!/vizhome/JustinChoySMULibrarySurveyAnalysis/SMULibrary2018Storyboard
- For the best experience, use the following settings:
- Set the view to fullscreen.
- Zoom out to 80%-90%, where the charts are best aligned.
Data Viz user journey Design
I am a firm believer that we rarely find something if we do not know what we are looking for. As such, I designed the overall storyboard so that users first explore at a high level to gather some interesting findings, and then follow through the rest of the dashboards to dig up deeper insights based on the general question they already have. To help users reach a clear starting insight faster, I also built meaningful filters between the charts of each dashboard so that analysts can focus on factors that are relevant to each other. The outcome of this design is the storyboard flow shown below.
The dashboard modules in blue give the user higher-level information on the library's key performance indicators and help them narrow down to a high-level insight that they might want to explore with the yellow dashboard modules. The yellow dashboard modules, on the other hand, allow a more granular level of exploration and help users gather actionable insights, so that they know where and how the library can be improved for specific stakeholder groups. Each dashboard is explained in detail below.
Survey Summary
This is the storyboard view of the Survey Summary Visualisation and its features:
No. | Feature | Description & Benefits |
---|---|---|
1. | Filters by School and Library | Description:
Benefits:
|
2. | Number of Responses by Library with Hover over comments. | Description:
Benefits:
|
3. | Contributing Respondents | Description:
Benefits:
|
4. | Key Overall Performance Indicators | Description:
Benefits:
|
5. | NPS Distribution | Description:
Benefits:
|
Likert Overview
No. | Feature | Description & Benefits |
---|---|---|
1. | Filters by Service Offering, Likert Response, Library | Description:
Benefits:
|
2. | Diverging Stacked Bar Chart (Likert scale) | Description:
Benefits:
|
3. | Gap Analysis Per Likert Scale Option | Description:
Benefits:
|
Granular Exploration: Communication, Facilities & Equipment, Information & Resources, Service Delivery
No. | Feature | Description & Benefits |
---|---|---|
1. | Filters by School, Likert Response, Library | Description:
Benefits:
|
2. | Diverging Stacked Bar Chart (Likert scale) by survey question | Description:
Benefits:
|
3. | Gap Analysis Per Likert Scale Option | Description:
Benefits:
|
Analysis & Insights
Undergraduate Students
Li Ka Shing Library
Finding 1 - Undergraduates seem to have the lowest NPS & Proportion of High/Very High Satisfaction
On the surface, undergraduates seem to be the least satisfied of all stakeholder groups. A quick hover over the comments on the library bar revealed that a large share of respondents were unhappy about limited study places, seats, and opening hours. A handful were also unhappy about toilet facilities and water coolers. This prompts a closer look at Facilities and Equipment.
Furthermore, looking at the distribution by school, we notice that business students make up over a third of the detractors (38%). We also see that business students' visit frequency largely falls into two main groups: Daily (32%) and Weekly (36%). Essentially:
- Undergraduates have the lowest NPS and proportion of High/Very High satisfaction.
- Unhappiness might stem from Facilities and Equipment.
- Business students make up over a third of detractors (38%) and visit quite frequently.
Finding 2 - Facilities and Equipment the likely cause for undergraduate dissatisfaction.
From these charts we can clearly see that the biggest gap between performance and importance is in the Very High option of the Likert scale. For this option, the gap for Facilities and Equipment is almost twice the gap for the overall performance of all offerings. This is in line with the initial suspicion and prompts us to investigate down to the granular survey question level to find the main culprits that bring down this offering. Essentially:
- Confirmed that Facilities and Equipment is the likely cause of undergraduate dissatisfaction.
- Need to investigate further to identify main areas to improve.
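One way to read the gap discussed in this finding is as the difference, per Likert option, between the share of importance answers and the share of performance answers. The Python sketch below assumes that definition, which may differ from the exact calculation in the dashboard, and uses made-up responses.

```python
from collections import Counter


def share_by_label(labels):
    """Percentage of responses falling into each Likert label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def gap_by_label(importance, performance):
    """Importance share minus performance share for each label (assumed definition)."""
    imp = share_by_label(importance)
    perf = share_by_label(performance)
    labels = sorted(set(imp) | set(perf))
    return {label: imp.get(label, 0.0) - perf.get(label, 0.0) for label in labels}


# Made-up responses, for illustration only.
importance = ["Very High"] * 6 + ["High"] * 3 + ["Moderate"] * 1
performance = ["Very High"] * 2 + ["High"] * 5 + ["Moderate"] * 3
gaps = gap_by_label(importance, performance)
for label, gap in gaps.items():
    print(label, gap)
```

Under this reading, a large positive gap at "Very High" means many respondents rate an offering as very important while far fewer rate its performance equally highly.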
Finding 3 - Identified top 3 main areas contributing to performance gap
From these charts it is clear that three sub-offerings under Facilities and Equipment are the main contributors to undergraduate dissatisfaction. These areas are (ranked from most to least severe):
- "Able to find a quiet place in the library to study when needed"
- "Able to find a place in the library to work in a group when needed"
- "Printing, scanning, photocopying available and meets needs"
Having discovered this, the library can now focus more of its resources on closing these performance gaps. This finding demonstrates the data visualisation being used as designed: identifying general insights and gradually working towards concrete, actionable decisions.