IS428 AY2019-20T2 Assign CHOY YU MIN JUSTIN

Background and Motivation

SMU Libraries strives to achieve the following mission and vision:

Mission:

To enable a culture of life-long learning through collaboration, engagement and outreach. It aims to provide seamless access to information using innovative and leading edge technology. The Library is committed to delivering exceptional services and building dynamic relationships within the SMU community and beyond.

Vision:

To be a leading research library providing ubiquitous access to information using innovative strategies to drive intellectual exchange and the creation of knowledge.

To this end, the Library has built its service offerings around 4 main areas: Communication, Facilities and Equipment, Information & Resources, and Service Delivery. These offerings cater to the Library's 4 main stakeholder groups: Faculty, Graduates, Staff, and Undergraduates (the largest group). These stakeholders mainly fall under SMU's schools: Law, Information Systems, Social Sciences, Economics, Business (the largest number), Accountancy, and Others.

To ensure that its offerings remain relevant and meet the demands of its stakeholders, SMU Libraries conducts a survey every 2 years to gather important KPIs for improving its services. However, this data is high in dimensionality and largely opinion-based. This makes it difficult to draw truthful insights, and careful analytics is required to clean, process, and visualise the data accurately. Given the richness of the 2018 Library survey, it is all the more exciting to carefully unpack the data and visualise it to help SMU Libraries draw key insights to improve their offerings.

Objectives

The objective of this project is to build a good and truthful data visualisation that helps gather key insights from the Library's four main stakeholder groups:

The undergraduate students,
The postgraduate students,
The faculty,
The staff.

The project is successful if users are able to:

Acquire clear insight on how the library is doing overall.
Acquire clear insight on how satisfied each stakeholder group is with each offering.
Identify specific areas that contribute to dissatisfaction/satisfaction through survey questions.
Back up the numbers with relevant comments that provide more objective feedback for actionable improvement.

Dataset Analysis

Raw Data Set

The data provided consisted of 3 files:

Raw data 2018-03-07 SMU LCS data file - KLG.xlsx
2018-02-16 SMU Library Survey Comments MAC.xls
SMULibraries_BeHeardSurvey_FullReport.pdf

Of these, I only used "Raw data 2018-03-07 SMU LCS data file - KLG.xlsx" (referred to as "RAW DATA" from here on), as it contained all the fields and data points needed for the analysis. Nonetheless, as the first step of the project, I reviewed all the provided documents and read through the BeHeardSurvey Full Report to get a baseline understanding of the fields available and how they are used to evaluate the Library's performance. This gave me a clear picture of the fields I needed to pre-process for my data visualisation. I also worked through the report to gain some initial ideas about what the Library did well and did not do well. The RAW DATA contains two sheets: the encoded raw data, and a Legend (the mapping for the encoding). As these are confusing, I give a high-level summary of the available and relevant data fields, with a rough description of each, below.

Below is a breakdown of the key fields available in the RAW DATA:

No.	Field	Description
1.	Campus	An encoded field representing the library the respondent used most frequently. The options were 1: Li Ka Shing Library, 2: Kwa Geok Choo Law Library.
2.	Position	An encoded field representing the respondent's stakeholder group in high detail (e.g. Year 1-4 undergraduate, Graduate: doctoral, etc.).
3.	Study Area	An encoded field specific to graduates and undergraduates, representing their field of study (e.g. Accounting, Law).
4.	Frequency of Visit	This data comes from the survey question "How frequently do you visit the library" and is key to getting an idea of the "loyalty" of stakeholders.
5.	Likelihood of recommendation	This follows the market practice of collecting data to calculate the Net Promoter Score (NPS). This field, with answers from 1-10, needs to be pivoted and pre-processed.
6.	Communication Offering Survey Questions (Importance & Performance)	This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Communication offering.
7.	Facilities and Equipment Offering Survey Questions (Importance & Performance)	This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Facilities and Equipment offering.
8.	Information & Resources Offering Survey Questions (Importance & Performance)	This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Information & Resources offering.
9.	Service Delivery Offering Survey Questions (Importance & Performance)	This data comes in the form of a Likert scale from 1 (Low Importance) to 7 (High Importance). It needs to be pivoted and pre-processed to analyse the library's performance in the Service Delivery offering.
10.	Comments	This data comes as free text and should be made accessible in the data visualisation where relevant, to help users go deeper than a numeric opinion evaluation.
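To illustrate the Net Promoter Score mentioned for the "Likelihood of recommendation" field, here is a minimal Python sketch. It assumes the conventional NPS cut-offs (9-10 promoters, 7-8 passives, 6 and below detractors); the cut-offs actually used in the dashboard are not stated here, and the sample scores are invented for illustration.

```python
def net_promoter_score(scores):
    """Return the NPS as a value between -100 and 100.

    Assumption: conventional cut-offs are used, i.e. 9-10 is a promoter,
    7-8 is a passive, and 6 or below is a detractor.
    """
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives count towards the denominator only.
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical responses on the survey's 1-10 scale.
sample_scores = [10, 9, 8, 7, 6, 3, 9, 10, 5, 8]
nps = net_promoter_score(sample_scores)
print(f"NPS: {nps:+.1f}")
```

Because passives still count in the denominator, a group with many 7s and 8s ends up with an NPS close to zero even when it has few detractors.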

Dataset Transformation

As the data existed in two sheets (1. Encoded raw data, 2. Legend mapping), it posed a big challenge that required a mix of pivot tables and calculated fields to join the data into one clean data source. During the data cleaning process, I also came to realize that some of the visualizations I wanted would require carefully performed inner joins to ensure that the relevant dependencies existed for my later analysis. Lastly, as many questions were based on a Likert scale, the responses were opinion-based, so it would not be accurate to keep the Library's current mapping (an ordinal ranking). As such, I needed to re-map all the Likert answers into categorical fields. These steps are outlined below, with their problems and solutions.

Handling NA Responses

Problem:

The library survey allowed NA (Not Applicable) responses, and I did not want these responses to count towards the total percentage calculations later on. This can be seen in the raw data set (shown below).

Solution:

In my Tableau Prep pre-processing step, I used a simple remove fields function before any pivoting to ensure that no NA records would be included in my analysis later.
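The effect of excluding NA responses on the percentage calculations can be sketched in Python. This is only an illustration, not the Tableau Prep step itself: the `"NA"` marker, the field names, and the sample rows are all assumptions, and the rows are shown in long form purely for brevity.

```python
# Hypothetical marker for a "Not Applicable" answer.
NA_CODE = "NA"

# Hypothetical survey answers, one row per respondent and question.
responses = [
    {"respondent": 1, "question": "P01", "answer": 6},
    {"respondent": 1, "question": "P02", "answer": NA_CODE},
    {"respondent": 2, "question": "P01", "answer": 4},
    {"respondent": 2, "question": "P02", "answer": 7},
]


def drop_na(rows, na_code=NA_CODE):
    """Keep only rows whose answer is an actual rating."""
    return [row for row in rows if row["answer"] != na_code]


valid = drop_na(responses)

# Percentages are computed over valid answers only, so NA responses
# do not inflate the denominator.
high_count = sum(1 for row in valid if row["answer"] >= 6)
share_high = high_count / len(valid)
print(f"{share_high:.1%} of valid answers are 6 or above")
```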

Pivoting Encoded Survey Data

Problem:

Survey data was stored in encoded fields in the raw data sheet.
Different sets of survey questions used different sets of responses (e.g. Likert 1-7, NPS 1-10, etc.). This made the pivoting a lot trickier, as seen below.

Solution:

In my Tableau Prep pre-processing step, I set up pivot fields for both the raw data sheet and the "Legend" sheet, to be carefully inner joined later.
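As a rough analogue of the pivot step, the following Python sketch reshapes wide survey rows (one column per question) into long rows (one row per respondent and question). The column names such as `ResponseID`, `I01`, `P01`, and `NPS1` are hypothetical, as are the values. Pivoting each question family separately keeps the 1-7 Likert codes apart from the 1-10 NPS codes.

```python
# Hypothetical wide rows: one column per encoded survey question.
wide_rows = [
    {"ResponseID": 1, "I01": 7, "P01": 5, "NPS1": 9},
    {"ResponseID": 2, "I01": 6, "P01": 6, "NPS1": 4},
]


def pivot_longer(rows, id_field, value_fields):
    """Turn the selected columns into (question, code) rows."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            long_rows.append(
                {id_field: row[id_field], "question": field, "code": row[field]}
            )
    return long_rows


# Pivot each family of questions on its own, since their answer
# scales differ (Likert 1-7 versus NPS 1-10).
likert_long = pivot_longer(wide_rows, "ResponseID", ["I01", "P01"])
nps_long = pivot_longer(wide_rows, "ResponseID", ["NPS1"])

for row in likert_long:
    print(row)
```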

Encoding of Data (Inner Joining Legend to Raw Data)

Problem:

After pivoting, the data was still encoded and needed to be inner joined back to the mapping for analysis in Tableau.
Such inner joins are tricky, as they are between two sheets and many Null values can be expected to appear after each inner join.

Solution:

I used multiple inner joins with remove data steps to gradually build the master data set correctly. One such example is shown below.

Eventually, after all the pivoting, filtering, and grouping, I exported the .hyper dataset to Tableau as the baseline. The above steps are outlined in the Tableau Prep process shown below (click for clearer view).
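As a rough Python analogue of the inner-join step described in this section (not the Tableau Prep flow itself), the sketch below decodes pivoted rows against a legend. The Campus codes 1 and 2 follow the field description above; the field names, the `inner_join` helper, and the unmatched code 9 are hypothetical.

```python
# Pivoted, still-encoded rows (hypothetical field names).
encoded_rows = [
    {"ResponseID": 1, "field": "Campus", "code": 1},
    {"ResponseID": 2, "field": "Campus", "code": 2},
    {"ResponseID": 3, "field": "Campus", "code": 9},  # no legend entry
]

# Legend rows mapping each (field, code) pair to a readable label.
legend_rows = [
    {"field": "Campus", "code": 1, "label": "Li Ka Shing Library"},
    {"field": "Campus", "code": 2, "label": "Kwa Geok Choo Law Library"},
]


def inner_join(left_rows, right_rows, keys):
    """Return merged rows for every left/right pair that agrees on keys."""
    index = {}
    for right in right_rows:
        index.setdefault(tuple(right[k] for k in keys), []).append(right)
    joined = []
    for left in left_rows:
        for right in index.get(tuple(left[k] for k in keys), []):
            joined.append({**left, **right})
    return joined


decoded = inner_join(encoded_rows, legend_rows, keys=["field", "code"])
for row in decoded:
    print(row["ResponseID"], row["label"])
```

Rows without a matching legend entry are dropped by the inner join, which is why the row counts before and after each join are worth checking.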

Handling Likert (ordinal to categorical)

Problem:

Even after building the source data, I still needed to re-map the Likert options into categorical values.
Ordinal values and numbers should be avoided for Likert scale options, as they are opinion-based and the gaps between the numbers are not objective.

Solution:

To fix this issue, I did some research on good Likert scales and settled on the following 7-point scale: Very Low, Low, Below Moderate, Moderate, Above Moderate, High, Very High.
To perform the mapping, I created a simple Tableau calculated field as shown below.
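For readers without Tableau, the same kind of mapping can be sketched in Python. It assumes that code 1 maps to "Very Low" and code 7 to "Very High", in the order the labels are listed above; the function name is hypothetical and this is not the calculated field itself.

```python
# Assumed mapping from the encoded 1-7 answers to categorical labels.
LIKERT_LABELS = {
    1: "Very Low",
    2: "Low",
    3: "Below Moderate",
    4: "Moderate",
    5: "Above Moderate",
    6: "High",
    7: "Very High",
}


def likert_label(code):
    """Return the label for a code, or None for an unexpected code."""
    return LIKERT_LABELS.get(code)


labels = [likert_label(code) for code in [1, 4, 7, 8]]
print(labels)
```

Returning `None` for unexpected codes makes stray values easy to spot and filter out before charting.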

Interactive Visualization

The interactive visualization can be accessed here: https://public.tableau.com/profile/justin.choy#!/vizhome/JustinChoySMULibrarySurveyAnalysis/SMULibrary2018Storyboard

For the best experience, use the following settings:
- Set the view to fullscreen.
- Zoom out to 80%-90%, where the charts are most aligned.

Data Viz user journey Design

I am a firm believer that we will rarely find something if we do not know what we are looking for. As such, I have designed the overall storyboard to help users first explore at a high level and gather some interesting findings, then follow through with the rest of the dashboards to dig up deeper insights based on the general questions they already have. Similarly, to allow users to reach a clear starting insight faster, I have built meaningful filters between the charts of each dashboard to help analysts focus their attention on factors that are relevant to each other. The outcome of this design is the storyboard flow shown below.

The dashboard modules in blue give the user higher-level information on the library's key performance indicators and help them narrow down to a high-level insight that they might want to explore with the yellow dashboard modules. The yellow dashboard modules, on the other hand, allow for more granular exploration and help users gather actionable insights, so that they know where the library can be improved and how to improve it for specific stakeholder groups. Each dashboard is explained in detail below.

Survey Summary

This is the storyboard view of the Survey Summary Visualisation and its features:

No.	Feature	Description & Benefits
1.	Filters by School and Library	Description: These filters allow users to filter the dashboard charts by school and/or library. This is important as different managers/department heads might want to perform analysis on specific libraries, between libraries, or for specific students. Benefits: Allows for quick high-level analysis by filters. Helps users get more precise insights on groups of interest.
2.	Number of Responses by Library with hover-over comments	Description: This chart keeps the user aware of the number of responses that contribute to each percentage or each group selected. Moreover, this chart responds to other filters and shows students' comments when hovered over. Benefits: Most charts show percentages, so this chart helps users stay aware of the total number of responses shown at all times. After setting all the filters of interest, the user can use this bar to look at some comments from the respondents who contributed to results such as the NPS.
3.	Contributing Respondents	Description: This is a simple bar that shows how many participants from each stakeholder group contributed to the survey. Benefits: Similar to the library chart, this chart keeps the user in touch with the number of stakeholders that contribute to the many percentage charts shown.
4.	Key Overall Performance Indicators	Description: This set of charts gives the user a quick flavor of how satisfied each stakeholder group is. Benefits: This high-level view can help users narrow down and focus on the groups of interest and develop initial high-level insights to explore further (e.g. "Undergrads seem to have the lowest overall satisfaction and NPS. What might be contributing to this?").
5.	NPS Distribution	Description: This interactive chart allows users to filter respondents to find out more about the promoters and detractors in each stakeholder group. Benefits: Helps users quickly filter other charts to identify at-risk/positive groups of stakeholders and learn more about their profiles and some of their comments.

Likert Overview

No.	Feature	Description & Benefits
1.	Filters by Service Offering, Likert Response, Library	Description: These filters allow users to filter the dashboard charts by Service Offering, Likert Response, and Library. This allows users to follow on from the Survey Summary and investigate one step deeper into the service offerings. Benefits: Allows for quick high-level analysis by filters. Helps users go one step deeper while staying at the high level of service offerings.
2.	Diverging Stacked Bar Chart (Likert scale)	Description: This chart is a Gantt view chart that shows the proportion of respondents for each Likert response. Benefits: Gives users a high-level view of the distribution of responses for each group. As it is colored according to Likert response, users can deduce a more truthful insight than from an analysis of average scores.
3.	Gap Analysis Per Likert Scale Option	Description: This chart is interactively linked to the Likert Selector and calculates, for each group, the gap in the number of responses (Performance - Importance) divided by the total number of respondents in that group. Benefits: This bar chart allows for much easier comparison between performance and importance for each Likert answer. Users can quickly identify the biggest positive and negative gaps per offering for further analysis.
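The gap calculation described for the Gap Analysis chart can be sketched as follows. This is a minimal Python illustration under the assumption that every respondent in the group gave both an Importance and a Performance answer; the function name and the sample answers are hypothetical.

```python
from collections import Counter


def gap_by_option(importance, performance, total_respondents):
    """Return (performance count - importance count) / total, per option."""
    importance_counts = Counter(importance)
    performance_counts = Counter(performance)
    options = sorted(set(importance_counts) | set(performance_counts))
    return {
        option: (performance_counts[option] - importance_counts[option])
        / total_respondents
        for option in options
    }


# Hypothetical answers from one stakeholder group of four respondents.
importance = ["High", "Very High", "Very High", "High"]
performance = ["Moderate", "High", "Very High", "High"]

gaps = gap_by_option(importance, performance, total_respondents=4)
for option, gap in gaps.items():
    print(f"{option}: {gap:+.2f}")
```

A negative gap for a high rating means fewer respondents rated performance at that level than rated importance at that level, which flags a shortfall worth investigating.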

Granular Exploration: Communication, Facilities & Equipment, Information & Resources, Service Delivery

No.	Feature	Description & Benefits
1.	Filters by School, Likert Response, Library	Description: These filters allow users to filter the dashboard charts by School, Likert Response, and Library. This allows users to dive into granular details and find out how each stakeholder group feels about sub-offerings based on specific survey questions. Benefits: Allows for in-depth and flexible analysis by question and stakeholder. Helps narrow down possible action items and their respective target groups to improve library performance.
2.	Diverging Stacked Bar Chart (Likert scale) by survey question	Description: This chart is a Gantt view chart that shows the proportion of respondents for each Likert response. Benefits: Allows users to analyse each question and find out which stakeholders, and how many, are responding positively/negatively by survey question.
3.	Gap Analysis Per Likert Scale Option	Description: This chart is interactively linked to the Likert Selector and calculates, for each group, the gap in the number of responses (Performance - Importance) divided by the total number of respondents in that group. Benefits: This bar chart allows for much easier comparison between performance and importance for each Likert answer. Users can quickly identify the biggest positive and negative gaps and the proportion of respondents who contributed to each result. This assists future planning and allows for more targeted action items.
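A diverging stacked bar built from Gantt-style marks needs a start position and a length for each segment. The Python sketch below shows one common way to compute them, assuming the bar is centred on the middle of the "Moderate" category; whether the dashboard centres its bars this way is an assumption, and the shares are invented.

```python
LIKERT_ORDER = [
    "Very Low", "Low", "Below Moderate", "Moderate",
    "Above Moderate", "High", "Very High",
]
NEUTRAL = "Moderate"


def diverging_segments(shares):
    """Return (option, start, size) tuples for one diverging stacked bar.

    Assumption: the zero line runs through the middle of the neutral
    category, so half of it sits on each side.
    """
    neutral_index = LIKERT_ORDER.index(NEUTRAL)
    left_extent = sum(shares.get(o, 0.0) for o in LIKERT_ORDER[:neutral_index])
    left_extent += shares.get(NEUTRAL, 0.0) / 2
    start = -left_extent
    segments = []
    for option in LIKERT_ORDER:
        size = shares.get(option, 0.0)
        segments.append((option, start, size))
        start += size  # the next segment begins where this one ends
    return segments


# Hypothetical share of respondents per Likert response (sums to 1).
shares = {
    "Very Low": 0.05, "Low": 0.05, "Below Moderate": 0.10, "Moderate": 0.20,
    "Above Moderate": 0.20, "High": 0.25, "Very High": 0.15,
}
segments = diverging_segments(shares)
for option, start, size in segments:
    print(f"{option:<15} start={start:+.2f} size={size:.2f}")
```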
