IS428 AY2019-20T2 Assign TANNY LAI

From Visual Analytics for Business Intelligence
Revision as of 00:23, 15 March 2020 by Tanny.lai.2017

Problem & Motivation

Problem
Every two years, SMU Libraries conducts a comprehensive survey in which faculty, students and staff have the opportunity to rate various aspects of the library's services. The survey gives SMU Libraries input to help enhance existing services and anticipate the emerging needs of SMU faculty, students and staff. However, the survey results presented by InSync lack friendly, interactive visualizations that would help library staff easily understand the results and derive value-adding insights.

Motivation
In class, we learned that Likert-scale data should always be analyzed with a diverging stacked bar chart rather than by averaging the scale. After viewing the survey report prepared by Insync, I was inspired to apply the concepts, methods, and techniques I have learned in my Visual Analytics IS428 class. Furthermore, it is far more interesting and practical to resolve real-world problems with the visual analytics skills I have learned.

The interactive visualization can reveal the level of services provided by SMU libraries as perceived by:

  • The undergraduate students,
  • The postgraduate students,
  • The faculty,
  • The staff

Dataset Analysis & Transformation Process

Understanding the dataset

Before doing an in-depth analysis and creating charts, it is essential to understand the dataset and identify the best method of data preparation by examining its format and attributes (columns). The dataset provided for the assignment consists of two datasheets:

1. The data, in codes and numbers.
2. The legend, describing the data in layman's terms.

Figures: Excel Legends; Dataset

This section describes how the dataset was analyzed and transformed to prepare it for import into the interactive visualization and for analysis there.

First, I identified that the dataset contains three types of data: demographic, behavioral, and feedback/opinion.


Under demographic data, the dataset contains the following attributes for each surveyee:

  1. StudyArea - Major area of study, research or teaching
  2. Position - Position in SMU (1 column)
  3. ID - International (non-exchange) student (1 column)


Under behavioral data, the dataset contains the following attributes for each surveyee:

  1. Campus - Library mostly used (1 column)
  2. HowOften - How frequently do you visit the library/campus/access library resources (3 columns)


Under feedback data, the dataset contains the following attributes for each surveyee:

  1. NPS - How likely surveyee is to recommend the library service to other students (1 column)
  2. Importance - Importance ratings for 26 library services (26 columns)
  3. Performance - Performance ratings for 26 library services (26 columns), plus the surveyee's overall satisfaction with the library (1 column)
  4. Comments - Suggestions for improvement or any other comments about the Library (1 column)
  5. NA - Are the services applicable to the surveyee? (1 column)


Transformation

1. After understanding the dataset, it is clear that the attributes HowOften, Importance, Performance, and NA need to be pivoted: each is spread across many columns, which would make building the charts difficult in later steps.

Figure: Pivot process

Solution: To solve the issue of too many columns for the same data attribute, I loaded the coded dataset into Tableau Prep and pivoted it three times. First, I pivoted the HowOften attribute. Next, I pivoted Importance and Performance together, and lastly, I pivoted NA. Finally, I output the result as a TDE file.
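As a rough illustration of what a columns-to-rows pivot does, here is a minimal stdlib Python sketch. It is not Tableau Prep's implementation, and the column names (`ResponseID`, `I1`, `P1`, and so on) are hypothetical placeholders rather than the actual headers in the survey file. Importance and Performance are passed as two groups so that the ratings for the same question end up on the same output row, which is the effect of pivoting them together.

```python
from typing import Dict, List


def pivot_columns_to_rows(
    rows: List[Dict[str, object]],
    id_cols: List[str],
    groups: Dict[str, List[str]],
    index_name: str = "Question",
) -> List[Dict[str, object]]:
    """Turn groups of wide columns into long rows (one row per question).

    Each entry in `groups` maps an output column name to the wide columns it
    collects. All groups must list the same number of columns; column i of
    every group lands on the same output row, so paired ratings stay together.
    """
    lengths = {len(cols) for cols in groups.values()}
    if len(lengths) != 1:
        raise ValueError("all pivot groups must have the same number of columns")
    (n_questions,) = lengths

    long_rows: List[Dict[str, object]] = []
    for row in rows:
        for i in range(n_questions):
            out = {col: row[col] for col in id_cols}  # carry identifiers along
            out[index_name] = i + 1  # 1-based question number
            for name, cols in groups.items():
                out[name] = row[cols[i]]
            long_rows.append(out)
    return long_rows


# Hypothetical two-question extract; the real data has 26 of each.
survey = [
    {"ResponseID": 1, "I1": 7, "I2": 5, "P1": 6, "P2": 4},
    {"ResponseID": 2, "I1": 6, "I2": 7, "P1": 6, "P2": 7},
]
long_ip = pivot_columns_to_rows(
    survey,
    id_cols=["ResponseID"],
    groups={"Importance": ["I1", "I2"], "Performance": ["P1", "P2"]},
)
for r in long_ip:
    print(r)
```

Each respondent row becomes one row per question, which is what makes per-question charts straightforward afterwards.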


2. Even after pivoting, the data was still not ready for analysis because the values were stored as codes, which are hard to understand for a user unfamiliar with the legend.


Solution: Since the dataset is in code format, I created calculated fields to convert the codes into readable labels, making the analytic dashboard easier to use. I avoided using aliases: although they get the job done faster, users would have to recreate them every time they create a new Tableau file, which is troublesome. A calculated field, in contrast, can be reused, so users do not need to set up aliases repeatedly. In total, I created calculated fields to rename the codes for the following attributes: Study Area, Position, HowOften Responses, Yes No (NA) Question Title, Importance and Performance, Importance and Performance Question Categories, Importance and Performance Question Title, Response Label, and YN (NA) Question Categories.
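In Tableau, a decoding field of this kind is typically written as a `CASE ... WHEN ... THEN ... ELSE ... END` expression. The stdlib Python sketch below shows the same lookup idea. The codes and labels in the example legend are hypothetical; the real values come from the legend datasheet.

```python
from typing import Callable, Dict


def make_decoder(legend: Dict[int, str], default: str = "Unknown") -> Callable[[int], str]:
    """Build a function that maps a numeric survey code to a readable label.

    Works like a CASE ... WHEN ... THEN ... ELSE ... END calculated field:
    codes missing from the legend fall through to `default`.
    """
    def decode(code: int) -> str:
        return legend.get(code, default)

    return decode


# Hypothetical legend for illustration only; not the survey's actual codes.
position_label = make_decoder({
    1: "Undergraduate Student",
    2: "Postgraduate Student",
    3: "Faculty",
    4: "Staff",
})

print(position_label(3))   # Faculty
print(position_label(99))  # Unknown
```

Keeping the mapping in one reusable definition is the same motivation given above for preferring calculated fields over aliases.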

Also, as we are only interested in positions that fall into four groups (undergraduate students, postgraduate students, faculty, and staff), I created a calculated field that retrieves only these groups.
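One way to write such a field is to return the group name for positions of interest and NULL otherwise, then filter out the NULLs. A minimal Python sketch of that logic follows, with `None` standing in for NULL. The position labels in the mapping and the "Alumni" example are hypothetical, since the actual positions come from the legend.

```python
from typing import Dict, List, Optional

# Hypothetical position labels -> analysis group. Positions not listed here
# are outside the four groups of interest.
GROUP_OF_POSITION: Dict[str, str] = {
    "Undergraduate Student": "Undergraduate Students",
    "Postgraduate Student": "Postgraduate Students",
    "Faculty": "Faculty",
    "Staff": "Staff",
}


def interest_group(position: str) -> Optional[str]:
    """Return the analysis group for a position, or None if it is not of interest."""
    return GROUP_OF_POSITION.get(position)


def keep_interest_groups(
    rows: List[Dict[str, str]], position_col: str = "Position"
) -> List[Dict[str, str]]:
    """Drop respondents whose position does not map to one of the four groups."""
    return [r for r in rows if interest_group(r[position_col]) is not None]


responses = [
    {"ResponseID": "1", "Position": "Faculty"},
    {"ResponseID": "2", "Position": "Alumni"},  # hypothetical, filtered out
    {"ResponseID": "3", "Position": "Undergraduate Student"},
]
print([r["ResponseID"] for r in keep_interest_groups(responses)])  # ['1', '3']
```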


3. The NPS survey question measures how likely users are to recommend the library. The results yield a Net Promoter Score, which gauges the loyalty of the library's relationships with its users. However, the analysis cannot be done without first separating the responses into detractors, promoters, and neutrals and then calculating the NPS.


Solution: To visualize the NPS results, it is necessary to create four calculated fields: the percentages of detractors, promoters, and neutrals, and the NPS itself.
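The arithmetic behind these four fields is sketched below in Python. It assumes the NPS question uses the standard 0 to 10 scale with the conventional bands: promoters score 9 to 10, neutrals (passives) score 7 to 8, and detractors score 0 to 6. If the survey used a different scale, the thresholds would need adjusting. The NPS is the percentage of promoters minus the percentage of detractors, so it ranges from -100 to +100.

```python
from typing import Dict, Iterable


def nps_breakdown(scores: Iterable[int]) -> Dict[str, float]:
    """Percentages of promoters, neutrals and detractors, plus the NPS.

    Assumes standard 0-10 recommendation scores: promoters 9-10,
    neutrals (passives) 7-8, detractors 0-6.
    """
    values = list(scores)
    if not values:
        raise ValueError("need at least one response")
    if any(s < 0 or s > 10 for s in values):
        raise ValueError("scores must be between 0 and 10")

    n = len(values)

    def pct(count: int) -> float:
        return 100.0 * count / n

    promoters = sum(1 for s in values if s >= 9)
    neutrals = sum(1 for s in values if 7 <= s <= 8)
    detractors = sum(1 for s in values if s <= 6)
    return {
        "pct_promoters": pct(promoters),
        "pct_neutrals": pct(neutrals),
        "pct_detractors": pct(detractors),
        "nps": pct(promoters) - pct(detractors),  # ranges from -100 to +100
    }


print(nps_breakdown([10, 10, 9, 8, 7, 6, 3, 10, 9, 2]))
# {'pct_promoters': 50.0, 'pct_neutrals': 20.0, 'pct_detractors': 30.0, 'nps': 20.0}
```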


4. In the dataset, the Importance and Performance survey questions use a Likert scale. Likert-scale results should be analyzed with a diverging stacked bar chart. However, the dataset in its current form does not readily allow users to build one.


Solution: To build a diverging stacked bar chart for the Likert-scale data, it is necessary to create four calculated fields. First, I created a calculated field that sums the positive responses and half of the neutral responses, which builds the right side of the chart. Second, I created a calculated field that sums the negative responses and half of the neutral responses, which builds the left side of the chart. Third, I created a calculated field for the percentage of negative responses. Lastly, I created a calculated field for the percentage of positive responses.
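The four fields described above can be sketched in Python for a single question's response counts. The sketch assumes a 7-point scale with 4 as the neutral midpoint. That assumption is for illustration only, so the neutral level is a parameter. The `segments` helper (a hypothetical name) shows one common Gantt-style way to lay out the chart. The first bar starts at minus the left-side percentage, and each subsequent bar starts where the previous one ends, so the neutral bar straddles zero.

```python
from typing import Dict, List, Tuple


def likert_fields(counts: Dict[int, int], neutral: int = 4) -> Dict[str, float]:
    """The four values needed for a diverging stacked bar (one question).

    Levels below `neutral` count as negative, levels above it as positive.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses")
    half_neutral = counts.get(neutral, 0) / 2
    negative = sum(c for level, c in counts.items() if level < neutral)
    positive = sum(c for level, c in counts.items() if level > neutral)
    return {
        "right_side": positive + half_neutral,  # positives + half of neutrals
        "left_side": negative + half_neutral,   # negatives + half of neutrals
        "pct_negative": 100.0 * negative / total,
        "pct_positive": 100.0 * positive / total,
    }


def segments(counts: Dict[int, int], neutral: int = 4) -> List[Tuple[int, float, float]]:
    """(level, start %, width %) for each level, laid end to end from the left edge."""
    total = sum(counts.values())
    start = -100.0 * likert_fields(counts, neutral)["left_side"] / total
    result = []
    for level in sorted(counts):
        width = 100.0 * counts[level] / total
        result.append((level, start, width))
        start += width
    return result


# Hypothetical counts for one question on an assumed 1-7 scale.
q = {1: 2, 2: 3, 3: 5, 4: 10, 5: 20, 6: 40, 7: 20}
print(likert_fields(q))
print(segments(q))
```

With these counts, the neutral segment runs from -5% to +5%, and the bar ends at +85%, which matches the right-side value.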

5. The dataset contains both Performance (similar to satisfaction) and Importance ratings, so I deemed it necessary to create a gap analysis visualization comparing Performance against Importance.
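To illustrate what a gap analysis computes (not necessarily how this dashboard does it), a common formulation takes, for each service, the mean Performance minus the mean Importance. A negative gap then means performance falls short of how important users consider the service. This formulation relies on averages, which the Likert guidance in the Motivation section cautions against, so treat it as one possible option. The sketch assumes long rows like the pivoted data, with hypothetical column names `Question`, `Importance`, and `Performance`, and with `None` for unanswered or not-applicable ratings.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def gap_by_question(rows: List[Dict[str, Optional[float]]]) -> Dict[int, float]:
    """Mean Performance minus mean Importance per question.

    Ratings of None (unanswered or not applicable) are skipped. Only
    questions with at least one rating of each kind are returned.
    """
    imp: Dict[int, List[float]] = defaultdict(list)
    perf: Dict[int, List[float]] = defaultdict(list)
    for r in rows:
        q = int(r["Question"])
        if r["Importance"] is not None:
            imp[q].append(r["Importance"])
        if r["Performance"] is not None:
            perf[q].append(r["Performance"])
    gaps = {}
    for q in sorted(set(imp) & set(perf)):
        gaps[q] = sum(perf[q]) / len(perf[q]) - sum(imp[q]) / len(imp[q])
    return gaps


# Hypothetical ratings for two questions.
rows = [
    {"Question": 1, "Importance": 7, "Performance": 5},
    {"Question": 1, "Importance": 6, "Performance": 6},
    {"Question": 2, "Importance": 5, "Performance": 6},
    {"Question": 2, "Importance": 5, "Performance": None},
]
print(gap_by_question(rows))  # {1: -1.0, 2: 1.0}
```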

Solution: 


Interactive Visualization

The interactive visualization can be accessed here: https://public.tableau.com/profile/tanny.lai#!/vizhome/TannyIS428IndividualAssignment/Story1?publish=yes

Technique | Purpose | Steps
Example | Example | Example
Example | Example | Example
Example | Example | Example

Analysis & Insights

Undergraduate Students

Postgraduate Students

Faculty

Staff