Difference between revisions of "IS428 AY2019-20T2 Assign LIM SI LING"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
Line 41: Line 41:
 
| Legend
 
| Legend
 
| The legend that contains the exact survey questions and the question code that they represent.
 
| The legend that contains the exact survey questions and the question code that they represent.
 +
|}
 +
=== Data Preparation ===
 +
The data collected from the survey had many missing values apart from those who have picked N/A for the question. Therefore, data with key areas missing are to be excluded from the data visualisation since they were no longer useful in providing insights.
 +
<br>To remove missing data, codes were written in python to determine which rows to remove.
 +
 +
{| class="wikitable" width="100%"
 +
|-
 +
! style="font-weight: bold;background: #57546E;color:#fbfcfd;width: 50%" | Screenshot
 +
! style="font-weight: bold;background: #57546E;color:#fbfcfd;" | Action taken and description
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:Import python.png|700px|center]]
 +
| The excel workbook "Raw data 2018-03-07 SMU LCS data file - KLG.xlsx" is imported into jupyter notebook.
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:2.png|700px|center]]
 +
| To make it easier to deal with the data, cells that are empty are filled with -1 as their values
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:6.png|700px|center]] <br>[[File:7.png|700px|center]]
 +
| Next, the criteria for which data to be removed is determined as follows:
 +
# Identifier questions like Campus, Position, StudyArea, HowOftenL, HowOftenC, HowOftenW and ID must not be empty at all costs, otherwise data is to be removed
 +
# If N/A is selected, then corresponding importance and performance must be 0, otherwise data is to be removed
 +
<br>The code will create a new column named" To_remove" indicating if the row of data is to be deleted. The column is added to the DataFrame and exported back to excel format and renamed "cleaned_survey_data.xlsx".
 +
<br>121 rows of data will be deleted.
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:8.png|300px|center]]
 +
| Legend is copied over to the newly exported dataset.
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:9.png|500px|center]]
 +
| The new excel workbook is then imported into Tableau.
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:5.png|700px|center]]
 +
| However, the dataset cannot be used directly, as the question codes are not linked to the actual question itself. Pivoting of the data set is required.
 +
<br>The survey question columns are highlighted (from I01 to NPS1) and pivoted.
 +
<br>'Pivot Field Names' is renamed to 'Q_code' and 'Pivot Field Values' to 'Rating'.
 +
|-
 +
| style="padding-top: 10px; padding-bottom: 10px;" | [[File:10.png|700px|center]]
 +
| Lastly, the survey questions have to be mapped to the question code.
 +
<br>This was done by dragging the Legend Sheet right next to SMU and join them by left outer join, keeping all the data rows in SMU. The two sheets are joint by their question code (Q_code in SMU and Code in Legend).
 
|}
 
|}

Revision as of 00:03, 15 March 2020

Overview

Every two years, SMU Libraries conduct a comprehensive survey in which faculty, students and staff can rate various aspects of SMU library's services. The survey results are an important feedback for SMU libraries to understand the quality of their services and facilities and identify key areas that needs improvement to better meet the needs of their users.

Objectives

I am tasked by SMU libraries to use visual analytics approach to reveal the level of services provided by them as perceived by:

  • the undergraduate students
  • the postgraduate students
  • the faculty
  • the staff

There was previously a report (Library Survey Report) generated. However, it consists of many separated graphs and charts which is too disjointed to be read properly. Additionally, the report incorrectly uses Mean as a measure. The survey questions are graded based on a 7 point Likert scale. The mean in a Likert scale can't be found because it is not possible to find an average of “agree”, “disagree”, and “neutral. Therefore, the goal is to produce a visualisation of the survey results by reducing the number of charts and providing dashboards to highlight performance of SMU libraries in terms of overall satisfaction, facilities, resources and services provided.

Survey Data

Description of Data

The survey data provided is a single excel workbook containing two sheets:

Sheetname Description
SMU The raw survey results containing survey respondents and each of their reply to the survey questions in the form of ratings.

The raw survey results have a total of 2639 rows or number of participants. There are a total of 89 column. However, there are missing data as well that needed to be removed. The attributes of the data can be divided into the following:

  1. Information on participants:
    • ResponseID (unique ID representing survey respondents)
    • Campus (which library do respondents use more)
    • Position (Type of user)
    • ID (international Student or not)
    • HowOftenL (frequency of visit to library)
    • HowOftenC (frequency of visit to Campus)
    • HowOftenW (frequency of accessing library resources)
  2. Survey Questions on quality of library facilities and services:
    • Divided into 3 types (Importance, Performance and N/A)
    • Respondents who picked 'N/A' for the question need not select a rating for both importance and performance.
    • There are 26 questions for importance and N/A and 27 questions for performance, represented by 78 different columns
  3. Free response questions:
    • Comment1 (suggestions for improvement)
Legend The legend that contains the exact survey questions and the question code that they represent.

Data Preparation

The data collected from the survey had many missing values apart from those who have picked N/A for the question. Therefore, data with key areas missing are to be excluded from the data visualisation since they were no longer useful in providing insights.
To remove missing data, codes were written in python to determine which rows to remove.

Screenshot Action taken and description
Import python.png
The excel workbook "Raw data 2018-03-07 SMU LCS data file - KLG.xlsx" is imported into jupyter notebook.
2.png
To make it easier to deal with the data, cells that are empty are filled with -1 as their values
6.png

7.png
Next, the criteria for which data to be removed is determined as follows:
  1. Identifier questions like Campus, Position, StudyArea, HowOftenL, HowOftenC, HowOftenW and ID must not be empty at all costs, otherwise data is to be removed
  2. If N/A is selected, then corresponding importance and performance must be 0, otherwise data is to be removed


The code will create a new column named" To_remove" indicating if the row of data is to be deleted. The column is added to the DataFrame and exported back to excel format and renamed "cleaned_survey_data.xlsx".
121 rows of data will be deleted.

8.png
Legend is copied over to the newly exported dataset.
9.png
The new excel workbook is then imported into Tableau.
5.png
However, the dataset cannot be used directly, as the question codes are not linked to the actual question itself. Pivoting of the data set is required.


The survey question columns are highlighted (from I01 to NPS1) and pivoted.
'Pivot Field Names' is renamed to 'Q_code' and 'Pivot Field Values' to 'Rating'.

10.png
Lastly, the survey questions have to be mapped to the question code.


This was done by dragging the Legend Sheet right next to SMU and join them by left outer join, keeping all the data rows in SMU. The two sheets are joint by their question code (Q_code in SMU and Code in Legend).