IS428 AY2019-20T2 Assign CHOY YU MIN JUSTIN

From Visual Analytics for Business Intelligence

Background and Motivation

SMU Libraries strives to achieve the following mission and vision:

Mission:

  • To enable a culture of life-long learning through collaboration, engagement and outreach. It aims to provide seamless access to information using innovative and leading edge technology. The Library is committed to delivering exceptional services and building dynamic relationships within the SMU community and beyond.

Vision:

  • To be a leading research library providing ubiquitous access to information using innovative strategies to drive intellectual exchange and the creation of knowledge.

As such, SMU Libraries has built its service offerings around four main areas: Communication, Facilities and Equipment, Information & Resources, and Service Delivery. These offerings cater to the Library's four main stakeholder groups: Faculty, Graduates, Staff, and Undergraduates (the biggest group). These stakeholders mainly belong to SMU's schools (Law, Information Systems, Social Sciences, Economics, Business (the largest), Accountancy, and Others).

To ensure that its offerings stay relevant and meet the demands of its stakeholders, SMU Libraries conducts a survey every two years to gather key performance indicators (KPIs) that guide improvements to its services. However, the survey data is high in dimensionality and largely subjective in nature. This makes it difficult to draw reliable insights, and careful work is needed to clean, process, and visualise the data. Given the richness of the 2018 Library Survey, it is all the more worthwhile to unpack the data carefully and visualise it so that SMU Libraries can draw key insights to improve its offerings.

Objectives

The objective of this project is to build a clear and truthful data visualisation that surfaces key insights about the Library's four main stakeholder groups:

  • The undergraduate students,
  • The postgraduate students,
  • The faculty,
  • The staff.


The project will be considered successful if users are able to:

  • Gain clear insight into how the library is doing overall.
  • Gain clear insight into how satisfied each stakeholder group is with each offering.
  • Identify, through individual survey questions, the specific areas that drive satisfaction or dissatisfaction.
  • See numeric scores backed up by relevant comments, giving more concrete feedback for actionable improvement.

Dataset Analysis

Raw Data Set

The data provided consisted of three files:

  1. Raw data 2018-03-07 SMU LCS data file - KLG.xlsx
  2. 2018-02-16 SMU Library Survey Comments MAC.xls
  3. SMULibraries_BeHeardSurvey_FullReport.pdf


Of these, I used only "Raw data 2018-03-07 SMU LCS data file - KLG.xlsx" (referred to as "RAW DATA" from here on), as it contained all the fields and data points needed for the analysis. Nonetheless, as the first step of the project, I reviewed all the provided documents and read through the BeHeardSurvey Full Report to get a baseline understanding of the fields used to evaluate the Library's performance. This gave me a clear picture of which fields I would need to pre-process for my data visualisation, along with some initial ideas of what the Library was doing well and not so well. The RAW DATA contains two sheets: the encoded raw data and a Legend that maps each code to its meaning. Because the encoding is not self-explanatory, I give a high-level summary of the relevant fields and a rough description of each.

Below is a breakdown of the key fields available in the RAW DATA:

No. Field Description
1. Campus An encoding of the library the respondent used most frequently. Options are 1: Li Ka Shing Library; 2: Kwa Geok Choo Law Library.
2. Position An encoding of the respondent's stakeholder group in fine detail (e.g. Year 1-4 undergraduate, Graduate: doctoral, etc.).
3. Study Area An encoding specific to graduates and undergraduates that represents their field of study (e.g. Accounting, Law).
4. Frequency of Visit Answers to the survey question "How frequently do you visit the library?". This field is key to gauging the "loyalty" of stakeholders.
5. Likelihood of recommendation Follows the common market practice of collecting data to calculate a Net Promoter Score (NPS). This field, with answers from 1 to 10, needs to be pivoted and pre-processed.
6. Communication Offering Survey Questions (Importance & Performance) Likert-scale responses from 1 (low) to 7 (high) for both importance and performance. They need to be pivoted and pre-processed to analyse the Library's performance in the Communication offering.
7. Facilities and Equipment Offering Survey Questions (Importance & Performance) Likert-scale responses from 1 (low) to 7 (high) for both importance and performance. They need to be pivoted and pre-processed to analyse the Library's performance in the Facilities and Equipment offering.
8. Information & Resources Offering Survey Questions (Importance & Performance) Likert-scale responses from 1 (low) to 7 (high) for both importance and performance. They need to be pivoted and pre-processed to analyse the Library's performance in the Information & Resources offering.
9. Service Delivery Offering Survey Questions (Importance & Performance) Likert-scale responses from 1 (low) to 7 (high) for both importance and performance. They need to be pivoted and pre-processed to analyse the Library's performance in the Service Delivery offering.
10. Comments Free-text responses. These should be made accessible in the data visualisation where relevant, to help users go deeper than a purely numeric evaluation of opinion.
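Field 5 feeds a Net Promoter Score. As a rough sketch, assuming the conventional NPS bands (9-10 promoters, 7-8 passives, 6 or below detractors; this survey's scale simply starts at 1 rather than 0), the score is the percentage of promoters minus the percentage of detractors. The function names and the (group, rating) input shape below are hypothetical, chosen only for illustration, and are not taken from the actual analysis:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def net_promoter_score(ratings: Iterable[Optional[int]]) -> float:
    """Return the Net Promoter Score (-100 to 100) for 1-10 ratings.

    Assumes the conventional NPS bands: 9-10 are promoters, 7-8 are
    passives, and 6 or below are detractors. Blank answers (None) are ignored.
    """
    answered = [r for r in ratings if r is not None]
    if not answered:
        raise ValueError("no ratings to score")
    promoters = sum(1 for r in answered if r >= 9)
    detractors = sum(1 for r in answered if r <= 6)
    return 100.0 * (promoters - detractors) / len(answered)


def nps_by_group(rows: Iterable[Tuple[str, Optional[int]]]) -> Dict[str, float]:
    """Compute NPS separately for each stakeholder group.

    `rows` is a hypothetical list of (group, rating) pairs, one per respondent.
    Groups whose respondents all left the question blank are omitted.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for group, rating in rows:
        if rating is not None:
            grouped[group].append(rating)
    return {group: net_promoter_score(r) for group, r in grouped.items()}
```

For example, `nps_by_group([("Faculty", 10), ("Faculty", 4), ("Staff", 9)])` gives Faculty a score of 0 (one promoter cancelled out by one detractor) and Staff a score of 100. Computing the score per group, rather than only overall, matches the objective of comparing satisfaction across stakeholder groups.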

Dataset Transformation

Because the data was split across two sheets (1. Encoded raw data, 2. Legend mapping), combining it into one clean data source was a major challenge that required a mix of pivot tables and calculated fields. During data cleaning, I also realised that some of the visualisations I wanted would require carefully performed inner joins, so that the relevant dependencies existed for the later analysis. Lastly, many questions were answered on a Likert scale, which means the responses are subjective opinions. It would therefore not be accurate to keep the Library's existing mapping, which treats the answers as an ordinal ranking, so I re-mapped all the Likert answers into categorical fields. These steps are outlined below, together with the specific problems and solutions for each.
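The decode-then-pivot steps were done in Excel with pivot tables and calculated fields. The Python below is only an illustrative equivalent, not the method actually used. It assumes a hypothetical legend structure (field name mapped to {code: label}) and hypothetical column names (`RespondentID`, `COMM1_Imp`, `COMM1_Perf`). It decodes each respondent's codes, drops respondents whose codes have no legend entry (the effect of an inner join), and then unpivots the wide question columns into one row per answer, which is the long shape Tableau works with most easily.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Hypothetical legend in the spirit of the "Legend" sheet: field -> {code: label}.
# The Campus codes follow the field table above; the Position codes are made up.
LEGEND: Dict[str, Dict[int, str]] = {
    "Campus": {1: "Li Ka Shing Library", 2: "Kwa Geok Choo Law Library"},
    "Position": {1: "Undergraduate", 2: "Graduate", 3: "Faculty", 4: "Staff"},
}


def decode(row: Row, legend: Dict[str, Dict[int, str]]) -> Optional[Row]:
    """Replace coded values with their labels.

    Returns None when a coded field is missing or has no legend entry,
    mirroring an inner join: unmatched respondents are dropped.
    """
    decoded = dict(row)
    for field, mapping in legend.items():
        code = row.get(field)
        if code not in mapping:
            return None
        decoded[field] = mapping[code]
    return decoded


def unpivot(row: Row, id_fields: List[str], value_fields: List[str]) -> List[Row]:
    """Turn one wide row (one column per question) into one row per answered question."""
    ids = {f: row[f] for f in id_fields}
    return [
        {**ids, "Question": q, "Score": row[q]}
        for q in value_fields
        if row.get(q) is not None  # skip blank answers
    ]


def to_long(rows: List[Row], legend: Dict[str, Dict[int, str]],
            id_fields: List[str], value_fields: List[str]) -> List[Row]:
    """Decode every row, drop unmatched rows, then unpivot the survey questions."""
    long_rows: List[Row] = []
    for row in rows:
        decoded = decode(row, legend)
        if decoded is None:
            continue
        long_rows.extend(unpivot(decoded, id_fields, value_fields))
    return long_rows
```

For example, given two hypothetical respondents where the second has an unmapped campus code of 9, `to_long` returns two long rows (one per question) for the first respondent only.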

Handling Null Data

Explain handling null

Handling Likert (ordinal to categorical)

explain ...
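The cut points used in the actual analysis are not documented in this section. The sketch below assumes a hypothetical three-way split (1-3 Low, 4 Neutral, 5-7 High) purely to illustrate how a 1-7 Likert score, for either importance or performance, can be re-mapped from an ordinal rank into a categorical field, and how the share of each category can then be counted. Keeping blank answers as a separate "No response" category is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def likert_category(score: Optional[int]) -> str:
    """Map a 1-7 Likert score to a categorical label.

    The cut points (1-3 Low, 4 Neutral, 5-7 High) are hypothetical and
    chosen only for illustration.
    """
    if score is None:
        return "No response"
    if not 1 <= score <= 7:
        raise ValueError(f"Likert score out of range: {score}")
    if score <= 3:
        return "Low"
    if score == 4:
        return "Neutral"
    return "High"


def category_shares(scores: Iterable[Optional[int]]) -> Dict[str, float]:
    """Return the fraction of responses falling into each category."""
    counts = Counter(likert_category(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Reporting category shares instead of a mean of the codes avoids treating the gaps between Likert points as equal, which appears to be the concern behind moving away from the ordinal mapping.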

Interactive Visualization

The interactive visualization can be accessed here: https://public.tableau.com/profile/justin.choy#!/vizhome/JustinChoySMULibrarySurveyAnalysis/SMULibrary2018Storyboard

Analysis & Insights

Undergraduate Students

Postgraduate Students

Faculty

Staff

Conclusion & Future Work