Difference between revisions of "IS428 AY2019-20T2 Assign PEH ANQI"
Line 21: | Line 21: | ||
The dataset contained of 88 columns. 7 of them provide basic information of the respondents, their position, Studyarea, frequency of them visiting the library, etc. 2 of them are on respondents satisfaction level and likelihood of recommending the library to others. 78 of them are likert scale questions where respondents have to rank different services provided by the library by importance of the services and how well the library perform in providing these services. Lastly, 1 of them is a free text containing additional comments from the respondents. | The dataset contained of 88 columns. 7 of them provide basic information of the respondents, their position, Studyarea, frequency of them visiting the library, etc. 2 of them are on respondents satisfaction level and likelihood of recommending the library to others. 78 of them are likert scale questions where respondents have to rank different services provided by the library by importance of the services and how well the library perform in providing these services. Lastly, 1 of them is a free text containing additional comments from the respondents. | ||
+ | ===Data cleaning in excel=== | ||
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
Line 44: | Line 45: | ||
|The column name for the likert scale questions on importance and performance is represented by a code. Eg: I01, P01, NA01, etc. This makes it difficult to know what each column represents. Hence, I used the "HLOOKUP" function in excel to get the column code with column name by referring to the legend worksheet. | |The column name for the likert scale questions on importance and performance is represented by a code. Eg: I01, P01, NA01, etc. This makes it difficult to know what each column represents. Hence, I used the "HLOOKUP" function in excel to get the column code with column name by referring to the legend worksheet. | ||
|valign="top"| | |valign="top"| | ||
− | |||
− | |||
|- | |- | ||
|} | |} |
Revision as of 15:38, 15 March 2020
Contents
Overview
Singapore Management University have two libraries, the Li Ka Shing Library and the Kwa Geok Choo Law Library. Every two years, the libraries would conduct a comprehensive survey in which faculty, students and staff have the opportunity to rate various aspects of SMU library's services. The survey provides SMU libraries with inputs to help enhance existing services and to anticipate emerging needs of SMU faculty, students and staff.
The past survey reports were mainly made-up of pages of tables, which are very difficult to comprehend. Hence, the task is to create interactive data visualisation to transform these tables into visual representation that allow SMU libraries to gain useful insights, and reveal the level of services provided by SMU libraries as perceived by the following stakeholders:
- Faculty
- Undergraduate students
- Postgraduate students
- Staff
Data Preparation
About the Data
The 2018 library survey data is used for this assignment. Two excel data was provided, Raw data 2018-03-07 SMU LCS data file - KLG.xlsx and the 2018-02-16 SMU Library Survey Comments MAC.xls. The 2018-02-16 SMU Library Survey Comments MAC.xls file consist of comments provided by each respondents which is also available in the Raw data 2018-03-07 SMU LCS data file - KLG.xlsx file. Hence, I will be using only the Raw data 2018-03-07 SMU LCS data file - KLG.xlsx file.
A total of 2639 responds was collected from the 2018 Survey. However 1 respondent, ID 833, only answered 4 questions out of a total of 87 and did not provide any information on the respondent position, campus and study area. Hence respondent 833's input will be omitted from the visualization.
The dataset contained of 88 columns. 7 of them provide basic information of the respondents, their position, Studyarea, frequency of them visiting the library, etc. 2 of them are on respondents satisfaction level and likelihood of recommending the library to others. 78 of them are likert scale questions where respondents have to rank different services provided by the library by importance of the services and how well the library perform in providing these services. Lastly, 1 of them is a free text containing additional comments from the respondents.
Data cleaning in excel
1. Remove respondent 833 |
Respondent with responseID 833 is removed as identification attributes are blank (Campus, Position, StudyArea), the respondent also only answered 4 out of the 88 questions. Hence I decided to omitted this respondent from the visualization. | |
2. Replace column Campus, Position, StudyArea and ID's numerical value |
The values in the excel file are represented by numerical values, making it difficult to interpret the data without referring to the legend. Hence, I used the “LOOKUP” function in excel to get the values for Campus, Position, StudyArea and ID field by referring to the legend worksheet. | |
3. Replace column name for likert scale questions |
The column name for the likert scale questions on importance and performance is represented by a code. Eg: I01, P01, NA01, etc. This makes it difficult to know what each column represents. Hence, I used the "HLOOKUP" function in excel to get the column code with column name by referring to the legend worksheet. | |
4. Pre-processing Comments |
The column name for the likert scale questions on importance and performance is represented by a code. Eg: I01, P01, NA01, etc. This makes it difficult to know what each column represents. Hence, I used the "HLOOKUP" function in excel to get the column code with column name by referring to the legend worksheet. |