ISSS608 2016-17 T1 Project Team 3 Report
Motivation
University rankings are always a consideration for prospective applicants. There is a certain prestige in a better ranking, and universities generally work towards improving their standing. According to Kaggle, "Ranking universities is a difficult, political, and controversial practice."
Through this visual analytics project, we look to analyse the worldwide distribution of these universities, their attributes and their rankings over time, and seek to draw insights from them.
This would also help parents and prospective applicants research and make informed decisions when selecting their dream universities for higher education. Three different sets of university rankings are used in these visualizations, so the project can serve as a one-stop shop, sparing users from scurrying between different websites and resources.
Review and Critique of Past Works
In the paper by Gratzl, Lex, Gehlenborg, Pfister and Streit, an open-source application called LineUp is presented as an interactive technique for multi-attribute ranking analysis. In this tool, users can create their own university rankings by assigning their own weighting to each attribute's contribution to the overall ranking. While this is an interesting way,
Design Framework
Overview
For this project, we focused on visualizing the rankings and attributes from three data sources, using a variety of tools, namely Tableau, Qlik Sense and D3.js. Each data source is explored individually in Tableau and Qlik Sense to gain insights into that dataset. Tableau is used to create geographical visualizations and dashboards. Qlik Sense also offers dynamic visualizations and supports D3.js extensions.
D3.js supports dynamic, interactive visual representations that are easy to access in a web browser. The focus here is on making all interactive features available for all three datasets in D3.js, for ease of comparison.
Data Source & Data Preparation
These three datasets provide similar, yet distinct, insights into the overall score and world rank of each university, with records spanning multiple years. Other time-variant attributes of the universities are also available.
The Shanghai data does not include country information, so time was spent adding it. Region information was also populated for all three datasets, as it was not available in the raw data.
Data cleaning is conducted to ensure consistency within each dataset, especially where university names change over time. Ranking is converted to numeric values and recoded where necessary. The CWUR, Shanghai and Times data are prepared in CSV format, to be read in and parsed by D3.js.
Where a visualization compares sources, data from the three datasets are combined by university.
The diagram on the left depicts the data dictionaries of the three datasets.
Their attributes show the parameters measured and the universities' respective rankings across the years.
The table on the left groups similar indicators from the three ranking systems under each criterion (measure). This provides a common baseline for comparing these measures across the three data sources.
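The rank recoding and by-university merge described above could look roughly like the Python sketch below. The rank formats it accepts ("=15" for ties, "101-150" for rank bands), the choice to keep a band's lower bound, the hypothetical `NAME_ALIASES` map for renamed universities, and the field names `university` and `world_rank` are all assumptions, not the project's actual cleaning rules.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical alias map: variant or old university names -> one canonical name.
NAME_ALIASES: Dict[str, str] = {
    "University A (old name)": "University A",
}

def parse_rank(raw: str) -> Optional[int]:
    """Convert a raw rank string to an integer (assumed formats).

    "15" -> 15, "=15" -> 15 (tie marker), "101-150" -> 101 (band: keep lower bound).
    Returns None when the value is blank or unparseable.
    """
    text = raw.strip().lstrip("=")
    match = re.fullmatch(r"(\d+)(?:\s*[-\u2013]\s*\d+)?", text)
    return int(match.group(1)) if match else None

def combine_by_university(
    sources: Iterable[Tuple[str, Iterable[dict]]]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Merge (source_name, rows) pairs into {university: {source_name: rank}}."""
    combined: Dict[str, Dict[str, Optional[int]]] = {}
    for source_name, rows in sources:
        for row in rows:
            name = row["university"].strip()
            name = NAME_ALIASES.get(name, name)  # harmonise renamed universities
            combined.setdefault(name, {})[source_name] = parse_rank(row["world_rank"])
    return combined
```

For example, a row for "University A (old name)" ranked "=15" in one source and "University A" ranked "12" in another would merge into the single entry `{"University A": {"cwur": 12, "times": 15}}`, which is the shape a side-by-side comparison needs.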
Design Principles Used
World university rankings are listings of higher education institutions ordered by a combination of indicators. Rankings in the form of numbers and scores are rather hard to comprehend, so there is a need to represent the data in a more understandable visual format. In this project, we adopted Shneiderman’s well-known mantra as our design guideline for visual data analysis. The mantra, “Overview first, zoom and filter, then details-on-demand”, describes how data should be presented to be most effective for users.
Overview First
Overview provides a general context for understanding the dataset; it paints a “picture” of the whole data entity that the visualization represents. It is the first thing a viewer sees in the dashboard, and it guides the viewer to other parts of the visualisation for further exploration.
For example, a choropleth map is used to visualize the number of universities in each geographical area, showing variations across locations at a glance.
Zoom and filter
Zooming and filtering both reduce the complexity of the data representation by removing extraneous information from view, using interactive features such as filtering, highlighting, drill-down and range selectors. For example, zooming may mean drilling down from global to country-specific ranking data, while filtering may mean excluding information in a specific year range.
Details-on-demand
Details-on-demand presents detailed information through a simple action, such as a mouse-over that shows a tooltip. It should allow users to dig all the way down to the finest details.
Tools Used & Data Visualisation Elements
Tableau, QlikSense and D3.js are used for visualization, while JMP is used during data preparation.
Tableau Visualization
Choropleth Map
Across all three datasets, the choropleth map lets one see easily where the universities are, and it can be filtered by year, region, country and rank. This powerful view is common to the visualizations of all three datasets.
The CWUR map shows that the majority of the global top 100 universities are located in the United States.
CWUR Ranking System
The CWUR data source has 8 key indicators that affect the ranking: Quality of Education, Alumni Employment, Quality of Faculty, Publications, Influence, Citations, Broad Impact and Patents. It is important to understand the relationships between these heterogeneous attributes, which affect the rankings of universities.
The choice of visualisation objects is therefore important for multi-attribute (multivariate) ranking analysis. In this section, three dashboards are presented, with discussion of why certain visualisation objects were selected. In addition, a story board was created in Tableau to enhance the user interaction experience. The story board contains a sequence of linked dashboards that convey information, provide context and demonstrate how one visual analysis relates to another.
Ranking Overview Dashboard
Heat Map
A heat map is a visual representation that uses variations in color and/or size to encode a quantitative variable. The heat map on the left-hand side reveals the number of universities in each ranking category for each country, with the count encoded as both color and size in the matrix. At this level, the heat map gives a rough sense of how many universities each country has. As seen in the heat map, the top 20 universities for 2015 are located in the USA and UK. Even though China has many universities in the ranking list, their median world rank is about 750, and its best-ranked university is at 56 (the lower whisker in the box plot).
The table view on the right-hand side gives a details-on-demand view when the user zooms in to a selected country or ranking category in the heat map.
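Behind the heat map, each cell is a count of universities per country and ranking category; Tableau then applies the color and size encodings. The Python sketch below shows one way to prepare those counts. The category boundaries (1-20, 21-100, and so on) are hypothetical, since the report does not list the categories used.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical rank categories; the dashboard's actual bands are not documented.
UPPER_BOUNDS = [20, 100, 200, 500, 1000]
LABELS = ["1-20", "21-100", "101-200", "201-500", "501-1000"]

def rank_category(rank: int) -> str:
    """Map a world rank (1 = best) to its category label."""
    if rank < 1:
        raise ValueError("ranks start at 1")
    idx = bisect_left(UPPER_BOUNDS, rank)  # index of the first bound >= rank
    return LABELS[idx] if idx < len(LABELS) else ">1000"

def heatmap_counts(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Count universities per (country, rank category) cell of the heat map."""
    return Counter((country, rank_category(rank)) for country, rank in rows)
```

Using `bisect_left` on sorted upper bounds keeps the lookup correct at the boundaries: rank 20 falls in "1-20" and rank 21 in "21-100".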
Ranking Differences in Certain Countries
Box and Whisker Plot
This statistical graph shows the distribution of rankings among all universities in each country. In a box and whisker plot, the ends of the box are the upper and lower quartiles, so the box spans the interquartile range, and the median is marked by a vertical line inside the box. Each dot in the box plot is one university in that country. For example, the box plot reveals that the USA has many universities, with a median ranking of 300. Singapore has two universities: NTU at the upper whisker, ranked #135, and NUS at the lower whisker, ranked #65.
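The box plot statistics described above can be sketched in Python as follows. Quartiles come from `statistics.quantiles` with the "inclusive" method, which may interpolate differently from Tableau, and the whiskers follow the common 1.5 x IQR rule; both are assumptions about how the dashboard computes them.

```python
import statistics
from typing import Dict, List

def box_stats(ranks: List[float]) -> Dict[str, float]:
    """Quartiles, median and whisker ends for one country's world ranks.

    Whiskers (assumed 1.5 x IQR rule) end at the most extreme ranks that
    still lie inside the fences; anything beyond would be drawn as an outlier.
    """
    data = sorted(ranks)
    if not data:
        raise ValueError("need at least one rank")
    if len(data) == 1:  # quantiles() needs at least two points
        v = data[0]
        return {"q1": v, "median": v, "q3": v, "low_whisker": v, "high_whisker": v}
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_whisker": min(r for r in data if r >= low_fence),
        "high_whisker": max(r for r in data if r <= high_fence),
    }
```

With the two Singapore ranks from the text, `box_stats([65, 135])` gives a median of 100.0 and whiskers at 65 and 135, matching the NUS and NTU positions described above.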
Relationship between Ranking vs Indicators
Scatter Plot for Indicators Analysis
A scatter plot displays the correlation between a pair of variables. With 8 indicators (variables), there are 8 pairs of indicator versus world rank. These scatter plots are organized into a matrix, making it easy to look at all eight relationships at once. A lower rank number means a better score in that indicator. Some interesting observations can be seen in this dashboard:
- Among all indicators, Broad Impact stands out in showing a linear relationship with World Rank.
- Quality of Education and Quality of Faculty score fairly well compared with the other indicators: the highest (worst) rank number is below 250 for Quality of Faculty and below 400 for Quality of Education.
- Alumni Employment, Citations, Influence and Patents show scattered patterns against World Rank.
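Because both axes in these scatter plots are ranks, one way to put a number on each visual impression is Spearman's rank correlation between an indicator's rank and World Rank. This is our illustrative choice, not necessarily what the Tableau dashboard computes. The sketch below is plain Python, with tied values given average ranks; the indicator names in `indicator_correlations` are whatever columns are passed in.

```python
from math import sqrt
from typing import Dict, List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based position of the tie run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def indicator_correlations(indicators: Dict[str, Sequence[float]],
                           world_rank: Sequence[float]) -> Dict[str, float]:
    """Spearman's rho between each indicator's rank column and World Rank."""
    return {name: spearman(col, world_rank) for name, col in indicators.items()}
```

A value near +1 corresponds to the tight diagonal band seen for a linear relationship, while values near 0 correspond to the scattered patterns.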
 
SHANGHAI ARWU
The strength of the Shanghai ARWU dataset lies in its 11 years of records, the most among the three datasets. However, some attributes have missing data, and imputing values would make no sense, as it would further distort the view of the rankings. The missing data tend to come from lower-ranked universities. The results are taken as is, with the knowledge that missing attribute data may leave a university's standing not fully represented. Nonetheless, we tap on this strength and seek to gain insights from the time-series information.
Overview
Following the "Overview First" approach, a choropleth map, bar chart, circle view and side-by-side circle view provide a quick overview of the spread of universities and their rankings.



 
Line Graph
Top overseas destinations are selected for this visualization, namely the US, UK, Australia (AU) and Japan (JP), which are popular countries for Singapore residents pursuing further studies. This visualization shows how these countries rank and what some of their top universities are. Filtering is important here, as one wants to focus on the key top-ranking universities.
Line Graph
One would also be keen to understand how universities in Singapore rank. Here, we see NUS consistently ranking strongly, while NTU has made tremendous strides forward.
Scatter Plot
This scatter plot compares the attributes with Rank using the R-square of each factor against rank, comparing the Top 25 and Top 50 universities. The R-square of each factor changes with the rank range: Award is highly correlated with ranking in the Top 25, while HICI is highly correlated with ranking in the Top 50.
Scatter Plot
This scatter plot compares the attributes with Rank using the R-square of each factor against rank, comparing the Top 50 and Top 100 universities. The R-square of each factor again changes with the rank range; for both the Top 50 and Top 100, HICI is the most highly correlated factor.
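The report does not say how the R-square values were computed. A common choice, assumed in the Python sketch below, is the coefficient of determination of an ordinary least-squares line fitted between one factor and rank, restricted to universities ranked within the top N. The field names (`rank`, and factor columns such as `HICI`) are assumptions about the prepared data.

```python
from typing import List, Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R-square is undefined when x or y is constant")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy

def r_squared_top_n(rows: List[dict], factor: str, top_n: int) -> float:
    """R-square of one factor against rank, using only universities ranked <= top_n."""
    subset = [row for row in rows if row["rank"] <= top_n]
    return r_squared([row[factor] for row in subset], [row["rank"] for row in subset])
```

Calling `r_squared_top_n` with `top_n=25`, then 50, then 100 for each factor gives the kind of comparison described above. For a single factor this value equals the square of Pearson's correlation coefficient, so it measures the strength of a linear relationship but not its direction.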
Scatter Plot
This scatter plot compares the attributes with one another, including Rank, for Alumni, Award and HICI. HICI shows a stronger positive trend than Alumni and Award.
Scatter Plot
This scatter plot compares the attributes with one another, including Rank, for NS, PCP and PUB.
TIMES Ranking System
Introduction to the 4 dashboards.
Dashboard 1 - Heatmap of World Universities
Heat Map
This dashboard allows users to efficiently identify the distribution of world universities by region, country, rank and year. The four heat maps in this dashboard depict, for each country, the number of universities, the average percentage of international students, the average student-to-staff ratio and the average percentage of female students. The greater the colour intensity, the greater the value in each heat map. Users can also filter the dashboard by the three regions, America (AMEA), Europe & Middle East (EMEA) and Asia & Pacific (APAC), to zoom into the distributions by region.
Users interested in individual countries can type a name into the country filter to look at the distribution for the country or countries of interest. Users can also restrict the distributions to a world rank range (such as the top 100) using the filter on the right.
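Each colored cell in these heat maps is a per-country average. A minimal Python sketch of that aggregation follows. It assumes raw values arrive as strings such as "25%" for the international-student share and "33 : 67" (female : male) for the gender split; these formats and the field names are assumptions about the raw file rather than something the report states. Missing values are skipped rather than imputed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

def parse_percent(raw: str) -> Optional[float]:
    """'25%' -> 25.0; blank or malformed -> None (assumed raw format)."""
    try:
        return float(raw.strip().rstrip("%"))
    except ValueError:
        return None

def parse_female_share(raw: str) -> Optional[float]:
    """'33 : 67' (female : male) -> 33.0; anything else -> None (assumed format)."""
    parts = raw.split(":")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None

def country_means(rows: Iterable[dict], field: str) -> Dict[str, float]:
    """Average one numeric field per country, skipping missing (None) values."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        sums[row["country"]] += value
        counts[row["country"]] += 1
    return {country: sums[country] / counts[country] for country in sums}
```

A country whose values are all missing simply gets no cell, which keeps an imputed average from being shown as if it were data.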
Dashboard 2 - Scoring Criteria Analysis
Correlation Plot
This dashboard allows users to review how each criterion used in computing the total score correlates with the total score in the Times Higher Education study. Users can toggle between regions, years and world ranks. The Y-axis of the correlation plot is fixed at Total Score, and users can switch between criteria using the "Selected" filter.
Users can also review the average total score and the average of each individual scoring criterion by country and region.
Qlik Sense Visualization
Ranking Summary for Top 100 Universities (CWUR)
An area chart, bar chart and table allow one to compare World Rank based on the selected year and country.
As seen in the bar chart, there is some variation in ranking between 2014 and 2015.
Indicators Analysis (for Top 100 Rank)
The radar chart, scatter plot and chord diagram allow one to analyse and compare the multi-attribute rankings of different universities.
Zooming into the selected university in Paris, its world rank is within the top 50; however, it does not seem to fare well in two of its indicators (Patents and Citations), where its rank numbers are relatively high compared with its other indicators.
D3.js Interactive Visualization
- Tab: Ranking Overview (Stacked Graph)
1) Which are the top universities?
The stacked scatter gives users an overview first of the Top 50 universities as listed in the three sources.
- Tab: Indicator Analysis (Scatter Plot)
2) What are the parameters used and the correlation amongst them?
The scatter plot shows the correlation between a pair of variables, based on the user's choice of indicator for the x-axis and y-axis. It gives users an appreciation of the parameters used, their correlation with one another and their correlation with world rank.
For example, as seen in the diagram, the scatter plot shows the pairwise ranking relationship between Quality of Education and Quality of Faculty in 2015 (CWUR data source). The bubble colors group the universities by ranking category. For universities ranked within the top 20 (yellow bubbles), the variation between the two indicator rankings tends to be much smaller. As the world rank number increases, the variation in the pairwise relationship widens.
- Tab: Comparison Analysis (Time Series Line Graph)
3) How do they compare?
Users can use this tool for a quick comparison between universities of interest. Users choose one of the three datasets and compare two universities at a time. This tab contains six mini line graphs, each comparing one of six attributes (including rank) over time (years).
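The data behind the six mini line graphs can be sketched as one year-ordered series per attribute per university. The Python below assumes each row carries `university`, `year` and the attribute columns; those names, and the university labels used when trying it out, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# attribute -> university -> [(year, value), ...]
Series = Dict[str, Dict[str, List[Tuple[int, float]]]]

def comparison_series(rows: Iterable[dict], uni_a: str, uni_b: str,
                      attributes: List[str]) -> Series:
    """For each attribute, return {university: [(year, value), ...]} in year order.

    Only the two selected universities are kept; a year with a missing value
    for an attribute is left out of that attribute's series.
    """
    series: Series = {attr: {uni_a: [], uni_b: []} for attr in attributes}
    for row in sorted(rows, key=lambda r: r["year"]):
        if row["university"] not in (uni_a, uni_b):
            continue
        for attr in attributes:
            value = row.get(attr)
            if value is not None:
                series[attr][row["university"]].append((row["year"], value))
    return series
```

Each inner list maps directly onto one line in one mini graph, so switching datasets only means passing a different set of rows.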
- Tab: Parallel (Parallel Coordinates Plot)
4) What are the strengths and weaknesses?
The parallel coordinates plot is used to analyze multivariate data, with one vertical axis for each indicator in the dataset. This interactive visualisation technique displays the scores across multiple heterogeneous attributes.
Once users have shortlisted universities of interest, they can use this parallel plot to review their strengths and weaknesses. Users can interactively brush over a particular university of interest and explore its score for each attribute in a single view, which shows which attributes (e.g. research, teaching, citations) are the university's strengths and weaknesses.
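In a parallel coordinates plot, each attribute gets its own vertical axis and scale, and each university is drawn as a polyline across those axes; brushing an axis keeps only the lines whose value falls inside the brushed range. The Python sketch below mirrors that logic with per-axis min-max scaling. It is a stand-in to illustrate the technique, not the project's D3 code, and the field names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

def axis_scales(rows: List[dict],
                attributes: List[str]) -> Dict[str, Callable[[float], float]]:
    """One linear scale per axis, mapping that attribute's [min, max] onto [0, 1]."""
    scales: Dict[str, Callable[[float], float]] = {}
    for attr in attributes:
        values = [row[attr] for row in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid dividing by zero on a constant axis
        # Default arguments bind this axis's lo/span inside the lambda.
        scales[attr] = lambda v, lo=lo, span=span: (v - lo) / span
    return scales

def polyline(row: dict, attributes: List[str],
             scales: Dict[str, Callable[[float], float]]) -> List[Tuple[int, float]]:
    """(axis index, scaled position) points for one university's line."""
    return [(i, scales[attr](row[attr])) for i, attr in enumerate(attributes)]

def brush(rows: List[dict], attr: str, low: float, high: float) -> List[str]:
    """Universities whose value on one axis lies inside the brushed range."""
    return [row["university"] for row in rows if low <= row[attr] <= high]
```

Scaling each axis independently is what lets attributes with very different ranges share one chart; a strength shows up as a polyline near the top of an axis.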
- Tab: Tree (Tree Plot)
5) Summary
Users can use this collapsible tree to view a summary of university attributes by region, country and year.
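A collapsible tree needs the flat table nested into a hierarchy. The Python sketch below builds region, then country, then year, then university nodes in the `{"name": ..., "children": [...]}` shape; `d3.hierarchy` reads a `children` array by default, so JSON in this shape could be loaded in the browser. The field names and the root label "World" are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def build_tree(rows: Iterable[dict], root_name: str = "World") -> dict:
    """Nest flat rows into region -> country -> year -> university nodes."""
    nested: Dict[str, Dict[str, Dict[int, List[str]]]] = defaultdict(
        lambda: defaultdict(lambda: defaultdict(list))
    )
    for row in rows:
        nested[row["region"]][row["country"]][row["year"]].append(row["university"])

    def leaves(universities: List[str]) -> List[dict]:
        return [{"name": name} for name in sorted(universities)]

    # Children are sorted at every level so the tree renders in a stable order.
    return {
        "name": root_name,
        "children": [
            {"name": region, "children": [
                {"name": country, "children": [
                    {"name": str(year), "children": leaves(unis)}
                    for year, unis in sorted(years.items())
                ]}
                for country, years in sorted(countries.items())
            ]}
            for region, countries in sorted(nested.items())
        ],
    }
```

Attributes such as scores could be attached to each leaf dictionary alongside `name` without changing the nesting.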
Use Cases
The ranking of universities the world over has always been a topic of great interest. Rankings are used by students to find the best university for their studies, by faculty to choose the right university to teach at, and by university management in decision making.
There are two possible use cases:
Use Case #1 - University Management's Perspective
Ranking methodologies can never perfectly represent the quality of an institution; rankings are only meant as a guideline for students, faculty and university administration. At the same time, they provide valuable insight into where a student is placed among his/her competitors around the world, which universities provide the best career prospects for teaching staff, and in which fields a university is lacking.
The essence of these rankings lies in aggregating data provided by a large number of universities around the world. There is thus a need to grasp the required information quickly without performing tedious calculations, which makes a graphical representation well suited to quickly assessing a university in different fields of study.
Use Case #2 - Prospective Student's Perspective
The D3.js platform allows both parties to review a university's standing through the years and its areas of strength and weakness relative to competing universities. In particular, for a prospective student deciding between two universities, the comparison tab offers a comparison of their rankings and relative strengths and weaknesses in each area.
Discussion
<< Will be added after Poster Presentation with inputs given by the visitors >>
Future Work
Due to time constraints and technical limitations, some ideas could not be implemented; these can be included in future enhancements.
- Inclusion of the distribution of universities on a choropleth map in the D3 Ranking Overview
- Inclusion of an option to choose the country and university in the D3 parallel coordinates plot and scatter plot, for a more granular view
- Inclusion of a highlight feature that tracks the selected node (university) as its attributes move through time in the D3 Indicator Analysis scatter plot
- Inclusion of a zoom-in feature on the choropleth map to allow users to zoom in to a university's location. This requires adding each university's latitude and longitude to the dataset.
- Inclusion of a ternary chart for further insight discovery through simultaneous use of three variables.
Guides
Installation Guide
To run and use the application, please follow these steps.
(1) D3.js package
- Download the D3 compressed package from the link provided below.
- D3.js Installation [[1]]
 
- After unzipping the package, launch the ‘index.html’ file.
(Note: Mozilla Firefox is the preferred browser).
(2) Tableau & QlikSense package
- Tableau & Qlik Sense Installation [[2]]
 
User Guide
After the ‘index.html’ file is launched in Mozilla Firefox, five tabs are displayed. Switch between the tabs to navigate through the different visualizations.