Arisaig Final Progress

From Analytics Practicum
Revision as of 16:47, 20 April 2015 by Kr.tan.2011 (talk | contribs)


ANALYSIS APPROACH & CHALLENGES FACED

Data Understanding & Representation
By its nature, the project involves multi-dimensional data. The main categories of data used are:

  • Spatial (geographical) data
  • Temporal (time-series) data

This creates challenges both in understanding how the different kinds of data interact and in representing the patterns and relationships found within them. The following parts address these main challenges:

  1. Visual Representation of Spatial Data
  2. Visual Representation of Temporal Data
  3. Visual Representation of High Dimensional Data
  4. Interactivity of Multi-Linked Views
  5. Context Provision

Visual Representation of Spatial Data

Spatial data offers the opportunity to show how geographical proximity may create inter-dependencies between countries, giving investors better insights for decision-making. The difficulty lies in representing this information so that it remains comprehensible to budding analysts and investors. To meet this need, the project implemented a Choropleth map for the visualisation of spatial data.

The Choropleth map provides an overall view of how the various countries and regions perform with respect to the variable in question. Its limitation is that, because it conveys country rankings through colour intensity, it can show only one variable at a time (univariate). Its usefulness for representing such data can be improved by integrating a zoomable user interface (ZUI), which allows the user to zoom in on a selected region for a focused view of the visualisation. In addition, the map becomes more interactive when the accompanying graphs respond to user clicks and selections on the map, so that the user views only the information that is of interest. Together, these design choices supplement the analytical value of the Choropleth map.
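The univariate colour encoding described above can be illustrated with a minimal sketch. It assumes an equal-interval classification, which is only one possible scheme; the report does not state which scheme the project used. The function name `classify`, the region names and the values are all hypothetical.

```python
def classify(values, n_classes=5):
    """Map each region's value to a colour class index (0 = lightest shade).

    Assumption: equal-interval classes between the minimum and maximum value.
    """
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    classes = {}
    for region, v in values.items():
        if span == 0:
            # All regions share the same value, so they share one shade.
            classes[region] = 0
        else:
            idx = int((v - lo) * n_classes / span)
            # The maximum value would land one past the last class; clamp it.
            classes[region] = min(idx, n_classes - 1)
    return classes


# Hypothetical single-variable data (one value per region).
sample = {"Region A": 50.0, "Region B": 58.0, "Region C": 71.0, "Region D": 80.0}
shades = classify(sample, n_classes=4)
```

Because each region receives exactly one class index, the map can encode only the one variable passed in, which is the univariate limitation noted above.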

Visual Representation of Temporal Data

Temporal data is characterised by time-period or time-interval values. Such data is typically handled by treating time as another numerical variable and plotting it along a linear time axis. In this project, time is instead explored through a time slider that steps through the time periods in the dataset. Rather than assigning time to a specific axis, the project represents temporal data through animation, which frees that axis for an additional variable and allows more insightful analysis. At any one time the visualisations show data for a single year only, and moving the time slider updates them to show the data for the newly selected year.
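The time slider behaviour can be sketched as a lookup from the selected year to the records drawn in that frame. This is an illustrative sketch only: the record layout, the field names `country`, `year` and `value`, and the helper names are assumptions, not the project's actual data model.

```python
from collections import defaultdict


def index_by_year(records):
    """Group records by year so each slider position maps to one frame."""
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["year"]].append(rec)
    return dict(by_year)


def frame_for(by_year, year):
    """Return the records to draw for the selected year (empty if none)."""
    return by_year.get(year, [])


# Hypothetical records for illustration only.
records = [
    {"country": "A", "year": 2000, "value": 1.0},
    {"country": "B", "year": 2000, "value": 2.0},
    {"country": "A", "year": 2001, "value": 1.5},
]
by_year = index_by_year(records)
slider_positions = sorted(by_year)       # the years the slider can select
current = frame_for(by_year, 2000)       # the data shown when 2000 is selected
```

Since time is handled by switching frames rather than by an axis, both plot axes remain available for other variables.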

Visual Representation of High Dimensional Data

Parallel Coordinates

After the collection phase, the data proved to be continuous in format, which means that it can be compared and sorted. However, representing high dimensional data is not easy and can be overwhelming to the average layman user, a consideration the project has to bear in mind. To showcase such data, the project employed a parallel coordinates plot to view multiple dimensions at once.

The benefit of a parallel coordinates plot is that one can see overall patterns across the selected attributes (e.g. high birth rate, high life expectancy and low population density). It also shows how a country "stands" in comparison with other countries, both regionally and worldwide. However, since the plot shows everything to the user at once, it may be perceived as a "messy chunk of lines" by untrained users. The difficulty, therefore, is ensuring that there are sufficient guides and walkthroughs to train users to see the patterns the graph is capable of revealing. In addition to providing training, the project has incorporated interactivity to make data exploration easier. The ideas adopted include brushing to show or hide lines, highlighting a selected country, and reordering the axes to suit the user.
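Brushing can be thought of as keeping only the lines whose values fall inside the selected range on every brushed axis. The sketch below illustrates that idea under assumed names; `brush`, the axis names and the sample values are hypothetical and do not come from the project's code.

```python
def brush(records, ranges):
    """Keep records whose value lies within the brushed range on every axis.

    `ranges` maps an axis name to an inclusive (low, high) pair. Axes that
    are not brushed are simply absent from the mapping.
    """
    return [
        rec
        for rec in records
        if all(lo <= rec[axis] <= hi for axis, (lo, hi) in ranges.items())
    ]


# Hypothetical countries and attribute values for illustration only.
countries = [
    {"name": "A", "birth_rate": 40.0, "life_expectancy": 55.0},
    {"name": "B", "birth_rate": 12.0, "life_expectancy": 81.0},
    {"name": "C", "birth_rate": 35.0, "life_expectancy": 72.0},
]

# Brush two axes: high birth rate and high life expectancy.
visible = brush(countries, {"birth_rate": (30.0, 50.0),
                            "life_expectancy": (70.0, 90.0)})
```

Lines that fail the test would be hidden or greyed out, which reduces the clutter that untrained users find hard to read.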

The parallel coordinates plot has other limitations as well. One that could affect analytical exploration is that it does not show the spread and distribution of the countries along each axis. To overcome this problem, the project introduced two enhancements that can be overlaid onto the parallel coordinates plot: a box plot and a histogram. The box plot provides statistical insights such as the median and interquartile range, while the histogram shows the frequency distribution along each axis. The overlays also respond to user events, which makes them a useful tool for users in their exploration.
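The figures behind the two overlays can be computed per axis as in the following sketch. It assumes the inclusive quartile method and equal-width histogram bins, neither of which is specified in the report; the function names and sample values are hypothetical.

```python
import statistics


def box_summary(values):
    """Return the median, quartiles and interquartile range for one axis."""
    # Assumption: inclusive quartile method; other conventions give
    # slightly different quartiles on small samples.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": med, "q3": q3, "iqr": q3 - q1}


def histogram(values, n_bins):
    """Count values in equal-width bins spanning the axis minimum to maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # every value is identical
        else:
            # The maximum value belongs to the last bin, so clamp the index.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical values along one axis, for illustration only.
axis_values = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = box_summary(axis_values)
bins = histogram(axis_values, n_bins=2)
```

Recomputing these figures over the currently brushed subset is one way the overlays could stay responsive to user events.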



TECHNICAL CHALLENGES

Here is the report