Difference between revisions of "Arisaig Final Progress"

From Analytics Practicum

Revision as of 17:11, 20 April 2015



ANALYSIS APPROACH & CHALLENGES FACED

Data Understanding & Representation
As noted in the description of the project, it works with multi-dimensional data. The main categories of data the project uses are:

  • Spatial (geographical) data
  • Temporal (time-series) data

This creates challenges both in understanding how the various kinds of data interact and in representing the patterns and relationships found within them. The main challenges addressed in this thesis are:

  1. Visual Representation of Spatial Data
  2. Visual Representation of Temporal Data
  3. Visual Representation of High Dimensional Data
  4. Interactivity of Multi-Linked Views
  5. Context Provision

Visual Representation of Spatial Data

Spatial data makes it possible to show how geographical proximity may create inter-dependencies between countries, which gives investors better insight for decision-making. The difficulty lies in representing this information so that it remains comprehensible to budding analysts and investors. To meet this need, the project implemented a Choropleth map for the visualization of spatial data.

The Choropleth provides an overall map view of how countries and regions perform with respect to a chosen variable. Its limitation is that, because it conveys country rankings through color intensity, it can show only one variable at a time (univariate). To make the Choropleth more useful for this kind of data, it can be combined with a zoomable user interface (ZUI), which was developed to let the user zoom in on a selected region for a focused view of the visualisation. In addition, interactivity improves when the map responds to user clicks and selections, so that users see only the information that is of interest to them. All in all, these design choices supplement the analytical value of the Choropleth map.

Arisaig Asia zoom.png



Visual Representation of Temporal Data

Temporal data is characterised by values tied to time periods or time intervals. Such data is typically handled by treating time as another numerical variable and plotting it on a linear time axis. In this project, time is instead explored through a time slider that represents the periods covered by the dataset. Rather than following the typical approach of assigning time to a specific axis, the project represents temporal data through animation, which frees an axis for an additional variable and so allows more insightful analysis. The gist of this approach is that the visualisations at any one moment show a single year only, and moving the time slider causes them to update and show the data for the newly selected year.
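The slider-driven approach can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the project's code; the record layout, the country labels and the values are all hypothetical. Each slider position selects one year, and every view redraws from that year's frame.

```python
from collections import defaultdict

# Hypothetical records: (country, year, indicator value). All values are made up.
RECORDS = [
    ("A", 2010, 1.0),
    ("B", 2010, 2.0),
    ("A", 2011, 1.5),
    ("B", 2011, 2.5),
]


def index_by_year(records):
    """Group records by year so that each slider position maps to one frame."""
    frames = defaultdict(dict)
    for country, year, value in records:
        frames[year][country] = value
    return dict(frames)


def frame_for(frames, year):
    """Return the data a view should draw for the selected year (empty if none)."""
    return frames.get(year, {})


frames = index_by_year(RECORDS)

# Stepping the slider through the years swaps the frame that every view draws.
for year in sorted(frames):
    print(year, frame_for(frames, year))
```

Because time is handled by swapping frames rather than by occupying an axis, both axes of a plot stay available for non-time variables.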

Visual Representation of High Dimensional Data

Parallel Coordinates

After the collection phase, the data proved to be continuous in format, which means it lends itself to comparison and sorting. However, representing high dimensional data is not easy and could be overwhelming for lay users, a consideration the project has to bear in mind. To present such data, the project uses a Parallel Coordinates plot, which shows multiple dimensions at once.

The benefit of the parallel coordinates plot is that one can see overall patterns across selected attributes (e.g. high birth rate, high life expectancy and low population density). It also shows how a country “stands” in comparison with other countries, both regionally and worldwide. However, since parallel coordinates show everything at once, untrained users may perceive the plot as a “messy chunk of lines”. The difficulty therefore lies in providing sufficient guides and walkthroughs to train users to see the patterns the graph can offer. Beyond training, the project has also added interactivity to make data exploration easier: brushing to show or hide lines, highlighting of a selected country, and reordering of axes to suit the user.

Parallel coordinates have other limitations as well. One that could affect analytical exploration is that the plot cannot tell the user the spread and distribution of the countries along each axis. To overcome this problem, the project introduced two enhancements that can be overlaid onto the parallel coordinates plot: a box plot and a histogram. The box plot provides statistical insights such as the median and interquartile range, while the histogram lets users see the frequency distribution along each axis. The overlays also respond to events, and so could be a useful tool for users on their exploration journey.

Arisaig Boxplotandhistogram.png
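To make the two overlays concrete, the following Python sketch computes the summary values a box plot draws and the bin counts a histogram draws for the values on one axis. It is an illustration only, not the project's implementation; the sample values, the number of bins, the equal-width binning and the quartile method (the default of Python's `statistics.quantiles`) are all assumptions.

```python
import statistics


def box_stats(values):
    """Quartiles, median and interquartile range for one axis."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


def histogram(values, bins):
    """Count values in equal-width bins spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls into the last bin; a zero width means all
        # values are identical and land in the first bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


axis_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values for one axis
print(box_stats(axis_values))
print(histogram(axis_values, bins=4))
```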


Nevertheless, these overlays have their own limitations. For example, a histogram is created for each axis, but the range of the histogram varies from axis to axis, so a direct comparison cannot be made across dimensions. It is therefore necessary to standardise all the axes to provide an equal and fair comparison. In addition, throughout the team’s data exploration process, the data showed signs of skewed distributions, which could mislead and give a poor interpretation of the actual situation. Transformation techniques (such as logarithmic, square and square root) were therefore introduced as an enhancement. These enhancements are optional, to cater to the users’ differing levels of analytical expertise.
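A minimal Python sketch of standardisation and of the three transformations follows. The source does not say which standardisation method was used, so the z-score (subtract the mean, divide by the population standard deviation) is an assumption, as is base 10 for the logarithm; the function names are hypothetical.

```python
import math
import statistics


def standardize(values):
    """Z-score each value so that axes with different ranges become comparable."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # all values identical
    return [(v - mean) / sd for v in values]


# Optional transformations for skewed axes (names are illustrative).
TRANSFORMS = {
    "log": math.log10,          # needs strictly positive values
    "sqrt": math.sqrt,          # needs non-negative values
    "square": lambda v: v * v,
}


def transform(values, kind):
    """Apply one of the optional transformations to every value on an axis."""
    return [TRANSFORMS[kind](v) for v in values]


print(standardize([2, 4, 6]))
print(transform([1, 10, 100], "log"))
```

A logarithmic or square root transformation compresses large values and so suits right-skewed data, while squaring stretches large values and suits left-skewed data.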

Scatterplot

Another visualisation the project adopted for the display of multi-dimensional data is the scatter plot. Its purpose is to strike a better balance between providing an overall picture and a detailed representation. The scatter plot lets users compare countries simply by their positions relative to the two axes. Animating the plot with the temporal data can help to highlight patterns of periodic behaviour between non-time-related dimensions. Admittedly, animation may not reveal the overall trend as easily as a time-series graph. Hence, the project added time-series data as an overlay for each country, shown upon click. The overlay informs the user of each country’s growth trajectory over time while still keeping the two axis variables.

Arisaig Scatter selected.png
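The per-country overlay amounts to collecting one country's positions across all years and connecting them in chronological order. The Python sketch below illustrates that idea under assumed data; the tuple layout and the values are hypothetical and do not come from the project.

```python
# Hypothetical observations: (country, year, x value, y value).
OBSERVATIONS = [
    ("A", 2011, 2.0, 20.0),
    ("B", 2010, 5.0, 50.0),
    ("A", 2010, 1.0, 10.0),
    ("A", 2012, 3.0, 25.0),
]


def trajectory(observations, country):
    """Return the selected country's (x, y) positions in chronological order."""
    rows = sorted((year, x, y) for c, year, x, y in observations if c == country)
    return [(x, y) for _, x, y in rows]


# Clicking country "A" would draw a path through these points.
print(trajectory(OBSERVATIONS, "A"))
```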



During data exploration, linear scaling of the axes proved to be a challenge and made the scatter plot a poor visualisation tool. The data is heavily skewed towards the lower end of the scale, with huge outliers (e.g. the USA in terms of Total GDP), so hardly any analytical insight could be derived from a linear scale. The linear scale is nevertheless still necessary, as it shows the raw situation and so reveals the major players in the dataset. What this shows is the need for additional scaling options that spread out the data points to improve both visibility and usability. The project therefore adopted logarithmic scaling, which gives more visual space to the lower end of the scale.

Arisaig Scatterplot scaling.png
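The effect of the two scaling options can be seen by mapping the same value to a pixel position under each. The Python sketch below is illustrative only; the domain of 1 to 10,000, the 400-pixel axis and the function names are assumptions, and the logarithmic scale requires strictly positive values.

```python
import math


def linear_scale(value, domain, pixels):
    """Map value from the data domain to a pixel position, proportionally."""
    d0, d1 = domain
    p0, p1 = pixels
    return p0 + (value - d0) / (d1 - d0) * (p1 - p0)


def log_scale(value, domain, pixels):
    """Map value to a pixel position by its order of magnitude (value > 0)."""
    d0, d1 = domain
    p0, p1 = pixels
    t = (math.log10(value) - math.log10(d0)) / (math.log10(d1) - math.log10(d0))
    return p0 + t * (p1 - p0)


domain, pixels = (1, 10_000), (0, 400)
for value in (10, 100, 1_000, 10_000):
    print(value, round(linear_scale(value, domain, pixels), 1),
          round(log_scale(value, domain, pixels), 1))
```

On the linear scale, every value up to 100 is squeezed into the first 4 pixels of the axis, while the logarithmic scale places 100 at the midpoint, which is why small values become distinguishable.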



To further improve the scatter plot’s ease of use, the project also moved away from D3’s dynamic color assignment. Colors are instead predefined for the various regions and remain consistent even after the graphs are reloaded. The reason is that dynamically assigned colors can change when the graphs load: at one moment green may represent Asia Pacific, and at the next moment Asia Pacific is shown in yellow. This would be very confusing and would distract users from finding more valuable insights.
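The difference between the two colouring strategies can be sketched in Python. This is a simplified stand-in, not D3's API: colours handed out in order of first appearance depend on the order in which the data arrives, whereas a predefined table does not. Only "Asia Pacific" is named in the text above; the other region names and all hex colours are hypothetical.

```python
# Predefined colours per region (hypothetical values).
REGION_COLORS = {
    "Asia Pacific": "#2ca02c",
    "Region B": "#1f77b4",
    "Region C": "#ff7f0e",
}

PALETTE = ["#2ca02c", "#1f77b4", "#ff7f0e"]


def first_seen_colors(regions):
    """Order-dependent assignment: each new region takes the next palette colour."""
    mapping = {}
    for region in regions:
        if region not in mapping:
            mapping[region] = PALETTE[len(mapping) % len(PALETTE)]
    return mapping


def fixed_color(region):
    """Order-independent assignment: look the colour up in the predefined table."""
    return REGION_COLORS[region]


load_1 = first_seen_colors(["Asia Pacific", "Region B"])
load_2 = first_seen_colors(["Region B", "Asia Pacific"])
print(load_1["Asia Pacific"], load_2["Asia Pacific"])  # differs between loads
print(fixed_color("Asia Pacific"))                      # always the same
```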

Interactivity of Multi-Linked Views

Each graphical representation has its limitations, and these can be overcome by supplementing it with other visualisations that provide the missing information. The project achieves this through multi-linked views, where an event triggered on one visualisation affects the other visualisations. The technique adopted is “dynamic queries”: highly interactive systems in which the visualizations are manipulated as the user interacts with them or with the enhancements the project has added. For instance, a region that was faded on the scatter plot becomes visible and colored immediately upon a click on the Choropleth map. This approach provides coordinated views where changes are reflected in real time, helping analysts and users in their exploration process.

In addition, the three types of visualisation, univariate (Choropleth), bivariate (scatter plot) and multivariate (parallel coordinates), provide three views that display different aspects of the data. This supports a progressive approach to data exploration, letting users gain insights according to their level of expertise without overwhelming them with information. Navigation of the data also benefits because the visualizations are all displayed in preassigned locations, which speeds up exploration by saving the user from performing the same or similar operations multiple times.
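The linked-view behaviour described in this section can be sketched as a shared selection that notifies every subscribed view when it changes. The Python below is a minimal illustration under assumed names (`Selection`, `ScatterView`, the country and region labels) and is not the project's D3 implementation.

```python
class Selection:
    """Shared state: the set of regions currently selected on the map."""

    def __init__(self):
        self._regions = set()
        self._listeners = []

    def subscribe(self, callback):
        """Register a view callback to run whenever the selection changes."""
        self._listeners.append(callback)

    def toggle(self, region):
        """Simulate a click on the map: select or deselect a region."""
        self._regions ^= {region}
        snapshot = frozenset(self._regions)
        for callback in self._listeners:
            callback(snapshot)


class ScatterView:
    """A linked view that highlights the countries in the selected regions."""

    def __init__(self, points):
        self.points = points      # list of (country, region) pairs
        self.highlighted = []     # countries drawn in colour; the rest stay faded

    def on_selection(self, regions):
        self.highlighted = [c for c, r in self.points if r in regions]


selection = Selection()
scatter = ScatterView([("A", "Asia Pacific"), ("B", "Region B"), ("C", "Asia Pacific")])
selection.subscribe(scatter.on_selection)

selection.toggle("Asia Pacific")  # click on the map
print(scatter.highlighted)
```

Keeping the selection in one shared object means a new view only needs to subscribe to it, and no view has to know about the others.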

Context Provision

The challenge in visualization analytics is “communication”: showing users how best to utilise the visualization aids. At the end of the day, even a comprehensive analysis tool is useless if the user cannot use it unguided. Hence the importance of introducing the context of the data, in other words “visual data stories”. The storytelling feature sets the context so that users understand how the graphs behave and what information they convey when used individually and/or together. Only when users better understand how to tell stories with visualizations will new possibilities open up. One example is tracking urban population percentage against annual disposable income in the beer industry investment scenario (3.1.2 Packaged Food: Investment), where users are able to shortlist potential countries while also picking out outliers such as the Muslim countries.

Arisaig Context ss.png



TECHNICAL CHALLENGES

Here is the report