Talk:Lesson04

From Visual Analytics for Business Intelligence

Lucky us!

We have already had four classes, but Visual Analytics is clearly so huge that the number of classes won't be enough to cover everything. All this data is an infinite resource we can play with. We saw it again in Lesson 04: visualizations become impressive and interesting just by playing with the different dimensions and categories. Of course, data is now coming from everywhere, and not everyone sees this in a good light. For my part, though, I see it as an opportunity: an opportunity for the world to improve itself by analyzing data, but also an opportunity for me and for us. Data is creating new jobs, such as data scientist, and this kind of job is not boring. There is not just one type of data; we cannot even count them all. There will always be something to do, and no time to say that we don't like our job because it is dull.

I don't know about Singapore, but in my country, Belgium, this kind of data analysis is quite new. Of course there are experts, but the amount of data is so huge that there are not enough of them. Companies are looking for people like us: able to understand the data, interpret it, and tell a story with it, always with a business angle and knowledge of what we are working with. Still, these people are difficult to find in Belgium because schools do not offer a specialization in it. We talk about Big Data and we have Data Mining classes, but that's it. When I looked through the course catalogue here, it seemed more developed. So I was wondering how it works here: do you think employers are searching more and more for data scientists, or is that already the case in Singapore? For my part, I am happy to learn about all these techniques and programs, because they will certainly make a real difference on my CV. I also think that the sooner enterprises realize how important it is to have people able to deal with data (us), the better it will be for them.
It also means that this field will keep growing and that our qualifications will definitely be valuable.

Don't hesitate to comment with your point of view, and to say whether Singapore already offers more of these jobs than Belgium.

-Margot Stelleman

Multivariate Analysis Using Parallel Coordinates

This article discusses the benefits of using parallel coordinates to analyze multivariable data.

To someone who has never examined this kind of graph, it would appear overwhelming and messy: the huge clutter of overlapping lines seems to offer little insight into any patterns or trends. However, after reading this article, I have a new appreciation for parallel coordinates, as they offer a perspective for comparison that I was not previously aware of.

I think that parallel coordinates let users get a quick sense of what the overall data looks like and what the general patterns are. For example, cars with more cylinders tend to have the lowest MPG (miles per gallon). This can be seen easily by brushing the lines with a high number of cylinders, following the highlighted lines, and observing where they connect on the other axes.
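To make the brushing idea concrete, here is a minimal Python sketch, assuming a tiny hypothetical car table (the values are invented for illustration, not taken from the article). Each record becomes a polyline whose vertex on each axis is the value rescaled to [0, 1]; brushing selects the records that fall in a range on one axis so their lines can be highlighted and compared. The helpers `axis_ranges`, `polyline` and `brush` are hypothetical names.

```python
# Hypothetical toy records; the numbers are illustrative only.
cars = [
    {"name": "a", "cylinders": 4, "horsepower": 75, "mpg": 32.0},
    {"name": "b", "cylinders": 4, "horsepower": 90, "mpg": 28.0},
    {"name": "c", "cylinders": 6, "horsepower": 110, "mpg": 21.0},
    {"name": "d", "cylinders": 8, "horsepower": 170, "mpg": 15.0},
    {"name": "e", "cylinders": 8, "horsepower": 200, "mpg": 13.0},
]
AXES = ["cylinders", "horsepower", "mpg"]  # left-to-right order of the parallel axes


def axis_ranges(rows, axes):
    """Min and max of each axis, used to rescale values onto the axis."""
    return {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in axes}


def polyline(row, axes, ranges):
    """Vertices (x, y): x is the axis index, y the value rescaled to [0, 1]."""
    points = []
    for x, a in enumerate(axes):
        lo, hi = ranges[a]
        y = 0.0 if hi == lo else (row[a] - lo) / (hi - lo)
        points.append((x, y))
    return points


def brush(rows, axis, lo, hi):
    """Select the rows whose value on `axis` lies within [lo, hi]."""
    return [r for r in rows if lo <= r[axis] <= hi]


def mean(values):
    return sum(values) / len(values)


ranges = axis_ranges(cars, AXES)
selected = brush(cars, "cylinders", 8, 8)  # brush the 8-cylinder lines
for r in selected:
    print(r["name"], polyline(r, AXES, ranges))  # these lines would be highlighted

# Follow the highlighted lines to the MPG axis and compare with the rest.
brushed_mpg = mean([r["mpg"] for r in selected])
other_mpg = mean([r["mpg"] for r in cars if r not in selected])
print("brushed MPG:", brushed_mpg, "others:", other_mpg)
```

In an interactive tool the `brush` call would be driven by dragging over an axis; here it is a plain function so the selection logic is easy to see.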

Parallel coordinates are uncommon because their full benefits can only be realized when the chart is interactive. When the mouse hovers over the graph, the selected line is highlighted, and the user can easily contrast it against the other data and identify patterns. This is not possible on a printed hard copy. In fact, parallel coordinates are messy if important lines are not highlighted for comparison. Given this, it is probably worth considering which medium the graphs will be presented on and deciding whether parallel coordinates are still suitable.

-Arnold Lee Wai Tong

Parallel Sets

Besides the Mosaic Plot, Parallel Sets is also an interactive visualization application for displaying multidimensional categorical data. It is similar to a parallel coordinates plot, except that items sharing the same category are “bundled” together.

As for how we read the information in parallel sets: for each dimension, a horizontal bar is shown for each of its possible categories. The width of the bar indicates the absolute number (frequency) of matches for that category. Between the dimension bars are ribbons connecting categories, which show how the combinations of categories are distributed and how a particular subset can be further subdivided. This makes parallel sets useful for exploring relationships in data that might otherwise be elusive when you are faced with many categories.
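The reading rules above boil down to two frequency counts. Below is a minimal Python sketch, assuming a small hypothetical table of categorical records (invented for illustration): `bar_widths` counts each category on one dimension (the bar widths), and `ribbons` counts each pair of categories on two adjacent dimensions (the ribbon widths). Both function names are hypothetical; real Parallel Sets software also handles layout and interaction, which is not shown.

```python
from collections import Counter

# Hypothetical categorical records (illustrative only): one tuple per item.
DIMENSIONS = ["class", "sex", "outcome"]
rows = [
    ("first", "female", "yes"),
    ("first", "male", "no"),
    ("third", "male", "no"),
    ("third", "male", "no"),
    ("third", "female", "yes"),
]


def bar_widths(rows, dim):
    """Frequency of each category on one dimension: drawn as the bar width."""
    return Counter(r[dim] for r in rows)


def ribbons(rows, left, right):
    """Frequency of each (left category, right category) pair between two
    adjacent dimensions: drawn as the ribbon width."""
    return Counter((r[left], r[right]) for r in rows)


widths = bar_widths(rows, 0)
links = ribbons(rows, 0, 1)
print(widths)
print(links)

# Consistency check: the ribbons leaving a category add up to that
# category's bar width, which is how a subset is visibly subdivided.
for category, width in widths.items():
    assert sum(n for (a, _), n in links.items() if a == category) == width
```

To draw the chart, each count would be scaled by the total number of items to get a proportion of the available bar length.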

Although parallel sets do give our charts a nice and refreshing look, they also have disadvantages. For instance, compared with a Mosaic Plot, their neat, regular pattern is less able to show a strong degree of association between dimensions. Therefore, it is important to perform a thorough exploratory analysis of the data and to try out various kinds of visualization before we can reap the benefits of using a parallel sets plot.

-Lim Kim Yong

An Introduction to Visual Multivariate Analysis

When we wish to understand the relationships among a set of variables that characterize a product, or how the profile formed by the values of these variables for a particular product compares to the multivariate profiles of other products, we call this process of investigation multivariate analysis.

Crosstab Arrangements of Small Multiples

Small multiples is an arrangement of small graphs, all within eye span, which look precisely the same, except that each displays a different subset of a larger set of data.
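A crosstab arrangement can be prepared by partitioning the data on two categorical variables, one for the grid rows and one for the grid columns. The minimal Python sketch below assumes hypothetical sales records and hypothetical helper names (`crosstab_panels`, `shared_scale`). The shared value range matters because every panel must "look precisely the same", so all panels use identical axis scales; the actual drawing of each small chart is left out.

```python
from collections import defaultdict

# Hypothetical records (illustrative only).
sales = [
    {"region": "east", "product": "A", "month": 1, "revenue": 10},
    {"region": "east", "product": "A", "month": 2, "revenue": 14},
    {"region": "east", "product": "B", "month": 1, "revenue": 7},
    {"region": "west", "product": "A", "month": 1, "revenue": 12},
    {"region": "west", "product": "B", "month": 2, "revenue": 20},
]


def crosstab_panels(records, row_key, col_key):
    """Group records into grid cells keyed by (row category, column category)."""
    panels = defaultdict(list)
    for rec in records:
        panels[(rec[row_key], rec[col_key])].append(rec)
    row_cats = sorted({r for r, _ in panels})
    col_cats = sorted({c for _, c in panels})
    return row_cats, col_cats, panels


def shared_scale(records, value_key):
    """One value range for every panel, so the small multiples are comparable."""
    values = [rec[value_key] for rec in records]
    return min(values), max(values)


row_cats, col_cats, panels = crosstab_panels(sales, "region", "product")
y_min, y_max = shared_scale(sales, "revenue")
for rc in row_cats:
    for cc in col_cats:
        cell = panels.get((rc, cc), [])  # empty list if the combination is absent
        points = [(rec["month"], rec["revenue"]) for rec in cell]
        print(rc, cc, points, "y-axis:", (y_min, y_max))
```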

Multiple Concurrent Views with Brushing Functionality

Brushing allows us to select a subset of data in one of the displays to highlight it, resulting in that same subset of data being automatically highlighted where it appears in every one of the views.
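The key to linked brushing is that all views share a single selection of records. The sketch below is a minimal illustration under that assumption, using a hypothetical `LinkedViews` class and invented records; it does no drawing, it only shows that a brush made in one view produces the same highlight flags in every other view.

```python
class LinkedViews:
    """Hypothetical sketch: several views of one dataset sharing one selection."""

    def __init__(self, records):
        self.records = records
        self.views = {}        # view name -> fields that the view displays
        self.selected = set()  # indices of brushed records, shared by all views

    def add_view(self, name, fields):
        self.views[name] = fields

    def brush(self, predicate):
        """Select records in any view; the selection is stored centrally."""
        self.selected = {i for i, r in enumerate(self.records) if predicate(r)}

    def render(self, name):
        """Project records onto the view's fields, with a highlight flag each."""
        fields = self.views[name]
        return [(tuple(r[f] for f in fields), i in self.selected)
                for i, r in enumerate(self.records)]


# Hypothetical records (illustrative only).
records = [
    {"cylinders": 4, "mpg": 30, "weight": 2100},
    {"cylinders": 8, "mpg": 14, "weight": 4300},
    {"cylinders": 6, "mpg": 20, "weight": 3200},
]
views = LinkedViews(records)
views.add_view("scatter", ["weight", "mpg"])
views.add_view("histogram", ["cylinders"])

# Brushing in the scatter plot (heavy cars) also highlights them in the histogram.
views.brush(lambda r: r["weight"] > 3000)
for name in views.views:
    print(name, views.render(name))
```

Storing the selection once, rather than in each view, is what keeps every view consistent after each brush.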

Heatmap Matrix

The term heatmap generally refers to any visual display that uses variations in color to encode a quantitative variable.

A heatmap matrix is a tabular arrangement of cells, each of which encodes a quantitative value as a color, with one categorical variable across the columns and another categorical variable down the rows.
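A heatmap matrix can be built in two steps: arrange the values into a row-by-column table, then map each value to a color. The minimal Python sketch below assumes hypothetical records, sums values that land in the same cell (an assumption; other aggregations are possible), and uses a simple linear interpolation between two RGB colors. The helper names and the chosen colors are hypothetical.

```python
def heatmap_matrix(records, row_key, col_key, value_key):
    """Arrange values into a table: rows from one categorical variable,
    columns from another. Values falling in the same cell are summed."""
    cells = {}
    for r in records:
        key = (r[row_key], r[col_key])
        cells[key] = cells.get(key, 0) + r[value_key]
    rows = sorted({k[0] for k in cells})
    cols = sorted({k[1] for k in cells})
    return rows, cols, [[cells.get((ro, co)) for co in cols] for ro in rows]


def to_color(value, lo, hi, low_rgb=(255, 255, 255), high_rgb=(178, 34, 34)):
    """Linearly interpolate from low_rgb (smallest value) to high_rgb (largest)."""
    if value is None:
        return (200, 200, 200)  # grey for an empty cell
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return tuple(round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb))


# Hypothetical records (illustrative only).
records = [
    {"region": "east", "quarter": "Q1", "sales": 10},
    {"region": "east", "quarter": "Q2", "sales": 40},
    {"region": "west", "quarter": "Q1", "sales": 25},
]
rows, cols, matrix = heatmap_matrix(records, "region", "quarter", "sales")
values = [v for row in matrix for v in row if v is not None]
lo, hi = min(values), max(values)
colors = [[to_color(v, lo, hi) for v in row] for row in matrix]
for name, row in zip(rows, colors):
    print(name, row)
```

A sequential two-color ramp like this suits a single quantitative variable; a diverging palette would be a different design choice for values centered on a midpoint.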

Parallel Coordinates

The real power of parallel coordinates comes from interactions with the display to filter out what doesn’t interest you and to find all of those entities that match a particular multivariate profile that does interest you.
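Finding "all of those entities that match a particular multivariate profile" can be expressed as one range filter per axis, with a record kept only if it satisfies every range. The Python sketch below assumes hypothetical product records and hypothetical helper names; in an interactive display each range would come from dragging a filter on an axis rather than being written by hand.

```python
def matches_profile(record, profile):
    """profile maps axis name -> (lo, hi); a record must fall inside every range."""
    return all(lo <= record[axis] <= hi for axis, (lo, hi) in profile.items())


def split_by_profile(records, profile):
    """Return (kept, hidden): the lines to keep drawn and the ones filtered out."""
    kept = [r for r in records if matches_profile(r, profile)]
    hidden = [r for r in records if not matches_profile(r, profile)]
    return kept, hidden


# Hypothetical records (illustrative only).
products = [
    {"name": "P1", "price": 20, "rating": 4.5, "weight": 1.2},
    {"name": "P2", "price": 55, "rating": 4.8, "weight": 0.9},
    {"name": "P3", "price": 25, "rating": 3.1, "weight": 1.0},
    {"name": "P4", "price": 30, "rating": 4.6, "weight": 3.5},
]

# The profile of interest: cheap, well rated and light.
profile = {"price": (0, 40), "rating": (4.0, 5.0), "weight": (0, 2.0)}
kept, hidden = split_by_profile(products, profile)
print("kept:", [p["name"] for p in kept])
print("hidden:", [p["name"] for p in hidden])
```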

Glyphs

The term glyph refers to a graphical object that simultaneously represents the values of multiple variables. In a star glyph, for example, each axis represents a separate variable, with low values near the centre and high values near the perimeter.
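The geometry of a star glyph follows directly from that description: one spoke per variable, spaced evenly around a circle, with each value's distance from the centre proportional to where it sits between its variable's minimum and maximum. Below is a minimal Python sketch under those assumptions; the function name, the per-variable ranges, and the choice of starting the first spoke on the positive x-axis are all hypothetical. Connecting the returned points in order draws the glyph's outline.

```python
import math


def star_glyph(values, ranges, radius=1.0):
    """Vertices of a star glyph. values[i] lies on spoke i; ranges[i] = (lo, hi)
    for variable i. Low values sit near the centre, high values near the
    perimeter at distance `radius`."""
    n = len(values)
    points = []
    for i, (v, (lo, hi)) in enumerate(zip(values, ranges)):
        t = 0.0 if hi == lo else (v - lo) / (hi - lo)  # rescale to [0, 1]
        angle = 2 * math.pi * i / n                    # evenly spaced spokes
        points.append((radius * t * math.cos(angle), radius * t * math.sin(angle)))
    return points


# Hypothetical item with three variables, each measured on a 0-10 scale.
pts = star_glyph([10, 5, 0], [(0, 10), (0, 10), (0, 10)])
for x, y in pts:
    print(round(x, 3), round(y, 3))
```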

Table Lens

TableLens enables exploring multivariate data sets by arranging data into tabular rows and columns. TableLens allows "opening" up regions that also show textual values along with the graphical representation. This ability to see specific values in context means that there isn't a back and forth shuffling between different views.
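The focus-plus-context idea can be imitated in plain text: rows that are "opened" show their actual values, while all other rows stay graphical, drawn as bars scaled to each column's maximum. This is only a rough sketch, not how TableLens itself is implemented; it assumes non-negative numeric columns, hypothetical data, and a hypothetical `table_lens` helper.

```python
def table_lens(rows, columns, focus, width=10):
    """Render each row as a text line. Rows whose index is in `focus` show their
    values; other rows show bars scaled to the column maximum (values assumed
    non-negative)."""
    maxima = {c: max(r[c] for r in rows) or 1 for c in columns}  # avoid dividing by 0
    lines = []
    for i, r in enumerate(rows):
        cells = []
        for c in columns:
            if i in focus:
                cells.append(str(r[c]).rjust(width))     # opened: textual value
            else:
                bar = "#" * round(width * r[c] / maxima[c])
                cells.append(bar.ljust(width))           # context: graphical bar
        lines.append(" | ".join(cells))
    return lines


# Hypothetical data (illustrative only); row 1 is opened to show its values.
data = [
    {"price": 5, "units": 40},
    {"price": 10, "units": 20},
    {"price": 8, "units": 10},
]
lines = table_lens(data, ["price", "units"], focus={1})
for line in lines:
    print(line)
```

Because the opened row stays in its place among the bars, its values are read in context, which is the point the paragraph above makes about avoiding back-and-forth between views.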

- Alson Tan