CONTENT
6.2: Data Presentation for Visual Analysis
    6.2.1: Round Number for Degree of Precision
    6.2.2: Analyzing Categorical Data
    6.2.3: Analysing Interval Data
    6.2.4: Computing Survey Results - Using Frequency Tables

6.2: Data Presentation for Visual Analysis

Before choosing a graphic to visualize the analysis, it is important (but often forgotten) to review the data types in the raw data table again and to prepare the data so that later analysis is easier. This matters because any visual analysis, whichever one is chosen, is only as sound as the data behind the graphic.

In this assignment, we give an overview of the data that is likely to be presented in the visuals, as below:

6.2.1: Round Number for Degree of Precision

For all data types, we will report numerical values as round numbers (especially the survey results), e.g. 50%, to avoid communicating a false degree of precision. This is important because many surveys cover only a fraction of the group of interest, so there is a possibility of error when we make inferences about the population. Reporting numbers with higher precision, such as 12.34%, may therefore suggest a degree of precision that the data does not support.
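As a minimal sketch of this rounding rule, the helper below formats a share as a whole-number percentage. The function name `whole_percent` and the round-half-up choice are assumptions made for illustration; the source only says that values are reported as round numbers.

```python
import math


def whole_percent(count, total):
    """Format count/total as a whole-number percentage string."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Round half up (assumed convention), so 12.5% is reported as 13%.
    return f"{math.floor(count / total * 100 + 0.5)}%"


print(whole_percent(1234, 10000))  # 12%
print(whole_percent(1, 2))         # 50%
```

Rounding happens only at the reporting step, so the underlying counts stay exact for any later calculation.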

6.2.2: Analyzing Categorical Data

From the first dataset, "Survey_Result", the categorical data are identified as below:

GENDER
DOMAIN
PHD
UNIVERSITY
UOC_POSITION
USERWIKI

The above fields were already recoded in section 5.4.1.2: Consistency of Data and are ready to be displayed on visual graphs. Therefore, no further work is required.

Categorical data are also found in the second dataset, "Survey_QN_Master". The actual questions from the online survey are stored in the field "Qn_Details". We observed that the questions may be too long to display on a visual graph. The questions can therefore be shortened without losing their meaning. This is completed in section 5.5.1: Rephrasing the Survey Questions.

6.2.3: Analysing Interval Data

Next, we identified two interval fields, "Age" and "Year of Experience", in the dataset "Survey Results". For ease of comparison, we will create bins for them, as below:

Version 1: Tableau (Automated Binning)

Even though the shape of the distribution appears normal, it is difficult to compare ages that are shown with 2 decimal places.

Version 2: Tableau (Automated Binning - Formatting the "Age")

Even with formatting and rounding to whole numbers, it is still difficult to compare across the age bins. Intuitively, people tend to think in multiples of 5 or 10, so bin edges that fall elsewhere make it harder for the reader to make a selection.

Since automated binning in Tableau does not give a good result here, we will create manual bins instead. The details of how the bins are derived can be found in 5.5.2: Creating Manual Binning.

Version 3: Tableau (With 5 Bins)

Intuitively, we tend to set equal-size intervals to show a fair distribution. It is common to use the scale "<=30", "31-40", "41-50", "51-60", ">60". However, with this scale the reader cannot tell what age range the respondents in "<=30" and ">60" actually cover. This might give the misleading impression that there are much younger or much older participants. An improvement is to give every bin explicit bounds, using equal-size intervals.

Version 4: Tableau (With 5 Equal Bins)

With ages ranging from 23 to 69, we can set equal-size intervals using the scale "20-30", "31-40", "41-50", "51-60", "61-70". Even though the graphic looks the same as the previous one, we can now see clearly that the respondents' ages fall between 20 and 70.
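The bounded scale described for Version 4 can be sketched outside Tableau as a small lookup. This is an illustration only, not the procedure from 5.5.2; the names `UPPER_BOUNDS`, `LABELS` and `age_bin` are hypothetical, and ages are assumed to be whole numbers.

```python
from bisect import bisect_left

# Upper bound and label of each bin, taken from the "5 Equal Bins" scale.
UPPER_BOUNDS = [30, 40, 50, 60, 70]
LABELS = ["20-30", "31-40", "41-50", "51-60", "61-70"]


def age_bin(age):
    """Return the label of the bounded bin that contains a whole-number age."""
    if not 20 <= age <= 70:
        raise ValueError(f"age {age} is outside the binned range 20-70")
    # bisect_left finds the first upper bound that is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


print(age_bin(23))  # 20-30
print(age_bin(69))  # 61-70
```

Because every bin has an explicit lower and upper bound, an out-of-range age raises an error instead of silently falling into an open-ended bin. Note that, as labeled, the first bin spans ages 20 to 30 inclusive.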

With this, we will also use Tableau (With 5 Equal Bins) for the field "Year of Experience". The results are as below:

6.2.4: Computing Survey Results - Using Frequency Tables

Next, and most importantly, we need to present the results of the survey by computing the total frequency of each response. All survey results will be presented with a frequency table, which helps users see the distribution of the results directly. As mentioned in 5.2.1: Common Mistake 1 - Using Arithmetic Mean, we intentionally avoid using the "average" so that the reader can focus on the distribution, the real story, instead of on a summary based on the "average".
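A small illustration of why the average can hide the story: the two response sets below are invented, on a hypothetical 1-5 scale, and share the same mean while having very different distributions.

```python
from collections import Counter
from statistics import mean

# Invented responses on a hypothetical 1-5 scale (not survey data).
polarised = [1, 5, 1, 5]
all_neutral = [3, 3, 3, 3]

# Both sets have the same arithmetic mean ...
mean_polarised = mean(polarised)
mean_neutral = mean(all_neutral)

# ... but their frequency tables differ completely.
dist_polarised = Counter(polarised)
dist_neutral = Counter(all_neutral)

print(mean_polarised, dict(dist_polarised))
print(mean_neutral, dict(dist_neutral))
```

The frequency counts show that the first group is split between the extremes while the second is uniformly neutral, which the shared mean alone would not reveal.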

To support the creation of this frequency table, we need to:

Add a new column field, #ID, to the dataset "Survey Results". With the added #ID field, we can pivot the dataset to compute the total frequency of each response. This is completed in 5.5.3: Adding #ID to the Survey Result's Dataset
Join the datasets "Survey Master" and "Survey Results" so that the category and the question can be displayed in the table together with the results. This is completed in 5.5.4: Joining Survey Results' Table and Survey Master's Table
Group the categories of the survey questions by measurement indicator, so that we can select the indicators that meet the objective. To do this, we create a new table and join it with the survey master table. This is completed in 5.5.5: Creating an "Indicator_Master" and Joining the Tables
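The first two preparation steps above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the Tableau workflow from sections 5.5.3 and 5.5.4: the question codes `Q1` and `Q2`, the response values and the shortened question texts are all invented, and only the field name "Qn_Details" comes from the dataset. The indicator grouping step is left out.

```python
from collections import Counter

# Toy stand-ins for the two datasets (codes, values and texts are invented).
survey_results = [
    {"Q1": 4, "Q2": 5},
    {"Q1": 4, "Q2": 3},
    {"Q1": 2, "Q2": 5},
]
survey_master = {
    "Q1": "Short question one",
    "Q2": "Short question two",
}

# Step 1: add an ID to every respondent row.
with_id = [{"ID": i, **row} for i, row in enumerate(survey_results, start=1)]

# Step 2: pivot from wide to long, one record per (respondent, question).
long_rows = [
    {"ID": row["ID"], "Qn": qn, "Response": value}
    for row in with_id
    for qn, value in row.items()
    if qn != "ID"
]

# Step 3: join each record to the question master to pick up the question text.
joined = [{**r, "Qn_Details": survey_master[r["Qn"]]} for r in long_rows]

# Step 4: count how often each response was given to each question.
frequency = Counter((r["Qn_Details"], r["Response"]) for r in joined)

for (question, response), count in sorted(frequency.items()):
    print(question, response, count)
```

The ID column is what keeps each response traceable to a respondent once the table is in long form, which is why it is added before the pivot.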

Based on the results of the survey, the first iteration of the frequency table is as below:

As we can see, a table is not ideal for presenting the results because it is difficult to make comparisons across rows and columns. It will be replaced with visuals in section 6.3: Selection of Visuals and Tools.
