Difference between revisions of "IS428 2016-17 Term1 Assign2 Nguyen Duy Loc"

From Visual Analytics for Business Intelligence

Latest revision as of 06:50, 1 October 2016

Abstract

Food variety around the world always fascinates people. Different countries have different food consumption patterns and eating habits. Having lived in three countries for long periods, I can easily point out the differences among the food from these places. British food is all about cheese. Singapore's food is very diverse, coming from all over the world, but some of the traditional dishes are oily. Vietnam's food is more about the balance of spices and herbs. Today, however, people are more concerned about the calorie content and nutrition of their food. Thus, this study aims to explore the patterns of not only calories but also other characteristics of food around the world.


Problem and Motivation

Obesity is an issue in first-world countries. The abundance of food in these countries may contribute to the increasing trend. Some of the questions I had in mind when approaching this study:

Do the characteristics of food traditions in some countries, such as the US, make their citizens fatter?

Is there a difference in terms of food nutrition and healthiness between Europe and Asia?


Data Gathering and Data Processing

The data I used for this study is the Open Food Facts data set from the Kaggle website. It is a multivariate source. Each record contains information on one type of food. The fields cover the details of each item, such as its nutrition values, type, and country.

A snapshot of the data is shown below:

Loc assign2 data.png

The country_en column has a problem: some records list multiple countries. I therefore used the functions in JMP to reduce each record to one country only, to guarantee the consistency of the data.

Loc country column.png
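The same kind of clean-up can be sketched outside JMP. The Python snippet below is only an illustration, not the JMP steps actually used: it assumes the countries in country_en are separated by commas and that keeping the first listed country is acceptable. The sample records and the helper name first_country are hypothetical.

```python
def first_country(value, sep=","):
    """Return the first non-empty country in a delimited string, or None."""
    if value is None:
        return None
    parts = [p.strip() for p in value.split(sep)]
    parts = [p for p in parts if p]
    return parts[0] if parts else None


# Hypothetical records; the real values come from the country_en column.
records = [
    {"product": "A", "country_en": "France,United Kingdom"},
    {"product": "B", "country_en": "United States"},
    {"product": "C", "country_en": ""},
]

# Build cleaned copies so the raw records stay untouched.
cleaned = [{**r, "country_en": first_country(r["country_en"])} for r in records]

for row in cleaned:
    print(row["product"], row["country_en"])
```

Keeping the first country is a simplification. A record sold in several countries could also be duplicated once per country, which would preserve more information at the cost of counting the same product several times.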

Another problem with this data set is missing values. I ran a report with SAS Enterprise Miner, and the result is that most of the variables contain missing values.

Loc assign2 missing data.png
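A comparable missing-value report can be approximated in a few lines of Python. This is a minimal sketch, not the SAS Enterprise Miner report itself. It treats None and empty strings as missing, and the rows and column names are made up for illustration.

```python
def missing_share(rows, columns):
    """Fraction of rows in which each column is None or an empty string."""
    total = len(rows)
    share = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        share[col] = missing / total if total else 0.0
    return share


# Hypothetical rows with illustrative column names.
rows = [
    {"energy_100g": 1500.0, "fat_100g": None, "fiber_100g": ""},
    {"energy_100g": None, "fat_100g": 12.0, "fiber_100g": ""},
    {"energy_100g": 900.0, "fat_100g": 3.5, "fiber_100g": 2.1},
    {"energy_100g": 400.0, "fat_100g": None, "fiber_100g": None},
]

report = missing_share(rows, ["energy_100g", "fat_100g", "fiber_100g"])
for col, share in report.items():
    print(f"{col}: {share:.0%} missing")
```

Columns with a very high share of missing values are candidates for exclusion before visualization, since charts built on them show only a small part of the data.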

Some of the variables are categorical while others are numeric. However, the categorical attributes have a large number of classes, which makes them hard to visualize on a map or to extract information from. Thus, I attempted to convert some continuous attributes into categorical ones in order to explore patterns within them.
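Converting a continuous attribute into a categorical one amounts to binning. The sketch below shows one possible way to do it in Python. The cut points and labels are illustrative assumptions, not the thresholds used in this study.

```python
from bisect import bisect_right


def to_category(value, edges, labels):
    """Map a number to a label; edges are ascending cut points.

    A value equal to a cut point falls into the higher bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    if value is None:
        return None  # keep missing values missing
    return labels[bisect_right(edges, value)]


# Illustrative cut points for fat in grams per 100g (assumed values).
FAT_EDGES = [3.0, 17.5]
FAT_LABELS = ["low", "medium", "high"]

for fat in [1.0, 3.0, 20.0, None]:
    print(fat, to_category(fat, FAT_EDGES, FAT_LABELS))
```

With only a handful of bins per attribute, the binned columns become usable in categorical views such as mosaic plots and parallel sets.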

Exploration and Analysis

Since the study is about discovering unknown unknowns, I applied all the techniques learnt in class to explore the data and find any interesting patterns in it.

  • Ternary diagram

The three indicators used for the ternary diagram are energy per 100g, carbohydrates per 100g, and proteins per 100g. This gives a first gauge of how healthy the food in the world is. The majority of the food is high in energy and protein while low in carbohydrates. Some items are really high in carbohydrates.

Loc assign2 ternary.png

However, the graph doesn't show many records, so I chose other columns: proteins per 100g, sugar per 100g, and fiber per 100g. It turns out the pattern is much clearer. Food with a high proportion of fiber is harder to find than food with a high proportion of protein or sugar.

Loc assign2 ternary2.png
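For readers unfamiliar with ternary diagrams, the placement of a point can be sketched as follows. Each record's three values are first rescaled to shares that sum to 1, and the shares are then mapped onto an equilateral triangle. This is a generic illustration in Python, not the procedure of the charting tool used here. The corner layout and the function name ternary_point are assumptions, and the three inputs are assumed to be on comparable scales (for example, all in grams per 100g).

```python
import math


def ternary_point(a, b, c):
    """Return ((share_a, share_b, share_c), (x, y)) for one record.

    Corners are placed at A=(0, 0), B=(1, 0), C=(0.5, sqrt(3)/2).
    Returns None when the three values do not have a positive sum.
    """
    total = a + b + c
    if total <= 0:
        return None
    fa, fb, fc = a / total, b / total, c / total
    x = fb + 0.5 * fc
    y = (math.sqrt(3) / 2) * fc
    return (fa, fb, fc), (x, y)


# Hypothetical product: protein, sugar, fiber in grams per 100g.
shares, xy = ternary_point(10.0, 25.0, 5.0)
print(shares, xy)
```

Because only the shares are plotted, a record with all three values missing or zero cannot be placed, which is one reason a ternary diagram may show fewer records than the data set contains.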

  • Parallel coordinates

Loc assign2 Parallel.png

As we can see, the parallel coordinates plot for the US is denser and higher in the fat and cholesterol categories. Thus, comparing the two countries, food in the UK is healthier than food in the US.
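Parallel coordinates put attributes with different units side by side, so each axis is usually rescaled before plotting. The Python sketch below shows min-max scaling of one axis to the range 0 to 1. It is an assumption about the preprocessing rather than a description of what the charting tool did internally, and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Scale numbers to [0, 1]; None entries stay None.

    If all present values are equal, they are placed at 0.5.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [None] * len(values)
    lo, hi = min(present), max(present)
    if hi == lo:
        return [None if v is None else 0.5 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


# Hypothetical fat values in grams per 100g for a few products.
fat = [0.0, 5.0, None, 10.0]
print(min_max_scale(fat))
```

Scaling each axis separately makes the shapes of the lines comparable across attributes, but absolute differences between countries should still be read from the original units.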

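  • Trellis

Loc assign2 Trellis.png

  • Mosaic plot

Loc assign2 Mosaic.png

  • Divergent bar chart

Loc assign2 Divergent.png

  • Parallel Sets

Loc assign2 Parallel sets.png

  • TableLens

Loc assign2 TableLens.png

  • Treemap

Loc assign2 Treemap.png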

The link to the interactive dashboard:

https://public.tableau.com/profile/nguyen.duy.loc#!/vizhome/Assign2_7/Dashboard1

Conclusion

European countries tend to have a higher proportion of fat in their food than the rest of the world.

Tools Utilized

  1. Excel 2013 for data preparation
  2. Tableau for visualization
  3. JMP Pro 12