IS428 2016-17 Term1 Assign2 Nguyen Duy Loc

From Visual Analytics for Business Intelligence
Revision as of 15:04, 29 September 2016 by Dlnguyen.2012 (talk | contribs)

Abstract

Food variety around the world always fascinates people. Different countries have different food consumption patterns and eating habits. Having lived in three countries for long periods, I can easily point out the differences among their foods. British food is all about cheese. Singapore's food is very diverse, drawing on cuisines from all over the world, though some of its traditional dishes are oily. Vietnam's food is more about the balance of spices and herbs. Today, however, people are increasingly concerned about the calories and nutrients in their food. This study therefore aims to explore the patterns of not only calories but also other characteristics of food around the world.


Problem and Motivation

Obesity is an issue in first-world countries. The abundance of food in these countries may contribute to its increasing prevalence. Some of the questions I had in mind when approaching this study:

Do the characteristics of food traditions in some countries, such as the US, make their citizens fatter?

Is there a difference in nutrition and healthiness between food in Europe and food in Asia?


Data Gathering and Data Processing

The data used for this study is the Open Food Facts dataset from the Kaggle website. It is a multivariate source in which each record contains information on one type of food. The fields give the details of each item: nutrition values, type, country, etc.

A snapshot of the data is shown below:

Loc assign2 data.png

The country_en column has a problem: some records list multiple countries. I therefore used the functions in JMP to reduce each record to a single country, which keeps the data consistent.

Loc country column.png
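
The same clean-up can be sketched outside JMP. The following is a minimal Python sketch, assuming that multiple countries in a cell are separated by commas and that keeping the first listed country is acceptable. Neither assumption is stated above, and `first_country` is a hypothetical helper name, not a JMP function.

```python
def first_country(cell):
    """Return the first country listed in a country cell.

    Assumption: multiple countries are comma-separated,
    e.g. "France,United Kingdom". Empty or missing cells
    are returned as an empty string.
    """
    if not cell:
        return ""
    # Split on commas, trim whitespace, and drop empty pieces.
    parts = [p.strip() for p in cell.split(",") if p.strip()]
    return parts[0] if parts else ""


# Made-up example cells, for illustration only.
cells = ["France,United Kingdom", "United States", "", " Germany , Spain"]
cleaned = [first_country(c) for c in cells]
```

Keeping the first country is only one possible rule. Duplicating the record once per country would preserve more information but would also count the same product several times.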

Another problem with this data set is missing values. I ran a report with SAS Enterprise Miner, which showed that most of the variables contain missing values.

Loc assign2 missing data.png
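
For readers without SAS Enterprise Miner, a similar per-column count can be approximated in a few lines. This is a rough Python sketch, assuming the data is available as CSV text and that a missing value appears as an empty cell. The column names and rows in the sample are made up for illustration, and `missing_counts` is a hypothetical helper.

```python
import csv
import io


def missing_counts(rows, fieldnames):
    """Count empty or whitespace-only values in each column."""
    counts = {name: 0 for name in fieldnames}
    for row in rows:
        for name in fieldnames:
            value = row.get(name)
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


# Made-up sample standing in for the real file.
sample = io.StringIO(
    "product_name,energy_100g,fat_100g\n"
    "Biscuit,2000,\n"
    ",,12.5\n"
    "Juice,180,0\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
counts = missing_counts(rows, reader.fieldnames)
```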


Exploration and Analysis

Since the study is about discovering unknown unknowns, I applied the techniques learnt in class to explore the data and look for interesting patterns in it.

  • Ternary diagram

The three indicators used for the ternary diagram are energy per 100g, carbohydrates per 100g, and proteins per 100g. This gives a first gauge of how healthy the food in the world is. The majority of the food is high in energy and protein while low in carbohydrates, although some items are really high in carbohydrates.

Loc assign2 ternary.png

However, the graph doesn't show many records, so I chose other columns: proteins per 100g, sugar per 100g, and fiber per 100g. The pattern turns out to be much clearer. Food with a high proportion of fiber is harder to find than food with a high proportion of protein or sugar.

Loc assign2 ternary2.png
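
A ternary diagram places each record by the relative shares of three values rather than by their absolute amounts. The following is a minimal Python sketch of that mapping, assuming the three values are non-negative and that the corners are laid out on a triangle with unit side length. The function name `ternary_point` and the corner layout are illustrative choices, not taken from the tool used for the figures.

```python
import math


def ternary_point(a, b, c):
    """Map three non-negative values to (x, y) inside a unit-side triangle.

    Each value is first converted to its share of the total.
    Corner layout (an arbitrary choice for this sketch):
      100% a -> (0, 0), 100% b -> (1, 0), 100% c -> (0.5, sqrt(3)/2).
    Returns None when the total is zero, since no shares exist.
    """
    total = a + b + c
    if total <= 0:
        return None
    share_b = b / total
    share_c = c / total
    x = share_b + 0.5 * share_c
    y = (math.sqrt(3) / 2) * share_c
    return (x, y)


# Example with made-up values: equal shares land at the triangle's centre.
centre = ternary_point(10, 10, 10)
```

Because only shares are plotted, a record can be placed only when all three values are present and their total is positive.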
  • Parallel coordinates
Loc assign2 Parallel.png

As we can see, the parallel coordinates plot for the US is denser and higher on the fat and cholesterol axes. Thus, comparing the two countries on these measures, food in the UK appears healthier than food in the US.
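
Parallel coordinates compare variables measured in different units, so each axis is usually rescaled to a common range before plotting. The following is a small Python sketch of min-max scaling, a common choice for this. It is an assumption that the plot above was scaled this way, and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1].

    If all values are equal, every point is placed at 0.5
    so the axis still shows them at a defined position.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up fat values (g per 100g) for illustration.
scaled_fat = min_max_scale([0, 5, 10])
```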

  • Trellis
  • Mosaic plot
  • Divergent bar chart
  • Parallel Sets
  • TableLens
  • Treemap
Loc assign2 Treemap.png

The link to interactive dashboard:

https://public.tableau.com/profile/nguyen.duy.loc#!/vizhome/Assign2_7/Dashboard1

Conclusion

European countries tend to have a higher proportion of fat in their food than the rest of the world.

Tools Utilized

  1. Excel 2013 for data preparation
  2. Tableau for visualization
  3. JMP Pro 12