IS428 2016-17 Term1 Assign2 Heng Yi Teng Mabel

From Visual Analytics for Business Intelligence
Revision as of 10:07, 6 October 2016 by Mabel.heng.2012 (talk | contribs)
Jump to navigation Jump to search

Abstract

I decided to explore the Open Food Facts dataset. Being a foodie, I was curious to know how well each country was eating. My final visualization was created using the Parallel Coordinates plot on Power BI.

Question

I wanted to explore the nutrition profiles of each country, specifically with regards to nutritional guidelines provided by the National Health Service (NHS), UK. The following figures were obtained mainly from the NHS UK website and were used as a basis for reference in analysing the different countries' nutrition profiles:

Nutrient Low (per 100g) High (per 100g) Source
Sodium 0.1g 0.6g http://www.nhs.uk/Livewell/Goodfood/Pages/salt.aspx
Fat 3g 17.5g http://www.nhs.uk/Livewell/Goodfood/Pages/Fat.aspx
Saturated Fat 1.5g 5g http://www.nhs.uk/Livewell/Goodfood/Pages/Fat.aspx
Sugar 5g 22.5g http://www.nhs.uk/Livewell/Goodfood/Pages/sugars.aspx
Cholesterol - 0.1g https://en.wikipedia.org/wiki/List_of_cholesterol_in_foods

Visualization

Nutrition Profiles of Countries with row count > 100

Click here to see the interactive model.

The data exploration and visualization process will be further elaborated on below.

Data Exploration

JMP was used for all data cleaning.

Initial columns removed

The following columns were removed as they were irrelevant to the question I wanted to answer:

  • quantity
  • packaging
  • url, creator, created_t, created_datetime, last_modified_t, last_modified_datetime
  • image_url, image_small_url
  • product_name

The following columns were removed as similar information was already captured in other columns:

  • countries, countries_tags

Columns removed due to missing data

Missing data was explored using the "Explore Missing Values" function under "Cols".

Exploring missing data

The following columns were removed as more than 50% of the rows were missing data and available data had count < 1000:

  • origins, origins_tags
  • manufacturing_places, manufacturing_places_tags
  • labels, labels_tags, labels_en
  • emb_codes, emb_codes_tags
  • first_packaging_geo
  • cities, cities_tags
  • stores
  • allergens, allergens_en
  • traces, traces_tags, traces_en
  • serving_size
  • no_nutriments
  • additives_tags, additives_en
  • ingredients_from_palm_oil, ingredients_from_palm_oil_tags, ingredients_that_may_be_from_palm_oil, ingredients_that_may_be_from_palm_oil_tags
  • nutrition_grade_uk, nutrition_grade_fr
  • energy_from_fat_100g
  • omega_3_fat_100g
  • sucrose, glucose, fructose, lactose, maltose
  • starch
  • caffeine
  • carbon_footprint_100g

Reformatting country variables to create Country variable

I wanted to clean up the variable, countries_en. I noticed that some products were linked to more than one country under countries_en. This is likely because the Open Food Facts database is a French initiative and France is experimenting with mandatory country of origin labelling for milk and meat in processed foods. Hence foods might be labelled with more than one country. For simplicity's sake, I removed country's labelled with more than one country. Simply using the first country in the list of countries labelled would have caused some countries to be overly-represented since the countries were listed in alphabetical order.


Step 1: Using "Text to Columns" function
Using "Text to Columns" function


Step 2: Countries under countries_en column are now split into different columns
Countries under countries_en column split into different columns


Step 3: Sort by countries_en_2 and delete rows with two and more countries tagged to them
Sort and delete countries_en


Step 4: Rename countries_en_1 to countries_en_new and delete all other columns, countries_en_#, created by “Columns to Text”

Next, I looked through the countries_en_new column and cleaned up countries that were keyed in twice under different names.

Step 1: Using "Recode" function
Using "Recode" function


Step 2: Recoding countries and creating new column, Country
Creating Country variable

Recoding country variables to create continents column

Step 1: Using "Recode" function

Recoding Country column

Through this, I managed to create the Continent column.

Continent variable created

Visualization Process

First try: Trellis Plot in Tableau

I explored Tableau and tried to create a Trellis plot but was unable to get the plots to overlay. The visual below makes it difficult to have an overview of the different nutrition profiles.

Attempt at Trellis Plot

Second try: Parallel Coordinates chart in Power BI

To me the most logical visualization to use would be a Parallel Coordinates chart. I ended up importing a custom visualization for Parallel Coordinates from the Power BI Community. The resulting visualization is as seen in Section 3 Visualization.

Filter limitations

I also wanted to note that filtering does not appear to be as flexible on Power BI as on Tableau. I wanted to filter Countries by those with row count >= 100 to reduce clutter in the visual but could not. It was something I did easily in Tableau. For categorical data, it seems Power BI only allows manual selection of the data points you want. Hence, I had to go back to JMP, filter out the data by country count >= 100 and import it into Power BI again.