Difference between revisions of "ANLY482 AY2016-17 T2 Group13 - Project Findings"

From Analytics Practicum
Jump to navigation Jump to search
(Created page with "<!-- Logo --> <br> <!-- End Logo --> <!-- Start Nav Bar --> {| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0"...")
 
 
(8 intermediate revisions by the same user not shown)
Line 6: Line 6:
  
 
{| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
{| style="background-color:#fff; color:#000000 padding: 5px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center;" width="15%" |
+
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center;" width="14%" |
 +
&nbsp;[[ANLY482 AY2016-17 Term 2 | <font color="#ffffff" size="2"><strong>AP PROJECTS</strong></font>]]
 +
 
 +
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center;" width="14%" |
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 | <font color="#ffffff" size="2"><strong>HOME</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 | <font color="#ffffff" size="2"><strong>HOME</strong></font>]]
  
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="15%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="14%" |  
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - About_Us | <font color="#ffffff" size="2"><strong>ABOUT US</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - About_Us | <font color="#ffffff" size="2"><strong>ABOUT US</strong></font>]]
  
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center;border-left: 0px" width="15%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center;border-left: 0px" width="14%" |  
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project Overview | <font color="#ffffff" size="2"><strong>PROJECT OVERVIEW</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project Overview | <font color="#ffffff" size="2"><strong>PROJECT OVERVIEW</strong></font>]]
  
| style="font-family:Century Gothic; font-size:100%; background:#f4f9fd; text-align:center;border-left: 0px" width="15%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#f4f9fd; text-align:center;border-left: 0px" width="14%" |  
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project_Findings | <font color="#000000" size="2"><strong>PROJECT FINDINGS</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project_Findings | <font color="#000000" size="2"><strong>PROJECT FINDINGS</strong></font>]]
  
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="15%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="14%" |  
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project_Management | <font color="#ffffff" size="2"><strong>PROJECT MANAGEMENT</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Project_Management | <font color="#ffffff" size="2"><strong>PROJECT MANAGEMENT</strong></font>]]
  
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="15%" |  
+
| style="font-family:Century Gothic; font-size:100%; background:#003464; text-align:center; border-left: 0px" width="14%" |  
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Documentation | <font color="#ffffff" size="2"><strong>DOCUMENTATION</strong></font>]]
 
&nbsp;[[ANLY482 AY2016-17 T2 Group13 - Documentation | <font color="#ffffff" size="2"><strong>DOCUMENTATION</strong></font>]]
  
Line 32: Line 35:
  
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:5px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Project Findings| <b>Methodology</b>]]
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:5px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Project Findings| <b>Methodology</b>]]
 +
 +
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Initial Visualizations & Findings| <b>Initial Visualizations</b>]]
  
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Data Exploration| <b>Data Exploration</b>]]
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Data Exploration| <b>Data Exploration</b>]]
  
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Initial Visualizations & Findings| <b>Initial Visualizations</b>]]
+
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Trade Analysis Dashboard| <b>Trade Analysis Dashboard</b>]]
 
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - Trade Analysis Dashboard| <b>Geospatial Dashboard</b>]]
 
  
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - References| <b>References</b>]]
 
| style="vertical-align:top;width:14%;" | <div style="padding: 3px; text-align:center; line-height: wrap_content; font-size:15px; border-bottom:2px solid #0163bd; font-family:Century Gothic"> [[ANLY482 AY2016-17 T2 Group13 - References| <b>References</b>]]
Line 48: Line 51:
 
<!------- Details ---->
 
<!------- Details ---->
  
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Preparation</strong></font></div></div>
+
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Pre-Processing</strong></font></div></div>
 +
 
 +
Before embarking on an Exploratory Data Analysis (EDA, the data had to be transformed for further analysis. Viewing the data based on 2D HS codes alone was insufficient because aggregated categories included too many possible unrelated items in the same category. These were the data cleaning steps that were attempted:
 +
 
 +
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>1. Combination of Data</strong></font></div></div>
 +
[[File:2.2_combine_csv.PNG|500px]]
 +
 
 +
All the .csv files were placed in a folder to ensure that concatenation of csv files would only be among datasets that we will be using for the project. Then the “copy *.csv” function was used to combine all the csv folders together though leaving the original folders intact.
 +
 
 +
[[File:2.2_rbind_data.png|500px]]
 +
 
 +
R was also used to combine folders together. This was done by loading the .csv files into variables. After which, the “rbind” function was used to combine the datasets together.
 +
 
 +
 
 +
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>2. Removal of Unnecessary Columns</strong></font></div></div>
 +
[[File:2.2_Raw_Data_2d_1.png|500px]]
 +
 
 +
[[File:2.2_Raw_Data_2d_2.png|500px]]
 +
 
 +
Many of the columns in the raw data file did not have values in them. Due to this, we decided to remove the columns in Microsoft Excel because they would not have been useful to our analysis.
 +
 
 +
[[File:2.2_Removed_Columns_2d_1.png|500px]]
 +
 
 +
[[File:2.2_Removed_Columns_2d_2.png|500px]]
 +
 
 +
Above is the data set after removing the unnecessary columns. Variables containing weight appeared in 4d and 6d data and were not as apparent for 2d data. Due to this, we decided to leave the variables for the dataset.
 +
 
 +
 
 +
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>3. Sectors and Dimension Tables</strong></font></div></div>
 +
We received the names for 18 sectors that the DIT focuses on, listed below.
 +
 
 +
{| class="wikitable"
 +
! s/n
 +
! High Value Campaign
 +
! High Priority Volume
 +
! Low Priority Volume
 +
! Unclassified
 +
|-
 +
| 1
 +
| Aerospace
 +
| Retail/Consumer
 +
| Advanced Manufacturing (Excluding aerospace and automotive)
 +
| Others - Raw  Materials
 +
|-
 +
| 2
 +
| Food and Beverage
 +
| Education
 +
| Automotive
 +
| Others - Manufacturing
 +
|-
 +
| 3
 +
| Infrastructure (Water and Environment)
 +
| Energy
 +
| Bio-economy (Agri-tech)
 +
| Others
 +
|-
 +
| 4
 +
| Infrastructure (Rail)
 +
| Financial and Professional Business Services
 +
| Bio-economy (Chemicals)
 +
| Financial and Professional Business Services - Others
 +
|-
 +
| 5
 +
| Technology
 +
| Healthcare
 +
| Sports
 +
|
 +
|-
 +
| 6
 +
| Food and Beverage
 +
| Life Sciences
 +
|
 +
|
 +
|-
 +
| 7
 +
|
 +
| Infrastructure (Airports)
 +
|
 +
|
 +
|}
 +
 
 +
 
 +
<u>Dimension Table for Sectors</u><br>
 +
A separate dimension table was created to individually assign HS codes to the sectors.
 +
 
 +
[[File:2.2_Raw_HS.png|500px]]
 +
 
 +
Raw data file for the HS codes only includes section, commodity code and commodity.
 +
 
 +
[[File:2.2_Clean_HS.png|500px]]
 +
 
 +
Cleaned HS data includes sector and sector category. This data was based on the important sectors of our sponsor. Sector category includes information of what our sponsor considered more important commodities or less important commodities. Then, the sectors and sector categories were assigned to the commodity based on approval by our sponsor.
 +
 
 +
 
 +
[[File:2.2_vlookup_HS.png|500px]]
 +
 
 +
The “vlookup” function was used to combine the import data and commodity data based on commodity variable to add Section, Sector and Sector Category.
 +
 
 +
 
 +
<u>Dimension Table for Countries</u><br>
 +
[[File:2.2_Raw_Geog_Data.png|500px]]
 +
 
 +
The initial geographic data found on the UN website only included name of the country (Partner) and its respective country code.
 +
 
 +
We referred to the [https://unstats.un.org/unsd/methods/m49/m49regin.htm United Nations Statistics Division] to assign countries to geographic regions, continents and political organisations.
 +
 
 +
[[File:2.2_Clean_Geog_Data.png|500px]]
 +
 
 +
The information on the UN website was used to manually add in Continent and Region. Economic Unions for particular countries were then assigned to their respective countries.
 +
 
 +
 
 +
[[File:2.2_vlookup_Geog.png|500px]]
 +
 
 +
The “vlookup” function was used to combine the import data and geographic data based on Partner variable to add Continent, Region and Economic Union.
 +
 
 +
 
 +
<div style="background: #dce6f9; line-height: 0.3em; font-family:Century Gothic;  border-left: #003464 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>4. Calculated Fields</strong></font></div></div>
 +
 
 +
<u>Conversion of “Period” to date variable</u><br>
 +
[[File:2.2_Tableau_DateParse.png|500px]]
 +
 
 +
“Dateparse” function was used to convert “Period” into a date variable in Tableau.
 +
 
 +
 
 +
<u>Calculation of “Benchmarks”</u><br>
 +
For the UK Benchmark, this value was used as the UK reference line in the bullet chart as a comparison to commodities within the selected country.
 +
 
 +
 
 +
For the Average Benchmark, this value was used in the UK reference line in the bullet chart as a comparison to countries within the selected sector/commodity.
  
XXX
+
 +
<u>Calculation of “Market share of UK imports relative to the world”</u><br>
 +
This value determines the market share of UK imports relative to the world, and is used in the initial UK overview treemap. The equation used is as follows:
 +
[[File:2.2_Calculation_of_Market_Share_of_UK_Imports_Relative_to_World.PNG|100px]]

Latest revision as of 15:36, 20 February 2017



  AP PROJECTS

  HOME

  ABOUT US

  PROJECT OVERVIEW

  PROJECT FINDINGS

  PROJECT MANAGEMENT

  DOCUMENTATION



Data Pre-Processing

Before embarking on an Exploratory Data Analysis (EDA, the data had to be transformed for further analysis. Viewing the data based on 2D HS codes alone was insufficient because aggregated categories included too many possible unrelated items in the same category. These were the data cleaning steps that were attempted:

1. Combination of Data

2.2 combine csv.PNG

All the .csv files were placed in a folder to ensure that concatenation of csv files would only be among datasets that we will be using for the project. Then the “copy *.csv” function was used to combine all the csv folders together though leaving the original folders intact.

2.2 rbind data.png

R was also used to combine folders together. This was done by loading the .csv files into variables. After which, the “rbind” function was used to combine the datasets together.


2. Removal of Unnecessary Columns

2.2 Raw Data 2d 1.png

2.2 Raw Data 2d 2.png

Many of the columns in the raw data file did not have values in them. Due to this, we decided to remove the columns in Microsoft Excel because they would not have been useful to our analysis.

2.2 Removed Columns 2d 1.png

2.2 Removed Columns 2d 2.png

Above is the data set after removing the unnecessary columns. Variables containing weight appeared in 4d and 6d data and were not as apparent for 2d data. Due to this, we decided to leave the variables for the dataset.


3. Sectors and Dimension Tables

We received the names for 18 sectors that the DIT focuses on, listed below.

s/n High Value Campaign High Priority Volume Low Priority Volume Unclassified
1 Aerospace Retail/Consumer Advanced Manufacturing (Excluding aerospace and automotive) Others - Raw Materials
2 Food and Beverage Education Automotive Others - Manufacturing
3 Infrastructure (Water and Environment) Energy Bio-economy (Agri-tech) Others
4 Infrastructure (Rail) Financial and Professional Business Services Bio-economy (Chemicals) Financial and Professional Business Services - Others
5 Technology Healthcare Sports
6 Food and Beverage Life Sciences
7 Infrastructure (Airports)


Dimension Table for Sectors
A separate dimension table was created to individually assign HS codes to the sectors.

2.2 Raw HS.png

Raw data file for the HS codes only includes section, commodity code and commodity.

2.2 Clean HS.png

Cleaned HS data includes sector and sector category. This data was based on the important sectors of our sponsor. Sector category includes information of what our sponsor considered more important commodities or less important commodities. Then, the sectors and sector categories were assigned to the commodity based on approval by our sponsor.


2.2 vlookup HS.png

The “vlookup” function was used to combine the import data and commodity data based on commodity variable to add Section, Sector and Sector Category.


Dimension Table for Countries
2.2 Raw Geog Data.png

The initial geographic data found on the UN website only included name of the country (Partner) and its respective country code.

We referred to the United Nations Statistics Division to assign countries to geographic regions, continents and political organisations.

2.2 Clean Geog Data.png

The information on the UN website was used to manually add in Continent and Region. Economic Unions for particular countries were then assigned to their respective countries.


2.2 vlookup Geog.png

The “vlookup” function was used to combine the import data and geographic data based on Partner variable to add Continent, Region and Economic Union.


4. Calculated Fields

Conversion of “Period” to date variable
2.2 Tableau DateParse.png

“Dateparse” function was used to convert “Period” into a date variable in Tableau.


Calculation of “Benchmarks”
For the UK Benchmark, this value was used as the UK reference line in the bullet chart as a comparison to commodities within the selected country.


For the Average Benchmark, this value was used in the UK reference line in the bullet chart as a comparison to countries within the selected sector/commodity.

Calculation of “Market share of UK imports relative to the world”
This value determines the market share of UK imports relative to the world, and is used in the initial UK overview treemap. The equation used is as follows: 2.2 Calculation of Market Share of UK Imports Relative to World.PNG