Difference between revisions of "ANLY482 AY2016-17 T2 Group13 - Data"

From Analytics Practicum

Latest revision as of 11:22, 20 February 2017





As our sponsor does not have an internal pool of data, we were instructed to scrape publicly available data from websites such as Trademap.org, Comtrade.un.org and data.oecd.org, which house all the data that our sponsor currently uses. We were told specifically which sectors of data to extract to form the data set we would be working on. We chose to extract data solely from UN Comtrade (comtrade.un.org), as it appeared to provide the data in its rawest form.


Data Source

UN Comtrade is a repository of official international trade statistics and relevant analytical tables. It provides global import and export data for goods and services, with 2-digit, 4-digit and 6-digit HS codes for commodities.
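
The 2-digit, 4-digit and 6-digit levels are nested: the first two digits of a 6-digit HS code give its 2-digit code, and the first four give its 4-digit code. The snippet below is a minimal sketch of that relationship. The function name `hs_levels` and the example code are illustrative only and are not part of the project's workflow.

```python
def hs_levels(code: str) -> dict:
    """Split a 6-digit HS code into its 2D, 4D and 6D levels.

    Codes are kept as strings so that leading zeros are preserved.
    """
    digits = code.strip()
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected a 6-digit HS code, got {code!r}")
    return {"2D": digits[:2], "4D": digits[:4], "6D": digits}


# Arbitrary example code, chosen only to show the leading zero being kept.
print(hs_levels("010121"))
```

Treating the codes as text rather than integers matters in practice, since a spreadsheet or CSV reader that parses "010121" as a number would silently drop the leading zero.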

As DIT looks specifically at imports from the UK into Singapore, import trade flows were extracted using a query with Reporter: Singapore and Partner: UK. In addition, to analyse competitors both within the region and globally, import trade flows with Reporter: Singapore and Partner: World were also extracted.

Through initial data exploration, our team found a discrepancy between the reported value of Singapore's imports from the world and the value of world exports to Singapore. Ceteris paribus, these two statistics should be the same. The principal reasons why origin and destination statistics for a given shipment disagree are differences in 1) classification concepts and detail, 2) time of recording, 3) valuation and 4) coverage, as well as 5) processing errors. Our team therefore decided to focus on Singapore's reported imports from the world, since every figure in this data set is valued by the same reporter and is therefore on a consistent basis. Furthermore, since DIT is looking to bring British businesses into Singapore, Singapore's own valuation of its imports provides a benchmark for comparing the UK against competing countries.
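
To make the size of such a discrepancy concrete, the gap between a reporter's import figure and its partners' export figure can be expressed in absolute and relative terms. This is a minimal sketch, not the team's actual calculation. The function name `mirror_gap` and the two input values are hypothetical placeholders, not figures from UN Comtrade.

```python
def mirror_gap(reported_imports: float, partner_exports: float) -> tuple:
    """Return (absolute gap, gap as a share of reported imports)."""
    if reported_imports == 0:
        raise ValueError("reported imports must be non-zero")
    absolute = reported_imports - partner_exports
    return absolute, absolute / reported_imports


# Hypothetical values, for illustration only.
gap, share = mirror_gap(reported_imports=110.0, partner_exports=100.0)
print(f"absolute gap: {gap:.1f}, relative gap: {share:.1%}")
```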

Data Dictionary

The interpretation of Goods_2016_4D.csv is as follows:

Group13 DataDictionary.png

Summary of Extraction Process and Errors

Data was extracted from UN Comtrade (comtrade.un.org). The objective was to obtain a data set with the following characteristics:
- Trade value of imports from all countries in the world into Singapore
- Full names of commodities and respective 2D, 4D and 6D HS codes
- Full names of services and respective EBOPS codes
- All months from 2011 – 2015
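
The last requirement in the list above can be checked mechanically once the files are downloaded. The sketch below builds the 60 expected monthly periods and reports any that are absent. It assumes periods are written as YYYYMM strings (for example "201101"); that format, and the function names, are assumptions made for this illustration.

```python
def expected_periods(first_year: int = 2011, last_year: int = 2015) -> list:
    """List every month in the range as a YYYYMM string."""
    return [
        f"{year}{month:02d}"
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


def missing_periods(found, first_year: int = 2011, last_year: int = 2015) -> list:
    """Return the expected periods that do not appear in `found`."""
    seen = set(found)
    return [p for p in expected_periods(first_year, last_year) if p not in seen]


# Example: pretend one month was skipped during download.
downloaded = [p for p in expected_periods() if p != "201307"]
print(len(expected_periods()))      # number of months expected
print(missing_periods(downloaded))  # months still to be downloaded
```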


We used the download interface on comtrade.un.org to select the data to extract, choosing specific variables for each query. As UN Comtrade limits .csv downloads for general users to 50,000 rows per file, and the data sets for commodities with 2D, 4D and 6D HS codes each exceeded that limit, each level was extracted with its own query and split across several files.
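
Because each level arrives as several files, the pieces have to be stitched back together. The following is a minimal sketch of one way to do that with the Python standard library, assuming every file shares the same header row. It also flags any file that reaches the 50,000-row cap, since such a file may have been cut off. The function name and the column names in the example are hypothetical.

```python
import csv
import io

ROW_LIMIT = 50_000  # per-file cap for general users, as described above


def merge_csv_texts(texts, limit=ROW_LIMIT):
    """Concatenate CSV exports that share one header.

    Returns (header, rows, at_limit), where at_limit lists the positions
    of files whose row count reached the cap and may be incomplete.
    """
    header, rows, at_limit = None, [], []
    for index, text in enumerate(texts):
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty file, nothing to merge
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"file {index} has a different header")
        file_rows = [row for row in reader if row]
        if len(file_rows) >= limit:
            at_limit.append(index)
        rows.extend(file_rows)
    return header, rows, at_limit


# Tiny in-memory stand-ins for two downloaded files (hypothetical columns).
part_a = "Period,Partner,Value\n201101,World,10\n"
part_b = "Period,Partner,Value\n201106,World,20\n"
header, rows, at_limit = merge_csv_texts([part_a, part_b])
print(header, len(rows), at_limit)
```

For real downloads, each file's text could be read with `open(path, newline="")` before being passed in.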


For all downloads of goods, the following query parameters were fixed:

- Type of Product: “Goods”
- Frequency: “Monthly”
- Reporters: “Singapore”
- Trade Flows: “Import”

Goods - 2D HS Codes
Commodities represented with 2D HS codes were extracted using this query:

- Periods (year, month): each file downloaded covered only 5 months
- Partners: “All”
- HS (as reported) commodity codes: “AG2 - All 2-digit HS commodities”


2.1 Extraction - 2D Query.PNG
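
Covering 2011 to 2015 in files of 5 months each implies 12 downloads per level. The sketch below lists those batches. It assumes the months in each file were consecutive and written as YYYYMM, neither of which is stated in the query description, and `month_batches` is an illustrative name.

```python
def month_batches(first_year: int = 2011, last_year: int = 2015, size: int = 5) -> list:
    """Split the monthly periods into consecutive batches of `size` months."""
    months = [
        f"{year}{month:02d}"
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]
    return [months[i:i + size] for i in range(0, len(months), size)]


batches = month_batches()
print(len(batches))  # number of files needed
print(batches[0])    # periods covered by the first file
print(batches[-1])   # periods covered by the last file
```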


Goods - 4D HS Codes
Commodities represented with 4D HS codes were extracted using this query:

- Periods (year, month): each file downloaded covered only 5 months
- Partners: “All”
- HS (as reported) commodity codes: “AG4 - All 4-digit HS commodities”


2.1 Extraction - 4D Query.PNG


Goods - 6D HS Codes
Commodities represented with 6D HS codes were extracted using this query:

- Periods (year, month): each file downloaded covered only 5 months
- Partners: each file downloaded covered only 5 partner countries
- HS (as reported) commodity codes: “AG6 - All 6-digit HS commodities”


2.1 Extraction - 6D Query.PNG
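
Since the 6D query is split on two dimensions at once, the number of files is the number of month batches multiplied by the number of partner batches. The following sketch enumerates such a download plan. The partner names are placeholders rather than the actual partner list, and the function names are hypothetical.

```python
from itertools import product


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def download_plan(periods, partners, per_file=5):
    """Pair every batch of periods with every batch of partners."""
    return [
        {"periods": period_batch, "partners": partner_batch}
        for period_batch, partner_batch in product(
            chunk(periods, per_file), chunk(partners, per_file)
        )
    ]


# 60 monthly periods (YYYYMM format assumed) and 12 placeholder partners.
periods = [f"{y}{m:02d}" for y in range(2011, 2016) for m in range(1, 13)]
partners = [f"Partner {n}" for n in range(1, 13)]

plan = download_plan(periods, partners)
print(len(plan))  # 12 period batches x 3 partner batches
print(plan[0])
```

With the real partner list the count grows quickly, which is why enumerating the plan up front helps confirm that no combination of months and partners was skipped.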


Services
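As services are not classified under HS codes, a separate query was used for services.

- Type of Product: “Services”
- Frequency: “Annual”
- Classification: “EBOPS - 2002”
- Periods (year): 2011 to 2015
- Reporters: “Singapore”
- Partners: “All”
- Trade Flows: “Import”
- EBOPS 2002 codes: “ALL - All EBOPS 2002 Services”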


2.1 Extraction - Service Query.png