IS415 Geospatial Analytics Project 2010-11: Visualizing and Analyzing the Geography of Uneven Development in Vietnam

From Geospatial Analytics for Business Intelligence

The4Geographers: Visualizing and Analyzing the Geography of Uneven Development in Vietnam

Final Deliverables

[Figure: final deliverable (4geographers final.png)]

  • We would like to thank Professor Kam Tin Seong for his guidance and the IS415 class for their valuable feedback on our project.

Contact us

  • Jacob Selvan Muthu S/O Silvaraju
  • Janice Chua Jian Fern
  • Lim Chee Ning
  • Thia Kai Xin


This project introduces a new approach to analyzing regional development dynamics over space and time. Drawing on recent advances in Web 2.0, spatial statistics and data visualization, we propose a spatiotemporally explicit view of regional development. The work is motivated by dissatisfaction with the overly restrictive nature of existing regional growth theories, which are largely at odds with the rich spatial dynamics observed in empirical work. By integrating interactive data visualization with the location quotient (LQ), our proposed framework aims to provide greater insight into the role of spatiotemporal dependence in regional development. We demonstrate the potential value and contribution of this approach through a case study of provincial-level agricultural development in Vietnam over the 1998-2008 period.


Agriculture is one of the main contributors to Vietnam’s economy, accounting for a significant 20.6% of GDP (indexmundi, 2011). With the success of policies such as Doi Moi, Vietnam’s agricultural sector has been modernizing over the years, and numerous studies have examined how these changes affect the shift from traditional crops towards cash crops. However, these studies lack solid data analysis and visualization to support their findings. Meanwhile, the General Statistics Office of Vietnam holds a good collection of Vietnam’s agricultural data, but the trends are buried deep within the datasets. The value of these data is not fully exploited, except by those who own desktop-based Geographical Information System (GIS) software.

Thus, we set out to design a web-based application for government officials, investors and business people to explore and analyze the agricultural data from the General Statistics Office, with the ultimate goal of supporting policy making and identifying business opportunities in the agricultural sector.

Related Work

There have been numerous studies of Vietnam’s agricultural sector, but most lack extensive data analysis and visualization using geospatial techniques and time series. One such paper is Man Quang Huy’s “Building a Decision Support System for Agricultural Land Use Planning and Sustainable Management at the District Level in Vietnam” (Huy, 2009). Many other studies share similar limitations, focusing on only one province or a short time period. For example, “Land use dynamics in the central highlands of Vietnam: a spatial model combining village survey data with satellite imagery interpretation” uses only data for Dak Lak province in 1975, 1992 and 2000 (Müller & Zeller, 2002), while “Mapping paddy rice agriculture in South and Southeast Asia using multi-temporal MODIS images” (Xiao, et al., 2005) covers only a general overview of Vietnam’s paddy production without any provincial-level analysis. These studies therefore either lack provincial-level comparison and analysis or cover too short a time period to establish trends. Our application aims to address these limitations.

Analysis based on our application

Visualizing the data on the map allows us to make interesting observations. We observe that aquatic products make up the large majority of agricultural production in Vietnam. One might expect aquatic production to be highest along the coast, but this is not the case in Vietnam: inland provinces in both the south and the north, such as Yen Bai, Phu Tho, Vinh Phuc, Thai Nguyen, Dong Thap and Lam Dong, actually have higher farm fish output than the provinces along the coastline. With three-quarters of the country consisting mainly of a massive network of river deltas, it is little wonder that aquatic production is so high. Vietnam has 15 major rivers; the Mekong and Red River basins, covering 72,300 km² in the south and 60,960 km² in the north respectively, are the major agricultural areas and centers of population (Linh, 2001).

The Mekong Delta is noted as one of the great rice-producing regions of the world and is the dominant agricultural region of the south; excess grain from the area is shipped to the northern parts of the country. Interestingly, however, data from the last few years tell a different story: fishery production has increased in response to government calls and initiatives in recent years. Foreign importers have supported Vietnamese farmers in developing clean fish farming projects, and the Prime Minister even urged Ca Mau to develop aquatic products, in particular to achieve a breakthrough in shrimp farming and expand shrimp raising. It is therefore important to note that this large aquatic output is very much driven by government policy and external investment.

Using the time series and the map, we can also observe the growth of cash crop production in many provinces; for instance, Bac Kan and Cao Bang started producing more cash crops such as peanuts over the 1998-2008 period. However, a deeper analysis reveals that, based on LQ, Cao Bang specializes not in peanuts but in farm fish. This suggests that although a province may have high output of a certain crop, it may achieve better returns by focusing on its specialization, and the government could step in with incentives to adjust the province’s agricultural production. By understanding each province’s specialization, resources such as water, land, machinery and technology can also be allocated more effectively.

In fact, the 2010 Vietnamese Aquaculture Fair was launched to showcase developments in aquaculture technology and to recognize outstanding contributions to the development of Vietnamese aquaculture. Aquaculture accounts for billions of dollars of the nation’s exports and remains one of its most promising and productive industries (VietNamNet, 2010). However, we believe Vietnam will soon face difficulties if it continues to focus solely on fishery products. The EU and US, for example, have stepped up regulation and will require all fishery products from Vietnam to carry proof of legal exploitation. This means that if Vietnam cannot meet the more stringent quality, safety and environmental sustainability standards set by the EU and US, its aquatic exports will be blocked (Vietnam Business News). Furthermore, Vietnam is missing out on opportunities in other agricultural products, as not all provinces have high specialization (LQ) levels for aquatic products.

Design Concerns

Case study: Manifold, and what is lacking in current desktop-based GIS

The map below is a thematic map of income levels of Vietnam’s provinces in 2005, produced in Manifold.

[Figure: thematic map of 2005 provincial income levels in Manifold (Vietnam manifold.png)]

While Manifold is popular GIS software, it has four distinct drawbacks that prompted us to design our own application. First, Manifold lacks functionality to display time series data. By comparison, in rich internet applications built with Flash or Protovis, the map can be updated instantly as the user drags across a time bar. This brings us to the second issue: usability and interactivity. Manifold has so many buttons and drop-down menus that it can be very confusing for end users; it is certainly not software that users can operate without a user manual.

Third, Manifold cannot easily construct a multidimensional, coordinated-view dashboard. While it is possible to open multiple windows in Manifold, with some degree of coordination across them, the effect is less than ideal because it is hard for users to lock the windows in place for brushing and highlighting. These features are important for users to quickly explore and understand the key aspects of the data. Finally, one of the biggest issues with Manifold is its lack of web accessibility. In this connected world, it is increasingly hard to convince users to download a desktop application and wait for the installation to complete before they can view a file. With the myriad web alternatives marketed under terms like "cloud computing" or "software as a service", people expect geospatial applications to load instantly in their browser. Manifold cannot deploy geospatial applications onto the web.

These factors ultimately influence our design of the application.


Prototype 1

[Figure: Prototype 1 (Vietnam prototype1.jpg)]

The goal of our initial design was a simple, time-based web application to display the Vietnam data. At this stage, we had yet to focus on agriculture specifically; instead, we had 6 main categories of data displayed on 2 maps. Map 1 on the left displays the agriculture data by color clusters, while Map 2 on the right focuses on the provinces, showing provincial-level data when users mouse over a province. Tabs at the top show the different key indicators; when users mouse over an indicator, both maps update with the relevant data. As the user drags the time bar at the bottom, the maps update as well.

Prototype 2

[Figure: Prototype 2 layout details (Prototype2-1 details.jpg)]

In Prototype 2, our key focus was to expand the range of visualization tools and allow more user customization. The Map Display Area on the left now has 4 thematic maps for comparing a determinant’s pattern over time. The maps refresh automatically when the user chooses a different determinant from the drop-down list at the top. At this stage, the team was working on “Education” and “Employment” as the key determinants.

When users click on a particular province or select a group of provinces on the map, the graphs and data on the right are filtered to show only the data for the selected province(s). The Scatter Plot on the right allows comparison among multiple determinants.

A Data Grid also shows the raw data, which can be exported for the user’s reference.

Prototype 3

[Figure: Prototype 3 layout details (Prototype3-1 details.jpg)]

In Prototype 3, the Map Display Area has been simplified to give users a clearer display, and the team finally settled on agriculture as our key determinant. The Category Tabs now show 3 main categories of agricultural products:

  • Cash crops: Peanut, Sugar Cane, Soya Bean, Cereals
  • Traditional crops: Maize, Paddy, Sweet Potato, Cassava
  • Aquatic products: Farm Fish, Farm Shrimp, Sea Fish

Prototype 3 also inherits the time bar and the click-to-show-data effect from Prototype 2.


Data Processing

The data for our application are obtained from the General Statistics Office of Vietnam under Statistical Data: Agriculture, Forestry and Fishery (GSO, 2009). We selected crop yield by province, organized by Vietnam’s 63 provinces: 58 provinces and 5 centrally governed cities at the same level as provinces (Hanoi, Ho Chi Minh City, Da Nang, Can Tho and Hai Phong). The data are relatively clean and span 1995 to 2009. We selected data from 1998 to 2008, as certain provinces have missing data before 1998 and in 2009. To convert the data into JSON format for Protovis to visualize, we used Mr. Data Converter (shancarter, 2011).
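The year selection and JSON conversion described above can be sketched in Python. This is a minimal sketch, not the team's workflow: it assumes a hypothetical wide CSV export with a `Province` header followed by one column per year (the actual GSO layout, and the exact JSON shape Mr. Data Converter produces, may differ). It keeps only 1998-2008 and emits one JSON object per province, turning empty cells into `null`.

```python
import csv
import io
import json

# Year range kept for the application (1998-2008, per the text above).
FIRST_YEAR, LAST_YEAR = 1998, 2008


def csv_to_json_records(csv_text):
    """Convert a wide province-by-year CSV table into a JSON array of objects.

    Assumes a hypothetical header row such as "Province,1995,...,2009".
    Only the years FIRST_YEAR..LAST_YEAR are kept; empty or absent cells
    become None, which json.dumps writes as null.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {"province": row["Province"].strip()}
        for year in range(FIRST_YEAR, LAST_YEAR + 1):
            cell = (row.get(str(year)) or "").strip()
            record[str(year)] = float(cell) if cell else None
        records.append(record)
    # ensure_ascii=False keeps Vietnamese province names readable in the output
    return json.dumps(records, ensure_ascii=False)


# Usage with a tiny, made-up table:
sample = "Province,1997,1998,1999,2008,2009\nAn Giang,1.0,2.0,3.0,4.0,\n"
print(csv_to_json_records(sample))
```

Storing missing cells as `null` rather than 0 keeps gaps distinguishable from genuine zero output, which matters when provinces without data are drawn in white on the map.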

Next, we downloaded the Vietnam Shapefile from the United Nations website (UN, 2011). To simplify the Shapefile for faster loading in our web application, we ran it through Manifold’s Normalize Topology function (geogeoreference, 2011). Finally, to convert the Shapefile into GeoJSON format, we used the OGR tools (OGR, 2011).
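Manifold’s Normalize Topology is not reproduced here. As a swapped-in illustration of how boundary simplification shrinks a file, the sketch below applies the classic Ramer-Douglas-Peucker algorithm to the rings of a GeoJSON `Polygon`. The function names and the `tolerance` value (in coordinate units, i.e. degrees for longitude/latitude data) are hypothetical. Unlike a topology-aware tool, it simplifies each ring independently, so borders shared by neighboring provinces may no longer line up exactly.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Ramer-Douglas-Peucker: keep the endpoints, recurse on the farthest point."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [a, b]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


def simplify_polygon(geometry, tolerance):
    """Simplify every ring of a GeoJSON Polygon dict independently (no topology)."""
    if geometry.get("type") != "Polygon":
        return geometry  # this sketch only handles plain Polygons
    rings = []
    for ring in geometry["coordinates"]:
        pts = [tuple(p) for p in ring]
        simplified = douglas_peucker(pts, tolerance)
        # GeoJSON linear rings need at least 4 positions; keep the original
        # ring if simplification collapsed it
        keep = simplified if len(simplified) >= 4 else pts
        rings.append([list(p) for p in keep])
    return {"type": "Polygon", "coordinates": rings}


# Usage: a square with redundant collinear vertices on its bottom edge
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [1, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
print(simplify_polygon(square, 0.01))
```

The tolerance is a trade-off: a larger value loads faster but risks over-generalising the boundary.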

Protovis libraries used (Protovis, 2011)

  • Array: The built-in Array class.
  • pv: The top-level Protovis namespace.
  • pv.Anchor: Represents an anchor on a given mark.
  • pv.Area: Represents an area mark: the solid area between two series of connected line segments.
  • pv.Bar: Represents a bar: an axis-aligned rectangle that can be stroked and filled.
  • pv.Behavior: Represents a reusable interaction; applies an interactive behavior to a given mark.
  • pv.Behavior.drag: Implements interactive dragging starting with mousedown events.
  • pv.Behavior.pan: Implements interactive panning starting with mousedown events.
  • pv.Behavior.point: Implements interactive fuzzy pointing, identifying marks that are in close proximity to the mouse cursor.
  • pv.Behavior.resize: Implements interactive resizing of a selection starting with mousedown events.
  • pv.Behavior.select: Implements interactive selecting starting with mousedown events.
  • pv.Behavior.zoom: Implements interactive zooming using mousewheel events.
  • pv.Color: Represents an abstract (possibly translucent) color.
  • pv.Geo.LatLng: Represents a pair of geographic coordinates.
  • pv.Geo.Projection: Represents a geographic projection.
  • pv.Geo.scale: Represents a geographic scale; a mapping between latitude-longitude coordinates and screen pixel coordinates.
  • pv.Label: Represents a text label, allowing textual annotation of other marks or arbitrary text within the visualization.

Design Architecture

[Figure: 4Geographers design architecture (4geographers Design Architecture.png)]

After the data from the General Statistics Office of Vietnam and the Shapefile from the United Nations have been processed into JSON and GeoJSON format respectively (see Data Processing above), Protovis, a JavaScript graphical toolkit, is employed to visualize the data. The Protovis classes listed above control the data visualization and user interactions in the web client.
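One step this architecture implies is attaching each province’s values to its GeoJSON feature, so the map and the charts can share one data source. The sketch below is an assumption about how such a join could work, not the project’s code: the `name` property key is hypothetical (it depends on the shapefile’s attributes), and names are matched after stripping Vietnamese diacritics, case and spaces so that, for example, “Hà Nội” and “Ha Noi” line up.

```python
import unicodedata


def normalize_name(name):
    """Match key for province names: strip Vietnamese diacritics, case and spaces.

    'Đ'/'đ' have no Unicode decomposition, so they are mapped to 'D'/'d' by hand.
    """
    text = name.replace("Đ", "D").replace("đ", "d")
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(no_marks.lower().split())


def join_data_to_features(feature_collection, records, name_key="name"):
    """Attach each province's yearly values to its GeoJSON feature, in place.

    name_key is a hypothetical property name; the real key depends on the
    shapefile's attribute table. Returns the names of features with no match,
    so they can be checked by hand.
    """
    by_name = {normalize_name(r["province"]): r for r in records}
    unmatched = []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}  # GeoJSON allows null properties
        feature["properties"] = props
        record = by_name.get(normalize_name(str(props.get(name_key, ""))))
        if record is None:
            unmatched.append(props.get(name_key))
            continue
        props["data"] = {k: v for k, v in record.items() if k != "province"}
    return unmatched
```

Returning the unmatched names, rather than failing silently, makes spelling differences between the GSO tables and the shapefile easy to spot.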

Visualization tool: Parallel Plot

Parallel coordinates are especially useful for exploring multidimensional data sets. Traditionally, parallel coordinates are used as a static plot for exploring general high-level trends among categorical attributes (Edsall, 2003), so the line connecting the attributes does not indicate change over time (Few, 2006). Our team, however, decided to challenge this perception by incorporating interaction techniques such as brushing and highlighting, and by using time as the dimensions, to increase the overall effectiveness of parallel coordinates.
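Using years as the dimensions means each province becomes a polyline with one vertex per year axis. The sketch below lays out those vertices in screen coordinates. It assumes records of the form `{"province": ..., "1998": value, ...}` and, as a hypothetical design choice, a single linear scale shared by all year axes, so that the slope between two axes reads directly as growth or decline (classic parallel coordinates usually give each axis its own scale).

```python
def parallel_coordinate_lines(records, years, width, height):
    """Lay out one polyline per province across year axes.

    Assumes records like {"province": ..., "1998": value, ...} and a single
    linear y-scale shared by every year axis (hypothetical design choice).
    Screen y grows downward, so larger values are drawn nearer the top.
    """
    values = [r[y] for r in records for y in years if r.get(y) is not None]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    step = width / (len(years) - 1) if len(years) > 1 else 0.0
    lines = {}
    for r in records:
        points = []
        for i, year in enumerate(years):
            v = r.get(year)
            if v is None:
                continue  # missing year: this sketch simply skips the vertex
            points.append((i * step, height - (v - lo) / span * height))
        lines[r["province"]] = points
    return lines
```

A shared scale keeps trends comparable across years; per-axis scales would spread values out more but make a line’s slope meaningless as a measure of change.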

For example, when we brush a range on the parallel coordinates, only the provinces within the selected range are displayed. Furthermore, the selected provinces are highlighted on the thematic map with a yellow border, and users can mouse over each line to see which province it represents (a small black info window appears); in the example above, Tra Vinh is shown. Because each dimension of the parallel coordinates is a year, we can make deductions about each province’s total agricultural output trend.
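The brushing behavior described above amounts to a filter over the province records. This sketch assumes the hypothetical record shape `{"province": ..., "<year>": value}` and combines several brushes with AND, a common parallel-coordinates convention rather than something the page specifies. The returned names would drive both the visible lines and the yellow map borders.

```python
def in_brush(value, a, b):
    """True if value lies in the brushed interval; a brush may be dragged either way."""
    return value is not None and min(a, b) <= value <= max(a, b)


def brush(records, brushes):
    """Return provinces that fall inside every active brush.

    brushes maps a year axis to its (start, end) extent, e.g. {"2008": (1e5, 3e5)}.
    Combining brushes with AND is an assumed convention.
    """
    return [
        r["province"]
        for r in records
        if all(in_brush(r.get(year), a, b) for year, (a, b) in brushes.items())
    ]


# Usage with made-up values: the result doubles as the set of provinces to outline
records = [
    {"province": "P1", "1998": 10, "2008": 5},
    {"province": "P2", "1998": 20, "2008": 25},
    {"province": "P3", "1998": 15, "2008": None},
]
highlighted = set(brush(records, {"1998": (12, 25)}))
```

Provinces with a missing value on a brushed axis are excluded, since there is no way to tell whether they fall inside the range.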

Taking Tra Vinh as an example, the graph above shows that total agricultural output fell from 1998 to 2000, rose from 2000 to 2006 and fell again from 2006 to 2008. This trend is relatively distinctive, as the other provinces in the selected range show a general downward trend over 2000-2008 (except Hau Giang at the bottom, which was not producing any agricultural output from 1998 to 2002).
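Reading a line segment by segment, as done for Tra Vinh, can be automated by collapsing a series into runs of rising and falling output. The function name is hypothetical, and the values in the usage example are made up to reproduce a fall-rise-fall shape; they are not Tra Vinh’s actual figures.

```python
def trend_segments(series):
    """Collapse a {year: value} series into (start_year, end_year, direction) runs.

    Years with missing (None) values are skipped, so a run may span a gap.
    """
    years = sorted(y for y, v in series.items() if v is not None)
    segments = []
    for prev, cur in zip(years, years[1:]):
        delta = series[cur] - series[prev]
        direction = "up" if delta > 0 else "down" if delta < 0 else "flat"
        if segments and segments[-1][2] == direction:
            segments[-1] = (segments[-1][0], cur, direction)  # extend current run
        else:
            segments.append((prev, cur, direction))
    return segments


# Hypothetical values with a fall-rise-fall shape (not real Tra Vinh data):
example = {1998: 10, 1999: 8, 2000: 6, 2001: 7, 2002: 9, 2003: 5}
print(trend_segments(example))
# [(1998, 2000, 'down'), (2000, 2002, 'up'), (2002, 2003, 'down')]
```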

Parallel coordinates thus allow comparison between provinces, with each line showing its own trend and character, making large datasets easier for users to digest. However, the true power of parallel coordinates lies in their combination with the location quotient thematic map for geospatial analysis.

Spatial Analysis technique: Location Quotient with thematic map

According to EMSI, “Location quotient (LQ) is basically a way of quantifying how concentrated a particular industry, cluster, occupation, or demographic group is in a region as compared to the nation. It can reveal what makes a particular region ‘unique’ in comparison to the national average” (EMSI, 2011). In our application, LQ is used to show whether a particular province specializes in a particular crop. For example, the LQ for cereals in the province of An Giang in 2008 is calculated as:

$$LQ_{AG,2008} = \frac{P_{AG,2008} / TC_{AG,2008}}{TP_{2008} / TC_{2008}}$$

  • $P_{AG,2008}$ = absolute tonnes of cereals produced in An Giang province in 2008
  • $TC_{AG,2008}$ = total tonnes of all crops produced in An Giang province in 2008
  • $TP_{2008}$ = total tonnes of cereals produced across all provinces in 2008
  • $TC_{2008}$ = total tonnes of all crops produced across all provinces in 2008

In general (McDaniel, 2011), if

  • $LQ_{AG,2008} > 1$, An Giang is specialized in cereals production compared with the national average.
  • $LQ_{AG,2008} = 1$, An Giang’s specialization in cereals production is comparable to the national average.
  • $LQ_{AG,2008} < 1$, An Giang is less specialized in cereals production than the national average.
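The LQ formula translates directly into code. This is a minimal sketch, assuming a hypothetical `{province: {crop: tonnes}}` dictionary for a single year; the figures in the usage example are made up. With them, Province X’s cereal LQ is (30/100) / (100/200) = 0.6 and Province Y’s is 1.4.

```python
def location_quotient(production, province, crop):
    """LQ = (P / TC_province) / (TP / TC) for one year.

    production: hypothetical {province: {crop: tonnes}} dictionary.
    P  = tonnes of `crop` in `province`;  TC_province = all crops in `province`;
    TP = tonnes of `crop` nationally;     TC = all crops nationally.
    Returns None when the LQ is undefined (no output in the province, or none
    of this crop nationally), which a map could draw as 'no data'.
    """
    p = production[province].get(crop, 0.0)
    tc_province = sum(production[province].values())
    tp = sum(crops.get(crop, 0.0) for crops in production.values())
    tc = sum(sum(crops.values()) for crops in production.values())
    if tc_province == 0 or tp == 0:
        return None
    return (p / tc_province) / (tp / tc)


# Made-up figures (tonnes) for two provinces:
production = {
    "Province X": {"cereals": 30, "farm fish": 70},
    "Province Y": {"cereals": 70, "farm fish": 30},
}
print(location_quotient(production, "Province X", "cereals"))  # 0.3 / 0.5
print(location_quotient(production, "Province Y", "cereals"))  # 0.7 / 0.5
```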

The provinces on our thematic map are colored by LQ: darker green indicates greater specialization (LQ > 1), darker blue indicates less specialization (LQ < 1), and white indicates missing data. The LQ of An Giang in 2008 is 0.61, which is less than 1, so An Giang is less specialized in cereals production than the national average.
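The coloring rule can be sketched as a small classification function. The class breaks, hex colors and the neutral color for an LQ of exactly 1 are all hypothetical; the page only specifies green for LQ > 1, blue for LQ < 1, darker shades for stronger deviation, and white for missing data.

```python
# Hypothetical palette and class breaks; the page only fixes the hues.
GREEN_SHADES = ["#c7e9c0", "#74c476", "#238b45"]  # light -> dark, LQ > 1
BLUE_SHADES = ["#c6dbef", "#6baed6", "#2171b5"]   # light -> dark, LQ < 1
NEUTRAL = "#f0f0f0"  # assumed color for LQ exactly 1 (not specified)
NO_DATA = "#ffffff"  # white for missing data


def lq_color(lq):
    """Map an LQ value to a fill color: greener = more specialized, bluer = less."""
    if lq is None:
        return NO_DATA
    if lq > 1:
        level = 0 if lq < 1.5 else 1 if lq < 2.5 else 2
        return GREEN_SHADES[level]
    if lq < 1:
        level = 0 if lq > 0.75 else 1 if lq > 0.5 else 2
        return BLUE_SHADES[level]
    return NEUTRAL
```

Under these assumed breaks, An Giang’s 2008 cereals LQ of 0.61 would fall in the middle blue class.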

Technical challenges

Protovis is a relatively new JavaScript library, and numerous functionalities are still in development. As such, there are few examples of dashboards built in Protovis, much less ones incorporating geospatial techniques. The team thus had to spend time exploring and experimenting with the various libraries, and used open-source jQuery libraries to support the interface.

We also postponed integrating the PySAL library for further geospatial analysis, partly because the rigid structure of our JSON data makes spatial and complex queries difficult. Integrating PySAL with Protovis also proved difficult, although the team would be interested in exploring this integration further in the future.

Project Management

Project Schedule

Gantt chart showing timelines and milestones

[Figures: project milestones and schedule (4Geographers Milestones.png, 4Geographers Schedule.png)]


[Figure: team roles (Geospatial T2 Roles.png)]


Interim Project Proposal

File:IS415 4Geographers Proposal.pdf


Comments by Prof. Kam

The country boundary layer is over-generalised.

--Tskam 20:20, 15 March 2011 (SGT)