Kiva Project Findings Final

From Analytics Practicum


Area of study

The bulk of Kiva’s loans, in terms of both amount and quantity, are funded in the Philippines, which is therefore the country of focus in our analysis. The Republic of the Philippines is made up of 7,107 tropical islands, with a total area of 300,000 km² and 65% mountainous terrain (Net Industries, 2018). Although the country’s official languages are Filipino (Tagalog) and English, it has diverse regional cultures, with three languages serving as regional lingua francas: Ilokano in Northern and Central Luzon, Tagalog in Central and Southern Luzon, and Cebuano in the Visayas and Mindanao.

The Philippines is divided into three major island groups: Luzon, Visayas, and Mindanao.

Luzon, the largest and most populous island in the Philippines, is home to the country’s capital and major metropolis, Manila. It leads the country in agriculture and industrial manufacturing, and more than half of the Filipino population lives on Luzon (Britannica). The Luzon island group also includes Palawan Island, a large island southwest of Manila.

Visayas is an island group located in the centre of the Philippines. It consists of seven large islands and several hundred smaller ones, and the region is known for agriculture and fishing (Britannica).

Mindanao is the second largest island after Luzon. It has narrow coastal plains, with broad, fertile basins and extensive swamps (Britannica). Of the three island groups, Mindanao has the strongest Muslim presence in the Philippines, a country whose dominant religion is Roman Catholicism, and it is home to most of the country’s ethnic minorities. As on the other islands, agriculture is a key industry, while its textile and timber industries are also important due to deposits of raw materials.

Given the vast ethnic, cultural, economic and religious differences between provinces, geospatial analysis is used to identify how Kiva’s loan attributes differ across the Philippines.


Analysis

Kernel Density Analysis

Methodology

The kernel density function is a method of estimating the probability density function (PDF) of a continuous random variable. It is non-parametric because no underlying distribution is assumed for the variable. Each sample point has its own weight function, which represents its influence on the density values in the surrounding neighbourhood, and each “bump” is centred at the datum and spreads out symmetrically to cover the neighbouring values. The size of the “bump” represents the probability assigned to the neighbourhood of values around that datum, and the estimated model is the sum of all the kernel function “bumps”.

G22 KDE formula.png

The Gaussian kernel function, represented by k(u), follows a normal distribution curve to represent the intensity of different points. The density plot for the Gaussian function for the Philippines is plotted below.

G22 F F1.png
Figure 1: Gaussian Kernel Density plot for Philippines
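
The kernel density estimate described in the methodology can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the bandwidth value, the use of a product of two one-dimensional Gaussian kernels for two-dimensional points, and the sample coordinates are all assumptions made for illustration, and are not taken from the project's actual workflow or data.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel k(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_2d(points, x, y, bandwidth):
    """Kernel density estimate at (x, y) from a list of (px, py) points."""
    total = 0.0
    for px, py in points:
        # Each point contributes one "bump": a product of two 1-D Gaussian kernels
        total += (gaussian_kernel((x - px) / bandwidth)
                  * gaussian_kernel((y - py) / bandwidth))
    # Average the bumps and rescale by the bandwidth in both dimensions
    return total / (len(points) * bandwidth ** 2)


# Hypothetical loan locations in projected coordinates (not Kiva data)
loans = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (10.0, 10.0)]
dense = kde_2d(loans, 0.5, 0.5, bandwidth=1.0)   # close to three points
sparse = kde_2d(loans, 5.0, 5.0, bandwidth=1.0)  # far from every point
```

Evaluating the estimate at a location surrounded by points returns a higher density than at an empty location, which is what the areas of high intensity in a density plot represent.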


Findings

The kernel density plot for the Philippines shows that the areas of highest intensity lie in Visayas. However, there are many more loan records for Visayas than for Luzon or Mindanao, so the plot for the country as a whole is not meaningful. Given the different levels of urban development and transportation within each main island, further analysis is done separately for each main island and across the 3 sectors with the largest number of loan records, namely Agriculture, Food and Retail.

G22 F F2.png
Figure 2: Gaussian Kernel Density plot for Luzon Island

The graphs in Figure 2 for Luzon show that the areas of high intensity lie in Palawan Island. As this island is still largely underdeveloped, especially in comparison to Luzon’s main island, Kiva’s loans are highly attractive to many entrepreneurs there. The intensity of all three sectors, Agriculture, Food and Retail, is high, meaning that this area is developing in all three sectors. In Metropolitan Manila, however, there are slight signs of intensity for Food and high levels of intensity for Retail, which is likely linked to the strong growth of the apparel retail industry in the Philippines, with a compound annual growth rate (CAGR) of 10.8% between 2011 and 2015.

G22 F F3.png
Figure 3: Gaussian Kernel Density plot for Visayas

Visayas represents a more typical case where different regions specialise in different industries and sectors. Most areas of intensity are common to the 3 sectors. However, the Concepcion Islands as well as Bantayan Island (middle) have high intensity for Food and Retail but not Agriculture. The loans in these areas serve the tourism industry, as there are many resorts and beaches there. In contrast, there is a strong Agriculture presence but a weak Food and Retail presence in Tanjay City (middle bottom).

G22 F F4.png
Figure 4: Gaussian Kernel Density plot for Mindanao

All 3 sectors are prominent in Oroquieta City (middle area, upper), while Ozamis City has high intensity for Agriculture and Retail. Pagadian City also shows a presence of Agriculture.

The kernel density plots above highlight that spatial clusters exist for different sectors in different regions of each main island. We then investigate in greater detail whether these spatial clusters arise from spatial dependence and autocorrelation, using an area-based spatial autocorrelation analysis. Since Visayas has the most prominent spatial clusters, which vary across different parts of the island group and across sectors, further in-depth analysis is conducted solely on Visayas to gain more valuable insights.


Spatial Autocorrelation Analysis

Methodology

Spatial autocorrelation measures the correlation of a variable with itself through space. In Kiva’s context, positive spatial autocorrelation indicates that similar borrowing patterns appear in neighbouring regions, while negative spatial autocorrelation suggests that dissimilar borrowing patterns appear in neighbouring regions. To find out whether nearby locations demonstrate similar borrowing patterns, a spatial autocorrelation analysis is adopted to study the geographical distribution of the number of loans for different sectors (Celebioglu, F. and Dall'erba, S., 2010).

First, we need to define the neighbourhood, that is, the area surrounding each target location. During the analysis, the number of loans for one location is compared with that of its neighbours, so that a statistical conclusion can be drawn about whether neighbouring areas are correlated with each other. The criteria for selecting neighbours are therefore of great importance and will largely affect the analysis results.

A spatial weight measures the intensity of the relationship between different spatial units. In this study, a spatial weight matrix is applied to establish the neighbourhood structure, and choosing an appropriate weight matrix is the main focus of this study. There are two basic approaches to constructing a spatial weight matrix: a contiguity-based weight matrix and a distance-based weight matrix.

Contiguity Based Method (Queen)

A prerequisite for a contiguity-based weight matrix is that two areas can be considered neighbours only if they share a common border. In the queen method, two areas that share only a single boundary point are considered neighbours as well. The queen contiguity weights are defined in the following formula. A weight of 1 in the matrix means that the nearby location is an adjacent neighbour of the target location, while 0 means that it is not (Andy Mitchell, 2005).

G22 Q formula.png
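
As a rough illustration of the queen rule described above, the sketch below builds a binary contiguity matrix from polygon vertices. It assumes a strong simplification that is not part of the original analysis: two areas count as touching only when they share an identical vertex, whereas real administrative boundaries would need a proper geometric test. The function name and the toy squares are hypothetical.

```python
def queen_weights(polygons):
    """Binary queen contiguity: 1 if two areas share at least one boundary
    vertex, otherwise 0. `polygons` maps an area name to its vertex list."""
    names = list(polygons)
    vertex_sets = {name: set(polygons[name]) for name in names}
    weights = {i: {} for i in names}
    for i in names:
        for j in names:
            if i != j:
                # A non-empty intersection means a shared point or edge
                shared = vertex_sets[i] & vertex_sets[j]
                weights[i][j] = 1 if shared else 0
    return weights


# Hypothetical unit squares: A and B share an edge, A and C share only a
# corner, and D touches nothing (similar to an area with no neighbours).
areas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "D": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
w = queen_weights(areas)
```

In this toy layout the corner-only contact between A and C still produces a weight of 1, which is what distinguishes the queen rule from edge-only contiguity.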

Distance Based Method

Another way to calculate the spatial weight is based on the actual distance between the centroids of two geographical areas.

1. K Nearest Neighbour Method. K is a user-defined variable. The neighbours of a specific spatial unit are selected based on distance and K. The KNN weights are calculated based on the following formula. Nk(i) represents the list of the K nearest neighbours of spatial unit i. If spatial unit j falls within this list, a weight of 1 is assigned to it. Otherwise, 0 is assigned.

G22 K formula.png
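
The KNN rule above can be sketched as follows. The centroid coordinates and the function name are hypothetical, and straight-line (Euclidean) distance between centroids is assumed; the original analysis may have measured distance differently.

```python
import math


def knn_weights(centroids, k):
    """Binary KNN weights: w[i][j] = 1 if j is among the k nearest
    neighbours of i (by centroid distance), otherwise 0."""
    names = list(centroids)
    weights = {}
    for i in names:
        others = [j for j in names if j != i]
        # Sort the other units by distance from unit i
        others.sort(key=lambda j: math.dist(centroids[i], centroids[j]))
        nearest = set(others[:k])
        weights[i] = {j: (1 if j in nearest else 0) for j in names if j != i}
    return weights


# Hypothetical centroids (not real city locations)
centroids = {"A": (0, 0), "B": (1, 0), "C": (0, 2), "D": (10, 10)}
w = knn_weights(centroids, k=2)
```

Every unit receives exactly k neighbours, even a remote one such as D, and the relation is not necessarily symmetric: here C is one of D's two nearest neighbours, but D is not one of C's.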

2. Inverse Distance Weighting Based Method. Inverse Distance Weighting (IDW) assumes that the influence of a spatial unit on other areas decreases as the distance increases. Thus, nearer neighbours are assigned heavier weights. The inverse distance weight is calculated based on the following formula, where dij denotes the distance between geolocations i and j. In this study, we assume 𝛼 equals 1 (Smith, T, n.d.).

G22 IDW formula.png
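
A minimal sketch of inverse distance weights with 𝛼 = 1 is given below, taking the weight as the distance raised to the power of minus 𝛼. The function name and the centroids are hypothetical, Euclidean distance is assumed, and the sketch assumes that no two centroids coincide, since a zero distance would make the weight undefined.

```python
import math


def idw_weights(centroids, alpha=1.0):
    """Inverse distance weights: w[i][j] = d_ij ** (-alpha) for i != j."""
    names = list(centroids)
    weights = {}
    for i in names:
        weights[i] = {}
        for j in names:
            if i != j:
                d = math.dist(centroids[i], centroids[j])
                # Assumes d > 0; coincident centroids are not handled here
                weights[i][j] = d ** -alpha
    return weights


# Hypothetical centroids: B is 5 units from A, C is 10 units from A
centroids = {"A": (0, 0), "B": (3, 4), "C": (0, 10)}
w = idw_weights(centroids, alpha=1.0)
```

With 𝛼 = 1 the nearer unit B receives twice the weight of C relative to A (0.2 against 0.1), which reflects the assumption that influence decays with distance.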

Row Standardization

After the neighbourhood structure is established, we standardize the weight matrix by assigning a proportional weight to each neighbouring location based on the total number of neighbours of the target location. Row standardization is applied to minimize the influence of unequal numbers of neighbours. However, row standardization is not applied to the inverse distance weight matrix. If some neighbouring units are very close to, or even at the same location as, the target spatial unit, row standardization would assign a dominant weight to those neighbours and almost none to relatively distant ones, which would distort the result (Andy Mitchell, 2005).
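
Row standardization as described above amounts to dividing each weight by the sum of its row. The sketch below uses a hypothetical binary weight matrix and a hypothetical function name; leaving the row of a unit with no neighbours at zero is an assumption made here to avoid dividing by zero.

```python
def row_standardize(weights):
    """Divide every weight by its row total so each non-empty row sums to 1."""
    standardized = {}
    for i, row in weights.items():
        total = sum(row.values())
        # A unit with no neighbours keeps an all-zero row (assumption)
        standardized[i] = {j: (w / total if total else 0.0)
                           for j, w in row.items()}
    return standardized


# Hypothetical binary weights: A has two neighbours, B has one, D has none
binary = {
    "A": {"B": 1, "C": 1, "D": 0},
    "B": {"A": 1, "C": 0, "D": 0},
    "D": {"A": 0, "B": 0, "C": 0},
}
ws = row_standardize(binary)
```

Each of A's two neighbours receives a weight of 0.5, while B's single neighbour receives the full weight of 1, so units with many neighbours do not dominate simply because of their neighbour count.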

Local Indicator For Spatial Association (LISA)

A Local Indicator of Spatial Association (LISA) is a statistic that measures the extent to which similar values are clustered together (Luc Anselin, n.d.). Local Moran's I is used as the LISA statistic in this study to determine the significance of the clustering results, and the formula used is shown below. wij denotes the weight between geolocations i and j. xi and xj denote the target values in regions i and j. X bar represents the mean value across all areas.

G22 Lisa formula.png

A large positive value of Ii indicates that similar values appear in the neighbouring regions, while a negative value indicates that dissimilar values appear in the adjacent regions (Andy Mitchell, 2005).
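
To make the statistic concrete, the sketch below computes Local Moran's I in one commonly used form, in which the deviation of xi from the mean is multiplied by the weighted sum of its neighbours' deviations and scaled by the mean squared deviation. That scaling is an assumption, since the exact form used in this study is given only in the formula image; the loan counts, weights and function name are also hypothetical.

```python
def local_moran(values, weights):
    """Local Moran's I for each unit i:
    I_i = (x_i - mean) / m2 * sum_j w_ij * (x_j - mean),
    where m2 is the mean squared deviation (assumed scaling)."""
    names = list(values)
    n = len(names)
    mean = sum(values.values()) / n
    m2 = sum((values[k] - mean) ** 2 for k in names) / n
    result = {}
    for i in names:
        # Spatial lag of the deviations, using the weights in row i
        lag = sum(w * (values[j] - mean) for j, w in weights[i].items())
        result[i] = (values[i] - mean) / m2 * lag
    return result


# Hypothetical loan counts and row-standardized weights (one neighbour each)
loan_counts = {"A": 10, "B": 12, "C": 1, "D": 2}
weights = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"D": 1.0}, "D": {"B": 1.0}}
moran = local_moran(loan_counts, weights)
```

In this toy example A (high value, high neighbour) and C (low value, low neighbour) both give a positive Ii, whereas D (low value, high neighbour) gives a negative Ii.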


Findings

For Visayas, the loans are much more scattered across the island, and the areas of high concentration are likewise located in many different areas.

When using contiguity to define the neighbourhood, 20 cities/municipalities are deemed to have no neighbours, as they share boundaries only with cities that have no loan records with Kiva. However, there are nearby cities that are not immediately adjacent to these cities and hence are not defined as neighbours, yet they could still have influence due to their relatively close proximity.

Based on the Queen contiguity matrix, the average number of neighbours for a city is 3.03. Thus, k, which represents the number of nearest neighbours of each city, is set to 3 when determining the indices of points for the matrix.
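
The choice of k described above can be sketched as averaging the neighbour counts of a contiguity matrix and rounding to the nearest integer. The matrix below is hypothetical, as is the function name, and whether units without neighbours were included in the reported average is not stated in this report, so the sketch simply averages over every row it is given.

```python
def average_neighbour_count(weights):
    """Mean number of neighbours (non-zero weights) per area."""
    counts = [sum(1 for w in row.values() if w > 0)
              for row in weights.values()]
    return sum(counts) / len(counts)


# Hypothetical binary contiguity matrix for five areas
queen = {
    "A": {"B": 1, "C": 1, "D": 1, "E": 0},
    "B": {"A": 1, "C": 1, "D": 0, "E": 1},
    "C": {"A": 1, "B": 1, "D": 1, "E": 1},
    "D": {"A": 1, "B": 0, "C": 1, "E": 1},
    "E": {"A": 0, "B": 1, "C": 1, "D": 1},
}
average = average_neighbour_count(queen)
k = round(average)  # number of nearest neighbours to use for KNN
```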

G22 F F6.png
Figure 6: Queen, K Nearest Neighbour, Inverse Distance Weighting based Neighbours, Visayas

Although most of the islands in Visayas are not adjacent to each other, transportation by boat is well developed in Visayas, so it is likely that nearby cities strongly influence each other even if they are not directly connected by land. In this case, the Queen method would not sufficiently capture the neighbourhood relationships in Visayas, because it cannot detect relationships between areas separated by water, unlike KNN and IDW.

When examining the k nearest neighbour plot, areas of high concentration are linked, and the distances between links are not far even for those across water. This suggests that the KNN method is able to identify neighbours effectively if there is an even spread of areas with high concentration across smaller distances. The inverse distance weighted method links areas in a similar way to KNN, except that, unlike KNN, it does not encompass the top right corner.

G22 F F7.png
Figure 7: Local Moran I Values, Local Moran’s I p-values, LISA Cluster Map, based on Queen Method in Visayas
G22 F F8.png
Figure 8: Local Moran I Values, Local Moran’s I p-values, LISA Cluster Map, based on KNN Method in Visayas
G22 F F9.png
Figure 9: Local Moran I Values, Local Moran’s I p-values, LISA Cluster Map, based on IDW Method in Visayas

As shown in Figures 7 to 9 above, the regions highlighted in green represent areas of higher Moran I statistic values, indicating signs of clustering spread across different islands within Visayas. The p-values were then determined to identify areas of significance, denoted by areas shaded in blue. Based on the p-values, insignificant areas were removed, leaving only the significant areas, which are reflected in the LISA cluster map.

For the points that are significant, the autocorrelation is then interpreted based on the values of the current location and its surrounding locations. The autocorrelation is positive if the location has a high value and is associated with relatively high values in the surrounding locations (High-High), or if the location has a low value and is associated with relatively low values in the surrounding locations (Low-Low). Conversely, the autocorrelation is negative if a location with a high value is surrounded by low neighbouring values (High-Low), or if a location with a low value is surrounded by high neighbouring values (Low-High).
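
The four cluster types described above can be assigned with a small classification routine. The sketch below is illustrative only: the p-values are supplied as hypothetical inputs rather than computed, the 0.05 significance threshold is an assumption, and "high" and "low" are taken to mean above and not above the overall mean.

```python
def lisa_quadrants(values, weights, p_values, alpha=0.05):
    """Label each unit High-High, Low-Low, High-Low or Low-High, or
    'Not significant' when its p-value exceeds alpha."""
    names = list(values)
    mean = sum(values.values()) / len(names)
    labels = {}
    for i in names:
        if p_values[i] > alpha:
            labels[i] = "Not significant"
            continue
        own = values[i] - mean
        # Weighted deviation of the neighbours (spatial lag)
        lag = sum(w * (values[j] - mean) for j, w in weights[i].items())
        own_label = "High" if own > 0 else "Low"
        lag_label = "High" if lag > 0 else "Low"
        labels[i] = own_label + "-" + lag_label
    return labels


# Hypothetical loan counts, weights and p-values (not results from the study)
loan_counts = {"A": 10, "B": 12, "C": 1, "D": 2}
weights = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"D": 1.0}, "D": {"B": 1.0}}
p_values = {"A": 0.01, "B": 0.40, "C": 0.03, "D": 0.01}
clusters = lisa_quadrants(loan_counts, weights, p_values)
```

Here A is labelled High-High, C is Low-Low, D is Low-High, and B is dropped as not significant, mirroring how insignificant areas are removed before the cluster map is drawn.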

All 3 methods show that the bottom left area consists of a high-high region, although IDW indicates it to be both high-high and high-low. Given the strength of the concentration and the relatively short distances between the regions, a high-high autocorrelation would be a more accurate representation of that area. The high-low representation for IDW surrounding the bottom left high-high area (Negros) is rather inaccurate, as there are no regions with particularly high concentrations that are surrounded by regions of lower concentrations. With the Queen method already found unable to link areas separated by water, KNN is therefore the most accurate method for analysing Visayas.

G22 F F10 1.png
G22 F F10 2.png
G22 F F10 3.png

Figure 10: Loan distribution for different sectors in Visayas

Breaking down the loan distribution by the top 3 sectors, Food, Retail and Agriculture, we noticed that Food and Retail have similar areas of concentration, consistent with the kernel density analysis above. All 3 sectors have many records across the island of Negros (second from left, middle to bottom area). However, records for Retail are highest in Tablacan City, which is also fairly high for Food. The highest concentration for Food is in Concepcion, near the tourist islands of Tago Island and Quiniluban Island, whereas Agriculture has its highest concentration in Bais City in the south of Negros.

G22 F F11 1.png
G22 F F11 2.png
G22 F F11 3.png
Figure 11: LISA Cluster Maps using KNN Neighbourhood for different sectors in Visayas

As KNN was determined to be the best among the three methods for defining the neighbourhood, we then derived the LISA cluster maps based on the Local Moran I statistics and p-values for KNN.

On Negros Island, HH-type autocorrelation for Food and Retail begins from the central region of Isabela. For Food it ends at Bantayan (the sharp-edged region) and covers Tanjay City, whereas for Retail it covers a slightly larger area stretching downwards to Sibulan. Agriculture, on the other hand, has a small area of autocorrelation at San Carlos City and a larger cluster in the southern part of the island, starting from Kabankalan City, going southwards all the way to Basay, and ending at Sibulan as well.

It is perhaps most notable that the clusters for all 3 sectors begin at Bacolod, an important city in Negros Occidental that provides inter-island services to Iloilo and is frequently passed through by travellers moving between islands (Roxas, 2016), which highlights its strategic importance across all the sectors. In Eastern Visayas, the lack of spatial autocorrelation in loans could be attributed to the region already being well established in Agriculture, particularly in its crop production of palay (cereal grain) and coconut (CountrySTAT Philippines, 2018). These two crops have the highest production levels in both Eastern and Central Visayas, but Eastern Visayas produces far more than Central Visayas: 955,709 to 269,801 for palay and 1,165,867 to 274,069 for coconut (CountrySTAT Philippines, 2018).

Both the Food and Retail sectors show signs of spatial autocorrelation in Catbalogan City and Tacloban City, while Food also includes cities like Calbayog City, the regional center of Eastern Visayas, as well as the underdeveloped Maasin City. The wider coverage for Food is supported by the wide range of suppliers, which promotes stability and growth for this sector, as well as by the high accessibility of distribution channels (Marketline, 2016), especially in the well-connected island group of Visayas, which has many major internal shipping routes.


Conclusion

Exploratory cluster analysis using kernel density was conducted to determine the areas of spatial clusters throughout the 3 main islands in the Philippines. With significant differences found within the main island of Visayas, we investigated this complex region of islands further using area-based analysis by means of Moran’s I using contiguity-based and distance-based weight matrices for defining the neighbourhood, namely Queen’s case, K-Nearest Neighbours and Inverse Distance Weighting.

KNN performed best in explaining the significant and larger areas of spatial autocorrelation in the regions of Western and Central Visayas, reflecting the well-spread and large presence of Kiva loans in this area, and indicating that the geographical location of provinces within the central region of Negros Island influences the amount of loans taken. LISA statistics confirm the significant presence of local spatial autocorrelation for the southern part of Negros which overlaps both Western and Central Visayas.

We then further analysed the breakdown by the top three sectors for Kiva’s loans, namely Agriculture, Food and Retail. In Eastern Visayas, we found no spatial autocorrelation at all for Agriculture, likely due to the maturity and much higher production levels of the sector there, while Food and Retail show spatial autocorrelation in Catbalogan City and Tacloban City, due to the continued economic development of these two important cities within Eastern Visayas.

Overall, our results confirm the presence of spatial autocorrelation across all the major sectors for Kiva’s loans.

Reference

1. Net Industries. (n.d.). Philippines - History Background. Retrieved April 01, 2018, from http://education.stateuniversity.com/pages/1197/Philippines-HISTORY-BACKGROUND.html http://histclo.com/country/oce/phl/phl-reg.html

2. Celebioglu, F. and Dall'erba, S (2010) "Spatial disparities across the regions of Turkey: an exploratory spatial data analysis", Annals of Regional Science, Vol. 45, No. 2, p. 379-400

3. Yu, D and Wei (2008) "Spatial data analysis of regional development in Greater Beijing, China, in a GIS environment", Papers in Regional Science, Vol 87, No. 1, pp 97-117

4. Elias, M and Rey, S.J. (2011) "Educational Performance and Spatial Convergence in Peru" http://region-developpement.univ-tln.fr/fr/pdf/R33/Elias.pdf

5. Andy Mitchell (2005) The ESRI Guide to GIS Analysis: Volume 2: Spatial Measurements & Statistics, p. 136-145

6. Luc Anselin (n.d.). Local Indicators of Spatial Association-LISA, Retrieved from https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1538-4632.1995.tb00338.x

7. Smith, T. (n.d.). Spatial Weight Matrices. Retrieved April 08, 2018, from: https://www.seas.upenn.edu/~ese502/lab-content/extra_materials/SPATIAL%20WEIGHT%20MATRICES.pdf

8. Britannica. (2016, October 03). Mindanao. Retrieved April 08, 2018, from https://www.britannica.com/place/Mindanao

9. Britannica. (2016, October 03). Visayan Islands. Retrieved April 08, 2018, from https://www.britannica.com/place/Visayan-Islands

10. Roxas, N.R. & Fillone, A.M. Transportation (2016) 43: 661. https://doi-org.libproxy.smu.edu.sg/10.1007/s11116-015-9611-4

11. Philippine Statistics Authority: CountrySTAT Philippines, 2018. Retrieved on 15 April 2018 from http://countrystat.psa.gov.ph/