Kiva Project Findings Final

From Analytics Practicum
Revision as of 15:14, 15 April 2018 by Qian.zhang.2014 (talk | contribs)

Data Cleaning

Missing Value

G22F1.png
Figure 1: Screenshot of loan_themes_by_region.csv

The screenshot above of loan_themes_by_region.csv shows that the old geocode column for the Kiva regions is missing in most records (14536 out of 15736, about 92%). With so many missing values, this column cannot tell us anything meaningful about shifts in location regions for particular loan themes, so we removed it entirely.
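The removal of a mostly empty column can be sketched in pandas as follows. This is only an illustration under stated assumptions, not the code used in the project: the column name geocode_old, the 0.5 threshold, the helper drop_sparse_columns and the toy rows are all hypothetical.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df without columns whose share of missing values exceeds the threshold."""
    missing_ratio = df.isna().mean()  # fraction of missing values per column
    sparse = missing_ratio[missing_ratio > max_missing_ratio].index
    return df.drop(columns=sparse)


# Toy stand-in for loan_themes_by_region.csv (column names are assumptions)
themes = pd.DataFrame({
    "region": ["A", "B", "C", "D"],
    "geocode_old": [None, None, "(1.0, 2.0)", None],  # 3 of 4 values missing
})

cleaned_themes = drop_sparse_columns(themes)
print(list(cleaned_themes.columns))
```

With the real file, the same helper would be applied to the result of pd.read_csv("loan_themes_by_region.csv"); a column missing in roughly 92% of rows falls well above the assumed threshold.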

G22F2.png
Figure 2: Screenshot of kiva_mpi_region_locations.csv

The table above shows the erroneous records in kiva_mpi_region_locations.csv. These records have no location name (the main identifier, or primary key, for this table) and are missing values in every other column except the geocode, which contains only the placeholder (1000.0,1000.0) rather than an actual geocode. As these 1788 rows carried no useful information, we removed all of them.
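A minimal pandas sketch of this row filter is shown below. It assumes the location name is stored in a column called LocationName and the geocode in a column called geo; both names, the helper drop_unnamed_locations and the toy rows are hypothetical and only serve to illustrate the idea.

```python
import pandas as pd


def drop_unnamed_locations(df: pd.DataFrame, name_col: str = "LocationName") -> pd.DataFrame:
    """Keep only rows that have a location name, since the name acts as the table's identifier."""
    return df[df[name_col].notna()].reset_index(drop=True)


# Toy stand-in for kiva_mpi_region_locations.csv (column names and values are assumptions)
locations = pd.DataFrame({
    "LocationName": ["Region A", None, None],
    "MPI": [0.3, None, None],
    "geo": ["(10.0, 20.0)", "(1000.0, 1000.0)", "(1000.0, 1000.0)"],
})

named_locations = drop_unnamed_locations(locations)
print(len(locations) - len(named_locations), "rows removed")
```

Filtering on the missing identifier rather than on the placeholder geocode keeps the rule simple, and in the data described above both conditions select the same rows.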

Redundant Data

Some data were removed because they either repeated information from other columns or were erroneous. In the file kiva_mpi_region_locations.csv, the column geo was a combination of the latitude and longitude values. As the individual latitude and longitude columns were more useful for our analysis, we removed the geo column.

F3.png
Figure 3: Screenshot of kiva_mpi_region_locations.csv with geo column having duplicate information
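Before dropping a column as redundant, it can be worth confirming that it really duplicates the others. The sketch below does this in pandas, assuming that geo is stored as a "(lat, lon)" string and that the separate columns are named lat and lon; the string format, the column names and the toy rows are assumptions, not taken from the file.

```python
import pandas as pd

# Toy stand-in for kiva_mpi_region_locations.csv (names and format are assumptions)
locations = pd.DataFrame({
    "LocationName": ["Region A", "Region B"],
    "geo": ["(10.5, 20.25)", "(-3.0, 40.0)"],
    "lat": [10.5, -3.0],
    "lon": [20.25, 40.0],
})

# Parse the "(lat, lon)" strings back into two numeric columns
parts = locations["geo"].str.strip("()").str.split(",", expand=True)
parsed_lat = pd.to_numeric(parts[0])
parsed_lon = pd.to_numeric(parts[1])

# geo is redundant only if it matches the separate columns in every row
is_redundant = bool((parsed_lat == locations["lat"]).all() and (parsed_lon == locations["lon"]).all())

if is_redundant:
    locations = locations.drop(columns=["geo"])

print(list(locations.columns))
```

On real data an exact float comparison may be too strict if the two representations were rounded differently, so a tolerance-based check could be substituted.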

Lastly, we removed a single invalid record from kiva_loans.csv in which the funded time was before the posted time. This should never be the case, as a loan cannot be funded before it is posted.
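This consistency check, flagging any loan whose funded timestamp precedes its posted timestamp, could be sketched in pandas as below. The column names id, posted_time and funded_time and the toy rows are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for kiva_loans.csv (column names and values are assumptions)
loans = pd.DataFrame({
    "id": [1, 2, 3],
    "posted_time": ["2017-01-01 10:00:00", "2017-01-05 09:00:00", "2017-02-01 08:00:00"],
    "funded_time": ["2017-01-03 12:00:00", "2017-01-04 09:00:00", None],
})

# Convert the text timestamps so they can be compared chronologically
for col in ("posted_time", "funded_time"):
    loans[col] = pd.to_datetime(loans[col])

# A loan cannot be funded before it is posted.
# Loans with no funded_time compare as False here, so unfunded loans are kept.
invalid = loans["funded_time"] < loans["posted_time"]
valid_loans = loans[~invalid]

print(int(invalid.sum()), "invalid record(s) removed")
```

Keeping the boolean mask in its own variable makes it easy to inspect the flagged rows before discarding them.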