Kiva Project Findings Final
Data Cleaning
Missing Values
Figure 1: Screenshot of loan_themes_by_region.csv
The screenshot above of loan_themes_by_region.csv shows a snippet of the old geocode column for the Kiva regions, which is missing in most records (14536 out of 15736, or about 92%). As there are far too many missing values in this column to derive any meaningful information about shifts in location regions for particular loan themes, we removed the column entirely.
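The check behind this decision can be sketched in pandas as follows. This is a minimal illustration on toy data, not the exact code used in the project: the column name `geocode_old` and the 50% cut-off `MAX_MISSING_SHARE` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for loan_themes_by_region.csv; column names are assumed.
themes = pd.DataFrame({
    "region": ["A", "B", "C", "D"],
    "geocode_old": [None, None, None, "(1.0, 2.0)"],
})

# Share of missing values in each column (0.0 = complete, 1.0 = empty).
missing_share = themes.isna().mean()

# Hypothetical cut-off: drop any column that is more than half empty.
MAX_MISSING_SHARE = 0.5
sparse_cols = missing_share[missing_share > MAX_MISSING_SHARE].index.tolist()

themes_clean = themes.drop(columns=sparse_cols)
```

On the real file, the same `isna().mean()` call would report the roughly 92% missing share quoted above for the old geocode column.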
Figure 2: Screenshot of kiva_mpi_region_locations
The table above shows the erroneous records of kiva_mpi_region_locations, which have no location name (the main identifier/primary key for this table) and are missing every other column except the geocode. The geocode itself contains only the value (1000.0,1000.0), which is not an actual geocode. As these 1788 rows carried no useful information, we removed all of them.
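A filter of this kind might look like the sketch below. The column names (`LocationName`, `geo`, `lat`, `lon`) and the exact string form of the placeholder are assumptions for illustration, and the data is a toy sample rather than the real file.

```python
import pandas as pd

# Toy stand-in for kiva_mpi_region_locations.csv; column names are assumed.
mpi = pd.DataFrame({
    "LocationName": ["Badakhshan, Afghanistan", None, None],
    "country": ["Afghanistan", None, None],
    "geo": ["(36.73, 70.81)", "(1000.0, 1000.0)", "(1000.0, 1000.0)"],
    "lat": [36.73, None, None],
    "lon": [70.81, None, None],
})

# Rows without the identifying location name are unusable.
no_name = mpi["LocationName"].isna()

# Sanity check: in this sample, exactly those rows carry the placeholder geocode.
placeholder = mpi["geo"].eq("(1000.0, 1000.0)")
same_rows = bool((no_name == placeholder).all())

removed = int(no_name.sum())
mpi_clean = mpi[~no_name].reset_index(drop=True)
```

Comparing the two masks before dropping anything helps confirm that no row with a real location name is discarded along with the placeholder rows.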
Redundant Data
We also removed some data that either repeated information from other columns or was erroneous. In the file kiva_mpi_region_locations.csv, the geo column was simply a combination of the latitude and longitude values. As the individual latitude and longitude columns were more useful for our analysis, we removed the geo column.
Figure 3: Screenshot of kiva_mpi_region_locations.csv with geo column having duplicate information
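Before dropping a column as redundant, it can be worth confirming that it really is derivable from the others. The sketch below does this on toy data, assuming `geo` is stored as a `"(lat, lon)"` string next to numeric `lat` and `lon` columns. Both the column names and the string format are assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for kiva_mpi_region_locations.csv; names and format are assumed.
mpi = pd.DataFrame({
    "LocationName": ["X", "Y"],
    "geo": ["(36.5, 70.25)", "(-12.0, 15.5)"],
    "lat": [36.5, -12.0],
    "lon": [70.25, 15.5],
})

# Pull the two numbers out of the "(lat, lon)" string.
parsed = mpi["geo"].str.extract(
    r"\(\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*\)"
).astype(float)

# geo is redundant if it matches the separate lat/lon columns everywhere.
is_redundant = bool(
    np.allclose(parsed.to_numpy(), mpi[["lat", "lon"]].to_numpy())
)

if is_redundant:
    mpi = mpi.drop(columns="geo")
```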
Lastly, we removed a single invalid record from kiva_loans.csv in which the funded time was earlier than the posted time. This should never happen, as a loan cannot be funded before it has been posted.
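A timestamp check like this could be written as in the following sketch. The column names `posted_time` and `funded_time` and the sample values are assumptions for illustration. Loans that have no funded time yet are kept, because a comparison against a missing timestamp evaluates to False.

```python
import pandas as pd

# Toy stand-in for kiva_loans.csv; column names and values are assumed.
loans = pd.DataFrame({
    "id": [1, 2, 3],
    "posted_time": [
        "2014-01-01 06:12:39+00:00",
        "2014-01-02 09:00:00+00:00",
        "2014-01-03 10:00:00+00:00",
    ],
    "funded_time": [
        "2014-01-02 10:06:32+00:00",
        "2014-01-01 08:00:00+00:00",  # earlier than posted_time: invalid
        None,                         # not funded yet: kept
    ],
})

for col in ["posted_time", "funded_time"]:
    loans[col] = pd.to_datetime(loans[col], utc=True)

# Flag loans that appear to be funded before they were posted.
invalid = loans["funded_time"] < loans["posted_time"]

loans_clean = loans[~invalid].reset_index(drop=True)
```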