Group22 Report

From Visual Analytics and Applications

Revision as of 10:14, 13 August 2018

SMALL LOAN, BIG DIFFERENCE


Motivation of the Application

Poverty is one of the biggest long-term issues confronting the world today. Millions of people struggle to maintain the most basic standard of living for themselves and their families, and face daunting uncertainty in many parts of their lives. Microfinance is one strategy for addressing poverty: it provides small-scale loans to the poor so that they can start or expand a small business and improve their income, helping to bring them out of poverty.

Kiva is an international non-profit, founded in 2005 and based in San Francisco, with a mission to connect people through lending to alleviate poverty. By lending as little as $25 on Kiva, anyone can help a borrower start or grow a business, go to school, access clean energy or realize their potential. For some, it’s a matter of survival; for others, it’s the fuel for a life-long ambition. Kiva lenders have provided over $1 billion in loans to over 2 million people in 87 countries.

This is why we chose to build an R Shiny application to visualize Kiva data. We would like to know the level of poverty of each borrower in order to set investment priorities, help inform lenders, and understand Kiva’s target communities.

  1. Explore how loans differ across the world and detect the different patterns and causes of poverty in different countries.
  2. Understand how an online platform affects people’s real lives.
  3. Identify Kiva's development potential, and help give the world a better future with fewer impoverished people.

Review and Critique of Past Works

Design Framework

Data Preparation

The original dataset includes 671,205 observations and 20 variables. Before analysis, we prepared the data to obtain a cleaner dataset: we deleted missing values, modified existing fields, extracted relevant features and built new features. The final dataset used for analysis includes 671,044 observations and 22 variables, covering basic loan information, geographic data, loan purpose descriptions and time distribution.

  1. Before deleting missing values, we converted the “borrower_gender” variable. The original field is a list of the genders of the loan applicants, so we counted the entries in each list and assigned one of five labels:
 "single_female" when the female count and the number of individuals are both 1
 "mult_females" when the female count equals the number of individuals and is not zero
 "single_male" when the female count is zero and the number of individuals is 1
 "mult_males" when the female count is zero and the number of individuals is greater than 1
 "mixed_genders" when the female count differs from the number of individuals
  2. Convert all monetary amounts to USD. The original data uses the local currency of each loan application, which is not useful for our analysis.
  3. Create a new column named Continent.
  4. Calculate the time variables. 'total_time' is the time from the moment the loan is posted to the moment it is disbursed; 'giving_time' is how long a loan takes to get funded.
  5. Calculate the difference between the loan amount and the funded amount.
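
The gender grouping and the derived time and amount fields described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the project (the application itself is built with R Shiny). The names borrower_gender, total_time and giving_time come from the text; the input fields posted_time, disbursed_time, funded_time, loan_amount and funded_amount, the output fields gender_group and funding_gap, the timestamp format, and the choice of days as the time unit are all assumptions made for the example.

```python
from datetime import datetime


def classify_gender(gender_list):
    """Map a comma-separated gender string to one of the five group labels."""
    if not gender_list:
        return None  # missing value, left for the later deletion step
    people = [g.strip().lower() for g in gender_list.split(",") if g.strip()]
    if not people:
        return None
    n = len(people)
    females = sum(1 for g in people if g == "female")
    if females == n:
        return "single_female" if n == 1 else "mult_females"
    if females == 0:
        return "single_male" if n == 1 else "mult_males"
    return "mixed_genders"


def days_between(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Elapsed time from start to end in days (timestamp format is assumed)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 86400


def add_derived_fields(loan):
    """Return a copy of one loan record with the derived fields added."""
    out = dict(loan)
    out["gender_group"] = classify_gender(loan.get("borrower_gender"))
    # total_time: from posting to disbursement
    out["total_time"] = days_between(loan["posted_time"], loan["disbursed_time"])
    # giving_time: from posting to full funding (assumed start point)
    out["giving_time"] = days_between(loan["posted_time"], loan["funded_time"])
    # funding_gap (hypothetical name): requested amount minus funded amount
    out["funding_gap"] = loan["loan_amount"] - loan["funded_amount"]
    return out


example = add_derived_fields({
    "borrower_gender": "female, female",
    "posted_time": "2017-01-01 00:00:00",
    "disbursed_time": "2017-01-11 00:00:00",
    "funded_time": "2017-01-03 12:00:00",
    "loan_amount": 500.0,
    "funded_amount": 450.0,
})
```

Checking the all-female and all-male cases first leaves "mixed_genders" as the fallback, which matches the rule that the female count differs from the number of individuals.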

Demonstration

Discussion

Future Work

Installation Guide

User Guide

The Shiny application has 6 tabs:

  1. The first tab shows the workflow and the design logic of the Shiny application.
  2. The second tab, 'HeatMap_Cluster', can be used to cluster countries.
  3. The third tab, 'Map', gives an overview of which sectors matter most in each country. The size of each pie chart indicates the loan amount. You can focus on specific countries based on the clustering result.
  4. The fourth tab, 'TreeMap_Global', gives an overview of the number of loan records.
  5. The fifth tab, 'TreeMap_Compare', is used to compare the loan purposes of two different countries. You can click on the tree map to drill down.
  6. The sixth tab, 'Time_Series_Line', is used to explore how the loan amount and the number of records change over three years.