Difference between revisions of "Group14 proposal"

From ISSS608-Visual Analytics and Applications
Line 16: Line 16:
  
 
== Data Description ==
 
== Data Description ==
In this dataset, each row represents a customer and each column contains a customer attribute described in the column metadata. The raw data contains 7043 rows (customers) and 21 columns (features). The “Churn” column is our target; it indicates whether the customer left within the last month.
+
We collected the dataset from the IBM Community. It contains five spreadsheets.
 +
They cover customers’ demographics, location, population, services, and status.
 +
The demographics spreadsheet records each customer’s gender, age range, and whether they have a partner or dependents.
 +
The location spreadsheet records where each customer lives, such as country and city.
 +
The status spreadsheet records whether each customer churned and the reason for churning.
 +
There are 7043 entity instances in the dataset.
 +
Each customer is identified by the Customer_ID column.
 +
There are 42 columns with 40 attributes.
 +
The Churn_Value column indicates whether a customer left within the last month. Churned customers are recorded as 1 and non-churned customers as 0.
  
 
{| class="wikitable" style="width: 100%; height: 14em;"
 
{| class="wikitable" style="width: 100%; height: 14em;"
Line 24: Line 32:
 
| Customer ID  || Customer ID  || 7590-VHVEG  || Numeric
 
| Customer ID  || Customer ID  || 7590-VHVEG  || Numeric
 
|-
 
|-
| gender  || Whether the customer is a male or a female  || 1 || Binary
+
| gender  || Whether the customer is a male or a female  || Female || Binary
 
|-
 
|-
 
| SeniorCitizen  || Whether the customer is a senior citizen or not (1, 0)  || 0 || Binary
 
| SeniorCitizen  || Whether the customer is a senior citizen or not (1, 0)  || 0 || Binary
 
|-
 
|-
| Dependents || Whether the customer has dependents or not (Yes, No) || No|| Binary
+
| Partner  || Whether the customer has a partner or not (Yes, No)   || Yes || Binary
 
|-
 
|-
 +
| tenure || Number of months the customer has stayed with the company|| 1|| Numeric
 +
|-
 +
| PhoneService || Whether the customer has a phone service or not (Yes, No)  || No || Binary
 +
|-
 +
| MultipleLines  || Whether the customer has multiple lines or not (Yes, No, No phone service)  || No phone service || Categorical
 +
|-
 +
| InternetService  || Customer’s internet service provider (DSL, Fiber optic, No)  || DSL  || Categorical
 +
|-
 +
| OnlineSecurity || Whether the customer has online security or not (Yes, No, No internet service)  || No || Categorical
 +
|-
 +
| OnlineBackup || Whether the customer has online backup or not (Yes, No, No internet service)  || No  || Categorical
 +
|-
 +
| DeviceProtection  ||Whether the customer has device protection or not (Yes, No, No internet service)  || No  || Categorical
 +
|-
 +
| TechSupport  || Whether the customer has tech support or not (Yes, No, No internet service)  || No  || Categorical
 +
|-
 +
| StreamingTV  || Whether the customer has streaming TV or not (Yes, No, No internet service)  || No  || Categorical
 +
|-
 +
| StreamingMovies || Whether the customer has streaming movies or not (Yes, No, No internet service) || No  || Categorical
 +
|-
 +
| Contract  || The contract term of the customer (Month-to-month, One year, Two year)  || Month-to-month  || Categorical
 +
|-
 +
| PaperlessBilling || Whether the customer has paperless billing or not (Yes, No)  ||Yes || Binary
 +
|-
 +
| PaymentMethod || The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))  || Electronic check || Categorical
 +
|-
 +
| MonthlyCharges  || The amount charged to the customer monthly || 29.85  || Numeric
 +
|-
 +
| TotalCharges  || The total amount charged to the customer || 29.85  || Numeric
 +
|-
 +
| Churn  || Whether the customer churned or not (Yes or No)  || No  || Binary
  
 
== <big>Methodology and Approach</big> ==
 
== <big>Methodology and Approach</big> ==

Revision as of 10:42, 2 March 2020

Home - PicSource: https://medium.com/@timenalls/how-to-predict-customer-churn-with-pyspark-fb0d30f55253


Motivation and Objectives


Critique of Existing Visualization


Data Source


Data Description

We collected the dataset from the IBM Community. It contains five spreadsheets covering customers’ demographics, location, population, services, and status. The demographics spreadsheet records each customer’s gender, age range, and whether they have a partner or dependents. The location spreadsheet records where each customer lives, such as country and city. The status spreadsheet records whether each customer churned and the reason for churning. The dataset has 7043 entity instances, and each customer is identified by the Customer_ID column. There are 42 columns with 40 attributes. The Churn_Value column indicates whether a customer left within the last month: churned customers are recorded as 1 and non-churned customers as 0.
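The 1/0 encoding of Churn_Value makes the overall churn rate a simple average. The following is a minimal sketch, assuming the status spreadsheet has been exported to CSV with `Customer_ID` and `Churn_Value` columns. The sample rows and the helper name `churn_rate` are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows for illustration only; the real file has 7043 customers.
SAMPLE_CSV = """Customer_ID,Churn_Value
0001-AAAAA,0
0002-BBBBB,1
0003-CCCCC,0
0004-DDDDD,1
"""


def churn_rate(csv_text):
    """Return (number of customers, fraction with Churn_Value == 1)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no customer rows found")
    churned = sum(1 for row in rows if int(row["Churn_Value"]) == 1)
    return len(rows), churned / len(rows)


if __name__ == "__main__":
    total, rate = churn_rate(SAMPLE_CSV)
    print(f"{total} customers, churn rate {rate:.2f}")
```

On the real data the same function would take the text of the exported file instead of `SAMPLE_CSV`.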

{| class="wikitable" style="width: 100%; height: 14em;"
! Data Fields !! Description !! Example !! Datatype
|-
| Customer ID || Customer ID || 7590-VHVEG || Numeric
|-
| gender || Whether the customer is a male or a female || Female || Binary
|-
| SeniorCitizen || Whether the customer is a senior citizen or not (1, 0) || 0 || Binary
|-
| Partner || Whether the customer has a partner or not (Yes, No) || Yes || Binary
|-
| tenure || Number of months the customer has stayed with the company || 1 || Numeric
|-
| PhoneService || Whether the customer has a phone service or not (Yes, No) || No || Binary
|-
| MultipleLines || Whether the customer has multiple lines or not (Yes, No, No phone service) || No phone service || Categorical
|-
| InternetService || Customer’s internet service provider (DSL, Fiber optic, No) || DSL || Categorical
|-
| OnlineSecurity || Whether the customer has online security or not (Yes, No, No internet service) || No || Categorical
|-
| OnlineBackup || Whether the customer has online backup or not (Yes, No, No internet service) || No || Categorical
|-
| DeviceProtection || Whether the customer has device protection or not (Yes, No, No internet service) || No || Categorical
|-
| TechSupport || Whether the customer has tech support or not (Yes, No, No internet service) || No || Categorical
|-
| StreamingTV || Whether the customer has streaming TV or not (Yes, No, No internet service) || No || Categorical
|-
| StreamingMovies || Whether the customer has streaming movies or not (Yes, No, No internet service) || No || Categorical
|-
| Contract || The contract term of the customer (Month-to-month, One year, Two year) || Month-to-month || Categorical
|-
| PaperlessBilling || Whether the customer has paperless billing or not (Yes, No) || Yes || Binary
|-
| PaymentMethod || The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)) || Electronic check || Categorical
|-
| MonthlyCharges || The amount charged to the customer monthly || 29.85 || Numeric
|-
| TotalCharges || The total amount charged to the customer || 29.85 || Numeric
|-
| Churn || Whether the customer churned or not (Yes or No) || No || Binary
|}
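The Binary, Categorical, and Numeric labels in the table can be checked against the data with a simple rule. The sketch below is only a heuristic, not the method used to build the table: it assumes a column with exactly two distinct values is Binary, a column whose values all parse as numbers is Numeric, and anything else is Categorical. The function name `infer_datatype` and all sample values are made up for illustration.

```python
def infer_datatype(values):
    """Heuristically label a column as Binary, Numeric, or Categorical."""
    distinct = set(values)
    # Two distinct values (e.g. Yes/No or 1/0) are treated as Binary,
    # even when the values themselves look numeric.
    if len(distinct) == 2:
        return "Binary"
    try:
        for value in distinct:
            float(value)
        return "Numeric"
    except ValueError:
        return "Categorical"


# Illustrative values only; these are not rows from the actual dataset.
columns = {
    "SeniorCitizen": ["0", "1", "0", "0"],
    "tenure": ["3", "12", "24", "60"],
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "MonthlyCharges": ["20.50", "45.00", "70.25", "99.90"],
}

if __name__ == "__main__":
    for name, values in columns.items():
        print(name, infer_datatype(values))
```

A numeric column that happens to contain only two distinct values would be labelled Binary by this rule, so the result is best read as a first pass to be checked by hand.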

Methodology and Approach


Proposed R Packages


Team Members


References