Difference between revisions of "Red Dot Payment - Project Overview"

From Analytics Practicum
Jump to navigation Jump to search
 
(6 intermediate revisions by the same user not shown)
Line 33: Line 33:
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
|-  
 
|-  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #2e2e2e" width="150px"| [[Red Dot Payment_-_Project Overview| <span style="color:#3d3d3d">Background</span>]]
+
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #2e2e2e" width="150px"| [[R2Z_-_Project Overview| <span style="color:#3d3d3d">Background</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[Red Dot Payment_Data Source| <span style="color:#3d3d3d">Data Source</span>]]
+
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Data Source| <span style="color:#3d3d3d">Data Source</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[Red Dot Payment_Methodology| <span style="color:#3d3d3d">Methodology</span>]]
+
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Methodology| <span style="color:#3d3d3d">Methodology</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
|}
 
|}
 
<!------- Details ---->
 
<!------- Details ---->
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Sample Data</strong></font></div></div>==
+
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Introduction</strong></font></div></div>==
'''''*Due to sensitivity of data, sample data will not be shown here. Please refer to the submitted reports for more details.'''''
+
For any entity to conduct business online, they must be able to accept payments from customers, and this involves a third-party to facilitate the online transfer of funds. Such entities are called electronic Payment Gateways , and this includes PayPal, Stripe, and our project sponsor.  
<br>
+
 
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Metadata</strong></font></div></div>==
+
When a customer makes a purchase on a merchant’s website, our sponsor helps to process the credit card payment. This is done by transferring key information between the payment portal (e.g. merchant’s website) and the merchant’s registered bank account.  For each successful transaction that our sponsor processes, they apply a commission, called a Merchant Discount Rate (MDR) , from the transaction.  
Previously, we only have the transaction data of 2017 provided by RDP.  
+
 
Moving on, we managed to obtain transaction data of 2016 as well.
+
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Business Motivations and Objectives</strong></font></div></div>==
<br><br>
+
While our sponsor handles millions of transactions across the globe, the company has not fully been able to derive any conclusive analysis from its transaction data.
Below are the definitions of common terms used in their business model:
+
 
<br><br>
+
By providing our team with their data, our sponsor hopes to gather a deeper understanding of it. The project objectives include developing meaningful insights by performing exploratory data analysis. Using the results drawn from the findings, recommendations will be formulated to aid in future business decision making.
'''''MERCHANT''''' - A merchant is a business, often a retailer, that operates online. Each merchant has appointed RDP to be its payment processor to handle online transactions between them and their customers.  
+
 
<br><br>
+
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Meetings with Project Coordinator, Project Sponsor and Team</strong></font></div></div>==
''<b>CUSTOMER</b>'' - A customer is an entity that makes a transaction with the merchant. For a transaction to be made, a customer must make contact with a merchant and provide their payment details (e.g. Credit card number) to the merchant through RDP’s payment processing gateway.  
+
Our team aims to meet our Project Coordinator, Professor Kam Tin Seong on average of at least once every week in order to ensure that we are progressing on the right track. We also met our project sponsor once a month to update them on our progress and validate our analysis, while our internal team meetings are held at least twice a week. All minutes can be found under "Project Documentation".
<br><br>
+
 
The following table shows the list of variables in the data and their associated description:
+
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Project Objectives</strong></font></div></div>==
<br><br>
+
We aim to study the complexity of the online transactions carried out by our sponsor’s merchants. In particular, we hope to assess the performance of our sponsor’s merchants, by using number of approved transactions per merchant as a benchmark.
 +
 
 +
This project shows how we use funnel plots and line of best-fit graphs to compare the performance among entities in a group in an unbiased manner. Though both funnel plots and line of best-fit graphs are prevalent in medical research , they are under-utilized in e-commerce analytics. In our case study, the line of best-fit graphs provide a better model fit. By combining features of funnel plot with line of best-fit, we can identify star and laggard merchants.
 +
 
 +
Moving on, we want to study the characteristics of transactions carried out by key merchants. We use logistic fit to establish any correlation between independent variables and the number of approved transactions. For independent variables that exhibit correlation, we conduct further analysis by utilizing decision trees to map expected approved-to-rejected transaction ratio for key merchants.
 +
 
 +
In summary, we show how exploratory and confirmatory techniques can be used as source of business intelligence - setting performance benchmarks for each merchant and improving their approved-to-rejected transaction ratio.
 +
 
 +
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Scope of Project</strong></font></div></div>==
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! NAME !! TYPE !! DESCRIPTION
+
! Task !! Description 
|-
 
| date_created || Date Time || Data and time at which the transaction was carried out between merchant and customer
 
 
|-
 
|-
| period || Integer || Month of the year the transaction was carried out between merchant and customer
+
| Gather Requirements || Confirm and gather sponsor requirements
 
|-
 
|-
| hour || Integer || Hour in a day the transaction was carried out between merchant and customer
+
| Initial Research and Preparation || Conduct preliminary data exploration and define project objectives and scope
 
|-
 
|-
| day || Integer || Day of the week the transaction was carried out between merchant and customer
+
| Project Proposal || Prepare project proposal and Wikipage
 
|-
 
|-
| time || Time || Time at which the transaction was carried out between merchant and customer
+
| Data Exploration and Preparation || Ensuring that data is clean and can be analysed using analytical software; We have done data preparation, which include – Interactive Binning and finding the top reason code descriptions. After which, we conducted exploratory data analysis.
 
|-
 
|-
| name || Alphanumeric || Name of merchant
+
| Model Building || Through research findings and experience, we will attach suitable models to our data. We have used Interactive Binning, Line of Fit and Time Series Analysis to generate insights.
|-
 
| amount of money || Decimal || Total transaction amount - unstandardised currency
 
|-
 
| currency || CHAR(3) || Currency of the transaction amount - usually in SGD or USD
 
|-
 
| converted value || Decimal || Total transaction amount - standardised in SGD
 
|-
 
| reason_code || Alphanumeric || A unique code tagged to each reason for each approved or rejected transaction made by customer
 
|-
 
| reason_code_description || Alphanumeric || Description of the reason for each approved or rejected transaction made by customer
 
|-
 
| card_data1 || Integer || First 6 digits of customer’s card number - reveals details about the issuing bank and card type
 
|-
 
| card_data2 || Integer || Last 4 digits of customer’s card number
 
|-
 
| contact_ip_address || Alphanumeric||  IP address of the customer
 
|-
 
| contact_ip_country || Alphanumeric || IP country of the customer
 
|-
 
| Bin [based on log10] || Integer || Bin number each merchant belongs to
 
|}
 
<br>
 
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Data Cleaning</strong></font></div></div>==
 
For data cleaning, we carried out the following steps:<br><br>
 
'''<big>General Cleaning</big>'''
 
<br><br>
 
1. Compiled 2016-2017 data <br>
 
2. Standardised all merchants’ names and currency into uppercase: e.g. Realised that JMP Pro recognises characters with uppercase and lowercase separately<br>
 
3. Standardised currency of all transaction values: Converted all non-SGD values into SGD values, based on average monthly historical exchange rates found on OANDA.com<br>
 
4. Split ‘date_created’ column into ‘date’ and ‘time’ columns.<br>
 
5. Created a new column ‘period’ to indicate the respective month of each date in the  ‘date’ column (e.g. 01/01/16 → Period 1; 01/01/17 → Period 12)<br>
 
6. Created a new column ‘hour’ to indicate respective hour of each time in the ‘time’ column (e.g. 1:00:00 AM → 1; 1:00:00 PM → 13)<br>
 
7. Created a new column ‘day’ to indicate respective day in the week for each data in the ‘date’ column (e.g. Monday is assigned a value of 1, Sunday is assigned a value of 7)<br>
 
8. Removed deactivated or terminated merchants
 
  - 2016: 41 transactions removed;
 
  - 2017: 841 transactions removed<br>
 
9. Removed rows with ambiguous or illegible characters/ negative transaction value on JMP Pro:
 
  - Removal of 1563 rows with ambiguous or illegible characters;
 
  - Removal of 2 rows with negative transaction value
 
<br><br>
 
'''<big>Removal of Outliers</big>'''
 
<br><br>
 
There were transactions with transaction value as high as 5.41212e+15. Such transaction, as shown in Figure 1, represents top 0.5% of all transactions, and has a very large difference in value compared to the bottom 99.5%. This would result in a very skewed distribution of transaction value (see Figure 2a), leading to inaccurate results during exploratory data analysis. Thus, it is essential that we remove these outliers. In addition, through our client meetings, we realised that our data includes test cases - i.e. False transactions carried out by merchants to test RDP payment modes; While we are not able to fully determine whether a transaction is a test case or not (as our client cannot give us a benchmark for a transaction value that would define a test case), we can reduce the number of test cases by eliminating the bottom 0.5% of all transactions.
 
<br><br>
 
<center>[[File:Figure1RDP.jpg|200px]]<br>
 
<small>Figure 1: Summary Statistics for ‘Converted Value’</small></center>
 
<br/>
 
<br>
 
* Remove transactions with transaction value belonging to top 0.5% and bottom 0.5% of all transactions
 
* In addition, we also remove transactions with transaction value less than $1, as we realised that there were still many transactions with value <$1. Based on our team’ perspectives, these transactions are likely to be test cases, as it is rather impossible to have transaction value <$1 across two years. Thus, these transactions are necessary to be removed.
 
<br><br>
 
<center>[[File:Figure2RDP.jpg|400px]]<br>
 
<small>Figure 2a: Johnson SI Transformed Converted value distribution before removing outliers</small></center>
 
<br/>
 
<center>[[File:Figure3RDP.jpg|400px]]<br>
 
<small>Figure 2b:  Johnson SI Transformed Converted value distribution after removing outliers</small><br></center><br><br>
 
- '''Before Removing Outliers'''<br>
 
- Total number of transactions: 2,310,781<br><br>
 
 
 
- '''After Removing Outliers'''<br>
 
- Total number of transactions: 2,282,080<br>
 
- Excluded number of transactions: 28,701<br>
 
<br>
 
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Data Preparation</strong></font></div></div>==
 
We have identified the most common types of transaction in the following:<br>
 
1. Approved or Completed Transaction<br>
 
2. Bank Rejected Transaction<br>
 
3. Authentication Failed<br>
 
4. Duplicate Merchant_TranID Detected<br>
 
<br>
 
Moving forward, we will only be analysing the approved transactions, and the top 3 types of rejected transaction, in order to narrow down the scope of our work.
 
<br><br>
 
<big>'''Interactive Binning'''</big>
 
<br>
 
From our data, we realised that there is huge discrepancy in the total number of transactions carried out by each merchant from 2016-2017; The range varies from 1 to 1212019. As stated in our refined objectives, we hope to identify the over-performing (star) and underperforming (laggard) merchants through the use of Line of Fit graphs. We have decided to group the merchants into 5 bins (cut-points for binning at 20 percentiles) based on total number of transactions per merchant, as we realised that compared to the Line of Fit graphs for all merchants, grouping them into 5 bins provide better model fit.  
 
<br>Next, we rename each bin into the following:<br>
 
{| class="wikitable"
 
 
|-
 
|-
! Bin [Based on Log10]  (Group) !! Number of transactions per merchant (Range) !! Number of merchants
+
| Project Revision (Mid-Term) || Assisted by RDP through obtaining feedback during our sponsor meeting.
 
|-
 
|-
| 1 || <= 7 || 72
+
| Mid-term Preparation || Prepare mid-term report, presentation and Wikipage.
 
|-
 
|-
| 2 || 8 - 52 || 72
+
| Model Validation and Refinement || Conduct independent sample t-tests (e.g.  Ensure the results are similar when attached to different years of study) and refine analysis of data.
 
|-
 
|-
| 3 || 53 - 286 || 73
+
| Insights and Recommendations || Create visualisation from analysis results and formulate recommendations for our sponsor.
 
|-
 
|-
| 4 || 287 - 1479 || 69
+
| Project Revision || Assisted by sponsor through obtaining feedback during our sponsor meeting; Align our final deliverables with sponsor requirements.
 
|-
 
|-
| 5 || > = 1480 || 75
+
| Final Preparation || Prepare abstract and full paper, final Wikipage update and final presentation with necessary deliverables.
 
|}
 
|}

Latest revision as of 15:32, 15 April 2018

HOME

 

PROJECT OVERVIEW

 

PROJECT FINDINGS

 

PROJECT DOCUMENTATION

 

PROJECT MANAGEMENT

 

ANLY482 HOMEPAGE

Background Data Source Methodology

Introduction

For any entity to conduct business online, they must be able to accept payments from customers, and this involves a third-party to facilitate the online transfer of funds. Such entities are called electronic Payment Gateways , and this includes PayPal, Stripe, and our project sponsor.

When a customer makes a purchase on a merchant’s website, our sponsor helps to process the credit card payment. This is done by transferring key information between the payment portal (e.g. merchant’s website) and the merchant’s registered bank account. For each successful transaction that our sponsor processes, they apply a commission, called a Merchant Discount Rate (MDR) , from the transaction.

Business Motivations and Objectives

While our sponsor handles millions of transactions across the globe, the company has not fully been able to derive any conclusive analysis from its transaction data.

By providing our team with their data, our sponsor hopes to gather a deeper understanding of it. The project objectives include developing meaningful insights by performing exploratory data analysis. Using the results drawn from the findings, recommendations will be formulated to aid in future business decision making.

Meetings with Project Coordinator, Project Sponsor and Team

Our team aims to meet our Project Coordinator, Professor Kam Tin Seong on average of at least once every week in order to ensure that we are progressing on the right track. We also met our project sponsor once a month to update them on our progress and validate our analysis, while our internal team meetings are held at least twice a week. All minutes can be found under "Project Documentation".

Project Objectives

We aim to study the complexity of the online transactions carried out by our sponsor’s merchants. In particular, we hope to assess the performance of our sponsor’s merchants, by using number of approved transactions per merchant as a benchmark.

This project shows how we use funnel plots and line of best-fit graphs to compare the performance among entities in a group in an unbiased manner. Though both funnel plots and line of best-fit graphs are prevalent in medical research , they are under-utilized in e-commerce analytics. In our case study, the line of best-fit graphs provide a better model fit. By combining features of funnel plot with line of best-fit, we can identify star and laggard merchants.

Moving on, we want to study the characteristics of transactions carried out by key merchants. We use logistic fit to establish any correlation between independent variables and the number of approved transactions. For independent variables that exhibit correlation, we conduct further analysis by utilizing decision trees to map expected approved-to-rejected transaction ratio for key merchants.

In summary, we show how exploratory and confirmatory techniques can be used as source of business intelligence - setting performance benchmarks for each merchant and improving their approved-to-rejected transaction ratio.

Scope of Project

Task Description
Gather Requirements Confirm and gather sponsor requirements
Initial Research and Preparation Conduct preliminary data exploration and define project objectives and scope
Project Proposal Prepare project proposal and Wikipage
Data Exploration and Preparation Ensuring that data is clean and can be analysed using analytical software; We have done data preparation, which include – Interactive Binning and finding the top reason code descriptions. After which, we conducted exploratory data analysis.
Model Building Through research findings and experience, we will attach suitable models to our data. We have used Interactive Binning, Line of Fit and Time Series Analysis to generate insights.
Project Revision (Mid-Term) Assisted by RDP through obtaining feedback during our sponsor meeting.
Mid-term Preparation Prepare mid-term report, presentation and Wikipage.
Model Validation and Refinement Conduct independent sample t-tests (e.g. Ensure the results are similar when attached to different years of study) and refine analysis of data.
Insights and Recommendations Create visualisation from analysis results and formulate recommendations for our sponsor.
Project Revision Assisted by sponsor through obtaining feedback during our sponsor meeting; Align our final deliverables with sponsor requirements.
Final Preparation Prepare abstract and full paper, final Wikipage update and final presentation with necessary deliverables.