Difference between revisions of "ANLY482 AY2017-18 T2 Group 05 Project Findings"

From Analytics Practicum
Jump to navigation Jump to search
 
(23 intermediate revisions by the same user not shown)
Line 33: Line 33:
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
{| style="background-color:#ffffff; margin: 3px auto 0 auto" width="55%"
 
|-  
 
|-  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #2e2e2e" width="150px"| [[Red Dot Payment_-_Mid-Term| <span style="color:#3d3d3d">Mid-Term</span>]]
+
 
 +
|-
 +
! style="font-size:11px; text-align: center; border-top:solid #ffffff; border-bottom:solid #2e2e2e" width="150px"| [[R2Z_Initial Data Exploration and Analysis| <span style="color:#3d3d3d">Initial Data Exploration and Analysis</span>]]
 +
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 +
 
 +
! style="font-size:11px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Deeper Data Exploration and Analysis| <span style="color:#3d3d3d">Deeper Data Exploration and Analysis</span>]]
 +
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 +
 
 +
! style="font-size:11px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Setting Benchmarks for Merchant Performance|
 +
<span style="color:#3d3d3d">Setting Benchmarks for Merchant Performance</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
  
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[Red Dot Payment_Finals| <span style="color:#3d3d3d">Finals</span>]]
+
! style="font-size:11px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Limitations|  
 +
<span style="color:#3d3d3d">Limitations</span>]]
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
 
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
  
 +
! style="font-size:11px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="150px"| [[R2Z_Implications|
 +
<span style="color:#3d3d3d">Implications</span>]]
 +
! style="font-size:15px; text-align: center; border-top:solid #ffffff; border-bottom:solid #ffffff" width="20px"|
  
 
|}
 
|}
 
<!------- Details ---->
 
<!------- Details ---->
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri;  border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Data Exploration</strong></font></div></div>==
 
'''''Due to the sensitivity of the data, please refer to the interim report for more details.'''''<br><br>
 
We have identified the top merchants in terms of total number of approved transactions, and approved transaction monetary value.<br>
 
 
<br>
 
<br>
<big>'''General Descriptive Statistics'''</big>  
+
Using JMP Graph Builder, we plotted the total number of transactions against months of 2016-2017 (e.g. January 2016 coded as period ‘1’, January 2017 coded as period ‘13’)
 +
 
 +
From Figure 5, we discovered a sharp increase in number of approved transactions processed during the from January to October 2016. After that, there was an overall decrease in number of approved transactions, with sudden dips in November to December 2016, and April to May 2017. Additionally, the number of rejected transactions has been increasing towards the end of 2017. This signals potential issues, and exploratory analysis was conducted to assess performance of merchants contributing to the less optimistic trends towards end of 2017.
 +
 
 +
<br><br>
 +
<center>[[File:General5.jpg|700px]]<br>
 +
<small>Figure 5: Time-based pattern for each type of transaction across 24 months</small></center>
 
<br>
 
<br>
Moving on, we also find out the top merchants contributing to the other types of transaction in each bin.
+
'''<big>FUNNEL PLOT</big>'''<br>
<br><br>
+
We used funnel plots to assess the performance of merchants in terms of their number of approved transactions. Even though this is not a built-in feature on JMP Pro, Wolfe has created an add-in accessible via JMP’s user community, seen in Figure 6.
<big>'''One Way Anova'''</big> <br>
 
As shown in Figure 6a, there are significant difference between the mean number of transactions per merchant for each bin <br>
 
- e.g. Bin 1: average 3 transactions per merchant, Bin 2: average 25 transactions per merchant, Bin 3:  average 135 transactions per merchant, Bin 4: average 714 transactions per merchant and as expected, Bin 5: average 30818 transactions per merchant.<br>
 
[[File:Figure6aRDP.jpg|400px]]<br>
 
<small>Figure 6a: Oneway Anova Analysis of Number of Transactions per merchant by bins</small>
 
 
<br><br>
 
<br><br>
However, in Figure 6b, while there is a significant difference between the mean transaction monetary value among the 5 different bins, interestingly, we note that Bin 5 does not has the highest mean transaction monetary value (only $96.34). In fact, Bin 1 has the highest mean transaction monetary value of $3124.43.
+
<center>[[File:General6.jpg|400px]]<br>
This means that while Bin 5 consist of merchants with the highest volume of transactions, they are not necessarily the most valuable group of merchants. Moving forward, RDP will provide the total revenue it earns from each merchant, and we will create interactive binning and Line of Fit graphs again to identify the top merchants most valuable to RDP.<br>
+
<small>Figure 6: JMP Pro Funnel Plot Plug-in</small></center>
[[File:Figure6bRDP.jpg|400px]]<br>
+
<br>
<small>Figure 6b: Oneway Anova Analysis of ‘converted value’ by bins</small>
 
 
<br><br>
 
<br><br>
<big>'''Line of Fit'''</big><br>
+
<center>[[File:General7.jpg|400px]]<br>
As mentioned under ‘Objectives’, we want to visualise and compare the performance among merchants in the same bin. Initially, we wanted to use funnel plot to perform this visualisation. Funnel plots are a form of scatter plot in which number of approved transactions per merchant are plotted against number of transactions per merchant. Control limits, similar to confidence limits are then overlaid on the scatter plot. The control limits represent the expected variation in number of approved transactions assuming that the only source of variation is stochastic. Thus, those who lie outside of the control limits would allow us to identify the star and laggard merchants in the same bin.<br><br>
+
<small>Figure 7: Proportion of approved transactions in bin 5</small></center>
However, we were unable to obtain proper model fit for the funnel plots for each bin, as seen in the example under Figure 7, where we plot the funnel plot for Bin/Group 5.
 
 
<br>
 
<br>
[[File:Figure7RDP.jpg|400px]]<br>
+
Figure 7 shows the proportion of approved transactions out of total number of transactions from 2016-2017 in bin 5. The fit mean line depicts the predicted number of approved transactions for each merchant, based on their number of transactions. Merchants that lie outside of the 95% and 99.8% control limits can be segregated into two groups - Star merchants (performing better than predicted in terms of number of approved transactions) and Laggard merchants (performing worse than predicted). However, as can be seen, we were unable to obtain a proper model fit to derive any meaningful insights. The result was the same even after using Johnson SI transformation to transform the data, to makes the distribution of variables less skewed (as seen in Figure 8 and Figure 9).
<small>Figure 7: Funnel plot for Bin/Group 5s</small>
 
 
<br><br>
 
<br><br>
Thus, we decided to use Line of Fit graphs instead. We plot the Line of Fit graphs for each bin, by comparing the number of approved transactions per merchant against total transaction per merchant in that bin. As shown in Line of Fit graphs in Figure 8a to 8e, we were able to get more proper model fit, and realised that merchants that lie outside the confidence interval can be grouped into over-performing (star) and under-performing (laggard) merchants. These merchants can be identified using the JMP’s brush tool. Merchants that fall above the confidence interval and regression line are identified as star merchants. Our client should focus on identifying factors that have led to the higher than average approved transaction rates of the star merchants. Merchants that fall below the confidence interval and regression line are identified as laggard merchants.
+
<center>[[File:General8.jpg|400px]]<br>
 +
<small>Figure 8: Distribution of approved transactions per merchant (left) and distribution of all total transactions per merchant before transformation (right)</small></center>
 +
<br>
 
<br><br>
 
<br><br>
Merchants who lie within the confidence interval (blue shaded area) follow a stochastic trend and will not be analysed. In addition, we realised that the Line of Fit graph for Bin 1 does not provide a good model fit for analysis. Thus, the Line of Fit graph for Bin 1 will also not be analysed.
+
<center>[[File:General_9.jpg|400px]]<br>
<br><br>
+
<small>Figure 9: Distribution of approved transactions per merchant (left) and distribution of all total transactions per merchant after transformation (right)</small></center>
[[File:Figure8aRDP.jpg|400px]]<br>
+
<br>
<small>Figure 8a: Group 5 Line of Fit Graph</small>
 
<br><br><br>
 
[[File:Figure8bRDP.jpg|400px]]<br>
 
<small>Figure 8b: Group 4 Line of Fit Graph</small>
 
<br><br><br>
 
[[File:Figure8cRDP.jpg|400px]]<br>
 
<small>Figure 8c: Group 3 Line of Fit Graph</small>
 
<br><br><br>
 
[[File:Figure8dRDP.jpg|400px]]<br>
 
<small>Figure 8d: Group 2 Line of Fit Graph</small>
 
<br><br><br>
 
[[File:Figure8eRDP.jpg|400px]]<br>
 
<small>Figure 8e: Group 1 Line of Fit Graph</small>
 
<br><br>
 
While we have identified laggard merchants from Line of Fit graphs, we should also set individual benchmarks for each merchant. Based on each merchant’s volume of transactions, we will identify the respective average number of approved transactions they should have in order to be within the confidence level. Thus, our client can highlight how each individual laggard merchant compares to other merchants within the same bin and set benchmarks based on the regression line. This will motivate merchants to improve their own performance.
 
<br><br>
 
<big>'''Time Series Analysis'''</big><br>
 
 
<br>
 
<br>
[[File:Figure9aRDP.jpg|600px]]<br>
 
<small>Figure 9a: Time Series Pattern for each type of transaction across 24 months</small>
 
 
<br><br>
 
<br><br>
Based on the time-series graph in Figure 9a, we realised that the number of approved transactions increased drastically in the year of 2016, and generally decreased in the year of 2017. Nevertheless, we notice that the number of approved transactions are subjected to great fluctuation in 2017.
+
<center>[[File:General10.jpg|400px]]<br>
 +
<small>Figure 10: Proportion of approved transactions in bin 5</small></center>
 
<br>
 
<br>
In addition, we notice that there is an increasing number of rejected transactions towards the end of 2017. This could demonstrate a potential problem for RDP, as it would mean a decrease in RDP commission/increase in opportunity cost with an increase in rejected transactions.
+
As seen in Figure 10, we were still unable to obtain a proper model fit. This issue applies to the rest of the bins, and as such, a hybrid approach was adopted.
 
<br><br>
 
<br><br>
[[File:Figure9bRDP.jpg|600px]]<br>
+
<big>'''HYBRID APPROACH'''</big> <br>
<small>Figure 9b: Time Series Pattern for each type of transaction across 7 days in a week</small>
+
This section explains the hybrid approach; merging the characteristics of funnel plots with the line of best-fit graphs in identifying outliers.
<br><br>
+
 
Based on the time-series graph in Figure 9b, we also realised that approved transaction tend to peak on the Tuesday and Saturday of a week. On the other hand, rejected transaction tend to peak on Wednesday. There is no distinct relationship between authentication failed/duplicate merchant_train ID transactions and the day of the week.
+
By using JMP Graph Builder, the number of approved transactions was plotted against the total number of transactions, mapped by merchants. The line of best-fit graph provides the predicted number of approved transactions for each merchant, based on total number of transactions. Similar to a funnel plot, the line of best-fit graph has a confidence interval set at 95%. This allows us to be 95% confident that the two confidence bands (blue shaded regions) encloses the true best-fit linear regression line.
<br><br>
+
[[File:Figure9cRDP.jpg|600px]]<br>
+
Merchants who lie within the confidence interval follow a stochastic trend and will not be analysed. Merchants outside of the confidence interval are categorised into star and laggard merchants.
<small>Figure 9c: Time Series Pattern for each type of transaction across 24 hours in a day</small>
 
<br><br>
 
Based on the time-series graph in Figure 9c, we realised that approved transaction tend to rise from 6am onwards. Un contrast to our initial prediction that approved transaction will peak after working hours from 5-7pm, approved transaction actually drop during these hours. Insead, it rises from 7pm onwards. This could mean that merchants’ customers would usually transact at night/using home PC.
 
 
<br><br>
 
<br><br>
 +
<center>[[File:General11.jpg|900px]]<br>
 +
<small>Figure 11: Visualization of hybrid approach for all bins </small></center>
 +
<br>
 +
Figure 11 depicts the line of best fit graphs for Bin 1 to Bin 5, plotting the number of approved transactions per merchant against number of transactions per merchant through the two years. From the positive gradients of the line of best-fit graphs, we establish that as the number of transactions increases, the number of approved transactions also increases (positive correlation). Data points in green are identified as the star merchants, while data points in red are the laggards.
  
==<div style="background: #DD597D; line-height: 0.3em; font-family:calibri; border-left: #CFCFCF solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Moving Forward</strong></font></div></div>==
+
We used MCC to identify the most common industry sectors that these merchants belonged toThe table below depicts a summary of the star and laggard merchants in each bin.
After the interim presentation and this report submission, are looking to complete the following:
 
 
<br><br>
 
<br><br>
'''Merchant Categorisation Codes (MCC):''' To add a new column called “Merchant Categorisation Code (MCC)” into our data, that will categorise each merchant into a specific industry. This allows us to derive better analysis in terms of the common industries that star or laggard merchants belong to.
+
<center>[[File:General12.jpg|400px]]<br>
<br><br>
+
<small>Figure 12: Summary table for star and laggard merchants </small></center>
'''Remove Duplicated Merchants:''' There are certain merchants whose names are duplicated in various forms - e.g. City Harvest Church v.s. City Harvest Church (Virtual Terminal) actually belong to the same merchant. Our client hope to remove the duplicated names, so that each merchant name in our data is unique and belong to only one merchant.
+
<br>
<br><br>
 
'''Time Series Clustering:''' We can further cluster star or laggard merchants based on their time series patterns, through the use of time series clustering. This allows us to identify clusters with time series patterns that resemble the overall trends we have observed in our time series analysis so far. Through this method, we hope to better identify the characteristics of merchants who tend to lead the overall trends in approved/rejected transactions rates.
 
<br><br>
 
'''Visual Dashboard:'''  Aid in understanding some of our client’s current business issues, as well as the benchmark metrics and attributes that the company may not be currently analysing.
 

Latest revision as of 15:45, 15 April 2018

HOME

 

PROJECT OVERVIEW

 

PROJECT FINDINGS

 

PROJECT DOCUMENTATION

 

PROJECT MANAGEMENT

 

ANLY482 HOMEPAGE

Initial Data Exploration and Analysis Deeper Data Exploration and Analysis Setting Benchmarks for Merchant Performance Limitations Implications


Using JMP Graph Builder, we plotted the total number of transactions against months of 2016-2017 (e.g. January 2016 coded as period ‘1’, January 2017 coded as period ‘13’)

From Figure 5, we discovered a sharp increase in number of approved transactions processed during the from January to October 2016. After that, there was an overall decrease in number of approved transactions, with sudden dips in November to December 2016, and April to May 2017. Additionally, the number of rejected transactions has been increasing towards the end of 2017. This signals potential issues, and exploratory analysis was conducted to assess performance of merchants contributing to the less optimistic trends towards end of 2017.



General5.jpg
Figure 5: Time-based pattern for each type of transaction across 24 months


FUNNEL PLOT
We used funnel plots to assess the performance of merchants in terms of their number of approved transactions. Even though this is not a built-in feature on JMP Pro, Wolfe has created an add-in accessible via JMP’s user community, seen in Figure 6.

General6.jpg
Figure 6: JMP Pro Funnel Plot Plug-in




General7.jpg
Figure 7: Proportion of approved transactions in bin 5


Figure 7 shows the proportion of approved transactions out of total number of transactions from 2016-2017 in bin 5. The fit mean line depicts the predicted number of approved transactions for each merchant, based on their number of transactions. Merchants that lie outside of the 95% and 99.8% control limits can be segregated into two groups - Star merchants (performing better than predicted in terms of number of approved transactions) and Laggard merchants (performing worse than predicted). However, as can be seen, we were unable to obtain a proper model fit to derive any meaningful insights. The result was the same even after using Johnson SI transformation to transform the data, to makes the distribution of variables less skewed (as seen in Figure 8 and Figure 9).

General8.jpg
Figure 8: Distribution of approved transactions per merchant (left) and distribution of all total transactions per merchant before transformation (right)




General 9.jpg
Figure 9: Distribution of approved transactions per merchant (left) and distribution of all total transactions per merchant after transformation (right)





General10.jpg
Figure 10: Proportion of approved transactions in bin 5


As seen in Figure 10, we were still unable to obtain a proper model fit. This issue applies to the rest of the bins, and as such, a hybrid approach was adopted.

HYBRID APPROACH
This section explains the hybrid approach; merging the characteristics of funnel plots with the line of best-fit graphs in identifying outliers.

By using JMP Graph Builder, the number of approved transactions was plotted against the total number of transactions, mapped by merchants. The line of best-fit graph provides the predicted number of approved transactions for each merchant, based on total number of transactions. Similar to a funnel plot, the line of best-fit graph has a confidence interval set at 95%. This allows us to be 95% confident that the two confidence bands (blue shaded regions) encloses the true best-fit linear regression line.

Merchants who lie within the confidence interval follow a stochastic trend and will not be analysed. Merchants outside of the confidence interval are categorised into star and laggard merchants.

General11.jpg
Figure 11: Visualization of hybrid approach for all bins


Figure 11 depicts the line of best fit graphs for Bin 1 to Bin 5, plotting the number of approved transactions per merchant against number of transactions per merchant through the two years. From the positive gradients of the line of best-fit graphs, we establish that as the number of transactions increases, the number of approved transactions also increases (positive correlation). Data points in green are identified as the star merchants, while data points in red are the laggards.

We used MCC to identify the most common industry sectors that these merchants belonged to. The table below depicts a summary of the star and laggard merchants in each bin.

General12.jpg
Figure 12: Summary table for star and laggard merchants