Be Customer Wise or Otherwise - Project Overview
About the Project
Our sponsor, GLC, is an international postal and logistics company with a global network spanning more than 220 countries and territories. Its product offerings include global freight forwarding, international express deliveries, warehousing solutions, and other customised logistics services.
International freight and logistics services have been expanding rapidly as rising incomes across the globe, especially in emerging economies such as China and Vietnam, drive up consumption. Coupled with the ubiquity of the Internet, changes in consumption habits such as the rise of e-commerce have further pushed up demand for freight services. Furthermore, as global networks become increasingly integrated, supply chains have gone international, with goods often shipped across continents from manufacturers to distribution centres to customers. Being able to move goods this way efficiently is therefore one of the main keys to success in this industry.
Motivation
Having identified the potential of the Asia-Pacific region, GLC has been expanding its operations there. However, it has faced fierce competition from other players in the industry. GLC must also contend with the 'new, globalised customer' (Cister, Ebecken, 2002), who is extremely demanding and has grown accustomed to being served quickly and to a high standard.
Objectives
Like many companies, GLC has collected and stored a copious amount of data about its customers, suppliers, business partners and others. However, the inability to draw valuable findings from the data prevents this information from being used in any meaningful way (Berson et al., 2000). Through this project, the team aims to assess the relevance of the data currently collected, discover ways to improve data collection and utilisation, and suggest how GLC can better use this data to shape its business strategies.
To stay competitive in this market, the company believes that besides producing cutting-edge products, it also needs to understand the needs of its customers. GLC is therefore seeking to profile its customer base, giving it an edge in tailoring its service offerings to market demands so as to boost revenue and increase market share.
With the aim of maximising sales revenue and market share through appropriate product strategies and distribution channel policies, the team will try to uncover any information in the available data that may help meet these business objectives. The team will also assess the relevance of the data provided and suggest how GLC can make better use of its historical sales data to shape this aspect of its business strategy and operations, before proposing the resulting recommendations to management.
Methodology
Customer Relationship Management
This direction reflects a larger shift towards Customer Relationship Management (CRM), in which a company's focus moves from its products to its customers. The shift has been enabled by technological developments that allow companies to capture, process, analyse and distribute data in order to understand and anticipate current customers’ needs. As a result, companies can fulfil these needs more efficiently and improve their customer retention rates.
According to Rogers and Peppers, there are four basic strategies with regard to CRM: identify the customers, differentiate the customers, interact with the customers and personalise the relationship with the customer. Through this improved understanding of customers, companies are able to adjust their business strategies and customise their marketing approaches for different customers, thus improving “customer acquisition, customer retention, customer loyalty, and customer profitability” (Swift, 2001).
Recency, Frequency and Monetary Index
Research into CRM analytics also led to a derived metric, the Recency, Frequency and Monetary (RFM) index. It is widely used in direct marketing to decide which customers to target with offers, based on their past behaviour (Mason, 2003). The purpose of this project is to apply descriptive analytics techniques to the sales and marketing data made available by GLC for knowledge discovery. From the data, we aim to create clusters of customers that can be profiled uniquely across different metrics, including a customised RFM measure.
The RFM index has three components, namely:
- Recency - refers to the date of the customer’s last purchase
- Frequency - refers to the number of purchases made within a given time period
- Monetary - refers to the total amount, in dollars, spent by the customer within a given time period
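The three components can be derived directly from transaction records. The sketch below assumes a hypothetical record layout of (customer ID, transaction date, amount in dollars) and, as is common in practice, expresses Recency as the number of days between the last purchase and the end of the analysis window rather than as a raw date; neither choice is taken from GLC's data.

```python
from datetime import date

# Hypothetical transaction records: (customer_id, transaction_date, amount_in_dollars).
# This layout is an illustrative assumption, not GLC's actual schema.
transactions = [
    ("C1", date(2023, 1, 5), 120.0),
    ("C1", date(2023, 3, 20), 80.0),
    ("C2", date(2022, 11, 2), 500.0),
    ("C3", date(2023, 3, 28), 40.0),
    ("C3", date(2023, 2, 14), 60.0),
    ("C3", date(2023, 1, 9), 30.0),
]


def rfm_values(transactions, start, end):
    """Return {customer_id: (recency_days, frequency, monetary)} for purchases in [start, end].

    Recency is measured as days from the customer's last purchase to `end`,
    so a smaller value means a more recent customer.
    """
    last_purchase, counts, totals = {}, {}, {}
    for customer, when, amount in transactions:
        if not (start <= when <= end):
            continue  # only purchases inside the analysis window count
        last_purchase[customer] = max(last_purchase.get(customer, when), when)
        counts[customer] = counts.get(customer, 0) + 1
        totals[customer] = totals.get(customer, 0.0) + amount
    return {c: ((end - last_purchase[c]).days, counts[c], totals[c]) for c in last_purchase}


print(rfm_values(transactions, date(2023, 1, 1), date(2023, 3, 31)))
# → {'C1': (11, 2, 200.0), 'C3': (3, 3, 130.0)}
```

Customer C2 drops out because its only purchase falls before the window, which is one way inactive accounts can be separated from active ones before scoring.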
Using these three measures, RFM analysis then teases out the most valuable customers. In doing so, the team aims to address the following questions:
- How can different types of customers be identified (e.g. high-value customers, high potential for growth, etc.)?
- How do customer segments differ in terms of characteristics and behaviour?
- How can the current data collection practices be improved to aid future analysis?
Addressing the questions above will pave the way for management to devise actionable marketing strategies to improve revenue and gain market share.
Limitations of RFM
Although simple and easy to implement, the RFM model has its limitations. One limitation is that the RFM index relies on only three variables and hence may neglect other important indicators. However, the index could be used in conjunction with other variables during analysis. Alternatively, Miglautsch (2002) suggests creating additional variables that extend the basic RFM segmentation system, in order to draw out profit-making opportunities from groups of customers that traditional RFM analysis might overlook.
Some studies have suggested that the three components are not equally important, with Recency and Monetary having more influence on customer behaviour. Hence, the team decided to apply a weighted RFM index in the analysis of the active customers, giving extra weight to the Recency and Monetary components, as shown in the following formulas:
Traditional RFM Index = R + F + M
Weighted RFM Index = 2R + F + 2M
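Both formulas assume that R, F and M have already been converted to scores on a common 1 to 5 scale, as in the example that follows. The sketch below assumes a simple rank-based quintile scheme (a hypothetical choice; the document does not state how the team binned its scores) and then applies the two formulas exactly as written.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst quintile) to 5 (best quintile) by rank.

    Ties are broken by original order. This rank-based binning is an
    illustrative assumption, not necessarily the team's scoring method.
    """
    n = len(values)
    # Indices ordered from worst to best value.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // n + 1  # maps ranks 0..n-1 onto scores 1..5
    return scores


def rfm_indexes(r, f, m):
    """Return (traditional, weighted) RFM indexes from 1-5 component scores."""
    traditional = r + f + m
    weighted = 2 * r + f + 2 * m  # extra weight on Recency and Monetary
    return traditional, weighted


# Hypothetical raw values for five customers.
recency_days = [11, 3, 40, 90, 7]
frequency = [2, 3, 1, 1, 6]
monetary = [200.0, 130.0, 50.0, 900.0, 75.0]

r = quintile_scores(recency_days, higher_is_better=False)  # fewer days since purchase is better
f = quintile_scores(frequency)
m = quintile_scores(monetary)
print([rfm_indexes(*scores) for scores in zip(r, f, m)])
# → [(10, 17), (12, 20), (4, 7), (8, 14), (11, 17)]
```

Note that Recency must be scored in reverse, since a small number of days since the last purchase indicates a more valuable customer.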
Even so, two customers may still have the same RFM index (whether Traditional or Weighted) while having different individual values. For simplicity, consider an example using the Traditional RFM index. One customer may score high on Monetary (5) and Frequency (5) but low on Recency (1), while another may score high on Recency (5) and Frequency (5) but low on Monetary (1). Both customers' RFM indexes add up to the same high value of 11, yet they describe very different customers.
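Applying both formulas to the two customers in this example shows that the weighting does not separate them either: Recency and Monetary carry the same weight of 2, and the two customers simply swap their R and M scores (the labels A and B are added here for reference):

```latex
\begin{aligned}
\text{Customer A: } (R, F, M) = (1, 5, 5):&\quad R + F + M = 11, \qquad 2R + F + 2M = 2 + 5 + 10 = 17 \\
\text{Customer B: } (R, F, M) = (5, 5, 1):&\quad R + F + M = 11, \qquad 2R + F + 2M = 10 + 5 + 2 = 17
\end{aligned}
```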
Therefore, due to the limitations of the RFM model, the team will focus on clustering techniques whereby the objective is to segment the customer base into several groups that are homogeneous within themselves, but heterogeneous between one another. This could be used to reveal common behaviour patterns within each cluster and segment them into profiles based on their similarities. From the outcome, different marketing strategies can then be applied to different target groups based on their clustered RFM behaviour.
Approach
K-means Cluster Analysis
Cluster analysis is a multivariate technique that groups data objects based only on information found in the data describing the objects and their relationships. The aim is for objects within a group to be as closely associated with one another as possible, and as weakly associated as possible with objects in other groups. The greater the similarity within a group and the greater the difference between groups, the better, or more distinct, the clustering.
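One common way to quantify "similarity within" and "difference between" is to split the total variation into a within-cluster and a between-cluster sum of squares. The sketch below is illustrative only, with hypothetical points; the team's own choice of cluster count relied on the Cubic Clustering Criterion rather than on these raw quantities.

```python
def sum_of_squares(points, labels):
    """Return (within_ss, between_ss) for a labelled set of points.

    within_ss: squared distances from each point to its own cluster mean.
    between_ss: for each cluster, size * squared distance from its mean to the overall mean.
    Lower within_ss and higher between_ss indicate tighter, better-separated clusters.
    """
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    within = between = 0.0
    for members in clusters.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        within += sum(sum((p[d] - centre[d]) ** 2 for d in range(dim)) for p in members)
        between += len(members) * sum((centre[d] - overall[d]) ** 2 for d in range(dim))
    return within, between


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sum_of_squares(points, [0, 0, 1, 1]))  # → (4.0, 100.0)
print(sum_of_squares(points, [0, 1, 0, 1]))  # → (100.0, 4.0)
```

Both labellings share the same total variation (104); the first places almost all of it between clusters, which is what makes it the more distinct clustering.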
K-means clustering is appropriate when large datasets are involved. It also allows objects to move from one cluster to another, which is not possible in hierarchical clustering. The process starts by selecting k initial cluster centres, then alternates between assigning each object to its nearest cluster centre and recalculating the cluster centres, until the cluster centres (and hence the clusters) remain relatively stable.
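The assign-and-recalculate loop described above is Lloyd's algorithm. The sketch below implements it in plain Python with squared Euclidean distance; the initial centres are passed in by the caller because the document does not say how the team seeded its centres, and the data points are hypothetical.

```python
def kmeans(points, initial_centres, max_iter=100):
    """Lloyd's k-means: alternately assign points to the nearest centre and
    recompute each centre as the mean of its points, until assignments stop changing."""
    dim = len(points[0])
    centres = [list(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(len(centres)),
                key=lambda j: sum((p[d] - centres[j][d]) ** 2 for d in range(dim)))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the centres are stable as well
        labels = new_labels
        # Update step: move each centre to the mean of its assigned points.
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(p[d] for p in members) / len(members) for d in range(dim)]
    return labels, centres


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centres = kmeans(points, initial_centres=[(0, 0), (0, 1)])
print(labels, centres)  # → [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

Both starting centres sit in the lower-left group, so (0, 1) first joins the second cluster; once that centre is pulled towards the upper-right points, (0, 1) moves back to the first cluster. This is the kind of reassignment that hierarchical clustering does not allow.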
Having tested numbers of clusters from 3 to 12, the team agreed on k=7 for the cluster analysis of the active accounts. Although k=3 gave the largest Cubic Clustering Criterion (CCC) within this range, we chose k=7, which had the second-highest CCC value, as we expected it to give more granularity to the profiles by producing a larger, but still manageable, number of customer groups.
The objective is for GLC to better understand the characteristics and behaviours of its different types of customers. With this information, GLC can then tailor its strategies to target these distinct groups more effectively and become more competitive in the market.
Multiple Linear Regression
Multiple linear regression models the relationship between two or more explanatory (independent) variables and a response (dependent) variable by fitting a linear equation to the observed data. Each combination of values of the independent variables x1, ..., xp is associated with a value of the dependent variable y. The fitted regression equation describes how the response variable y changes with the explanatory variables, which allows the response to be predicted from a given set of explanatory variables.
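To make the fitting step concrete, the sketch below estimates y = b0 + b1*x1 + ... + bp*xp by ordinary least squares, solving the normal equations with Gaussian elimination. It uses hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2; the normal-equations approach is fine for a toy example, whereas statistical packages typically use more numerically stable decompositions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(xs, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in xs]  # prepend an intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted response for one set of explanatory values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coefs = fit_ols(xs, y)
print([round(b, 6) for b in coefs])  # → [1.0, 2.0, 3.0]
print(round(predict(coefs, (3, 2)), 6))  # → 13.0
```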
For the multiple linear regression, we used stepwise selection. The model was used to estimate how much potential revenue could have been earned had the zero-revenue accounts been charged for their zero-revenue transactions. Stepwise regression successively adds and removes variables based on the t-statistics of their estimated coefficients. The candidate independent variables were billed weight, origin, destination, global product code and sales channel. Stepwise regression is preferable to ordinary multiple regression in this case because of the large number of potential independent variables (there are many origins and destinations).
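The number of potential variables grows quickly because origin, destination, product code and sales channel are categorical. The document does not state how these were encoded; a common approach, assumed here, is dummy (indicator) coding, where a variable with L levels becomes L - 1 indicator columns and the remaining level serves as the baseline. The origin codes below are hypothetical.

```python
def dummy_code(values):
    """One-hot encode a categorical column, dropping one level as the baseline.

    Returns (baseline_level, column_names, rows), where each row holds 0/1 indicators.
    """
    levels = sorted(set(values))
    baseline, others = levels[0], levels[1:]  # first level (alphabetically) is the baseline
    columns = [f"is_{level}" for level in others]
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return baseline, columns, rows


origins = ["SG", "CN", "VN", "CN"]  # hypothetical origin codes
baseline, columns, rows = dummy_code(origins)
print(baseline, columns, rows)  # → CN ['is_SG', 'is_VN'] [[1, 0], [0, 0], [0, 1], [0, 0]]
```

With, say, 50 distinct origins, origin alone would contribute 49 candidate regressors, which is why an automated selection procedure is attractive here.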
Because of the large volume of transaction-level data, approximately 2,000,000 data points, we chose the forward method, which starts with no variables in the model and adds one variable at a time. At each step, the t-statistic for the estimated coefficient of each variable in the model is computed and compared against a threshold value. If it falls below the threshold, the variable is removed, and the process iterates again.
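The forward procedure with t-statistic removal can be sketched as follows. Several details are assumptions rather than the team's settings: the candidate with the largest |t| is the one that enters, and the entry and removal thresholds of 2.0 and 1.5 are hypothetical (statistical packages usually express them as significance levels). Keeping the removal threshold below the entry threshold helps stop variables from cycling in and out, and `max_steps` guards against it anyway. Refitting the model for every candidate, as this pure-Python loop does, would be far too slow for 2,000,000 rows; it is meant only to show the logic on hypothetical data.

```python
import math


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def t_stats(columns, y, chosen):
    """Fit OLS on an intercept plus the chosen columns; return {name: t-statistic}."""
    design = [[1.0] + [columns[c][i] for c in chosen] for i in range(len(y))]
    k = len(design[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)])
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)  # residual variance estimate
    return {c: beta[i + 1] / math.sqrt(sigma2 * xtx_inv[i + 1][i + 1])
            for i, c in enumerate(chosen)}


def forward_stepwise(columns, y, t_enter=2.0, t_remove=1.5, max_steps=50):
    """Add the candidate with the largest |t| if it clears t_enter, then drop
    in-model variables whose |t| has fallen below t_remove; repeat."""
    chosen = []
    for _ in range(max_steps):
        candidates = [c for c in columns if c not in chosen]
        if not candidates:
            break
        scores = {c: abs(t_stats(columns, y, chosen + [c])[c]) for c in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < t_enter:
            break  # no remaining variable is significant enough to enter
        chosen.append(best)
        while chosen:  # removal pass: drop the weakest variable while it is below t_remove
            stats = t_stats(columns, y, chosen)
            weakest = min(chosen, key=lambda c: abs(stats[c]))
            if abs(stats[weakest]) >= t_remove:
                break
            chosen.remove(weakest)
    return chosen


# Hypothetical demo: orthogonal +/-1 columns; y depends on x1, plus small noise terms.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
d = [1, -1, -1, 1, 1, -1, -1, 1]
y = [10 + 5 * ai + 0.1 * bi + 0.5 * ci for ai, bi, ci in zip(a, b, c)]
selected = forward_stepwise({"x1": a, "x2": b, "x3": d}, y)
print(selected)  # → ['x1']
```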