Difference between revisions of "Teppei Syokudo - Improving Store Performance: PPA Data Analysis Methodology"

From Analytics Practicum
Jump to navigation Jump to search
Line 76: Line 76:
 
<font size =3 face=Georgia >
 
<font size =3 face=Georgia >
 
<p>In carrying out Market Basket Analysis certain considerations have to be made. One important factor is the software or tool used to carry out Market Basket Analysis. Based on the client requirements in this project, the tool used must be one that is open-source and easy to use. While the team understands that there are far greater utility in employing paid software such as Clementine (SPSS), Enterprise Miner (SAS), GhostMiner 1.0, Quadstone or XLMiner, this requirement essentially narrows down the tools that the team is able to use (Haughton et. al., 2003). The tools that are open-source are narrowed down into 3 tools: RapidMiner, R and Tanagra.</p>
 
<p>In carrying out Market Basket Analysis certain considerations have to be made. One important factor is the software or tool used to carry out Market Basket Analysis. Based on the client requirements in this project, the tool used must be one that is open-source and easy to use. While the team understands that there are far greater utility in employing paid software such as Clementine (SPSS), Enterprise Miner (SAS), GhostMiner 1.0, Quadstone or XLMiner, this requirement essentially narrows down the tools that the team is able to use (Haughton et. al., 2003). The tools that are open-source are narrowed down into 3 tools: RapidMiner, R and Tanagra.</p>
 +
<p>
 +
[[File:Ppa-table1.jpg|900px]]
 +
</p>
 +
<p>
 +
After evaluating the 3 tools, the team realized that though R provided measures and customizability, the learning curve to use R is extremely steep and may not be best for the client based on the non-programming nature of their background. Both RapidMiner and Tanagra is extremely lightweight and easy to use, however the presence of extensive interestingness measures caused the team to choose in favour of RapidMiner.</p>
  
 +
</font>
 +
</div>
  
 
+
==<div style="background: #000033; padding: 13px; font-weight: bold; text-align:center; line-height: 0.3em; text-indent: 20px;font-size:26px; font-family:Britannic Bold"><font color= #ffffff>Analysis Breakdown</font></div>==
 +
<div style="margin:20px; padding: 10px; background: #ffffff; font-family: Trebuchet MS, sans-serif; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow:    7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
 +
<font size =3 face=Georgia >
 +
<p>
 +
The study will carry out the analysis in the following flow:
 +
#Products that are already in a set are first analysed. Since there is already an association between items found in a set, the confidence of set components are analysed to identify the most popular side dishes / drinks as well as the most unpopular ones.
 +
#For products that are not within a set, Market Basket Analysis is carried out to identify association rules between products.
 +
#Products with low profitability is identified using cost and revenue figures provided by the store; products that may contribute to the “profitable-product death spiral” is prevented from being dropped.
 +
</p>
 
</font>
 
</font>
 
</div>
 
</div>
 +
 +
==<div style="background: #000033; padding: 13px; font-weight: bold; text-align:center; line-height: 0.3em; text-indent: 20px;font-size:26px; font-family:Britannic Bold"><font color= #ffffff>Data Cleaning and Procedures</font></div>==
 +
<div style="margin:20px; padding: 10px; background: #ffffff; font-family: Trebuchet MS, sans-serif; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow:    7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
 +
<font size =3 face=Georgia >
 +
<p>
 +
Most Point-of-Sales (POS) systems will provide transaction data that consists of information such as transaction id, product id, quantity, price per product, date etc. The data given by the POS system for this example is in the following format:
 +
</p>
 +
<p>
 +
[[File:Ppa-table2.jpg|1000px]]
 +
</p>
 +
<p>
 +
The team has formatted and exported the data in a “Comma Separated Values” (CSV) file where the "t_id" is the transaction id, the “p_id” is the product id and the "qty" is the quantity of such products sold in a given transaction. </p>
 +
<p>
 +
[[File:Ppa-table3.jpg|500px]]
 +
</p>
 +
<p>
 +
However, this data requires further transformation to transform from transactional data to market basket data. Ideally, market basket data should be represented in a binary format where each row is a separate transaction id and each column corresponds to a product or item sold. While the quantity is provided and analysed in this example, RapidMiner does not include quantity in the analysis of the result.  </p>
 +
<p>
 +
[[File:Ppa-fig2.jpg|500px]]
 +
</p>
 +
<font size = 2><i><p> Note: our team assumes that the data provided by the POS System is in a most basic relational format as shown above. If the data has been transformed in any particular way that makes certain transformation steps redundant e.g. the data is already in a binomial form, the respective steps can be skipped. </p></i></font>
 +
<p>
 +
In order to transform the data into a suitable format to apply the identification of frequent itemsets and the generation of the rule, the team carried out the following process (Deshpande, 2012) in RapidMiner: </p>
 +
<p>
 +
[[File:Ppa-fig3.jpg|800px]]
 +
</p>

Revision as of 22:06, 5 April 2016


Home   Product Portfolio Analysis   Evaluating Store KPIs   Project Management   Documentation   The Team
  Introduction Data Analysis Methodology Findings References  

Data Exploration

Looking at both MW and RP’s product sales, the sale of Main - Meal has a decreasing trend, together with Main - Drink. This is likely due to the introduction of Set Menus, where customers tend to prefer purchasing sets rather than ala carte. The indirect relationship between Main - Meal and Set Menu is a lot more obvious in RP. We can see that the moment Set Menu was introduced in November, Main - Meal sales started dropping.

The most popular Main - Meal in both outlets would be the Kaisendon, which is sold 60 times and 50 to 60 times daily on average in MW and RP respectively. It’s relating set is the Seafood Feast which averages 40 and 24 times daily in MW and RP respectively.

We can also see that the sale of Main - Onigiri and Main - Fried have relatively stagnant to decreasing trends in both outlets. This means that the onigiris and fried items may not be very popular items. In order to boost sales of onigiris and fried items, Teppei Syokudo may want to consider introducing onigiri sets and fried item sets.

1.png 2.png 3.png 4 apsm.png


Data

Traditionally data related to Market Basket Analysis is three-dimensional: Customers, Orders (i.e. item sets, purchases or baskets), and Items (Beery and Linoff, 2004). A sales order is a most essential and basic piece of information representing a single purchase or transaction made by a customer. Besides main information such as the product bought, the quantity of products bought and total amount of the purchase, the store number, cashier number, type of payment or even the cashier who served is also stored in the order data. The items or rather the contents of the order is most important and founds the basis of identification of association rules. Last but not least, customer information provides a deeper level of analysis by finding associations between certain customer traits and profiles and particular items, allowing the store to carry out market segmentation. (Ting, Pan and Chou, 2010).

A market basket database typically consists of a large number of transaction records. Each record lists all items purchased during a single customer transaction. The objective of this data mining exercise is to identify if certain groups of items are usually purchased together, providing meaningful association rules.

Analysis Tool Selection

In carrying out Market Basket Analysis certain considerations have to be made. One important factor is the software or tool used to carry out Market Basket Analysis. Based on the client requirements in this project, the tool used must be one that is open-source and easy to use. While the team understands that there are far greater utility in employing paid software such as Clementine (SPSS), Enterprise Miner (SAS), GhostMiner 1.0, Quadstone or XLMiner, this requirement essentially narrows down the tools that the team is able to use (Haughton et. al., 2003). The tools that are open-source are narrowed down into 3 tools: RapidMiner, R and Tanagra.

Ppa-table1.jpg

After evaluating the 3 tools, the team realized that though R provided measures and customizability, the learning curve to use R is extremely steep and may not be best for the client based on the non-programming nature of their background. Both RapidMiner and Tanagra is extremely lightweight and easy to use, however the presence of extensive interestingness measures caused the team to choose in favour of RapidMiner.

Analysis Breakdown

The study will carry out the analysis in the following flow:

  1. Products that are already in a set are first analysed. Since there is already an association between items found in a set, the confidence of set components are analysed to identify the most popular side dishes / drinks as well as the most unpopular ones.
  2. For products that are not within a set, Market Basket Analysis is carried out to identify association rules between products.
  3. Products with low profitability is identified using cost and revenue figures provided by the store; products that may contribute to the “profitable-product death spiral” is prevented from being dropped.

Data Cleaning and Procedures

Most Point-of-Sales (POS) systems will provide transaction data that consists of information such as transaction id, product id, quantity, price per product, date etc. The data given by the POS system for this example is in the following format:

Ppa-table2.jpg

The team has formatted and exported the data in a “Comma Separated Values” (CSV) file where the "t_id" is the transaction id, the “p_id” is the product id and the "qty" is the quantity of such products sold in a given transaction.

Ppa-table3.jpg

However, this data requires further transformation to transform from transactional data to market basket data. Ideally, market basket data should be represented in a binary format where each row is a separate transaction id and each column corresponds to a product or item sold. While the quantity is provided and analysed in this example, RapidMiner does not include quantity in the analysis of the result.

Ppa-fig2.jpg

Note: our team assumes that the data provided by the POS System is in a most basic relational format as shown above. If the data has been transformed in any particular way that makes certain transformation steps redundant e.g. the data is already in a binomial form, the respective steps can be skipped.

In order to transform the data into a suitable format to apply the identification of frequent itemsets and the generation of the rule, the team carried out the following process (Deshpande, 2012) in RapidMiner:

Ppa-fig3.jpg