Teppei Syokudo - Improving Store Performance: PPA Data Analysis Methodology


Data Exploration

Looking at both MW’s and RP’s product sales, sales of Main - Meal show a decreasing trend, as do sales of Main - Drink. This is likely due to the introduction of Set Menus, as customers tend to prefer purchasing sets rather than à la carte items. The inverse relationship between Main - Meal and Set Menu is much more pronounced at RP: the moment the Set Menu was introduced in November, Main - Meal sales started dropping.

The most popular Main - Meal in both outlets is the Kaisendon, which sells 60 times and 50 to 60 times daily on average at MW and RP respectively. Its related set is the Seafood Feast, which averages 40 and 24 sales daily at MW and RP respectively.

We can also see that sales of Main - Onigiri and Main - Fried show relatively stagnant to decreasing trends in both outlets, which suggests that the onigiris and fried items may not be very popular. To boost sales of onigiris and fried items, Teppei Syokudo may want to consider introducing onigiri sets and fried item sets.

1.png 2.png 3.png 4 apsm.png


Data

Data related to Market Basket Analysis is traditionally three-dimensional: Customers, Orders (i.e. item sets, purchases or baskets), and Items (Berry and Linoff, 2004). A sales order is the most essential and basic piece of information, representing a single purchase or transaction made by a customer. Besides the main information such as the products bought, the quantity of each product and the total amount of the purchase, the order data also stores the store number, the type of payment and even the cashier who served the customer. The items, or contents of the order, are the most important element and form the basis for identifying association rules. Last but not least, customer information provides a deeper level of analysis by relating customer traits and profiles to particular items, allowing the store to carry out market segmentation (Ting, Pan and Chou, 2010).

A market basket database typically consists of a large number of transaction records, each listing all items purchased during a single customer transaction. The objective of this data mining exercise is to identify whether certain groups of items are usually purchased together and to express these patterns as meaningful association rules.
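
To make the structure concrete, the following is a minimal sketch of such a database in Python; the item names and transactions are purely hypothetical and are not taken from the project data:

  # A toy market basket database: each record is one customer transaction
  # listing the items bought together (item names are hypothetical)
  transactions = [
      ["kaisendon", "green tea"],
      ["kaisendon", "miso soup", "green tea"],
      ["salmon onigiri", "green tea"],
      ["kaisendon", "miso soup"],
  ]

  # The mining task is to find item groups that often co-occur, e.g. how
  # frequently "kaisendon" and "green tea" appear in the same basket
  both = sum(1 for t in transactions if "kaisendon" in t and "green tea" in t)
  print(both / len(transactions))  # 0.5 of all baskets contain both items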

Analysis Tool Selection

In carrying out Market Basket Analysis, certain considerations have to be made. One important factor is the software or tool used to carry out the analysis. Based on the client requirements in this project, the tool must be open-source and easy to use. While the team understands that there is far greater utility in employing paid software such as Clementine (SPSS), Enterprise Miner (SAS), GhostMiner 1.0, Quadstone or XLMiner, this requirement essentially narrows down the tools that the team is able to use (Haughton et al., 2003). The open-source candidates were narrowed down to three tools: RapidMiner, R and Tanagra.

Ppa-table1.jpg

After evaluating the three tools, the team realized that although R provides extensive measures and customizability, its learning curve is extremely steep and may not be best for the client given the non-programming nature of their background. Both RapidMiner and Tanagra are extremely lightweight and easy to use; however, RapidMiner’s more extensive set of interestingness measures led the team to choose it.

Analysis Breakdown

The study will carry out the analysis in the following flow:

  1. Products that are already in a set are analysed first. Since there is already an association between the items found in a set, the confidence of set components is analysed (using the measures defined below) to identify the most popular side dishes / drinks as well as the most unpopular ones.
  2. For products that are not within a set, Market Basket Analysis is carried out to identify association rules between products.
  3. Products with low profitability are identified using cost and revenue figures provided by the store; products that may contribute to the “profitable-product death spiral” are prevented from being dropped.
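
For reference, the two standard interestingness measures used throughout the analysis are support and confidence. For a candidate rule A ⇒ B over the set of transactions T, the usual definitions are:

  \text{support}(A \Rightarrow B) = \frac{|\{\, T : A \cup B \subseteq T \,\}|}{|\{\, T \,\}|}
  \qquad
  \text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}

Support measures how often the items appear together across all transactions, while confidence measures how often B is bought given that A is bought.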

Data Cleaning and Procedures

Most Point-of-Sale (POS) systems will provide transaction data consisting of information such as the transaction id, product id, quantity, price per product, date, etc. The data given by the POS system for this example is in the following format:

Ppa-table2.jpg

The team formatted and exported the data as a “Comma Separated Values” (CSV) file, where “t_id” is the transaction id, “p_id” is the product id and “qty” is the quantity of that product sold in a given transaction.

Ppa-table3.jpg
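
For readers working outside RapidMiner, the same export can be inspected with a few lines of pandas. This is only an illustrative sketch; the file name transactions.csv is a placeholder for the actual export:

  import pandas as pd

  # Load the POS export; "transactions.csv" is a hypothetical file name
  df = pd.read_csv("transactions.csv")

  # Keep only the three columns used in the analysis (this mirrors the
  # "Select Attributes" step described further below)
  df = df[["t_id", "p_id", "qty"]]
  print(df.head())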

However, this data requires further transformation from transactional data into market basket data. Ideally, market basket data should be represented in a binary format where each row is a separate transaction and each column corresponds to a product or item sold. Note that while the quantity is provided in this example, RapidMiner does not take quantity into account in the analysis of the result.

Ppa-fig2.jpg

Note: our team assumes that the data provided by the POS system is in the most basic relational format, as shown above. If the data has already been transformed in a way that makes certain transformation steps redundant, e.g. the data is already in binomial form, the respective steps can be skipped.

In order to transform the data into a format suitable for identifying frequent itemsets and generating association rules, the team carried out the following process (Deshpande, 2012) in RapidMiner:

Ppa-fig3.jpg

Insert Data. First, we loaded the CSV file into the RapidMiner repository by clicking on “Add Data”. RapidMiner should detect the file’s format and delimit the fields automatically by “,”. Once the file is successfully formatted and stored, the following result should be seen:

Ppa-fig4.jpg

Now that the data is in the local repository, we can drag it into the process; this marks the start of the analysis process. A “Retrieve <name of file>” operator should appear. From the transaction data, we keep only the columns that we require: we connect the “Select Attributes” operator to the “out” port of our “Retrieve” operator and select the filter type “subset” to retain only the relevant columns “t_id”, “p_id” and “qty”.

Ppa-fig5.jpg

Pivot Transformation. Next, we have to group the transactions by transaction id, with products represented in subsequent columns; this operation produces the result previously shown in Figure 2. The “Pivot” operator is used and connected to the “out” port of our “Select Attributes” operator; under the “group attribute” parameter we select “t_id”, and under “index attribute” we select “p_id”. The output from this step is:

Ppa-fig6.jpg
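
The equivalent of this pivot in pandas, shown here only as an illustration on a toy frame rather than the project data, would be:

  import pandas as pd

  # Toy transactional data in the t_id / p_id / qty layout described above
  df = pd.DataFrame({
      "t_id": [1, 1, 2, 3],
      "p_id": ["kaisendon", "green tea", "kaisendon", "salmon onigiri"],
      "qty":  [1, 1, 2, 1],
  })

  # One row per transaction (t_id becomes the identifier/index) and one
  # column per product; products not bought in a transaction come out as NaN
  basket = df.pivot_table(index="t_id", columns="p_id", values="qty", aggfunc="sum")
  print(basket)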

Replacing Missing Values. Since some products have zero quantity in particular transactions (i.e. they were not purchased), a “?” (missing value) is left in those cells by the “Pivot” operator. We connect a new “Replace Missing Values” operator to the “exa” port of the “Pivot” operator to replace these missing values with “0”, correctly reflecting the quantity of the product bought in each transaction. Under the “default” option, “zero” is selected.

Ppa-fig7.jpg
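
Continuing the pandas sketch above, the same replacement is a one-liner (again only an illustration, not part of the RapidMiner process itself):

  # Products absent from a transaction appear as NaN after the pivot;
  # fill them with 0 to mirror the Replace Missing Values step
  basket = basket.fillna(0)
  print(basket)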

Set Role ID. After the above transformation, all the new columns produced are attributes with the same role. However, in order to better represent the data, and to prepare it for the later parts of the process, we have to give some of the columns a specific “role”. A “role” defines the part that an attribute plays in a data set or process. In this example, we change the role of “t_id” to “id”, since it is used as a unique identifier for each transaction. This removes the attribute “t_id” from the analysis and leaves it as an identification attribute in the later parts of the process. The “Set Role” operator is selected and connected to the “exa” port of the preceding “Replace Missing Values” operator. Under the “attribute name” parameter, “t_id” is selected, and under “target role”, “id” is selected.

Ppa-fig8.jpg

Binomial Representation. Within RapidMiner, the quantity of products purchased in a transaction is not relevant; what matters is simply whether a product is purchased or not. The algorithm used later requires the transaction data to be represented as “binomial” values – meaning each analysed attribute has exactly two possible values: “true” if the product is purchased in a transaction and “false” if it is not. The “Numerical to Binomial” operator in RapidMiner transforms the attributes by checking whether they lie between a minimal and a maximal value; if an attribute falls between these values, it takes the value “false”, otherwise “true”. By default, the minimal and maximal values are both set to “0.0”, and hence if an attribute’s original value is 0, it is transformed to “false”.

Ppa-fig9.jpg
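
The same thresholding logic, continuing the pandas sketch, looks as follows; this is a rough stand-in for the RapidMiner operator rather than its actual implementation:

  # Values inside the [min, max] range become False, everything else True,
  # i.e. True exactly when the product was purchased at least once
  min_val, max_val = 0.0, 0.0
  basket_bool = (basket < min_val) | (basket > max_val)   # equivalently: basket > 0
  print(basket_bool)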

Now that the data is prepared, we select the two main operators that carry out the analysis – “FP-Growth” and “Create Association Rules”. FP-Growth is one of the algorithms used to generate frequent itemsets. “Create Association Rules” is then used to derive association rules from the frequent itemsets generated. The following is the completed process in RapidMiner:

Ppa-fig10.jpg
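
For comparison, an open-source equivalent of this final stage outside RapidMiner can be sketched with the mlxtend library; the toy data and the thresholds shown are placeholders, not the figures used in the project:

  import pandas as pd
  from mlxtend.frequent_patterns import fpgrowth, association_rules

  # Toy binary basket matrix: one row per transaction, one boolean column per product
  basket_bool = pd.DataFrame({
      "kaisendon":      [True,  True,  False, True],
      "green tea":      [True,  True,  True,  False],
      "salmon onigiri": [False, False, True,  False],
  })

  # Frequent itemsets via FP-Growth, then rules ranked by confidence
  # (min_support and min_threshold are placeholder values)
  itemsets = fpgrowth(basket_bool, min_support=0.3, use_colnames=True)
  rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
  print(rules[["antecedents", "consequents", "support", "confidence"]])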