Teppei Syokudo - Improving Store Performance: Data


Data Exploration

In the initial stages of the project, the problem is analysed by looking at the available data and understanding its various aspects. The main aim of this step is to ensure the following:

  1. Maximize insight into a data set
  2. Uncover underlying structure
  3. Extract important variables
  4. Detect outliers and anomalies
  5. Test underlying assumptions
  6. Develop parsimonious models
  7. Determine optimal factor settings. (“e-Handbook of Statistical Methods”, 2016)

Through these steps, the problem is ultimately defined and a solution model is developed, with the analysis methods identified and worked towards; the necessary data preparation is also determined.
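As a minimal illustration of this exploration step, a short pandas sketch is shown below. The file name sales.csv and its columns (date, product, amount) are assumptions for illustration only, not the actual export from the POS system.

import pandas as pd

sales = pd.read_csv("sales.csv", parse_dates=["date"])

print(sales.describe())                          # summary statistics for numeric fields
print(sales["product"].value_counts().head(10))  # most frequently sold items

# Flag potential outliers: daily revenue beyond 3 standard deviations of the mean
daily = sales.groupby(sales["date"].dt.date)["amount"].sum()
print(daily[(daily - daily.mean()).abs() > 3 * daily.std()])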

Labour Productivity

coming soon...

Product Portfolio Analysis

Looking at both MW and RP's product sales, the sale of Main - Meal shows a decreasing trend, as does Main - Drink. This is likely due to the introduction of Set Menus, as customers tend to prefer purchasing sets rather than a la carte items. The inverse relationship between Main - Meal and Set Menu is much more obvious at RP: the moment the Set Menu was introduced in November, Main - Meal sales started dropping.

The most popular Main - Meal at both outlets is the Kaisendon, which is sold 60 times and 50 to 60 times daily on average at MW and RP respectively. Its related set is the Seafood Feast, which averages 40 and 24 sales daily at MW and RP respectively.

We can also see that the sales of Main - Onigiri and Main - Fried show relatively stagnant to decreasing trends at both outlets. This means the onigiris and fried items may not be very popular. To boost sales of onigiris and fried items, Teppei Syokudo may want to consider introducing onigiri sets and fried-item sets.

1.png 2.png 3.png 4 apsm.png


Data Preparation

After a preliminary examination of the data, the team has identified that data preparation is required specifically in the following areas:

  1. Sales and labour data are currently stored separately; the two have to be joined before staff performance can be analysed.
  2. Analysing the productivity of staff requires sales data in an hourly format, whereas the current data only stores each staff member's start and end work timings – new variables have to be created to indicate whether a staff member is working on a particular day in a particular hour (see the sketch after this list).
    1. Receipt data had to be recoded into rows of transaction data. The following dataset is prepared before data analysis can be carried out:
5.png
    2. The data was segregated into set and non-set orders to try to find patterns in both groups of data. In the non-set data, sets and their components are treated as one entire item. For sets, we have chosen to analyse the components within the sets, identifying popular choices versus non-popular choices.
  3. Before useful analysis that can ascertain the product portfolio mix can be provided, sales data has to be broken down to individual orders. This data is currently stored in a POS system, and the team is in the midst of investigating whether the data can be retrieved and used meaningfully.
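As an illustration of the hourly recoding described in point 2 above, a minimal pandas sketch is shown below. The file names and column names (labour.csv, sales.csv, staff, shift_start, shift_end, datetime, amount) are assumptions for illustration, not the actual POS or roster schema.

import pandas as pd

labour = pd.read_csv("labour.csv", parse_dates=["shift_start", "shift_end"])
sales = pd.read_csv("sales.csv", parse_dates=["datetime"])

# Expand each shift into one row per hour worked (the new indicator variables)
rows = []
for _, r in labour.iterrows():
    for hour in pd.date_range(r["shift_start"].floor("H"),
                              r["shift_end"].floor("H"), freq="H"):
        rows.append({"staff": r["staff"], "hour": hour, "working": 1})
labour_hourly = pd.DataFrame(rows)

# Aggregate sales to the same hourly grain and join the two datasets
sales_hourly = (sales.assign(hour=sales["datetime"].dt.floor("H"))
                     .groupby("hour", as_index=False)["amount"].sum())
merged = labour_hourly.merge(sales_hourly, on="hour", how="left")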

Data Analysis Methods

Evaluation of Existing KPIs

Correlation Analysis will be used to evaluate the effectiveness of existing KPIs. The team will examine each KPI variable against sales (e.g. drinks% and sales) to find out whether sales is really affected by that KPI.
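As a minimal sketch of such a check, the correlation between a KPI and sales could be computed as below; the file and column names (kpi_hourly.csv, drinks_pct, sales) are hypothetical.

import pandas as pd

kpi = pd.read_csv("kpi_hourly.csv")  # hypothetical table joining hourly KPI values and sales
print(kpi["drinks_pct"].corr(kpi["sales"]))                  # Pearson correlation
print(kpi[["drinks_pct", "sales"]].corr(method="spearman"))  # rank-based check for monotonic relationships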

Proposing New KPIs

The team will be using Clustering and Conjoint Analysis to identify the key variables that impact sales. Clustering sales data with labour data could help identify clusters with high sales, and the combination of variables leading to such sales values. Conjoint analysis would help identify the individual variables that are important for hitting sales values. In doing so, the team would be able to propose new KPIs based on those key variables.

Setting Numerical Targets

Through Clustering, the team will also be able to find the right numerical targets to set for individual staff. For example, clustering sales data and labour data could help determine who the better salespersons are – those able to make x # or $x of sales. This could then be the numerical target set for the lower-performing salespersons.
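A minimal clustering sketch along these lines is shown below, assuming a hypothetical joined table sales_labour_hourly.csv with candidate KPI columns; k-means with four clusters is an illustrative choice, not the team's final model.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("sales_labour_hourly.csv")
features = ["sales", "staff_count", "drinks_pct", "sets_pct"]  # hypothetical candidate variables

# Standardise so that no single variable dominates the distance metric
X = StandardScaler().fit_transform(data[features])
data["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Mean profile of each cluster; the high-sales cluster suggests achievable targets
print(data.groupby("cluster")[features].mean().sort_values("sales", ascending=False))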

Product Portfolio Analysis

Last but not least, a Market Basket Analysis will be carried out to identify product pairings with high affinity. This will help Teppei optimize its product portfolio to include only the most popular products, identify suitable product pairings for cross-selling, and uncover hidden trends that can spur new product development.

Data Analysis

Market Basket Analysis

Data Analysis Methodology

Market Basket Analysis is broken down into two broad steps: frequent itemset generation and the creation of association rules. Popular algorithms include Apriori and FP-Growth.

Lattice.png

Consider the above lattice – each node is an itemset. Algorithms have to identify the most efficient way to traverse the lattice and determine whether a particular itemset is frequent. There are various ways of generating candidate frequent itemsets and pruning them, and this is determined by the algorithm used to carry out the association analysis. The way itemsets are generated and association rules created determines how computationally complex the analysis will be. Therefore, the factors affecting the computational complexity of an algorithm have to be considered when mining association rules from large datasets. These include transaction width, the number of products, the minimum support level and the maximum itemset size (Tan, Steinbach, Kumar, 2005). Since the transaction width and number of products are predetermined, the team has chosen to focus on the latter two factors for our analysis – the minimum support and confidence thresholds and the maximum itemset size.

An important aspect of association analysis is the generation of frequent itemsets (or, equivalently, the elimination of infrequent itemsets). The minimum support (minsup) and minimum confidence (minconf) are taken into account: these are the thresholds used to determine, for a rule A -> B, whether the itemsets A and B are frequent and whether A -> B is an acceptable association rule. While the team has explored algorithms for determining the optimal minimum support and confidence levels, such as Particle Swarm Optimization, it has instead examined the data spread to determine appropriate minimum support and confidence levels.
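For reference, these thresholds are applied against the standard support and confidence measures, where \(\sigma(\cdot)\) denotes the support count of an itemset and \(N\) the total number of transactions:

\mathrm{support}(A \rightarrow B) = \frac{\sigma(A \cup B)}{N}, \qquad \mathrm{confidence}(A \rightarrow B) = \frac{\sigma(A \cup B)}{\sigma(A)}

A rule is retained only if its support is at least minsup and its confidence is at least minconf.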

7 apsm.png 8 apsm.png

Based on the support levels observed in our dataset, most products have a rather low support of less than 0.05. This is because most customers of the store purchase particular products – namely the Kaisendon and sets containing the Kaisendon. Hence a minimum support level of 0.005 is selected, compared to conventional levels of 0.1 and higher. Although a low minimum support and confidence level might increase computational complexity, the complexity of the mining is currently low due to the small transaction width and number of products.

A relatively low minimum confidence level of 0.1 is also selected. A maximum itemset size of 2 is set since transactions have a low average width of 1.51 items (MW) and 1.49 items (RP).
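The analysis itself was run in RapidMiner, as described in the Data Analysis Tools section below. Purely as an illustration of the same thresholds, an equivalent open-source sketch in Python using the mlxtend library might look as follows; the toy transaction list is made up for the example.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions standing in for the recoded receipt data
transactions = [["Kaisendon", "Hot Green Tea"],
                ["Kaisendon"],
                ["Ebi Katsu", "Hokkaido Scallop"]]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# min_support=0.005 and max_len=2 mirror the thresholds chosen above
frequent = apriori(onehot, min_support=0.005, use_colnames=True, max_len=2)
rules = association_rules(frequent, metric="confidence", min_threshold=0.1)
print(rules[["antecedents", "consequents", "support", "confidence", "lift", "leverage"]])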

Data Analysis Tools

In carrying out MBA, certain considerations have to be made. One important factor is the software or tool used to carry out MBA. Based on the client's requirements in this project, the tool must be open-source and easy to use. While the team understands that there is far greater utility in paid software such as Clementine (SPSS), Enterprise Miner (SAS), GhostMiner 1.0, Quadstone or XLMiner, this requirement essentially narrows down the tools the team is able to use (Haughton et al., 2003). The open-source candidates are narrowed down to three tools: RapidMiner, R and Tanagra.

9 apsm.png

After evaluating the three tools, the team realized that although R provides the most measures and customizability, its learning curve is extremely steep and may not suit the client's non-programming background. Both RapidMiner and Tanagra are lightweight and easy to use, but RapidMiner's more extensive set of interestingness measures led the team to choose it.

Data Analysis Measures

Using RapidMiner, six different interestingness measures are collected: Support, Confidence, Laplace, p-s, Lift and Conviction.

The interestingness measures are analysed against three key properties (Piatetsky-Shapiro, 1991) that determine a good measure:

  1. M = 0 if A and B are statistically independent;
  2. M monotonically increases with P(A, B) when P(A) and P(B) remain the same;
  3. M monotonically decreases with P(A) (or P(B)) when the rest of the parameters (P(A, B) and P(B), or P(A)) remain unchanged.

The analysis shows that lift is a good interestingness measure if the data is normalized, and leverage is generally a good interestingness measure. In essence, both lift and leverage serve our purpose in interpreting the analysis results – to measure how many more units of an itemset are sold together than would be expected from independent sales. Leverage measures the difference between how often A and B occur together in the dataset and how often they would be expected to occur together if they were statistically independent (Piatetsky-Shapiro, 1991). Lift, conversely, directly measures how many times more often A and B occur together than expected if they were statistically independent. Consequently, the two measures provide the same ranking or ordering of products, since they capture essentially the same information.
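In standard notation, the two measures are:

\mathrm{lift}(A \rightarrow B) = \frac{P(A, B)}{P(A)\,P(B)}, \qquad \mathrm{leverage}(A \rightarrow B) = P(A, B) - P(A)\,P(B)

Under statistical independence, lift equals 1 and leverage equals 0, so both compare the observed co-occurrence of A and B against the expected baseline.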

Analysis Results

After establishing the interestingness measures used to interpret the result set, the team examines the analysis results. The results are broken into two sections: sets and non-sets.

Within sets, we look at the association between main courses and their drinks or side toppings. Since the main dish within a set does not vary, it is the independent variable (the premise), while the side dishes or drinks are the dependent variables (the conclusions). We have broken down the set components by the various main dishes.

The most popular topping is the Hotate Topping and the most popular drink is the Hot Green Tea. (We could not analyse the data for RP since only one set's data was collected, which provides an inaccurate measure of the association between that set's components.)

For non-sets, based on the associations found, we can make the following recommendations for MW:

  1. Provide a set for Ebi Katsu and Hokkaido Scallop
  2. Provide a set for Salmon Caviar Onigiri and Salmon Onigiri
  3. Buddy Meals consisting of the following:
    1. Aburi Salmon Don & Kaisendon
    2. Katsudon & Kaisendon
    3. Katsu Curry & Kaisendon

For non-sets, based on the associations found, we can make the following recommendations for RP:

  1. Provide a set for Tuna Mayo Onigiri and Salmon Onigiri
  2. Provide a set for Konbu Onigiri and Salmon Onigiri
  3. Buddy Meals consisting of the following:
    1. Tonkatsu & Kaisendon

We noticed a different product focus in the products associated at RP compared to MW. For example, the Salmon Onigiri has a higher association with the Salmon Caviar Onigiri at MW, while at RP it has a higher association with the Konbu Onigiri. A general observation of the support of products purchased at MW shows that the preferred food is generally more expensive, with more focus on product quality, while at RP there is more focus on sets and value for money. A guess would be that customers at RP are more likely to be office workers (RP is closed on weekends), while MW serves casual shoppers who tend to have higher spending power.

Recommendations

Ultimately, the analysis of the data provides three recommendations:

  1. Highly associated items should be placed near each other or in a set to drive sales
  2. For items frequently bought together, giving a discount on one item will significantly drive the sales of the other
  3. Avoid the Profitable Product Death Spiral – we should not eliminate unprofitable products that are attracting profitable customers

For items such as the onigiris we identified earlier, or even some of the fried dishes, placing them near each other or giving a discount on one or the other can potentially drive an increase in sales. Similarly, putting these items in a set should see an increase in sales volume as well.

We see that some of the side dishes bunched together with the Kaisendon are rather unprofitable and also low in sales volume. However, these products are attracting customers who purchase the Kaisendon, and hence we should not exclude these seemingly unprofitable items.