Difference between revisions of "ANLY482 AY2017-18 Group9: Project Findings/ Montecarlo"

From Analytics Practicum

Latest revision as of 01:19, 15 April 2018

Introduction

Monte Carlo simulation is the process of generating independent, random draws from a specified probabilistic model. When simulating time series models, it generates a large number of random draws from an entire sample path. In this project, the sample path is defined over the days of the week on which products are reordered.

Predictions of future product quantities for stores are often unstable and cannot be computed analytically because too many factors are involved, such as future demand, the demographics of customers around the store location, and public holidays that draw larger crowds. Monte Carlo simulation is therefore useful in scenarios like this one, where the forecast cannot be derived from conditional expectations.

Using the JMP Profiler, we forecast in the presence of random variation over 5,000 simulation runs. Monte Carlo simulation lets us mirror real-world combinations of variables and so obtain dependable probabilities for the outcomes of those combinations. In our simulation, the dependent variable is the product quantity, categorized by product and outlet. The candidate independent variables are “Day of Week” (Monday through Sunday), “Month” (the month in which the product was ordered), “Holiday” (school holiday, public holiday or non-holiday) and “Lag”, a computed variable based on a weighted moving average of the past 3 orders. The final set of independent variables is then selected from the significant predictors obtained through the multilinear regression model, as seen in Table 4 above.
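The “Lag” variable described above can be sketched in a few lines of Python. This is only an illustration: the weights (0.5, 0.3, 0.2) and the sample order quantities are hypothetical, since the weights actually used in JMP are not stated here.

```python
def weighted_moving_average(orders, weights=(0.5, 0.3, 0.2)):
    """Weighted average of the most recent len(weights) orders.

    weights[0] applies to the most recent order. These weights are
    hypothetical placeholders, not the ones used in the project.
    """
    if len(orders) < len(weights):
        raise ValueError("need at least as many past orders as weights")
    recent = orders[-len(weights):][::-1]  # most recent order first
    return sum(w * q for w, q in zip(weights, recent)) / sum(weights)


past_orders = [4, 6, 5, 7]  # illustrative quantities, oldest first
lag = weighted_moving_average(past_orders)
print(round(lag, 3))  # 6.2
```

Dividing by the sum of the weights keeps the result on the same scale as the orders even if the chosen weights do not add up to 1.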

The accuracy of a Monte Carlo simulation usually depends on two main factors. First, increasing the number of simulation runs provides more information on which to base the results, which reduces error. Second, identifying more of the variables that affect the forecast makes the model better fitted. To ensure that the right variables are selected, we built our explanatory analysis on the multilinear regression model explained above.
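The effect of the number of runs can be illustrated with a small Python sketch. It is not the JMP Profiler's simulator: it assumes a simple linear prediction with normally distributed noise, and the intercept, slope and noise level are hypothetical placeholders rather than fitted values.

```python
import random
import statistics


def simulate_quantity(lag, n_runs, intercept=0.5, slope=0.9, noise_sd=1.0, seed=0):
    """Draw n_runs simulated order quantities around a linear prediction.

    intercept, slope and noise_sd are hypothetical, not fitted values.
    """
    rng = random.Random(seed)
    return [intercept + slope * lag + rng.gauss(0.0, noise_sd) for _ in range(n_runs)]


def standard_error(draws):
    """Standard error of the simulated mean: sd / sqrt(n)."""
    return statistics.stdev(draws) / len(draws) ** 0.5


for n in (50, 500, 5000):
    draws = simulate_quantity(lag=5.433, n_runs=n)
    print(n, round(statistics.mean(draws), 3), round(standard_error(draws), 4))
```

The standard error of the simulated mean shrinks roughly with the square root of the number of runs, which is why a larger run count such as 5,000 gives a more stable estimate.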


Evaluation of Model

Analysis for Tampines MRT Outlet for 7MM/2KG Tapioca Pearls
The multilinear regression model helps us identify the significant variables to use in our Monte Carlo simulation, which forecasts the quantity needed for each store and product. We first divided the data into three subsets to test model accuracy: training, validation and testing, coded 0, 2 and 1 respectively. The training set contains orders with a created date from January 2016 to October 2017, the validation set contains orders created in November 2017, and the testing set contains orders created in December 2017. The training set is used to learn patterns for our outcome, the reorder quantity, through supervised learning. The validation set is used to assess the performance of the predictors chosen earlier by the multilinear regression model. Finally, the testing set provides an unbiased evaluation of the final model fitted on the training set.
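The date-based split described above can be expressed as a short Python function. The function and constant names are hypothetical; the date ranges and the codes 0, 2 and 1 are taken from the text.

```python
from datetime import date

# Subset codes as stated in the text: training 0, validation 2, testing 1.
TRAINING, TESTING, VALIDATION = 0, 1, 2


def assign_subset(created):
    """Map an order's created date to its subset code."""
    if date(2016, 1, 1) <= created <= date(2017, 10, 31):
        return TRAINING
    if date(2017, 11, 1) <= created <= date(2017, 11, 30):
        return VALIDATION
    if date(2017, 12, 1) <= created <= date(2017, 12, 31):
        return TESTING
    raise ValueError(f"{created} is outside the study period")


print(assign_subset(date(2017, 11, 10)))  # 2
```

Splitting by date rather than at random keeps the validation and testing months strictly after the training period, which suits a forecasting task.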

Crossvalidatetamp.PNG

When we run Fit Model with the Profiler using “lag” as the only model effect, the R-square value of the fitted model for the test set is fairly high at 0.6207, as shown in Figure 17.

We then considered whether adding another variable, “day of week”, would significantly affect the fitted model. As seen in Figure 18, the R-square value for the test set is slightly lower at 0.6143, a difference of 0.0064. This agrees with the multilinear regression results: “day of week” is not a significant variable for this outlet.
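For reference, the R-square values compared above follow the usual definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up actual and predicted quantities; JMP's exact computation for holdout sets may differ in detail.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-square is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Illustrative values only, not project data.
actual = [5, 6, 4, 7]
predicted = [5.5, 5.5, 4.5, 6.5]
print(round(r_squared(actual, predicted), 4))  # 0.8

# Difference between the two test-set values reported for Tampines.
print(round(0.6207 - 0.6143, 4))  # 0.0064
```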








Analysis for Serangoon NEX Outlet for 7MM/2KG Tapioca Pearls
To further support our analysis, we compared it with another outlet. Serangoon NEX Mall was chosen because it is one of the outlets with the most significant product delivery orders. From our multilinear regression evaluation, the significant variables identified for Tapioca Pearls at NEX Mall are “lag” and “day of week”.

Crossvalidatenex.PNG

We then fitted a model with the Profiler using the two independent variables “day of week” and “lag”. As illustrated in Figure 19, the R-square value for the test set is 0.7355, a fairly high value indicating a reasonably well-fitted model. By comparison, the model with only “lag” as the model effect has a marginally smaller R-square value for the test set, 0.7353 (Figure 20), and a smaller R-square value for the overall model, 0.7579 compared with 0.76541.

In conclusion, the multilinear regression evaluation helps identify the significant variables for each product and store, which leads to more accurate predictions.
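The comparison across the two outlets can be summarized with a small Python sketch that picks, for each outlet, the candidate model with the highest test-set R-square. The R-square values are those reported in this section; the dictionary layout and function name are hypothetical.

```python
# Test-set R-square values reported above for 7MM/2KG Tapioca Pearls.
test_r2 = {
    "Tampines MRT": {"lag": 0.6207, "lag + day of week": 0.6143},
    "Serangoon NEX": {"lag": 0.7353, "lag + day of week": 0.7355},
}


def best_model(scores):
    """Return the name of the model with the highest test-set R-square."""
    return max(scores, key=scores.get)


for outlet, scores in test_r2.items():
    print(outlet, "->", best_model(scores))
# Tampines MRT -> lag
# Serangoon NEX -> lag + day of week
```

Both selections match the sets of significant variables identified by the multilinear regression for the respective outlets, although the margin at Serangoon NEX is only 0.0002.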








Results generated from Model


Mcmresults.PNG

As illustrated in Figure 21, the prediction profiler shows that, with “lag” as the only independent variable and a latest lag value of 5.433, the predicted quantity is 5.455, so employees at the Tampines outlet should place 6 boxes of tapioca pearls (rounded up) on the following order.
Figure 22 shows the prediction profiler for the Serangoon NEX outlet, with both “lag” and “day of week” as independent variables for tapioca pearls. With a latest lag value of 7.682 and Monday as the day, the predicted quantity is 7.720, so employees at the outlet should place 8 boxes of tapioca pearls (rounded up) on the following order.
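The rounding step is a ceiling operation, since boxes can only be ordered in whole units and rounding down would risk under-ordering. A minimal Python sketch using the two predicted quantities reported above (the variable names are illustrative):

```python
import math

# Predicted quantities from the prediction profiler, as reported above.
predictions = {"Tampines MRT": 5.455, "Serangoon NEX": 7.720}

# Round each prediction up to the next whole box.
boxes = {outlet: math.ceil(quantity) for outlet, quantity in predictions.items()}
print(boxes)  # {'Tampines MRT': 6, 'Serangoon NEX': 8}
```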