Difference between revisions of "Qui Vivra Verra - Methodology"

From Analytics Practicum

Latest revision as of 00:56, 31 August 2016




Data Preparation

Further analysis of the data set can be carried out through market segmentation. Specifically, k-means clustering can be applied to the Transaction Dataset, with the following clustering variables:

  • Recency (number of days from last transaction to end of the FY)
  • Frequency (number of transactions performed within the FY)
  • Monetary (average number of books borrowed per transaction)


Each patron will then be assigned to a cluster, such that patrons are similar within a cluster and dissimilar across clusters. From here, we can determine the dominant cluster of library members that each library caters to. Understanding the demographics of the bulk of each library's patrons can provide some operational insights.
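The segmentation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (rfm_table, standardise, kmeans), the record layout of (patron_id, transaction_date, books_borrowed) and the sample transactions are all hypothetical, and the hand-written k-means is there to keep the example self-contained. In practice a library implementation such as scikit-learn's KMeans would normally be used.

```python
from datetime import date
import random


def rfm_table(transactions, fy_end):
    """Build {patron_id: (recency_days, frequency, monetary)} for one FY.

    transactions: iterable of (patron_id, transaction_date, books_borrowed).
    """
    per_patron = {}
    for patron, day, books in transactions:
        rec = per_patron.setdefault(patron, {"last": day, "count": 0, "books": 0})
        rec["last"] = max(rec["last"], day)
        rec["count"] += 1
        rec["books"] += books
    return {
        patron: ((fy_end - rec["last"]).days, rec["count"], rec["books"] / rec["count"])
        for patron, rec in per_patron.items()
    }


def standardise(rows):
    """Rescale each column to zero mean and unit variance so no variable dominates."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [(sum((x - m) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
           for col, m in zip(columns, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Made-up transactions, for illustration only.
fy_end = date(2016, 3, 31)
transactions = [
    ("A", date(2016, 3, 30), 2), ("A", date(2016, 3, 1), 4), ("A", date(2016, 2, 1), 3),
    ("B", date(2015, 5, 1), 1),
    ("C", date(2016, 3, 25), 3), ("C", date(2016, 3, 10), 3), ("C", date(2016, 2, 15), 3),
    ("D", date(2015, 6, 1), 1),
]
rfm = rfm_table(transactions, fy_end)
patrons = sorted(rfm)
labels = kmeans(standardise([rfm[p] for p in patrons]), k=2)
cluster_of = dict(zip(patrons, labels))
```

The variables are standardised before clustering because Recency is measured in days while Frequency and Monetary are small counts; without rescaling, Recency would dominate the distance calculation.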


Application of the Huff Model

An adaptation of the Huff model (Huff, 1964) will be applied in the analyses.


To quote a paper by Okabe & Sugihara (2012):

To state a general form of the Huff model, we consider a space S (which may be a plane or a network), in which n stores are located at p_1, …, p_n. Let a_i be the attractiveness of store i, which may be a function of its floor area, the number of items sold, its parking area and so forth; let d(p, p_i) be the distance between a point p on S and the store at p_i, which may be the Euclidean distance or the shortest-path distance; and let F(d(p, p_i)) be a monotonically decreasing function of d(p, p_i), referred to as a distance decay function or distance deterrence function. In these terms, the Huff model showing the probability of a consumer at p choosing the store at p_i is generally written as:


[Figure: Huff's Model Formula (Huff's_Model_Formula.png)]
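The formula image is not reproduced in this text. Based on the definitions in the quoted passage, the general form of the Huff model is usually written as below. This is a reconstruction from those definitions, not a copy of the original figure, and the notation P(p, p_i) for the choice probability is an assumption.

```latex
P(p, p_i) = \frac{a_i \, F\bigl(d(p, p_i)\bigr)}{\sum_{j=1}^{n} a_j \, F\bigl(d(p, p_j)\bigr)},
\qquad i = 1, \dots, n
```

A common choice of distance decay function, and one consistent with the power parameter mentioned at the end of this section, is the power form F(d) = d^{-\alpha} with \alpha > 0.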


Adapting the Huff model to the context of our project, we consider Singapore as the space S, in which n libraries are located at p_1, …, p_n. Let a_i be the attractiveness of library i, which is estimated by a multinomial generalised linear regression equation that takes into account the following factors (non-exhaustive):

  • Size of the library’s collection
  • Gross floor area of the library
  • Type of facility the library is located in (e.g. mall, stand-alone)
  • Size of the facility the library is in (e.g. if the library is located in a mall, this refers to the gross floor area of the mall)
  • Number of MRT stations within a set distance (to be determined) from the library
  • Number of bus stops within a set distance (to be determined) from the library
  • Number of bus routes within a set distance (to be determined) from the library
  • Opening hours of the library
  • Number of educational institutions (i.e. primary/secondary schools, junior colleges, polytechnics, ITE, universities) within a set distance (to be determined) from the library
  • Number of other libraries (only considering the list under NLB) within a set distance from the library
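Several of the factors above are counts of facilities within a set distance of a library. A minimal sketch of how such a count could be computed is given below. It assumes that locations are available as (latitude, longitude) pairs and that great-circle distance is an acceptable proxy; the function names, the coordinates and the 1 km radius are illustrative placeholders, since the actual distance threshold is still to be determined.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def count_within(centre, points, radius_km):
    """Number of points lying within radius_km of centre."""
    return sum(1 for p in points if haversine_km(centre, p) <= radius_km)


# Made-up coordinates, for illustration only.
library = (1.300, 103.800)
mrt_stations = [(1.300, 103.800), (1.305, 103.800), (1.400, 103.800)]
nearby_mrt = count_within(library, mrt_stations, radius_km=1.0)
```

The same helper could be reused for bus stops, educational institutions and other libraries by passing a different list of points. If shortest-path distance is preferred over straight-line distance, only haversine_km would need to be replaced.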


Let d(p, p_i) be the distance between an area (geographical subzone) p on S and the library at p_i, which may be the Euclidean distance or the shortest-path distance; and let F(d(p, p_i)) be a monotonically decreasing function of d(p, p_i), referred to as a distance decay function or distance deterrence function. The formula above can then be interpreted as the probability of a patron in subzone p choosing the library at p_i.
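To make this interpretation concrete, the sketch below computes the choice probabilities for a single subzone. It assumes a power-form distance decay function, F(d) = d^(-alpha), which is one common choice rather than something fixed by the text above; the function name and the numbers are hypothetical.

```python
def huff_probabilities(attractiveness, distances, alpha):
    """Probability of a patron in one subzone choosing each library.

    attractiveness: a_i for each library.
    distances: d(p, p_i) from the subzone to each library (must be > 0).
    alpha: power parameter of the assumed decay function F(d) = d ** -alpha.
    """
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be strictly positive")
    weights = [a * d ** -alpha for a, d in zip(attractiveness, distances)]
    total = sum(weights)
    return [w / total for w in weights]


# Two equally attractive libraries, one twice as far away as the other.
probs = huff_probabilities([1.0, 1.0], [1.0, 2.0], alpha=2.0)
```

With alpha = 2, doubling the distance cuts a library's weight to a quarter, so the nearer library receives four times the probability of the farther one.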


Dividing the number of patrons in subzone p who visited the library at p_i by the total number of patrons in subzone p, we obtain an empirical estimate of the proportion of the time that a patron from subzone p will visit library i in any given FY. Then, by substituting the known values of a_i (to be determined by the regression model) and d(p, p_i) into the adapted Huff model, we are able to derive possible values of the power parameter (α) that governs the distance decay function. By repeating this process iteratively, we can obtain an unbiased estimate of α that is accurate to a chosen significance level.
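One possible way to carry out this iterative estimation is sketched below. It rests on assumptions that go beyond the text: a power-form decay function F(d) = d^(-alpha), a least-squares fit between observed visit shares and model probabilities, and a simple grid search that is repeatedly narrowed around the best value. The function names, search bounds and data are hypothetical.

```python
def huff_probabilities(attractiveness, distances, alpha):
    """Huff choice probabilities for one subzone, assuming F(d) = d ** -alpha."""
    weights = [a * d ** -alpha for a, d in zip(attractiveness, distances)]
    total = sum(weights)
    return [w / total for w in weights]


def squared_error(alpha, attractiveness, distance_rows, observed_rows):
    """Sum of squared gaps between observed shares and model probabilities."""
    error = 0.0
    for distances, observed in zip(distance_rows, observed_rows):
        predicted = huff_probabilities(attractiveness, distances, alpha)
        error += sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return error


def estimate_alpha(attractiveness, distance_rows, observed_rows,
                   lower=0.0, upper=5.0, rounds=6, steps=20):
    """Grid search over alpha, narrowing the interval around the best value each round."""
    lo, hi = lower, upper
    best = lo
    for _ in range(rounds):
        width = (hi - lo) / steps
        grid = [lo + i * width for i in range(steps + 1)]
        best = min(grid, key=lambda a: squared_error(
            a, attractiveness, distance_rows, observed_rows))
        lo, hi = max(best - width, lower), min(best + width, upper)
    return best


# Synthetic check: generate shares from a known alpha, then try to recover it.
attractiveness = [3.0, 1.0, 2.0]          # a_i, as if taken from the regression model
distance_rows = [[1.0, 2.0, 4.0],         # one row of d(p, p_i) per subzone
                 [3.0, 1.0, 2.0],
                 [2.0, 5.0, 1.0]]
observed_rows = [huff_probabilities(attractiveness, row, 2.0) for row in distance_rows]
alpha_hat = estimate_alpha(attractiveness, distance_rows, observed_rows)
```

Each round shrinks the search interval by a factor of ten, so the number of rounds controls the precision of the estimate. With real visit data the observed shares will not match the model exactly, and a maximum-likelihood fit may be preferable to least squares.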