Team Accuro Project Overview

From Analytics Practicum

Revision as of 20:21, 4 September 2015

Accuro


Introduction and Background

We live today in what could best be described as the age of consumerism, where consumers increasingly look for information that helps them distinguish between products. With this rising need for expert opinion and recommendations, crowd-sourced review sites have become one of the most disruptive business forces of the modern age. Since Yelp was launched in 2005, it has been helping customers avoid bad decisions and steer towards good experiences through a 5-star rating scale and written text reviews. With its vast database of reviews, ratings and general information, Yelp not only makes decision making much easier for its millions of users but also makes the businesses it reviews more profitable by increasing store visits and site traffic.

The Yelp Dataset Challenge provides rating data for businesses across 4 countries and 10 cities, giving students an opportunity to explore and apply analytics techniques to design a model that improves the pace and efficiency of Yelp’s recommendation systems. Using the dataset provided for existing businesses, we aim to identify the main attributes that make a business a high performer (highly rated) on Yelp. Since restaurants form a large share of the businesses reviewed on Yelp, we decided to build a model specifically to advise new restaurateurs on how to become their customers’ favourite food destination.

With Yelp’s growing popularity in the United States, businesses care more and more about their ratings, as “an extra half star rating causes restaurants to sell out 19 percentage points more frequently”. This profound effect of Yelp ratings on the success of a business makes our analysis especially relevant for new restaurant owners. Why do some businesses rank higher than others? Do customers give ratings purely based on food quality, does ambience triumph over service, or does the geographic location of a business affect how customers rate it? Through our project we hope to analyse such questions and thereby advise restaurant owners on which factors to look out for.


Review of Similar Work

1) Visualizing Yelp Ratings: Interactive Analysis and Comparison of Businesses:

The study aims to help businesses compare their performance (Yelp ratings) with that of similar businesses based on location, category and other relevant attributes.

The visualization focuses on three main parts:
a) Distribution of ratings: a bar chart showing the frequency of each star rating (1 through 5) for a single business.
b) Number of useful votes vs. star rating: a scatter plot showing every review for a given business, with the x-position representing the “useful” votes the review received and the y-position representing the star rating it gave the business.
c) Ratings over time: the same as chart (b), but with the date of the review on the x-axis.
The final product is an interactive display that lets users select a business of interest and specify a radius in miles within which to filter businesses for comparison. We will use this as a base and build on it to address some of its shortcomings in usability and UI, and we will supplement it with our own analysis, using other statistical methods to derive meaning from the dataset.
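The data behind the three charts described above can be prepared with a few lines of Python. This is a minimal sketch, not the study's code; it assumes each review is a dict with hypothetical `business_id`, `stars`, `useful` and `date` keys, loosely modelled on Yelp review records:

```python
from collections import Counter
from datetime import date


def rating_distribution(reviews, business_id):
    """Chart (a): number of reviews per star value for one business."""
    counts = Counter(r["stars"] for r in reviews if r["business_id"] == business_id)
    # Report every star value so the bar chart always has five bars.
    return {star: counts.get(star, 0) for star in range(1, 6)}


def useful_vs_stars(reviews, business_id):
    """Chart (b): one (useful votes, star rating) point per review."""
    return [(r["useful"], r["stars"]) for r in reviews if r["business_id"] == business_id]


def ratings_over_time(reviews, business_id):
    """Chart (c): (review date, star rating) points, sorted chronologically."""
    return sorted((r["date"], r["stars"]) for r in reviews if r["business_id"] == business_id)


# Hypothetical toy reviews, not real Yelp data.
reviews = [
    {"business_id": "b1", "stars": 5, "useful": 3, "date": date(2014, 3, 1)},
    {"business_id": "b1", "stars": 4, "useful": 0, "date": date(2013, 7, 9)},
    {"business_id": "b1", "stars": 5, "useful": 1, "date": date(2015, 1, 20)},
    {"business_id": "b2", "stars": 1, "useful": 2, "date": date(2014, 5, 5)},
]
print(rating_distribution(reviews, "b1"))  # {1: 0, 2: 0, 3: 0, 4: 1, 5: 2}
```

The resulting lists can be handed to any plotting tool; keeping data preparation separate from drawing makes it easy to reuse the same functions for the comparison businesses selected by radius.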

2) Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction
This study examines the influence of geographical location on users’ ratings of a business, assuming that a user’s rating is determined both by the intrinsic characteristics of the business and by the extrinsic characteristics of its geographical neighbors.
The authors use two kinds of latent factors to model a business: one for its intrinsic characteristics and one for its extrinsic characteristics (which encode the influence this business exerts on its geographical neighbors).
The study shows that incorporating geographical neighborhood influence yields much lower prediction error than state-of-the-art models, including Biased MF, SVD++ and Social MF. Incorporating influences from business category and review content reduces the prediction error further.
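As an illustration only, the sketch below shows a simplified prediction rule in the spirit of that study; its exact form is our assumption, not the authors' published model. The predicted rating of a user for a business is the global mean, plus a user bias and a business bias, plus the dot product of the user's latent vector with the business's intrinsic vector added to the average extrinsic vector of its geographical neighbours:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_rating(mu, b_user, b_item, p_user, q_item, neighbour_extrinsic):
    """Assumed, simplified form (not the published model):
    mu + b_user + b_item + p_user . (q_item + mean of neighbours' extrinsic vectors).
    All vectors would normally be learned from data; here they are passed in."""
    if neighbour_extrinsic:
        k = len(neighbour_extrinsic)
        avg = [sum(vec[d] for vec in neighbour_extrinsic) / k for d in range(len(q_item))]
    else:
        avg = [0.0] * len(q_item)  # no neighbours: intrinsic factors only
    combined = [q + a for q, a in zip(q_item, avg)]
    return mu + b_user + b_item + dot(p_user, combined)


# Hypothetical 2-dimensional factors for one user, one business and two neighbours.
with_neighbours = predict_rating(3.7, 0.2, -0.1, [0.5, 1.0], [0.4, 0.2], [[0.2, 0.0], [0.0, 0.2]])
without = predict_rating(3.7, 0.2, -0.1, [0.5, 1.0], [0.4, 0.2], [])
print(round(with_neighbours, 2), round(without, 2))  # 4.35 4.2
```

Averaging the neighbours' vectors keeps the neighbourhood term on the same scale regardless of how many neighbours a business has.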

We could extend our analysis by treating geographical neighbourhood as an additional factor (one not explicitly recorded in the dataset) to reduce the variance observed in the data and improve the predictive power of the model.
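One simple way to derive such a neighbourhood feature is sketched below, assuming each restaurant record has hypothetical `latitude`, `longitude` and `stars` fields: for every restaurant, average the star ratings of the other restaurants within a chosen radius (an arbitrary 1 km here), using the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbourhood_mean_stars(restaurants, radius_km=1.0):
    """For each restaurant, the mean stars of other restaurants within radius_km (None if none)."""
    features = []
    for i, r in enumerate(restaurants):
        nearby = [
            o["stars"]
            for j, o in enumerate(restaurants)
            if j != i
            and haversine_km(r["latitude"], r["longitude"], o["latitude"], o["longitude"]) <= radius_km
        ]
        features.append(sum(nearby) / len(nearby) if nearby else None)
    return features


# Hypothetical restaurants: the first two are about 0.1 km apart, the third about 11 km away.
restaurants = [
    {"latitude": 36.1000, "longitude": -115.1700, "stars": 4.0},
    {"latitude": 36.1010, "longitude": -115.1700, "stars": 3.0},
    {"latitude": 36.2000, "longitude": -115.1700, "stars": 5.0},
]
print(neighbourhood_mean_stars(restaurants))  # [3.0, 4.0, None]
```

The pairwise loop is O(n²), which is workable for one city's restaurants; a larger run would likely need a spatial index. Isolated restaurants get `None`, which the downstream model would have to impute or treat as a separate case.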


Motivation

Our personal interest in the topic motivated us to choose this area of research. When planning trips abroad, we use sites like HostelWorld and TripAdvisor, which make planning a lot faster and easier; such sites help not only the customers planning trips but also the businesses that receive honest ratings. Since the team consisted of students from a management university, our motivation when choosing this project was more business-focused. Our perspective on recommendations was geared more towards how a business can improve its standing on Yelp, and thereby increase its turnover through more customer visits.

We believe that our topic of analysis is crucial for the following reasons:
1) It will make redirecting customers to high-quality restaurants much easier and more efficient.
2) It can encourage low-quality restaurants to improve in response to insights about customer demand.
3) The rapid growth in the number of users who trust online review sites and incorporate them into their everyday lives makes this an important avenue for future research.


Project Scope and Methodology

  • Primary requirements (for “restaurants” and one city only):


Step 1: Descriptive analysis. Analyse restaurants specifically for what differentiates high performers, low performers and hit-or-miss restaurants. The analysis will be further segmented by, for example, region, review count and operating hours. For each of the 3 segments mentioned, the following analysis will be done:
A. Clustering to analyse the business profiles that characterize the market. Various algorithms will be explored and evaluated to decide which works best for the dataset.
B. Time series analysis of whether any major trends have emerged in restaurants by region, to further decipher the dos and don’ts of success.
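The proposal does not define the three segments numerically, so the thresholds below are hypothetical and chosen only for illustration: a restaurant whose review ratings are widely spread (population standard deviation of at least 1.5 stars) is treated as hit-or-miss, otherwise a mean of 4 or more marks a high performer and a mean below 3 a low performer. For step A, plain k-means is shown as one candidate algorithm among those to be compared, with a naive initialisation (the first k points) and hypothetical profile features:

```python
from statistics import mean, pstdev

# Hypothetical thresholds for illustration only; they would be tuned on the real data.
HIGH_MEAN = 4.0
LOW_MEAN = 3.0
HIT_OR_MISS_STD = 1.5


def segment(review_stars):
    """Assign a restaurant to a segment from the star ratings of its reviews."""
    if pstdev(review_stars) >= HIT_OR_MISS_STD:
        return "hit or miss"  # ratings are widely spread
    avg = mean(review_stars)
    if avg >= HIGH_MEAN:
        return "high performer"
    if avg < LOW_MEAN:
        return "low performer"
    return "average"  # between the two thresholds


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


print(segment([5, 5, 4, 5]))  # high performer
print(segment([1, 5, 1, 5]))  # hit or miss
# Hypothetical restaurant profiles: [mean stars, price range].
profiles = [[4.5, 1], [4.6, 1], [2.0, 3], [2.2, 3]]
labels, centroids = kmeans(profiles, k=2)
```

In practice the features would be standardised before clustering, and k, the initialisation and the algorithm itself (for example hierarchical clustering) would be compared as Step 1 describes.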

Step 2: Identification of key factors (feature extraction) for prescriptive analysis, showing new restaurants in each region what they need in order to succeed. Regression will be used to identify the most important factors, and the model will be validated to assess how well it performs.
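The proposal does not name a specific regression technique; ordinary least squares is assumed here as the simplest option. The sketch fits the model on a training set via the normal equations and validates it with R² on held-out restaurants. The features and the noise-free toy relationship are hypothetical and exist only to check the code:

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column of ones is prepended. Assumes X^T X is invertible,
    i.e. no feature is a perfect linear combination of the others."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_ols(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def r_squared(beta, X, y):
    """Share of the variance in y explained by the model (1.0 = perfect fit)."""
    preds = [predict_ols(beta, x) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def toy_rating(x):
    # Noise-free toy relationship used only to check the code.
    return 3.0 + 0.5 * x[0] - 0.2 * x[1]


# Hypothetical features per restaurant: [offers_takeout (0/1), noise_level (0-3)].
X_train = [[0, 1], [1, 1], [0, 3], [1, 2], [1, 0]]
X_test = [[0, 0], [1, 3], [0, 2]]
y_train = [toy_rating(x) for x in X_train]
y_test = [toy_rating(x) for x in X_test]

beta = fit_ols(X_train, y_train)        # recovers roughly [3.0, 0.5, -0.2]
print(r_squared(beta, X_test, y_test))  # close to 1.0 on this noise-free data
```

Coefficient sizes are only comparable as "importance" when the features share a scale, so real features would be standardised first; a statistics package would also report standard errors, which this sketch omits.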

Step 3: For each segment (i.e. high performers, low performers and hit-or-miss restaurants), our analysis will include the following:
o Regression to predict the ratings of new restaurants by region, through analysis of success factors over time. For example, restaurants that opened 2 years ago and achieved high ratings a year later will be used to build the model, which will then be tested on restaurants that opened a year ago and have high ratings now, in order to study the patterns that determine a successful business.
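The cohort-based validation in Step 3 can be sketched as below. The Yelp data has no explicit opening date, so this sketch assumes the date of a restaurant's first review as a proxy, and uses the mean rating reached in the first year of operation as the target for both cohorts. The restaurant names and reviews are hypothetical:

```python
import math
from datetime import date, timedelta

ONE_YEAR = timedelta(days=365)  # approximation that ignores leap days


def first_year_rating(reviews):
    """Return (proxy opening date, mean stars in the first year) for a list of (date, stars).
    The first review date stands in for the opening date (an assumption of this sketch)."""
    opened = min(d for d, _ in reviews)
    first_year = [s for d, s in reviews if d < opened + ONE_YEAR]
    return opened, sum(first_year) / len(first_year)


def cohort_split(restaurants, today):
    """Training cohort: opened 2-3 years before `today`; test cohort: opened 1-2 years before."""
    train, test = [], []
    for name, reviews in restaurants.items():
        opened, target = first_year_rating(reviews)
        age = today - opened
        if 2 * ONE_YEAR <= age < 3 * ONE_YEAR:
            train.append((name, target))
        elif ONE_YEAR <= age < 2 * ONE_YEAR:
            test.append((name, target))
    return train, test


def rmse(actual, predicted):
    """Root-mean-square error used to validate predictions on the test cohort."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


restaurants = {  # hypothetical toy data: name -> [(review date, stars), ...]
    "old_bistro": [(date(2013, 6, 1), 4), (date(2013, 12, 1), 5), (date(2015, 1, 1), 2)],
    "new_cafe": [(date(2014, 5, 1), 3), (date(2014, 8, 1), 4)],
    "brand_new": [(date(2015, 6, 1), 5)],
}
train, test = cohort_split(restaurants, today=date(2015, 9, 1))
print(train)  # [('old_bistro', 4.5)]
print(test)   # [('new_cafe', 3.5)]
```

A regression model fitted on the training cohort's features and targets would then be scored with `rmse` on the test cohort; restaurants younger than a year are excluded because their first-year rating is not yet known.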

Step 4: Build a visualization tool that gives the client continual updates on business strategy. The focus will be on building a robust tool that helps the client recreate the same analysis in Tableau.

  • Secondary requirements:


Extend the analysis to all other cities, and recreate it for other kinds of businesses, e.g. bars and salons. For some businesses, especially those with minimal attribute information, new methods of analysis such as latent factorization will be employed.
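Latent factorization sidesteps missing attributes by learning hidden factors for users and businesses from the ratings alone. A minimal matrix-factorization sketch trained by stochastic gradient descent is shown below; the learning rate, regularisation, number of factors and toy ratings are all hypothetical choices, not tuned values:

```python
import math
import random


def predict_mf(P, Q, u, i):
    """Predicted rating: dot product of user u's and business i's latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn k-dimensional latent vectors P[u] (users) and Q[i] (businesses) by SGD
    so that predict_mf(P, Q, u, i) approximates each observed rating r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict_mf(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def train_rmse(ratings, P, Q):
    return math.sqrt(sum((r - predict_mf(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


# Hypothetical (user, business, stars) triples; no business attributes are used at all.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 1)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(train_rmse(ratings, P, Q), 3))
print(round(predict_mf(P, Q, 0, 2), 2))  # estimate for a user-business pair never rated
```

Production recommenders usually add global, user and business bias terms and tune the hyperparameters on held-out ratings; they are left out here to keep the technique visible.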

  • Future research:


Evaluating the importance of reviews and ratings for restaurants: are they effective in driving restaurants to improve their ratings? Do restaurants that adopt the recommended changes succeed?

Can the ratings and reviews of local experts be incorporated into feature extraction to improve the prediction of rating success? People are social beings, and the criticism they post on Yelp can be heavily influenced by reviews from local experts. Future research in this area could enrich our analysis of a business as well.

Limitations and Assumptions

Below are the major limitations we foresee for this project, each paired with the assumption we make in response:

  • Limitation: Limited data points on businesses and cities. Assumption: The project methodology will be scalable to the analysis of regional trends.
  • Limitation: Limited actionability of insights, since companies may not care about Yelp ratings. Assumption: Project findings will help business owners set priorities for improvement.
  • Limitation: Business attributes may not be completely accurate. Assumption: The data has been updated as accurately as possible.
  • Limitation: Defining business categories. Assumption: The business tags under each category are comprehensive for the competitive set.


Deliverables

  • Project Proposal
  • Mid-term presentation
  • Mid-term report
  • Final presentation
  • Final report
  • Project poster
  • Project Wiki
  • Visualization tool on Tableau

Work Scope

Through this project we hope to build an interactive dashboard as our solution to Yelp’s Dataset Challenge on ratings and recommendation systems. Some areas of research we would like to look into are:

  • Cultural Trends
  • Seasonal Trends
  • Location Mining
  • Change-points analysis
  • Hierarchical and Non-Hierarchical Clustering
  • Classification analysis
  • Explanatory Regression analysis
  • Predictive Regression analysis
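Of the areas listed, change-point analysis is perhaps the least self-explanatory. The sketch below shows its simplest variant, assumed here for illustration: find the single split of a time series (for example a restaurant's hypothetical monthly average rating) that minimises the total squared error around the two segment means.

```python
def best_change_point(series):
    """Index t that best splits series into series[:t] and series[t:] with different means,
    chosen by minimising total within-segment squared error. Detects one change only;
    requires at least two observations."""

    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((x - m) ** 2 for x in segment)

    best_idx, best_cost = None, float("inf")
    for t in range(1, len(series)):
        cost = sse(series[:t]) + sse(series[t:])
        if cost < best_cost:
            best_idx, best_cost = t, cost
    return best_idx


# Hypothetical monthly average ratings with a visible drop after the fourth month.
monthly_stars = [4.1, 4.2, 4.0, 4.1, 3.2, 3.1, 3.3, 3.2]
print(best_change_point(monthly_stars))  # 4
```

Multiple change points could be found by applying the same search recursively to each segment (binary segmentation), with a penalty or significance test deciding when to stop splitting.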


References