ANLY482 AY2016-17 T2 Group20 Findings

From Analytics Practicum


Literature Review

Many studies have addressed the prediction of used car prices. One, by Pudaruth, used a series of linear regressions, clustering and decision trees. It found that the most important factors in the price of a used car are usually the “age of the car, its make (and model), the origin of the car (the original country of the manufacturer), its mileage (the number of kilometers it has run) and its horsepower.” The study also showed that the main weakness of decision trees and Naïve Bayes prediction is their inability to handle output classes with numeric values. Another study, by Peerun et al., used artificial neural networks to predict car prices and concluded that it is a “risky enterprise but feasible”. A real-world application of linear and ridge regression to used cars can be seen in an open-source browser plugin created by Kostic D., who used the ads on Polovniautomobili to train a model that predicts car prices. A further study, by Chen, showed that building a good linear regression model requires sampling the data, training the model on the sample, and then checking how it performs outside the training sample. There are studies showing how linear regression can be used to predict used car prices, but few researchers have dealt with the large and growing amount of data for the US market.
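The train-then-check-outside-the-sample idea attributed to Chen above can be sketched as a simple hold-out evaluation. This is a minimal illustration only and not the method used in any of the cited studies: the data are synthetic, the single predictor (car age) and the names `fit_simple_ols`, `rmse` and `holdout_evaluate` are our own assumptions, and a real model would use many more variables.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def holdout_evaluate(rows, train_fraction=0.8, seed=0):
    """Fit on a random sample of rows, then score on the rows left out."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    train_x, train_y = [r[0] for r in train], [r[1] for r in train]
    test_x, test_y = [r[0] for r in test], [r[1] for r in test]
    intercept, slope = fit_simple_ols(train_x, train_y)
    return {
        "intercept": intercept,
        "slope": slope,
        "train_rmse": rmse(train_x, train_y, intercept, slope),
        "test_rmse": rmse(test_x, test_y, intercept, slope),
    }


# Synthetic (made-up) data: price falls by about 2000 per year of age, plus noise.
rng = random.Random(42)
rows = []
for _ in range(200):
    age = rng.uniform(0, 10)
    price = 30000 - 2000 * age + rng.gauss(0, 1500)
    rows.append((age, price))

result = holdout_evaluate(rows)
print(result)
```

A test error far above the training error would suggest that the model does not generalise beyond the sample it was trained on.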

Primary Research

To identify features that might affect pricing from the business perspective, we conducted surveys, phone calls and focus group discussions with dealers. Our client, Automotive Ventures, gave us a list of dealerships in the United States. There were 19 respondents to the online survey, and the team engaged the sales management from Hyundai Ford Legacy Dealership for a focus group discussion. The primary research was valuable in setting a basis for variable selection in the predictive modelling.

Factors affecting pricing decision making

I. Current inventory and competitor’s inventory
Dealers price used cars according to the number of similar cars present in the market. For example, if the dealership has 3 Ford SE SUV cars and its competitor is bringing in 5 Ford SE SUV cars of comparable type, the dealer will be more inclined to lower its prices for the car.

II. Seasonality
Both customers and wholesale customers have certain purchasing seasons, which can result in dealers either underpricing or overpricing their current inventory. Dealers also tend to consider industry news and trends, competitive intelligence reports and car shows, which may influence overall pricing strategy but may not lead the dealer to change the price of a particular inventory item.

III. Proximity
In identifying their competitors in the used car market, all dealers agreed that location (proximity) is the factor they consider most. When asked how close a competitor has to be, they treated dealers within 10 miles as direct competitors and dealers up to 100 miles away as competitors. One dealer mentioned that customers are willing to travel across states to get a similar car that is priced $500 cheaper.

IV. Customer profile
Customers are make-sensitive, which means different makes attract different customer profiles. “Hyundai customers are price sensitive. They are mainly shopping price and inventory.” - Keith, Used Car Sales manager.

Turning Insights into Actions

We incorporated these insights into our modelling process.

I. Defining a market for similar cars
Pricing is affected by the market forces of demand and supply. Hence, we are creating an additional variable, “Number of similar cars”, which is calculated by identifying cars of the same model and trim within a 50-mile and a 100-mile radius, excluding the dealership’s own inventory of similar cars. The variables “Number of similar cars in 50 miles” and “Number of similar cars in 100 miles” will then be included in the modelling and regression analysis.

II. Generalized model after filtering “Vehicle Make” of cars
As the dealers mentioned, different makes of car attract different types of customers and are priced differently. In creating the model, we decided that make should be a filter for the generalized model, as customers already know which make of car they would like to purchase.
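The “Number of similar cars” variable described in point I can be sketched as follows. This is a minimal illustration under our own assumptions, not the project’s actual implementation: we assume each listing carries a dealer identifier, model, trim and the dealer’s latitude and longitude, we measure distance with the haversine (great-circle) formula, and the `Listing` fields, function names, coordinates and car details are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Listing:
    """One used-car listing (hypothetical schema for illustration)."""
    dealer_id: str
    model: str
    trim: str
    lat: float
    lon: float


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def count_similar_cars(target, listings, radius_miles):
    """Count listings with the same model and trim within the radius,
    excluding cars held by the target's own dealership."""
    return sum(
        1
        for other in listings
        if other.dealer_id != target.dealer_id
        and other.model == target.model
        and other.trim == target.trim
        and haversine_miles(target.lat, target.lon, other.lat, other.lon)
        <= radius_miles
    )


# Made-up listings; coordinates are approximate and purely illustrative.
target = Listing("dealer_A", "Escape", "SE", 33.7490, -84.3880)
listings = [
    target,
    Listing("dealer_A", "Escape", "SE", 33.7490, -84.3880),   # own inventory: excluded
    Listing("dealer_B", "Escape", "SE", 33.9526, -84.5499),   # under 50 miles away
    Listing("dealer_B", "Escape", "SEL", 33.9526, -84.5499),  # different trim: excluded
    Listing("dealer_C", "Escape", "SE", 33.9519, -83.3576),   # between 50 and 100 miles
    Listing("dealer_D", "Escape", "SE", 32.0809, -81.0912),   # beyond 100 miles
]

similar_50 = count_similar_cars(target, listings, 50)
similar_100 = count_similar_cars(target, listings, 100)
print(similar_50, similar_100)
```

Because the 100-mile count includes every car in the 50-mile count, the two variables are likely to be correlated, which is worth checking before both enter the same regression.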

Secondary Research

For the scope of this project, we will be looking at creating a model for three car makes: BMW, Ford and Chevrolet. BMW targets the high-end market, Ford targets the lower end of the car market, and Chevrolet targets the middle to high end, as seen from Figure 1. We chose these makes because they represent different customer segments in the car market and each make constitutes a high percentage of the total data.

Figure 1: Performance of Car Brands and Cost of Ownership

Figure 2 below shows that customers in the used car market have a high tendency to switch between dealers and are moderately price sensitive.

Figure 2: MarketLine Advantage on automotive aftermarket sector in United States

These findings give us a head start on the variables needed and the models to start with for predicting used car prices in the US market.