Difference between revisions of "ANLY482 AY2016-17 T2 Group23 Silver Daisies Analysis and Findings"

From Analytics Practicum

Revision as of 20:04, 8 April 2018

Univariate Analysis

Appointment Duration

We performed univariate analysis on appointment duration to find out the current distribution and summary statistics and obtained the following results:

Duration.png
  • Mean duration: 1:22
  • Median duration: 1:12
  • Standard deviation: 0:47
  • Distribution: not normal, so non-parametric methods are used
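
As a rough illustration of how such summary statistics can be computed, here is a minimal Python sketch. It assumes durations are recorded in minutes and reported as h:mm (the source does not state the unit explicitly), and the sample values are made up for illustration only; they are not the project data.

```python
import statistics

def to_hmm(minutes: float) -> str:
    """Format a duration in minutes as h:mm, rounded to the nearest minute."""
    total = round(minutes)
    return f"{total // 60}:{total % 60:02d}"

def summarize(durations_min):
    """Return mean, median and sample standard deviation as h:mm strings."""
    return {
        "mean": to_hmm(statistics.mean(durations_min)),
        "median": to_hmm(statistics.median(durations_min)),
        "stdev": to_hmm(statistics.stdev(durations_min)),
    }

# Hypothetical appointment durations in minutes (NOT the project data).
sample = [35, 50, 60, 72, 75, 90, 120, 180]
summary = summarize(sample)
print(summary)
```

A mean noticeably above the median, as in the reported 1:22 versus 1:12, is what a right-skewed set of durations like this sample produces.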

Bivariate Analysis

We performed bivariate analysis on each independent variable to find its individual effect on appointment duration. The main findings are as follows:

Appointment Clinic

The median duration differs from clinic to clinic. Clinics with longer appointment durations include SNEC (Singapore National Eye Centre), NUH (National University Hospital), and NCC (National Cancer Centre). The shorter durations mainly come from neighborhood polyclinics, such as Geylang Polyclinic, BMP (Bukit Merah Polyclinic), and CWC (Community Wellness Centre).

Appt clinic.png

Appointment Purpose

From the distribution analysis, we can see that Ophthalmologist appointments take the longest while Tests appointments take the shortest.

Appt Purpose.png

Escort

Escort.PNG

Comparing appointment durations for next-of-kin (NOK) accompaniment, family (more than 1 NOK) accompaniment, and medical escort accompaniment, we see a difference in median appointment durations. We conducted a non-parametric paired significance test and conclude that the median appointment duration is lower with a medical escort than with NOK or family accompaniment.

TestEscort.PNG
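
The source does not name the exact test used. As one plausible illustration of a non-parametric pairwise comparison, the sketch below implements a Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and no tie correction. The function name, group labels and durations are hypothetical, not taken from the project data.

```python
import math
from itertools import combinations

def rank_sum_test(a, b):
    """Return (U statistic for group a, two-sided p-value).

    Uses average ranks for ties and a normal approximation
    without a tie correction, so it is only a rough sketch.
    """
    combined = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided

# Hypothetical durations in minutes per escort type (NOT the project data).
groups = {
    "NOK": [70, 85, 90, 110, 130],
    "Family": [75, 95, 100, 120, 140],
    "Medical escort": [40, 50, 55, 60, 65],
}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    u, p = rank_sum_test(a, b)
    print(f"{name_a} vs {name_b}: U={u}, p={p:.4f}")
```

With three groups there are three pairwise comparisons, so in practice the p-values would usually be adjusted for multiple testing.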

Multivariate Regression

Stepwise Regression

As we have many independent variables and sub-variables, we perform a stepwise multiple linear regression to select the relevant variables. The results also reveal how the partition splits are done.

Boxcox.png

Based on the Box-Cox transformation, the y variable (appointment duration) is first transformed using the suggested lambda value of 0.263. It is then transformed again using a lambda of 0.986, after which the suggested lambda is 1.0, which signifies that no further transformation is needed. The Residual by Predicted plots are shown below.

Residualbef.png
Residualaft.png
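
For reference, the textbook Box-Cox transform can be sketched as below. This is the standard unscaled form; the software used for the analysis may apply a scaled variant, so the sketch illustrates the idea rather than reproducing the exact output. The function name and the example duration are illustrative only.

```python
import math

def box_cox(y: float, lam: float) -> float:
    """Standard Box-Cox transform of a single positive value y.

    lam == 0 gives the natural log; lam == 1 only shifts y by 1,
    which is why a suggested lambda of 1.0 means no further
    transformation is needed.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# Example: an 82-minute duration under the lambda values quoted in the text.
for lam in (0.263, 0.986, 1.0):
    print(lam, box_cox(82.0, lam))
```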

Durbin-Watson Autocorrelation Test

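The Durbin-Watson statistic is 2.028 (1,682 observations, autocorrelation -0.0170, Prob<DW = 0.6574). As the statistic is close to 2 and the p-value is high, there is no significant evidence that the model's residuals are autocorrelated.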
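
The Durbin-Watson statistic itself is simple to compute from the model residuals. The sketch below shows the usual definition together with the common approximation DW ≈ 2(1 - r), where r is the lag-1 autocorrelation; the residual lists are toy values, and only the autocorrelation of -0.0170 comes from the reported test output.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals: perfectly alternating signs give a value well above 2
# (negative autocorrelation); identical residuals give 0.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
print(durbin_watson([2, 2, 2]))       # 0.0

# Approximation from the reported lag-1 autocorrelation of -0.0170.
approx_dw = 2 * (1 - (-0.0170))
print(approx_dw)
```

The approximation gives roughly 2.03, in line with the reported statistic of 2.0283551; values near 2 indicate little or no first-order autocorrelation.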

Model Fit

Summary of Fit

Modelfit.PNG

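The Summary of Fit shows an R-square of 0.319 (0.318637), meaning that 31.9% of the variation in the transformed appointment duration is explained by the given parameters. The adjusted R-square is 0.311 (0.310849), which is only a little lower than the R-square, showing little evidence that the model is overfitted. The Root Mean Square Error is 2084.254 and the Mean of Response is 14506.98, based on 1,682 observations.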
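
The relationship between R-square and adjusted R-square can be checked with the usual formula. In the sketch below, the R-square (0.318637) and the number of observations (1,682) are the Summary of Fit values for this model, while the number of estimated parameters (19, excluding the intercept) is an assumption: it is not stated in the source and was chosen only because it roughly reproduces the reported adjusted value.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for n observations and p predictors (no intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

R2 = 0.318637    # reported RSquare
N_OBS = 1682     # reported number of observations
P_ASSUMED = 19   # ASSUMPTION: parameter count is not given in the source

print(adjusted_r2(R2, N_OBS, P_ASSUMED))  # close to the reported 0.310849
```

Because the penalty factor (n - 1) / (n - p - 1) stays close to 1 when n is much larger than p, the two values differ only slightly, which is the basis for the overfitting remark above.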

Lackoffit.PNG

This is further supported by the high p-value (0.437) in the Lack of Fit analysis. A high p-value means there is no significant evidence of lack of fit, that is, the model is not significantly worse than a saturated model built on the same variables. The maximum R-square attainable with these variables is 0.526, so improving the model further would usually require exploring combinations of variables.

Because the model has many categorical variables, the plotted points are very spread out vertically. Although the predictions are not very reliable, there is no obvious visible evidence against a linear regression model.