Final Progress

From Analytics Practicum

Revision as of 13:20, 15 April 2017



No.  Document
01   Final Conference Paper
02   Final Presentation Slides
03   Final Poster


An excerpt from the Final Conference Paper is shown below. For more information on the analysis, please download the paper or contact the team.



Abstract


The healthcare industry has long been concerned with gathering insights on patients and their no-show appointments. No-show appointments affect costs and clinic utilization. Each one also creates an opportunity cost: another patient could have used the slot to consult a doctor or an allied health professional. In this study, we aim to identify significant variables that affect no-show appointments in Hospital X. The data used for model building was provided by our sponsor, a medical consultant at Hospital X. Drawing on past literature, we will select and derive relevant variables for modelling. We will develop logistic regression and decision tree models to predict the probability of no-shows for Hospital X, using both patient information and individual clinic appointment attendance records. We will then compare the models and assess the results. Based on our findings, we will conclude the report with a set of implications and results for Hospital X.
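For reference, a binary logistic regression model of the kind mentioned above expresses the probability that appointment $i$ is a no-show ($Y_i = 1$) in terms of its predictor values $x_{i1}, \dots, x_{ik}$ (for example, lead time or distance to the clinic). This is the standard textbook form only; the predictors actually selected and the fitted coefficients $\beta_0, \dots, \beta_k$ are not given in this excerpt.

$$
P(Y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)\right)},
\qquad
\ln \frac{P(Y_i = 1 \mid \mathbf{x}_i)}{1 - P(Y_i = 1 \mid \mathbf{x}_i)} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
$$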

Introduction


With regard to mental health disorders in children, the number of cases has increased from 533 in 1980 to 3,051 in 2010. A medical study (Woo et al., 2007) found that one in eight children in Singapore has emotional disorders and one in 20 has behavioural disorders, yet only 10% ever see a psychiatrist. This underscores the importance of understanding no-show appointments. Appointments are made for a reason: when patients default on them, they miss the opportunity for a medical consultation and put their health at risk. A no-show appointment is defined as one that a patient does not attend, or cancels with such minimal lead time that the slot cannot be filled (Huang & Hanauer, 2014). No-show appointments disrupt efficient clinic operations, reduce provider productivity, decrease access to care and deprive other patients of the opportunity to see a medical professional in the missed slot.
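The Huang and Hanauer (2014) definition above can be turned into a simple classification rule. The sketch below is illustrative only: the `Appointment` fields and the 24-hour late-cancellation window are hypothetical assumptions, not values taken from Hospital X's data or from the cited study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# HYPOTHETICAL cut-off: a cancellation made less than this long before the
# appointment is assumed to be too late for the slot to be refilled.
LATE_CANCEL_WINDOW = timedelta(hours=24)


@dataclass
class Appointment:
    """Hypothetical appointment record used only for this illustration."""
    scheduled_at: datetime
    attended: bool
    cancelled_at: Optional[datetime] = None  # None if never cancelled


def is_no_show(appt: Appointment, window: timedelta = LATE_CANCEL_WINDOW) -> bool:
    """Return True if the appointment counts as a no-show under a
    Huang & Hanauer (2014)-style definition."""
    if appt.attended:
        return False
    if appt.cancelled_at is None:
        # Neither attended nor cancelled: a conventional no-show.
        return True
    # Cancelled: a no-show only if the cancellation came too late to refill the slot.
    return appt.scheduled_at - appt.cancelled_at < window


slot = datetime(2016, 3, 1, 9, 0)
print(is_no_show(Appointment(slot, attended=False)))  # True
print(is_no_show(Appointment(slot, attended=False, cancelled_at=slot - timedelta(hours=2))))  # True
print(is_no_show(Appointment(slot, attended=False, cancelled_at=slot - timedelta(days=3))))  # False
```

Setting `window=timedelta(0)` recovers the narrower definition used by most other studies, in which only appointments that were neither kept nor cancelled count as no-shows.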

Project Background


Hospital X is a pioneer tertiary hospital that provides a comprehensive range of medical and rehabilitative services for children, adolescents, adults and the elderly. Patients are usually either referred to Hospital X by other medical institutions or book an appointment directly. Patients can be categorised by whether their appointments are with a doctor, an allied health professional or both.

Figure 1: Flow Chart of Different Visit Types


A patient’s first appointment begins with a diagnosis by a doctor, and subsequent appointments are scheduled according to the patient’s mental health status. If a patient has had no appointment for a year, the next appointment must again begin with a diagnosis by a doctor and is treated as a first visit (FV). Our project sponsor is a medical consultant at Hospital X who specialises in treating patients aged 18 and below. He hopes to tap into the under-utilised administrative data that the hospital collects daily. According to our project sponsor, Hospital X experiences high no-show rates of about 21% for first visits and 19% for review visits. He is keen to improve access to care, as missed appointments lead to longer appointment lead times, idle time and an overall reduction in the quality of care. This paper explores the no-show patterns of patient appointments in Hospital X from 2015 to 2016.
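The one-year rule for first visits can be expressed as a labelling step over each patient’s appointment history. This is a minimal sketch under stated assumptions: “a year” is taken as 365 days measured from the previous appointment, and “RV” is a hypothetical label for review visits; the hospital’s actual business rule may differ.

```python
from datetime import date, timedelta
from typing import List

# ASSUMPTION: "a year" is 365 days, measured from the previous appointment.
REDIAGNOSIS_GAP = timedelta(days=365)


def label_visit_types(visit_dates: List[date]) -> List[str]:
    """Label one patient's appointments, in date order, as "FV" (first visit)
    or "RV" (review visit; a hypothetical label)."""
    labels: List[str] = []
    previous = None
    for current in sorted(visit_dates):
        if previous is None or current - previous > REDIAGNOSIS_GAP:
            # First appointment ever, or the patient has been away for over a
            # year and must be diagnosed by a doctor again.
            labels.append("FV")
        else:
            labels.append("RV")
        previous = current
    return labels


print(label_visit_types([date(2015, 1, 5), date(2015, 3, 2), date(2016, 6, 1)]))
# ['FV', 'RV', 'FV']
```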

Literature Review


Ma, Seemanta, Wu and Ng (2014) developed logistic regression and recursive partitioning models, using SAP records to predict patients’ no-show probabilities for each of three clinics. The study included external information such as financial debt and reminder responses as predictors of no-show probability. The results showed some variation in the main predictors of no-shows across the three clinics.

Alaeddini, Yang, Reddy and Yu (2011) developed a hybrid probabilistic model for no-show prediction that combines logistic regression, a population-based approach, with Bayesian inference, an individual-based approach. The model included the effects of appointment characteristics such as the number of previous appointments, appointment type and the lead time of the next scheduled appointment. The study also highlighted that other types of disruption, such as appointment cancellations and patient lateness, may affect the performance of the scheduling system.

Huang and Hanauer (2014) developed an evidence-based predictive model for no-show appointments, aimed at improving overbooking approaches in outpatient settings and reducing the negative impact of no-shows. Factors such as distance to the clinic, appointment characteristics, general demographic information and insurance information were considered. One variable unique to this study is the number of people in the patient’s household.

William, M.S.W and BCD (2001) provided explanations to deepen practitioners’ understanding and management of no-show appointments. The study showed that no-show behaviour is more common among patients with lower income, lower socioeconomic status and younger age. Patients with more serious psychological difficulties are particularly taxed by long waiting times.

Michael et al. (2016) described patterns of no-show variation by patient age, gender, appointment age and type of appointment request, using eight years’ worth of individual-level records. A multifactor analysis of variance (ANOVA) was performed to characterize no-show and attendance rates and the impact of certain patient factors. One finding was that the longer a patient has to wait for an appointment to be scheduled, the less likely the patient is to keep the first appointment.

A key distinction between our project and the studies reviewed is that our project’s appointments can be further broken down into consultations with a doctor or with an allied health professional. Ma, Seemanta, Wu and Ng (2014) is especially relevant, as that study was also conducted on outpatient clinics of a public hospital in Singapore. While most references shared the general consensus that no-show appointments are those that patients neither kept nor cancelled, Huang and Hanauer (2014) raised an interesting point: a cancelled appointment should be considered a no-show if it was cancelled with such minimal lead time that the slot cannot be filled. These findings give us a starting base for what is essential to the analysis and for building on what other studies have done. For example, the given dataset lacks some variables used in these studies, such as appointment age, so we can explore whether they can be derived from the data instead. Similarly, only Huang and Hanauer (2014) accounted for the distance between the outpatient clinics and the patients’ residences as a potential factor for no-show appointments; we can compute this variable and include it in our own analysis.

Data Cleaning & Preparation


The data had 77,205 records initially. The following diagram shows our team's general data cleaning procedures.

Diagram: General Data Cleaning Procedures


New Variables Derived
As mentioned earlier, the given data lacks some variables highlighted by other research studies, such as appointment age. Using Visit Date, we can compute the appointment lead time between a patient’s previous scheduled appointment and the next scheduled appointment. In addition, we derive Clinic Switch to study whether switching a patient’s appointments between the two clinics affects the no-show rate. There are 12,425 records of patients who have attended both clinics at least once.
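A minimal sketch of how the two derived variables could be computed from each patient’s appointment history. The `Visit` and `DerivedVisit` records are hypothetical. Lead time is taken as the number of days since the same patient’s previous appointment, and Clinic Switch is interpreted here as a per-appointment flag for a change of clinic from the previous appointment, which is one possible reading of the variable described above.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class Visit(NamedTuple):
    """Hypothetical appointment record."""
    patient_id: str
    visit_date: date
    clinic: str  # e.g. "A" or "B"


class DerivedVisit(NamedTuple):
    visit: Visit
    lead_time_days: Optional[int]  # None for a patient's first appointment
    clinic_switch: bool            # True if the clinic differs from the previous appointment


def derive_variables(visits: List[Visit]) -> List[DerivedVisit]:
    """Compute lead time and a Clinic Switch flag for every appointment."""
    result: List[DerivedVisit] = []
    previous: Dict[str, Visit] = {}  # patient_id -> that patient's previous appointment
    for v in sorted(visits, key=lambda x: (x.patient_id, x.visit_date)):
        prev = previous.get(v.patient_id)
        if prev is None:
            result.append(DerivedVisit(v, None, False))
        else:
            lead_time = (v.visit_date - prev.visit_date).days
            result.append(DerivedVisit(v, lead_time, v.clinic != prev.clinic))
        previous[v.patient_id] = v
    return result


visits = [
    Visit("p1", date(2015, 1, 5), "A"),
    Visit("p1", date(2015, 2, 4), "B"),
    Visit("p2", date(2015, 1, 10), "B"),
]
print([(d.lead_time_days, d.clinic_switch) for d in derive_variables(visits)])
# [(None, False), (30, True), (None, False)]
```

A patient-level version (whether a patient ever attended both clinics) could instead be obtained by checking whether the set of clinics in a patient’s history has more than one element.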

After data preparation, 63,511 records remained, about 82% of the original data.
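The retention figure follows directly from the record counts before and after cleaning:

$$
\frac{63{,}511}{77{,}205} \approx 0.823 \approx 82\%, \qquad 77{,}205 - 63{,}511 = 13{,}694 \text{ records removed}
$$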

Geospatial Data Preparation


With two clinics situated in different parts of Singapore, we realized that a geospatial analysis could yield further insights. It adds two factors to the analysis: the location of each clinic and the residence of each patient. Maps also make it easier to recognize patterns that would otherwise stay buried in rows and columns.
As the data only contains the postal districts and postal codes of the patients, we need to derive the longitude and latitude of each postal code. Two other issues arose: some patients had multiple postal codes because they had changed residence over time, and 2,503 records had an invalid postal district (denoted by 99).

Figure 2: Distribution of Postal Districts
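The two issues noted above, multiple postal codes per patient and the invalid district code 99, can be handled with a per-patient consolidation step. The sketch below is hypothetical: the selection rule (keep the most recent record with a valid district, falling back to the latest record if none is valid) is an assumption, not necessarily the rule the team applied, and the postal codes in the example are made up.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple

INVALID_DISTRICT = 99  # code used in the data for an invalid postal district


class AddressRecord(NamedTuple):
    """Hypothetical address record; one patient may have several."""
    patient_id: str
    record_date: date
    postal_code: str
    postal_district: int


def consolidate_addresses(records: List[AddressRecord]) -> Dict[str, Tuple[str, int]]:
    """Map each patient to a single (postal_code, postal_district) pair.

    ASSUMED rule: the most recent record with a valid district wins; if a
    patient has no valid district at all, their latest record is kept.
    """
    chosen: Dict[str, AddressRecord] = {}
    for rec in sorted(records, key=lambda r: r.record_date):
        current = chosen.get(rec.patient_id)
        if (
            current is None
            or rec.postal_district != INVALID_DISTRICT
            or current.postal_district == INVALID_DISTRICT
        ):
            chosen[rec.patient_id] = rec
    return {pid: (r.postal_code, r.postal_district) for pid, r in chosen.items()}


records = [
    AddressRecord("p1", date(2015, 1, 1), "530123", 19),
    AddressRecord("p1", date(2016, 1, 1), "000000", 99),  # later but invalid
    AddressRecord("p2", date(2015, 5, 1), "150123", 3),
    AddressRecord("p2", date(2016, 2, 1), "560123", 20),  # moved house
    AddressRecord("p3", date(2015, 7, 1), "000000", 99),  # no valid record
]
print(consolidate_addresses(records))
# {'p1': ('530123', 19), 'p2': ('560123', 20), 'p3': ('000000', 99)}
```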


By cross-referencing patient records, we reduced the number of records with an invalid postal district to 216. We also updated the records so that each patient has only one postal code and postal district. On the advice of our project supervisor, we used Tableau 10 to generate the longitude and latitude of each postal code. With these coordinates, we can also derive the distance from each patient’s residence to each clinic. To do so, we first convert the coordinates from the World Geodetic System (WGS 84) to the Singapore coordinate system (SVY21), and then compute the distances using the formula below.

Figure 3: Formulation of Distance from Clinic A or Clinic B
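Figure 3 is not reproduced in this text. Assuming it is the usual straight-line (Euclidean) distance on SVY21 easting and northing, $d = \sqrt{(x_p - x_c)^2 + (y_p - y_c)^2}$, which is reasonable because SVY21 is a projected Transverse Mercator grid measured in metres, the calculation reduces to the sketch below. The clinic coordinates are hypothetical placeholders, and the WGS 84 to SVY21 conversion itself is not shown; a projection library such as pyproj (transforming EPSG:4326 to EPSG:3414, “SVY21 / Singapore TM”) is one way to perform it.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (easting, northing) in SVY21 metres

# HYPOTHETICAL clinic coordinates; the actual clinic locations are not given here.
CLINICS: Dict[str, Point] = {
    "Clinic A": (26000.0, 30000.0),
    "Clinic B": (35000.0, 42000.0),
}


def svy21_distance_km(p: Point, q: Point) -> float:
    """Straight-line distance between two SVY21 points, in kilometres."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) / 1000.0


def distances_to_clinics(residence: Point) -> Dict[str, float]:
    """Distance from one patient's residence to every clinic."""
    return {name: svy21_distance_km(residence, xy) for name, xy in CLINICS.items()}


print(distances_to_clinics((29000.0, 34000.0)))  # {'Clinic A': 5.0, 'Clinic B': 10.0}
```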


Postal District Analysis

Figure 4: Postal District by Clinic Location


The postal district analysis shows that the bulk of the patients in the dataset reside in District 19, which covers the general area around Serangoon Garden, Hougang and Punggol. Clinic B draws a significant number of patients from District 19 owing to its location there. This pattern is clearer when the postal districts are grouped into the districts where the clinics are located, the immediately neighbouring districts and all other districts.

Figure 5: Distribution of Patients around Singapore


As seen in Figure 5, Clinic A is located in District 3 (highlighted in green), while Clinic B is located in District 19 (highlighted in purple). Districts 1, 2, 4, 5, 6, 9, 10, 13, 18, 20 and 28 (highlighted in blue) are the immediate neighbours of the clinics’ districts. The other districts are highlighted in red.
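The grouping used for the map can be written as a small lookup. This sketch uses only the district numbers listed above; the group names are descriptive labels chosen for illustration, and the patient districts in the example are hypothetical.

```python
from collections import Counter

# District groupings as listed in the paragraph above.
CLINIC_DISTRICTS = {3: "Clinic A district", 19: "Clinic B district"}
NEIGHBOUR_DISTRICTS = {1, 2, 4, 5, 6, 9, 10, 13, 18, 20, 28}


def district_group(district: int) -> str:
    """Assign a postal district to one of the map's colour groups."""
    if district in CLINIC_DISTRICTS:
        return CLINIC_DISTRICTS[district]  # green (Clinic A) or purple (Clinic B)
    if district in NEIGHBOUR_DISTRICTS:
        return "Immediate neighbour"       # blue
    return "Other"                          # red


# Hypothetical patient districts, for illustration only.
patient_districts = [19, 19, 3, 20, 25]
print(Counter(district_group(d) for d in patient_districts))
```

Records still carrying the invalid district code 99 would fall into “Other” here, so they may need to be filtered out before counting.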