Difference between revisions of "Come back after 30 days!/Findings"
(5 intermediate revisions by the same user not shown) | |||
Line 24: | Line 24: | ||
<div style="background: #fdf5e6; padding: 13px; font-weight: bold; text-align: left; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"><font color= #3d3d3d>Summary of findings by week</font></div> | <div style="background: #fdf5e6; padding: 13px; font-weight: bold; text-align: left; line-height: wrap_content; text-indent: 20px;font-size:20px; font-family:helvetica"><font color= #3d3d3d>Summary of findings by week</font></div> | ||
+ | |||
+ | This is a summary of the key pointers and major decisions the team has taken in this data mining project. Visit our documentation page for slides and more information. | ||
{| class="wikitable" style="font-size:110%; margin: 1em auto 1em auto; text-align: center" | {| class="wikitable" style="font-size:110%; margin: 1em auto 1em auto; text-align: center" | ||
Line 57: | Line 59: | ||
Week 5 | Week 5 | ||
|align="left"| | |align="left"| | ||
− | *Exploratory data analysis: Considered ways to handle the 2 variables that have missing values. We proposed regression imputation for the variable with 40% of its values missing, but were cautioned that it may introduce errors. | + | *Exploratory data analysis: Considered ways to handle the 2 variables that have missing values. We proposed regression imputation for the variable with 40% of its values missing, but were cautioned that it may introduce errors. Literature was obtained mostly from [http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html David Howell's page]. |
+ | |||
*Consultation with Prof. Kam: | *Consultation with Prof. Kam: | ||
**For the variable that has 40% missing values, we were advised to take a two-pronged approach: build one model without the 40% of records that lack the value and another model without the variable entirely. Comparing the predictive power of the two models lets us judge whether the variable should be considered a predictor. We were also advised that this judgment can be presented as an eventual recommendation. | **For the variable that has 40% missing values, we were advised to take a two-pronged approach: build one model without the 40% of records that lack the value and another model without the variable entirely. Comparing the predictive power of the two models lets us judge whether the variable should be considered a predictor. We were also advised that this judgment can be presented as an eventual recommendation. | ||
Line 64: | Line 67: | ||
|- | |- | ||
+ | |- | ||
+ | |align="left"| | ||
+ | Week 6 | ||
+ | |align="left"| | ||
+ | *Exploratory data analysis: Informed of the need to test for statistical independence. There are 2 areas of concern: | ||
+ | **Patients who have more than 2 encounters at the hospital: Patients in this category skew the number of readmissions within 30 days upwards, because their earlier admissions are already marked "Readmitted". Using Pearson's chi-square test, the reported p-value is 0 (X^2 = 184035, degrees of freedom = 1), so the null hypothesis that being admitted more than once is statistically independent of the Readmitted outcome should be rejected. Therefore, we use only 1 encounter per patient, whose Readmitted variable indicates whether the patient was subsequently readmitted. | ||
+ | **14 variables deemed uninformative are removed from model construction because their distributions are almost entirely one class (e.g. acetohexamide has 99.9% of values of "No"). | ||
+ | |- | ||
|+ | |+ |
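The Week 6 independence check can be illustrated with a minimal sketch of Pearson's chi-square test on a 2x2 table. The function name `chi_square_2x2` and the counts below are hypothetical and are not the project's data or code; the sketch uses only the standard library and applies no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts. Returns
    (statistic, p_value) for 1 degree of freedom, with no
    continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(X^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (assumption, NOT the project's data).
# Rows: single encounter / multiple encounters.
# Columns: not readmitted / readmitted.
observed = [[900, 100], [400, 600]]
stat, p = chi_square_2x2(observed)
print(f"X^2 = {stat:.2f}, p = {p:.3g}")
```

For very large statistics the p-value underflows in double precision, which is one plausible reason a tool would report it as exactly 0.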
Latest revision as of 14:27, 23 February 2015
Summary of findings by week
This is a summary of the key pointers and major decisions the team has taken in this data mining project. Visit our documentation page for slides and more information.
Week 2 |
|
Week 3 |
|
Week 4 |
|
Week 5 |
|
Week 6 |
|
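The two Week 6 data-preparation decisions (one encounter per patient, and dropping variables that are almost entirely one class) can be sketched as follows. This is an illustration under stated assumptions, not the team's code: the field names `patient_id`, `insulin` and `readmitted`, the helper names, and the sample rows are hypothetical, and keeping the first encounter in input order is an assumption, since the page does not say which encounter was retained.

```python
from collections import Counter


def first_encounter_per_patient(encounters, patient_key="patient_id"):
    """Keep one encounter per patient: the first seen in input order.

    Assumption: which encounter the team kept is not stated on the page.
    """
    seen = set()
    kept = []
    for row in encounters:
        pid = row[patient_key]
        if pid not in seen:
            seen.add(pid)
            kept.append(row)
    return kept


def near_constant_columns(rows, columns, threshold=0.999):
    """Return columns whose most common value covers >= threshold of rows."""
    if not rows:
        return []
    flagged = []
    for col in columns:
        counts = Counter(row[col] for row in rows)
        top_share = counts.most_common(1)[0][1] / len(rows)
        if top_share >= threshold:
            flagged.append(col)
    return flagged


# Hypothetical sample rows (assumption, NOT the project's data).
encounters = [
    {"patient_id": 1, "acetohexamide": "No", "insulin": "Up", "readmitted": "<30"},
    {"patient_id": 1, "acetohexamide": "No", "insulin": "Up", "readmitted": "NO"},
    {"patient_id": 2, "acetohexamide": "No", "insulin": "No", "readmitted": "NO"},
    {"patient_id": 3, "acetohexamide": "No", "insulin": "Steady", "readmitted": ">30"},
    {"patient_id": 3, "acetohexamide": "No", "insulin": "Down", "readmitted": "NO"},
    {"patient_id": 4, "acetohexamide": "No", "insulin": "No", "readmitted": "NO"},
]

unique = first_encounter_per_patient(encounters)
to_drop = near_constant_columns(unique, ["acetohexamide", "insulin"])
print(len(unique), to_drop)
```

The 0.999 default mirrors the 99.9% figure quoted for acetohexamide; a different cut-off may suit other variables.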