Difference between revisions of "ZAN Project Findings"
Revision as of 12:27, 15 April 2017
Mid-Term Progress
Data Cleaning
The data had 77,205 records initially. The following diagram shows our team's general data cleaning procedures.
After data cleaning, 63,511 records remain; 13,694 records (about 17.7%) were removed.
Data Exploration
Because the data is sensitive and confidential, the exploration results are not published on this page. Please refer to the elearn dropbox or send us an email.
Data Modelling
Due to the nature of the data, our team has decided to prepare three separate analytical sandboxes for the models.
- Sandbox 1 (Per episode): The dependent variable has only 2 levels, so we will run a logistic regression and a decision tree.
- Sandbox 2 (Per episode): The dependent variable has 3 levels, so we will run a multinomial logistic regression and a decision tree.
- Sandbox 3 (Per patient): The dependent variable has 3 levels, so we will run a multinomial logistic regression and a decision tree.
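
The sandbox plan above can be summarised in a short Python sketch. It is only an illustration, not our actual pipeline: the `Sandbox` class, the `model_families` and `group_by_patient` helpers, and the `patient_id` field are hypothetical names introduced here, and how episodes are combined into one row per patient is not specified on this page, so the sketch only groups them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sandbox:
    """Hypothetical description of one analytical sandbox."""
    name: str
    unit: str            # unit of analysis: "episode" or "patient"
    outcome_levels: int  # number of levels of the dependent variable


def model_families(sandbox):
    """Choose the regression family from the number of outcome levels.

    Two levels call for (binary) logistic regression; three or more call
    for multinomial logistic regression. A decision tree is run either way.
    """
    if sandbox.outcome_levels < 2:
        raise ValueError("the dependent variable needs at least 2 levels")
    if sandbox.outcome_levels == 2:
        regression = "logistic regression"
    else:
        regression = "multinomial logistic regression"
    return [regression, "decision tree"]


# The three sandboxes as listed above.
SANDBOXES = [
    Sandbox("Sandbox 1", unit="episode", outcome_levels=2),
    Sandbox("Sandbox 2", unit="episode", outcome_levels=3),
    Sandbox("Sandbox 3", unit="patient", outcome_levels=3),
]


def group_by_patient(episodes):
    """Group per-episode rows by an assumed 'patient_id' key.

    This is only the first step towards a per-patient table; the
    aggregation applied to each group is left open.
    """
    grouped = defaultdict(list)
    for row in episodes:
        grouped[row["patient_id"]].append(row)
    return dict(grouped)


if __name__ == "__main__":
    for sandbox in SANDBOXES:
        models = ", ".join(model_families(sandbox))
        print(f"{sandbox.name} (per {sandbox.unit}): {models}")
```

Keeping the unit of analysis and the number of outcome levels as explicit fields makes it easy to check that each sandbox is paired with a suitable model family before any model is fitted.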