Difference between revisions of "ZAN Project Findings"

Revision as of 11:23, 22 February 2017

Data Cleaning

The data had 77,205 records initially. The following diagram shows our team's general data cleaning procedures.

After data cleaning, 63,511 records remain.
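As a quick check on the two record counts stated above, the short sketch below works out how many records the cleaning removed and what share of the initial data that is. The variable names are ours, chosen for illustration; only the two counts come from this page.

```python
# Record counts as stated on this page.
initial_records = 77_205
cleaned_records = 63_511

# Records dropped during cleaning, and their share of the initial data.
removed = initial_records - cleaned_records
removed_share = removed / initial_records

print(f"{removed:,} records removed ({removed_share:.1%})")
# prints: 13,694 records removed (17.7%)
```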

Data Exploration

Because the data is sensitive and confidential, the exploration results are not published here. Please refer to the elearn dropbox or send us an email.

Data Modelling

Due to the nature of the data, our team has decided to prepare three separate analytical sandboxes for the models.

Sandbox 1 (Per episode): The dependent variable has only 2 levels. Thus, we will run logistic regression and a decision tree.
Sandbox 2 (Per episode): The dependent variable has 3 levels. Thus, we will run multinomial logistic regression and a decision tree.
Sandbox 3 (Per patient): The dependent variable has 3 levels. Thus, we will run multinomial logistic regression and a decision tree.
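
The rule behind the three sandboxes (2 levels call for logistic regression, 3 levels for multinomial logistic regression, with a decision tree in every case) can be sketched as a small helper. This is a minimal illustration, not the team's actual pipeline: the function `plan_models` and the outcome labels are hypothetical, since the real variables are confidential.

```python
def plan_models(outcomes):
    """Suggest model families from the number of levels in the dependent variable."""
    levels = sorted(set(outcomes))
    if len(levels) < 2:
        raise ValueError("the dependent variable needs at least 2 levels")
    # Two levels: binary logistic regression; more than two: multinomial.
    regression = (
        "logistic regression"
        if len(levels) == 2
        else "multinomial logistic regression"
    )
    return {"levels": levels, "models": [regression, "decision tree"]}


# Hypothetical outcome labels, invented only to exercise the rule.
sandboxes = {
    "Sandbox 1 (per episode)": ["yes", "no", "no", "yes"],
    "Sandbox 2 (per episode)": ["low", "medium", "high", "low"],
    "Sandbox 3 (per patient)": ["low", "medium", "high", "high"],
}

plans = {name: plan_models(y) for name, y in sandboxes.items()}
for name, plan in plans.items():
    print(f"{name}: {len(plan['levels'])} levels -> {', '.join(plan['models'])}")
```

Keeping the choice in one function means a sandbox whose dependent variable is later recoded (for example, from 3 levels to 2) picks up the matching regression family without further changes.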

@@ Line 48: / Line 48: @@
 </center>
 <br/>
-After the the data cleaning, the data now has 63,511 records
+After the the data cleaning, the data now has 63,511 records.
 <br/>
 <div align="left">
 <div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #2E8B57 solid 32px;"><font color="##4682B4">Data Exploration</font></div>
+Due to the sensitivity and confidentiality of the data, please refer to the elearn dropbox or send us an email.
 <br/>
-Due to the sensitivity and confidentiality of the data, please refer to the elearn dropbox or send us an email.
 <div align="left">
 <div style="background: #F5FFFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #2E8B57 solid 32px;"><font color="##4682B4">Data Modelling</font></div>
+Due to the nature of the data, our team has decided to prepare 3 separate analytical sandboxes for the models.
+<br/>
+# Sandbox 1 (Per episode): The dependant variable has only 2 levels. Thus, we will run logistic regression and decision tree.
+# Sandbox 2 (Per episode): The dependant variable has 3 levels. Thus, we will run multinomial logistic regression and decision tree.
+# Sandbox 3 (Per patient): The dependant variable has 3 levels. Thus, we will run multinomial logistic regression and decision tree.
 <br/>
-Due to the nature of the data, our team has decided to prepare 3 separate analytical sandboxes for the models.

