Difference between revisions of "AY1516 T2 Team13 Natasha Studio Project Overview Methodology"

From Analytics Practicum

Revision as of 23:25, 9 January 2016



SCOPE OF WORK

Coming soon


METHODOLOGY

Tools used
The main tools will be Microsoft Excel and R. Microsoft Excel will be used for data preparation and cleansing, as it is the tool our sponsor prefers. R will be used mainly to build and evaluate the models. Because R is open source, our client will also be able to use it on future data.

Data Extraction
At present, data from July 2010 to August 2012 is available to us in Microsoft Excel. In Fall 2012, the business owner decided to stop using the system, and purchases and attendance were recorded on paper from then on. As a result, part of the data assessment process will consist of entering these paper records into the spreadsheet so that there is more data to work with.

Data Preparation
In addition to the data that is missing from 2013-2015, the current data set will require significant cleansing effort because of the inconsistencies, missing data fields and duplicates that have resulted from bad practices throughout the 3 years. We foresee that a large amount of time will be spent on this process.
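The three problems named above (inconsistent entries, missing fields and duplicates) can be illustrated with a minimal sketch. It is written in Python purely for illustration; the project itself plans to do this work in Microsoft Excel. The field names `member_id`, `name` and `purchase_date` and the sample rows are hypothetical and are not taken from the sponsor's data.

```python
def normalize(record):
    """Trim whitespace and lower-case every string value so that
    inconsistent spellings of the same entry compare equal."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def clean(records, required_fields):
    """Return (complete, incomplete) lists of de-duplicated records.

    Duplicates are detected after normalization. A record counts as
    incomplete if any required field is absent, None, or empty.
    """
    seen = set()
    complete, incomplete = [], []
    for record in map(normalize, records):
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue  # same content as an earlier row, so skip it
        seen.add(fingerprint)
        if any(record.get(field) in (None, "") for field in required_fields):
            incomplete.append(record)
        else:
            complete.append(record)
    return complete, incomplete


# Made-up rows (hypothetical, for illustration only).
rows = [
    {"member_id": "M001", "name": " Alice Tan ", "purchase_date": "2011-03-05"},
    {"member_id": "m001", "name": "alice tan", "purchase_date": "2011-03-05"},
    {"member_id": "M002", "name": "Bob Lee", "purchase_date": ""},
]
complete, incomplete = clean(rows, ["member_id", "name", "purchase_date"])
```

Here the second row collapses into the first once spacing and letter case are normalized, and the third row is set aside because its purchase date is empty. Real records would likely need further rules (for example, consistent date formats), which this sketch does not attempt.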

Data Validity
The data presented to us contains a total of 1717 members and 3044 purchases. These numbers are not final, as data cleansing has yet to be done at this stage of the project. However, based on visual observation of the data, we are confident that the final figures will not deviate far from the numbers reported above.

In order to perform a substantial analysis on the data, a rough estimate of 5000 data points is required for the proposed techniques shown below. The current data set is not sufficient, as it does not meet the required sample size. However, the business owner has informed us that he has roughly 5300 members. As such, we are confident that after the data assessment process, we will have sufficient data to work on.
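The sample-size reasoning above can be written out as a small check. This is a sketch in Python rather than the project's own R or Excel. The text does not say whether a "data point" is a member or a purchase, so each reported count is compared against the threshold separately; that choice is an assumption.

```python
REQUIRED_DATA_POINTS = 5000  # rough estimate stated in the text


def shortfall(available, required=REQUIRED_DATA_POINTS):
    """Number of additional data points needed to reach the threshold
    (0 if the threshold is already met)."""
    return max(0, required - available)


# Counts reported in the text; the labels are ours.
counts = {
    "members in current data set": 1717,
    "purchases in current data set": 3044,
    "members per owner's estimate": 5300,
}

for label, available in counts.items():
    print(f"{label}: {available} available, short by {shortfall(available)}")
```

Neither count in the current data set reaches 5000 on its own, while the owner's estimate of roughly 5300 members would clear the threshold if those records can be recovered.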