Difference between revisions of "ANLY482 AY2017-18T2 Group14 Interim"

From Analytics Practicum

Revision as of 20:22, 25 February 2018


Understanding Data

To understand the structure of the OPMS and ODD data sets, we obtained the OPMS and ODD field definition files from our client. However, we found several issues with the definition files provided:
 1. certain variables are not defined;
 2. some variables are named inconsistently;
 3. no data field has an indicated data type.

As such, we decided to load sample data of OPMS and ODD into SAS Enterprise Guide to help us understand the metadata.

We ran a column summary (e.g. a one-way frequency) for each field and recorded its data format. We then consolidated this information into a data definition file for each data set.
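
The column summary described above was done in SAS Enterprise Guide. As an illustration only, the same idea can be sketched in Python: count the values in each field (a one-way frequency) and guess a data type from them. The sample columns `record_id`, `status` and `amount` are hypothetical and are not taken from the OPMS or ODD data.

```python
import csv
import io
from collections import Counter


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return "empty"
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def profile_columns(csv_text):
    """Return {column: {"type": ..., "frequency": Counter}} for a CSV sample."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    profile = {}
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows]
        profile[column] = {
            "type": infer_type(values),
            "frequency": Counter(values),  # one-way frequency table
        }
    return profile


# Hypothetical sample; the real OPMS/ODD fields are not shown in this report.
SAMPLE = "record_id,status,amount\n1,OPEN,10.5\n2,CLOSED,3\n3,OPEN,\n"

profile = profile_columns(SAMPLE)
for column, info in profile.items():
    print(column, info["type"], dict(info["frequency"]))
```

Each entry of `profile` holds the pieces that would go into one row of a data definition file: the field name, a guessed type, and the observed values.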


Data Preparation


Unify ODD Data Format

The provided ODD data comes in two separate formats.

Q1-Empty Reading Fig 1.1.png
Fig. 1-1
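
One possible way to unify two layouts is to map each layout's column names onto a single common schema. The sketch below shows the idea in Python; the column names (`Date`, `Reading`, `ReadingDate`, `Value`) and the target schema are assumptions made for illustration, since the actual ODD columns are not listed here.

```python
# Hypothetical header mappings: the actual ODD column names are not given here.
FORMAT_A = {"Date": "reading_date", "Reading": "reading"}
FORMAT_B = {"ReadingDate": "reading_date", "Value": "reading"}
MAPPINGS = (FORMAT_A, FORMAT_B)


def unify_record(record):
    """Rename a record's keys to the common schema, whichever layout it uses."""
    for mapping in MAPPINGS:
        # A layout matches when all of its source columns are present.
        if set(mapping) <= set(record):
            return {target: record[source] for source, target in mapping.items()}
    raise ValueError(f"Unrecognised layout: {sorted(record)}")


records = [
    {"Date": "2016-01-01", "Reading": "12"},          # first layout
    {"ReadingDate": "2017-01-01", "Value": "15"},     # second layout
]
unified = [unify_record(r) for r in records]
print(unified)
```

Records in an unknown layout raise an error instead of passing through silently, so a third format would be noticed early.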




Data Exploration



Challenges


1. Unfamiliarity with MSSQL and Power BI: We had no prior experience with these tools before this project, so at the beginning we invested considerable time in learning and familiarizing ourselves with them.

2. Lack of domain knowledge: Domain knowledge is essential to understanding the dataset. Because the data definitions provided were incomplete, we spent a lot of time working out the meaning of the data and consolidating and documenting the data dictionary.

3. Communicating with users from non-IT backgrounds: We found it challenging to communicate with users who have a limited IT background. Our project sponsor is from operations management, so when we explain technical complexity to the sponsor, we need to put it in simple, plain words.

4. Data inconsistency (inconsistent data types, columns and values): The data collected from the project sponsor is stored in different places and in different formats. In addition, the variable types and values are highly inconsistent.


Limitations


1. The existing Excel report format cannot be replicated for the ODD 2016 data, because some columns used to produce the report, such as BU/AH and hasRDcp, are missing in 2016.

2. The maximum data processing capacity of Power BI is 10 GB, while the two-year data is about 28 GB, well above this limit. The data therefore has to be broken down into smaller pieces to be processed in Power BI.
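
As a rough illustration of the second limitation, the Python sketch below works out how many pieces the data needs. It uses the 10 GB and 28 GB figures stated above. Splitting by month and spreading the 28 GB evenly over 24 months are assumptions made for this example, not the team's actual partitioning.

```python
import math

POWER_BI_LIMIT_GB = 10   # capacity stated above
TOTAL_SIZE_GB = 28       # approximate size of the two-year data
MONTHS = 24              # two years of data

# Lower bound on the number of pieces, regardless of how the data is split.
min_pieces = math.ceil(TOTAL_SIZE_GB / POWER_BI_LIMIT_GB)


def plan_batches(partition_sizes_gb, limit_gb):
    """Greedily group consecutive partitions so each batch stays within limit_gb."""
    batches, current, current_size = [], [], 0.0
    for index, size in enumerate(partition_sizes_gb):
        if size > limit_gb:
            raise ValueError(f"Partition {index} alone exceeds the limit")
        if current and current_size + size > limit_gb:
            batches.append(current)
            current, current_size = [], 0.0
        current.append(index)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Assumption: the 28 GB is spread evenly across the 24 months.
monthly_sizes = [TOTAL_SIZE_GB / MONTHS] * MONTHS
batches = plan_batches(monthly_sizes, POWER_BI_LIMIT_GB)
print(min_pieces, [len(b) for b in batches])
```

Under the even-spread assumption this gives three batches of eight months each, which matches the lower bound of three pieces. With real, uneven monthly sizes the same function could return more batches.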

Next Phase


In the next sprint, we will continue working on the Excel reports output0 to output3. As requested by the project sponsor, we will keep the original format for the management team while polishing the original report to make it more interactive. We aim to finish this by 15 Mar 2018.

We will also start working on insight discovery. TBC