Difference between revisions of "ANLY482 AY2017-18T2 Group18/TeamDAcct Project Data"

 
|}
 
|}
  
 
 
  
Beyond the general challenges listed on this page, the remarks on the following pages elaborate on specific parts of the data preparation process, such as the treatment of missing values, the treatment of duplicates and the exclusion of projects.
 
  
  
 
</div><br>
 
</div><br>

Latest revision as of 22:06, 13 April 2018


 

Data Provided

The data given to us was extracted from the company’s Enterprise Resource Planning (ERP) system. Because the ERP system was changed in June 2016, the data in the previous system is either incomplete or irretrievable. Hence, we decided to use data that primarily ranges from June 2016 to December 2017. We are not authorized to access the ERP system, and LS 2 supplies data sets only when we request them. As a result, document collection took a considerable amount of time, and we performed data cleaning, transformation and integration whenever we received more data.

The documents we received were in different formats, and some of them were hard copies. We have categorized each file according to its nature and the kind of information it contains. A summary table of the data sets we collected can be seen below.

Data grp18 1.PNG
Data grp18 2.PNG


Data Cleaning and Transformation

After examining, exploring and understanding each data file supplied by LS 2, we performed data cleaning, transformation and integration to ensure the data was suitable for our analysis and model building. The points below outline the general flow of our data preparation process.

DACCTdatacleaning.PNG

The main challenges faced are as follows:

1) Absence of Integrated Data Management System: The main challenge came from the absence of an integrated data management system. LS 2 uses an ERP system, but many of its documents are maintained outside that system. Often these documents have no proper connection to one another or to the documents extracted from the ERP system. These disjointed data sets made it necessary to create a master list of projects, each assigned a unique serial number.

2) Data Maintained in Non-tabular / Non-Excel Format: Some of the data was maintained in .doc files and hard copy, which made it difficult for us to create a data set. As the data was not in tabular or Excel format, we had to manually key in a considerable amount of it. Transforming text information into a quantifiable table format was therefore a rather time-consuming process.

3) Incomplete Data Set: While creating the master list of projects, we found projects that we were unable to cross-reference across the different data sources because they had no matching reference instances. These projects therefore have incomplete data, which decreased the number of projects from which we could get insights.
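
The master list and cross-referencing steps described in challenges 1 and 3 can be illustrated with a minimal Python sketch. This is not the team's actual procedure: the source names (`erp_extract`, `word_documents`, `hard_copy`), the project reference strings, and the normalization rule (trimming whitespace and upper-casing) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def normalize(ref: str) -> str:
    # Assumed normalization: trim whitespace and ignore letter case.
    return ref.strip().upper()


def build_master_list(sources: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign a unique serial number to every distinct project reference."""
    master: Dict[str, int] = {}
    for refs in sources.values():
        for ref in refs:
            key = normalize(ref)
            if key not in master:
                master[key] = len(master) + 1  # serial numbers start at 1
    return master


def find_incomplete(
    sources: Dict[str, List[str]], master: Dict[str, int]
) -> Dict[int, List[str]]:
    """Map each serial number to the sources in which the project is missing."""
    normalized = {
        name: {normalize(ref) for ref in refs} for name, refs in sources.items()
    }
    incomplete: Dict[int, List[str]] = {}
    for key, serial in master.items():
        missing = [name for name, refs in normalized.items() if key not in refs]
        if missing:
            incomplete[serial] = missing
    return incomplete


# Hypothetical data: three sources that refer to overlapping sets of projects.
sources = {
    "erp_extract": ["P-001", "P-002", "P-003"],
    "word_documents": ["p-001", "P-003 "],
    "hard_copy": ["P-001", "P-002"],
}

master = build_master_list(sources)
incomplete = find_incomplete(sources, master)

for serial, missing in incomplete.items():
    print(f"Project {serial} has no matching record in: {', '.join(missing)}")
```

In this sketch, a project that cannot be matched in every source is flagged rather than dropped, so the decision to exclude it from analysis stays explicit.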