Difference between revisions of "IISSS608 2017-18 T3 Assign Vigneshwar Ramachandran Vadivel- Data Cleaning"

From Visual Analytics and Applications
Revision as of 10:44, 3 July 2018


Communication Data

The first part of our analysis explores the communication patterns of the company. To analyse them, we have to combine the four Kasios communication datasets into a single dataset that gives a holistic view. Using Tableau Prep, all four datasets can be merged and transformed into the format we need.
The dates in the data are given as Unix-epoch-style timestamps that begin on May 11, 2015 at 14:00. To transform these into regular timestamp values, we can derive the following calculated field:
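
As a rough illustration of this conversion outside Tableau Prep, here is a minimal Python sketch. It assumes the raw value is an offset in seconds from the stated start time; the unit is an assumption, and `DATA_START` and `to_timestamp` are hypothetical names, not part of the original flow.

```python
from datetime import datetime, timedelta

# Start of the data as stated in the text: May 11, 2015 at 14:00.
DATA_START = datetime(2015, 5, 11, 14, 0, 0)


def to_timestamp(offset_seconds: int) -> datetime:
    """Convert a raw offset (assumed to be in seconds) into a calendar timestamp."""
    return DATA_START + timedelta(seconds=offset_seconds)


print(to_timestamp(0))      # 2015-05-11 14:00:00
print(to_timestamp(86400))  # 2015-05-12 14:00:00
```

If the raw values turn out to use a different unit, only the `timedelta` argument needs to change.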

Network Data

Once the data is transformed, we can explore the communication patterns and growth of the company with this merged dataset.

For the later part of our analysis, the data has to be transformed so that it is readable by Gephi for basic network visualisations. Using the insider Kasios suspicious data, I
The following flow shows the data processing involved, using Tableau Prep:

A1 MC3 Data Flow.png

Aggregated Communication Data

To create the edges csv file:

  1. Concatenate the communication data for all the different transaction types into one source of information.
  2. Change the data type of the "From" and "To" columns from continuous to nominal.
  3. Rename the columns "From" and "To" to "Source" and "Target".
  4. Create a column "Type" and fill every row with "Directed" - this specifies the direction of the communication.
  5. Create a new column with the formula "Source || Target", which concatenates the Source and Target columns.
  6. Use the tabulate function to count the number of times each concatenated value appears in the column - this gives the weight of each unique direction of communication.
  7. Update the table with these weights and name the column "Weight".
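
The edge steps above can be sketched in Python as follows. This is only an illustration, not the Tableau Prep flow itself: `build_edges`, `edges_to_csv` and the sample records are hypothetical, and the sketch counts (Source, Target) tuples directly instead of building a concatenated "Source || Target" string, which gives the same weights.

```python
import csv
import io
from collections import Counter


def build_edges(records):
    """Build weighted, directed edges from (from_id, to_id) pairs."""
    # Steps 2-3: treat the ids as nominal values (strings) named Source/Target.
    pairs = [(str(src), str(dst)) for src, dst in records]
    # Steps 5-6: count how often each Source/Target combination occurs.
    weights = Counter(pairs)
    # Steps 4 and 7: mark every edge as "Directed" and attach its weight.
    return [
        {"Source": s, "Target": t, "Type": "Directed", "Weight": w}
        for (s, t), w in weights.items()
    ]


def edges_to_csv(edges):
    """Serialise the edges as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Source", "Target", "Type", "Weight"])
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


# Made-up sample records standing in for the concatenated communication data.
sample = [(1, 2), (1, 2), (2, 1), (3, 1)]
edges = build_edges(sample)
print(edges_to_csv(edges))
```

Because the pair (1, 2) and the pair (2, 1) are counted separately, the direction of each communication is preserved in the weights.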

To create the nodes csv file:

  1. Merge the company index with the combined file and identify the unique nodes present in the edges csv file.
  2. Name the column "id", so that Gephi can detect the ids of the nodes.
  3. Add the name of the employee involved in the transaction as the label.
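
The node steps above might look like this in Python. This is a minimal sketch under stated assumptions: the company index is modelled as a plain dictionary from id to employee name, and `build_nodes`, the sample edges and the sample names are all hypothetical.

```python
def build_nodes(edges, company_index):
    """List the unique node ids found in the edges, labelled with employee names."""
    # Step 1: collect every id that appears as a Source or a Target.
    ids = sorted({e["Source"] for e in edges} | {e["Target"] for e in edges})
    # Steps 2-3: expose the id under "id" and the employee name under "label".
    # Ids missing from the company index get an empty label rather than failing.
    return [{"id": node_id, "label": company_index.get(node_id, "")} for node_id in ids]


# Made-up sample data standing in for the edges file and the company index.
sample_edges = [
    {"Source": "1", "Target": "2", "Type": "Directed", "Weight": 2},
    {"Source": "3", "Target": "1", "Type": "Directed", "Weight": 1},
]
sample_index = {"1": "Employee A", "2": "Employee B"}

for node in build_nodes(sample_edges, sample_index):
    print(node)
```

Restricting the nodes to ids that occur in the edges keeps the nodes file consistent with the edges file, so no isolated employees from the company index are imported.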