Difference between revisions of "IISSS608 2017-18 T3 Assign Vigneshwar Ramachandran Vadivel- Data Cleaning"

From Visual Analytics and Applications
Jump to navigation Jump to search
 
(2 intermediate revisions by the same user not shown)
Line 23: Line 23:
 
<center>
 
<center>
 
{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px;" width="80%"|
 
{| style="background-color:#ffffff ; margin: 3px 10px 3px 10px;" width="80%"|
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border:solid 1px #f5f5f5; background-color: #f5f5f5" width="150px" | [[ISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel_Approach|<font color="#3c3c3c"><strong>Main Page</strong></font>]]
+
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #f5f5f5; background-color: #f5f5f5" width="150px" |  
 +
[[ISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel_Approach|<font color="#3c3c3c"><strong>About Dataset</strong></font>]]
  
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border-top:solid #f5f5f5; background-color: #fff" width="150px" |  
+
| style="font-family:Open Sans, Arial, sans-serif; font-size:15px; text-align: center; border:solid 1px #f5f5f5; background-color: #fff" width="150px" |   
   
+
[[IISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel-_Data_Cleaning|<font color="#3c3c3c"><strong>Data Cleaning</strong></font>]]
[[ISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel-_Data_Cleaning|<font color="#3c3c3c"><strong>Data Cleaning</strong></font>]]
 
  
| }
+
|}
 
</center>
 
</center>
 +
 
<!------- End of Secondary Navigation Bar---->
 
<!------- End of Secondary Navigation Bar---->
  
  
=Communication Data=
+
=<div  style="font-family: Calibri;">Communication Data=
Initial part of our analysis requires us to explore the communication pattern of the company. In order to analyse the pattern, we have to combine the four-communication dataset of Kasios into one single data to have the holistic view.
+
<div  style="font-family: Calibri;">Initial part of our analysis requires us to explore the communication pattern of the company. In order to analyse the pattern, we have to combine the four-communication dataset of Kasios into one single data to have the holistic view.
 
Using Tableau prep all the four data can be merged and transformed into our necessary format.<br>
 
Using Tableau prep all the four data can be merged and transformed into our necessary format.<br>
 
The date format of the data has been given in Unix epoch timestamp beginning from May 11, 2015 at 14:00. To transform this into our timestamp value we can derive the following calculated field,
 
The date format of the data has been given in Unix epoch timestamp beginning from May 11, 2015 at 14:00. To transform this into our timestamp value we can derive the following calculated field,
 
[[File:A1_MC3_Timestamp.png|400px|centre|frameless|link=ISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel
 
[[File:A1_MC3_Timestamp.png|400px|centre|frameless|link=ISSS608_2017-18_T3_Assign_Vigneshwar Ramachandran Vadivel
 
]]  
 
]]  
=Network Data=
+
=<div  style="font-family: Calibri;">Network Data=
Once we transform our data we can explore the communication pattern and growth of the company with this merged dataset.
+
<div  style="font-family: Calibri;">Once we transform our data we can explore the communication pattern and growth of the company with this merged dataset.
 
For our later part of the analysis the data has to be transformed in such a way that it would be readable on Gephi for basic network visualisations. Using the insider kasios suspicious data, I created edges and nodes for our network visualization.<br>
 
For our later part of the analysis the data has to be transformed in such a way that it would be readable on Gephi for basic network visualisations. Using the insider kasios suspicious data, I created edges and nodes for our network visualization.<br>
 
The following flow shows the data processing involved in this process using Tableau prep,<br>
 
The following flow shows the data processing involved in this process using Tableau prep,<br>
Line 47: Line 48:
 
]]  
 
]]  
  
==Aggregated Communication Data==
+
==<div  style="font-family: Calibri;">Aggregated Communication Data==
To create the edges csv file:
+
<div  style="font-family: Calibri;">To create the edges csv file:
 
<ol><li>Concatenate communication data for all different transactions into one source of information.</li>
 
<ol><li>Concatenate communication data for all different transactions into one source of information.</li>
 
<li>Change column data type of "From" and "To" from continuous to nominal.
 
<li>Change column data type of "From" and "To" from continuous to nominal.
Line 58: Line 59:
  
 
</ol>
 
</ol>
To create the nodes csv file:
+
<div  style="font-family: Calibri;">To create the nodes csv file:
 
<ol><li> Merge the company index with the combined file together and  find out the unique nodes present in the edges csv file.</li>
 
<ol><li> Merge the company index with the combined file together and  find out the unique nodes present in the edges csv file.</li>
 
<li>Name the column "id", so that Gephi can detect the ids of the nodes.</li>
 
<li>Name the column "id", so that Gephi can detect the ids of the nodes.</li>
 
<li>Add the label as name of the employee involved in the transaction</li></ol>
 
<li>Add the label as name of the employee involved in the transaction</li></ol>
 +
</div>

Latest revision as of 12:22, 8 July 2018

MC3 Banner.png

Intro

Approach

Findings

Conclusion

About Dataset

Data Cleaning


Communication Data

Initial part of our analysis requires us to explore the communication pattern of the company. In order to analyse the pattern, we have to combine the four-communication dataset of Kasios into one single data to have the holistic view.

Using Tableau prep all the four data can be merged and transformed into our necessary format.
The date format of the data has been given in Unix epoch timestamp beginning from May 11, 2015 at 14:00. To transform this into our timestamp value we can derive the following calculated field,

A1 MC3 Timestamp.png

Network Data

Once we transform our data we can explore the communication pattern and growth of the company with this merged dataset.

For our later part of the analysis the data has to be transformed in such a way that it would be readable on Gephi for basic network visualisations. Using the insider kasios suspicious data, I created edges and nodes for our network visualization.
The following flow shows the data processing involved in this process using Tableau prep,

A1 MC3 Data Flow.png

Aggregated Communication Data

To create the edges csv file:
  1. Concatenate communication data for all different transactions into one source of information.
  2. Change column data type of "From" and "To" from continuous to nominal.
  3. Rename columns "From" and "To" to "Source" and "Target".
  4. Create a column "Type" and ensure that all the fields of the column are filled with "Directed" - This helps to specify the direction of the communication.
  5. Create a new column with the formula "Source || Target", as this concatenates the Source and Target column.
  6. Use the tabulate function to count the number of times the concatenated cell appears in the column - this helps to find out the weight of the unique direction of communication.
  7. Update the table with the weight of the column. Change the column name to "Weight".
To create the nodes csv file:
  1. Merge the company index with the combined file together and find out the unique nodes present in the edges csv file.
  2. Name the column "id", so that Gephi can detect the ids of the nodes.
  3. Add the label as name of the employee involved in the transaction