IISSS608 2017-18 T3 Assign Vigneshwar Ramachandran Vadivel - Data Cleaning



Data Cleaning

Communication Data

The initial part of our analysis requires us to explore the communication patterns of the company. To analyse these patterns, we combine the four Kasios communication datasets into a single dataset to get a holistic view. Using Tableau Prep, all four files can be merged and transformed into the format we need.

The timestamps in the data are given in a Unix-style epoch format, counting seconds from May 11, 2015 at 14:00. To transform these into readable timestamp values, we can derive a calculated field that adds each offset to this base date and time.
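As a hedged illustration of the same logic outside Tableau Prep, a pandas sketch might look like the following; the offset column name "Time" and the file name "calls_merged.csv" are assumptions, not names taken from the source files.

    import pandas as pd

    # Hypothetical file and column names; the real merged file comes out
    # of the Tableau Prep flow described above.
    calls = pd.read_csv("calls_merged.csv")

    # Offsets are seconds counted from May 11, 2015 at 14:00.
    base = pd.Timestamp("2015-05-11 14:00:00")
    calls["Timestamp"] = base + pd.to_timedelta(calls["Time"], unit="s")

    # Render in the mm/dd/yyyy hh:mm format used later on this page.
    calls["Timestamp"] = calls["Timestamp"].dt.strftime("%m/%d/%Y %H:%M")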

Network Data

Once we have transformed our data, we can explore the communication patterns and the growth of the company with this merged dataset.

For the later part of the analysis, the data has to be transformed so that it is readable by Gephi for basic network visualisations. Starting from the suspicious-call data provided by the Kasios insider, the steps below describe the data processing involved, carried out in Tableau Prep.
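For reference, Gephi's CSV importer recognises column headers along these lines (the exact header set is an assumption about the importer, not a step from the original flow), and the two files built below are shaped to match:

    edges.csv: Source, Target, Type, Weight
    nodes.csv: id, label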

Aggregated Communication Data

To create the edges csv file (these steps are sketched in pandas after the list):

  1. Concatenate the communication data for all the different transactions into one source of information.
  2. Change the column data type of "From" and "To" from continuous to nominal.
  3. Rename the columns "From" and "To" to "Source" and "Target".
  4. Create a column "Type" and fill every row with "Directed" - this specifies the direction of the communication.
  5. Create a new column with the formula "Source || Target", which concatenates the Source and Target columns.
  6. Use the tabulate function to count the number of times each concatenated value appears in the column - this gives the weight of each unique direction of communication.
  7. Update the table with this count and rename the column to "Weight".
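The article's actual flow uses Tableau Prep and JMP; as a non-authoritative sketch of Steps 1-7, the same edge list can be built in pandas. The input file name is an assumption carried over from the sketch above.

    import pandas as pd

    calls = pd.read_csv("calls_merged.csv")   # hypothetical file name

    # Steps 2-3: treat the IDs as nominal values and rename the columns.
    edges = calls.rename(columns={"From": "Source", "To": "Target"})
    edges["Source"] = edges["Source"].astype(str)
    edges["Target"] = edges["Target"].astype(str)

    # Step 4: mark every row as a directed edge so Gephi keeps direction.
    edges["Type"] = "Directed"

    # Steps 5-7: counting each Source/Target pair plays the same role as
    # tabulating the "Source || Target" concatenation - the count becomes
    # the edge weight.
    edges = (edges.groupby(["Source", "Target", "Type"])
                  .size()
                  .reset_index(name="Weight"))

    edges.to_csv("edges.csv", index=False)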

To create the nodes csv file:

  1. Merge the company index with the combined file and find the unique nodes present in the edges csv file.
  2. Name the ID column "id", so that Gephi can detect the ids of the nodes.
  3. Add the name of the employee involved in the transactions as the node label.

These steps, together with the Sunday time window below, are sketched in code at the end of this section.

Communication Data from 11am to 1.30pm on Sunday

To create the edges csv file:

  1. Carry out Steps 2-5 above for the communication data on Sunday.
  2. Retain only the communication data between 11am and 1.30pm.
  3. To create the Timestamp column, use the current timestamp format, mm/dd/yyyy hh:mm.
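To close the section, here is a hedged pandas sketch of both the node extraction and the Sunday 11am-1.30pm filter. The company index file name and its "ID"/"Name" headers are assumptions, not names taken from the source files.

    import pandas as pd
    from datetime import time

    # Node extraction (nodes csv steps 1-3).
    edges = pd.read_csv("edges.csv", dtype=str)
    index = pd.read_csv("company_index.csv")   # hypothetical file name

    # Every ID seen as a Source or Target becomes a node; the column is
    # named "id" so Gephi detects it, and the employee name is the label.
    ids = pd.unique(pd.concat([edges["Source"], edges["Target"]]))
    nodes = pd.DataFrame({"id": ids})
    index = index.rename(columns={"ID": "id", "Name": "label"})  # assumed headers
    index["id"] = index["id"].astype(str)
    nodes = nodes.merge(index[["id", "label"]], on="id", how="left")
    nodes.to_csv("nodes.csv", index=False)

    # Sunday window (Sunday edges steps 1-3): keep only the calls between
    # 11:00 and 13:30 on Sunday, then repeat the edge-building steps above
    # on the filtered rows.
    calls = pd.read_csv("calls_merged.csv", parse_dates=["Timestamp"])
    sunday = calls[(calls["Timestamp"].dt.dayofweek == 6) &
                   calls["Timestamp"].dt.time.between(time(11, 0), time(13, 30))]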