Difference between revisions of "ISSS608 2016-17 T1 Assign3 Liu Jialin"
Revision as of 20:58, 26 October 2016
Data preparation using JMP:
• Use the Concatenate function to combine the communication records from the 3 days into one file, and name the file Communication in 3 days.
• Sort ascending on “timestamp”, then sort ascending on “from”. The messages sent by the same ID now appear together, in time order.
• Hide and exclude all messages to and from 1278894 and 839736. Unexclude these rows when they are needed.
• Create a column named "Unique Combination" and apply the formula “Char(:from) || Char(:to)”.
• Tabulate "Unique Combination" against N, make the result into a data table, and name the file Unique direction count of messages.
• In Unique direction count of messages, rename the column “N” to “weight”.
• Update Communication in 3 days from Unique direction count of messages, adding the “weight” column.
• In Communication in 3 days, create a column called “Timestamp difference in min” and apply the formula “Dif(:Timestamp, 1) / 60”.
• Save the file as Edges for communication. Remove the column “Timestamp difference in min” and sort ascending by “Unique Combination”.
• Unlock the “Unique Combination” column, then change its data type from Character to Numeric and its modeling type to Continuous.
• Create a new column named “remove duplicates” and apply the formula “Dif(:Unique Combination, 1)”.
• Select all rows where “remove duplicates” = 0. These are duplicate rows; delete them.
• Delete the columns “Location”, “Timestamp” and “remove duplicates”.
• Rename “from” to “Source” and “to” to “Target”.
• Save the file, export it to Excel, and name the exported file Nodes for communication.
• In Excel, copy all the Target nodes to the end of the Source column. Remove duplicates from this column, then delete the Target column.
• Rename “Source” to “ID”.
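The JMP and Excel steps above can be sketched in a few lines of plain Python, which may help when checking the exported files. This is only an illustration under stated assumptions, not the workflow used for the assignment: the sample records, the function names and every ID other than the two excluded ones are made up, and timestamps are taken to be plain seconds, which is how JMP stores date-times internally. The sketch keys each direction on a (from, to) pair rather than a concatenated string, so two different ID pairs can never collapse into the same key.

```python
from collections import Counter

# IDs hidden and excluded in the steps above.
EXCLUDED_IDS = {1278894, 839736}

# Hypothetical sample records standing in for the three daily files.
# Timestamps are seconds; all IDs except the excluded ones are invented.
day1 = [
    {"Timestamp": 0, "from": 101, "to": 202},
    {"Timestamp": 60, "from": 101, "to": 202},
    {"Timestamp": 30, "from": 202, "to": 101},
    {"Timestamp": 90, "from": 1278894, "to": 101},
]
day2 = [{"Timestamp": 86400, "from": 101, "to": 303}]
day3 = [
    {"Timestamp": 172800, "from": 303, "to": 101},
    {"Timestamp": 172900, "from": 202, "to": 839736},
]


def combine(days, excluded=EXCLUDED_IDS):
    """Concatenate the daily records, drop excluded IDs, sort by sender then time."""
    records = [r for day in days for r in day]
    records = [
        r for r in records
        if r["from"] not in excluded and r["to"] not in excluded
    ]
    return sorted(records, key=lambda r: (r["from"], r["Timestamp"]))


def minutes_since_previous(records):
    """Mirror Dif(:Timestamp, 1) / 60: gap to the previous row, in minutes."""
    if not records:
        return []
    times = [r["Timestamp"] for r in records]
    return [None] + [(b - a) / 60 for a, b in zip(times, times[1:])]


def build_edges(records):
    """One row per unique direction, with the message count as the weight."""
    weights = Counter((r["from"], r["to"]) for r in records)
    return [
        {"Source": s, "Target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]


def build_nodes(edges):
    """Every ID that appears as a Source or a Target, without duplicates."""
    ids = {e["Source"] for e in edges} | {e["Target"] for e in edges}
    return [{"ID": i} for i in sorted(ids)]


records = combine([day1, day2, day3])
diffs = minutes_since_previous(records)
edges = build_edges(records)
nodes = build_nodes(edges)
```

Because the difference is taken between consecutive rows of the sorted table, the value at the first row of each new sender compares two different senders and can be negative, just as it would be in the JMP column.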
Gephi:
• Import Nodes for communication and Edges for communication into Gephi.
• Apply the Yifan Hu layout with the optimal distance set to 200, and run it until the layout is satisfactory.
• Set node size to depend on Degree and node colour to depend on Out-degree.
• Set edge colour to depend on weight.
• In Filters, open Topology and drag Mutual Degree into Queries. Adjust the filter range to obtain the filtered layouts.
• In Context, check the number of nodes remaining under this filter.
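To make the quantities used in the Gephi steps concrete, the following Python sketch computes degree, out-degree and mutual degree for a small directed edge list, then keeps only the nodes whose mutual degree falls in a chosen range. It reflects an assumed reading of the filter, namely that a node's mutual degree is the number of neighbours it is linked to in both directions; Gephi's own implementation may differ in details such as self-loops. The edge list and function names are hypothetical.

```python
from collections import Counter

# Hypothetical directed edges (Source, Target); the IDs are invented.
edge_list = [(101, 202), (101, 303), (202, 101), (303, 101), (101, 404)]


def degree_counts(edges):
    """Return (degree, out_degree) per node, where degree = in + out."""
    out_degree, in_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    degree = out_degree + in_degree
    return degree, out_degree


def mutual_degree(edges):
    """Count, for each node, the neighbours linked to it in both directions."""
    pairs = {(s, t) for s, t in edges if s != t}  # ignore self-loops
    counts = {node: 0 for edge in edges for node in edge}
    for source, target in pairs:
        if (target, source) in pairs:
            counts[source] += 1
    return counts


def filter_by_mutual_degree(edges, minimum, maximum=None):
    """Keep the nodes whose mutual degree lies in [minimum, maximum]."""
    counts = mutual_degree(edges)
    return sorted(
        node for node, value in counts.items()
        if value >= minimum and (maximum is None or value <= maximum)
    )


degree, out_degree = degree_counts(edge_list)
mutual = mutual_degree(edge_list)
kept = filter_by_mutual_degree(edge_list, minimum=1)
print(len(kept), "nodes remain:", kept)
```

The length of `kept` plays the role of the node count shown in Context after the filter is applied.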