ISSS608 2016-17 T1 Assign3 Chris Thng Ren Jing Reflections
This is for people who, like me, always get lost at the beginning and need a reference point and a general approach to doing things. There is not much information about Gephi out there and this dataset is huge, which is not a great combination when attempting an assignment. Hopefully these reflections serve as a reference point for whoever is reading this.
Learning a new Visualization Technique
Gephi is a great tool for dynamic network visualizations. It was completely unfamiliar to me, and it bears no similarity to other programs such as Tableau or JMP, which resemble Excel. Although it proved challenging, practice does make perfect, along with reading the reference links and notes provided by Prof Kam.
Personal reflections aside, the network and timeline functionality within Gephi allows many patterns to be identified, which helped me achieve the purpose of visualizing the communication patterns in the Dinopark Communication datasets. Creating the time interval was not easy in the beginning. Some steps you need to take:
- have a dataset with a timestamp/time interval
- make sure it's in a proper format that Gephi recognizes, e.g. DDMMYYYY
- you can have time variables in both your nodes and edges (the benefit of having time on your nodes is that they become interactive, appearing and disappearing based on time instead of staying static)
- navigate to Data Laboratory and click Merge Columns for your nodes/edges (depending on whether you have time in both or only one)
- click Create Time Interval (you don't need separate start/end columns, although you can have them; a timestamp is good enough, as Gephi can create the time interval from it)
- Start/End: Timestamp, or if you have start/end data points, Start/End: Start Time/End Time
- choose the exact format of your timestamp so that you don't get errors
- click OK and you're done; Enable Timeline should appear (else go to the Window tab > Timeline)
- to have your nodes resize based on degree or whatever you're measuring, go to "Appearance", look for the infinity sign (a horizontal 8), click it ("Enables auto transformation"), then click Apply
- For color schemes and formatting, go figure!
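Since the time interval step fails when a timestamp does not match the chosen format exactly, it can help to normalize the column before importing. The following is only a minimal sketch in Python: the function names, the input format (DDMMYYYY followed by a time) and the output format are my own assumptions for illustration, not something Gephi or the assignment prescribes, so adjust them to your data.

```python
from datetime import datetime

# Hypothetical formats: adjust both to match your own dataset and the
# format you intend to pick in Gephi's time interval dialog.
IN_FORMAT = "%d%m%Y %H:%M:%S"
OUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def normalize_timestamp(raw, in_format=IN_FORMAT, out_format=OUT_FORMAT):
    """Parse one raw timestamp string and rewrite it in a single format."""
    parsed = datetime.strptime(raw.strip(), in_format)
    return parsed.strftime(out_format)


def normalize_all(values, in_format=IN_FORMAT, out_format=OUT_FORMAT):
    """Normalize a list of timestamps, collecting the ones that fail.

    Returns (clean, bad), where bad holds (index, raw_value) pairs that did
    not match in_format, so they can be fixed before importing.
    """
    clean, bad = [], []
    for index, value in enumerate(values):
        try:
            clean.append(normalize_timestamp(value, in_format, out_format))
        except ValueError:
            bad.append((index, value))
    return clean, bad


if __name__ == "__main__":
    clean, bad = normalize_all(["06062014 08:03:19", "not a date"])
    print(clean)
    print(bad)
```

Checking the rejected rows first is quicker than reloading the dataset in Gephi after a failed merge.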
However, the UI came with some limitations. It is not intuitive, and there are few resources or tutorials online to use as reference, so a great amount of time was spent experimenting with different combinations and results. On top of that, there is no undo function, so each time I had to reload my dataset to continue experimenting. The dataset also had to be prepared in Gephi's format, e.g. Nodes: ID, Label, Time; Edges: Source, Target, Directed/Undirected, Time, etc. For others who are new and want to learn more, a good start would be https://www.youtube.com/user/jengolbeck, where she provides quite detailed step-by-step tutorials. If you have many nodes (millions), you can refer to https://gephi.org/users/install/ to increase the memory allocation, which lets you load larger datasets and avoid constant "JVM Failed" messages.
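The node and edge layout described above can be produced with a few lines of Python. This is a rough sketch rather than the way I actually prepared the data: the function names and the (source, target, time) record shape are assumptions, and the column names Id, Label, Source, Target and Type follow the headers that Gephi's spreadsheet import normally recognizes. Each node's Time is simply the first time it appears in a record.

```python
import csv
import io


def build_gephi_tables(records, edge_type="Directed"):
    """Turn (source, target, time) records into node and edge rows.

    The record shape is a hypothetical example; map your own columns to it.
    """
    first_seen = {}  # node id -> time of its first appearance
    edge_rows = []
    for source, target, time in records:
        for node_id in (source, target):
            first_seen.setdefault(node_id, time)
        edge_rows.append([source, target, edge_type, time])
    # Use the id as the label as well, since no separate label is assumed.
    node_rows = [[node_id, node_id, time] for node_id, time in first_seen.items()]
    return node_rows, edge_rows


def to_csv(header, rows):
    """Serialize rows to CSV text (write this text to a file for Gephi)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("1001", "1002", "2014-06-06 08:03:19"),
        ("1002", "1003", "2014-06-06 08:05:00"),
    ]
    nodes, edges = build_gephi_tables(sample)
    print(to_csv(["Id", "Label", "Time"], nodes))
    print(to_csv(["Source", "Target", "Type", "Time"], edges))
```

Keeping nodes and edges in two separate files matches how the two tables are imported in the Data Laboratory.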
In summary, Gephi:
+ Good for network visualizations: Emails, Phone calls, Database, any communication in general (as long as you can prepare the data in the required format)
- No undo, few tutorials, large datasets do not run very well
Approach to analyzing the dataset
The datasets for communication and movement together add up to some 40 to 50 million rows of data. It is inevitable to get lost at first. The approach I took was top-down: focus on the largest communication data, drill down, focus on certain aspects, drill back up or down, and iterate accordingly. The movement dataset should definitely not be ignored, but it is of lower priority. First identify meaningful communication patterns, then try to link them with the movement data. Alternatively, start from the movement data and link it back to the communication data. This allowed me to build up my story based on the patterns I was getting.
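To make the top-down idea concrete, here is a small illustrative sketch in Python, assuming the communication data is available as (sender, receiver, time) tuples. The helper names and the record shape are hypothetical, and with tens of millions of rows you would likely want to read the data in chunks or use a database instead of an in-memory list.

```python
from collections import Counter


def top_senders(records, n=3):
    """Top level: rank senders by how many messages they sent."""
    counts = Counter(sender for sender, _receiver, _time in records)
    return counts.most_common(n)


def drill_down(records, sender):
    """Drill down: keep only the records sent by one sender of interest."""
    return [record for record in records if record[0] == sender]


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the data.
    sample = [
        ("A", "B", "08:00"),
        ("A", "C", "08:05"),
        ("B", "A", "08:10"),
        ("A", "B", "09:00"),
    ]
    print(top_senders(sample, n=2))
    print(drill_down(sample, "B"))
```

The subset returned by the drill-down step can then be exported on its own and loaded into Gephi, which keeps the graph small enough to explore.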