ISSS608 2016-17 T1 Assign3 Wan Xulang

From Visual Analytics and Applications

Abstract

From June 6 to June 8, 2014, a modest-sized amusement park named DinoFun World held an event called “Scott Jones Weekend”. However, things did not go as planned: a crime was committed during the weekend. Although the case was solved rapidly, park officials and law enforcement figures want to understand just what happened during that weekend so that they can better prepare for future events. They are interested in how people move and communicate in the park, how these patterns change and evolve over time, and what can be understood about the motivations for changing patterns.

Problem

This project has three problems to solve:

  1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs:
    1. Characterize the communication patterns you see.
    2. Based on these patterns, what do you hypothesize about these IDs? Note: Please limit your response to no more than 4 images and 300 words.
  2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words.
  3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale. Note: Please limit your response to no more than 3 images and 300 words.

Data Introduction & Preparation

Introduction

In this project, we are provided with the movement and communication data for each person in the park over these three days. However, these data sets are quite large for a personal laptop, so in the later analysis we reduce them where necessary. This preparation part covers only the basic steps; further details are given in the specific approaches where needed.

Preparation

During the preparation, we do the following first:

  1. Merge the daily communication files into one data set, and likewise merge the daily movement files into one data set.
  2. Rename two columns of the communication data to source and target, which is helpful for network analysis.
Merged-comm.PNG

The merged communication data looks like the figure above. Other small changes made during the analysis are not covered here.
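The two preparation steps can be sketched in a few lines of Python. This is only an illustration, not the procedure used in JMP Pro: the original column headers (`from`, `to`, `Timestamp`, `location`) and the sample rows are assumptions, since this page does not list them.

```python
import csv
import io

# Assumed original headers; the page only says two columns become source/target.
RENAME = {"from": "source", "to": "target"}

def merge_and_rename(files, rename=RENAME):
    """Concatenate rows from several CSV file objects and rename columns."""
    merged = []
    for f in files:
        for row in csv.DictReader(f):
            merged.append({rename.get(key, key): value for key, value in row.items()})
    return merged

# Made-up one-row samples standing in for the per-day communication files.
friday = io.StringIO(
    "Timestamp,from,to,location\n"
    "2014-06-06 08:03:19,1,2,Entry Corridor\n"
)
saturday = io.StringIO(
    "Timestamp,from,to,location\n"
    "2014-06-07 09:10:00,2,external,Kiddie Land\n"
)

rows = merge_and_rename([friday, saturday])
print(rows[0])
```

With real data, the `io.StringIO` objects would be replaced by opened CSV files, one per day.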

Approaches

Task-1

To find the significant IDs and characterize them, we first look for the IDs with large volumes of messages. We therefore calculate the distribution of sent and received messages per person.

Comm-distribution.PNG

As shown above, three IDs stand out: 839736, 1278894 and external. We set ‘external’ aside for now. To better understand ID-1278894 and ID-839736, we first examine their activity patterns, starting with how their communication activity is distributed over time.
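The per-ID counting behind this distribution can be sketched as follows. It assumes the communication rows already use the `source` and `target` column names from the preparation step; the IDs in the sample are made up.

```python
from collections import Counter

def message_volumes(rows):
    """Return (sent, received) message counts per ID."""
    sent = Counter(row["source"] for row in rows)
    received = Counter(row["target"] for row in rows)
    return sent, received

# Made-up rows: "A" broadcasts to everyone, the others rarely send.
rows = [
    {"source": "A", "target": "B"},
    {"source": "A", "target": "C"},
    {"source": "A", "target": "D"},
    {"source": "B", "target": "external"},
    {"source": "C", "target": "external"},
]

sent, received = message_volumes(rows)
top_senders = sent.most_common(1)
top_receivers = received.most_common(1)
print(top_senders, top_receivers)
```

Sorting both counters with `most_common` gives the ranking from which the outlying IDs are read off.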

Sent-1278894.PNG

As shown above, ID-1278894 keeps sending a large volume of messages throughout these days. Within each day, the volume is almost the same across time periods. It seems that ID-1278894 sends messages from 8:00 to 21:00 every day and never gets tired.

Sent-839763.PNG

Compared with ID-1278894, ID-839736 also sends messages from 8:00 to 21:00 every day, although the volume is not as large. An interesting point is a peak on the third day, beginning at 12:01 on June 8, which is marked by a red circle in the graph. We take note of this peak since it will be very helpful for further analysis.
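A time profile like the ones in these two graphs can be approximated by binning one sender's messages by day and hour. The sketch below assumes a `Timestamp` column in `YYYY-MM-DD HH:MM:SS` format, which is a guess at the data layout; the rows are made up.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(rows, sender, fmt="%Y-%m-%d %H:%M:%S"):
    """Count messages sent by `sender` per (date, hour) bin."""
    counts = Counter()
    for row in rows:
        if row["source"] == sender:
            stamp = datetime.strptime(row["Timestamp"], fmt)
            counts[(stamp.date().isoformat(), stamp.hour)] += 1
    return counts

# Made-up rows for illustration only.
rows = [
    {"Timestamp": "2014-06-08 11:40:00", "source": "839736", "target": "7"},
    {"Timestamp": "2014-06-08 12:01:00", "source": "839736", "target": "8"},
    {"Timestamp": "2014-06-08 12:05:30", "source": "839736", "target": "9"},
    {"Timestamp": "2014-06-08 12:07:00", "source": "42", "target": "839736"},
]

profile = hourly_counts(rows, "839736")
busiest_bin, busiest_count = profile.most_common(1)[0]
print(busiest_bin, busiest_count)
```

A flat profile across bins would correspond to the steady behaviour of ID-1278894, while a single dominant bin would correspond to a peak like the one on June 8.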

Sending-location.PNG

Finally, we look at the locations from which these two IDs send their messages. As shown above, they sent all of their messages from the entry corridor, which implies that they never left this place during these three days! Based on what we have found above, we make the following hypotheses about these two IDs.

  1. ID-1278894 is an automatic message sender developed by the park. It is used to send necessary information to everyone in the park.
  2. ID-839736 is some kind of park official. He always sits in the entry corridor and sends messages manually when needed.
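The location check that supports these hypotheses amounts to listing the distinct sending locations of each ID. A minimal sketch, assuming a `location` column on each communication row (the column name and the sample rows are assumptions):

```python
from collections import Counter

def locations_by_sender(rows, senders):
    """Count, for each sender of interest, how many messages came from each location."""
    result = {sender: Counter() for sender in senders}
    for row in rows:
        if row["source"] in result:
            result[row["source"]][row["location"]] += 1
    return result

# Made-up rows for illustration only.
rows = [
    {"source": "1278894", "target": "5", "location": "Entry Corridor"},
    {"source": "1278894", "target": "6", "location": "Entry Corridor"},
    {"source": "839736", "target": "5", "location": "Entry Corridor"},
    {"source": "5", "target": "839736", "location": "Wet Land"},
]

by_sender = locations_by_sender(rows, ["1278894", "839736"])
single_location = {s: len(c) == 1 for s, c in by_sender.items()}
print(single_location)
```

An ID whose counter holds a single location for all three days behaves like a fixed service point rather than a visitor.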

Task-2

Settings: time window 11:30 to 12:30, eigenvector centrality, Force Atlas layout

Overall.PNG
Main-part.PNG
Travel-groups.PNG
Normalgroups.PNG
Lonelyfriends.PNG
Official.PNG
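For readers without Gephi, the eigenvector centrality over the 11:30 to 12:30 window noted above can be approximated with a short power iteration. This is a sketch under stated assumptions, not Gephi's implementation: the day of the window, the `Timestamp` format, the undirected treatment of messages, and the sample rows are all assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

def edges_in_window(rows, start, end):
    """Undirected sender/receiver pairs with a message between start and end."""
    edges = set()
    for row in rows:
        stamp = datetime.strptime(row["Timestamp"], FMT)
        if start <= stamp <= end and row["source"] != row["target"]:
            edges.add(frozenset((row["source"], row["target"])))
    return edges

def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Power iteration on A + I (the shift avoids oscillation on bipartite graphs)."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    if not neighbors:
        return {}
    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in neighbors}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score

# Made-up rows: "H" contacts A, B and C inside the window; D is outside it.
rows = [
    {"Timestamp": "2014-06-08 11:35:00", "source": "H", "target": "A"},
    {"Timestamp": "2014-06-08 11:50:00", "source": "H", "target": "B"},
    {"Timestamp": "2014-06-08 12:10:00", "source": "C", "target": "H"},
    {"Timestamp": "2014-06-08 14:00:00", "source": "D", "target": "A"},
]
window = (datetime(2014, 6, 8, 11, 30), datetime(2014, 6, 8, 12, 30))
scores = eigenvector_centrality(edges_in_window(rows, *window))
print(max(scores, key=scores.get))
```

Nodes with high scores are those connected to other well-connected nodes, which is why hubs and tightly linked groups stand out when node size is mapped to this metric.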

Task-3

Sent2external.PNG
Sent-to-839763.PNG
Happenlocation-1.PNG
Location.PNG

https://public.tableau.com/profile/xulang.wan#!/vizhome/MovementMap_0/Dashboard1

Conclusion & Summary

Tool Utilized

Software: JMP Pro, Tableau and Gephi