ISSS608 2016-17 T1 Assign3 Chris Thng Ren Jing Result
Results
1a) Identify those IDs that stand out for their large volumes of communication. For each of these IDs characterize the communication patterns you see.
This illustrates the degree of communication patterns for the 3 days in Dinopark.
The IDs that stood out most for large volumes of communication over the 3 days of data collected in Dinopark are 839736, external and 1278894.
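As a minimal sketch of how such high-volume IDs could be surfaced, the snippet below tallies sent and received messages per ID. The record layout (timestamp, from, to, location) and the visitor names are assumptions made for illustration; they are toy values, not the park data.

```python
from collections import Counter

# Toy records in an assumed (timestamp, from_id, to_id, location) layout.
# "visitor_a" and "visitor_b" are hypothetical placeholders.
messages = [
    ("Fri 08:03", "839736", "visitor_a", "Entry Corridor"),
    ("Fri 08:04", "visitor_a", "839736", "Kiddie Land"),
    ("Fri 12:00", "1278894", "visitor_b", "Entry Corridor"),
    ("Fri 12:01", "visitor_b", "external", "Wetlands"),
    ("Fri 12:05", "839736", "visitor_b", "Entry Corridor"),
]


def volume_by_id(records):
    """Return two Counters: messages sent per ID and messages received per ID."""
    sent, received = Counter(), Counter()
    for _timestamp, from_id, to_id, _location in records:
        sent[from_id] += 1
        received[to_id] += 1
    return sent, received


sent, received = volume_by_id(messages)
total = sent + received  # combined volume per ID
top_ids = [id_ for id_, _count in total.most_common(3)]
print(top_ids)
```

Keeping sent and received counts separate also makes it easy to spot an ID such as external, which only ever appears on the receiving side.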
Communication Pattern for ID 839736:
ID 839736 seems to communicate with a large share of the visitors in Dinopark. It sends and receives messages at all times of the day, but it only communicates from the Entry Corridor, so it appears to be stationed there. It appears on all 3 days, sends over 60,000 messages across the 3 days, and all of its messages seem to be 1:1 send/receive exchanges.
Communication Pattern for ID 1278894:
ID 1278894 seems to be a message broadcast system. It sends messages only between 12pm and 8:55pm, every 5 minutes during every alternate hour. Based on the data, it is active from 12:00PM to 12:55PM, 2:00PM to 2:55PM and so forth. It does not send to all visitors, only some. It seems to be a question-and-response type of communication: each time a message is sent to a visitor's ID, the visitor may or may not respond, so there are more outgoing messages than incoming ones.
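One rough way to check a broadcast cadence like this is to test whether every send time falls on a 5-minute mark and to list the hours in which the ID is active. The sketch below uses made-up send times and an assumed "HH:MM" time format; it is illustrative only.

```python
from datetime import datetime

# Toy send times for a single ID, in an assumed "HH:MM" format.
send_times = ["12:00", "12:05", "12:55", "14:00", "14:10", "20:55"]


def cadence_summary(times, step_minutes=5):
    """Return (all sends on the step grid?, sorted list of active hours)."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    on_grid = all(t.minute % step_minutes == 0 for t in parsed)
    active_hours = sorted({t.hour for t in parsed})
    return on_grid, active_hours


on_grid, active_hours = cadence_summary(send_times)
print(on_grid, active_hours)
```

If the active hours come out as every second hour, that supports the alternate-hour reading of the pattern.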
Communication Pattern for ID external:
This ID seems to be just receiving messages from the IDs within Dinopark.
1b) Based on these patterns, what do you hypothesize about these IDs? Note: Please limit your response to no more than 4 images and 300 words.
1278894 is likely to be the Trivia Game application giving “hours of fun”. I hypothesize that this Trivia Game application is not mandatory for visitors; they have the option to take part in the game. Every 5 minutes a new question is sent to the visitors who are taking part. The participants receive the question and have the option to either respond or ignore it. I used a validation formula to confirm this: the number of questions (to) is always greater than or equal to the number of answers (from), never the reverse. Additionally, 1278894 does not communicate with any external devices. Hence, I believe this ID represents the park’s Trivia Game, Cindysaurus. It has a maximum of 60 questions, and the same goes for the responses.
839736 could be manned by Dinopark employees who feed information to visitors from their booth (there are two information booths located in the Entry Corridor). It is unlikely to be an automated service welcoming visitors: over the 3 days it does not communicate with all visitors, it communicates at all times of the day, multiple times, with visitors in multiple locations and not just the Entry-Exit areas, and it has a large volume of communication.
External is an ID which, judging from both its name and its communication pattern (it only receives messages), stands for unregistered IDs that Dinopark visitors are communicating with. In summary, these are external people such as friends and family who are not within the park boundaries or have not registered their phones with Dinopark’s application.
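The validation described above can be sketched as a per-visitor comparison of questions received against answers sent. This is only an illustration of the idea, not the formula actually used; the (from, to) record layout and the visitor names are assumptions.

```python
from collections import Counter

HUB_ID = "1278894"

# Toy (from_id, to_id) pairs; visitor names are hypothetical.
messages = [
    (HUB_ID, "visitor_a"),
    (HUB_ID, "visitor_a"),
    ("visitor_a", HUB_ID),
    (HUB_ID, "visitor_b"),
    ("visitor_b", HUB_ID),
]


def answer_violations(records, hub):
    """List visitors who sent more answers to the hub than questions they received."""
    questions = Counter()  # hub -> visitor
    answers = Counter()    # visitor -> hub
    for from_id, to_id in records:
        if from_id == hub:
            questions[to_id] += 1
        elif to_id == hub:
            answers[from_id] += 1
    return sorted(v for v in answers if answers[v] > questions[v])


violations = answer_violations(messages, HUB_ID)
print(violations)  # an empty list means questions >= answers for every visitor
```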
2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words.
Communication to external parties on each of the three days: on average, visitors send a few hundred messages to external parties within any half-hour period. On Sunday, however, there is a spike of over 500%, with 5480 messages recorded from 1130AM to 1200PM. The majority of this external communication was made in the Wetlands. Yet the Wetlands makes up at most 25% of Dinopark, so why would so many people be messaging from there, and in such large quantities? This could be due to something happening to “Scott Jones memorabilia displayed at the Creighton Pavilion for the public”, as Creighton Pavilion is located in the Wetlands. This pattern occurred only on Sunday, not on Friday or Saturday.
We then observe another strange pattern in the communication data for Sunday. ID 839736 received a large number of messages from park visitors from 1200PM to 1230PM, which is 30 minutes after the “commotion” (the large quantity of messages sent to external IDs).
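A half-hour binning like the one used for this comparison could be sketched as follows. The timestamp format, the (timestamp, from, to) layout and the toy records are assumptions; the percentage helper simply restates the arithmetic, where an increase of 500% means the observed count is six times the baseline.

```python
from collections import Counter
from datetime import datetime


def half_hour_bin(timestamp):
    """Map an assumed 'HH:MM:SS' timestamp to the start of its half-hour window."""
    t = datetime.strptime(timestamp, "%H:%M:%S")
    return f"{t.hour:02d}:{30 * (t.minute // 30):02d}"


def external_counts(records):
    """Count messages sent to 'external' per half-hour window."""
    return Counter(
        half_hour_bin(ts) for ts, _from_id, to_id in records if to_id == "external"
    )


def percent_increase(baseline, observed):
    """Relative increase of observed over baseline, in percent."""
    return (observed - baseline) / baseline * 100


# Toy records; visitor names are hypothetical placeholders.
messages = [
    ("11:29:59", "visitor_a", "external"),
    ("11:30:00", "visitor_b", "external"),
    ("11:45:10", "visitor_c", "external"),
    ("11:50:00", "visitor_c", "visitor_a"),
    ("12:00:00", "visitor_a", "external"),
]

counts = external_counts(messages)
print(sorted(counts.items()))
```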
First, at 1130AM to 1200PM a large quantity of messages were sent from the visitors to external parties.
Second, at 1200PM to 1230PM a large quantity of messages were sent from the visitors to the Dinopark’s Customer Service System (ID 839736).
What we can hypothesize from this is a sequence of events: visitors first discovered that something was not right at Creighton Pavilion and started texting their friends, family and others about the news. Once they had done so, they reported the news to Dinopark’s Customer Service System (ID 839736).
From 0800AM to 0930AM Creighton Pavilion had visitors, and an increasing pattern can be observed, although this may simply reflect the increasing number of visitors entering the park. From 0930AM to 1000AM there was just a single visitor, which seems odd compared to the usual three-digit visitor counts.
From 1000AM to 1130PM there is zero check-in activity at the Creighton Pavilion. We could then hypothesize that ID 1502920 is the last person who checked in at the attraction. Possible scenarios: he/she could simply be the last visitor to check in at Creighton Pavilion, a Dinopark staff member who was closing the Creighton Pavilion, or a potential suspect.
We analyzed the Friday communication dataset and identified the IDs with the highest outgoing communication. The aim is to find potential groups. Why are these IDs communicating with so many people at certain times?
We can observe 7 huge clusters/groupings. These represent the relationships between the IDs. The members of each group exchange a large number of messages with each other, hence the cluster. This was done using Force Atlas, which works on the basis of pulling strongly connected nodes together and pushing weakly connected nodes apart.
Focusing on Group 1 and ID 825466, which is the largest node, we can identify a communication pattern. It seems this group had a hierarchy:
- Overall group leader
- Group leader
- Mini groups (Families, couples, friends)
The other groups have similar hierarchies. Based on the Friday communication data above, we can also see that the overall group leaders communicate with each other. However, they do so less than they communicate with the groups they are in charge of, which results in the greater distance between their respective nodes.
We then identified IDs with the same patterns. One group stood out: a group of 7 that had the same movement patterns and the same check-in timings.
They also all had 0 communication.
We analysed the movement timing of the group of 7 to find out why they kept going to the Grinosaurus Stage. They first enter at 0929 and stay at the location until 1130. Their next movement time is 1429, and they stay at the location until 1630.
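One possible way to find IDs that share an identical movement pattern is to group visitors by their full sequence of (time, location) check-ins. The sketch below is only an assumption about how this could be done; the record layout and the visitor names are hypothetical.

```python
from collections import defaultdict

# Toy check-in records in an assumed (visitor_id, "HH:MM", location) layout.
checkins = [
    ("visitor_a", "09:29", "Grinosaurus Stage"),
    ("visitor_b", "09:29", "Grinosaurus Stage"),
    ("visitor_a", "14:29", "Grinosaurus Stage"),
    ("visitor_b", "14:29", "Grinosaurus Stage"),
    ("visitor_c", "10:15", "Creighton Pavilion"),
]


def groups_with_same_trail(records, min_size=2):
    """Group visitors whose ordered (time, location) check-ins are identical."""
    trails = defaultdict(list)
    # Sort by visitor, then time, so each trail is in chronological order.
    for visitor, time, place in sorted(records, key=lambda r: (r[0], r[1])):
        trails[visitor].append((time, place))
    by_trail = defaultdict(list)
    for visitor, trail in trails.items():
        by_trail[tuple(trail)].append(visitor)
    return [sorted(ids) for ids in by_trail.values() if len(ids) >= min_size]


groups = groups_with_same_trail(checkins)
print(groups)
```

Exact matching is strict; a tolerance of a minute or so on check-in times may be needed for real data, at the cost of a more involved comparison.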