ISSS608 2016-17 T1 Assign3 Frandy Eddy
Abstract
In this assignment, we are given the communication and movement data from the "Scott Jones Weekend" event. The purpose is to understand how people move and communicate in the park, how these patterns change and evolve over time, and what can be understood about the motivations for the changing patterns. The key findings are:
- IDs 1278894, 839736, and external stand out for their large volume of communication. It is hypothesized that ID 1278894 is the Cindysaurus Trivia Game and ID 839736 is the visitor information center.
- There is a spike in communication volume on Sunday at around 11:30 AM - 12:30 PM.
- The vandalism was discovered on Sunday at around 11:45 AM - 12:00 PM by visitors at Wet Land.
Overview
DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events.
One event last year was a weekend tribute to Scott Jones, an internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.
While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, how these patterns change and evolve over time, and what can be understood about the motivations for the changing patterns.
The Task
The tasks are:
- Identify those IDs that stand out for their large volumes of communication. For each of these IDs:
- Characterize the communication patterns you see.
- Based on these patterns, what do you hypothesize about these IDs?
- Describe up to 10 communication patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.
- From this data, can you hypothesize when the vandalism was discovered? Describe your rationale.
Results & Findings
IDs with large volumes of communication
Three IDs have a much larger volume of communication than the others: 1278894, 839736, and external.
1. ID 1278894
- Patterns:
- This ID has the highest volume of communication on all 3 days.
- The volumes of communication from and to this ID are almost the same, with slightly more messages sent by this ID than received by it (~0.30% difference).
- This ID is always located at Entry Corridor.
- Communications from this ID happen only at certain times. It sends broadcast messages at 5-minute intervals during 12:00 PM - 12:55 PM, 2:00 PM - 2:55 PM, 4:00 PM - 4:55 PM, 6:00 PM - 6:55 PM, and 8:00 PM - 8:55 PM.
- This ID never communicates with ID 839736 or external, which have the second- and third-highest volumes of communication over the 3 days.
- Based on these patterns, it can be hypothesized that this ID is the Cindysaurus Trivia Game, which is available from the DinoFun World app.
2. ID 839736
- Patterns:
- This ID always responds within 5 minutes after it receives a message.
- The volume of communication from and to this ID is almost exactly the same. (~0.01% difference)
- This ID is always located at Entry Corridor.
- Communications involving this ID happen at all times throughout the day.
- This ID never communicates with ID 1278894 or external, which have the highest and third-highest volumes of communication over the 3 days.
- Based on these patterns, it can be hypothesized that this ID is the visitor information center.
3. External
- Patterns:
- This ID only receives messages and does not send any messages.
- Communications involving this ID happen at all times throughout the day.
- This ID never communicates with ID 1278894 or 839736, the two IDs with the highest volumes of communication over the 3 days.
- Based on these patterns, it can be hypothesized that this ID represents an external party.
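The volume comparison behind these observations can be sketched in Python. This is only an illustration under stated assumptions, not the JMP Pro workflow used for this report: the `(sender, receiver)` record layout, the function names, the sample rows, and the exact definition of the percentage difference are all hypothetical.

```python
from collections import Counter

def volume_by_id(records):
    """Count messages sent and received per ID.

    `records` is an iterable of (sender, receiver) pairs. This layout is
    an assumption, not the actual schema of the park data.
    """
    sent, received = Counter(), Counter()
    for sender, receiver in records:
        sent[sender] += 1
        received[receiver] += 1
    return sent, received

def top_ids(sent, received, n=3):
    """Return the n IDs with the largest combined sent + received volume."""
    total = sent + received
    return [id_ for id_, _ in total.most_common(n)]

def relative_difference(sent, received, id_):
    """Percentage gap between sent and received counts for one ID.

    Defined here (as an assumption) relative to the larger of the two counts.
    """
    s, r = sent[id_], received[id_]
    larger = max(s, r)
    return abs(s - r) / larger * 100 if larger else 0.0

# Hypothetical sample rows, for illustration only.
sample = [
    ("1278894", "100"), ("1278894", "101"), ("100", "1278894"),
    ("200", "839736"), ("839736", "200"), ("300", "external"),
]
sent, received = volume_by_id(sample)
```

On real data, an ID such as external would show up with a received count but a sent count of zero, which matches the "only receives messages" observation above.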
Communication patterns
Because the data set is very large, we analyze communication patterns only on the subsets of the data that we find interesting. The following communication patterns were found in the data.
1. Communication patterns of IDs with a high volume of communication
- Communications to ID 1278894
The graph above shows the volume of communication to ID 1278894 on Friday. Most of the communications to this ID occurred within 5 minutes after it sent a broadcast message. Similar patterns also occurred on Saturday and Sunday.
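One way to quantify "within 5 minutes after a broadcast" is to match each incoming message to the most recent preceding broadcast and measure the gap. The sketch below is a hedged illustration: the function name, the timestamps, and the placeholder date are hypothetical and do not come from the park data.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def share_within_window(broadcasts, incoming, window=timedelta(minutes=5)):
    """Fraction of incoming messages that arrive within `window`
    after the most recent broadcast."""
    broadcasts = sorted(broadcasts)
    hits = 0
    for t in incoming:
        i = bisect_right(broadcasts, t)  # number of broadcasts at or before t
        if i and t - broadcasts[i - 1] <= window:
            hits += 1
    return hits / len(incoming) if incoming else 0.0

# Hypothetical timestamps on a placeholder date, for illustration only.
day = datetime(2000, 1, 1)
broadcasts = [day.replace(hour=12, minute=0), day.replace(hour=12, minute=5)]
incoming = [day.replace(hour=12, minute=m) for m in (1, 4, 7, 20)]
share = share_within_window(broadcasts, incoming)
```

Because the broadcasts are themselves 5 minutes apart within each hour-long block, this measure is most informative for messages arriving after the last broadcast of a block.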
- Communications to ID 839736
The two graphs above show the volume of communication to ID 839736 by day. On Friday and Saturday, no ID sent more than 10 messages to ID 839736 in a day. On Sunday, however, many IDs sent messages to this ID repeatedly, and three of them (1149894, 1601276, and 1217381) each sent more than 300 messages on Sunday alone. The second graph also shows that Sunday accounts for most of the high-volume communication. We next look at the periods when these three IDs sent their messages to ID 839736.
The first peak in the volume of communication to ID 839736 occurred at 12:00 PM - 12:05 PM, when IDs 1217381 and 1601276 each sent about 5 messages per minute on average. ID 1601276 continued sending many messages until 12:33 PM. The next peak occurred at 12:57 PM - 1:49 PM, when ID 1149894 sent more than 300 messages. Another peak came from ID 1217381 at 1:36 PM - 2:22 PM, with a total of 271 messages sent during that period. The last peak occurred at around 2:45 PM - 3:20 PM: ID 1149894 sent 140 messages at 2:48 PM - 3:08 PM, while ID 1601276 sent 172 messages at 2:52 PM - 3:21 PM. From the pattern seen in the graph, these 3 IDs seem to "take turns" sending large numbers of messages to the information center. Therefore, it can be hypothesized that these IDs are park security staff.
The link chart above shows the communications to ID 839736 on Sunday 11:30 AM - 12:30 PM. It can be seen from the chart that during that period, this ID received a lot of messages from many people in the park.
2. Period when there is a spike in the volume of communication
- Communications to ID 839736 (Sunday)
There are two spikes in the volume of communication to the information center on Sunday. The first is a very high volume of communication at around 12:00 PM - 12:30 PM, reaching more than 1,500 incoming messages in a minute. The second is at around 2:40 PM - 2:56 PM, with up to 400 incoming messages in a minute.
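Spikes like these can be located by counting messages per minute and flagging the minutes that exceed a threshold. The following is a minimal sketch with hypothetical names, data, and threshold; the charts in this report were produced with Tableau, not with this code.

```python
from collections import Counter
from datetime import datetime

def per_minute_counts(timestamps):
    """Count messages per calendar minute."""
    return Counter(t.replace(second=0, microsecond=0) for t in timestamps)

def spike_minutes(counts, threshold):
    """Return, in time order, the minutes whose count exceeds `threshold`."""
    return sorted(minute for minute, n in counts.items() if n > threshold)

# Hypothetical arrival times on a placeholder date, for illustration only.
base = datetime(2000, 1, 1, 12, 0)
timestamps = (
    [base.replace(minute=0, second=s) for s in range(3)]     # 3 messages at 12:00
    + [base.replace(minute=1, second=s) for s in range(40)]  # 40 messages at 12:01
    + [base.replace(minute=2, second=s) for s in range(5)]   # 5 messages at 12:02
)
counts = per_minute_counts(timestamps)
spikes = spike_minutes(counts, threshold=10)
```

A fixed threshold is the simplest choice; a threshold derived from the typical per-minute volume on Friday and Saturday would be a reasonable alternative.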
- Communications to ID External (Sunday)
There is also a clear spike in the volume of communication to the external party on Sunday at 11:45 AM - 11:59 AM, just before the spike to the information center occurred.
3. Location where the spike in the volume of communication comes from
For the first spike in the volume of communication to the information center, at 12:00 PM - 12:30 PM, 21,877 of 22,205 messages (98.52%) came from Wet Land. For the second spike to the information center, at 2:40 PM - 2:56 PM, 3,113 of 3,836 messages (81.15%) came from Coaster Alley. For the spike in the volume of communication to the external party, at 11:45 AM - 11:59 AM, 4,730 of 4,984 messages (94.90%) came from Wet Land.
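The percentages above follow directly from the reported message counts. The short check below recomputes them; the counts are taken from the paragraph above, while the dictionary keys and function name are only illustrative labels.

```python
def share_percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to 2 decimals."""
    return round(part / total * 100, 2)

# (messages from the dominant location, all messages in the period)
reported = {
    "info center, first spike (Wet Land)": (21877, 22205),
    "info center, second spike (Coaster Alley)": (3113, 3836),
    "external, spike (Wet Land)": (4730, 4984),
}

shares = {label: share_percent(part, total)
          for label, (part, total) in reported.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.2f}%")
```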
4. Volume of communication by location
Looking at the volume of communication by location, the graph shows some interesting patterns. First, there is a spike in communication volume on Sunday at around 12:00 PM coming from two locations, Entry Corridor and Wet Land. Second, the spikes coming from Coaster Alley always occur at the same times of day, 11:00 AM and 4:00 PM. The only exception is Sunday at 4:00 PM, when the event was probably cancelled after the crime happened.
When the vandalism was discovered
From the communication patterns, the vandalism was probably discovered on Sunday at around 11:45 AM - 12:00 PM by visitors at Wet Land, as can be seen from the spike in communication volume to External. After that, visitors started sending messages to the information center, which resulted in the spike in communication volume to ID 839736 at around 12:00 PM - 12:30 PM. The park security staff also sent many messages to the information center starting from 12:00 PM.
Software Used
- Tableau 10.0 - Used for data visualization
- JMP Pro - Used for data preparation and analysis
- Gephi - Used for visualization of communication pattern