ISSS608 2016-17 T1 Assign3 Vaishnavi AMS
Abstract
DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events.
One event last year was a weekend tribute to Scott Jones, internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.
While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, how patterns change and evolve over time, and what can be understood about the motivations behind changing patterns.
The Task
Using the in-app communication data over the three days of the Scott Jones celebration, visual analytics is applied to solve the crime and discover patterns in the crowd that will help officials better prepare themselves for such future events.
Questions for investigation
1. Identify the IDs that stand out for their large volumes of communication and describe the characteristics of their communication patterns.
2. Identify up to 10 communications patterns in the data and characterize who is communicating, with whom, when and where. We will aim to prioritize those patterns that are most likely to relate to the crime.
3. From this data, hypothesize when the vandalism was discovered.
Tools utilized
- Microsoft Excel 2016 – Data cleaning and data preparation
- JMP Pro 12 – Data cleaning and data preparation
- Tableau 10.0 – Data visualization and analysis
- NodeXL – Data visualization and analysis of communication data
- Gephi – Data visualization and analysis of communication data
Approaches
Data preparation: Examine the data and make appropriate changes wherever necessary using Excel and JMP Pro 12 to make the data fit for analysis. Extract selected data for analysis.
Data visualization and Analysis: Construct Network graphs and heat maps to examine the underlying insights and patterns and draw conclusions.
Data Visualization and Analysis
Question 1
In the first question we aim to identify the IDs that stand out among all IDs in the park over the course of the three days. We answer this by aggregating the total number of messages sent from each ID and the total number of messages received by each ID. The aggregation was done in JMP Pro through the Tables > Summary function, and the results were visualized in Tableau as bubble charts.
- The bubble chart of messages sent shows the IDs that send out messages. Among these, two IDs stand out: ID 1278894 and ID 839736.
- The bubble chart of messages received shows the IDs that receive the most messages. In addition to the two IDs observed in the previous chart (ID 1278894 and ID 839736), the external ID also receives a large share of the messages sent each day.
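The per-ID aggregation described above can also be sketched outside JMP. The following is a minimal Python illustration, not the workflow used in this assignment. The record layout (sender, recipient) and the toy rows are assumptions made for the example; only the IDs 1278894, 839736 and "external" come from the analysis, and the counts are not real data.

```python
from collections import Counter

# Toy (sender, recipient) pairs. The layout and the rows are assumptions
# for illustration; they are not taken from the real communication data.
messages = [
    ("1278894", "101"),
    ("1278894", "102"),
    ("1278894", "103"),
    ("101", "839736"),
    ("102", "839736"),
    ("103", "839736"),
    ("839736", "101"),
    ("103", "external"),
]


def message_totals(rows):
    """Return two Counters: messages sent per ID and messages received per ID."""
    sent = Counter(sender for sender, _ in rows)
    received = Counter(recipient for _, recipient in rows)
    return sent, received


sent, received = message_totals(messages)

# The IDs with the largest totals are the ones that would appear as the
# biggest bubbles in a chart sized by message count.
top_sender = sent.most_common(1)[0]
top_receiver = received.most_common(1)[0]
print("Top sender:", top_sender)
print("Top receiver:", top_receiver)
```

Sorting each Counter by count gives the same ranking that the bubble sizes convey visually, which makes it easy to cross-check the IDs picked out from the charts.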
Drilling down into ID 839736 on an hourly basis for each day reveals some interesting patterns:
- Almost all the IDs that visit the park communicate with this ID, and messages are sent to and received back from this ID within a short interval. This could be a help desk or Information center of DinoFun World, where people send queries and receive answers in reply.
- The number of messages this ID receives peaks at 9 AM, between 2 and 3 PM, around 4 PM and around 6 PM.
- At these peaks, the number of messages the ID receives exceeds the number of messages it sends out.
- More people could be querying the Information center because of the shows being held for the Scott Jones celebration.
- We know that two shows were held at the Creighton Pavilion every day for the Scott Jones celebration. We can assume that the first show was held around 9 to 10 AM and the second show around 3 to 4 PM.
- ID 1278894 has a very different pattern from the Information center ID. There is a very significant difference between the number of messages sent and the number received by this ID.
- The number of messages sent is much higher than the number of messages received.
- The messages from this ID are sent out every 5 minutes within an hour. The ID then starts receiving messages from the IDs it has messaged.
- Once messages have been sent out during an hour, no messages are sent out in the next hour.
- From the characteristics above, we can assume that this ID could be the Cindysaurus trivia game. Details about this game can be found on the park website.
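The hourly drill-down used for both IDs can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tableau view used in the analysis: the record layout (day, clock time as "HH:MM", sender, recipient), the helper name hourly_profile and the toy rows are all hypothetical.

```python
from collections import defaultdict

# Toy records: (day, "HH:MM", sender, recipient). The layout and values are
# assumptions for illustration; they are not real communication data.
rows = [
    ("Fri", "09:00", "1278894", "101"),
    ("Fri", "09:05", "1278894", "102"),
    ("Fri", "09:10", "1278894", "103"),
    ("Fri", "09:12", "101", "1278894"),
    ("Fri", "10:30", "102", "1278894"),
]


def hourly_profile(records, target):
    """Count messages sent and received by `target` for each (day, hour)."""
    profile = defaultdict(lambda: {"sent": 0, "received": 0})
    for day, clock, sender, recipient in records:
        hour = int(clock.split(":")[0])
        if sender == target:
            profile[(day, hour)]["sent"] += 1
        if recipient == target:
            profile[(day, hour)]["received"] += 1
    return dict(profile)


profile = hourly_profile(rows, "1278894")
for (day, hour), counts in sorted(profile.items()):
    print(day, hour, counts)
```

Comparing the sent and received counts hour by hour is what separates the two behaviours described above: an ID whose received count exceeds its sent count at the peaks, versus an ID that sends far more than it receives and goes quiet in alternate hours.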
Question 2
Observation 1
- Comparing the Information center (ID 839736) communication pattern on an hourly basis across Friday, Saturday and Sunday, we can observe that the patterns on Friday and Saturday are similar, with peaks at 9 AM, 12 PM, 2 PM, 4 PM and 8 PM in both messages sent and messages received.
- On Sunday, however, the communication data spikes primarily at two points: one at 12 PM and the other around 3 PM. This could mean that a large number of people were communicating with the Information center regarding an issue. We can assume that the vandalism was discovered around this time and that people were reporting it to, or inquiring about it with, the Information center. The second spike could be due to the second show of the day being closed because of the vandalism; people weren't aware of the show being cancelled and were
Observation 2
Observation 3
Observation 4
Observation 5
Observation 6
Observation 7
Question 3
Results