ISSS608 2016-17 T1 Assign3 Meenakshi

From Visual Analytics and Applications

Abstract

DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events.

One event last year was a weekend tribute to Scott Jones, internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.

While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, how patterns change and evolve over time, and what can be understood about the motivations behind changing patterns.

Problem and motivation

The in-app communication data over the three days of the Scott Jones celebration includes communications between the paying park visitors, as well as communications between the visitors and park services. The data also contains records indicating if and when the user sent a text to an external party.
Using visual analytics, we analyze the available data to address the tasks below.

  1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs
    1. Characterize the communication patterns you see.
    2. Based on these patterns, what do you hypothesize about these IDs? Note: Please limit your response to no more than 4 images and 300 words.
  2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words.
  3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale. Note: Please limit your response to no more than 3 images and 300 words.

Tools Used

Tableau version 10.0
JMP Pro 12.2
Gephi 0.9.1

Approach

Data Cleaning and Preparation

Most of the analysis uses the communication data, so we prepare it first. The communication data comes as three CSV files, one each for Friday, Saturday and Sunday.

  1. Open the three CSV files in JMP. Using the Table Concatenate function, join all the records and save the result as a JMP table. The combined table contains 4,153,329 records.
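For readers without JMP, the concatenation step can be sketched in plain Python. This is a minimal sketch, not the procedure used above: it assumes the three day files share one header, and the column names `Timestamp`, `from`, `to`, `location` and the rows below are made-up stand-ins for the real files.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO


def concatenate_csv(sources: Iterable[TextIO]) -> List[Dict[str, str]]:
    """Stack the rows of several CSV files that share the same header."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for src in sources:
        reader = csv.DictReader(src)
        fields = list(reader.fieldnames or [])
        # Refuse to stack files whose columns differ, rather than misalign them.
        if header is None:
            header = fields
        elif fields != header:
            raise ValueError(f"column mismatch: {fields} != {header}")
        rows.extend(reader)
    return rows


# Made-up in-memory stand-ins for the Friday, Saturday and Sunday files.
HEADER = "Timestamp,from,to,location\n"
friday = io.StringIO(HEADER + "2014-06-06 12:00:00,1001,1002,Entry Corridor\n")
saturday = io.StringIO(HEADER + "2014-06-07 12:05:00,1002,1001,Wet Land\n")
sunday = io.StringIO(HEADER + "2014-06-08 11:50:00,1003,external,Wet Land\n")

combined = concatenate_csv([friday, saturday, sunday])
print(len(combined))  # 3
```

With real files, each source would be opened with `open(path, newline="")` and passed in the same way; the header check is there so that a file with reordered or renamed columns fails loudly instead of silently producing misaligned records.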

Examine the variables and data types:

  1. The data has four columns: the timestamp of the communication, the sender ID (from), the recipient ID (to), and the location from which the message was sent.
  2. Change the data type of the from column from Numeric Continuous to Numeric Nominal.
  3. The to column records external communication as the string "external". Recode this value to 100, then change the data type to Numeric Nominal.
  4. In the movement data for Sunday, two records had missing values for the X, Y coordinates. These were excluded from the analysis.
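The recoding and filtering steps above can be sketched as two small helpers. This is an illustrative sketch only: it assumes the movement data has columns named `X` and `Y`, and the sample values are made up. The integer codes are meant to be treated as category labels (nominal), not as quantities.

```python
from typing import Dict, List, Optional

# The text recodes the string "external" in the 'to' column as 100.
EXTERNAL_CODE = 100


def recode_recipient(value: str) -> int:
    """Return 100 for 'external', otherwise the numeric visitor ID."""
    if value.strip().lower() == "external":
        return EXTERNAL_CODE
    return int(value)


def drop_missing_xy(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep only movement rows where both X and Y hold a value."""
    return [
        r for r in rows
        if (r.get("X") or "").strip() and (r.get("Y") or "").strip()
    ]


print(recode_recipient("external"), recode_recipient("1002"))  # 100 1002
movement = [{"X": "12", "Y": "40"}, {"X": "", "Y": "55"}, {"X": "63", "Y": None}]
print(len(drop_missing_xy(movement)))  # 1
```

Recoding to 100 only works as long as no real visitor ID equals 100, which the analysis implicitly assumes.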

The combined three-day communication table is now ready for visual analysis. It is exported from JMP as a CSV file and imported into Tableau.

Analysis with Tableau

Task 1

On importing the communication data into Tableau, the Timestamp field was not read correctly: the months and days were interchanged. The problem was fixed by changing the system time format to 24-hour and importing a freshly exported CSV file from JMP. Visualizing the message volumes of the various user IDs lets us isolate the high-volume communication IDs.
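The month/day swap is easy to reproduce when a timestamp string is parsed with the wrong field order. The sketch below is hypothetical: the raw string format is an assumption (the real export may differ), and it assumes the event year is 2014, in which 8 June fell on a Sunday. Parsing with an explicit format and checking that every value falls inside the event weekend is one simple way to catch such a swap.

```python
from datetime import datetime

# Hypothetical raw value; the real export format may differ.
RAW = "06/08/2014 12:05:00"

# Month-first reading gives 8 June 2014, the Sunday of the event.
month_first = datetime.strptime(RAW, "%m/%d/%Y %H:%M:%S")
# Day-first reading gives 6 August 2014: the month/day swap seen on import.
day_first = datetime.strptime(RAW, "%d/%m/%Y %H:%M:%S")


def within_event(ts: datetime) -> bool:
    """True if ts falls on Friday 6 June to Sunday 8 June 2014 (assumed year)."""
    return datetime(2014, 6, 6) <= ts < datetime(2014, 6, 9)


print(month_first.date(), day_first.date())  # 2014-06-08 2014-08-06
print(within_event(month_first), within_event(day_first))  # True False
```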

Observations
  1. IDs 1278894 and 839736 stand out for sending the largest number of messages during the three days. Further analysis shows that these IDs are in contact with all the park visitors, so they are most likely park services staff who send out information about park events at various intervals. They also receive messages from most park visitors; these could be visitor questions about rides or events, or requests for other assistance during their visit.
  2. Messages to ID 100 represent external communication. ID 100 has the third-highest number of messages received, which shows that park visitors are quite active in sharing their experiences and park event updates with people or media outside the park.
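Outside Tableau, the same ranking can be sketched by counting messages sent and received per ID and collecting each ID's distinct contacts. This is a minimal sketch on made-up toy data: "S1" stands in for a staff-like ID, and the (timestamp, from, to, location) tuple layout is an assumption about how the records are held in memory.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One message as (timestamp, from_id, to_id, location).
Message = Tuple[str, str, str, str]


def volume_summary(messages: Iterable[Message], top_n: int = 3):
    """Return the top senders, the top recipients, and each ID's set of contacts."""
    sent: Counter = Counter()
    received: Counter = Counter()
    contacts: Dict[str, Set[str]] = defaultdict(set)
    for _ts, src, dst, _loc in messages:
        sent[src] += 1
        received[dst] += 1
        # Record the pair in both directions to count distinct partners per ID.
        contacts[src].add(dst)
        contacts[dst].add(src)
    return sent.most_common(top_n), received.most_common(top_n), contacts


# Made-up toy data: "S1" plays a staff-like ID, "100" is the external recipient.
toy: List[Message] = [
    ("t1", "S1", "A", "Entry Corridor"),
    ("t2", "S1", "B", "Entry Corridor"),
    ("t3", "S1", "C", "Entry Corridor"),
    ("t4", "A", "S1", "Wet Land"),
    ("t5", "B", "100", "Wet Land"),
    ("t6", "C", "100", "Wet Land"),
]
top_sent, top_received, contacts = volume_summary(toy)
print(top_sent[0], top_received[0], sorted(contacts["S1"]))
# ('S1', 3) ('100', 2) ['A', 'B', 'C']
```

Comparing `len(contacts[id])` with the total number of visitor IDs is one way to check a claim such as "in contact with all the park visitors".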

The graphs show that IDs 1278894 and 839736 send messages only from the Entry Corridor but receive messages from all locations in the park. From this pattern, we hypothesize that they are indeed park services staff.
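The send-versus-receive location contrast can be sketched by collecting, for one ID, the locations it sends from and the locations its incoming messages are sent from. Because the location column records where a message was sent, the "receive" set reflects the senders' locations. The toy records and the tuple layout are assumptions for illustration.

```python
from typing import Iterable, Set, Tuple

# One message as (timestamp, from_id, to_id, location); location is where the
# message was sent from.
Message = Tuple[str, str, str, str]


def location_profile(messages: Iterable[Message], user_id: str) -> Tuple[Set[str], Set[str]]:
    """Locations an ID sends from, and locations its incoming messages are sent from."""
    send_locs: Set[str] = set()
    recv_locs: Set[str] = set()
    for _ts, src, dst, loc in messages:
        if src == user_id:
            send_locs.add(loc)
        if dst == user_id:
            recv_locs.add(loc)
    return send_locs, recv_locs


# Made-up toy data for a staff-like ID "S1".
toy = [
    ("t1", "S1", "A", "Entry Corridor"),
    ("t2", "S1", "B", "Entry Corridor"),
    ("t3", "A", "S1", "Wet Land"),
    ("t4", "B", "S1", "Entry Corridor"),
]
sends, receives = location_profile(toy, "S1")
print(sorted(sends), sorted(receives))  # ['Entry Corridor'] ['Entry Corridor', 'Wet Land']
```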

Task 2

Identifying communication patterns

  • Pattern for ID 1278894

This park services staff ID sends out messages every alternate hour from 12 PM to 9 PM, in bursts every 5 minutes. It also receives messages from park visitors every hour between 12 PM and 10 PM. These may be messages about park events or fun games, to which visitors respond. For instance, the DinoFun World website mentions the Cindysaurus Trivia Game, and these messages could be related to it.
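Both features of this pattern, active hours and regular bursts, can be checked with two small summaries: messages per clock hour and the gaps between consecutive messages. This is a sketch on made-up send times for a single ID; a run of 5-minute gaps separated by long gaps would indicate bursts in alternate hours.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def hourly_counts(times: Iterable[datetime]) -> Counter:
    """Number of messages sent in each clock hour (0-23)."""
    return Counter(t.hour for t in times)


def gaps_minutes(times: Iterable[datetime]) -> List[float]:
    """Minutes between consecutive messages, in time order."""
    ordered = sorted(times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


# Made-up send times for one ID on a single day.
day = datetime(2014, 6, 6)
sends = [day.replace(hour=h, minute=m) for h, m in [(12, 0), (12, 5), (12, 10), (14, 0), (14, 5)]]
print(sorted(hourly_counts(sends).items()))  # [(12, 3), (14, 2)]
print(gaps_minutes(sends))  # [5.0, 5.0, 110.0, 5.0]
```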

  • Pattern for ID 839736

There were two peaks in the communication pattern for staff ID 839736. The first is on Sunday, 8 June, between 12 PM and 12:30 PM, with messages sent from the Entry Corridor. The messages received by this ID also peak between 12 PM and 12:30 PM and were sent from Wet Land. Activity then drops, and a second, smaller peak occurs between 2:45 PM and 3 PM.
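Peaks like these can be located by rounding each timestamp down to a fixed-width window and counting messages per window. The sketch below uses 15-minute windows and made-up times; the window width should divide 60 so that windows align with the hour, and an empty input would raise an `IndexError`.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def floor_to_window(t: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its window (minutes should divide 60)."""
    return t.replace(minute=t.minute - t.minute % minutes, second=0, microsecond=0)


def peak_window(times: Iterable[datetime], minutes: int = 15) -> Tuple[datetime, int]:
    """Window start with the most messages, and its message count."""
    counts = Counter(floor_to_window(t, minutes) for t in times)
    return counts.most_common(1)[0]


# Made-up message times on Sunday 8 June 2014 (assumed year).
sunday = datetime(2014, 6, 8)
times = [sunday.replace(hour=h, minute=m) for h, m in [(12, 1), (12, 5), (12, 9), (12, 20), (14, 50)]]
start, count = peak_window(times)
print(start.strftime("%H:%M"), count)  # 12:00 3
```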

  • External Communication

Looking at external communication over the three days, park visitors send out fewer than 50 messages per minute on average. However, a peak in sent messages is observed on Sunday, 8 June, between 11:45 AM and 12 PM. Filtering on location shows that most of these messages were sent from Wet Land.
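The per-minute view of external messages, with an optional location filter, can be sketched as follows. It assumes the `to` column has already been recoded so that external messages carry 100, and the toy records are made up. The mean here is taken over minutes that contain at least one external message; the averaging used in the Tableau view may be defined differently.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

EXTERNAL_CODE = 100  # 'to' value after recoding "external" to 100

# One message as (timestamp, from_id, to_id, location).
Message = Tuple[datetime, int, int, str]


def external_per_minute(messages: Iterable[Message], location: Optional[str] = None) -> Counter:
    """Count external messages per minute, optionally for one location only."""
    counts: Counter = Counter()
    for ts, _src, dst, loc in messages:
        if dst != EXTERNAL_CODE:
            continue
        if location is not None and loc != location:
            continue
        counts[ts.replace(second=0, microsecond=0)] += 1
    return counts


# Made-up messages on Sunday 8 June 2014 (assumed year).
sun = datetime(2014, 6, 8)
msgs = [
    (sun.replace(hour=10, minute=0), 1001, 100, "Entry Corridor"),
    (sun.replace(hour=11, minute=46, second=10), 1002, 100, "Wet Land"),
    (sun.replace(hour=11, minute=46, second=30), 1003, 100, "Wet Land"),
    (sun.replace(hour=11, minute=46, second=50), 1004, 100, "Wet Land"),
    (sun.replace(hour=11, minute=47, second=5), 1005, 100, "Wet Land"),
    (sun.replace(hour=11, minute=47, second=9), 1005, 1002, "Wet Land"),  # not external
]
all_minutes = external_per_minute(msgs)
mean_active = sum(all_minutes.values()) / len(all_minutes)  # mean over minutes with traffic
peak_minute, peak_count = all_minutes.most_common(1)[0]
print(round(mean_active, 2), peak_minute.strftime("%H:%M"), peak_count)  # 1.67 11:46 3
print(sum(external_per_minute(msgs, "Wet Land").values()))  # 4
```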

