ISSS608 2016-17 T1 Assign3 Kuar Kah Ling

From Visual Analytics and Applications
Revision as of 23:55, 23 October 2016 by Klkuar.2016 (talk | contribs)

Overview

DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events.

One event last year was a weekend tribute to Scott Jones, the internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.

While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, how these patterns change and evolve over time, and what can be understood about the motivations for changing patterns.


The Task

Using the in-app communication data over the three days of the Scott Jones celebration, visual analytics is applied to answer the following questions:

  1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs
    1. Characterize the communication patterns you see.
    2. Based on these patterns, what do you hypothesize about these IDs? Note: Please limit your response to no more than 4 images and 300 words.
  2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words.
  3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale. Note: Please limit your response to no more than 3 images and 300 words.


Data

Park movement and communication data over three days of the Scott Jones celebration were provided. Additionally, the DinoFun World Park map and webpages were available for reference. The park movement data had 26,021,962 rows and included the following fields:

  • Timestamp: Date and time of the movement activity
  • ID: ID of park-goer
  • Type: Check-in or movement, where a check-in means the park-goer joined in the attraction queue and movement means general movement within the park.
  • X: X-coordinate of where the movement type was recorded.
  • Y: Y-coordinate of where the movement type was recorded.

The communication data had 4,153,329 rows and included the following fields:

  • Timestamp: Date and time of message when sent, in the format yyyy/MM/dd hh:mm:ss AM/PM
  • From: ID of park-goer who sent out the message
  • To: ID of park-goer who received the message
  • Location: Areas within the DinoFun World Park.
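The timestamp format described above can be parsed with Python's standard library. This is a minimal sketch, assuming the text matches the stated pattern yyyy/MM/dd hh:mm:ss AM/PM exactly; the helper name `parse_timestamp` and the sample value are illustrative and not taken from the data.

```python
from datetime import datetime

# Assumed strptime pattern for "yyyy/MM/dd hh:mm:ss AM/PM" (12-hour clock).
TIMESTAMP_FORMAT = "%Y/%m/%d %I:%M:%S %p"


def parse_timestamp(text):
    """Convert one timestamp string into a datetime object."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


# Illustrative value only; it is not a row from the provided files.
sample = parse_timestamp("2014/06/08 12:03:00 PM")
print(sample.hour, sample.minute)
```

Parsing into `datetime` objects makes later grouping by hour or by 5-minute interval straightforward.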


Data Preparation & Exploration

We will mainly be using the communication data to answer the questions.

  1. The communication data for the 3 days are combined in JMP using the ‘Concatenate’ function.
  2. Recode ‘external’ to ‘0’ under column ‘To’.
  3. Rename ‘From’ and ‘To’ to ‘Source’ and ‘Target’ respectively.
  4. Change the modelling type of ‘Source’ and ‘Target’ from continuous to nominal.
  5. Reorder the columns to ‘Source’, ‘Target’, ‘Location’, ‘Timestamp’. The resulting table is used as the ‘Edge’ table for subsequent network visualisation.
  6. Using the table above, create a ‘Nodes’ table by copying the ‘Source’ column to a new data table and renaming the copied column to ‘ID’.
  7. Export both data tables and save the files in CSV format.
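For readers without JMP, the preparation steps above could be approximated in plain Python. This is only a sketch under stated assumptions: the function names (`build_edges`, `build_nodes`, `to_csv`) and the toy rows are hypothetical, IDs are kept as strings to mimic the nominal modelling type, and the node list is de-duplicated, which the steps above do not state explicitly.

```python
import csv
import io

EDGE_COLUMNS = ["Source", "Target", "Location", "Timestamp"]


def build_edges(daily_tables):
    """Concatenate the daily tables, recode 'external' to '0',
    and rename From/To to Source/Target in the target column order."""
    edges = []
    for table in daily_tables:
        for row in table:
            edges.append({
                "Source": row["From"],
                "Target": "0" if row["To"] == "external" else row["To"],
                "Location": row["Location"],
                "Timestamp": row["Timestamp"],
            })
    return edges


def build_nodes(edges):
    """Copy the Source column into an ID column.
    De-duplication is an assumption added in this sketch."""
    unique_ids = dict.fromkeys(edge["Source"] for edge in edges)
    return [{"ID": node_id} for node_id in unique_ids]


def to_csv(rows, columns):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy rows with made-up IDs and placeholder timestamps, for illustration only.
friday = [{"Timestamp": "t1", "From": "101", "To": "202", "Location": "Wet Land"}]
saturday = [{"Timestamp": "t2", "From": "101", "To": "external", "Location": "Wet Land"}]

edges = build_edges([friday, saturday])
nodes = build_nodes(edges)
edge_csv = to_csv(edges, EDGE_COLUMNS)
node_csv = to_csv(nodes, ["ID"])
print(edge_csv)
print(node_csv)
```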

Using JMP’s ‘Distribution’ function, we could identify 2 IDs with high volumes of communication, i.e. ID 839736 and ID 1278894. It also shows that many park-goers communicate with external contacts, which are denoted as ID number ‘0’. The majority of the communication came from Wet Land, and the communication patterns over the first 2 days (Friday and Saturday) are similar. The 11am-12pm hour on Sunday had a sudden spike in communication, which could be related to the crime committed in the park.
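The idea behind the distribution check, counting how often each ID appears as sender or receiver, can be sketched with a frequency counter. The function name `message_volume` and the toy edge list are hypothetical, and JMP's ‘Distribution’ platform produces richer output than this simple tally.

```python
from collections import Counter


def message_volume(edges):
    """Count how many messages each ID sent or received."""
    volume = Counter()
    for source, target in edges:
        volume[source] += 1
        volume[target] += 1
    return volume


# Toy (Source, Target) pairs with made-up IDs; "0" stands for external contacts.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "0"), ("B", "0"),
    ("C", "A"), ("B", "A"), ("D", "0"), ("D", "0"),
]

volume = message_volume(toy_edges)
top_ids = [node_id for node_id, _ in volume.most_common(2)]
print(top_ids)
```

IDs whose counts sit far above the rest are the candidates to examine more closely.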


Task 1

Using Tableau, we can zoom into the communication patterns of ID 839736 and ID 1278894.

ID 839736 - DinoFun's alert/help service ID

Communication starts around 8am and ends around 11.30pm daily. On Jun 6 and 7, the number of messages from ID 839736 ranged from 1 to 4. On Jun 8, however, it ranged from 1 to 35, peaking at 12.03pm. Likewise, the number of messages to ID 839736 ranged from 1 to 4 on Jun 6 and 7, but from 1 to 39 on Jun 8, peaking at 12.00pm. The low number of communications, together with its long, regular hours, suggests that this ID is DinoFun's alert/help service ID.

ID 1278894 - Cindysaurus Trivia Game (based on the low take-up rate)

Communication from ID 1278894 runs from 12 noon to 8.55pm daily over the 3 days. The pattern is very regular, with messages sent at 5-minute intervals from 12 noon to 12.55pm, 2 to 2.55pm, 4 to 4.55pm, 6 to 6.55pm and 8 to 8.55pm. On Jun 6 and 7, there is always a large dip in the number of messages sent from this ID between 2.40pm and 2.55pm, followed by a jump in the number of messages between 4 and 4.05pm. The number of messages ranged from 490 to 713 on Jun 6, from 897 to 1298 on Jun 7, and from 1011 to 1475 on Jun 8.

Communication to ID 1278894 is similar in its timing. There are 5 distinct periods of communication: approximately 12 to 1pm, 2 to 3pm, 4 to 5pm, 6 to 7pm and 8 to 9pm. The number of messages ranged from 1 to 42 (Jun 6), 1 to 84 (Jun 7) and 1 to 48 (Jun 8). A distinct increase in messages to ID 1278894 on Jun 6 and 7 is observed between the 3pm and 4pm hours.

This could be DinoFun's Cindysaurus Trivia Game ID: it sends out questions to visitors at 5-minute intervals and only those who are interested reply, which explains the low response ratio.
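The 5-minute pattern and the response ratio discussed above can be illustrated by binning messages into 5-minute intervals and comparing the totals sent and received. This is a hedged sketch rather than the Tableau workflow used here: the names `five_minute_bin` and `bin_counts`, the placeholder ID "BOT" and the toy messages are all made up for illustration.

```python
from collections import Counter
from datetime import datetime


def five_minute_bin(ts):
    """Floor a datetime to the start of its 5-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def bin_counts(messages, focus_id):
    """Count messages sent from and received by focus_id per 5-minute bin."""
    sent, received = Counter(), Counter()
    for ts, source, target in messages:
        interval = five_minute_bin(ts)
        if source == focus_id:
            sent[interval] += 1
        if target == focus_id:
            received[interval] += 1
    return sent, received


# Toy (timestamp, Source, Target) messages; all values are illustrative.
toy_messages = [
    (datetime(2014, 6, 6, 12, 0, 10), "BOT", "u1"),
    (datetime(2014, 6, 6, 12, 1, 0), "BOT", "u2"),
    (datetime(2014, 6, 6, 12, 3, 30), "u1", "BOT"),
    (datetime(2014, 6, 6, 12, 5, 0), "BOT", "u1"),
    (datetime(2014, 6, 6, 12, 7, 0), "BOT", "u2"),
]

sent, received = bin_counts(toy_messages, "BOT")
response_ratio = sum(received.values()) / sum(sent.values())
print(dict(sent), dict(received), response_ratio)
```

A ratio well below 1 means that far fewer messages come back than go out, which is the pattern attributed to the trivia game ID.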