ISSS608 2016-17 T1 Assign3 Ye Jiatao

From Visual Analytics and Applications

Overview

DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events. One event last year was a weekend tribute to Scott Jones, an internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past. While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, as well as how patterns change and evolve over time, and what can be understood about motivations for changing patterns.

The Task

In this case, we need to answer the following questions using the in-app communication and visitor movement data:

  1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs
    1. Characterize the communication patterns you see.
    2. Based on these patterns, what do you hypothesize about these IDs? Note: Please limit your response to no more than 4 images and 300 words.
  2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words.
  3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale. Note: Please limit your response to no more than 3 images and 300 words.


Data Set

  1. DinoFunWorld_CommData.zip consists of in-app communication data over the three days of the Scott Jones celebration.
  2. DinoFunWorld_MoveData.zip consists of three days of park movement data. The park movement datasets are in CSV format.
  3. DinoFunWorld_LayoutMap.zip consists of a JPG file of the park layout map.
  4. DinoFunWorld_Website.zip consists of webpages from the DinoFun World park website.

Approaches

Q1

Identify those IDs that stand out for their large volumes of communication. For each of these IDs

  1. Characterize the communication patterns you see.
  2. Based on these patterns, what do you hypothesize about these IDs?
Y 01.jpg


Y 02.jpg

From the network figure above, three IDs stand out for their large volume of communication: 1278894, 839736 and external. Because external represents messages sent outside the park, in this step we focus only on IDs 1278894 and 839736.
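
A minimal Python sketch of how high-volume IDs could be ranked, assuming the communication CSV has `from` and `to` columns (an assumption about the file layout; adjust the names to the actual header). Each message is counted once for its sender and once for its receiver, so `external` is ranked like any other ID and can be set aside afterwards.

```python
import csv
from collections import Counter


def load_pairs(path):
    """Yield (sender, receiver) pairs from a communication CSV.

    The column names "from" and "to" are assumptions about the file layout.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row["from"], row["to"]


def communication_volume(pairs):
    """Count messages sent plus messages received for every ID."""
    volume = Counter()
    for sender, receiver in pairs:
        volume[sender] += 1
        volume[receiver] += 1
    return volume


def top_ids(pairs, n=3):
    """Return the n IDs with the largest total communication volume."""
    return [id_ for id_, _ in communication_volume(pairs).most_common(n)]


if __name__ == "__main__":
    # Toy records for illustration only, not the challenge data.
    toy = [("1278894", "a"), ("1278894", "b"), ("c", "839736"),
           ("d", "839736"), ("1278894", "e")]
    print(top_ids(toy, n=2))  # ['1278894', '839736']
```

On the real files, `top_ids(load_pairs(path))` would give the candidates to compare against the network figure.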

Y 03.jpg


Y 5.jpg


  • ID 1278894
  1. There is no connection among IDs 1278894, 839736 and external; that is, these three accounts have not sent messages to or received messages from each other.
  2. There is a clear cyclical messaging pattern for ID 1278894 over these three days. Starting at 12:00 each day, this ID sends out a large number of messages every 5 minutes for an hour, waits for another hour, and then repeats the 5-minute messaging in the next hour.
  3. This cycle repeats five times a day, from 12:00 until 20:00 in the evening.
  4. The communication volume increases steadily from Friday to Sunday, which suggests that more and more people visited the park over this period.
  5. This ID only has records of sending messages from the Entry Corridor.

By now, we can hypothesize that ID 1278894 is an account for a park employee who is always at the entry. Because the number of messages sent by this account remains relatively steady within a day, we can assume that this ID sends messages to all visitors in the park about attraction opening information or the Cindysaurus trivia game mentioned on the park's official website.
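
The on/off rhythm described for ID 1278894 can be checked by flooring each message timestamp to a 5-minute bin and listing the hours that contain any sending activity. This is a sketch assuming the messages have already been parsed into `(timestamp, sender)` pairs with `datetime` timestamps; the helper names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def five_minute_counts(messages, sender_id):
    """Count messages sent by `sender_id` in each 5-minute bin.

    `messages` is an iterable of (timestamp, sender) pairs, where timestamp
    is a datetime; this record layout is an assumption about the parsed CSV.
    """
    counts = Counter()
    for ts, sender in messages:
        if sender == sender_id:
            # Floor the timestamp to the start of its 5-minute bin.
            start = ts.replace(minute=ts.minute - ts.minute % 5,
                               second=0, microsecond=0)
            counts[start] += 1
    return counts


def active_hours(counts, day):
    """Return the sorted hours of `day` that contain at least one sending bin."""
    return sorted({start.hour for start in counts if start.date() == day})


if __name__ == "__main__":
    # Toy timestamps for illustration only.
    toy = [(datetime(2014, 6, 6, 12, 0, 10), "1278894"),
           (datetime(2014, 6, 6, 12, 5, 3), "1278894"),
           (datetime(2014, 6, 6, 13, 30, 0), "someone_else"),
           (datetime(2014, 6, 6, 14, 0, 0), "1278894")]
    counts = five_minute_counts(toy, "1278894")
    print(active_hours(counts, date(2014, 6, 6)))  # [12, 14]
```

If the rhythm holds on the real data, every other hour from 12:00 onwards would show up in this list.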

Y 04.jpg


  • ID 839736
  1. The messaging pattern of ID 839736 is not as clear as that of ID 1278894. It sent and received up to 25 messages per minute from 8:00 am to 11:30 pm each day, except on Sunday, when an abnormal communication peak appeared, reaching around 1,400 messages in one minute: received at 12:00 and sent at 12:03.
  2. The sending and receiving patterns of this ID are correlated in time: its sending closely follows its receiving, which suggests that it responds to inquiries or questions manually.
  3. This ID also only has records in the Entry Corridor.

From the evidence above, we can confidently hypothesize that this ID belongs to another park employee who is in charge of emergency events and responding to visitors' inquiries. It seems that something urgent happened at 12:00 pm on Sunday, which caused the abnormal communication peak for this ID.
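
The 12:00 receiving peak and 12:03 sending peak for ID 839736 amount to a 3-minute lag between the two busiest minutes. A sketch of that measurement, assuming the received and sent timestamps for the ID have already been separated into two lists of `datetime` objects (the function names are illustrative):

```python
from collections import Counter
from datetime import datetime


def per_minute(timestamps):
    """Count timestamps per minute, truncating the seconds."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def peak_minute(timestamps):
    """Return (minute, count) for the busiest minute."""
    return max(per_minute(timestamps).items(), key=lambda item: item[1])


def peak_lag_minutes(received, sent):
    """Minutes from the busiest receiving minute to the busiest sending minute."""
    recv_minute, _ = peak_minute(received)
    sent_minute, _ = peak_minute(sent)
    return (sent_minute - recv_minute).total_seconds() / 60


if __name__ == "__main__":
    # Toy timestamps for illustration only.
    received = [datetime(2014, 6, 8, 11, 58, 5)] + [
        datetime(2014, 6, 8, 12, 0, s) for s in (1, 20, 45)]
    sent = [datetime(2014, 6, 8, 12, 3, s) for s in (2, 30, 59)] + [
        datetime(2014, 6, 8, 12, 10, 0)]
    print(peak_lag_minutes(received, sent))  # 3.0
```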

Q2

Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

Y 06.jpg


  • Pattern 1: There are two peaks in sent messages in Coaster Alley, at 11:00 am and 4:00 pm, on Friday and Saturday, while only one peak appears, at 11:00 am, on Sunday. The park map shows two stages in this area, 32. Creighton Pavilion and 63. Grinosaurus Stage, which are where Scott was scheduled to appear. So far, we can guess that Scott's shows were held in Coaster Alley and that the show times were around 11:00 am and 4:00 pm. The communication peaks in this area may result from people talking about the shows they were watching, notifying their friends that a show was beginning, or sharing photos of the show through the app.
  • Pattern 2: Under normal conditions, there should also be a communication peak around 4:00 pm in Coaster Alley on Sunday. The missing peak indicates that something happened between 11:00 am and 4:00 pm that led to the cancellation of Scott's last show, so we can infer that the crime occurred between 11:00 am and 4:00 pm. There is a high chance that the chaos broke out in the middle of Scott's first show on Sunday.
  • Pattern 3: An unusual peak appears at 12:05 pm in the Entry Corridor. It could mean that more visitors came into the park, but it is more likely that these are visitors who left the park early because of the vandalism. If this hypothesis is true, the crime timing can be narrowed further to 11:00 am to 12:05 pm.
  • Pattern 4: There are two unusual communication peaks, at 11:39 am and 12:00 pm, in Wet Land. The map shows that the only way out from locations 32 and 63 is through Wet Land, so these peaks may result from people leaving the show early because of the vandalism. We can therefore shorten the time of the incident further to 11:00 am to 11:39 am.
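
Patterns 1 to 4 all rest on spotting unusual peaks in an area's message counts. One simple way to flag such peaks, offered as a sketch rather than the method behind the figures above, is to bin messages per area into 5-minute intervals and mark bins whose count exceeds the mean plus `k` standard deviations of that area's bins. The `(timestamp, area)` record layout and the default `k=3` are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def area_counts(messages, bin_minutes=5):
    """Count messages per area and time bin (bin_minutes should divide 60).

    `messages` is an iterable of (timestamp, area) pairs. Only bins that
    contain at least one message are stored.
    """
    counts = defaultdict(Counter)
    for ts, area in messages:
        start = ts.replace(minute=ts.minute - ts.minute % bin_minutes,
                           second=0, microsecond=0)
        counts[area][start] += 1
    return counts


def flag_peaks(series, k=3.0):
    """Return the sorted bins whose count exceeds mean + k * std of `series`."""
    values = list(series.values())
    if len(values) < 2:
        return []
    threshold = mean(values) + k * pstdev(values)
    return sorted(b for b, n in series.items() if n > threshold)


if __name__ == "__main__":
    # Toy series: a flat baseline of 10 messages per bin plus one spike.
    base = datetime(2014, 6, 8, 10, 0)
    series = Counter({base + timedelta(minutes=5 * i): 10 for i in range(20)})
    series[base + timedelta(minutes=100)] = 100
    print(flag_peaks(series))  # [datetime.datetime(2014, 6, 8, 11, 40)]
```

Because empty bins are not stored, the baseline here covers only bins with traffic; filling quiet bins with zeros would give a lower, arguably fairer threshold.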
Y 07.jpg


  • Pattern 5: There is an unusual peak of messages to external from Wet Land at 11:59 am on Sunday, jumping from around 14 to 337 messages within 5 minutes. It may be caused by people who had just evacuated from Coaster Alley because of the vandalism, sending messages to family or friends outside the park about what they had just seen or about the police investigation.
Y 08.JPG


  • Pattern 6: The communication network graph shows many visitor groups whose members message each other frequently. Some groups can be further separated into two sub-groups, which may be people from different families or friend circles. When we take out several groups to test this theory, we can see that these people also travel together in the park.
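
One way to test whether two IDs from the same communication group also travel together, sketched under stated assumptions: the movement data has been parsed into `(timestamp, id, x, y)` records sorted by time, each ID's position is sampled once per minute (the last position seen in that minute), and "together" means within `max_distance` grid cells. The function names and the distance rule are hypothetical choices, not the analysis used here.

```python
from collections import defaultdict
from datetime import datetime


def positions_by_minute(records):
    """Map id -> {minute: (x, y)}, keeping the last position seen in each minute.

    `records` is an iterable of (timestamp, visitor_id, x, y), assumed sorted by time.
    """
    grid = defaultdict(dict)
    for ts, vid, x, y in records:
        grid[vid][ts.replace(second=0, microsecond=0)] = (x, y)
    return grid


def co_location_share(grid, a, b, max_distance=0):
    """Share of minutes observed for both IDs in which they are within
    `max_distance` cells of each other (Chebyshev distance)."""
    common = grid[a].keys() & grid[b].keys()
    if not common:
        return 0.0
    together = sum(
        1 for m in common
        if max(abs(grid[a][m][0] - grid[b][m][0]),
               abs(grid[a][m][1] - grid[b][m][1])) <= max_distance
    )
    return together / len(common)


if __name__ == "__main__":
    # Toy movement records for illustration only.
    recs = [(datetime(2014, 6, 6, 9, 0, 5), "A", 10, 10),
            (datetime(2014, 6, 6, 9, 0, 9), "B", 10, 10),
            (datetime(2014, 6, 6, 9, 1, 0), "A", 11, 10),
            (datetime(2014, 6, 6, 9, 1, 0), "B", 30, 30)]
    print(co_location_share(positions_by_minute(recs), "A", "B"))  # 0.5
```

A share near 1 across a whole day would support the idea that two IDs belong to the same travelling party.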
Y 11.jpg


  • Pattern 7: After removing the park service IDs 1278894 and 839736, Wet Land is the busiest area for sending messages, followed by Tundra Land, because most of the Thrill Rides and Rides for Everyone are located in these two areas. In addition, Scott's memorabilia show in the Pavilion significantly increased the visitor volume in Wet Land.
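
The area ranking in Pattern 7 can be reproduced in outline by dropping every message that involves a park service ID and counting the rest by location. This sketch assumes `(sender, receiver, area)` triples and reads "removing" the service IDs as discarding messages where either end is one of them; counting only by sender would be an equally plausible reading.

```python
from collections import Counter

# Park service IDs identified in Q1.
SERVICE_IDS = frozenset({"1278894", "839736"})


def messages_by_area(messages, exclude=SERVICE_IDS):
    """Count messages per area, skipping any message that involves an excluded ID.

    `messages` is an iterable of (sender, receiver, area) triples.
    """
    return Counter(area for sender, receiver, area in messages
                   if sender not in exclude and receiver not in exclude)


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = [("a", "b", "Wet Land"), ("c", "d", "Wet Land"),
           ("1278894", "a", "Entry Corridor"), ("e", "839736", "Entry Corridor"),
           ("f", "g", "Tundra Land")]
    print(messages_by_area(toy).most_common())
    # [('Wet Land', 2), ('Tundra Land', 1)]
```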
Y 12.jpg


  • Pattern 8: Taking a closer look at the two park official IDs 1278894 and 839736, we find that they are different types of account in terms of their sending and receiving patterns. ID 1278894 tends to start a communication with visitors, while ID 839736 is more likely to respond to a visitor's message. ID 1278894 sends messages following the cyclical pattern discussed above and waits for visitors' responses to decide whether to send the next message. We think this account is used for the app game mentioned on the park's official website, which sends out questions about the park and waits for visitors to answer. ID 839736, by contrast, is more likely an inquiry counter, which manually responds to visitors' questions within 2 to 5 minutes.
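
The initiator/responder distinction can be quantified, as a rough sketch, by checking for each message an account sends whether the same visitor had written to the account shortly before. The `(timestamp, sender, receiver)` layout is assumed, and the 5-minute window is an assumption chosen to match the 2-to-5-minute response time above. A responding account would be expected to score close to 1, and a pure broadcaster close to 0.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def reply_profile(messages, account, window=timedelta(minutes=5)):
    """Describe how `account` communicates.

    `messages` is an iterable of (timestamp, sender, receiver) triples.
    Returns (reply_share, median_lag_minutes): the share of messages sent by
    `account` that follow a message from the same visitor within `window`,
    and the median delay of those replies (None when there are no replies).
    """
    inbound = defaultdict(list)  # visitor -> times they wrote to the account
    outbound = []                # (time, visitor) for messages the account sent
    for ts, sender, receiver in sorted(messages):
        if receiver == account:
            inbound[sender].append(ts)  # appended in time order, so stays sorted
        elif sender == account:
            outbound.append((ts, receiver))
    lags = []
    for ts, visitor in outbound:
        times = inbound.get(visitor, [])
        i = bisect_right(times, ts)  # inbound messages at or before ts
        if i and ts - times[i - 1] <= window:
            lags.append((ts - times[i - 1]).total_seconds() / 60)
    share = len(lags) / len(outbound) if outbound else 0.0
    return share, (median(lags) if lags else None)


if __name__ == "__main__":
    # Toy exchange for illustration only.
    toy = [(datetime(2014, 6, 7, 10, 0), "v1", "839736"),
           (datetime(2014, 6, 7, 10, 3), "839736", "v1"),
           (datetime(2014, 6, 7, 10, 10), "v2", "839736"),
           (datetime(2014, 6, 7, 10, 12), "839736", "v2")]
    print(reply_profile(toy, "839736"))  # (1.0, 2.5)
```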
Y 13.jpg


  • Pattern 9: Digging a bit deeper into the communication clusters, we can see some general patterns within them. Closeness centrality is even across the IDs in each cluster, which means that the IDs inside these groups are closely connected with one another, so they can reach every member of the group easily. This result is expected, because everyone can connect through the app provided by the park. By contrast, some IDs stand out in terms of eccentricity, which is the longest shortest-path distance from an ID to any other ID in its group. In this case, three IDs stand out: 1932021, 16666757 and 514576. We guess that these are the tour guides of these clusters, because team members would naturally have closer relationships with each other than with their tour guides. Although everyone in a group can reach the others easily, some members should communicate less with the tour guides. For example, within a family, it is enough for one family member to stay in contact with the tour guide.
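
For reference, the two measures used in Pattern 9 can be computed on an undirected, unweighted communication graph with breadth-first search: eccentricity is an ID's largest shortest-path distance to any ID it can reach, and closeness is the number of other reachable IDs divided by the sum of their distances. This is a plain-Python sketch; the adjacency-dict representation and function names are illustrative assumptions, not the tool used for the figures.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Build an undirected adjacency dict from (sender, receiver) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def bfs_distances(graph, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def eccentricity(graph, node):
    """Largest shortest-path distance from `node` to any reachable node."""
    return max(bfs_distances(graph, node).values())


def closeness(graph, node):
    """(reachable nodes - 1) / sum of distances; 0.0 for an isolated node."""
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


if __name__ == "__main__":
    # Toy chain a - b - c: the middle node is closest to everyone.
    g = build_graph([("a", "b"), ("b", "c")])
    print(eccentricity(g, "a"), eccentricity(g, "b"))  # 2 1
    print(closeness(g, "b"))  # 1.0
```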


Y 14.jpg


Y 15.JPG


  • Pattern 10: We also found some IDs with only one check-in record. Because everyone entering the park has to check in once, these visitors never went on any ride; they only checked in once at the main gate of the park. We also find that they only communicate with the two park official IDs 839736 and 1278894 and with external. In addition, these visitors only stay in the park for one day. Because they have no check-in record at the stage in Wet Land, we do not consider them suspects in the crime. One reasonable explanation is that some of them are reporters from the local newspaper who came to the park to collect news material.
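
The two filters behind Pattern 10 can be expressed as a short sketch. It assumes the movement records have been reduced to `(id, type)` pairs where a check-in is labelled with the string "check-in", and that message records are `(sender, receiver)` pairs; both layouts are assumptions about the parsed files.

```python
from collections import Counter

# Partners allowed for the Pattern 10 group: the two park IDs and external.
PARK_AND_EXTERNAL = frozenset({"839736", "1278894", "external"})


def single_checkin_ids(records):
    """Return the sorted IDs that have exactly one check-in record."""
    checkins = Counter(vid for vid, rtype in records if rtype == "check-in")
    return sorted(vid for vid, n in checkins.items() if n == 1)


def only_talks_to(pairs, vid, allowed=PARK_AND_EXTERNAL):
    """True if `vid` has at least one message and every partner is in `allowed`."""
    partners = set()
    for sender, receiver in pairs:
        if sender == vid:
            partners.add(receiver)
        elif receiver == vid:
            partners.add(sender)
    return bool(partners) and partners <= allowed


if __name__ == "__main__":
    # Toy records for illustration only.
    records = [("a", "check-in"), ("a", "movement"), ("b", "check-in"),
               ("b", "check-in"), ("c", "movement")]
    print(single_checkin_ids(records))  # ['a']
    print(only_talks_to([("a", "839736"), ("a", "external"), ("1278894", "a")], "a"))  # True
```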


Q3

From this data, can you hypothesize when the vandalism was discovered? Describe your rationale.

Y 06.jpg


Y 16.jpg


From patterns 1 to 4 discussed above, we have already narrowed the timing of the vandalism to 11:00 am to 11:39 am on Sunday. Next, we check the check-in data for the two stages in Coaster Alley. The large check-in waves begin at different times: check-ins at Creighton Pavilion start at 11:30 am and 4:30 pm, while check-ins at Grinosaurus Stage start at 9:30 am and 2:30 pm. The communication peaks at Wet Land, at 11:00 am and 4:00 pm, come about 30 minutes before the Creighton Pavilion check-ins, so these messaging peaks are likely the result of visitors texting their friends while queuing to check in to the Pavilion. Since Pavilion check-in began at 11:30 am, we can shorten the crime time further to 11:30 am to 11:39 am. To sum up, we hypothesize that the vandalism was discovered within a 9-minute period, from 11:30 am to 11:39 am, in Creighton Pavilion.
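
The narrowing above is an intersection of time windows, each taken from one of the clues. A minimal sketch, where the open end of the Pavilion clue is represented by `time.max` (an illustrative assumption, since that clue only supplies a start time):

```python
from datetime import time


def intersect(windows):
    """Intersect (start, end) windows; return None if they do not overlap.

    `windows` must contain at least one (start, end) pair of datetime.time.
    """
    start = max(s for s, _ in windows)
    end = min(e for _, e in windows)
    return (start, end) if start <= end else None


def minutes_between(start, end):
    """Whole-minute width of a same-day window (seconds are ignored)."""
    return (end.hour * 60 + end.minute) - (start.hour * 60 + start.minute)


# Windows quoted in Patterns 2 to 4 and in the check-in analysis (Sunday).
clues = [
    (time(11, 0), time(16, 0)),   # Pattern 2: missing 4:00 pm peak
    (time(11, 0), time(12, 5)),   # Pattern 3: Entry Corridor peak
    (time(11, 0), time(11, 39)),  # Pattern 4: Wet Land peak
    (time(11, 30), time.max),     # Pavilion check-in starts at 11:30 am
]

if __name__ == "__main__":
    window = intersect(clues)
    print(f"{window[0]:%H:%M} to {window[1]:%H:%M}, "
          f"{minutes_between(*window)} minutes")  # 11:30 to 11:39, 9 minutes
```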

Result

Through a series of time-series, network and movement analyses, we found several interesting communication patterns in these data sets and narrowed the crime time to 11:30 am to 11:39 am in the Pavilion. The Tableau workbook is published on Tableau Public at the following link:

https://public.tableau.com/views/Communication_analysis/Sheet8?:embed=y&:display_count=yes