ISSS608 2016-17 T1 Assign3 Li Dan
The Data
The data concerns the vandalism that took place in 2014 at the DinoFun World entertainment park and marred the Scotts’ show and the memorabilia display. It is sourced from the VAST Challenge 2015. There are 2 main datasets: movement data and communication data.
Movement data
- 3 days from Jun 6 to Jun 8, 2014
- more than 26 million records in total
- 11374 unique ids
- screenshot of the dataset shown below.
Communication data
- 3 days from Jun 6 to Jun 8, 2014
- more than 4 million records in total
- 9391 unique ‘to’ ids, 9429 unique ‘from’ ids
- 690,231 unique directed communications
- a screenshot of the dataset shown below
The Objective
- Identify IDs with a large volume of communications and their communication patterns
- Infer the facts of the case, including the Scotts show times, the Pavilion closing hours, and the time the vandalism happened
- Detect the vandals
Tools Used
- JMP
- Tableau
Data Preparation
1. Missing data exist in the following 2 records, which are eliminated.
2. The data are stacked as follows to combine the 3 days of movement data and the 3 days of communication data.
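The report performs these two steps in JMP. As an illustration only, the same cleaning and stacking can be sketched in plain Python. The column names (`timestamp`, `id`, `type`), the day labels, and the toy rows are all assumptions made for this sketch, not the real schema or data.

```python
def stack_days(tables_by_day):
    """Combine per-day row lists into one list, tagging each row with its day."""
    stacked = []
    for day, rows in tables_by_day.items():
        for row in rows:
            # Eliminate records with any missing (None or empty) field.
            if any(value is None or value == "" for value in row.values()):
                continue
            stacked.append({**row, "day": day})
    return stacked


# Toy rows; column names and values are hypothetical.
movement = {
    "Fri": [{"timestamp": "2014-06-06 08:00:19", "id": 1, "type": "check-in"}],
    "Sat": [
        {"timestamp": "2014-06-07 08:00:05", "id": 2, "type": "movement"},
        {"timestamp": "", "id": 3, "type": "movement"},  # missing timestamp
    ],
    "Sun": [{"timestamp": "2014-06-08 08:00:11", "id": 1, "type": "movement"}],
}

all_movement = stack_days(movement)
print(len(all_movement))
```

The same function can be applied to the three communication tables, since it does not depend on which columns a row carries.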
Facts Inference
IDs with large communication volume
The 2 IDs 1278894 and 839736 have by far the largest communication volumes, as shown in the graph below.
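The ranking itself was produced with the tools listed above. A minimal Python sketch of the same idea, counting messages sent plus received per ID, might look like this. The `(from, to)` pair layout and the toy IDs are assumptions for illustration.

```python
from collections import Counter


def volume_per_id(messages):
    """Count messages sent plus messages received for each ID."""
    volume = Counter()
    for sender, receiver in messages:
        volume[sender] += 1
        volume[receiver] += 1
    return volume


# Toy (from, to) pairs with hypothetical IDs.
messages = [(10, 1), (10, 2), (1, 10), (20, 3), (10, 3), (3, 20), (2, 4)]

top_two = volume_per_id(messages).most_common(2)
print(top_two)
```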
Note that the numbers of messages these 2 IDs sent and received are not exactly equal from day to day. For 839736, among the 8721 unique IDs it sent messages to from Friday to Sunday, only 14 IDs over-replied or under-replied 1 time, as shown in the table that follows. ID 1278894 sent 190,360 messages to only 2522 unique IDs within the 3 days; 419 of those IDs under-replied 1 to 3 times each, leaving 466 messages unreplied in total.
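The over-reply and under-reply counts can be reproduced by comparing, for each partner, how many messages the hub ID sent to it with how many it sent back. The sketch below is one possible way to do this in Python, not the procedure used in the report. The hub ID `100` and the message pairs are made up.

```python
from collections import Counter


def reply_imbalance(messages, hub):
    """Return {partner: sent_by_hub - received_from_partner} where they differ.

    A positive value means the partner under-replied; negative, over-replied.
    """
    sent = Counter(to for frm, to in messages if frm == hub)
    received = Counter(frm for frm, to in messages if to == hub)
    partners = set(sent) | set(received)
    return {p: sent[p] - received[p] for p in partners if sent[p] != received[p]}


# Toy (from, to) pairs; 100 plays the role of a high-volume ID.
messages = [
    (100, 1), (1, 100),                      # balanced
    (100, 2), (100, 2), (2, 100),            # partner 2 under-replied once
    (100, 3), (3, 100), (3, 100),            # partner 3 over-replied once
]

imbalance = reply_imbalance(messages, hub=100)
unreplied = sum(diff for diff in imbalance.values() if diff > 0)
print(imbalance, unreplied)
```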
Given how small the difference is between the numbers of messages these 2 IDs sent and received, and the fact that all messages were replied to almost instantly, we treat the location from which a reply was sent as the location to which the original message was sent. This justifies the location-based analysis that follows. The graph shows that ID 1278894 sent and received messages at 2-hour intervals. ID 839736, by contrast, sent and received messages at a steady rate most of the time but experienced 2 communication spikes on Jun 8: at 12:pm at Coaster Alley and at 12:00 pm at Wet Land.
ID 1278894 could be the park’s advertisement operator, while ID 839736 could be the park’s service center or app status-checking operator. Both sent messages only from Entry Corridor to multiple places in the park, and both sent and received large, almost equal volumes of messages. ID 1278894 in particular sent messages only to specific targeted IDs at 2-hour intervals, possibly for advertising purposes. ID 839736 could have been sending touring guides to visitors as they moved around the park, or checking their app status.
The time and venue of the Scotts show
The Scotts show was probably held at No.63, the 'Grinosaurus Stage', and scheduled for 9:30 am - 11:00 am and 2:30 pm - 4:00 pm.
In step 1, 2 large communication spikes are identified in the Coaster Alley area, at 11:00 am and 4:00 pm, except on Sunday afternoon. As the Scotts show was scheduled twice per day, such a spike could occur just after a show ends, when people who watched it message others and receive large volumes of messages in return. In step 2, setting the time to 11:00 am on Jun 6 reveals that the show was held at the Grinosaurus Stage in the Coaster Alley area, as many people were moving out from there. In step 3, checking the movement status of the IDs near the exit of No.63 shows that show check-in starts at around 9:30 am and 2:30 pm, while the number of moving IDs nearby peaks at 11:00 am and 4:00 pm, signaling the ends of the shows.
Closing hours of Pavilion
Examining the check-in records of the Pavilion shows that it is normally closed from around 9:30 am to 11:30 am and from 2:30 pm to 4:30 pm on all 3 days, except on Sunday afternoon. On Sunday, the last check-in record was at 11:59 am, and the Pavilion remained closed afterwards.
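Closing hours inferred from check-in records amount to looking for long gaps between consecutive check-ins. The Python sketch below shows one way to find such gaps. It is not the JMP or Tableau procedure used here, and the timestamp format, the 60-minute threshold, and the sample times are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def checkin_gaps(timestamps, min_gap=timedelta(minutes=60)):
    """Return (last_before, first_after) pairs for gaps of at least min_gap."""
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]


# Invented check-in times at a single attraction on one day.
checkins = [
    "2014-06-08 09:10:00",
    "2014-06-08 09:28:00",
    "2014-06-08 11:31:00",
    "2014-06-08 11:45:00",
    "2014-06-08 11:59:00",
]

gaps = checkin_gaps(checkins)
last_checkin = max(datetime.strptime(t, FMT) for t in checkins)
print(gaps, last_checkin)
```

A gap only suggests a closure; a quiet attraction would produce the same pattern, so the threshold has to be judged against normal traffic.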
Time of Vandalism
We hypothesize that the vandalism happened on Sunday morning between 9:30 am and 11:30 am, while the Pavilion was closed.
The first indicator is that, on Sunday at around 11:30 am, people in the Wet Land area suddenly started to send far more messages to external parties; the message volume surged to 377 in a single minute at 11:59 am.
In addition, an unusually large communication volume occurred in the Wet Land area, amounting to 2407 at 11:39 am. As the only exit of Creighton Pavilion is in the Wet Land area, people in this area must have discovered the vandalism first, and they could have been sharing the news with friends in other parts of the park or outside it.
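Per-minute volumes like those cited above come from bucketing message timestamps by minute within one location. A small Python sketch of that aggregation follows, as an illustration rather than the method used in this report. The `(timestamp, location)` layout, the timestamp format, and the sample messages are assumptions.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def peak_minute(messages, location):
    """Return (minute, count) for the busiest minute at a location, or None."""
    per_minute = Counter()
    for timestamp, loc in messages:
        if loc != location:
            continue
        when = datetime.strptime(timestamp, FMT)
        per_minute[when.replace(second=0)] += 1  # truncate to the minute
    return per_minute.most_common(1)[0] if per_minute else None


# Invented (timestamp, location) pairs.
messages = [
    ("2014-06-08 11:38:50", "Wet Land"),
    ("2014-06-08 11:39:05", "Wet Land"),
    ("2014-06-08 11:39:40", "Wet Land"),
    ("2014-06-08 11:39:59", "Wet Land"),
    ("2014-06-08 11:39:10", "Coaster Alley"),
    ("2014-06-08 11:40:02", "Wet Land"),
]

minute, count = peak_minute(messages, "Wet Land")
print(minute, count)
```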
The image above also shows that many messages were sent to Entry Corridor in the first half hour after 12 pm. This happened because ID 839736, which could be operated by the park’s service center, sent bulk messages to visitors’ apps to check their status or to inform them of the news after the vandalism was reported, and it received automatic responses from the visitors’ apps.
The vandalism probably happened after the Sunday morning show started, because that show went on as usual. Had the vandalism happened before the show, the morning show would have been cancelled for safety, as the last show on Sunday was.
Detecting the Vandals
It is inferred that the vandals entered the Pavilion before it closed at 9:30 am on Sunday and stayed inside to vandalize it between 9:30 am and 11:30 am. The path to the Pavilion is very short, and the Pavilion is very close to the main path, so if the criminals had tried to break down the Pavilion door, visitors walking by or taking the express would have noticed and reported it.
Filtering suspects
The following table shows how the IDs are filtered. After these 5 steps of filtering, only 32 unique IDs are left.
Movement patterns of 32 suspects
The relevant movement records of the 32 filtered suspects are shown as follows. ID 1983765 stands out for having the fewest check-in records and the fewest movement records. It is also noted that the IDs with the most records of visiting the Pavilion moved around the park frequently, in contrast to 1983765.
Among the 32 IDs, the following 5 IDs are those that visited the Pavilion often and had fewer movement records during the hypothesized vandalism time. Notably, the first 4 IDs spent Sunday taking thrill rides, while the last ID, 1983765, did not take any rides on Sunday and left the park early, at 11:47 am.
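The comparison of suspects by number of check-in and movement records within the hypothesized time window can be sketched in Python as below. This is an illustrative reconstruction, not the filtering actually performed. The row layout `(id, timestamp, type)`, the type labels, and the placeholder IDs "A" and "B" are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def records_in_window(rows, start, end):
    """Count records per ID and record type between start and end, inclusive."""
    counts = defaultdict(Counter)
    for visitor_id, timestamp, kind in rows:
        if start <= datetime.strptime(timestamp, FMT) <= end:
            counts[visitor_id][kind] += 1
    return counts


# Invented rows; "A" and "B" are placeholders, not real suspect IDs.
rows = [
    ("A", "2014-06-08 09:35:00", "movement"),
    ("A", "2014-06-08 09:50:00", "check-in"),
    ("A", "2014-06-08 10:20:00", "movement"),
    ("B", "2014-06-08 09:40:00", "check-in"),
    ("B", "2014-06-08 12:05:00", "movement"),  # outside the window
]

start = datetime(2014, 6, 8, 9, 30)
end = datetime(2014, 6, 8, 11, 30)
counts = records_in_window(rows, start, end)

# IDs ordered from fewest to most records in the window.
ranking = sorted(counts, key=lambda v: sum(counts[v].values()))
print(ranking)
```

An ID near the top of this ranking was comparatively inactive during the window, which matches the kind of pattern the text singles out.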