ISSS608 Assign3 HoLiChin Task2
Contents
- 1 Task Requirement
- 2 Communication Patterns
- 2.1 Pattern #1 – Big Groups
- 2.2 Pattern #2 - Identifying the Dynamics of Park Activities
- 2.3 Pattern #3 - Abnormal Communication Volume from ID:839736 on Sunday
- 2.4 Pattern #4 – Communication Patterns at Creighton Pavilion
- 2.5 Pattern #5 – Detecting the Most Likely Crime Suspect
- 2.6 Pattern #6 - Communication data at Grinosaurus Stage
Task Requirement
Describe up to 10 communications patterns in the data.
Characterize who is communicating, with whom, when and where.
If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime. Note: Please limit your response to no more than 10 images and 1000 words
Communication Patterns
Pattern #1 – Big Groups
Analysis and Findings
In the Qlik Sense visualisation above, apart from IDs 1278894 and 839736, several IDs sent a large volume of messages (more than 3,000 but fewer than 5,000 texts) over the 3 days. The top 5 of these IDs, in descending order, are 1045021, 1116329, 1749109, 918738 and 1388162. The bar chart in the lower quadrant shows that these high-volume IDs sent messages to a large group of unique receiver IDs.
For example, drilling down on the first ID, 1045021, shows that it sent 3.81K messages to about 2.77K unique receiver IDs. This ID was in the park on all 3 days, and most of its messages were sent on Friday and Saturday at Wet Land. From this, we might deduce that these IDs are group leaders sending bulk messages to their group members, or Park staff sending information to visitors.
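The drill-down above was done interactively in Qlik Sense. As a rough cross-check, the same two measures (messages sent and unique receivers per sender) can be sketched in pandas. The column names `sender` and `receiver` and the toy rows are assumptions for illustration only, not the actual data.

```python
import pandas as pd

# Toy rows for illustration; column names are assumed, not the real schema.
comm = pd.DataFrame({
    "sender":   [1, 1, 1, 2, 2, 3],
    "receiver": [10, 11, 10, 10, 10, 12],
})

def sender_summary(df):
    """Return messages sent and unique receivers per sender, busiest first."""
    out = df.groupby("sender").agg(
        messages=("receiver", "size"),             # total messages sent
        unique_receivers=("receiver", "nunique"),  # distinct recipients
    )
    return out.sort_values("messages", ascending=False)

summary = sender_summary(comm)
print(summary)
```

A sender with many messages and almost as many unique receivers, like ID 1045021 in the text, would look like a broadcaster in such a table.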
Pattern #2 - Identifying the Dynamics of Park Activities
Analysis and Findings
Some observations from the above visualization:
- The opening hours of the park were from 8:00 AM to 11:00 PM.
- Saturday and Sunday had higher communication volume than Friday, which suggests more visitors over the weekend.
- Comparing movement and check-in records, more communication took place while visitors were checked in at attractions on Saturday and Sunday.
- As in the earlier analysis, a spike in communication was observed for both check-in and movement records on Sunday at about 12 PM.
- In the charts of sender and receiver activity over the 3 days, the communication volume at Thrill Rides was much higher than at other attraction types on both Saturday and Sunday.
- The communication at the Entrance followed a periodic pattern: high volume at regular intervals, with almost no communication in between. This could be due to fixed broadcast information or interactive games sent by the park staff to visitors near the entrance area. Again, an obvious communication spike was observed at the Entrance on Sunday at about 12 noon.
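The day-by-hour comparison behind these observations can be approximated with a simple aggregation. The sketch below is a minimal illustration, assuming a `timestamp` column; the timestamps are toy values chosen only so that the busiest slot falls on a Sunday at noon.

```python
import pandas as pd

# Toy timestamps (hypothetical); the real data would have one row per message.
comm = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2014-06-06 08:05", "2014-06-07 12:01", "2014-06-07 12:30",
        "2014-06-08 12:02", "2014-06-08 12:10", "2014-06-08 12:59",
    ]),
})

# Count messages per (day of week, hour of day).
hourly = (
    comm.assign(day=comm["timestamp"].dt.day_name(),
                hour=comm["timestamp"].dt.hour)
        .groupby(["day", "hour"])
        .size()
)

busiest = hourly.idxmax()  # (day, hour) pair with the most messages
print(hourly)
print("Busiest slot:", busiest)
```

Adding a location column to the `groupby` keys would give the per-area view used for the Entrance observation.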
Pattern #3 - Abnormal Communication Volume from ID:839736 on Sunday
From Task 1, we found that ID 839736 stands out with a large communication volume on Sunday. Here we investigate further what could have caused the large communication volume on that day.
Analysis and Findings
The abnormality in communication volume on Sunday shows two clear peaks: a very pronounced one at 12 noon and another at about 2 PM.
The Park Map shows that, during the peak at 12 noon, most messages sent by ID 839736 were received by receiver IDs at Creighton Pavilion. From the Receiver IDs table, the top 5 IDs that received the most messages from 839736 were 1092525, 1601276, 38945, 2013094 and 95112.
According to the Park website, there was a weekend tribute to Scott Jones (a renowned football star). In addition, memorabilia including his awards, trophies and Olympic Gold medal were to be displayed in Creighton Pavilion. However, the event at the Pavilion did not go as planned: the display of Scott Jones’s soccer memorabilia was vandalized. This could explain why the peak communication volume occurred at Creighton Pavilion, as this is where the crime took place.
A detailed analysis will be done in Task 3 to investigate the abnormal communication volume at the Pavilion and to find out when the vandalism could have happened.
Pattern #4 – Communication Patterns at Creighton Pavilion
From the figure below, the communication volume started to increase at about 11:30 AM, rose again at 11:32 AM, and finally peaked at 11:44 AM.
We first zoom in on the patterns between 11:30 AM and 11:50 AM (first peak).
A series of things happened at the Pavilion on Sunday between 11:30 AM and 11:52 AM:
- A peak of communication occurred around 11:44 AM.
- The top senders at the Pavilion are highlighted in the rectangle.
- Most messages were sent to External during this period.
Another peak of communication occurred around 12:00 PM (noon), and this time most of the messages were sent to 839736 (see figure below).
In conclusion, at the Pavilion on Sunday, the peak of communication to External (around 11:44 AM) appeared roughly a quarter of an hour earlier than the peak of communication to 839736 at noon. The increase in messages to External was probably due to the vandalism and the arrival of police at the Pavilion around that time, which led visitors to "break" the news to people outside the park.
The increase in messages to 839736 was probably due to visitors starting to contact the Info Center / Park Help desk to ask what had happened.
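The gap between the two peaks can also be measured directly rather than read off the chart. The following is a minimal sketch, assuming `timestamp` and `receiver` columns and a literal `"external"` label for outside recipients; the rows are toy values placed at the peak times reported above.

```python
import pandas as pd

TARGET_ID = "839736"  # ID discussed in the text

# Toy rows (hypothetical) placed around the two peak times mentioned above.
comm = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2014-06-08 11:44:05", "2014-06-08 11:44:30", "2014-06-08 11:45:10",
        "2014-06-08 11:50:00", "2014-06-08 12:00:01", "2014-06-08 12:00:20",
        "2014-06-08 12:00:50",
    ]),
    "receiver": ["external", "external", "external",
                 TARGET_ID, TARGET_ID, TARGET_ID, TARGET_ID],
})

def peak_minute(df, receiver):
    """Return the minute with the most messages sent to `receiver`."""
    subset = df[df["receiver"] == receiver]
    per_minute = subset.set_index("timestamp").resample("1min").size()
    return per_minute.idxmax()

peak_external = peak_minute(comm, "external")
peak_target = peak_minute(comm, TARGET_ID)
lag = peak_target - peak_external
print(peak_external, peak_target, lag)
```

On the real data, the input would first be filtered to messages sent from the Pavilion on Sunday.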
Pattern #5 – Detecting the Most Likely Crime Suspect
As some abnormal things happened at the Pavilion, the analysis now zooms in on the communication patterns there to detect the most likely suspect. First, we filter the communication data to Day of Visit = Sunday and Hour of Visit = 9 AM to 12 noon, keeping only check-in data at the Pavilion.
The filtered data was used to create the Node file and Edge file to be imported into Gephi. The Node file includes columns such as HrStayed, NumMsgSent and Number of Unique Rx. In the Edge file, the edge type is Directed and the value is HrStayed.
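One possible way to prepare such files is sketched below. It assumes the filtered messages sit in a table with `sender` and `receiver` columns and that hours stayed per ID are already available in a separate table; both tables, the `NumUniqueRx` column name and all values are hypothetical. The `Id`, `Source`, `Target` and `Type` headers follow the column names Gephi's spreadsheet import expects. Attaching the sender's HrStayed to each edge is one reading of the description above, not necessarily what was done.

```python
import pandas as pd

# Toy inputs (hypothetical IDs and values).
comm = pd.DataFrame({
    "sender":   [1, 1, 2],
    "receiver": [10, 11, 10],
})
stay = pd.DataFrame({"Id": [1, 2], "HrStayed": [4, 2]})

# Node table: one row per sender with its message statistics and hours stayed.
nodes = (
    comm.groupby("sender")
        .agg(NumMsgSent=("receiver", "size"),
             NumUniqueRx=("receiver", "nunique"))
        .rename_axis("Id")
        .reset_index()
        .merge(stay, on="Id", how="left")
)

# Edge table: one directed edge per message, carrying the sender's HrStayed.
edges = (
    comm.rename(columns={"sender": "Source", "receiver": "Target"})
        .assign(Type="Directed")
        .merge(stay.rename(columns={"Id": "Source"}), on="Source", how="left")
)

nodes_csv = nodes.to_csv(index=False)  # pass a path instead to write a file
edges_csv = edges.to_csv(index=False)
print(nodes_csv)
print(edges_csv)
```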
The assumption behind these criteria is that the crime suspects might have checked in and stayed in the Pavilion for a longer time before they committed the vandalism. Thus, we next check whether such a community group exists.
In the Gephi graph, the communities are color-coded by how long they were checked in at the Pavilion. The majority, in Group 1, stayed only 1 hour; Group 4 stayed for 2 hours; Group 3 stayed for 3 hours; and a very small group of 12 members (Group 5) stayed for 4 hours. We zoom in on Group 5 next.
Three individuals captured my attention (SenderID = 461004, 1502920 and 1350546). They entered the Pavilion at 9 AM and stayed there until 12 noon.
Using the Sankey and Chord diagrams, I tried to link these suspects to possible accomplices through their communication patterns. ID 461004 communicated with 9 unique receivers, and all three IDs (461004, 1502920 and 1350546) shared contacts with 416790, 1187909, 1123214 and 100279. We could therefore speculate that these four IDs might be accomplices of the suspects.
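The shared contacts were read off the Sankey and Chord diagrams. The same idea can be expressed as a set intersection, as in this minimal sketch; the column names and toy IDs are assumptions, and only the sender-to-receiver direction is considered.

```python
import pandas as pd

# Toy rows (hypothetical IDs): three senders and the receivers they contacted.
comm = pd.DataFrame({
    "sender":   [1, 1, 1, 2, 2, 3, 3, 3],
    "receiver": [10, 11, 12, 10, 11, 10, 11, 13],
})

def common_contacts(df, suspects):
    """Return the receivers contacted by every ID in `suspects`."""
    contact_sets = [
        set(df.loc[df["sender"] == s, "receiver"]) for s in suspects
    ]
    if not contact_sets:
        return set()
    return set.intersection(*contact_sets)

shared = common_contacts(comm, [1, 2, 3])
print(shared)
```

A shared contact is only a lead: popular receivers such as park service IDs would appear in many intersections and should be excluded before drawing conclusions.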
Pattern #6 - Communication data at Grinosaurus Stage
The three places at Coaster Alley with the highest communication volume were Grinosaurus Stage, Mary Anning Beer Garden and Whitley’s (Shopping). According to the background information, there were two shows daily featuring Scott, so we zoom in on the communication patterns at Grinosaurus Stage.
From the figure above, there was a recurring spike in communications from G Stage at 11:00 AM and 4:00 PM on both Friday and Saturday, while the communication volume was relatively low before 11 AM and before 4 PM. The two check-in peaks illustrate that there were two shows daily over Scott’s weekend.
The spikes in messages at 11 AM and 4 PM could mark the end of the Scott Jones show at G Stage, when people communicated with others to share what they saw. Alternatively, visitors could have been trying to find friends after being separated in the stage area, or to reach friends who did not attend the show.
However, Sunday showed the regular spike only at 11:00 AM (after the first show). A huge communication spike occurred around 2 PM, after which very few messages were sent from G Stage from 3 PM onwards. That could indicate that G Stage was closed from 3 PM onwards on Sunday because of the crime, and that Scott’s show at 4 PM was cancelled.
Gephi was used to explore the underlying patterns in the large volume of messages sent on Sunday at about 2 PM at G Stage. Gephi offers several layouts for arranging the network graph; Force Atlas 2 was chosen for this analysis.
Nodes are colored by the number of unique receivers they sent messages to. The graph above shows that most nodes sent messages to only one unique receiver (purple nodes), and that this common receiver is ID 839736.
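The same reading of the graph (many senders with a single receiver, all pointing to one hub) can be checked with a small aggregation. This is a sketch on toy data; the column names and IDs are assumptions, with 99 standing in for the hub.

```python
import pandas as pd

# Toy rows (hypothetical IDs).
comm = pd.DataFrame({
    "sender":   [1, 2, 3, 3, 4, 4, 5],
    "receiver": [99, 99, 99, 99, 99, 50, 50],
})

# For each sender: number of distinct receivers and the first receiver seen.
per_sender = comm.groupby("sender")["receiver"].agg(["nunique", "first"])

# Keep senders that wrote to exactly one receiver, then count those receivers.
single = per_sender[per_sender["nunique"] == 1]
hub_counts = single["first"].value_counts()
hub = hub_counts.idxmax()

print(hub_counts)
print("Most common single receiver:", hub)
```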
One possible explanation for the high communication volume at about 2 PM is that many visitors still went to G Stage to watch Scott’s show. Finding it closed, these disappointed visitors texted park service ID 839736 to ask why.