ISSS608 2016-17 T1 Assign3 Shishir Nehete
Revision as of 01:35, 30 October 2016
To be a Visual Detective: Detecting spatio-temporal patterns
Overview
DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events.
Our task is to analyse the data for one event, which was organized last year as a weekend tribute to Scott Jones, internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. However, the event was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.
In view of this mayhem, our task is to investigate the in-app communication data over the three days, identify the communication patterns, and form a hypothesis about when the vandalism was discovered.
Task
We have access to the in-app communication data over the three days of the Scott Jones celebration. This includes communications between the paying park visitors, as well as communications between the visitors and park services. In addition, the data also contains records indicating if and when the user sent a text to an external party. Our task is to use visual analytics techniques to analyze the available data and develop responses to the questions below.
- Identify those IDs that stand out for their large volumes of communication. For each of these IDs:
  - Characterize the communication patterns you see.
  - Based on these patterns, what do you hypothesize about these IDs?
- Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.
- From this data, can you hypothesize when the vandalism was discovered? Describe your rationale.
Data
Data Preparation
Visualization Software
- JMP Pro - For Data cleaning, transformation and visualization
- Tableau - For data visualization
- Gephi - For Graph visualization using nodes and edges
Results
Task 1
The figure below shows the communication data for Friday.
The figure below shows the communication data for Saturday.
The figure below shows the communication data for Sunday.
Analysing the communication data for all three days shows that two IDs stand out for their communication volume in the park.
These two IDs are 1278894 and 839736. Another ID that is the target of a high volume of communication is 9999999, which refers to the external party.
Further analysis of ID 1278894 shows that it communicates with the majority of the visitors in the park. Hence it can be hypothesized that this is a check-in monitoring ID in the park setup. This ID is also located at the Entry Corridor, which is consistent with the hypothesis.
The other ID, 839736, also communicates with a large number of visitors and can be hypothesized to be a kind of Service ID for the park. It is also located at the Entry Corridor.
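The ranking behind this observation can be sketched in a few lines of Python. This is only an illustration, not the JMP Pro or Tableau workflow used for the assignment: the records below are made-up toy values, and the `(sender, receiver)` layout is an assumption about how the communication data is organised.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (sender_id, receiver_id); values are invented
# for illustration and are NOT taken from the park data.
messages = [
    ("1278894", "101"),
    ("1278894", "102"),
    ("1278894", "103"),
    ("1278894", "104"),
    ("101", "839736"),
    ("102", "839736"),
    ("103", "839736"),
    ("104", "9999999"),  # 9999999 stands for the external party
]

def communication_volume(records):
    """Return (message count per ID, number of distinct partners per ID)."""
    volume = Counter()
    partners = defaultdict(set)
    for sender, receiver in records:
        # Count a message once for each endpoint it touches.
        volume[sender] += 1
        volume[receiver] += 1
        partners[sender].add(receiver)
        partners[receiver].add(sender)
    reach = {node: len(peers) for node, peers in partners.items()}
    return volume, reach

volume, reach = communication_volume(messages)
# IDs with the two largest message counts in the toy data.
top_ids = [node for node, _ in volume.most_common(2)]
print(top_ids, reach["1278894"], reach["839736"])
```

Counting distinct partners alongside raw volume helps separate an ID that talks to nearly everyone (as hypothesized for a check-in or service account) from a visitor who simply exchanges many messages with a few friends.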
We analyse the communication patterns of these IDs further in Task 2, which describes the communication patterns.
Another notable point from this analysis is that communication with ID 839736 increased drastically on Sunday, even though the number of visitors was close to the number on Saturday.
As seen in the table above, communication with 839736 increased fourfold, while the number of visitors and the check-in monitoring communication did not change significantly on Sunday.
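A day-over-day comparison of this kind can be reproduced with a short script. The sketch below assumes records of the form `(day, sender, receiver)`; the rows are invented toy values chosen only to show the calculation, and `fold_change` is a hypothetical helper name.

```python
from collections import Counter

SERVICE_ID = "839736"

# Hypothetical toy records (day, sender_id, receiver_id); invented values.
records = [
    ("Fri", "101", SERVICE_ID),
    ("Sat", "102", SERVICE_ID),
    ("Sat", "101", "102"),
    ("Sun", "103", SERVICE_ID),
    ("Sun", "104", SERVICE_ID),
    ("Sun", "105", SERVICE_ID),
    ("Sun", SERVICE_ID, "103"),
]

def daily_counts(rows, target_id):
    """Count the messages per day that involve target_id as sender or receiver."""
    counts = Counter()
    for day, sender, receiver in rows:
        if target_id in (sender, receiver):
            counts[day] += 1
    return counts

def fold_change(counts, later, earlier):
    """Ratio of the later day's count to the earlier day's, or None if undefined."""
    if counts[earlier] == 0:
        return None
    return counts[later] / counts[earlier]

counts = daily_counts(records, SERVICE_ID)
ratio = fold_change(counts, "Sun", "Sat")
print(dict(counts), ratio)
```

Running the same two functions for the number of distinct visitors per day would give the baseline against which the increase for 839736 is judged.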
This data can be visualized and explored on Tableau Public.
Task 2
Task 3
The task is to hypothesize when the vandalism was discovered.
The analysis shows that communications spiked between 12:00 and 12:10, as marked in the figure above. This suggests that many visitors tried to contact either the Service ID of the park or external parties to report the vandalism. Taking this time, Sunday at around 12:00, as the likely moment of discovery, further analysis was carried out on the possible locations of the crime.
As seen in the figure above, check-ins at two locations dropped suddenly on Sunday, in contrast to the variation in check-ins at other locations. These two locations are Creighton Pavilion and Grinosaurus Stage.
Further analysis of the check-ins at these two locations indicates that they were shut down by the park authorities as soon as the crime was discovered. There are no further check-ins at either location for the rest of the day, which suggests that they were not re-opened and supports the hypothesis that some damage was done there.
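One way to locate such a spike is to bucket the message timestamps into fixed 10-minute windows and look for the busiest window. The following is a minimal sketch with invented `HH:MM:SS` timestamps; the bin width and the string format are assumptions, and the width is expected to divide 60 evenly.

```python
from collections import Counter
from datetime import datetime

# Hypothetical toy timestamps for a single day; invented values.
timestamps = [
    "11:41:05", "11:52:30",
    "12:00:10", "12:01:45", "12:03:20", "12:07:55", "12:09:59",
    "12:14:00", "12:25:30",
]

def bin_counts(stamps, width_minutes=10):
    """Count messages per time bin, keyed by the bin's start time as 'HH:MM'."""
    counts = Counter()
    for stamp in stamps:
        t = datetime.strptime(stamp, "%H:%M:%S")
        # Floor the minute to the start of its bin.
        start_minute = (t.minute // width_minutes) * width_minutes
        counts[f"{t.hour:02d}:{start_minute:02d}"] += 1
    return counts

counts = bin_counts(timestamps)
peak_bin, peak_count = counts.most_common(1)[0]
print(peak_bin, peak_count)
```

On real data it would be worth comparing the peak bin with the same time window on the other two days, since a lunchtime rise in messaging could otherwise be mistaken for a reaction to the incident.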
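The closure argument amounts to checking, for each location, when its last check-in of the day occurred. A small sketch of that check follows, assuming zero-padded `HH:MM` times and `(time, location)` records; the times are invented, and "Example Ride" is a hypothetical third location added for contrast.

```python
# Hypothetical toy check-in records (time, location) for a single day.
# Times are invented; "Example Ride" is a made-up comparison location.
checkins = [
    ("09:15", "Creighton Pavilion"),
    ("11:40", "Creighton Pavilion"),
    ("10:05", "Grinosaurus Stage"),
    ("11:55", "Grinosaurus Stage"),
    ("09:30", "Example Ride"),
    ("14:20", "Example Ride"),
    ("18:45", "Example Ride"),
]

def last_checkin_by_location(rows):
    """Map each location to its latest check-in time of the day."""
    last = {}
    for stamp, location in rows:
        # Zero-padded 'HH:MM' strings sort in chronological order.
        if location not in last or stamp > last[location]:
            last[location] = stamp
    return last

def quiet_after(rows, cutoff):
    """Locations that have no check-in at or after the cutoff time."""
    last = last_checkin_by_location(rows)
    return sorted(loc for loc, stamp in last.items() if stamp < cutoff)

closed_candidates = quiet_after(checkins, "12:00")
print(closed_candidates)
```

A location appearing in this list is only a candidate: an attraction with a fixed show schedule could also fall silent after its last performance, so the result should be compared with the same location's pattern on Friday and Saturday.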
This data can also be visualized and explored on Tableau Public.