ISSS608 2016-17 T1 Assign3 Nguyen Tien Duong

From Visual Analytics and Applications
Revision as of 23:32, 28 October 2016 by Tdnguyen.2015 (talk | contribs)
NTD A3 Banner.png
ISSS608 2016-17 Assignment 3
Nguyen Tien Duong
This is the story of a park called "DinoFun World".
THE LUNCH CALL

01:00 PM "Hello, KAMPONG detective office here."
01:00 PM "Hi, we are calling from DinoFun World. It is chaos here... we need your help urgently, please..." - a worried voice
01:01 PM "Sir, please stay calm. We understand your situation. Tell us about the case."
01:01 PM "Please help us. We are closing down now; a VIP may be in danger."
...
01:14 PM "A team has been dispatched to your place. We will hunt the criminal down."

A team from the KAMPONG detective office was immediately dispatched to DinoFun World to conduct interviews, collect data and hunt down the criminal.
NTD Detective logo.png

THE PARK

NTD park map.png

++++++++Place holder, dummy content********* The Visual Analytics Science and Technology (VAST) Challenge is an annual contest with the goal of advancing the field of visual analytics through competition. The VAST Challenge is designed to help researchers understand how their software would be used in a novel analytic task and determine if their data transformations, visualizations, and interactions would be beneficial for particular analytic tasks. VAST Challenge problems provide researchers with realistic tasks and data sets for evaluating their software, as well as an opportunity to advance the field by solving more complex problems. Researchers and software providers have repeatedly used the data sets from throughout the life of the VAST Challenge as benchmarks to demonstrate and test the capabilities of their systems. The ground truth embedded in the data sets has helped researchers evaluate and strengthen the utility of their visualizations.

THE EVIDENCE

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
COMMUNICATION
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
MOVEMENT
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
SETTINGS
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

BACKSTAGE TECHNICAL

This BACKSTAGE TECHNICAL section provides insight into the evidence investigation process. Since the collected evidence is encoded as data, the detective team needs to apply proper data analysis techniques, moving from overview to filtering, brushing and zooming into detail, to extract insight from the collected data.

NTD explore data.png Data Exploration: Dive into the dataset, recognize problems and resolve them to narrow down the target issues before slicing any data.
NTD data prep.png Data Preparation: The critical process that makes the data usable. Slice the data in different ways, then aggregate and stage it in appropriate storage for investigation.
NTD trial error.png Trials: A proof-of-concept (POC) process for trying different technologies, approaches and techniques to visualize datasets that may be challenging to understand. This process involves plenty of trial and error.
NTD data implementation.png Hackathon: Code implementation to build the visualizations and create meaningful information from the provided data. This task involves plenty of code and application usage.

COMMUNICATION AT A GLANCE

After a very short phone conversation with the DinoFun World managers, the team quickly moved on to exploring the content collected by the team deployed to the park. The first task is always to get the highest-level overview of the data.
Several approaches were proposed: network analysis using sigma.js, a heat map to check the overall distribution of the data, and a bar chart to get quick statistics about the data. Evaluating the options:

  • Sigma.js (or any network analysis): Not promising. The communication data appears to be very large, and network analysis at that scale is inefficient.
  • Heatmap: Worth considering. It gives an overall feel for how the frequency of communication is distributed. However, the colors are hard to tell apart and the chart cannot be sorted by value.
  • Bar chart: The selected solution. It is simple and immediately shows who communicates the most and how the communication data is distributed. Its drawback is that it cannot show the whole picture at once, since the number of visitors could be in the thousands. A heatmap can be used to complement the bar chart.
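
The counts behind such a bar chart can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: the record layout (sender, receiver) and the toy IDs are assumptions made up for the example.

```python
from collections import Counter

# Toy records (made up for illustration): (sender, receiver) pairs.
# The layout of the real communication log is an assumption here.
records = [
    ("A", "B"),
    ("A", "C"),
    ("A", "external"),
    ("B", "A"),
    ("C", "external"),
]

def volume_by_id(records):
    """Return (out_counts, in_counts): messages sent and received per ID."""
    out_counts = Counter(sender for sender, _ in records)
    in_counts = Counter(receiver for _, receiver in records)
    return out_counts, in_counts

out_counts, in_counts = volume_by_id(records)

# Sorted in descending order, these are the bars of the "OUT" and "IN" graphs.
top_out = out_counts.most_common(3)
top_in = in_counts.most_common(3)
print("OUT:", top_out)
print("IN:", top_in)
```

An ID that shows up only in the "IN" counts, like "external" in the toy data, is one that never sends anything.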

NTD Comm overview.png

#Found 01
As soon as the graph was drawn, two IDs stood out with extremely high communication volume in both graphs, plus one additional ID that appears only in the "IN" graph, which suggests it is purely a receiver.
Let's tackle the easiest ID first, the one named "External". It is a dummy target for every SMS sent out of the park, so "External" receives SMS from people in the park. Its count is an aggregate that combines everyone outside the park to whom people in the park sent SMS.
That leaves two other IDs: 839736 and 1278894. Both have a high volume of communication that needs further investigation. Who did they communicate with? Did they send plenty of SMS to small groups of people, or to a huge group of recipients?
#Found 02
To answer those questions, we plotted a rough count of how many distinct IDs ever communicated with these two IDs. The result on the right-hand side confirms that both IDs carried out mass communication with thousands of people.
As expected, "External" has no record of responding, since it is a common ID representing everything outside the park.
Next, we wanted to know in depth what exactly these IDs are. The current chart does not allow a conclusion, so the next step is to observe the behavior of those IDs. The results are presented in the next section.
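
As a rough sketch of the distinct-contact count used in #Found 02, assuming the same hypothetical (sender, receiver) record layout and made-up toy IDs:

```python
from collections import defaultdict

# Toy records (made up for illustration): (sender, receiver) pairs.
records = [
    ("bulk", "v1"),
    ("bulk", "v2"),
    ("bulk", "v3"),
    ("v1", "bulk"),
    ("v1", "v2"),
    ("v1", "v2"),  # repeated messages do not add new contacts
    ("v2", "external"),
]

def distinct_contacts(records):
    """Count, for every ID, how many different IDs it ever exchanged a message with."""
    contacts = defaultdict(set)
    for sender, receiver in records:
        contacts[sender].add(receiver)
        contacts[receiver].add(sender)
    return {node: len(peers) for node, peers in contacts.items()}

reach = distinct_contacts(records)

# IDs that receive messages but never send any, like "External" in the text.
never_sends = {r for _, r in records} - {s for s, _ in records}
print(reach)
print(sorted(never_sends))
```

A high message volume combined with a high distinct-contact count points to mass communication rather than a chatty small group.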

A CLOSE UP
ID 1278894

#Found 03
ID 1278894 has "office working hours". Visualizing the messages sent by this ID, we found that sending only happens from 12 noon until 8:55 PM on each of the 3 days. This is interesting and leads us to suspect that the ID may be related to the park office. We will zoom in to find out what this ID does during that time window.

NTD Comm Out 1278894 3days.png
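
The daily active window of a single ID can also be checked numerically. The sketch below is a minimal illustration; the timestamps and dates are made up and only mimic the shape of the pattern described above.

```python
from datetime import datetime

# Toy send times for a single ID (made up; the dates are arbitrary).
sent_at = [
    datetime(2020, 1, 3, 12, 0),
    datetime(2020, 1, 3, 16, 30),
    datetime(2020, 1, 3, 20, 55),
    datetime(2020, 1, 4, 12, 5),
    datetime(2020, 1, 4, 20, 50),
]

def active_window(timestamps):
    """Map each calendar day to the (earliest, latest) time of day seen on it."""
    window = {}
    for ts in timestamps:
        day, tod = ts.date(), ts.time()
        first, last = window.get(day, (tod, tod))
        window[day] = (min(first, tod), max(last, tod))
    return window

windows = active_window(sent_at)
for day, (first, last) in sorted(windows.items()):
    print(day, first.strftime("%H:%M"), "to", last.strftime("%H:%M"))
```

If the window is nearly identical on every day, that supports the "office hours" reading.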

#Found 04
ID 1278894 shows an hourly seasonal pattern of activity, switching on and off regularly. This suggests that the ID's communication is closely tied to the operations of an organization. The pattern of alternating on and off each hour stands out from normal human communication patterns.

NTD Comm Out 1278894 1hr.png
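
One way to make the on/off alternation explicit is to count messages per hour, keeping the empty hours in the result. This is a sketch under assumptions: the send times are made up, cover a single day, and the hour range is passed in by hand.

```python
from collections import Counter
from datetime import datetime

# Toy send times for one ID on one day (made up for illustration).
sent_at = [
    datetime(2020, 1, 3, 12, 5),
    datetime(2020, 1, 3, 12, 10),
    datetime(2020, 1, 3, 14, 0),
    datetime(2020, 1, 3, 14, 5),
    datetime(2020, 1, 3, 16, 30),
]

def hourly_profile(timestamps, first_hour, last_hour):
    """Count messages per hour of day, keeping hours with zero messages."""
    counts = Counter(ts.hour for ts in timestamps)
    return [(hour, counts[hour]) for hour in range(first_hour, last_hour + 1)]

profile = hourly_profile(sent_at, 12, 16)
on_off = ["on" if n > 0 else "off" for _, n in profile]
print(profile)
print(on_off)
```

For data spanning several days, the same count would need to be grouped by date as well as by hour.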

#Found 05
ID 1278894 has a "machine style". At a finer resolution, we conclude that the ID is operated by a robot, since it keeps sending out messages every 5 minutes. Furthermore, it sends to a huge number of IDs in the park, and the location from which the SMS were sent is fixed at the Entry Corridor, where the office is located. It is therefore most likely a POST server used by the park to send regular messages to subscribers in the park. The "In" graph shows visitors responding to the automated server, which explains why there are some late responses, for instance at the 2 points in time highlighted in the chart below.
Matching this behavior against the information on the park's official website, we concluded that this is the GAME server operated by the park to send messages to visitors and receive their responses.

NTD Comm In 1278894 3days.png
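
The "every 5 minutes" and "fixed location" observations can be tested with a histogram of the gaps between distinct send times. The sketch below is illustrative only; the (timestamp, location) layout and the toy records are assumptions, built to mimic a sender that broadcasts to several recipients at each tick.

```python
from collections import Counter
from datetime import datetime, timedelta

# Toy log for one sender (made up): 4 bursts, 5 minutes apart,
# each burst going to 3 recipients from the same location.
start = datetime(2020, 1, 3, 12, 0)
records = [
    (start + timedelta(minutes=5 * burst), "Entry Corridor")
    for burst in range(4)
    for _recipient in range(3)
]

def gap_histogram(timestamps):
    """Histogram of gaps between consecutive distinct send times."""
    distinct = sorted(set(timestamps))
    return Counter(later - earlier for earlier, later in zip(distinct, distinct[1:]))

gaps = gap_histogram(ts for ts, _ in records)
locations = {loc for _, loc in records}
print(gaps.most_common(1))
print(locations)
```

A single dominant gap and a single sending location are what one would expect from an automated server; human senders tend to produce a wide spread of gaps.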


Let's move on to the next ID to find out what was happening with it.

ID 839736

#Found 07
ID 839736 resembles ID 1278894 in its working-hours pattern. It also has active hours, but over a wider range, from 8 AM to 11:30 PM, which are the official operating hours of the park. We cannot draw a conclusion at this point. However, this ID looks like a "public service" that sends SMS to and receives SMS from a massive crowd. Further investigation is needed. The graph below shows the SMS coming in to this ID for selected zones, excluding Wetland and Coaster Alley. Higher rates were found on Saturday and Sunday.

NTD Comm In 839736.png

#Found 08
ID 839736 shows a clearly disturbed pattern on Sunday, with 2 peaks at 12 noon and 2:45 PM and messy fluctuations. This is highly correlated with the timing of the crime and points to human behavior. Our best guess is that this is the Customer Service caller ID, staffed by humans who deal with everyone in the park. They received plenty of calls when the public found out about the crime and wished to call to clarify or inquire. Pulling the records FROM and TO this ID, we found similar patterns in both directions of communication. That fits our assumption that ID 839736 is customer service.

NTD Comm 839736.png
ID External

#Found 09
ID External is an arbitrary ID representing any communication node outside the park. Tracking this ID reveals an interesting fact about when the public found the crime scene. At 11:59 AM on Sunday, there is a sharp peak of communication from guests in the park to the outside world. The data suggest that at about 11:55 AM the public found the crime scene, so they called customer service and at the same time took pictures and/or sent SMS to friends to share the "hot news".

NTD Comm To External.png
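
Locating such a peak amounts to binning the timestamps and taking the fullest bin. A minimal sketch, with made-up timestamps and an assumed bin width that must divide 60:

```python
from collections import Counter
from datetime import datetime

# Toy timestamps of messages sent to "External" (made up for illustration).
to_external = [
    datetime(2020, 1, 5, 11, 40, 10),
    datetime(2020, 1, 5, 11, 56, 0),
    datetime(2020, 1, 5, 11, 57, 30),
    datetime(2020, 1, 5, 11, 59, 0),
    datetime(2020, 1, 5, 11, 59, 45),
    datetime(2020, 1, 5, 12, 10, 0),
]

def busiest_bin(timestamps, minutes=5):
    """Return (bin_start, count) for the fullest bin; `minutes` must divide 60."""
    bins = Counter(
        ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        for ts in timestamps
    )
    return bins.most_common(1)[0]

peak_start, peak_count = busiest_bin(to_external)
print(peak_start, peak_count)
```

Rerunning with `minutes=1` narrows the peak down to a single minute, at the cost of noisier counts.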

We have dived into the 3 outstanding IDs and identified them. However, we are also very interested in the communication pattern of the guests in the park. Therefore, in this part, we filter out all 3 IDs above to study the rest.
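
The filtering step can be sketched as follows. The record layout, the toy visitor IDs and the representation of IDs as strings are assumptions; only the three excluded IDs come from the findings above.

```python
# The three IDs identified above; storing them as lowercase strings is an assumption.
EXCLUDED = {"839736", "1278894", "external"}

# Toy records (made up for illustration): (sender, receiver) pairs.
records = [
    ("1278894", "111"),
    ("111", "222"),
    ("222", "External"),
    ("333", "839736"),
    ("222", "333"),
]

def guests_only(records, excluded=EXCLUDED):
    """Keep only messages where neither endpoint is one of the excluded IDs."""
    def keep(node):
        return str(node).lower() not in excluded
    return [(s, r) for s, r in records if keep(s) and keep(r)]

rest = guests_only(records)
print(rest)
```

Only guest-to-guest messages survive the filter, which is the data studied in the rest of this part.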

The rest

#Found 10

The overall trend shows more communication among park guests on the weekend days, Saturday and Sunday. That may be because the volume of visitors is higher during weekends. The data also show a trend of the show business: visitors normally aim for the last show rather than the first, and that is what happens here as well.
This communication pattern shows high peaks appearing regularly in Coaster Alley at the times when the performances were held (11 AM, 4 PM+), but on Sunday the second peak disappeared due to the cancellation of the tour. We also found an interesting peak in Wetland on Sunday, which forms a second, local peak at 12 noon. We will look at these more closely.

NTD Comm Out rest.png

And this is a zoomed-in view of Coaster Alley:

NTD Comm Out rest CoastAlley.png