ISSS608 2016-17 T1 Assign3 Nguyen Tien Duong

From Visual Analytics and Applications
Revision as of 22:10, 28 October 2016 by Tdnguyen.2015 (talk | contribs)
NTD A3 Banner.png
ISSS608 2016-17 Assignment 3
Nguyen Tien Duong
This is the story of a park named "DinoFun World".
THE LUNCH CALL

01:00 PM "Hello, KAMPONG detective office here"
01:00 PM "Hi, we are calling from DinoFun World. It is chaos here... we need your help urgently, please......" - a worried voice
01:01 PM "Sir, please stay calm. We understand your situation; tell us about the case."
01:01 PM "Please help us, we are closing down now, and a VIP may be in danger."
...
01:14 PM "A team has been dispatched to your place. We will hunt the criminal down."

A team from the KAMPONG detective office was sent to the DinoFun World park to conduct interviews, collect data and hunt down the criminal.
NTD Detective logo.png

THE PARK

NTD park map.png

[Placeholder, dummy content] The Visual Analytics Science and Technology (VAST) Challenge is an annual contest with the goal of advancing the field of visual analytics through competition. The VAST Challenge is designed to help researchers understand how their software would be used in a novel analytic task and determine if their data transformations, visualizations, and interactions would be beneficial for particular analytic tasks. VAST Challenge problems provide researchers with realistic tasks and data sets for evaluating their software, as well as an opportunity to advance the field by solving more complex problems. Researchers and software providers have repeatedly used the data sets from throughout the life of the VAST Challenge as benchmarks to demonstrate and test the capabilities of their systems. The ground truth embedded in the data sets has helped researchers evaluate and strengthen the utility of their visualizations.

THE EVIDENCE

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
COMMUNICATION
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
MOVEMENT
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
SETTINGS
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

BACKSTAGE TECHNICAL

This BACKSTAGE TECHNICAL section provides insight into the evidence investigation process. Since the evidence is encoded in data, the detective team needs to apply proper data analysis techniques, from overview, filtering and brushing to zooming into detail, to squeeze the insight out of the collected data.

NTD explore data.png
Data Exploration: Dive into the dataset, recognize problems and work out how to resolve them before any slicing begins.
NTD data prep.png
Data Preparation: The critical step that makes the data usable. Slice the data in different ways, then aggregate and stage it in appropriate storage for the investigation.
NTD trial error.png
Trials: A proof-of-concept (POC) step for trying different technologies, approaches and techniques to visualize datasets that may be challenging to understand. This step involves plenty of trial and error.
NTD data implementation.png
Hackathon: Code implementation to build the visualizations and extract meaningful information from the provided data. This task involves a lot of code and application usage.

COMMUNICATION AT A GLANCE

After a very short phone conversation with the DinoFun World managers, the team quickly moved on to exploring the content collected by the team deployed to the park. The first task is always to get the highest-level overview of the data.
Several approaches were proposed: network analysis using sigma.js, a heat map to check the overall distribution of the data, and a bar chart to get the quickest summary statistics. Evaluating the options:

  • Sigma.js (or any network analysis): Not promising. The communication data is very large, and network analysis at that scale is inefficient.
  • Heatmap: Worth considering. It gives an overall feel for how the frequency of communication is distributed. However, it is hard to tell the colors apart, and it cannot be sorted by value.
  • Bar chart: The selected solution. It is simple and immediately shows who communicates the most and how the communication data is distributed. Its drawback is that it cannot show the whole picture at once, since the number of visitors could be in the thousands. We can consider using a heatmap to complement the bar chart.
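The counting behind such a bar chart can be sketched as follows. This is only an illustration under stated assumptions: the records are hypothetical toy data, and the real communication log may use a different layout and field names. It tallies how many messages each ID sent ("OUT") and received ("IN"), then lists the senders in descending order, which is the sorting the heatmap cannot offer.

```python
from collections import Counter

# Hypothetical toy records: (timestamp, sender, receiver).
# The layout and values are assumptions for illustration only.
records = [
    ("2000-01-01 12:00:00", "1278894", "101"),
    ("2000-01-01 12:00:00", "1278894", "102"),
    ("2000-01-01 12:01:10", "101", "external"),
    ("2000-01-01 12:02:30", "102", "1278894"),
    ("2000-01-01 12:03:00", "839736", "101"),
]


def volume_by_id(records):
    """Count messages sent ("OUT") and received ("IN") per ID."""
    out_counts = Counter(sender for _, sender, _ in records)
    in_counts = Counter(receiver for _, _, receiver in records)
    return out_counts, in_counts


out_counts, in_counts = volume_by_id(records)

# Text-mode bar chart of the busiest senders, highest first.
for comm_id, n in out_counts.most_common(3):
    print(f"OUT {comm_id:>8} {'#' * n} {n}")
```

An ID that shows up in `in_counts` but never in `out_counts` behaves purely as a receiver in this toy data.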

NTD Comm overview.png

#Found 01
As soon as the graph was out, we were alerted to 2 IDs with outstandingly high communication volume in both graphs, plus 1 additional ID that appears only in the "IN" graph, which suggests it is purely a receiver.
Let's tackle the easiest ID first, the one named "External". This is a dummy target standing for everyone outside the park, so "External" only receives SMSes from people in the park. Note that its count is an aggregate, combining every outside recipient that people in the park sent SMSes to.
Now we are left with 2 other IDs: 839736 and 1278894. Both have a high volume of communication that needs further investigation. Who did they communicate with? Did they send plenty of SMSes to a small group of people, or to a huge group of recipients?
#Found 02
To answer those questions, we plotted a rough cut of how many distinct IDs ever communicated with these 2 IDs. The result on the right-hand side confirmed that the 2 IDs carried out mass communication with thousands of people.
The "External" ID behaves as expected: it has no records of replying, since it is a common ID representing everyone outside the park.
Next, we wanted to know in depth what exactly these IDs are. The current chart does not let us draw a conclusion, so the next step is to observe the behavior of those IDs. The results are presented in the next section.
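The distinct-counterpart count used in #Found 02 can be sketched like this. The (sender, receiver) pairs are hypothetical toy data, not the park's records; the idea is simply to collect, for every ID, the set of other IDs it ever exchanged a message with, in either direction.

```python
from collections import defaultdict

# Hypothetical toy data: (sender, receiver) pairs, for illustration only.
pairs = [
    ("1278894", "101"),
    ("1278894", "102"),
    ("1278894", "103"),
    ("1278894", "101"),  # a repeated pair must not be counted twice
    ("101", "external"),
    ("102", "external"),
    ("101", "102"),
]


def distinct_partners(pairs):
    """Map each ID to the number of distinct IDs it communicated with."""
    partners = defaultdict(set)
    for sender, receiver in pairs:
        partners[sender].add(receiver)
        partners[receiver].add(sender)
    return {comm_id: len(others) for comm_id, others in partners.items()}


partner_counts = distinct_partners(pairs)
senders = {sender for sender, _ in pairs}

print(partner_counts["1278894"])  # distinct counterparts of the busy ID
print("external" in senders)      # whether "external" ever sent a message
```

A high message volume combined with a high distinct-partner count points to mass communication, while a high volume with few partners would point to chatter inside a small group.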

A CLOSE UP

#Found 03
ID 1278894 keeps "office working hours". Visualizing the messages sent by this ID, we found that sending only happens from 12 noon until 8:55 PM on each of the 3 days. This is interesting and leads us to suspect that the ID may be related to the park office. We will zoom in to find out what this ID does during that time window.

NTD Comm Out 1278894 3days.png
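The "working hours" check can also be done numerically by taking the first and last send time per day. The sketch below assumes a "YYYY-MM-DD HH:MM:SS" timestamp format and uses hypothetical send times; neither is taken from the actual dataset.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical send times of a single ID across 3 days.
sent_times = [
    "2000-01-01 12:00:00", "2000-01-01 16:35:00", "2000-01-01 20:55:00",
    "2000-01-02 20:55:00", "2000-01-02 12:00:00",
    "2000-01-03 12:00:00", "2000-01-03 20:55:00",
]


def daily_window(timestamps):
    """Return {day: (first send time, last send time)} for one ID."""
    windows = {}
    for ts in timestamps:
        t = datetime.strptime(ts, FMT)
        day, clock = t.date().isoformat(), t.time()
        first, last = windows.get(day, (clock, clock))
        windows[day] = (min(first, clock), max(last, clock))
    return windows


windows = daily_window(sent_times)
for day, (first, last) in sorted(windows.items()):
    print(day, first.strftime("%H:%M"), "to", last.strftime("%H:%M"))
```

If the same window repeats on every day, as it does in this toy input, the ID follows a schedule rather than a visitor's irregular habits.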

#Found 04
ID 1278894 shows hourly seasonality: it switches on and off regularly. This suggests that the ID's communication is driven by the operation of an organization. The alternating on-off pattern each hour stands out from a normal human communication pattern.

NTD Comm Out 1278894 1hr.png
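One way to make the on-off rhythm explicit is to bucket the sent messages by hour and mark each hour as active or silent. The sketch below works on hypothetical timestamps from a single day and assumes the same "YYYY-MM-DD HH:MM:SS" format; it is not the chart's actual implementation.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical send times of one ID within a single day.
sent_times = [
    "2000-01-01 12:00:00", "2000-01-01 12:05:00",
    "2000-01-01 14:00:00", "2000-01-01 14:05:00",
    "2000-01-01 16:00:00",
]


def hourly_activity(timestamps):
    """Return [(hour, message count)] for every hour from first to last."""
    hours = Counter(datetime.strptime(ts, FMT).hour for ts in timestamps)
    return [(h, hours[h]) for h in range(min(hours), max(hours) + 1)]


activity = hourly_activity(sent_times)

# "#" marks an active hour, "." a silent one.
pattern = "".join("#" if count else "." for _, count in activity)
print(pattern)
```

A strictly alternating string of active and silent hours is the kind of regularity that a person texting friends would rarely produce.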

#Found 05
ID 1278894 has a "machine style". Diving to a finer resolution, we conclude that the ID is operated by a robot, since it keeps sending out messages every 5 minutes. Furthermore, it sends to a huge number of IDs in the park, and the location from which the SMSes were sent is fixed at the Entry Corridor, where the office is located. Therefore, it is most likely a POST server used by the park to send regular messages to subscribers in the park. The "IN" graph shows visitors replying to the automated server, which explains why there are some late responses, for instance at the 2 points in time highlighted in the chart below.

NTD Comm In 1278894 3days.png
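The two clues in #Found 05, a fixed sending interval and a fixed sending location, can be checked with a short sketch. The records below are hypothetical (timestamp, location) pairs for messages sent by one ID, and the timestamp format is an assumption; several messages sharing one timestamp stand for a single broadcast to many recipients.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical (timestamp, sending location) records of one ID.
sent = [
    ("2000-01-01 12:00:00", "Entry Corridor"),
    ("2000-01-01 12:00:00", "Entry Corridor"),
    ("2000-01-01 12:05:00", "Entry Corridor"),
    ("2000-01-01 12:10:00", "Entry Corridor"),
    ("2000-01-01 12:15:00", "Entry Corridor"),
]


def send_gaps(records):
    """Count the gaps, in whole minutes, between distinct send times."""
    times = sorted({datetime.strptime(ts, FMT) for ts, _ in records})
    return Counter(
        int((later - earlier).total_seconds()) // 60
        for earlier, later in zip(times, times[1:])
    )


gaps = send_gaps(sent)
locations = {location for _, location in sent}

print(gaps.most_common(1))  # the dominant gap between broadcasts
print(locations)            # every place the ID sent from
```

A single dominant gap together with a single sending location is consistent with an automated sender, whereas a human visitor would move around and send at irregular intervals.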