ISSS608 2016-17 T3 Assign CHEN YINJUE Data Analytics

From Visual Analytics and Applications
Jump to navigation Jump to search

Bird.jpg VAST Challenge 2017 : Help Rose-Crested Blue Pipit

Introduction

Data Preparation

Visualization & Insights

Conclusion

Data Visualization & Insights

1. Loading data into tableau

After data preparation, now it's time to draw sunburst chart in tableau. Firstly, open tableau, under ‘Connect’ click ‘Excel’ and then select the Excel file we prepared before. Noted that we should choose ‘open with legacy connection’ in this step so that custom SQL can be used later. Secondly, drag ‘New Custom SQL’ to the area with ‘Drag tables here’ and apply SQL code as below:

SQL.png


Besides, to lay out the path on the map later, the coordinate table should also be loaded into tableau. So, at the left pane of ‘Data source’, click ‘Add’ (to the right of ‘Connection’) and choose file ‘Coordinate table’. After that, we can select ‘Coordinate table’ under ‘Connection’ and drag ‘Coordinate table.csv’ to join with ‘Custom SQL Query’ by matching the ‘gate name’ of these two sheets. Below shows a sample of output:

Tableau datasource.png


2. Drawing the Sunburst Diagram

1) Step-by-Step Description

2) Output & Discovery

Below is the sunburst diagram showing from level 1 to level 20. To make the graph readable, I purposely colored same gate type with same color:

• Entrance: Blue

• General-Gate: Yellow

• Gate: Brown

• Camp Site: Green

• Ranger-stop: Red

• Ranger-base: Purple

Sunburst.png


Now, from this sunburst chart, we can summarize several general route patterns:

1)) Cars that start from entrance:

• Cars that enter by an entrance and then directly exit the park by an entrance (can be either the same entrance or another entrance):

Entrance > Entrance


• Cars that enter the park and go through one or several general-gate(s) and then exit the park:

Entrance > General-gate(s) > Entrance


• Cars that enter the park for camping (basically includes all routes with camp sites in):

Entrance > … > Camping(s) > … > Entrance


• Cars that enter the park and pass by ranger-stop(s) without going to any camp site:

Entrance > … > Ranger-stop(s) > … > Entrance


• Cars that enter the park and illegally pass by gate(s):

Entrance > … > Gate(s) > … > Entrance


2)) Cars that start from ranger-base, which basically should be cars with type ‘2P’. We should pay attention to checking whether there is any other car-type that illegally has this kind of route pattern.


3. Pattern Detection

1) Splitting Dataset

After attaining an overview as above, next we can begin to divide our datasets. Briefly, the restructured dataset can be divided into:

• Single path & Single day

• Single path & Multiple days

• Multiple paths


So, how to do split the dataset? The steps are descripted as below:


2) Grab the Patterns

As mentioned before, we have several patterns that can be viewed in sunburst chart already. Therefore, we can further divide the dataset based on route patterns. After that the weight of routes can be calculated and sorted and we pick up records whose routes have relatively heavier weights than others to treat them as major path pattern. Next, we can extract all specific data (those have heavier weight ones) to build calendar view so that the duration (which months or what hours) have heavier traffic for specific path patterns can be displayed, respectively. Besides, map can be used to demo the corridors. Details of steps are as below:

• Using the general route patterns summarized from sunburst to apply filter on the dataset, which are ‘Single path & Single day’, ‘Single path & Multiple days’ so that the original dataset can be further divided into several worksheets based on routes pattern, name the worksheets respectively (Noted that when using filter, we can just pick out the entire paths of similar route pattern to get a car-id list, then join the whole dataset (without filtering) with the car-id list table).

• Next, each worksheet can be loaded into JMP to calculate weights of different routes (noted only entire paths’ weight will be used to analyse) so that those entire paths with higher number of records can be picked out as major trip patterns.

• Then data will be further extracted from the filtering dataset based on the major trip patterns that we got from step two. By now, the timestamp of these data can be used to build the calendar view so that we can know which duration (months within a year or hours within a day) have bigger traffic volumes.

• Using map to lay out the specific path pattern.


1)) ‘Single path & Single day’

Based on the analysis steps described above, we can get these several route patterns from the dataset:


• Entrance > Entrance

For this specific short route pattern, it is discovered that entrance0, entrance2, entrance3, entrance4 have occurred very frequently:

E,e.png


Although there are four paths shown above, we can summarize the patterns as: one route between entrance2 & 4 while the other path is between entrance0 & 3.

Trip between entrance2 and entrance4 (number of records:824):

Map ee 1.png


Trip between entrance0 and entrance3 (number of records:738):

Map ee 2.png


Car-type 1 occurs most in this route pattern, followed by type 2, type3 and type4, orderly. Type1 & 2 have quite close number of records while type3 & 4 also have similar weights:

Car type(e,e).png


From calendar, firstly, it can be seen from the charts that during November to January, there are relatively much less records of this kind of pattern compared to other months. Besides, during June to August there are obviously more path records, especially July and August.

Ee(mth).png


For route between entrance2 and entrance4, records more frequently occurred during midnight, especially 3am. Also, quite a lot records at dusk (6pm):

Calendar(ee24,h).png


Meanwhile, for path between entrance0 and entrance3, most records happened early in the morning (6am) as well as around noon (11am and 2pm~3pm):

Calendar(ee03,h).png


• Entrance > General-gate(s) > Entrance

Similarly, as before, we can summarize a route, which goes through entrance0, general-gate4, general-gate7, entrance1, has quite heavy weight that equals to 826 (including both directions). Moreover, there is another path pattern has quite a lot number of records (746) also, which is the one goes through entrance3, general-gate7 and entrance1 (both directions also):

Egge(jmp).png


Path1: goes through entrance0, general-gate4, general-gate7, entrance1 (826 records):

Egge(path1).png


Path2: goes through entrance3, general-gate7, entrance1 (746 records):

Egge(path2).png


Car-type 1 occurs most in this route pattern, followed by car-type 2. Besides, type3 and type4 have quite close number of records:

Eggetype.png


From the calendar view, firstly, it can be seen from the charts that during November to January, there are relatively much less records of this kind of pattern compared to other months. Besides, during June to August there are obviously more path records, especially July and August.

Ee(mth).png


For route between entrance2 and entrance4, records more frequently occurred during midnight, especially 3am. Also, quite a lot records at dusk (6pm):

Calendar(ee24,h).png


Meanwhile, for path between entrance0 and entrance3, most records happened early in the morning (6am) as well as around noon (11am and 2pm~3pm):

Calendar(ee03,h).png


• Entrance > … > Camping(s) > … > Entrance


• Entrance > … > Ranger-stop(s) > … > Entrance

For this pattern, there are two paths have highest weights (413 records each path) compared to other routes of similar pattern:

entrance1 > general-gate7 > general-gate4 > general-gate1 > ranger-stop2 > ranger-stop0 > general-gate2 > entrance2:

Path 1.png


entrance2 > general-gate2 > ranger-stop0 > ranger-stop2 > general-gate1 > entrance0:

Path 2.png


Car-type 1 occurs most in this route pattern, followed by car-type 2:

Car type(e,...,rs,...e).png


It is found that not all the days have this kind of routes records. Moreover, there are more number of records in 2015 than in 2016. But of course, because the months recorded in 2015 and 2016 are all different except for May and number of records of May in 2016 seems start to increase compared to the earlier months in the same year. So, it is reasonable to make a guess that this path route tends to happen more frequently during May to September:

Calendar(e,rs,e,mth).png


Besides, below shows that the route pattern happens more during night-time compared to day-time (20pm~22pm, 1am~3am):

Calendar(e,rs,e,hour).png


• Entrance > … > Gate(s) > … > Entrance