Difference between revisions of "1718t1is428T9"

From Visual Analytics for Business Intelligence
 
(38 intermediate revisions by 3 users not shown)

Latest revision as of 09:00, 15 November 2017


Problem & Motivation

Flight delays are a common problem for travelers. A delay can be attributed to various causes, such as aircraft issues or weather at the origin and/or destination airport. Delays undoubtedly disappoint air travelers and greatly affect their flight experience. Thus, in this project, our team aims to investigate the performance of different airlines and flight delays in detail.

In addition, the airport network is a critical and complex piece of national transportation infrastructure, and it is increasingly important for public policy considerations. Disruptions of the airport network, caused by terrorist attacks, disease transmission or other reasons, can lead to huge economic losses. Thus, studying the airport network can help us better understand the relationships between different airports, for example by identifying the most critical airports, and take proactive measures to prevent disruptions.

Objectives

In this project, we will adopt visualization techniques to:

  • Analyse airport network connectivity
  • Analyse flight delays for different airlines
  • Evaluate on-time performance for airlines and aircraft

With the visualization, airline companies will become aware of their on-time performance relative to other airlines and will have a better idea of the areas of routine operation that deserve greater attention, such as service or aircraft maintenance. Our visualization will also provide detailed insight into the airport network, which can speed up decision making when faced with infectious diseases or terrorist attacks.

Selected Dataset

We obtained the dataset from Kaggle; it can be downloaded from https://www.kaggle.com/usdot/flight-delays/data

Dataset/Source | Data Attributes | Rationale of Usage

airline.csv
  • IATA_Code, String, Airline identifier
  • Airline, String, Airline name
Rationale: This data is used to identify and provide detailed information about the different airlines.

airport.csv
  • IATA_Code, String, Location identifier
  • Airport, String, Airport name
  • City, String, City of the airport
  • State, String, State of the airport
  • Country, String, Country of the airport
  • Latitude, Numeric, Latitude of the airport
  • Longitude, Numeric, Longitude of the airport
Rationale: This data is used to identify and provide detailed information about the different airports. It complements the main dataset by providing the latitude, longitude, city, state and country of each airport.

flights.csv
  • Year, Numeric, Year of the flight
  • Month, Numeric, Month of the flight
  • Day, Numeric, Day of the flight
  • Day_of_Week, Numeric, Day of week of the flight
  • Airline, String, Airline identifier
  • Tail_Number, String, Aircraft identifier
  • Origin_Airport, String, Departing airport
  • Destination_Airport, String, Destination airport
  • Departure_Delay, Numeric, Total delay on departure; a negative value indicates the flight departed before the scheduled time
  • Arrival_Delay, Numeric, Total delay on arrival, derived from the difference between arrival_time and scheduled_arrival; a negative value indicates the flight arrived before the scheduled time
  • Diverted, Numeric (binary data), Aircraft landed at an airport other than the scheduled one
  • Cancelled, Numeric (binary data), 1 means cancelled
  • Cancellation_Reason, String, Reason for cancellation of the flight: A - Airline/Carrier; B - Weather; C - National Air System; D - Security
  • Air_System_Delay, String, Delay caused by the air system
  • Security_Delay, String, Delay caused by security
  • Airline_Delay, String, Delay caused by the airline
  • Late_Aircraft_Delay, String, Delay caused by a late aircraft
  • Weather_Delay, String, Delay caused by weather
Rationale: This data is the major source of information in our project. We mainly use it to analyse flight delays and the reasons for delay. In addition, the data will be used to investigate the airport network and analyse relationships within it using different centrality measures, such as betweenness centrality and degree centrality.
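
The two centrality measures named above can be sketched on a small route list. This is a minimal illustration only: the airport pairs are made-up sample values rather than rows from flights.csv, the network is treated as undirected and unweighted (an assumption), and the helper names `build_graph`, `degree_centrality` and `betweenness_centrality` are hypothetical. Betweenness is computed with Brandes' algorithm and left unnormalised; a tool such as Gephi may report differently scaled values.

```python
from collections import defaultdict, deque

# Hypothetical sample of (Origin_Airport, Destination_Airport) pairs;
# real values would come from flights.csv.
routes = [
    ("ATL", "ORD"), ("ORD", "ATL"),
    ("ATL", "DFW"), ("DFW", "ATL"),
    ("ATL", "LAX"), ("LAX", "ATL"),
    ("ORD", "DFW"),
    ("LAX", "SFO"), ("SFO", "LAX"),
]


def build_graph(pairs):
    """Build an undirected adjacency map; repeated flights collapse to one edge."""
    graph = defaultdict(set)
    for origin, dest in pairs:
        if origin != dest:
            graph[origin].add(dest)
            graph[dest].add(origin)
    return dict(graph)


def degree_centrality(graph):
    """Share of the other airports that each airport is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []                          # vertices in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths from source
        dist = {v: -1 for v in graph}       # hop distance from source
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                        # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered airport pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


graph = build_graph(routes)
degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)

# List airports from most to least "critical" by betweenness.
for airport in sorted(graph, key=lambda a: betweenness[a], reverse=True):
    print(airport, round(degree[airport], 2), betweenness[airport])
```

In this toy network the hub airport has both the highest degree and the highest betweenness, which is the kind of "most critical airport" signal the project aims to surface.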

Background Survey of Related Work

Related Works | What We Can Learn

Monthly Performance of Airline in Asia Pacific

1718T1G9 BackgroundSurvey1.png

Source: https://www.flightstats.com/company/monthly-performance-reports/airlines/

  • The heatmap has a clear legend that tells viewers that size stands for the number of scheduled flights and color for on-time performance.
  • The colors contrast well with each other.

Trends in the Causes of Flight Delay in US

1718T1G9 BackgroundSurvey2.png

Source: https://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/2012_04_13.pdf

  • The line chart is effective for comparing the various delay causes.
  • The chart title clearly conveys the purpose of the chart.

Global Digital Attack Network

1718T1G9 BackgroundSurvey3.png

Source: http://www.digitalattackmap.com/#anim=1&color=0&country=US&list=0&time=17475&view=map

  • The graph vividly displays each path with its origin and destination.
  • When the mouse hovers over a path, a label shows up with detailed information about it.

Proposed Dashboard

Proposed Layout | How Analysts Conduct Analysis

Dashboard of Flight Route and Arrival Delay By Airline

1718T1G9 ProposedDashboard1.png

The two-column dashboard will give readers a brief layout of the flight route map for the selected city. The chart on the right-hand side will give readers an overview of the average arrival delay of each airline departing from the selected origin.

A filter will also be provided to update the dashboard, so that readers can compare route maps or average arrival delays between different cities.

Dashboard of Late Aircraft Delay by Airline and Aircraft

1718T1G9 ProposedDashboard2.png

There are two bar charts in the dashboard. The bar chart at the top displays the total delay (in minutes) caused by late aircraft, by airline, in the US in January 2015, in descending order. The bar chart at the bottom shows the same total delay (in minutes) by individual aircraft, also in descending order.

When the user hovers over a bar in the top chart, the corresponding tooltip will show up and the bottom chart will be filtered accordingly. With this, the user will know which airline has the worst on-time performance due to aircraft and which aircraft contributes most to that airline's delay.
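
The aggregations behind the two proposed dashboards can be sketched as follows. This is a rough illustration under stated assumptions, not the team's implementation: the rows are made-up sample values that reuse the flights.csv column names, the delay fields are assumed to be numeric minutes, and the helper names `average_arrival_delay` and `total_late_aircraft_delay` are hypothetical. The optional `airline` argument stands in for the hover filter on the second dashboard.

```python
from collections import defaultdict

# Hypothetical sample rows using the flights.csv column names;
# the values are invented for illustration only.
flights = [
    {"Airline": "AA", "Tail_Number": "N101AA", "Origin_Airport": "ATL",
     "Arrival_Delay": 10, "Late_Aircraft_Delay": 5},
    {"Airline": "AA", "Tail_Number": "N102AA", "Origin_Airport": "ATL",
     "Arrival_Delay": 30, "Late_Aircraft_Delay": 25},
    {"Airline": "DL", "Tail_Number": "N201DL", "Origin_Airport": "ATL",
     "Arrival_Delay": -4, "Late_Aircraft_Delay": 0},
    {"Airline": "DL", "Tail_Number": "N202DL", "Origin_Airport": "ORD",
     "Arrival_Delay": 20, "Late_Aircraft_Delay": 15},
    {"Airline": "AA", "Tail_Number": "N101AA", "Origin_Airport": "ORD",
     "Arrival_Delay": 40, "Late_Aircraft_Delay": 30},
    {"Airline": "DL", "Tail_Number": "N201DL", "Origin_Airport": "ATL",
     "Arrival_Delay": 8, "Late_Aircraft_Delay": 0},
]


def average_arrival_delay(rows, origin):
    """Mean Arrival_Delay per airline for flights departing from `origin`."""
    totals = defaultdict(lambda: [0, 0])  # airline -> [sum of delays, flight count]
    for row in rows:
        if row["Origin_Airport"] == origin and row["Arrival_Delay"] is not None:
            totals[row["Airline"]][0] += row["Arrival_Delay"]
            totals[row["Airline"]][1] += 1
    return {airline: s / c for airline, (s, c) in totals.items()}


def total_late_aircraft_delay(rows, key, airline=None):
    """Sum Late_Aircraft_Delay grouped by `key`, sorted in descending order.

    Passing `airline` keeps only that airline's flights, mimicking the
    hover filter that links the top chart to the bottom chart.
    """
    sums = defaultdict(int)
    for row in rows:
        if airline is None or row["Airline"] == airline:
            sums[row[key]] += row["Late_Aircraft_Delay"] or 0
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


# Dashboard 1: average arrival delay by airline for a selected origin.
avg_from_atl = average_arrival_delay(flights, "ATL")

# Dashboard 2: top chart (by airline) and bottom chart (by aircraft, filtered).
by_airline = total_late_aircraft_delay(flights, "Airline")
by_aircraft_aa = total_late_aircraft_delay(flights, "Tail_Number", airline="AA")

print(avg_from_atl)
print(by_airline)
print(by_aircraft_aa)
```

Keeping the grouping key as a parameter lets the same function feed both bar charts, so the by-airline and by-aircraft views cannot drift apart in how they total the delay.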

Technical Complexity

Below is the list of technical challenges that the team may face when developing the visualization application.

Technical Challenges | How To Resolve
Unfamiliarity with the D3.js library and with building D3 applications
  • Attend the D3.js workshop
  • Individual learning on how to build a D3 application
  • Peer learning
Lack of knowledge on how to integrate Tableau work with a D3 application
  • Research how to integrate Tableau work with a D3 application
  • Conduct early integration so that the team has enough time to tackle potential errors
Insufficient metadata for the source data
  • Research on the official website of the US Department of Transportation
  • Arrange team discussions to build a shared understanding of the source data

Tools/Technology

Below are the tools and technologies we will use when developing the visualization:

  • Excel
  • Tableau
  • D3.js
  • Gephi

Project Milestones

1718g1t9 milestones.png

References

Comments