ISSS608 2016 17T1 Group14 Proposal


Motivation

With the rapid development of technology, people are becoming less inclined to take part in outdoor activities such as sport. Of the many sports played worldwide, football is among the most popular, especially in Europe. This project aims to uncover interesting findings about the top four leagues in Europe: England, Spain, Italy, and Germany.


Background

Football is one of the most popular sports in the world, especially in Europe. To provide useful findings and insights, data visualization is used to answer the following questions:

  • How did the teams in the top four European leagues perform in their first 10 matches of this season (2016/2017) compared with the first 10 matches of last season (2015/2016)?
  • What is the composition of half-time and full-time results for the teams in a selected league?


Data Source

A few sources are used for this project. The first dataset is obtained from http://www.football-data.co.uk/data.php. Data for the top four leagues in Europe is combined from this source.

All data is in CSV format, ready for use within standard spreadsheet applications. The dataset consists of the following fields.
Key to results data:

  • Div = League Division
  • Date = Match Date (dd/mm/yy)
  • HomeTeam = Home Team
  • AwayTeam = Away Team
  • FTHG = Full Time Home Team Goals
  • FTAG = Full Time Away Team Goals
  • FTR = Full Time Result (H=Home Win, D=Draw, A=Away Win)
  • HTHG = Half Time Home Team Goals
  • HTAG = Half Time Away Team Goals
  • HTR = Half Time Result (H=Home Win, D=Draw, A=Away Win)
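
The results fields above are enough to approach the first question (performance over the first 10 matches). The following is a minimal sketch, not the project's actual method: it assumes the usual league scoring of 3 points for a win and 1 for a draw, and the function name `points_after_n_matches` and the sample team names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed scoring: (home points, away points) for each FTR value.
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def points_after_n_matches(rows, n=10):
    """Return {team: points}, counting only each team's first n matches by date."""
    played = defaultdict(int)
    points = defaultdict(int)
    # Dates in the source data use the dd/mm/yy format.
    ordered = sorted(rows, key=lambda r: datetime.strptime(r["Date"], "%d/%m/%y"))
    for row in ordered:
        home_pts, away_pts = POINTS[row["FTR"]]
        for team, pts in ((row["HomeTeam"], home_pts), (row["AwayTeam"], away_pts)):
            if played[team] < n:
                played[team] += 1
                points[team] += pts
    return dict(points)


# Made-up rows for illustration only.
sample = [
    {"Date": "13/08/16", "HomeTeam": "Team A", "AwayTeam": "Team B", "FTR": "H"},
    {"Date": "20/08/16", "HomeTeam": "Team B", "AwayTeam": "Team A", "FTR": "D"},
    {"Date": "27/08/16", "HomeTeam": "Team A", "AwayTeam": "Team B", "FTR": "A"},
]
print(points_after_n_matches(sample, n=2))  # {'Team A': 4, 'Team B': 1}
```

Running the same function on last season's rows and this season's rows would give the two values per team that the comparison needs.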


Match Statistics (where available)

  • Attendance = Crowd Attendance
  • Referee = Match Referee
  • HS = Home Team Shots
  • AS = Away Team Shots
  • HST = Home Team Shots on Target
  • AST = Away Team Shots on Target
  • HHW = Home Team Hit Woodwork
  • AHW = Away Team Hit Woodwork
  • HC = Home Team Corners
  • AC = Away Team Corners
  • HF = Home Team Fouls Committed
  • AF = Away Team Fouls Committed
  • HO = Home Team Offsides
  • AO = Away Team Offsides
  • HY = Home Team Yellow Cards
  • AY = Away Team Yellow Cards
  • HR = Home Team Red Cards
  • AR = Away Team Red Cards
  • HBP = Home Team Bookings Points (10 = yellow, 25 = red)
  • ABP = Away Team Bookings Points (10 = yellow, 25 = red)
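
The bookings points fields follow directly from the card counts. As a small sketch, assuming yellow and red cards are simply counted independently (how the source treats a second yellow that leads to a red is not stated here), the weighting can be written as below. The helper name `booking_points` and the sample values are hypothetical.

```python
def booking_points(yellows, reds):
    """Bookings points as described in the key: 10 per yellow card, 25 per red card."""
    return 10 * yellows + 25 * reds


# CSV values arrive as strings, so they are converted before use.
row = {"HY": "2", "HR": "1", "AY": "3", "AR": "0"}
hbp = booking_points(int(row["HY"]), int(row["HR"]))
abp = booking_points(int(row["AY"]), int(row["AR"]))
print(hbp, abp)  # 45 30
```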

The second dataset is used to compare each team's performance this season with last season. It is compiled from the following league table pages:

  • https://www.premierleague.com/tables
  • http://www.legaseriea.it/en/serie-a-tim/league-table
  • http://www.laliga.es/en/laliga-santander
  • http://www.bundesliga.com/en/stats/table/

Challenges

There are a few challenges in this project:

  • Combining data from various sources and in different formats into one dataset
  • Unfamiliarity with d3.js
  • Finding interesting patterns or insights from the visualizations
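
For the first challenge, one possible approach is to read each league's CSV file and keep only the columns that every file shares. The sketch below is illustrative rather than the project's actual pipeline: the function name `combine`, the division codes, and the in-memory sample files are assumptions made so the example runs on its own.

```python
import csv
import io

# Results columns taken from the key to results data.
KEEP = ["Div", "Date", "HomeTeam", "AwayTeam",
        "FTHG", "FTAG", "FTR", "HTHG", "HTAG", "HTR"]


def combine(files, keep=KEEP):
    """Concatenate rows from several CSV file objects, keeping only the listed columns."""
    combined = []
    for f in files:
        for row in csv.DictReader(f):
            # Columns missing from a file are filled with an empty string.
            combined.append({name: row.get(name, "") for name in keep})
    return combined


# Made-up files; the first has an extra column that the second lacks.
england = io.StringIO(
    "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,Referee\n"
    "E0,13/08/16,Team A,Team B,2,1,H,1,1,D,Some Referee\n"
)
spain = io.StringIO(
    "Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR\n"
    "SP1,19/08/16,Team C,Team D,0,0,D,0,0,D\n"
)
rows = combine([england, spain])
print(len(rows), sorted({r["Div"] for r in rows}))
```

With real downloads, each file object would come from `open(path, newline="")` instead of `io.StringIO`.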


Approaches

Slopegraph: a data visualization technique that makes a number of observations and comparisons easy to see.
A slopegraph is used to visualize and compare the changes in different attributes that indicate how the teams performed last season compared with this season.
Parallel Sets: a visualization application for categorical data, such as census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation.
Parallel Sets is used to compare different attributes of categorical data in one graph.
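
Because Parallel Sets works on data that can be summed up in a cross-tabulation, the half-time and full-time result fields (HTR and FTR) can be counted pairwise before being visualized. The following is a minimal sketch with made-up rows; the function name `crosstab` is hypothetical.

```python
from collections import Counter


def crosstab(rows, first="HTR", second="FTR"):
    """Count how often each (first, second) category pair occurs."""
    return Counter((row[first], row[second]) for row in rows)


# Made-up results for illustration only.
sample = [
    {"HTR": "H", "FTR": "H"},
    {"HTR": "D", "FTR": "H"},
    {"HTR": "D", "FTR": "D"},
    {"HTR": "H", "FTR": "H"},
]
table = crosstab(sample)
for (half_time, full_time), count in sorted(table.items()):
    print(half_time, "->", full_time, count)
```

Each count corresponds to the width of one ribbon between a half-time category and a full-time category.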

References

http://www.storytellingwithdata.com/blog/2014/03/more-on-slopegraphs
http://dataremixed.com/2013/12/slopegraphs-in-tableau/
https://eagereyes.org/parallel-sets