ISSS608 2016-17T1 Group14 Report
Motivation
Nowadays, with the rapid development of technology, people are becoming more reluctant to take part in outdoor activities such as sports. Of the many sports available, football is arguably the most popular in the world, especially in Europe. Finding interesting observations and insights in the statistics and results of the top European leagues may help encourage people to watch, or even play, football. The motivation of this project is to discover interesting findings about the top 4 leagues in Europe (England, Spain, Italy and Germany).
Background
Football is one of the most popular sports in the world, especially in Europe. To provide useful findings and insights, data visualization is used to answer the following questions:
- How did the teams in the top 4 European leagues perform in the first 10 matches of last season (2015/2016) compared to this season (2016/2017)?
- How do the Full Time Results of the teams in a selected league compare to their Half Time Results?
- Are there any patterns in last season's Full Time Results?
- Were there any interesting or unexpected match results last season? Which league is the most predictable?
Data Sources & Data Preparation
Several sources are used for this project. The first dataset is obtained from http://www.football-data.co.uk/data.php, from which the data for the top 4 leagues in Europe is combined.
The data is in CSV format, ready for use in standard spreadsheet applications. The dataset consists of the following fields:
Key to results data:
- Div = League Division
- Date = Match Date (dd/mm/yy)
- HomeTeam = Home Team
- AwayTeam = Away Team
- FTHG = Full Time Home Team Goals
- FTAG = Full Time Away Team Goals
- FTR = Full Time Result (H=Home Win, D=Draw, A=Away Win)
- HTHG = Half Time Home Team Goals
- HTAG = Half Time Away Team Goals
- HTR = Half Time Result (H=Home Win, D=Draw, A=Away Win)
Match Statistics (where available)
- Attendance = Crowd Attendance
- Referee = Match Referee
- HS = Home Team Shots
- AS = Away Team Shots
- HST = Home Team Shots on Target
- AST = Away Team Shots on Target
- HHW = Home Team Hit Woodwork
- AHW = Away Team Hit Woodwork
- HC = Home Team Corners
- AC = Away Team Corners
- HF = Home Team Fouls Committed
- AF = Away Team Fouls Committed
- HO = Home Team Offsides
- AO = Away Team Offsides
- HY = Home Team Yellow Cards
- AY = Away Team Yellow Cards
- HR = Home Team Red Cards
- AR = Away Team Red Cards
- HBP = Home Team Bookings Points (10 = yellow, 25 = red)
- ABP = Away Team Bookings Points (10 = yellow, 25 = red)
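As a rough illustration of the combining step, the sketch below merges several CSV files in the football-data.co.uk column layout into one list of match records, keeping only the results fields from the key above. The `combine_leagues` helper and the inline sample rows are hypothetical; real files would be read from disk, and the match-statistics columns vary by league and season, which is why only the results fields are kept.

```python
import csv
import io

# Results fields from the key above; assumed to be present in every league file.
RESULT_FIELDS = ["Div", "Date", "HomeTeam", "AwayTeam",
                 "FTHG", "FTAG", "FTR", "HTHG", "HTAG", "HTR"]

def combine_leagues(csv_texts):
    """Merge several league CSVs (given as strings) into one list of dicts.

    Hypothetical helper: keeps only RESULT_FIELDS and converts goal counts to int,
    so files with different match-statistics columns can still be combined.
    """
    matches = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            record = {field: row[field] for field in RESULT_FIELDS}
            for goals in ("FTHG", "FTAG", "HTHG", "HTAG"):
                record[goals] = int(record[goals])
            matches.append(record)
    return matches

# Hypothetical sample rows, one per league file, for illustration only.
epl = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,HS,AS
E0,08/08/15,Team A,Team B,2,1,H,1,0,H,15,9
"""
serie_a = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR
I1,22/08/15,Team C,Team D,0,0,D,0,0,D
"""

combined = combine_leagues([epl, serie_a])
print(len(combined), combined[1]["Div"], combined[0]["FTHG"])  # 2 I1 2
```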
The second dataset is used to compare the teams' performance this season with their performance last season. It is compiled from the following sources:
https://www.premierleague.com/tables
http://www.legaseriea.it/en/serie-a-tim/league-table
http://www.laliga.es/en/laliga-santander
http://www.bundesliga.com/en/stats/table/
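The official tables above already report these figures, but similar attributes can also be derived from match-level records like those in the first dataset. The sketch below uses a hypothetical `table_after_n` helper that computes each team's points and goal difference over its first N matches. It assumes the standard scoring of 3 points for a win and 1 for a draw, and that matches are listed in chronological order.

```python
from collections import defaultdict

def table_after_n(matches, n=10):
    """Points and goal difference per team over its first n matches.

    Assumes `matches` is in chronological order and uses
    3 points for a win, 1 for a draw and 0 for a loss.
    """
    played = defaultdict(int)
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    for m in matches:
        sides = ((m["HomeTeam"], m["FTHG"], m["FTAG"], "H"),
                 (m["AwayTeam"], m["FTAG"], m["FTHG"], "A"))
        for team, scored, conceded, side in sides:
            if played[team] >= n:
                continue  # only the team's first n matches count
            played[team] += 1
            goal_diff[team] += scored - conceded
            if m["FTR"] == side:
                points[team] += 3
            elif m["FTR"] == "D":
                points[team] += 1
    return {t: (points[t], goal_diff[t]) for t in played}

# Hypothetical results, for illustration only.
sample = [
    {"HomeTeam": "A", "AwayTeam": "B", "FTHG": 2, "FTAG": 0, "FTR": "H"},
    {"HomeTeam": "B", "AwayTeam": "C", "FTHG": 1, "FTAG": 1, "FTR": "D"},
    {"HomeTeam": "C", "AwayTeam": "A", "FTHG": 0, "FTAG": 3, "FTR": "A"},
]
print(table_after_n(sample, n=2))  # {'A': (6, 5), 'B': (1, -2), 'C': (1, -3)}
```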
Approaches
Slopegraph: a data visualization technique that presents data in a way that makes many observations and comparisons easy.
A slopegraph is used to visualize the changes in several attributes that indicate how the teams performed this season compared to last season.
Parallel Sets: a visualization application for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation.
Parallel Sets is used to compare several categorical attributes in a single graph.
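Because Parallel Sets summarizes a cross-tabulation, the size of each band is simply the number of matches that share a combination of category values. The counting step can be sketched with Python's `collections.Counter`. The `cross_tab` helper and the sample records are hypothetical and are not part of the d3.js implementation.

```python
from collections import Counter

def cross_tab(matches, dimensions):
    """Count matches for every combination of the chosen categorical dimensions."""
    return Counter(tuple(m[d] for d in dimensions) for m in matches)

# Hypothetical records with the categorical fields used in the Parallel Sets view.
matches = [
    {"HomeTeam": "Team A", "HTR": "H", "FTR": "H"},
    {"HomeTeam": "Team A", "HTR": "H", "FTR": "D"},
    {"HomeTeam": "Team A", "HTR": "D", "FTR": "H"},
    {"HomeTeam": "Team B", "HTR": "H", "FTR": "H"},
]

# Each (HTR, FTR) pair corresponds to one band between the HTR and FTR axes.
counts = cross_tab(matches, ["HTR", "FTR"])
for combo, n in counts.most_common():
    print(combo, n)
```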
Challenges
A few challenges were faced during the project:
- Combining data from various sources and in different formats into one dataset
- Unfamiliarity with d3.js
- Finding interesting patterns or insights from the visualizations
Applications
d3.js
d3.js is used to visualize team stats for the 2015/2016 season using Parallel Sets.
- Open the link to view the graph: Parallel Sets
- Choose the League and the attributes to visualize (e.g. League = Barclays Premier League, attributes = Home Team, FTR, HTR) and click "Update"
- You can change the order or placement of the categories dynamically by dragging them to the desired position
- The source code can be accessed at this link: Code
Tableau
Tableau is used to compare last season's team stats with this season's. A slopegraph is built to visualize the comparison. The link below points to the slopegraph shared on Tableau Public.
Slopegraph
We also use Tableau to visualize last season's results, to reveal patterns in the data and to spot interesting match results. The Tableau Public visualization can be accessed at the link below.
Full Time Results
- Open the Tableau Public link to view the graph.
- Use the filters to view different combinations.
Results and Insights
The visualization results can be seen below.
Comparing 2015/2016 season to 2016/2017 season
The slopegraph compares each team's performance last season with its performance this season. Blue indicates that a team has performed better this season than last season on the selected attribute, while red indicates worse performance. The thickness of the line shows the magnitude of the change.
From this graph, it can be seen that Chelsea and Liverpool have performed much better in terms of Points after 10 matches this season than last season, while West Ham, Leicester and Swansea have not performed as well as they did last season.
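The color and thickness encoding of the slopegraph can be written as a small rule. In the sketch below, the line is blue when an attribute improved and red when it got worse, and its width scales linearly with the absolute change. The `slope_style` helper, the gray color for no change and the width range are assumptions for illustration, not the settings of the Tableau workbook. For attributes where lower is better, such as goals conceded, the sign of the change is flipped.

```python
def slope_style(last_season, this_season, higher_is_better=True,
                max_change=1.0, min_width=1.0, max_width=6.0):
    """Return (color, width) for one team's line in a slopegraph.

    Blue = better this season, red = worse, gray (assumed) = no change.
    Width grows linearly with |change| up to max_change, between the assumed
    min_width and max_width.
    """
    change = this_season - last_season
    if not higher_is_better:
        change = -change  # e.g. goals conceded: fewer is better
    if change > 0:
        color = "blue"
    elif change < 0:
        color = "red"
    else:
        color = "gray"
    scale = min(abs(change) / max_change, 1.0) if max_change else 0.0
    width = min_width + scale * (max_width - min_width)
    return color, width

# Hypothetical points after 10 matches: (last season, this season).
teams = {"Team A": (15, 22), "Team B": (20, 11), "Team C": (13, 13)}
biggest = max(abs(new - old) for old, new in teams.values())
for team, (old, new) in teams.items():
    print(team, slope_style(old, new, max_change=biggest))
```

Normalizing by the largest change in the league keeps the thickest line for the team whose result changed the most.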
Composition of Half Time and Full Time Results
In Parallel Sets, each set of bars represents one of the dimensions chosen for visualization. The width of each band indicates the number of observations in that category or combination of categories. Color is used to show and compare the distribution across categories.
From the graph, it can be seen that Southampton had a very good home record: they were leading at half time in 12 of their 19 home matches, and held that lead until the end of the match in 10 of those 12.
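The Southampton figures rest on two counts: home matches led at half time (HTR = H) and, of those, matches also won at full time (FTR = H). The sketch below shows that calculation with a hypothetical `half_time_lead_conversion` helper on made-up records. It does not reproduce the actual Southampton data.

```python
def half_time_lead_conversion(matches, team):
    """Return (home games, home games led at half time, of those won at full time)."""
    home = [m for m in matches if m["HomeTeam"] == team]
    led = [m for m in home if m["HTR"] == "H"]
    held = [m for m in led if m["FTR"] == "H"]
    return len(home), len(led), len(held)

# Hypothetical home results, for illustration only.
sample = [
    {"HomeTeam": "Team A", "HTR": "H", "FTR": "H"},
    {"HomeTeam": "Team A", "HTR": "H", "FTR": "D"},
    {"HomeTeam": "Team A", "HTR": "D", "FTR": "H"},
    {"HomeTeam": "Team B", "HTR": "H", "FTR": "H"},
]
print(half_time_lead_conversion(sample, "Team A"))  # (3, 2, 1)
```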
2015/2016 Full Time Results
The visualization of Full Time Results for the top 4 leagues in Europe (England, Spain, Italy and Germany) can be seen below.
The visualization of Full Time Results for the Barclays Premier League 2015/2016 season.
In this graph, the home teams are sorted in descending order of total home points and the away teams in ascending order of total away points. This design is intended to reveal a pattern. Because the strongest home teams meet the weakest away teams in the upper-left part, and the weakest home teams meet the strongest away teams in the lower-right part, the graph is expected to show mostly green in the upper-left part and mostly red in the lower-right part. However, there are a few unexpected results, as highlighted.
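This ordering can be reproduced by totalling each team's points separately for home and away matches and then sorting. The sketch below assumes 3 points for a win and 1 for a draw. The `home_away_points` helper and the sample results are hypothetical.

```python
from collections import defaultdict

def home_away_points(matches):
    """Total points each team earned at home and away (3 per win, 1 per draw)."""
    home = defaultdict(int)
    away = defaultdict(int)
    for m in matches:
        home[m["HomeTeam"]] += {"H": 3, "D": 1, "A": 0}[m["FTR"]]
        away[m["AwayTeam"]] += {"A": 3, "D": 1, "H": 0}[m["FTR"]]
    return home, away

# Hypothetical results, for illustration only.
sample = [
    {"HomeTeam": "A", "AwayTeam": "B", "FTR": "H"},
    {"HomeTeam": "B", "AwayTeam": "C", "FTR": "D"},
    {"HomeTeam": "C", "AwayTeam": "A", "FTR": "A"},
    {"HomeTeam": "A", "AwayTeam": "C", "FTR": "D"},
]
home, away = home_away_points(sample)
# Home teams: most home points first; away teams: fewest away points first.
home_order = sorted(home, key=home.get, reverse=True)
away_order = sorted(away, key=away.get)
print(home_order, away_order)  # ['A', 'B', 'C'] ['B', 'C', 'A']
```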
- The first unexpected result is Man United vs Norwich. Man United had the second-best total home points, while Norwich had the third-worst total away points. However, Norwich managed an away win against Man United.
- The second unexpected result is Tottenham vs Newcastle. Newcastle won only two away matches in the entire 2015/2016 season, and one of them was against Tottenham, who had a relatively good home record (6th best).
- The third unexpected result is West Brom vs Arsenal. West Brom had the fourth-worst total home points, while Arsenal had the third-best total away points. However, West Brom managed a home win against Arsenal.
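The results above are picked out from the graph. Such results could also be flagged automatically. The hypothetical `find_upsets` helper below ranks teams by home and away points (3 per win, 1 per draw). It then flags away wins where a top-k home team lost to a bottom-k away team, and home wins where a bottom-k home team beat a top-k away team. The heuristic and the cut-off `k` are assumptions, not the method used to build the chart.

```python
from collections import defaultdict

def find_upsets(matches, k=4):
    """Flag results that contradict the home/away rankings (hypothetical heuristic)."""
    home_pts, away_pts = defaultdict(int), defaultdict(int)
    for m in matches:
        home_pts[m["HomeTeam"]] += {"H": 3, "D": 1, "A": 0}[m["FTR"]]
        away_pts[m["AwayTeam"]] += {"A": 3, "D": 1, "H": 0}[m["FTR"]]
    # Best record first; ties keep first-seen order because sorting is stable.
    home_rank = sorted(home_pts, key=home_pts.get, reverse=True)
    away_rank = sorted(away_pts, key=away_pts.get, reverse=True)
    top_home, bottom_home = set(home_rank[:k]), set(home_rank[-k:])
    top_away, bottom_away = set(away_rank[:k]), set(away_rank[-k:])
    upsets = []
    for m in matches:
        h, a = m["HomeTeam"], m["AwayTeam"]
        if m["FTR"] == "A" and h in top_home and a in bottom_away:
            upsets.append((h, a, "away win"))
        elif m["FTR"] == "H" and h in bottom_home and a in top_away:
            upsets.append((h, a, "home win"))
    return upsets

# Hypothetical results (home, away, FTR), for illustration only.
rows = [("X", "Z", "A"), ("X", "Y", "H"), ("X", "W", "H"), ("Z", "Y", "A"),
        ("W", "Y", "D"), ("Z", "W", "A"), ("Y", "W", "A")]
sample = [{"HomeTeam": h, "AwayTeam": a, "FTR": r} for h, a, r in rows]
print(find_upsets(sample, k=1))  # [('X', 'Z', 'away win')]
```

For a 20-team league, a cut-off of a few places on each side (such as the default `k=4`) roughly matches the "second best" and "third worst" style of comparison used above.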
The visualization of Full Time Results for the La Liga 2015/2016 season.
In this graph, the expected pattern holds, with mostly green in the upper-left part and mostly red in the lower-right part, and no "unexpected" results stand out.
The visualization of Full Time Results for the Bundesliga 2015/2016 season.
As with La Liga, the pattern shows mostly green in the upper-left part and mostly red in the lower-right part, and no "unexpected" results stand out in the Bundesliga full time results.
The visualization of Full Time Results for the Serie A 2015/2016 season.
There are a few unexpected results in the Serie A 2015/2016 season.
- The first unexpected result is Verona vs Juventus. Verona had the worst home record, while Juventus had the best away record in Serie A. However, Verona managed a home win against Juventus.
- The second and third unexpected results in Serie A last season both involve Napoli. Napoli had the third-best away record, yet they lost two away matches against the teams with the second- and third-worst home records (Bologna and Udinese).
Comparing the full time results across leagues also suggests that Bundesliga and La Liga matches were more predictable, as they had fewer unexpected results than the Barclays Premier League and Serie A.
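The predictability comparison above is based on counting unexpected results by eye. A numeric alternative, assumed here and not used in this report, is to compute for each league the share of decided matches won by the team with more season points. Draws and matches between teams level on points are skipped. The sketch below uses a hypothetical `favourite_win_rate` helper on made-up results.

```python
from collections import defaultdict

def favourite_win_rate(matches):
    """Per division, share of non-drawn matches won by the side with more season points."""
    points = defaultdict(int)
    for m in matches:
        div = m["Div"]
        if m["FTR"] == "H":
            points[(div, m["HomeTeam"])] += 3
        elif m["FTR"] == "A":
            points[(div, m["AwayTeam"])] += 3
        else:
            points[(div, m["HomeTeam"])] += 1
            points[(div, m["AwayTeam"])] += 1
    wins, decided = defaultdict(int), defaultdict(int)
    for m in matches:
        div = m["Div"]
        home, away = points[(div, m["HomeTeam"])], points[(div, m["AwayTeam"])]
        if m["FTR"] == "D" or home == away:
            continue  # skip draws and matches between teams level on points
        decided[div] += 1
        favourite_won = (home > away) == (m["FTR"] == "H")
        wins[div] += int(favourite_won)
    return {div: wins[div] / decided[div] for div in decided}

# Hypothetical results (Div, home, away, FTR), for illustration only.
rows = [("E0", "A", "B", "H"), ("E0", "B", "C", "H"), ("E0", "C", "A", "A"),
        ("D1", "P", "Q", "H"), ("D1", "P", "R", "H"), ("D1", "R", "P", "H")]
sample = [{"Div": d, "HomeTeam": h, "AwayTeam": a, "FTR": r} for d, h, a, r in rows]
rates = favourite_win_rate(sample)
print(rates)
```

A higher rate means the stronger side won more often, which is one way to read "more predictable".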
Future Work
- Visualize football data from other continents (Asia, America, Africa)
- Include more seasons in the visualization
- Explore other data visualization techniques, such as parallel coordinates or radar charts
References
http://www.storytellingwithdata.com/blog/2014/03/more-on-slopegraphs
http://dataremixed.com/2013/12/slopegraphs-in-tableau/
https://eagereyes.org/parallel-sets