Mystery at the Wildlife Preserve

From Visual Analytics and Applications

1. Introduction

Since 2008, the Indian Premier League (IPL), a Twenty20 (20-over) cricket tournament, has been held each year around May across the major cricket stadiums in India. The tournament lasts about a month. Players are auctioned months in advance to the 8 to 12 teams taking part in the year's biggest cricketing event. The auction is a crucial time for every team member and owner: as in most auctions, owners must select the players who best fit their team, and for this widely viewed event the auction can cost each team owner as much as $12-$25 million a year. This project explores the game's statistics to help team owners prepare for the auction in season 10 and beyond, so that owners and management can compare players' performance, among other key statistics.

2. Data Preparation

Data Given:

‘Matches’ data (578 rows): id, season, city, date, team1, team2, toss_winner, toss_decision, result, dl_applied, winner, win_by_runs, win_by_wickets, player_of_match, venue, umpire1, umpire2 and umpire3

‘Deliveries’ data (136599 rows): match_id, inning, batting_team, bowling_team, over, ball, batsman, non_striker, bowler, is_super_over, wide_runs, bye_runs, legbye_runs, noball_runs, penalty_runs, batsman_runs, extra_runs, total_runs, player_dismissed, dismissal_kind and fielder

Data Preparation: The Matches and Deliveries data were filtered to keep only matches with a normal result and then joined on the match id (id in Matches, match_id in Deliveries). The result is a new data set that combines the ball-by-ball record of each match with its match-level details.

data_prep.jpg
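The R code for this step is not shown, so the following is only a minimal Python sketch of the filter-and-join described above, roughly what dplyr's filter() followed by inner_join() would do. The sample rows, the helper name `prepare`, and the exact result label "normal" are assumptions for illustration.

```python
# Hypothetical miniature rows using column names from the Matches and Deliveries data.
matches = [
    {"id": 1, "season": 2016, "result": "normal", "winner": "Team A"},
    {"id": 2, "season": 2016, "result": "no result", "winner": ""},
    {"id": 3, "season": 2016, "result": "tie", "winner": "Team B"},
]
deliveries = [
    {"match_id": 1, "inning": 1, "over": 1, "ball": 1, "batsman": "P1", "bowler": "B1", "batsman_runs": 4},
    {"match_id": 1, "inning": 1, "over": 1, "ball": 2, "batsman": "P1", "bowler": "B1", "batsman_runs": 0},
    {"match_id": 2, "inning": 1, "over": 1, "ball": 1, "batsman": "P2", "bowler": "B2", "batsman_runs": 6},
    {"match_id": 3, "inning": 1, "over": 1, "ball": 1, "batsman": "P3", "bowler": "B3", "batsman_runs": 1},
]


def prepare(matches, deliveries):
    """Keep normal-result matches and attach their details to each ball (inner join on id == match_id)."""
    # Index the surviving matches by id for constant-time lookup.
    normal = {m["id"]: m for m in matches if m["result"] == "normal"}
    combined = []
    for d in deliveries:
        match = normal.get(d["match_id"])
        if match is None:
            continue  # the ball belongs to a filtered-out match
        row = dict(d)
        # Copy match-level columns, skipping id because it duplicates match_id.
        row.update({k: v for k, v in match.items() if k != "id"})
        combined.append(row)
    return combined


combined = prepare(matches, deliveries)
print(len(combined), combined[0]["winner"])
```

Only the two balls from match 1 survive, each now carrying that match's season, result and winner.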

3. Model Building

The model was built in R using the following packages: ggplot2 (for data visualization), readr, plyr, dplyr, gridExtra, treemap, RColorBrewer, tidyr and radarchart. The dplyr package was used to manipulate the data, including reshaping it from wide to long format, grouping it with group_by, and other transformations.
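As a hedged illustration of the two operations named above, here is a stdlib Python sketch of a wide-to-long reshape followed by a grouped sum. In R these are usually done with tidyr's gather() or pivot_longer() and dplyr's group_by() plus summarise(). The wide table, its season columns, and the helper names `wide_to_long` and `group_sum` are hypothetical.

```python
from collections import defaultdict

# Hypothetical wide table: one row per batsman, one column of runs per season.
wide = [
    {"batsman": "P1", "2015": 320, "2016": 410},
    {"batsman": "P2", "2015": 150, "2016": 95},
]


def wide_to_long(rows, id_col, names_to, values_to):
    """Melt every non-id column into one (id, name, value) row each."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], names_to: key, values_to: value})
    return long_rows


def group_sum(rows, key, value):
    """Sum `value` for each distinct `key`, like group_by() + summarise(sum(...))."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


long_rows = wide_to_long(wide, "batsman", "season", "runs")
by_batsman = group_sum(long_rows, "batsman", "runs")   # {'P1': 730, 'P2': 245}
by_season = group_sum(long_rows, "season", "runs")     # {'2015': 470, '2016': 505}
```

The long format keeps one observation per row, which is why the same data can be regrouped by batsman or by season without reshaping again.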

Implementation: The data was wrangled and then plotted with ggplot2. The treemap, made with the treemap package, shows the runs a batsman scored against each bowler in decreasing order. Legends were removed, among other adjustments, so that viewers can read the data more clearly.
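The treemap's tile sizes come from per-bowler run totals. Below is a minimal Python sketch of how that input table might be computed, assuming the runs counted are batsman_runs (which exclude extras such as wides and byes, unlike total_runs). The sample rows and the helper name `runs_against_bowlers` are hypothetical. In R, a table like this could then be passed to the treemap package's treemap() function with the bowler as the index and the runs as the tile size (vSize).

```python
from collections import Counter

# Hypothetical ball-by-ball rows with columns from the Deliveries data.
deliveries = [
    {"batsman": "P1", "bowler": "B1", "batsman_runs": 4},
    {"batsman": "P1", "bowler": "B2", "batsman_runs": 6},
    {"batsman": "P1", "bowler": "B1", "batsman_runs": 1},
    {"batsman": "P1", "bowler": "B3", "batsman_runs": 2},
    {"batsman": "P2", "bowler": "B1", "batsman_runs": 6},
]


def runs_against_bowlers(rows, batsman):
    """Total one batsman's runs per bowler; return (bowler, runs) pairs, largest first."""
    totals = Counter()
    for r in rows:
        if r["batsman"] == batsman:
            totals[r["bowler"]] += r["batsman_runs"]
    # most_common() with no argument returns every entry sorted by count, descending.
    return totals.most_common()


print(runs_against_bowlers(deliveries, "P1"))  # [('B2', 6), ('B1', 5), ('B3', 2)]
```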