Difference between revisions of "Mystery at the Wildlife Preserve"

From Visual Analytics and Applications
(NYC Taxi Trip Duration Visualisation Project)
(CricketViz: A visual analytics tool for discovering insights from Indian Premier League data)

Revision as of 15:06, 3 August 2017

1. Introduction

The Indian Premier League (IPL) is a 20-over cricket tournament that has been held in May of each year since 2008, across the major cricket stadiums in India. The tournament lasts for about one month. Players are auctioned months ahead to the 8 to 12 teams that participate in this major cricketing event of the year. The auction is a crucial time for every player and team owner: as in most auctions, the owners must select the players that best fit their team. The auction for this widely viewed event can cost team owners as much as $12-$25 million each year. This project delves into the game statistics that would be useful to team owners before the auction for season 10 and beyond. Owners and management would be able to compare players' performance, among other key statistics.

2. Data Preparation

Data Given:

‘Matches’ data (578 rows): id, season, city, date, team1, team2, toss_winner, toss_decision, result, dl_applied, winner, win_by_runs, win_by_wickets, player_of_match, venue, umpire1, umpire2, umpire3

‘Deliveries’ data (136599 rows): match_id, inning, batting_team, bowling_team, over, ball, batsman, non_striker, bowler, is_super_over, wide_runs, bye_runs, legbye_runs, noball_runs, penalty_runs, batsman_runs, extra_runs, total_runs, player_dismissed, dismissal_kind, fielder

Data Preparation: The Matches data were filtered to keep only matches with a normal result and were then combined with the Deliveries data by match id. This produced a new data set that holds the ball-by-ball record of each match.
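The filter-and-combine step described above can be sketched as follows. This is a minimal illustration on made-up toy rows, not the project's actual script; it assumes that a normal result is stored as the string "normal" in the result column and that id in Matches corresponds to match_id in Deliveries.

```r
library(dplyr)

# Made-up toy rows standing in for the real Matches and Deliveries files.
matches <- data.frame(
  id = c(1, 2, 3),
  season = c(2016, 2016, 2016),
  result = c("normal", "tie", "normal"),   # assumed coding of the result column
  winner = c("Team A", "Team B", "Team C"),
  stringsAsFactors = FALSE
)

deliveries <- data.frame(
  match_id = c(1, 1, 2, 3, 3),
  batsman = c("P1", "P2", "P3", "P1", "P4"),
  bowler = c("B1", "B1", "B2", "B3", "B3"),
  total_runs = c(4, 1, 0, 6, 2),
  stringsAsFactors = FALSE
)

# Keep only matches with a normal result.
normal_matches <- filter(matches, result == "normal")

# Attach the match details to every delivery of the remaining matches.
ball_by_ball <- inner_join(deliveries, normal_matches,
                           by = c("match_id" = "id"))

print(ball_by_ball)
```

An inner join is used here so that deliveries from filtered-out matches are dropped as well; whether the original work used an inner join or another kind of merge is not stated.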


3. Model Building

The model was built in R using the following packages: ggplot2 (for data visualization), readr, plyr, dplyr, gridExtra, treemap, RColorBrewer, tidyr, radarchart.

The dplyr package was used for data manipulation, including reshaping the data from wide to long format and grouping with group_by.
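As a rough sketch of the kind of manipulation mentioned above, the example below groups deliveries with group_by and reshapes a few columns from wide to long format. The rows are made up, and the use of pivot_longer from tidyr for the reshaping is an assumption; the text does not say which function was used.

```r
library(dplyr)
library(tidyr)

# Made-up toy deliveries; column names follow the Deliveries data.
deliveries <- data.frame(
  batsman = c("P1", "P1", "P2", "P1", "P2"),
  bowler = c("B1", "B1", "B1", "B2", "B2"),
  batsman_runs = c(4, 1, 0, 6, 2),
  wide_runs = c(0, 1, 0, 0, 0),
  bye_runs = c(0, 0, 1, 0, 0),
  legbye_runs = c(0, 0, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Grouping: total runs each batsman scored against each bowler,
# sorted from the highest total to the lowest.
runs_by_pair <- deliveries %>%
  group_by(batsman, bowler) %>%
  summarise(runs = sum(batsman_runs), .groups = "drop") %>%
  arrange(desc(runs))

# Wide to long: one row per delivery and type of extra run.
extras_long <- deliveries %>%
  select(bowler, wide_runs, bye_runs, legbye_runs) %>%
  pivot_longer(cols = c(wide_runs, bye_runs, legbye_runs),
               names_to = "extra_type",
               values_to = "runs")

print(runs_by_pair)
print(head(extras_long))
```

The long format is convenient for ggplot2, which expects one observation per row when mapping a variable such as extra_type to colour or facets.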

Implementation: The data were wrangled and plotted using ggplot2. The treemap was made using the treemap package and depicts the runs a player made against each bowler, in decreasing order. The legends were removed so that viewers can read the data more clearly.
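A treemap of this kind could be drawn along the following lines. This is only a sketch with made-up totals for a single batsman; the data frame name, the title, and the choice of type = "index" are assumptions made for illustration.

```r
library(treemap)

# Made-up totals of runs one batsman scored against each bowler.
runs_vs_bowlers <- data.frame(
  bowler = c("B1", "B2", "B3", "B4"),
  runs = c(42, 17, 65, 8),
  stringsAsFactors = FALSE
)

# Order the table by runs, largest first.
runs_vs_bowlers <- runs_vs_bowlers[order(-runs_vs_bowlers$runs), ]

# One rectangle per bowler, sized by runs; the legend is switched off.
tm <- treemap(
  runs_vs_bowlers,
  index = "bowler",
  vSize = "runs",
  type = "index",
  title = "Runs scored against each bowler (toy data)",
  position.legend = "none"
)
```

Each rectangle's area is proportional to the runs scored against that bowler, so the bowlers a batsman scores most heavily against stand out without a legend.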