Group24 Proposal



ISSS608 Visual Analytics and Applications
IPL Analytics through Data Visualisations



Introduction

Cricket, the second most popular sport in the world, is a bat-and-ball game played between two teams of 11 players each, on a field with a rectangular 22-yard-long pitch at its centre. The game is played by 120 million players worldwide, and the aim is to score more runs than the opposing team.
The Indian Premier League (IPL) is a professional Twenty20 cricket league in India, contested every year during April and May by teams representing Indian cities and some states. The league was founded by the Board of Control for Cricket in India (BCCI) in 2008.
At the time of writing, there have been eleven seasons of the IPL. The current title holders are the Chennai Super Kings, who won the 2018 season. The most successful franchises in the tournament are the Chennai Super Kings and the Mumbai Indians, with three titles each.

Motivation

For the love of cricket, we want to identify the best possible player combinations to help team owners pick the right players for upcoming IPL seasons, since they invest millions each year in their bid to win the trophy. As avid fans, we also want to deepen our knowledge of the game and apply the R skills we learnt at school to a project that lets us examine a subject we care about through a statistical lens and showcase our skills on a large canvas.

Objective

This project explores historical IPL data from nine seasons (2008 to 2016). It has three aims: to understand patterns in individual player performance and in team strengths and weaknesses, to apply time series analysis to identify the Most Valuable Players (MVPs), and thus to suggest the best IPL team for each season, among other things.
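To make the time-series idea more concrete, here is a minimal Python sketch of one possible first step: collapsing per-delivery records into a per-season run total for a single batsman. It is an illustration under stated assumptions, not the method this project will use; the project itself will be built in R. The fields `season` and `batsman_runs` are assumptions because neither appears among the variables listed under Dataset and Description, and `season` would presumably have to be brought in from matches.csv. The records and player names are invented, and `season_series` is a hypothetical helper.

```python
from collections import defaultdict

# Invented per-delivery records. "season" and "batsman_runs" are ASSUMED fields:
# they are not among the eight variables listed in this proposal.
deliveries = [
    {"season": 2008, "batsman": "Player A", "batsman_runs": 4},
    {"season": 2008, "batsman": "Player A", "batsman_runs": 1},
    {"season": 2009, "batsman": "Player A", "batsman_runs": 6},
    {"season": 2009, "batsman": "Player B", "batsman_runs": 2},
    {"season": 2010, "batsman": "Player A", "batsman_runs": 0},
]


def season_series(records, player):
    """Hypothetical helper: total runs per season for one batsman, sorted by season."""
    totals = defaultdict(int)
    for rec in records:
        if rec["batsman"] == player:
            # Accumulate the runs this batsman scored off each delivery he faced.
            totals[rec["season"]] += rec["batsman_runs"]
    return sorted(totals.items())


print(season_series(deliveries, "Player A"))  # [(2008, 5), (2009, 6), (2010, 0)]
```

Seasons in which a player faced no deliveries are absent from the result rather than recorded as zero. In the sample, 2010 appears with 0 only because Player A faced a ball there. Any time-series analysis built on top would need to decide how to treat such gaps.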

Dataset and Description

We were provided with two datasets covering nine seasons of the IPL, from 2008 to 2016. The first, deliveries.csv, contains information about every delivery bowled across all nine seasons. The second, matches.csv, contains general information about each match played between two teams. Eight important variables from the raw data are summarized in the table below:

Variable Name   Description
batting_team    Name of the team currently batting.
bowling_team    Name of the team currently bowling.
over            Number of the current over.
ball            Number of the current ball within the over.
batsman         Name of the batsman at the striker's end.
non_striker     Name of the batsman at the non-striker's end.
bowler          Name of the bowler.
inning          Innings in progress: 1 = first innings, 2 = second innings.
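As an illustration of how these per-delivery variables can be aggregated, the Python sketch below counts how many deliveries each batsman faced and each bowler bowled, split by innings. It uses only the eight columns in the table. The sample rows and player names are invented, and `summarise_deliveries` is a hypothetical helper rather than part of the project. None of the listed variables records extras, so wides and no-balls are counted like any other delivery here. A real analysis would need additional columns to exclude them.

```python
import csv
import io
from collections import Counter

# Invented toy rows for illustration only; the header follows the variable table above.
SAMPLE = """batting_team,bowling_team,over,ball,batsman,non_striker,bowler,inning
Team X,Team Y,1,1,Player A,Player B,Player P,1
Team X,Team Y,1,2,Player A,Player B,Player P,1
Team X,Team Y,1,3,Player B,Player A,Player P,1
Team Y,Team X,1,1,Player C,Player D,Player Q,2
Team Y,Team X,1,2,Player C,Player D,Player Q,2
"""


def summarise_deliveries(rows):
    """Count deliveries faced per (batsman, inning) and bowled per (bowler, inning).

    `rows` is any iterable of dicts keyed by the column names above, such as a
    csv.DictReader. Extras are not distinguished because no extras column is used.
    """
    faced = Counter()
    bowled = Counter()
    for row in rows:
        inning = int(row["inning"])  # 1 = first innings, 2 = second innings
        faced[(row["batsman"], inning)] += 1  # the striker faces this ball
        bowled[(row["bowler"], inning)] += 1
    return faced, bowled


faced, bowled = summarise_deliveries(csv.DictReader(io.StringIO(SAMPLE)))
print(faced[("Player A", 1)])   # 2
print(bowled[("Player P", 1)])  # 3
```

To process the real file, the same function could be given `csv.DictReader(f)` for a file opened with `open("deliveries.csv", newline="")`. This assumes the file has a header row using these column names.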

Project Flow

[Figure: project flow diagram]

Tools and Packages

We will use the R programming language to implement our visualisations. We will rely on packages such as tidyverse, plotly, ggplot2 and dplyr; ggplot2 and dplyr are themselves part of the tidyverse. For data preparation, cleaning and exploration, we will use JMP and Tableau.

Link to Dataset

Click here to view our dataset