IPL Analytics

Posted at — Jul 10, 2020

Dharamshala

Overview

We’ve all been engulfed by the uncertainty that is COVID-19, and it doesn’t seem to be going anywhere anytime soon. Of everything that has come to an indefinite standstill, sporting action, especially cricket, is what we are missing the most.

This is the time of year when we would ideally be getting over our IPL hangover. I have been an avid follower of the Indian Premier League since its inception, so to fill the void of having no IPL I started looking at some of its data points. That exercise led me to build a side project.

Here’s some number crunching from the IPL over the years, if you’ve been dying for that cricket fix of yours.

Analysis so far!!

You can find the notebook here

Batsman Statistics

Most Runs
Most Runs (over)
Most Fours
Most Fours (innings)
Most Sixes
Most Sixes (innings)
Most Fifties
Most Centuries
Highest Scores (innings)
Best batsman in powerplay
Best batsman in death overs (last 4 overs)
Best Batting Strike Rate
Best Batting Average
Bowler vs Batsman Head to Head Competition
Top 4 Batsmen Score Comparison
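
As a rough illustration of how a couple of the batting numbers above can be derived from ball-by-ball data, here is a minimal sketch of "Most Runs" and "Best Batting Strike Rate". The column names `batsman`, `batsman_runs` and `wide_runs` are assumptions about the deliveries file, the rows are made up, and this is not necessarily how the notebook computes them. Strike rate is taken as runs per 100 balls faced, with wides not counted as balls faced.

```python
import pandas as pd

# Made-up ball-by-ball rows; column names are assumed, not taken from the post.
deliveries = pd.DataFrame({
    "batsman": ["A", "A", "A", "B", "B", "A"],
    "batsman_runs": [4, 0, 6, 1, 0, 2],
    "wide_runs": [0, 1, 0, 0, 0, 0],
})

def batting_summary(df):
    """Runs, balls faced and strike rate per batsman, sorted by runs."""
    # A wide does not count as a ball faced by the batsman.
    faced = df[df["wide_runs"] == 0]
    runs = df.groupby("batsman")["batsman_runs"].sum()
    balls = faced.groupby("batsman").size()
    out = pd.DataFrame({"runs": runs, "balls": balls})
    out["strike_rate"] = 100 * out["runs"] / out["balls"]
    return out.sort_values("runs", ascending=False)

summary = batting_summary(deliveries)
print(summary)
```

In practice a minimum number of balls faced would probably be needed before ranking by strike rate, otherwise a batsman with one big hit tops the list.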

Bowler Statistics

Bowler providing most extras
Most Wickets
Most Maidens
Most Dot Balls
Most Dot Balls (innings)
Most Runs Conceded (Innings)
Best Bowling Economy
Best Bowling Economy (innings)
Best bowler in powerplay
Best bowler in death overs (last 4 overs)
Best Bowling Innings
Bowler Wicket Taking Kind Percentage

Fielder Statistics

Best Fielder

Data Source

CricInfo
Kaggle
CricSheet
IPL T20 Official Website

Data Information

We have two different CSV files that are imported as inputs.

matches.csv
deliveries.csv
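
Loading the two files with pandas might look like the sketch below. The helper name `load_ipl_data` is hypothetical, and the in-memory rows (including the `id` and `match_id` columns) are made-up stand-ins so the snippet runs without the real files; with the actual data you would pass the paths `matches.csv` and `deliveries.csv` instead.

```python
import io
import pandas as pd

def load_ipl_data(matches_src, deliveries_src):
    """Read both inputs; each source may be a file path or a file-like object."""
    matches = pd.read_csv(matches_src)
    deliveries = pd.read_csv(deliveries_src)
    return matches, deliveries

# Tiny made-up stand-ins for the real CSV files.
matches_csv = io.StringIO(
    "id,city,team1,team2,winner\n"
    "1,Hyderabad,Team A,Team B,Team A\n"
)
deliveries_csv = io.StringIO(
    "match_id,inning,batting_team,bowling_team,total_runs\n"
    "1,1,Team A,Team B,4\n"
)

matches, deliveries = load_ipl_data(matches_csv, deliveries_csv)
print(matches.shape, deliveries.shape)
```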

Matches

It has a total of 18 features: 12 are categorical, 5 are numerical and 1 is a date field. It had a few data discrepancies, which I am listing below for reference.

Feature	Type	Remarks
city	category	It had duplicate values for Bangalore (Bengaluru) and Chandigarh (Mohali). It also had 7 null values, each of which had Dubai as the venue.
team1, team2, toss_winner, winner	category	It had duplicate values for the Rising Pune Supergiant (Rising Pune Supergiants) team.
winner	category	It had 4 null values; they correspond to matches which had no result.
player_of_match	category	It had 4 null values; they correspond to matches which had no result.
venue	category	It had a lot of duplicate values. For example, we had duplicates for Rajiv Gandhi International Stadium, Punjab Cricket Association IS Bindra Stadium, M. A. Chidambaram Stadium, M Chinnaswamy Stadium and Feroz Shah Kotla.
umpire1, umpire2	category	It had 2 null values. I found the missing values on Cricinfo.
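
The clean-up implied by the table could be sketched roughly as follows. The rows are made up to imitate the discrepancies, the helper `clean_matches` is hypothetical, and which spelling to keep as the canonical one (for example Bengaluru over Bangalore) is my assumption; the post does not say which way the duplicates were merged.

```python
import pandas as pd

# Made-up rows imitating the discrepancies described in the table.
matches = pd.DataFrame({
    "city": ["Bangalore", "Bengaluru", None, "Mohali"],
    "venue": [
        "M Chinnaswamy Stadium",
        "M Chinnaswamy Stadium",
        "Dubai International Cricket Stadium",
        "Punjab Cricket Association IS Bindra Stadium",
    ],
    "team1": ["Rising Pune Supergiants", "Rising Pune Supergiant", "Team A", "Team B"],
    "winner": ["Rising Pune Supergiants", None, "Team A", "Team B"],
})

# Assumed canonical spellings; swap the direction if you prefer the other name.
CITY_ALIASES = {"Bangalore": "Bengaluru", "Mohali": "Chandigarh"}
TEAM_ALIASES = {"Rising Pune Supergiants": "Rising Pune Supergiant"}

def clean_matches(df):
    """Merge duplicate spellings and fill the missing city for Dubai venues."""
    out = df.copy()
    out["city"] = out["city"].replace(CITY_ALIASES)
    # Null cities whose venue mentions Dubai are filled with "Dubai".
    dubai = out["city"].isna() & out["venue"].str.contains("Dubai", na=False)
    out.loc[dubai, "city"] = "Dubai"
    # Apply the team alias to whichever team columns are present.
    team_cols = [c for c in ("team1", "team2", "toss_winner", "winner") if c in out.columns]
    out[team_cols] = out[team_cols].replace(TEAM_ALIASES)
    return out

cleaned = clean_matches(matches)
print(cleaned[["city", "team1", "winner"]])
```

The null `winner` is deliberately left alone, since those rows correspond to matches with no result.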

I could reduce memory consumption by 30% by typecasting the features to valid datatypes.
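
To show where that kind of saving comes from, here is a minimal sketch that casts repeated strings to `category`, a small integer column to `int16` and the date strings to datetimes, then compares `memory_usage(deep=True)` before and after. The data is made up, `win_by_runs` is an assumed column name and `memory_saving_pct` is a hypothetical helper, so the percentage it prints will not match the figure above.

```python
import pandas as pd

def memory_saving_pct(before, after):
    """Percentage reduction in deep memory usage between two DataFrames."""
    b = before.memory_usage(deep=True).sum()
    a = after.memory_usage(deep=True).sum()
    return 100 * (b - a) / b

# Made-up rows, repeated so the effect is visible.
matches = pd.DataFrame({
    "city": ["Hyderabad", "Pune", "Hyderabad", "Pune"] * 250,
    "winner": ["Team A", "Team B", "Team A", "Team A"] * 250,
    "win_by_runs": [35, 0, 12, 0] * 250,
    "date": ["2017-04-05", "2017-04-06", "2017-04-07", "2017-04-08"] * 250,
})

# Repeated strings become categories, small integers a narrower int type.
typed = matches.astype({"city": "category", "winner": "category", "win_by_runs": "int16"})
typed["date"] = pd.to_datetime(typed["date"])

print(f"{memory_saving_pct(matches, typed):.0f}% smaller")
```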

Deliveries

It has a total of 21 features: 8 are categorical and 13 are numerical. It had a few data discrepancies, which I am listing below for reference.

Feature	Type	Remarks
inning	int64	I have no idea how an innings value can be equal to 5. I could not fully understand this feature.
batting_team, bowling_team	category	It had duplicate values for the Rising Pune Supergiant (Rising Pune Supergiants) team.
total_runs	int64	It had a lot of entries where total_runs was more than 7.
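
Oddities like the ones in this table are easy to surface with a couple of filters. The sketch below uses made-up rows and an assumed `match_id` column; it only shows how the checks could be written, not what the real data contains.

```python
import pandas as pd

# Made-up rows; only inning and total_runs come from the table.
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "inning": [1, 2, 5, 1, 2],
    "total_runs": [4, 8, 1, 0, 6],
})

# How often each innings value occurs.
inning_counts = deliveries["inning"].value_counts().sort_index()

# Rows whose innings value is outside the usual first and second innings.
unexpected_innings = deliveries[~deliveries["inning"].isin([1, 2])]

# Deliveries that produced more than 7 runs in total.
high_scoring_balls = deliveries[deliveries["total_runs"] > 7]

print(inning_counts)
print(unexpected_innings)
print(high_scoring_balls)
```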

I could reduce memory consumption by 22% just by typecasting the features to valid datatypes.

Dhiraj Balakrishnan

Software Engineer