Dhiraj Balakrishnan

Software Engineer

Ipl Analytics

Posted at — Jul 10, 2020

Dharamshala

Overview

We’ve all been engulfed by the uncertainty that is COVID-19, and it doesn’t seem to be going anywhere anytime soon. Of everything that has come to an indefinite standstill, sporting action, especially cricket, is what we are missing the most.

It is now that time of the year when we would ideally be getting over a hangover from the IPL. I have been an avid follower of the Indian Premier League since its inception, so I wanted to fill the void of having no IPL by looking at some of its data points. That exercise led me to build this side project.

Here’s some number crunching from the IPL over the years, if you’ve been dying for that cricket fix of yours.

Analysis so far!!

You can find the notebook here

Batsman Statistics

Bowler Statistics

Fielder Statistics

Data Source

Data Information

We have two different CSV files that are imported as inputs.

Matches

It has a total of 18 features: 12 are categorical, 5 are numerical and 1 is a date field. It has a few data discrepancies, which I am listing below for reference.

| Feature | Type | Remarks |
| --- | --- | --- |
| city | category | It had duplicate values for Bangalore (Bengaluru) and Chandigarh (Mohali). It also had 7 null values, each of which had a venue in Dubai. |
| team1, team2, toss_winner, winner | category | It had duplicate values for the Rising Pune Supergiant (Rising Pune Supergiants) team. |
| winner | category | It had 4 null values; they correspond to matches which had no result. |
| player_of_match | category | It had 4 null values; they correspond to matches which had no result. |
| venue | category | It had many duplicate values. For example, there were duplicates for Rajiv Gandhi International Stadium, Punjab Cricket Association IS Bindra Stadium, M. A. Chidambaram Stadium, M Chinnaswamy Stadium and Feroz Shah Kotla. |
| umpire1, umpire2 | category | It had 2 null values. I filled those in from Cricinfo. |
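The discrepancies above can be handled with a few label replacements. The following is a minimal sketch and not the notebook's actual code: the sample rows are made up, the choice of which spelling to keep as canonical (for example Bangalore over Bengaluru) is an assumption, and filling the missing city with "Dubai" assumes the venue name contains that word.

```python
import pandas as pd

# Made-up sample rows; the real matches file has 18 columns.
matches = pd.DataFrame({
    "city": ["Bangalore", "Bengaluru", None, "Mohali"],
    "venue": [
        "M Chinnaswamy Stadium",
        "M Chinnaswamy Stadium",
        "Dubai International Cricket Stadium",
        "Punjab Cricket Association IS Bindra Stadium",
    ],
    "team1": ["Royal Challengers Bangalore", "Rising Pune Supergiants",
              "Mumbai Indians", "Kings XI Punjab"],
    "team2": ["Mumbai Indians", "Royal Challengers Bangalore",
              "Rising Pune Supergiant", "Rising Pune Supergiants"],
    "toss_winner": ["Mumbai Indians", "Rising Pune Supergiants",
                    "Mumbai Indians", "Kings XI Punjab"],
    "winner": ["Mumbai Indians", "Rising Pune Supergiants",
               "Rising Pune Supergiant", "Kings XI Punjab"],
})

TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]
# Assumed canonical spellings; swap the direction if you prefer the other name.
TEAM_ALIASES = {"Rising Pune Supergiants": "Rising Pune Supergiant"}
CITY_ALIASES = {"Bengaluru": "Bangalore", "Mohali": "Chandigarh"}


def clean_matches(df):
    """Return a copy with duplicate labels merged and missing cities filled."""
    out = df.copy()
    for col in TEAM_COLUMNS:
        out[col] = out[col].replace(TEAM_ALIASES)
    out["city"] = out["city"].replace(CITY_ALIASES)
    # Fill a missing city only when the venue name points to Dubai.
    in_dubai = out["city"].isna() & out["venue"].str.contains("Dubai", na=False)
    out.loc[in_dubai, "city"] = "Dubai"
    return out


cleaned = clean_matches(matches)
print(cleaned[["city"] + TEAM_COLUMNS])
```

Venue duplicates could be merged the same way with one more alias dictionary once the variant spellings are known.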

I could reduce memory consumption by 30% just by typecasting the features to appropriate data types.
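As a rough illustration of where such a saving comes from, the sketch below converts repeated text labels to the category dtype and parses a date column, then compares memory before and after. The data is synthetic and the column name `date` is an assumption, so the percentage it prints will not match the figure quoted above.

```python
import pandas as pd

# Synthetic data: a few labels repeated many times, as in a real matches table.
matches = pd.DataFrame({
    "city": ["Mumbai", "Chennai", "Kolkata", "Delhi"] * 250,
    "winner": ["Mumbai Indians", "Chennai Super Kings",
               "Kolkata Knight Riders", "Delhi Daredevils"] * 250,
    "date": ["2017-04-05", "2017-04-06", "2017-04-07", "2017-04-08"] * 250,
})


def typecast(df, category_cols, date_cols):
    """Return a copy with label columns as category and date columns parsed."""
    out = df.copy()
    for col in category_cols:
        out[col] = out[col].astype("category")
    for col in date_cols:
        out[col] = pd.to_datetime(out[col])
    return out


# deep=True counts the actual string contents, not just the pointers to them.
before = matches.memory_usage(deep=True).sum()
typed = typecast(matches, category_cols=["city", "winner"], date_cols=["date"])
after = typed.memory_usage(deep=True).sum()

print(f"{before} -> {after} bytes ({1 - after / before:.0%} saved)")
```

The category dtype pays off when a column holds few distinct values relative to its length; on a very small table it can cost more than it saves.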

Deliveries

It has a total of 21 features: 8 are categorical and 13 are numerical. It has a few data discrepancies, which I am listing below for reference.

| Feature | Type | Remarks |
| --- | --- | --- |
| inning | int64 | I do not know how an innings value can be equal to 5. I could not understand this feature completely. |
| batting_team, bowling_team | category | It had duplicate values for the Rising Pune Supergiant (Rising Pune Supergiants) team. |
| total_runs | int64 | It had many entries where total_runs was more than 7. |
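Rows like these are easy to isolate for a closer look. The sketch below uses a handful of made-up deliveries and only the two column names mentioned above. It makes no claim about why the values occur; one unverified guess is that innings numbers above 2 mark tie-breaker overs.

```python
import pandas as pd

# Made-up deliveries; the real file has 21 columns.
deliveries = pd.DataFrame({
    "inning": [1, 2, 2, 5, 1],
    "total_runs": [0, 4, 6, 1, 8],
})

# How often each innings number occurs, sorted by innings number.
inning_counts = deliveries["inning"].value_counts().sort_index()

# Rows worth inspecting by hand.
unusual_innings = deliveries[deliveries["inning"] > 2]
high_scoring = deliveries[deliveries["total_runs"] > 7]

print(inning_counts)
print(unusual_innings)
print(high_scoring)
```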

I could reduce memory consumption by 22% just by typecasting the features to valid datatypes.
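For the numerical columns, much of the saving can come from storing small integers in a narrower type than int64. This is a minimal sketch on synthetic data, assuming the values fit in the smaller type; it is not the notebook's code and will not reproduce the 22% figure.

```python
import pandas as pd

# Synthetic data: small integers stored as 64-bit values by default.
deliveries = pd.DataFrame({
    "inning": [1, 2] * 5000,
    "total_runs": [0, 1, 4, 6, 2] * 2000,
})

before = deliveries.memory_usage(deep=True).sum()

compact = deliveries.copy()
for col in ["inning", "total_runs"]:
    # downcast="integer" picks the smallest integer type that holds every value.
    compact[col] = pd.to_numeric(compact[col], downcast="integer")

after = compact.memory_usage(deep=True).sum()

print(compact.dtypes)
print(f"{before} -> {after} bytes ({1 - after / before:.0%} saved)")
```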

Contribution

PRs for new analytics and bug fixes are welcome. Feel free to open an issue.

License

CC

Disclaimer

This tool is only intended for personal use and is a simple demonstration.