I originally hail from India, which means that I was eating, drinking and sleeping Cricket for at least a good part of my childhood. Growing up, I used to “get sick” and stay home whenever the one TV channel that we had broadcast Cricket matches. One thing I never truly understood then was how the batting average was calculated in Cricket. The formula is straightforward:
Batting average = Total Number of Runs Scored / Total Number of Outs
Here “out” indicates that the batsman had to end his innings because he lost his wicket. In Baseball terms, this is similar to a strikeout or a catch where the player has to leave the field. The part that I could not understand was what happened when the Cricket batsman did not get out. The runs he scored were added to the numerator, but no change was made to the denominator. I could not see this as a true indicator of the player’s batting average.
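To make the asymmetry concrete, here is a minimal Python sketch. The innings record is made up purely for illustration, and the function names are my own: runs from a not-out innings go into the numerator, but only dismissals count in the denominator.

```python
# Each innings is (runs_scored, was_out). was_out=False marks a "not out" innings.

def batting_average(innings):
    """Cricket batting average: total runs divided by number of dismissals.
    Runs from not-out innings are counted, but those innings add no outs."""
    total_runs = sum(runs for runs, _ in innings)
    outs = sum(1 for _, was_out in innings if was_out)
    if outs == 0:
        return None  # undefined until the batsman has been dismissed at least once
    return total_runs / outs


def runs_per_innings(innings):
    """For comparison: total runs divided by innings played."""
    return sum(runs for runs, _ in innings) / len(innings)


# Hypothetical five-innings record (illustrative numbers, not real data).
record = [(40, True), (75, False), (10, True), (55, True), (20, False)]
print(round(batting_average(record), 2))  # 66.67  (200 runs / 3 outs)
print(runs_per_innings(record))           # 40.0   (200 runs / 5 innings)
```

The two not-out innings lift the batting average well above the runs actually scored per innings, which is exactly what felt wrong to me as a kid.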
When I started learning about Reliability Engineering, I finally understood why the batting average calculation was bothering me. The way the batting average in Cricket is calculated is very similar to the MTTF (Mean Time To Failure) calculation. MTTF is calculated as follows:
MTTF = Total time on test / Number of failures
For a simple example, if we were testing 10 motors for 100 hours and three of them failed at 50, 60 and 70 hours respectively, we can calculate the MTTF as 293.33 hours: the total time on test is 50 + 60 + 70 + (7 × 100) = 880 hours, and 880/3 = 293.33. The problem with this is that the data is right-censored. This means that seven of the samples had not failed when we stopped the testing. This is similar to the innings where the batsman did not get out: their runs are counted, but they add nothing to the number of outs. A key concept to grasp here is that the MTTF or MTBF (Mean Time Between Failures) metric does not describe a single unit. There is more to this than just saying that, on average, a motor is going to last 293.33 hours.
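The motor arithmetic can be reproduced in a few lines of Python. This is a sketch using only the numbers above; the variable names are illustrative. Failed motors contribute their failure times, while the seven survivors are right-censored and contribute the full 100 hours each. For contrast, it also averages only the three observed failures, which ignores the censored units completely.

```python
# Motor test from the text: 10 units on test for 100 hours, 3 failures.
TEST_DURATION = 100.0
failure_times = [50.0, 60.0, 70.0]
n_units = 10

survivors = n_units - len(failure_times)

# Total time on test: failed units contribute their failure times,
# right-censored survivors contribute the full test duration.
total_time_on_test = sum(failure_times) + survivors * TEST_DURATION
mttf = total_time_on_test / len(failure_times)

# Ignoring the censored units entirely gives a very different number.
mean_of_failures_only = sum(failure_times) / len(failure_times)

print(total_time_on_test)     # 880.0
print(round(mttf, 2))         # 293.33
print(mean_of_failures_only)  # 60.0
```

The two figures bracket the problem: 60 hours throws away the information that seven motors survived, while 293.33 hours is almost three times longer than anything that was actually observed. Reading the ratio as a mean life only makes sense under an assumed constant failure rate (exponential model), where total time on test divided by failures is the usual point estimate.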
When we do reliability calculations, we should be aware of whether censored data is being used, and use appropriate survival analysis to make a “reliability specific statement” such as: we can expect that 95% of the motor population will survive x hours. Another good approach is to calculate the lower confidence bound on the MTBF. A good resource is https://www.itl.nist.gov/div898/handbook/apr/section4/apr451.htm.
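As a sketch of a “reliability specific statement” for the motor data, the Python below assumes an exponential life model and a time-terminated test, and uses the standard chi-square lower bound on MTBF for that case, 2T / χ²(1 - α; 2r + 2), where T is the total time on test and r the number of failures. This is my reading of the approach described in the NIST handbook, not a copy of it. To stay within the standard library, the chi-square quantile is found by bisection using the Poisson-sum form of the chi-square CDF, which only holds for even degrees of freedom (2r + 2 is always even). The helper names are hypothetical. Under the same exponential assumption, R(t) = exp(-t / MTBF), so the time by which 5% of the population is expected to fail is -MTBF × ln(0.95).

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-square CDF for an EVEN number of degrees of freedom, via the identity
    P(chi2_{2k} > x) = sum_{i<k} exp(-x/2) (x/2)^i / i!."""
    k = dof // 2
    m = x / 2.0
    term = math.exp(-m)
    tail = term
    for i in range(1, k):
        term *= m / i
        tail += term
    return 1.0 - tail


def chi2_ppf_even(p, dof):
    """Inverse of chi2_cdf_even, found by bisection (dof must be even)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0  # widen the bracket until it contains the quantile
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mtbf_lower_bound(total_time, failures, confidence=0.95):
    """One-sided lower confidence bound on MTBF for a time-terminated test,
    ASSUMING exponential life: 2T / chi2_{confidence}(2r + 2)."""
    return 2.0 * total_time / chi2_ppf_even(confidence, 2 * failures + 2)


def time_for_reliability(mtbf, reliability):
    """Time t at which R(t) = exp(-t / MTBF) falls to the given reliability."""
    return -mtbf * math.log(reliability)


T, r = 880.0, 3  # total time on test and failures from the motor example
point = T / r
lower = mtbf_lower_bound(T, r, 0.95)
print(round(lower, 1))                              # 95% lower bound on MTBF (hours)
print(round(time_for_reliability(point, 0.95), 1))  # "95% survive" time at the point estimate
print(round(time_for_reliability(lower, 0.95), 1))  # "95% survive" time at the lower bound
```

With these inputs the 95% lower bound comes out at roughly 113.5 hours, well under the 293.33-hour point estimate, so the “95% of motors survive x hours” figure drops from about 15 hours to under 6 hours when it is based on the bound.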
Ty Cobb, Don Bradman and Sachin Tendulkar:
We can compare the batting averages in Cricket to Baseball. My understanding is that the batting average in Baseball is calculated as follows:
Batting Average = Number of Hits / Number of At-Bats
Here a hit can be a single, a home run, etc. Apparently, this statistic was first introduced by the English statistician Henry Chadwick, who was a keen Cricket fan.
I now want to look at the greats of Baseball and Cricket and take a different approach to assessing their batting. I have chosen Ty Cobb, Don Bradman and Sachin Tendulkar for my analysis. Ty Cobb has the highest career batting average in Major League Baseball. Don Bradman, an Australian Cricketer often called the best Cricket player ever, has the highest batting average in Test Cricket. Sachin Tendulkar, an Indian Cricketer and one of the best Cricket players of recent times, has scored the most runs in Test Cricket. The batting averages of the three players are shown below:
As we discussed in the last post on calculating reliability with a Bayesian approach, we can make reliability statements in place of batting averages. Based on 4191 hits in 11420 at-bats, we could state that, with 95% confidence, Ty Cobb is 36% likely to make a hit in his next at-bat. We can apply the Baseball batting average concept to Cricket. In Cricket, scoring fifty runs in an innings is a sign of a good batsman. Bradman scored fifty or more runs on 56 occasions in 80 innings (70%). Similarly, Tendulkar scored fifty or more runs on 125 occasions in 329 innings (38%).
We could state that, with 95% confidence, Bradman was 61% likely to score fifty or more runs in his next innings. Similarly, Sachin was 34% likely to score fifty or more runs in his next innings at a 95% confidence level.
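The Bayesian method from the previous post is not restated here, so the Python below rests on an assumption: a uniform Beta(1, 1) prior on the per-trial success probability, giving a Beta(s + 1, n - s + 1) posterior after s successes in n trials, with the one-sided 95% lower bound taken as the posterior's 5th percentile. Under that assumption it lands close to the 36%, 61% and 34% figures above. Instead of a statistics library, the Beta CDF uses the identity between a Beta distribution with integer parameters and a binomial tail probability, and the quantile is found by bisection. The function names are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing the pmf in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b >= 1, using the identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    return binom_sf(a, a + b - 1, x)


def lower_credible_bound(successes, trials, confidence=0.95):
    """One-sided lower bound on the success probability.

    ASSUMES a uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, trials - successes + 1); the bound is the
    posterior's (1 - confidence) quantile, found by bisection on (0, 1).
    """
    a, b = successes + 1, trials - successes + 1
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < target:
            lo = mid  # too little posterior mass below mid: the quantile is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


records = {
    "Ty Cobb, hit per at-bat": (4191, 11420),
    "Bradman, fifty or more per innings": (56, 80),
    "Tendulkar, fifty or more per innings": (125, 329),
}
for label, (successes, trials) in records.items():
    bound = lower_credible_bound(successes, trials)
    print(f"{label}: observed {successes / trials:.0%}, 95% lower bound {bound:.0%}")
```

If SciPy is available, scipy.stats.beta.ppf(0.05, s + 1, n - s + 1) gives the same quantile directly; the hand-rolled version only keeps the sketch dependency-free.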
As we discussed earlier, similar to MTTF, the batting average is not a good estimate for a single innings. It is an attempt at a point estimate, but we need additional information to interpret it, and it should not be looked at as a single metric in isolation. We cannot expect that Don Bradman would score 99.94 runs in every innings. In fact, in the very last match that Bradman played, all he had to do was score four runs to achieve the immaculate batting average of 100. He had been out only 69 times and needed just four measly runs to complete 7000 runs; even if he got out in that innings, he would have achieved the spectacular batting average of 100. He was one of the best players ever. His highest score was 334, a “triple century” in Cricket, which is a rare achievement. As indicated earlier, he was 61% likely to score fifty runs or more in the next innings. In fact, Bradman had scored more than four runs 69 times in 79 innings.
Everyone expected Bradman to cross the 100 mark easily. As fate would have it, Bradman scored zero runs: he was bowled (the batsman misses and the ball hits the wicket) by the English bowler Eric Hollies on the second ball he faced. He had hit 635 fours in his career. A four is where the batsman scores four runs by hitting the ball so that it rolls over the boundary of the field. All Bradman needed was one four to achieve the “100”. Bradman proved that to be human is to be fallible. He remains the best that ever was, and his record is far from broken. At the time of writing, the second-best batting average is 61.87.
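Bradman's final-innings arithmetic follows directly from the batting average formula at the start of the post. This Python sketch works only from the figures in the text: 69 dismissals and four runs short of 7000, which implies 6996 runs before the last innings. The function name is illustrative.

```python
def batting_average_from_totals(runs, outs):
    """Cricket batting average: total runs divided by number of dismissals."""
    return runs / outs


# Before the final innings, per the text: 69 dismissals, four runs short of 7000.
runs_before, outs_before = 7000 - 4, 69

# Score four and get out: the out is added to the denominator.
print(batting_average_from_totals(runs_before + 4, outs_before + 1))            # 100.0
# Score four and remain not out: the denominator does not change.
print(round(batting_average_from_totals(runs_before + 4, outs_before), 2))      # 101.45
# What actually happened: out for zero (a "duck").
print(round(batting_average_from_totals(runs_before, outs_before + 1), 2))      # 99.94
```

The not-out scenario shows the same asymmetry as before: an innings that ends without a dismissal adds runs but no out, so it pushes the average above what a completed innings would.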
Always keep on learning…
In case you missed it, my last post was Reliability/Sample Size Calculation Based on Bayesian Inference: