MTTF Reliability, Cricket and Baseball:

I originally hail from India, which means that I was eating, drinking and sleeping Cricket for a good part of my childhood. Growing up, I used to “get sick” and stay home when the one TV channel that we had broadcast Cricket matches. One thing I never truly understood then was how the batting average was calculated in Cricket. The formula is straightforward:

Batting average = Total Number of Runs Scored/ Total Number of Outs

Here “out” indicates that the batsman had to stop his play because he was unable to keep his wicket. In Baseball terms, this is similar to a strikeout or a catch, where the player has to leave the field. The part that I could not understand was what happened when the batsman did not get out. The runs he scored were added to the numerator, but no change was made to the denominator. I could not see this as a true indicator of the player’s batting average.

When I started learning about Reliability Engineering, I finally understood why the batting average calculation was bothering me. The way the batting average in Cricket is calculated is very similar to the MTTF (Mean Time To Failure) calculation. MTTF is calculated as follows:

MTTF = Total time on testing/Number of failures

For a simple example, if we were testing 10 motors for 100 hours and three of them failed at 50, 60 and 70 hours respectively, the total time on test is 50 + 60 + 70 + (7 × 100) = 880 hours, and we can calculate MTTF as 880/3 = 293.33 hours. The problem with this is that the data is right-censored. This means that we stopped the testing while some samples had still not failed. This is similar to the case where we do not count the innings in which the batsman did not get out. A key concept to grasp here is that the MTTF or the MTBF (Mean Time Between Failures) metric is not for a single unit. There is more to this than just saying that on average a motor is going to last 293.33 hours.
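The arithmetic in the motor example can be sketched in a few lines of Python. The function name is only illustrative. The last step assumes a constant failure rate (exponential life distribution), which the post does not state, purely to show what a “reliability specific statement” could look like.

```python
import math

# Right-censored life test: 10 motors, test stopped at 100 hours.
failure_times = [50, 60, 70]   # hours at which three motors failed
n_units = 10
test_duration = 100            # hours; the 7 survivors are censored here

def mttf_estimate(failure_times, n_units, test_duration):
    """Total time on test divided by the number of observed failures."""
    n_survivors = n_units - len(failure_times)
    total_time = sum(failure_times) + n_survivors * test_duration
    return total_time / len(failure_times)

mttf = mttf_estimate(failure_times, n_units, test_duration)
print(f"MTTF estimate: {mttf:.2f} hours")

# ASSUMPTION (not from the post): exponential lifetimes, R(t) = exp(-t / MTTF).
# Time by which 95% of the population is still expected to survive:
t95 = -mttf * math.log(0.95)
print(f"95% expected to survive about {t95:.1f} hours")
```

Under that assumption, the time that 95% of motors are expected to survive is far shorter than the MTTF itself, which is why the MTTF should not be read as the life of a single unit.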

When we do reliability calculations, we should be aware of whether censored data is being used, and use appropriate survival analysis to make a “reliability specific statement”, such as: we can expect that 95% of the motor population will survive x hours. Another good approach is to calculate the lower confidence bound on the MTBF. A good resource is https://www.itl.nist.gov/div898/handbook/apr/section4/apr451.htm.

Ty Cobb, Don Bradman and Sachin Tendulkar:

We can compare the batting averages in Cricket to Baseball. My understanding is that the batting average in Baseball is calculated as follows:

Batting Average = Number of Hits/Number of At-Bats

Here the hit can be in the form of singles, home runs etc. Apparently, this statistic was initially brought up by an English statistician Henry Chadwick. Chadwick was a keen Cricket fan.

I now want to look at the greats of Baseball and Cricket, and at a different approach to their batting capabilities. I have chosen Ty Cobb, Don Bradman and Sachin Tendulkar for my analyses. Ty Cobb has the highest career batting average in American Baseball. Don Bradman, an Australian Cricketer often called the best Cricket player ever, has the highest batting average in Test Cricket. Sachin Tendulkar, an Indian Cricketer and one of the best Cricket players of recent times, has the largest number of runs scored in Test Cricket. The batting averages of the three players are shown below:

[Table: batting averages of Ty Cobb, Don Bradman and Sachin Tendulkar]

As we discussed in the last post regarding calculating reliability with the Bayesian approach, we can make reliability statements in place of batting averages. Based on 4191 hits in 11420 at-bats, we could make a statement that, with 95% confidence, Ty Cobb was at least 36% likely to make a hit in his next at-bat. We can apply the Baseball batting average concept to Cricket. In Cricket, scoring fifty runs is a sign of a good batsman. Bradman scored fifty or more runs on 56 occasions in 80 innings (70%). Similarly, Tendulkar scored fifty or more runs on 125 occasions in 329 innings (38%).

We could state that with 95% confidence, Bradman was at least 61% likely to score fifty or more runs in his next innings. Similarly, Sachin was at least 34% likely to score fifty or more runs in his next innings at the 95% confidence level.
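The three statements above can be checked with a short sketch. It assumes a uniform Beta(1, 1) prior and a one-sided 95% lower credible bound, which is my reading of the Bayesian approach referred to above. The function names are illustrative, and the beta CDF is computed from the standard binomial-tail identity (valid for integer parameters) so that only the Python standard library is needed.

```python
import math

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b, as a binomial tail in log space."""
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log(1 - x)
    total = 0.0
    for k in range(a, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_x + (n - k) * log_1mx)
        total += math.exp(log_term)
    return total

def lower_bound(successes, trials, confidence=0.95):
    """One-sided lower credible bound, ASSUMING a uniform Beta(1, 1) prior."""
    a, b = 1 + successes, 1 + trials - successes
    lo, hi = 0.0, 1.0
    for _ in range(40):                      # bisection on the posterior CDF
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 1 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# (successes, trials) as quoted in the post
records = {
    "Ty Cobb": (4191, 11420),           # hits, at-bats
    "Don Bradman": (56, 80),            # fifty-plus scores, innings
    "Sachin Tendulkar": (125, 329),     # fifty-plus scores, innings
}
bounds = {name: lower_bound(s, n) for name, (s, n) in records.items()}
for name, value in bounds.items():
    print(f"{name}: at least {value:.0%} at 95% confidence")
```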

Final Words:

As we discussed earlier, similar to MTTF, the batting average is not a good estimate for a single innings. It is an attempt at a point estimate for reliability, but we need additional information alongside it. It should not be looked at as a single metric in isolation. We cannot expect that Don Bradman would score 99.94 runs in every innings. In fact, in the very last match that Bradman played, all he had to do was score four runs to achieve the immaculate batting average of 100. He had been out only 69 times, and he needed just four measly runs to complete 7000 runs. Even if he got out in that innings, he would have achieved the spectacular batting average of 100. He was one of the best players ever. His highest score was 334. This is called a “triple century” in Cricket, and it is a rare achievement. As indicated earlier, he was at least 61% likely to have scored fifty runs or more in his next innings. In fact, Bradman had scored more than four runs 69 times in 79 innings.

[Image: Bradman’s last innings]

Everyone expected Bradman to cross the 100 mark easily. As fate would have it, Bradman scored zero runs: he was bowled (the batsman misses and the ball hits the wicket) by the English bowler Eric Hollies on the second ball he faced. He had hit 635 fours in his career. A four is where the batsman scores four runs by hitting the ball so that it rolls over the boundary of the field. All Bradman needed was one four to achieve the “100”. Bradman proved that to be human is to be fallible. He still remains the best that ever was, and his record is far from broken. At this time, the second-best batting average is 61.87.

Always keep on learning…

In case you missed it, my last post was Reliability/Sample Size Calculation Based on Bayesian Inference:

Reliability/Sample Size Calculation Based on Bayesian Inference:

I have written about sample size calculations many times before. One of the most common questions a statistician is asked is “how many samples do I need – is a sample size of 30 appropriate?” The appropriate answer to such a question is always – “it depends!”

In today’s post, I have attached a spreadsheet that calculates the reliability based on Bayesian Inference. Ideally, one would want to have some confidence that the widgets being produced are x% reliable, or in other words, that it is x% probable that a widget would function as intended. There is the ubiquitous 90/90 or 95/95 confidence/reliability sample size table that is used for this purpose.

[Table: 90/90 and 95/95 confidence/reliability sample sizes]

In Bayesian Inference, we do not assume that the parameter (the value that we are calculating, like Reliability) is fixed. In the non-Bayesian (Frequentist) world, the parameter is assumed to be fixed, and we need to take many samples of data to make an inference regarding the parameter. For example, we may flip a coin 100 times and count the number of heads to determine the probability of heads with the coin (if we believe it is a loaded coin). In the non-Bayesian world, we may calculate confidence intervals. The confidence interval is often harder to interpret in practice than it looks. My favorite explanation for the confidence interval is the analogy of an archer. Let’s say that the bulls-eye is the true parameter, and that we draw a 3” circle around every arrow the archer shoots and call each circle a confidence interval. After 100 arrows, we have 100 circles. A 95% confidence level simply means that about 95 of the 100 circles contain the bulls-eye. In other words, if we repeated the study a lot of times, 95% of the confidence intervals calculated would contain the true parameter that we are after. This would indicate that the interval from the one study we did may or may not contain the true parameter.

Compared to this, in the Bayesian world, we calculate the credible interval. This practically means that, given our prior and the data, there is a 95% probability that the parameter is inside the 95% credible interval we calculated.

In the Bayesian world, we can have a prior belief and make an inference based on our prior belief. The prior pulls the inference in its own direction: if your prior belief is very conservative, the inference will be more conservative than the data alone would suggest, and if your prior belief is very liberal, the inference will be more liberal. As the sample size goes up, the impact of the prior belief is minimized. A common method in Bayesian inference is to use the uninformative (uniform) prior. This means that we are assuming equal likelihood for all values of the parameter. For a binomial distribution, we can use the beta distribution to model our prior belief. We will use beta values of (1, 1) for the uninformative prior. This is shown below:

[Figure: Beta(1, 1) uniform prior]

For example, if we use 59 widgets as our samples and all of them met the inspection criteria, then we can calculate the 95% lower bound credible interval as 95.13%. This is assuming the (1, 1) beta values. Now let’s say that we are very confident of the process because we have historical data. Now we can assume a stronger prior belief with the beta values as (22,1). The new prior plot is shown below:

[Figure: Beta(22, 1) prior]

Based on this, if we had 0 rejects for the 59 samples, then the 95% lower bound credible interval is 96.37%. A slightly higher reliability is estimated based on the strong prior.

We can also calculate a very conservative case of (1, 22) where we assume very low reliability to begin with. This is shown below:

[Figure: Beta(1, 22) prior]

Now when we have 0 rejects with 59 samples, we are pleasantly surprised because we were expecting our reliability to be around 8-10%. The newly calculated 95% lower bound credible interval is 64.9%.
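The three prior scenarios can be reproduced with a small sketch. This is not the spreadsheet's implementation, which is not shown in the post. It assumes the posterior is Beta(a_prior + passes, b_prior + rejects) and finds the 5th percentile by bisection, using the binomial-tail identity for the beta CDF (valid for integer parameters). All function names are illustrative.

```python
import math

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a and b."""
    n = a + b - 1
    return sum(math.comb(n, k) * x**k * (1 - x)**(n - k) for k in range(a, n + 1))

def reliability_lower_bound(n_pass, n_fail, a_prior=1, b_prior=1, confidence=0.95):
    """Lower credible bound on reliability from the beta posterior."""
    a, b = a_prior + n_pass, b_prior + n_fail
    lo, hi = 0.0, 1.0
    for _ in range(60):                      # bisection for the (1 - confidence) quantile
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < 1 - confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 59 samples, 0 rejects, under the three priors discussed above
priors = [(1, 1), (22, 1), (1, 22)]
results = {prior: reliability_lower_bound(59, 0, *prior) for prior in priors}
for prior, bound in results.items():
    print(f"prior Beta{prior}: 95% lower bound = {bound:.2%}")
```

The same sample gives a noticeably different bound under each prior, which is the point of the comparison: with only 59 samples, a strong prior still carries real weight.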

I have created a spreadsheet that you can play around with. Enter the data in the yellow cells. For a stronger (liberal) prior, enter a higher a_prior value. Similarly, for a conservative prior, enter a higher b_prior value. If you are unsure, retain the (1, 1) value to have a uniform prior. The spreadsheet also calculates the maximum expected rejects per million value.

You can download the spreadsheet here.

I will finish with my favorite confidence interval joke.

“Excuse me, professor. Why do we always calculate 95% confidence interval and not a 94% or 96% interval?”, asked the student.

“Shut up,” explained the professor.

Always keep on learning…

In case you missed it, my last post was Mismatched Complexity and KISS:

Rules of 3 and 5:

It has been a while since I have blogged about statistics. So in today’s post, I will be looking at rules of 3 and 5. These are heuristics or rules of thumb that can help us out. They are associated with sample sizes.

Rule of 3:

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of 3, you can state that at the 95% confidence level, the upper bound for the failure rate is 3/30 = 10%, or that the reliability is at least 90%. The rule is written as:

p = 3/n

Where p is the upper bound of failure, and n is the sample size.

Thus, if you used 300 samples, then you could state with 95% confidence that the process is at least 99% reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95% confidence fewer than 1 in 100 units will fail under the same conditions.

This rule can be derived from the binomial distribution. With zero failures in n samples, the 95% upper bound p satisfies (1 – p)^n = 0.05, where 0.05 is the alpha value. Taking logarithms gives p ≈ –ln(0.05)/n ≈ 3/n. The calculated value from the rule of three formula gets more accurate with a sample size of 20 or more.
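To see how close the heuristic is, the sketch below compares 3/n with the exact zero-failure binomial bound, obtained by solving (1 – p)^n = 1 – confidence for p. The function names are illustrative.

```python
def rule_of_three(n):
    """Heuristic 95% upper bound on the failure rate after n trials with zero failures."""
    return 3 / n

def exact_upper_bound(n, confidence=0.95):
    """Exact bound: solve (1 - p)^n = 1 - confidence for p."""
    return 1 - (1 - confidence) ** (1 / n)

for n in (10, 20, 30, 300):
    print(f"n = {n:>3}: rule of 3 = {rule_of_three(n):.4f}, exact = {exact_upper_bound(n):.4f}")
```

The heuristic is always slightly more conservative than the exact bound, and the gap narrows as n grows.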

Rule of 5:

I came across the rule of 5 in Douglas Hubbard’s informative book “How to Measure Anything” [1]. Hubbard states the Rule of 5 as:

There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.

This is a really neat heuristic because you can actually tell a lot from a sample size of 5! The median is the 50th percentile value of a population, the point where half of the population is above it and half of the population is below it. Hubbard points out that the probability of picking a value above or below the median is 50%, the same as a coin toss. Thus, we can calculate that the probability of getting 5 heads in a row is 0.5^5 or 3.125%. This would be the same for getting 5 tails in a row. Then the probability of not getting all heads or all tails is (100 – (3.125 + 3.125)) or 93.75%. Thus, we can state that the chance of at least one value out of five being above the median and at least one value being below the median is 93.75%.
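The 93.75% figure can be checked both exactly and by simulation. The sketch below draws samples of five from a skewed population. The exponential distribution is my choice for illustration, not Hubbard's, and it is used to show that the result does not depend on the shape of the population.

```python
import math
import random

# Exact value: 1 - P(all five above the median) - P(all five below the median)
exact = 1 - 2 * 0.5 ** 5

def simulate(trials=20000, seed=0):
    """Fraction of random samples of five whose range contains the population median."""
    rng = random.Random(seed)
    median = math.log(2)                     # median of an exponential with rate 1
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(5)]
        if min(sample) < median < max(sample):
            hits += 1
    return hits / trials

print(f"exact: {exact:.4f}, simulated: {simulate():.4f}")
```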

Final words:

The reader has to keep in mind that both of the rules require the use of randomly selected samples. The Rule of 3 is a version of Bayes’ Success Run Theorem and Wilks’ One-sided Tolerance calculation. I invite the reader to check out my posts that shed more light on this: 1) Relationship between AQL/RQL and Reliability/Confidence, 2) Reliability/Confidence Level Calculator (with c = 0, 1….., n) and 3) Wilks’ One-sided Tolerance Spreadsheet.

When we are utilizing random samples to represent a population, we are calculating a statistic, a representative value of the parameter value. A statistic is an estimate of the parameter, the true value from a population. The larger the sample size used, the better the statistic can represent the parameter and the better your estimate.

I will finish with a story based on chance and probability;

It was the finals and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin flipping experiment. In a moment of clarity, he decided to flip a coin he had in his pocket to get the answer for each question. The psychology professor watched the student the entire two hours as he was flipping the coin…writing the answer…flipping the coin….writing the answer, on and on. At the end of the two hours, everyone else had left the room except for this one student. The professor walks up to his desk and angrily interrupts the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looks up at the professor and replies bitterly (as he is still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning…

In case you missed it, my last post was Kenjutsu, Ohno and Polanyi:

[1] Douglas W. Hubbard, How to Measure Anything.

Cpk/Ppk and Percent Conforming:

It has been a while since I have posted about Quality Statistics. In today’s post, I will talk about how process capability is connected to percent conforming.

In this post, I will be using Cpk and assuming normality for the sake of simplicity. Please bear in mind that there are multiple ways to calculate process capability, and that not all distributions are normal in nature. The two assumptions help me in explaining this better.

What is Cpk?

The process capability index Cpk is a single number that gives you an idea of how capable the process is of staying within the specification limits, taking into account how well it is centered around the nominal specification. It also tells you what percentage of conforming product the process is producing. Please note that I am not discussing the Cp index in this post.

Cpk is determined as the lower of two values. To simplify, let’s call them Cpklower and Cpkupper.

Cpklower = (Process Mean – LSL)/(3 × s)

Cpkupper = (USL – Process Mean)/(3 × s)

Where USL is the Upper Specification Limit,

LSL is the Lower Specification Limit, and

s is an estimate of the Population Standard Deviation.

Cpk = minimum (Cpklower, Cpkupper)

The “k” in Cpk stands for “Process Location Ratio” and is dimensionless. It is defined as;

k = abs(Specification Mean – Process Mean)/((USL-LSL)/2)

Where Specification Mean is the nominal specification.

Interestingly, when k = 0, Cpk = Cp, since Cpk = Cp × (1 – k). This happens when the process is perfectly centered. An additional thing to note is that Cpk ≈ Ppk when the process is stable (in statistical control), since the two standard deviation estimates then agree.

You can easily use Ppk in place of Cpk for the above equations. The only difference between Ppk and Cpk is the way we calculate the estimate for the standard deviation.

But What Does Cpk Tell Us?

If we can assume normality, we can easily convert the Cpk value to a Z value. This allows one to calculate the percentage falling inside the specification limits using normal distribution tables. We can easily do this in Excel.

Cpk can be converted to the Z value by simply multiplying it by 3.

Z = 3 * Cpk

In Excel, the Estimated % Non-conforming can be calculated as =NORMSDIST(-Z)

It does get a little tricky, if the process is not centered or if you are looking at a one-sided specification. The table below should come in handy.

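[Table: Z-based formulas for % non-conforming when the process is centered, off-center, or has a one-sided specification]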

The Estimated % Conforming can be easily calculated as 1 – Estimated % Non-conforming.
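As a rough Python equivalent of the Excel calculation, the sketch below converts Cpk to an estimated fraction non-conforming under the normality assumption. The two-sided function simply adds the two tail areas. That is the standard normal-theory calculation and my assumption about what the table covers, since the table itself is not reproduced here. Function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()                    # mean 0, standard deviation 1

def nonconforming_from_cpk(cpk):
    """One tail only: equivalent to Excel's NORMSDIST(-Z) with Z = 3 * Cpk."""
    z = 3 * cpk
    return STD_NORMAL.cdf(-z)

def nonconforming_two_sided(mean, s, lsl, usl):
    """Two-sided specification: sum of the tail areas below LSL and above USL."""
    z_lower = (mean - lsl) / s               # equals 3 * Cpklower
    z_upper = (usl - mean) / s               # equals 3 * Cpkupper
    return STD_NORMAL.cdf(-z_lower) + STD_NORMAL.cdf(-z_upper)

print(f"Cpk = 1.00, one-sided:      {nonconforming_from_cpk(1.0):.5%}")
print(f"centered, +/- 3 sigma:      {nonconforming_two_sided(0, 1, -3, 3):.5%}")
print(f"off-center by 1 sigma:      {nonconforming_two_sided(1, 1, -3, 3):.5%}")
```

For a centered process the two tails are equal, so the two-sided value is simply double the one-sided value. For an off-center process the tail nearer the mean dominates.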

The % Conforming is very similar to a tolerance interval calculation. The tolerance interval calculation allows us to make a statement like “we can expect x% of the population to be between two tolerance values at y% confidence level.” However, we cannot make such a statement with just a Cpk calculation. To make such a statement, we will need to calculate the RQL (Rejectable Quality Level) by creating an OC curve. Unfortunately, this is not straightforward, and requires methods like non-central t-distribution. I highly recommend Dr. Taylor’s Distribution Analyzer for this.

What about Confidence Interval?

I am proposing that we can calculate the confidence interval for the Cpk value and thus, for the Estimated % Non-conforming. It is recommended that we use the lower confidence bound for this. Before I proceed, I should explain what a confidence interval means. It is not technically correct to say that the population parameter value (e.g. the mean height of kids between ages 10 and 15) lies between the two confidence bounds with 95% probability. We cannot technically say that at the 95% confidence level, the mean height of the population is between X and Y for kids between ages 10 and 15.

Using the mean height as an example, the confidence interval just means that if we keep taking samples from the population, and keep calculating the estimate for mean height, the calculated confidence intervals for those samples would contain the true mean height 95% of the time (if we used a 95% confidence level).

We can calculate the lower bound for Cpk at a preferred confidence level, say 95%. We can then convert this to the Z-value and find the estimated % conforming at 95% confidence level. We can then make a statement similar to the tolerance interval.

A Cpk value of 2.00 with a sample size of 12 may not mean much. The calculated Cpk is only an estimate of the true Cpk of the population. Thus like any other parameter (mean, variance etc.), you need a larger sample size to make a better estimate. The use of confidence interval helps us in this regard since it penalizes for lack of sample size.

An Example:

The Quality Engineer at a Medical Device company is performing a capability study on seal strength on pouches. The LSL is 1.1 lbf/in. He used 30 as the sample size, and found that the sample mean was 1.87 lbf/in, and the sample standard deviation was 0.24 lbf/in.

Let’s apply what we have discussed here so far.

LSL = 1.1

Process Mean = 1.87

Process sigma = 0.24

From this we can calculate the Ppk as 1.07. The Quality Engineer calculated Ppk since this was a new process.

Ppk = (Process Mean – LSL)/(3 × Process Sigma) = (1.87 – 1.1)/(3 × 0.24) = 1.07

Z = Ppk * 3 = 3.21

Estimated % Non-conforming = NORMSDIST(-Z) = 0.000663675 = 0.07%

Note: Since we are using a unilateral specification, we do not need to double the % non-conforming to capture both sides of the bell curve.

Estimated % Conforming = 1 – Estimated % Non-conforming = 99.93363251%

We can calculate the Ppk lower bound at a 95% confidence level for a sample size = 30. You can use the spreadsheet at the end of this post to do this calculation.

Ppk Lower bound at 95% confidence level = 0.817

Lower bound Z = Ppk_lower_bound x 3 = 2.451

Lower bound (95%) % Non-conforming = NORMSDIST(-Lower_bound_Z) = 0.007122998 = 0.71%

Lower bound (95%) % Conforming = 99.28770023% =99.29%

In effect (all things considered), we can state that with 95% confidence at least 99.29% of the values are in spec. Or we can correctly state that the 95% confidence lower bound for % in spec is 99.29%.
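The worked example can be approximated in Python. The spreadsheet's exact formula is not given in the post, so the lower bound below is an assumption: it uses a widely cited normal approximation, Ppk – z × sqrt(1/(9n) + Ppk²/(2(n – 1))). It yields a value close to the 0.817 quoted above, though not necessarily identical in the last digit.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Values from the seal strength example
lsl, mean, sigma, n = 1.1, 1.87, 0.24, 30

ppk = (mean - lsl) / (3 * sigma)

def ppk_lower_bound(ppk, n, confidence=0.95):
    """ASSUMED formula: normal approximation to the one-sided lower bound on Ppk."""
    z = STD_NORMAL.inv_cdf(confidence)
    return ppk - z * (1 / (9 * n) + ppk ** 2 / (2 * (n - 1))) ** 0.5

lower = ppk_lower_bound(ppk, n)
nonconforming_point = STD_NORMAL.cdf(-3 * ppk)      # one-sided spec, so one tail only
nonconforming_bound = STD_NORMAL.cdf(-3 * lower)

print(f"Ppk = {ppk:.3f}, point estimate of % non-conforming = {nonconforming_point:.3%}")
print(f"95% lower bound on Ppk = {lower:.3f}")
print(f"% conforming at 95% confidence is at least {1 - nonconforming_bound:.2%}")
```

The gap between the point estimate and the bound shrinks as the sample size grows, which is the penalty for small samples discussed earlier.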

You can download the spreadsheet here. Please note that this post is based on my personal view on the matter. Please use it with caution. I have used normal distribution to calculate the Ppk and the lower bound for Ppk. I welcome your thoughts and comments.

Always keep on learning…

In case you missed it, my last post was Want to Increase Productivity at Your Plant? Read This.

Relationship between AQL/RQL and Reliability/Confidence:

The Z1.4 AQL sampling plan tables do not translate to reliability/confidence level values. In fact, the Z1.4 tables do not translate to % quality values at the 95% confidence level either. This seems to be a general misconception regarding the Z1.4 tables. One cannot state that if the sampling plan criteria are met, the % non-conforming equates to the AQL value at the 95% confidence level.

How can we define AQL in layman’s terms? Looking at the figure below, one can simply state that AQL is the % nonconforming value at which there is (1-α)% chance that the product will be accepted by the customer. Please note this does not mean that the product quality equals the AQL value.

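[Figure: OC curve, with AQL marked at an acceptance probability of 1 – α and RQL marked at an acceptance probability of β]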

Similar to the AQL value, we can also define the RQL value based on the picture above. RQL is the %nonconforming value at which there is β% chance that the product will be accepted by the customer.

The RQL value corresponding to the beta value is much more important than the AQL value. The RQL value has a direct relationship with the Reliability/Confidence values.

The relationship between β and RQL is shown below, based on the Binomial equation.

β = Σ (i = 0 to x) [n!/(i! × (n – i)!)] × RQL^i × (1 – RQL)^(n – i)

Where n = sample size, and x = number of rejects.

When x = 0, the above equation becomes;

β = (1 – RQL)^n

Taking logarithms, the above equation can be converted as;

n = ln(β)/ln(1 – RQL)

Interestingly, this equation is comparable to the Success Run Theorem equation;

n = ln(1 – C)/ln(R)

Where C is the confidence level, and R is the reliability(%).

The Reliability value(%) is (1-RQL)% value at the desired β value. The confidence level value translates to the β value, as shown in the equation above.
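The relationship can be checked numerically with a short sketch. It is independent of the Shiny app, and the function names are illustrative. It computes the binomial probability of acceptance, the zero-reject RQL, and the Success Run sample size, and shows that β = 1 – C and RQL = 1 – R line up.

```python
import math

def accept_probability(p, n, x):
    """Probability of accepting a lot: at most x rejects in n samples at defect rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def rql_zero_rejects(n, beta):
    """Solve beta = (1 - RQL)^n for RQL (the x = 0 case)."""
    return 1 - beta ** (1 / n)

def success_run_sample_size(confidence, reliability):
    """Success Run Theorem: n = ln(1 - C) / ln(R), rounded up to a whole sample."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

n = success_run_sample_size(0.95, 0.95)
rql = rql_zero_rejects(n, beta=0.05)
print(f"samples for 95/95 with zero rejects: {n}")
print(f"RQL at beta = 5%: {rql:.2%}, so reliability = {1 - rql:.2%}")
print(f"check, acceptance probability at RQL: {accept_probability(rql, n, 0):.3f}")
```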

I have created a Shiny App through RStudio where the reader can play around with this. This web-based app will create an OC curve, and provide values for AQL, RQL, and reliability based on sample size and number of rejects.

https://harishjose.shinyapps.io/OCR1

I encourage the reader to check out the above link.

Keep on learning…

Evolution of Hypothesis Testing

This is the second post in the series of “Let’s not hypothesize.” The first post is available here.

This post is written to have a brief look at how the Hypothesis testing seen in most Statistics texts came into being.

My main sources of information are;

1) The Empire of Chance

2) The Lady Tasting Tea, and

3) Explorations in statistics: hypothesis tests and P values

I have the evolution separated into three phases.

1) Pre-Fisher:

The Explorations in statistics: hypothesis tests and P values provides a date of 1279 as the origin of Hypothesis testing. The Royal Mint from London used a sample of coins made from each run of the mint which were compared against a known set of standards. I welcome the reader to click on the third reference given above to read this in more detail.

The article also speaks about William Sealy Gosset (Student) and his t-test method. What struck me most was the description of Gosset explaining the significance of a drug in terms of an odds ratio. This was well before the advent of p-values to determine significance of the data.

First let us see what is the probability that [drug A] will on the average give increase of sleep. [Looking up the ratio of the sample mean to the sample standard deviation] in the table for ten experiments we find by interpolating. . .the odds are .887 to .113 that the mean is positive. That is about 8 to 1 and would correspond to the normal curve to about 1.8 times the probable error. It is then very likely that [drug A] gives an increase of sleep, but would occasion no surprise if the results were reversed by further experiments.

2) Sir Ronald Fisher:

It was Sir Ronald Fisher who clearly came up with the idea of a null hypothesis (H0) and the use of a conditional probability, the p-value, to make a decision based on the data found. He termed this “Significance Testing”. The main distinction here from the texts today is that Fisher only used the Null (or Nil) Hypothesis. He did not find value in the alternate hypothesis. His thought process was that if the p-value was less than a cut-off point (let’s say .05), this would indicate that either a very rare event had occurred or the null hypothesis model was wrong. More likely than not, the null hypothesis model was wrong. Fisher did not see a need for an alternate hypothesis, nor the need for repeating tests to see how powerful the test was. His method is based on Inductive Inference.

Fisher also never meant .05 to be the only cut-off value. He viewed p-values as inductive evidence against the null hypothesis.

If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent. point), or one in a hundred (the 1 per cent. point). Personally, the writer prefers to set a low standard of significance at the 5 per cent. point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.

3) Neyman-Pearson Hypothesis Testing:

The books “The Lady Tasting Tea” and “The Empire of Chance” go into detail about the “feud” between the great minds Fisher and Neyman/Pearson.

It was Neyman and Pearson who came up with the idea of using an alternate hypothesis (H1) and testing it against the null hypothesis. Additionally, they created the idea of the power of a test, and introduced the ideas of type I and type II errors. They termed their version Hypothesis Testing. Their version is based on inductive behavior.

They defined alpha, beta and power as follows.

alpha = P(reject H0|H0 is true)

beta = P(fail to reject H0|H0 is false)

power = 1 – beta
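To make the three definitions concrete, here is a small sketch for a one-sided z-test of a mean with known standard deviation. The test setup and all the numbers (effect size, sigma, sample size) are hypothetical choices for illustration and do not come from the sources above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 when the true mean equals `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # reject H0 when z exceeds this value
    shift = effect * n ** 0.5 / sigma            # mean of the z statistic under H1
    beta = STD_NORMAL.cdf(z_crit - shift)        # P(fail to reject H0 | H0 is false)
    return 1 - beta

# Hypothetical scenario: true effect 0.5, sigma 1, 25 samples
power = power_one_sided_z(effect=0.5, sigma=1.0, n=25)
print(f"alpha = 0.05, beta = {1 - power:.3f}, power = {power:.3f}")

# With no true effect, the rejection probability is just alpha
print(f"rejection rate under H0: {power_one_sided_z(0.0, 1.0, 25):.3f}")
```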

Where we are now:

What we use and learn these days is a combined method of Fisher and Neyman/Pearson. The textbook method is generally as follows;

1) Define null and alternate hypotheses.

2) Set an alpha value of .05, and power value of .80 before the experiment.

3) Calculate test statistic and p-value based on the data collected.

4) Reject or retain (fail to reject) null hypothesis based on the p-value.

Critics of this combined method claim that it utilizes the worst of the two methods. They emphasize a focus on effect size and the use of confidence intervals to provide a better view of the problem at hand, rather than blindly relying on the p-value alone.

Keep on learning…

Reliability/Confidence Level Calculator (with c = 0, 1….., n)

The reliability/confidence level sample size calculation is fairly well known to Quality Engineers. For example, with 59 samples and 0 rejects, one can be 95% confident that the process is at least 95% reliable or that the process yields at least 95% conforming product.

I have created a spreadsheet “calculator”, that allows the user to enter the sample size, number of rejects and the desired confidence level, and the calculator will provide the reliability result.

It is interesting to note that the reliability/confidence calculation, the LTPD calculation and Wilks’ first-order non-parametric one-sided tolerance calculation all yield the same results.
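A minimal sketch of what such a calculator might do is shown below. It is based on the cumulative binomial distribution and is not the spreadsheet's actual implementation, which is not shown here. It searches by bisection for the highest reliability that can still be claimed at the desired confidence level, given n samples and c rejects. Function names are illustrative.

```python
import math

def confidence_for(reliability, n, c):
    """Confidence that true reliability is at least `reliability`, given c rejects in n."""
    p = 1 - reliability
    tail = sum(math.comb(n, i) * p**i * reliability**(n - i) for i in range(c + 1))
    return 1 - tail

def reliability_lower_bound(n, c, confidence=0.95):
    """Highest reliability that can be claimed at the given confidence level."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if confidence_for(mid, n, c) >= confidence:
            lo = mid                         # claim still holds; try a higher reliability
        else:
            hi = mid
    return lo

print(f"59 samples, 0 rejects: {reliability_lower_bound(59, 0):.2%}")
print(f"93 samples, 1 reject:  {reliability_lower_bound(93, 1):.2%}")
```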

I will post another day about LTPD versus AQL.

The spreadsheet is available here Reliability calculator based on Binomial distribution.

Keep on learning…

Wilks’ One-Sided Tolerance spreadsheet for download

I have created a spreadsheet that allows the user to calculate the number of samples needed for a desired one-sided tolerance interval at a desired confidence level. Additionally, the user can also enter the desired order (which ranked sample value to use as the tolerance limit).

For example, if you have 93 samples, you can be 95% confident that 95% of the population is above the 2nd lowest value of the samples. Alternatively, you can also state that 95% of the population is below the 2nd highest value of the samples.

Here is an example of this in use.

If there is an interest, I can also try creating a two sided tolerance interval spreadsheet as well.

The keen student might notice that the formula is identical to the Bayes Success Run Theorem when the order p = 1.
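The sample sizes quoted above can be reproduced with a short sketch of the non-parametric one-sided calculation. It assumes the standard binomial form of Wilks' formula and is not taken from the spreadsheet. The names `coverage` and `order` are my own labels for the population proportion and the rank of the sample value used as the limit.

```python
import math

def wilks_confidence(n, coverage, order):
    """Confidence that at least `coverage` of the population lies beyond the
    order-th most extreme of n sample values (one-sided)."""
    p = 1 - coverage
    tail = sum(math.comb(n, i) * p**i * coverage**(n - i) for i in range(order))
    return 1 - tail

def min_sample_size(coverage=0.95, confidence=0.95, order=1):
    """Smallest n that reaches the desired confidence for the given order."""
    n = order
    while wilks_confidence(n, coverage, order) < confidence:
        n += 1
    return n

print(f"order 1 (lowest value):     n = {min_sample_size(order=1)}")
print(f"order 2 (2nd lowest value): n = {min_sample_size(order=2)}")
```

With order 1 the sum collapses to coverage^n, which is the Success Run Theorem, so the familiar 59 samples for 95/95 falls out of the same function.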

The spreadsheet is available for download here. Wilks one sided

Keep on learning…