Rules of 3 and 5:

rules of thumb

It has been a while since I have blogged about statistics. So in today’s post, I will be looking at rules of 3 and 5. These are heuristics or rules of thumb that can help us out. They are associated with sample sizes.

Rule of 3:

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of 3, you can state that at 95% confidence level, the upper bound for a failure is 3/30 = 10% or the reliability is at least 90%. The rule is written as;

p = 3/n

Where p is the upper bound of failure, and n is the sample size.

Thus, if you used 300 samples, then you could state with 95% confidence that the process is at least 99% reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95% confidence fewer than 1 in 100 units will fail under the same conditions.

This rule can be derived from using binomial distribution. The 95% confidence comes from the alpha value of 0.05. The calculated value from the rule of three formula gets more accurate with a sample size of 20 or more.

Rule of 5:

I came across the rule of 5 from Douglas Hubbard’s informative book “How to Measure Anything” [1]. Hubbard states the Rule of 5 as;

There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.

This is a really neat heuristic because you can actually tell a lot from a sample size of 5! The median is the 50th percentile value of a population, the point where half of the population is above it and half of the population is below it. Hubbard points out the probability of picking a value above or below the median is 50% – the same as a coin toss. Thus, we can calculate that the probability of getting 5 heads in a row is 0.5^5 or 3.125%. This would be the same for getting 5 tails in a row. Then the probability of not getting all heads or all tails is (100 – (3.125+3.125)) or 93.75%. Thus, we can state that the chance of one value out of five being above the median and at least one value below the median is 93.75%.

Final words:

The reader has to keep in mind that both of the rules require the use of randomly selected samples. The Rule of 3 is a version of Bayes’ Success Run Theorem and Wilk’s One-sided Tolerance calculation. I invite the reader to check out my posts that sheds more light on this 1) Relationship between AQL/RQL and Reliability/Confidence , 2) Reliability/Confidence Level Calculator (with c = 0, 1….., n) and 3) Wilk’s One-sided Tolerance Spreadsheet.

When we are utilizing random samples to represent a population, we are calculating a statistic – a representation value of the parameter value. A statistic is an estimate of the parameter, the true value from a population. The higher the sample size used, the better the statistic can represent the parameter and better your estimation.

I will finish with a story based on chance and probability;

It was the finals and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin flipping experiment. On a moment of clarity, he decided to flip a coin he had in his pocket to get the answers for each questions. The psychology professor watched the student the entire two hours as he was flipping the coin…writing the answer…flipping the coin….writing the answer, on and on. At the end of the two hours, everyone else had left the room except for this one student. The professor walks up to his desk and angrily interrupts the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looks up at the professor and replies bitterly (as he is still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning…

In case you missed it, my last post was Kenjutsu, Ohno and Polanyi:

[1] How to Measure Anything.


Cpk/Ppk and Percent Conforming:


It has been a while since I have posted about Quality Statistics. In today’s post, I will talk about how process capability is connected to percent conforming.

In this post, I will be using Cpk and assuming normality for the sake of simplicity. Please bear in mind that there are multiple ways to calculate process capability, and that not all distributions are normal in nature. The two assumptions help me in explaining this better.

What is Cpk?

The process capability index Cpk is a one shot number that gives you an idea of the capability of the process to center around the nominal specification. It also tells you how much percent conforming product is the process producing. Please note that I am not discussing Cp index in this post.

Cpk is determined as the lower of two values. To simplify, let’s call them Cpklower and Cpkupper.

Cpklower = (Process Mean – LSL)/3* s

Cpkupper = (USL – Process Mean)/ 3* s

Where USL is the Upper Specification Limit,

LSL is the Lower Specification Limit, and

s is an estimate of the Population Standard Deviation.

Cpk = minimum (Cpklower, Cpkupper)

The “k” in Cpk stands for “Process Location Ratio” and is dimensionless. It is defined as;

k = abs(Specification Mean – Process Mean)/((USL-LSL)/2)

Where Specification Mean is the nominal specification.

Interestingly when k = 0, Cpk = Cp. This happens when the process is perfectly centered. An additional thing to note is also that Cpk ≈ Ppk when the process is perfectly centered.

You can easily use Ppk in place of Cpk for the above equations. The only difference between Ppk and Cpk is the way we calculate the estimate for the standard deviation.

But What Does Cpk Tell Us?

If we can assume normality, we can easily convert the Cpk value to a Z value. This allows one to calculate the percentage falling inside the specification limits using normal distribution tables. We can easily do this in Excel.

Cpk can be converted to the Z value by simply multiplying it by 3.

Z = 3 * Cpk

In Excel, the Estimated % Non-conforming can be calculated as =NORMSDIST(-Z)

It does get a little tricky, if the process is not centered or if you are looking at a one-sided specification. The table below should come in handy.

z table

The Estimated % Conforming can be easily calculated as 1 – Estimated % Non-conforming.

The % Conforming is very similar to a tolerance interval calculation. The tolerance interval calculation allows us to make a statement like “we can expect x% of the population to be between two tolerance values at y% confidence level.” However, we cannot make such a statement with just a Cpk calculation. To make such a statement, we will need to calculate the RQL (Rejectable Quality Level) by creating an OC curve. Unfortunately, this is not straightforward, and requires methods like non-central t-distribution. I highly recommend Dr. Taylor’s Distribution Analyzer for this.

What about Confidence Interval?

I am proposing that we can calculate the confidence interval for the Cpk value and thus, for the Estimated % Non-conforming. It is recommended that we use the lower bound confidence interval for this. Before I proceed, I should explain what confidence interval means. It is not technically correct that the population parameter value (e.g. height of kids between ages 10 and 15) is between the two confidence interval bounds. We cannot technically say that at 95% confidence level, the mean height of the population is between X and Y for kids between ages 10 and 15.

Using the mean height as an example, the confidence interval just means that if we keep taking samples from the population, and keep calculating the estimate for mean height, the calculated confidence interval for each of those sample would contain the true mean height, 95% of the time (if we used a 95% confidence level).

We can calculate the lower bound for Cpk at a preferred confidence level, say 95%. We can then convert this to the Z-value and find the estimated % conforming at 95% confidence level. We can then make a statement similar to the tolerance interval.

A Cpk value of 2.00 with a sample size of 12 may not mean much. The calculated Cpk is only an estimate of the true Cpk of the population. Thus like any other parameter (mean, variance etc.), you need a larger sample size to make a better estimate. The use of confidence interval helps us in this regard since it penalizes for lack of sample size.

An Example:

The Quality Engineer at a Medical Device company is performing a capability study on seal strength on pouches. The LSL is 1.1 lbf/in. He used 30 as the sample size, and found that the sample mean was 1.87 lbf/in, and the sample standard deviation was 0.24.

Let’s apply what we have discussed here so far.

LSL = 1.1

Process Mean = 1.87

Process sigma = 0.24

From this we can calculate the Ppk as 1.07. The Quality Engineer calculated Ppk since this was a new process.

Ppk = (Process Mean – LSL) /3 * Process Sigma

Z = Ppk * 3 = 3.21

Estimated % Non-conforming = NORMSDIST(-Z) = 0.000663675 = 0.07%

Note: Since we are using a unilateral specification, we do not need to double the % non-conforming to capture both sides of the bell curve.

Estimated % Conforming = 1 – Estimated % Non-conforming = 99.93363251%

We can calculate the Ppk lower bound at a 95% confidence level for a sample size = 30. You can use the spreadsheet at the end of this post to do this calculation.

Ppk Lower bound at 95% confidence level = 0.817

Lower bound Z = Ppk_lower_bound x 3 = 2.451

Lower bound (95%) % Non-conforming = NORMSDIST(-Lower_bound_Z) = 0.007122998 = 0.71%

Lower bound (95%) % Conforming = 99.28770023% =99.29%

In effect (all things considered), we can state that with 95% confidence at least 99.29% of the values are in spec. Or we can correctly state that the 95% confidence lower bound for % in spec is 99.29%.

You can download the spreadsheet here. Please note that this post is based on my personal view on the matter. Please use it with caution. I have used normal distribution to calculate the Ppk and the lower bound for Ppk. I welcome your thoughts and comments.

Always keep on learning…

In case you missed it, my last post was Want to Increase Productivity at Your Plant? Read This.

Relationship between AQL/RQL and Reliability/Confidence:


The Z1.4 AQL sampling plan tables do not translate to reliability/confidence level values. In fact, the Z1.4 tables do not translate to %quality values at 95% confidence level as well. This seems to be a general misconception regarding the Z1.4 tables.  One cannot state that if the sampling plan criteria are met, the % non-conforming equates to the AQL value at 95% confidence level.

How can we define AQL in layman’s terms? Looking at the figure below, one can simply state that AQL is the % nonconforming value at which there is (1-α)% chance that the product will be accepted by the customer. Please note this does not mean that the product quality equals the AQL value.


Similar to the AQL value, we can also define the RQL value based on the picture above. RQL is the %nonconforming value at which there is β% chance that the product will be accepted by the customer.

The RQL value corresponding to the beta value is much more important than the AQL value. The RQL value has a direct relationship with the Reliability/Confidence values.

The relationship between β and RQL is shown below, based on the Binomial equation.


Where n = sample size, and x = number of rejects.

When x = 0, the above equation becomes;


Taking logarithms, the above equation can be converted as;


Interestingly, this equation is comparable to the Success Run Theorem equation;


Where C is the confidence level, and R is the reliability(%).

The Reliability value(%) is (1-RQL)% value at the desired β value.

The Reliability value(%) is (1-RQL)% value at the desired β value. The confidence level value translates to the β value, as shown in the equation above.

I have created a Shiny App through R-studio where the reader can play around with this. This web based app will create OC-curve, and provide values for AQL, RQL, and reliability values based on sample size and number of rejects.

I encourage the reader to check out the above link.

Keep on learning…

Evolution of Hypothesis Testing


This is the second post in the series of “Let’s not hypothesize.” The first post is available here.

This post is written to have a brief look at how the Hypothesis testing seen in most Statistics texts came into being.

My main sources of information are;

1) The empire of Chance

2) The lady tasting tea, and

3) Explorations in statistics: hypothesis tests and P values

I have the evolution separated into three phases.

1) Pre-Fisher:

The Explorations in statistics: hypothesis tests and P values provides a date of 1279 as the origin of Hypothesis testing. The Royal Mint from London used a sample of coins made from each run of the mint which were compared against a known set of standards. I welcome the reader to click on the third reference given above to read this in more detail.


The article also speaks about William Sealy Gosset (Student) and his t-test method. What struck me most was the description of Gosset explaining the significance of a drug in terms of an odds ratio. This was well before the advent of p-values to determine significance of the data.

First let us see what is the probability that [drug A] will on the average give increase of sleep. [Looking up the ratio of the sample mean to the sample standard deviation] in the table for ten experiments we find by interpolating. . .the odds are .887 to .113 that the mean is positive. That is about 8 to 1 and would correspond to the normal curve to about 1.8 times the probable error. It is then very likely that [drug A] gives an increase of sleep, but would occasion no surprise if the results were reversed by further experiments.

2) Sir Ronald Fisher:


It was Sir Ronald Fisher who clearly came up with the idea of a null hypothesis (H0) and the use of a conditional probability p-value to make a decision based on the data found. He termed this as “Significance Testing”. The main distinction here from the texts today, is that Fisher only used Null or Nil Hypothesis. He did not find value in the alternate hypothesis. His thought process was that if the p-value was less than a cut-off point (let’s say .05), this would indicate that either this was due to a very rare event or that the null hypothesis model was wrong. More than likely, it is highly probable that the null hypothesis model was wrong. Fisher did not see a need for an alternate hypothesis nor the need for repeating tests to see how powerful the test was.His method is based on Inductive Inference.

Fisher never also meant to use only .05 as the cut-off value. He viewed p-values as inductive evidence against the null hypothesis.

If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent. point), or one in a hundred (the 1 per cent. point). Personally, the writer prefers to set a low standard of significance at the 5 per cent. point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.

3) Neyman-Pearson Hypothesis Testing:


The books “Lady tasting tea” and “The empire of chance” go into detail about the “feud” between the great minds Fisher, and Neyman/Pearson.

It was Neyman and Pearson who came up with idea of using an alternate hypothesis (H1) and testing it against the null hypothesis. Additionally, they also created the idea of the power of a test, and introduced the ideas of type I and type II errors. They termed their version as Hypothesis testing.Their version is based on inductive behavior.

They defined alpha, beta and power as follows.

alpha = P(reject H0|H0 is true)

beta = P(fail to reject H0|H0 is false)

power = 1 – beta

Where we are now:

What we use and learn these days is a combined method of Fisher and Neyman/Pearson. The textbook method is generally as follows;

1) define null and alternate hypotheses.

2) set an alpha value of .05, and power value of .80 before the experiment.

3) calculate test statistic and p-value based on the data collected.

4) Reject or retain (fail to reject) null hypothesis based on the p-value.

Critiques of this combined method claim that the combined method utilizes the worst of the two methods. They emphasize the focus on effect size, and the use of confidence intervals to provide better view of the problem at hand, rather than blindly relying on the p-value alone.

Keep on learning…

Reliability/Confidence Level Calculator (with c = 0, 1….., n)


The reliability/Confidence level sample size calculation is fairly known to Quality Engineers. For example, with 59 samples and 0 rejects, one can be 95% confident that the process is at least 95% reliable or that the process yields at least 95% conforming product.

I have created a spreadsheet “calculator”, that allows the user to enter the sample size, number of rejects and the desired confidence level, and the calculator will provide the reliability result.

It is interesting to note that the reliability/confidence calculation, LTPD calculation and Wilk’s first degree non-parametric one sided tolerance calculation all yield the same results.

I will post another day about LTPD versus AQL.

The spreadsheet is available here Reliability calculator based on Binomial distribution.

Keep on learning…

Wilk’s One Sided Tolerance spreadsheet for download


I have created a spreadsheet that allows the user to calculate the number of samples needed for a desired one-sided tolerance interval at a desired confidence level. Additionally, the user can also enter the desired order for the sample size.

For example, if you have 93 samples, you can be 95% confident that 95% of the population are above the 2nd lowest value samples. Alternatively, you can also state that 95% of the population is below the 2nd highest value of the samples.

Here is an example of this in use.

If there is an interest, I can also try creating a two sided tolerance interval spreadsheet as well.

The keen student might notice that the formula is identical to the Bayes Success Run Theorem when the order p =1.

The spreadsheet is available for download here. Wilks one sided

Keep on learning…