Rules of 3 and 5:


It has been a while since I have blogged about statistics. So in today’s post, I will be looking at rules of 3 and 5. These are heuristics or rules of thumb that can help us out. They are associated with sample sizes.

Rule of 3:

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of 3, you can state that at a 95% confidence level, the upper bound for the failure rate is 3/30 = 10%, or that the reliability is at least 90%. The rule is written as:

p = 3/n

Where p is the upper bound of failure, and n is the sample size.

Thus, if you used 300 samples, then you could state with 95% confidence that the process is at least 99% reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95% confidence fewer than 1 in 100 units will fail under the same conditions.

This rule can be derived from the binomial distribution for the case of zero failures in n trials. The 95% confidence comes from the alpha value of 0.05. The value calculated from the rule of three formula gets more accurate with a sample size of 20 or more.
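The derivation can be sketched in a few lines. With zero failures in n trials, the exact one-sided upper bound p solves (1 – p)^n = alpha, and since –ln(0.05) is about 3, this is close to 3/n. The function names below are illustrative, not from the original post.

```python
import math

def exact_upper_bound(n, alpha=0.05):
    """Exact upper bound on the failure rate after n trials with zero failures.

    Solves (1 - p)**n = alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)

def rule_of_three(n):
    """Rule-of-3 approximation to the same bound."""
    return 3.0 / n

# -ln(0.05) is the constant that the rule rounds up to 3.
print("-ln(0.05) =", round(-math.log(0.05), 4))

for n in (10, 20, 30, 300):
    print(n, round(exact_upper_bound(n), 4), round(rule_of_three(n), 4))
```

The rule of 3 is always slightly conservative compared with the exact bound, and the two get closer as n grows.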

Rule of 5:

I came across the rule of 5 in Douglas Hubbard’s informative book “How to Measure Anything” [1]. Hubbard states the Rule of 5 as:

There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.

This is a really neat heuristic because you can actually tell a lot from a sample size of 5! The median is the 50th percentile value of a population, the point where half of the population is above it and half of the population is below it. Hubbard points out that the probability of picking a value above or below the median is 50%, the same as a coin toss. Thus, we can calculate that the probability of getting 5 heads in a row is 0.5^5 or 3.125%. This would be the same for getting 5 tails in a row. Then the probability of not getting all heads or all tails is (100 – (3.125 + 3.125)) or 93.75%. Thus, we can state that the chance of at least one value out of five being above the median and at least one value being below the median is 93.75%.
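The coin-toss argument can be checked with a short sketch. It computes the exact probability and compares it with a simulation that draws five values from a uniform(0, 1) population, whose median is 0.5. The choice of population, the seed and the number of trials are assumptions made only for illustration.

```python
import random

def exact_rule_of_five(k=5):
    """Probability that the median lies between the min and max of k random draws.

    The median is missed only when all k draws land on the same side of it.
    """
    return 1.0 - 2.0 * 0.5 ** k

def simulate_rule_of_five(trials=20000, k=5, seed=1):
    """Estimate the same probability by sampling from a uniform(0, 1) population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(k)]
        if min(sample) < 0.5 < max(sample):  # 0.5 is the population median
            hits += 1
    return hits / trials

print(exact_rule_of_five())
print(simulate_rule_of_five())
```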

Final words:

The reader has to keep in mind that both of the rules require the use of randomly selected samples. The Rule of 3 is a version of Bayes’ Success Run Theorem and Wilks’ One-sided Tolerance calculation. I invite the reader to check out my posts that shed more light on this: 1) Relationship between AQL/RQL and Reliability/Confidence, 2) Reliability/Confidence Level Calculator (with c = 0, 1, …, n) and 3) Wilks’ One-sided Tolerance Spreadsheet.

When we are utilizing random samples to represent a population, we are calculating a statistic, a representative value for the parameter. A statistic is an estimate of the parameter, the true value for the population. The larger the sample size used, the better the statistic can represent the parameter and the better your estimate.

I will finish with a story based on chance and probability:

It was the finals and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin flipping experiment. In a moment of clarity, he decided to flip a coin he had in his pocket to get the answer for each question. The psychology professor watched the student the entire two hours as he was flipping the coin…writing the answer…flipping the coin…writing the answer, on and on. At the end of the two hours, everyone else had left the room except for this one student. The professor walked up to his desk and angrily interrupted the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looked up at the professor and replied bitterly (as he was still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning…

In case you missed it, my last post was Kenjutsu, Ohno and Polanyi:

[1] Douglas Hubbard, How to Measure Anything.


Cpk/Ppk and Percent Conforming:


It has been a while since I have posted about Quality Statistics. In today’s post, I will talk about how process capability is connected to percent conforming.

In this post, I will be using Cpk and assuming normality for the sake of simplicity. Please bear in mind that there are multiple ways to calculate process capability, and that not all distributions are normal in nature. The two assumptions help me in explaining this better.

What is Cpk?

The process capability index Cpk is a single number that gives you an idea of the capability of the process to center around the nominal specification. It also tells you what percentage of conforming product the process is producing. Please note that I am not discussing the Cp index in this post.

Cpk is determined as the lower of two values. To simplify, let’s call them Cpklower and Cpkupper.

Cpklower = (Process Mean – LSL) / (3 * s)

Cpkupper = (USL – Process Mean) / (3 * s)

Where USL is the Upper Specification Limit,

LSL is the Lower Specification Limit, and

s is an estimate of the Population Standard Deviation.

Cpk = minimum (Cpklower, Cpkupper)

The “k” in Cpk stands for “Process Location Ratio” and is dimensionless. It is defined as:

k = abs(Specification Mean – Process Mean)/((USL-LSL)/2)

Where Specification Mean is the nominal specification.

Interestingly, when k = 0, Cpk = Cp. This happens when the process is perfectly centered. An additional thing to note is that Cpk ≈ Ppk when the process is stable (in statistical control), since the two standard deviation estimates then agree.

You can easily use Ppk in place of Cpk for the above equations. The only difference between Ppk and Cpk is the way we calculate the estimate for the standard deviation.
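The formulas above can be put together in a small sketch. It assumes the nominal specification sits at the midpoint of LSL and USL, and it uses the standard definition Cp = (USL – LSL) / (6 * s), which the post does not spell out. Under those assumptions Cpk = Cp * (1 – k), so Cpk equals Cp when k = 0. The numbers are made up for illustration.

```python
def cpk(mean, s, lsl, usl):
    """Cpk as the lower of Cpklower and Cpkupper."""
    cpk_lower = (mean - lsl) / (3 * s)
    cpk_upper = (usl - mean) / (3 * s)
    return min(cpk_lower, cpk_upper)

def cp(s, lsl, usl):
    """Cp, assuming the usual definition (USL - LSL) / (6 * s)."""
    return (usl - lsl) / (6 * s)

def location_ratio(mean, lsl, usl):
    """k, assuming the nominal specification is the midpoint of the limits."""
    nominal = (usl + lsl) / 2
    return abs(nominal - mean) / ((usl - lsl) / 2)

# Hypothetical process: limits 9 to 11, running slightly high.
mean, s, lsl, usl = 10.2, 0.2, 9.0, 11.0
k = location_ratio(mean, lsl, usl)
print("Cpk        =", round(cpk(mean, s, lsl, usl), 4))
print("Cp         =", round(cp(s, lsl, usl), 4))
print("k          =", round(k, 4))
print("Cp*(1 - k) =", round(cp(s, lsl, usl) * (1 - k), 4))
```

To get Ppk instead, pass the overall (long-term) standard deviation estimate as s.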

But What Does Cpk Tell Us?

If we can assume normality, we can easily convert the Cpk value to a Z value. This allows one to calculate the percentage falling inside the specification limits using normal distribution tables. We can easily do this in Excel.

Cpk can be converted to the Z value by simply multiplying it by 3.

Z = 3 * Cpk

In Excel, the Estimated % Non-conforming can be calculated as =NORMSDIST(-Z)

It does get a little tricky if the process is not centered or if you are looking at a one-sided specification. The table below should come in handy.

[Table: Z table]

The Estimated % Conforming can be easily calculated as 1 – Estimated % Non-conforming.
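For readers working outside Excel, here is a minimal sketch of the same conversion, assuming normality. It treats each specification side separately: the fraction beyond a limit is the standard normal CDF evaluated at –3 times the capability index for that side, and for a one-sided specification only one term is used. The function name and arguments are illustrative.

```python
from statistics import NormalDist

def fraction_nonconforming(cpk_lower=None, cpk_upper=None):
    """Estimated fraction out of spec under a normal model.

    Pass only one index for a one-sided specification,
    or both for a two-sided specification.
    """
    std_normal = NormalDist()
    total = 0.0
    for index in (cpk_lower, cpk_upper):
        if index is not None:
            z = 3 * index
            total += std_normal.cdf(-z)  # same role as NORMSDIST(-Z) in Excel
    return total

one_sided = fraction_nonconforming(cpk_lower=1.0)
two_sided = fraction_nonconforming(cpk_lower=1.0, cpk_upper=1.0)
print("one-sided nonconforming:", one_sided)
print("two-sided nonconforming:", two_sided)
print("two-sided conforming   :", 1 - two_sided)
```

For a centered two-sided process both tails are equal, which is why the one-sided figure is doubled in that case.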

The % Conforming is very similar to a tolerance interval calculation. The tolerance interval calculation allows us to make a statement like “we can expect x% of the population to be between two tolerance values at y% confidence level.” However, we cannot make such a statement with just a Cpk calculation. To make such a statement, we will need to calculate the RQL (Rejectable Quality Level) by creating an OC curve. Unfortunately, this is not straightforward, and requires methods such as the non-central t-distribution. I highly recommend Dr. Taylor’s Distribution Analyzer for this.

What about Confidence Interval?

I am proposing that we can calculate the confidence interval for the Cpk value and thus, for the Estimated % Non-conforming. It is recommended that we use the lower confidence bound for this. Before I proceed, I should explain what a confidence interval means. It is not technically correct to say that there is a 95% probability that the population parameter (e.g. the mean height of kids between ages 10 and 15) lies between the two bounds of one particular calculated interval. We cannot technically say that at a 95% confidence level, the mean height of the population is between X and Y for kids between ages 10 and 15.

Using the mean height as an example, the confidence interval just means that if we keep taking samples from the population, and keep calculating the estimate for mean height, the calculated confidence intervals for those samples would contain the true mean height 95% of the time (if we used a 95% confidence level).

We can calculate the lower bound for Cpk at a preferred confidence level, say 95%. We can then convert this to the Z-value and find the estimated % conforming at 95% confidence level. We can then make a statement similar to the tolerance interval.

A Cpk value of 2.00 with a sample size of 12 may not mean much. The calculated Cpk is only an estimate of the true Cpk of the population. Thus, as with any other parameter (mean, variance etc.), you need a larger sample size to make a better estimate. The use of a confidence interval helps us in this regard since it penalizes a small sample size.

An Example:

The Quality Engineer at a Medical Device company is performing a capability study on seal strength on pouches. The LSL is 1.1 lbf/in. He used 30 as the sample size, and found that the sample mean was 1.87 lbf/in, and the sample standard deviation was 0.24.

Let’s apply what we have discussed here so far.

LSL = 1.1

Process Mean = 1.87

Process sigma = 0.24

From this we can calculate the Ppk as 1.07. The Quality Engineer calculated Ppk since this was a new process.

Ppk = (Process Mean – LSL) / (3 * Process Sigma)

Z = Ppk * 3 = 3.21

Estimated % Non-conforming = NORMSDIST(-Z) = 0.000663675 = 0.07%

Note: Since we are using a unilateral specification, we do not need to double the % non-conforming to capture both sides of the bell curve.

Estimated % Conforming = 1 – Estimated % Non-conforming = 99.93363251%

We can calculate the Ppk lower bound at a 95% confidence level for a sample size = 30. You can use the spreadsheet at the end of this post to do this calculation.

Ppk Lower bound at 95% confidence level = 0.817

Lower bound Z = Ppk_lower_bound x 3 = 2.451

Lower bound (95%) % Non-conforming = NORMSDIST(-Lower_bound_Z) = 0.007122998 = 0.71%

Lower bound (95%) % Conforming = 99.28770023% = 99.29%

In effect (all things considered), we can state that with 95% confidence at least 99.29% of the values are in spec. Or we can correctly state that the 95% confidence lower bound for % in spec is 99.29%.
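The worked example can be reproduced with a short sketch. The lower bound here uses a common normal approximation for the capability index (often attributed to Bissell), Ppk – z * sqrt(1/(9n) + Ppk^2/(2(n – 1))). This is an assumption on my part and may not be the exact formula used in the spreadsheet.

```python
import math
from statistics import NormalDist

def ppk_lower_bound(ppk, n, confidence=0.95):
    """One-sided lower confidence bound for Ppk (normal approximation, assumed)."""
    z = NormalDist().inv_cdf(confidence)
    return ppk - z * math.sqrt(1 / (9 * n) + ppk ** 2 / (2 * (n - 1)))

# Values from the seal strength example.
mean, sigma, lsl, n = 1.87, 0.24, 1.1, 30

ppk = (mean - lsl) / (3 * sigma)
nonconforming = NormalDist().cdf(-3 * ppk)  # one-sided spec, so a single tail

ppk_lb = ppk_lower_bound(ppk, n)
nonconforming_lb = NormalDist().cdf(-3 * ppk_lb)

print("Ppk                        =", round(ppk, 3))
print("Estimated % conforming     =", round(100 * (1 - nonconforming), 2))
print("Ppk lower bound (95%)      =", round(ppk_lb, 3))
print("Lower bound % conforming   =", round(100 * (1 - nonconforming_lb), 2))
```

This approximation gives a lower bound of about 0.82, close to the 0.817 quoted above. Small differences are expected because the sketch carries the unrounded Ppk through the calculation and the spreadsheet may use a different formula or rounding.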

You can download the spreadsheet here. Please note that this post is based on my personal view on the matter. Please use it with caution. I have used normal distribution to calculate the Ppk and the lower bound for Ppk. I welcome your thoughts and comments.

Always keep on learning…

In case you missed it, my last post was Want to Increase Productivity at Your Plant? Read This.

Let’s Talk About Tea:


This week, I was talking to one of my colleagues and, going off on a tangent, we began discussing tea. His parents are from the UK. Today’s post is inspired by that conversation.

Milk First or Tea First:

The question of whether to add milk first or tea first is an interesting one. As part of writing this post, I did some research on it. The first documented account of milk being added to tea is from Johan Nieuhof (1618-1672), a steward of the then Dutch ambassador to China. He wrote about adding one fourth of warm milk to tea with salt. The idea of using milk with tea was made popular in Europe by the social critic Marie de Rabutin-Chantal, the Marquise de Sévigné, in 1680.

The socially correct protocol, according to Douglas Adams (author of The Hitchhiker’s Guide to the Galaxy) and many others, is to add milk after tea. There are many anecdotes on why this is the case. The most popular version is about the quality of tea cups back in the day. Pouring hot tea first broke the low quality cups. The upper class of society showed off their high quality cups by pouring hot tea first and then milk. The people who could not afford high quality tea cups poured milk first and then tea. Another reason could also be the way the process of making tea was documented. As noted above, the documented process was to add milk to tea.

George Orwell even wrote an essay on making tea called “A Nice Cup of Tea”. His preference was to add tea first and then milk. His logic was as follows:

One should pour tea into the cup first. This is one of the most controversial points of all; indeed in every family in Britain there are probably two schools of thought on the subject. The milk-first school can bring forward some fairly strong arguments, but I maintain that my own argument is unanswerable. This is that, by putting the tea in first and stirring as one pours, one can exactly regulate the amount of milk whereas one is liable to put in too much milk if one does it the other way round.

Douglas Adams on the other hand liked to add milk first even though it was not the socially correct protocol. Today, scientists will tell you that the proper way of making tea is to add the milk first and then tea. Milk proteins when exposed to a temperature above 75 degrees C (167 degrees F) will start to degrade through the process of denaturation. This is more prone to happen when milk is added to tea rather than when tea is added to milk.

The Lady Tasting Tea:

The story of the lady tasting tea is perhaps the most fantastic story in the field of statistics. There are a few different versions as to where the incident took place. The story goes that on an English afternoon in the 1920s, a statistician, a chemist and an algologist were sitting together. The statistician offered to make tea, and proceeded to pour tea and then milk. The algologist, a lady (hence the name “a lady tasting tea”), objected to the process. She told the statistician that she preferred to have the milk poured before the tea. She claimed that she could tell the difference. The chemist, who was the fiancé of the algologist, immediately wanted to test her claim, as any warm-blooded scientist would do. The statistician proceeded to create an impromptu test for the lady. He prepared four cups with milk poured first, and four cups with tea poured first. He randomized the cups using a published collection of random sampling numbers. The lady was informed of the test protocol and then she tasted each cup and identified all the cups accurately, thus standing by her claim.

The statistician was Sir Ronald Fisher, the chemist was Dr. William A. Roach and the lady algologist was Dr. Blanche Muriel Bristol. The story was documented by Sir Fisher in the groundbreaking book “The Design of Experiments” and in his paper “The Mathematics of a Lady Tasting Tea”. The probability of the lady getting all the results correct by chance alone was 1/70 = 0.014, since there are 70 ways to choose which 4 of the 8 cups had milk poured first. This value is less than the magical 0.05. Interestingly, Sir Fisher wrote the following about the 0.05 value in the paper:

“It is usual and convenient for experimenters to take 5 percent, as a standard level of significance…”

If the lady had gotten one result incorrect, the p-value would have been 0.243, and the testers would have failed to reject the null hypothesis that the lady has no ability to tell the difference between the two styles of making tea. Thus, one can say the test is not fair since a single mistake would leave her unable to justify her claim. In the paper, Sir Fisher advised that to improve the test, one should use 6 cups of each style. The p-value with one incorrect is then only 0.04, which is still less than 0.05. Thus, the lady has a little more leeway.
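These p-values can be reproduced by counting. Under the null hypothesis every way of choosing which cups are “milk first” is equally likely, and the number of choices with exactly e milk-first cups wrong is C(n, e)^2 for n cups of each style. The function below is an illustrative sketch of that count, not Fisher’s own notation.

```python
from math import comb

def tea_p_value(cups_per_style, max_errors):
    """Chance of misidentifying at most max_errors milk-first cups by pure guessing.

    The lady knows there are cups_per_style cups of each kind, so she picks
    exactly that many cups as "milk first".
    """
    n = cups_per_style
    total = comb(2 * n, n)  # all equally likely selections
    # e wrong picks: choose n - e of the true milk-first cups and e of the others
    favorable = sum(comb(n, e) ** 2 for e in range(max_errors + 1))
    return favorable / total

print("4 cups each, all correct :", round(tea_p_value(4, 0), 3))
print("4 cups each, one mistake :", round(tea_p_value(4, 1), 3))
print("6 cups each, one mistake :", round(tea_p_value(6, 1), 3))
```

Note that one misidentified milk-first cup necessarily means one misidentified tea-first cup as well, since the totals are fixed.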

This story helped explain the idea of randomization and significance testing. The test’s efficacy is improved further if the total number of particular styles were kept secret. Dr. Bristol was told about the exact number of each style of tea beforehand.

The Answer to the Ultimate Question of Life, the Universe, and Everything:

In The Hitchhiker’s Guide to the Galaxy, the answer to the Ultimate Question of life, the universe and everything is given as 42! I came across a possible explanation during my research for this post, based on Douglas Adams’s passion for tea.

42 = fortytwo

For tea two.

Two for tea!

Always keep on learning…

In case you missed it, my last post was about Respect for Humanity.

Extra Sensory Perception Statistics:


In today’s post, I am going to combine two of my favorite topics – mindreading and statistics.

I should confess upfront that I do not read minds, at least not literally. I do have a passion for magic and mentalism. I would like to introduce the readers to Joseph Banks Rhine. He is the creator of ESP (Extra Sensory Perception) cards. These are a set of 5 cards with 5 shapes (circle, cross, waves, square and a star). These cards were used for testing ESP. The readers might remember the Bill Murray scene in the movie Ghostbusters. The ESP cards are a common tool for a mentalist.

In 1937, Zenith Radio Corporation carried out multiple experiments under the guidance of Rhine. A selected group of psychics chose a “random” sequence and transmitted it out during the radio show. The listeners were asked to “receive” the transmitted sequence, write it down and send it back to the radio station. The sequence had 5 values and each value was binary in nature. This could be heads and tails, light and dark, black and white, or a group of symbols. The two values were represented as 0 and 1. Thus, a possible sequence could be 00101.

The hypothesis was that human beings are sensitive to psychic transmissions. It is reported that over a million data points were collected as part of these experiments. From a statistics viewpoint, this is a statistician’s dream come true!

The results of the study strongly implied the existence of ESP. The number of correct guesses was significantly higher than expected, if the calculations were based on the assumption of randomness.

A million data points is a statistically valid sample size. The studies were blind in nature. The “psychics” in the radio station did not cheat. The responding listeners did not have a way to know the sequence before-hand. So did they prove that ESP is real?

Enter Louis Goodfellow:

Goodfellow (an apt name) was a psychologist involved in the study. He realized something was fundamentally wrong with the study. The data that was transmitted was not truly random. The data was “randomly” chosen by the psychics. Unfortunately, being random is not something that we, human beings, are good at. We will try really hard to create a random sequence, and in the process create a completely non-random sequence. Certain sequences are chosen more than the others, across the board. With over a million data points, each of the sequences 11111 and 00000 should have shown up close to 3% of the time (1 in 32 possible sequences). The data showed this was actually less than 1%. Additionally, with such a large sample size, we would expect uniform data, meaning all sequences should show up in nearly equal proportions. This was not the case either.
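The expected proportions under true randomness are easy to check with a small simulation. The sketch below generates fair 5-bit sequences and tabulates how often each of the 32 possibilities appears; the sample size and seed are arbitrary choices for illustration and have nothing to do with the actual Zenith data.

```python
import random
from collections import Counter

def simulate_sequences(n_sequences=100_000, length=5, seed=42):
    """Proportion of each binary sequence when every bit is a fair coin toss."""
    rng = random.Random(seed)
    counts = Counter(
        format(rng.getrandbits(length), f"0{length}b")  # e.g. "00101"
        for _ in range(n_sequences)
    )
    return {seq: count / n_sequences for seq, count in counts.items()}

freqs = simulate_sequences()
print("expected per sequence:", 1 / 32)
print("00000:", freqs["00000"])
print("11111:", freqs["11111"])
print("min and max over all sequences:", min(freqs.values()), max(freqs.values()))
```

With truly random sequences, every pattern lands near 1/32, including the “unnatural looking” ones that people tend to avoid.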

In other words, the study revealed that the psychics were indeed human beings. Goodfellow repeated the study without involving the psychics. The study group was required to create a “random” sequence. The resulting data was very much similar to the Zenith radio data. Goodfellow also repeated studies with truly random sequences, and the study group failed to “receive” the sequences. (A psychological interpretation of the results of the Zenith radio experiments in telepathy)

The basic assumptions of independence and randomness were not followed for the original study. Thus, we still do not have evidence that ESP is real.

Always keep on learning…