Divine Wisdom and Paradigm Shifts:


One of the best books I have read in recent times is The Emperor of All Maladies by the talented Siddhartha Mukherjee. Mukherjee won the 2011 Pulitzer Prize for this book. The book is a detailed history of cancer and humanity’s battle with it. Among the many things that piqued my interest was a quote I had heard attributed to Dr. Deming – In God we trust, all others must bring data.

To tell this story, I must first talk about William S. Halsted. Halsted was a very famous surgeon from Johns Hopkins who came up with the surgical procedure known as the “Radical Mastectomy” in the 1880s. This is a procedure to remove the breast, the underlying muscles and the attached lymph nodes to treat breast cancer. He hypothesized that breast cancer spreads centrifugally from the breast to other areas. Thus, the removal of the breast, underlying muscles and lymph nodes would prevent the spread of cancer. He called this the “centrifugal theory”. Halsted called the procedure “radical” to denote that the roots of the cancer are removed. Mukherjee wrote in his book that the intent of radical mastectomy was to arrest the centrifugal spread by cutting every piece of it out of the body. Physicians all across America identified the Radical Mastectomy as the best way to treat breast cancer. The centrifugal theory became the paradigm for breast cancer treatment for almost a century.

There were skeptics of this theory. The strongest critics were Geoffrey Keynes, a London-based surgeon in the 1920s, and George Barney Crile, an American surgeon who started his career in the 1950s. They noted that even with the procedures that Halsted had performed, many patients died within four or five years from metastasis (cancer spreading to different organs). The surgeons overlooked these flaws, as they were firm believers in the Radical Mastectomy. Daniel Dennett, the famous American philosopher, talks about the concept of Occam’s Broom, which might explain the thinking process behind ignoring the flaws in a hypothesis. When a hypothesis is strongly accepted, any contradicting information may get swept under the rug with Occam’s Broom. The contradictory information gets ignored rather than confronted.

Keynes was even able to achieve some success by performing a local surgery of the breast combined with radiation treatment. But Halsted’s followers in America ridiculed this approach, and came up with the name “lumpectomy” for the local surgery. In their minds, the surgeon was simply removing “just” a lump, and this did not make much sense. They were aligning themselves with the paradigm of Radical Mastectomy. In fact, some of the surgeons went even further and came up with “superradical” and “ultraradical” procedures. These were morbidly disfiguring procedures in which the breast, underlying muscles, axillary nodes, the chest wall, and occasionally the ribs, part of the sternum, the clavicle and the lymph nodes inside the chest were removed. The idea that “more was better” became prevalent.

Another paradigm in clinical studies during that time was to look only for positive results – is treatment A better than treatment B? However, this approach could not show that treatment A was no better than treatment B. Two statisticians, Jerzy Neyman and Egon Pearson, changed the approach with their idea of using the statistical concept of power. The sample size for a study should be based on the power calculated. Loosely stated, more independent samples mean higher power. Thus, with a randomized trial of large sample size, one can make a claim of “lack of benefit” from a treatment. The Halsted procedure did not get challenged for a long time because the surgeons were not willing to take part in a large sample size study.
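To make the link between sample size and power a little more concrete, here is a minimal sketch. It is my own illustration and not something from Mukherjee's book: it assumes a simple two-sided one-sample z-test with a known standard deviation, and the function name and the effect size of 0.2 are made up for the example.

```python
from statistics import NormalDist


def power_two_sided_z(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size: true mean shift in units of the (assumed known) standard deviation.
    n: number of independent samples.
    alpha: significance level of the test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical small effect (0.2 standard deviations) at growing sample sizes.
for n in (10, 50, 200, 1000):
    print(n, round(power_two_sided_z(0.2, n), 3))
```

For the same small effect, the power climbs toward 1 as the sample size grows, which is why a large trial that finds no difference is far more convincing than a small one.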

A Pittsburgh surgeon named Dr. Bernard Fisher was finally able to shift this paradigm in the 1980s. Fisher found no reason to believe in the centrifugal theory. He studied the cases put forth by Keynes and Crile. He concluded that he needed to perform a controlled clinical trial to test the Radical Mastectomy against Simple Mastectomy and Lumpectomy with radiation. The opposition from the surgeons slowly gave way under strong advocacy from women who wanted a less invasive treatment. Mukherjee credits the slow shift in the paradigm to the Thalidomide tragedy, the Roe vs Wade case, Crile’s strong exhortation to women to refuse to submit to a Radical Mastectomy, and the public attention swirling around breast cancer. Fisher was finally able to complete the study, after ten long years. Fisher stated that he was willing to have faith in divine wisdom but not in Halsted as divine wisdom. Fisher brusquely told a journalist – “In God we trust. All others must have data.”

The results of the study showed that the outcomes of all three groups were statistically indistinguishable. The group treated with Radical Mastectomy, however, paid heavily for the procedure but gained no real benefit in survival, recurrence or mortality. The paradigm of Radical Mastectomy shifted and made way for better approaches and theories.

While I was researching this further, I found that the quote “In God we trust…” was also attributed to another Dr. Fisher: Dr. Edwin Fisher, brother of Dr. Bernard Fisher, who used it when he appeared before the Subcommittee on Tobacco of the Committee on Agriculture, House of Representatives, Ninety-fifth Congress, Second Session, on September 7, 1978. As part of his presentation, Dr. Fisher said – “I should like to close by citing a well-recognized cliche in scientific circles. The cliche is, ‘In God we trust, others must provide data.’” This is recorded in “Effect of Smoking on Nonsmokers. Hearing Before the Subcommittee on Tobacco of the Committee on Agriculture House of Representatives. Ninety-fifth Congress, Second Session, September 7, 1978. Serial Number 95-000”. Dr. Edwin Fisher unfortunately was not a supporter of the hypothesis that smoking is bad for a non-smoker. He even claimed that people traveling on an airplane are more bothered by crying babies than by the smoke from the smokers.


Final Words:

This past year, I was personally affected by a family member suffering from the scourge of breast cancer. During this period of Thanksgiving in America, I am thankful for the doctors and staff who facilitated her recovery. I am thankful for the doctors and experts in the medical field who were courageous enough to challenge the “norms” of the day for treating breast cancer. I am thankful for the paradigm shift(s) that brought better and more effective treatments for breast cancer. More is not always better! I am thankful to them for not accepting a hypothesis based on just rationalism, an intuition about how things might be working. I am thankful for all the wonderful doctors and staff out there who take great care in treating all cancer patients.

I am also intrigued to find the quote of “In God we trust…” used with the statement that smoking may not have a negative impact on non-smokers.

I will finish with a story of another paradigm shift from Joel Barker in The Business of Paradigms.

A couple of Swiss watchmakers at Centre Electronique Horloger (CEH) in Neuchâtel, Switzerland, developed the first quartz-based watch. They went to different Swiss watchmakers with the technology that would later revolutionize the watch industry. However, the paradigm at that time was the intricate Swiss watchmaking process with gears and springs. No Swiss watch company was interested in this new technology, which did not rely on gears or springs for keeping time. The Swiss watchmakers with the new idea then went to a clock convention and set up a booth to demonstrate it. Again, no Swiss watch company was interested in what they had to offer. Two representatives, one from the Japanese company Seiko and the other from Texas Instruments, took notice of the new technology. They purchased the patents and, as they say, the rest is history. The new paradigm then became quartz watches. The Swiss, who were at the top of watchmaking with over 50% of the watch market in the 1970s, were pushed aside by quartz watches, marking the decline of their industry. This was later termed the Quartz Revolution.

Always keep on learning…

In case you missed it, my last post was The Best Attribute to Have at the Gemba:


Rules of 3 and 5:


It has been a while since I have blogged about statistics. So in today’s post, I will be looking at rules of 3 and 5. These are heuristics or rules of thumb that can help us out. They are associated with sample sizes.

Rule of 3:

Let’s assume that you are looking at a binomial event (pass or fail). You took 30 samples and tested them to see how many passes or failures you get. The results yielded no failures. Then, based on the rule of 3, you can state that at 95% confidence level, the upper bound for the failure rate is 3/30 = 10%, or the reliability is at least 90%. The rule is written as:

p = 3/n

Where p is the upper bound of the failure rate, and n is the sample size (with zero failures observed).

Thus, if you used 300 samples, then you could state with 95% confidence that the process is at least 99% reliable based on p = 3/300 = 1%. Another way to express this is to say that with 95% confidence fewer than 1 in 100 units will fail under the same conditions.

This rule can be derived from using binomial distribution. The 95% confidence comes from the alpha value of 0.05. The calculated value from the rule of three formula gets more accurate with a sample size of 20 or more.
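As a rough check on where the rule comes from: with zero failures in n samples, the 95% upper bound p solves (1 – p)^n = 0.05, which gives p = 1 – 0.05^(1/n), or approximately –ln(0.05)/n ≈ 3/n. The sketch below compares the two; the function names are my own and only serve the illustration.

```python
def rule_of_three(n):
    """Rule-of-3 upper bound on the failure rate after n samples with zero failures."""
    return 3 / n


def exact_upper_bound(n, confidence=0.95):
    """Exact binomial upper bound: solve (1 - p)**n = 1 - confidence for p."""
    return 1 - (1 - confidence) ** (1 / n)


# Compare the heuristic with the exact bound at a few sample sizes.
for n in (5, 10, 20, 30, 100, 300):
    print(n, round(rule_of_three(n), 4), round(exact_upper_bound(n), 4))
```

The heuristic is always slightly conservative (a little higher than the exact bound), and the gap shrinks as the sample size grows.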

Rule of 5:

I came across the rule of 5 in Douglas Hubbard’s informative book “How to Measure Anything” [1]. Hubbard states the Rule of 5 as:

There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.

This is a really neat heuristic because you can actually tell a lot from a sample size of 5! The median is the 50th percentile value of a population, the point where half of the population is above it and half of the population is below it. Hubbard points out that the probability of picking a value above or below the median is 50% – the same as a coin toss. Thus, we can calculate that the probability of getting 5 heads in a row is 0.5^5 or 3.125%. This would be the same for getting 5 tails in a row. Then the probability of not getting all heads or all tails is (100 – (3.125+3.125)) or 93.75%. Thus, we can state that the chance of at least one value out of five being above the median and at least one value being below the median is 93.75%.
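The arithmetic above is easy to verify, and a quick simulation shows that the result does not depend on the shape of the population. This is only a sketch: the exponential population (whose median is ln 2) and the function names are my own choices for the illustration, not Hubbard's.

```python
import math
import random


def exact_coverage(sample_size=5):
    """Chance that the sample is neither all above nor all below the median."""
    return 1 - 2 * 0.5 ** sample_size


def simulated_coverage(trials=100_000, sample_size=5, seed=0):
    """Estimate the same chance by sampling from a skewed (exponential) population."""
    rng = random.Random(seed)
    true_median = math.log(2)  # median of an exponential distribution with rate 1
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        if min(sample) < true_median < max(sample):
            hits += 1
    return hits / trials


print(exact_coverage())
print(simulated_coverage())
```

The simulated value should land very close to the exact one even though the population is far from symmetric.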

Final words:

The reader has to keep in mind that both of the rules require the use of randomly selected samples. The Rule of 3 is a version of Bayes’ Success Run Theorem and Wilks’ One-sided Tolerance calculation. I invite the reader to check out my posts that shed more light on this: 1) Relationship between AQL/RQL and Reliability/Confidence, 2) Reliability/Confidence Level Calculator (with c = 0, 1….., n) and 3) Wilk’s One-sided Tolerance Spreadsheet.

When we are utilizing random samples to represent a population, we are calculating a statistic – a representative value for the parameter. A statistic is an estimate of the parameter, the true value from a population. The larger the sample size used, the better the statistic can represent the parameter and the better your estimate.

I will finish with a story based on chance and probability;

It was the finals and an undergraduate psychology major was totally hung over from the previous night. He was somewhat relieved to find that the exam was a true/false test. He had taken a basic stat course and did remember his professor once performing a coin flipping experiment. In a moment of clarity, he decided to flip a coin he had in his pocket to get the answer for each question. The psychology professor watched the student the entire two hours as he was flipping the coin…writing the answer…flipping the coin….writing the answer, on and on. At the end of the two hours, everyone else had left the room except for this one student. The professor walks up to his desk and angrily interrupts the student, saying: “Listen, it is obvious that you did not study for this exam since you didn’t even open the question booklet. If you are just flipping a coin for your answer, why is it taking you so long?”

The stunned student looks up at the professor and replies bitterly (as he is still flipping the coin): “Shhh! I am checking my answers!”

Always keep on learning…

In case you missed it, my last post was Kenjutsu, Ohno and Polanyi:

[1] Douglas Hubbard, How to Measure Anything.

Cpk/Ppk and Percent Conforming:


It has been a while since I have posted about Quality Statistics. In today’s post, I will talk about how process capability is connected to percent conforming.

In this post, I will be using Cpk and assuming normality for the sake of simplicity. Please bear in mind that there are multiple ways to calculate process capability, and that not all distributions are normal in nature. The two assumptions help me in explaining this better.

What is Cpk?

The process capability index Cpk is a one-shot number that gives you an idea of how capable the process is of centering around the nominal specification. It also tells you what percentage of conforming product the process is producing. Please note that I am not discussing the Cp index in this post.

Cpk is determined as the lower of two values. To simplify, let’s call them Cpklower and Cpkupper.

Cpklower = (Process Mean – LSL) / (3 * s)

Cpkupper = (USL – Process Mean) / (3 * s)

Where USL is the Upper Specification Limit,

LSL is the Lower Specification Limit, and

s is an estimate of the Population Standard Deviation.

Cpk = minimum (Cpklower, Cpkupper)

The “k” in Cpk stands for “Process Location Ratio” and is dimensionless. It is defined as:

k = abs(Specification Mean – Process Mean)/((USL-LSL)/2)

Where Specification Mean is the nominal specification.

Interestingly, when k = 0, Cpk = Cp. This happens when the process is perfectly centered. An additional thing to note is that Cpk ≈ Ppk when the process is stable (in statistical control), since the two estimates of the standard deviation then agree closely.

You can easily use Ppk in place of Cpk for the above equations. The only difference between Ppk and Cpk is the way we calculate the estimate for the standard deviation.

But What Does Cpk Tell Us?

If we can assume normality, we can easily convert the Cpk value to a Z value. This allows one to calculate the percentage falling inside the specification limits using normal distribution tables. We can easily do this in Excel.

Cpk can be converted to the Z value by simply multiplying it by 3.

Z = 3 * Cpk

In Excel, the Estimated % Non-conforming can be calculated as =NORMSDIST(-Z)

It does get a little tricky, if the process is not centered or if you are looking at a one-sided specification. The table below should come in handy.

[Table: z table]

The Estimated % Conforming can be easily calculated as 1 – Estimated % Non-conforming.
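For readers who prefer code to Excel, the same conversion can be sketched in Python. This assumes normality, as stated earlier in the post. The function name is hypothetical, and the way it handles an off-center or one-sided specification (adding up the tail for each side that actually has a specification limit) is my own reading of the usual approach, not a copy of the table above.

```python
from statistics import NormalDist


def fraction_nonconforming(cpk_lower=None, cpk_upper=None):
    """Estimated fraction outside specification, assuming a normal distribution.

    Pass the capability index only for the side(s) that have a specification
    limit. Each side contributes the tail area beyond Z = 3 * index.
    """
    std_normal = NormalDist()
    total = 0.0
    for index in (cpk_lower, cpk_upper):
        if index is not None:
            total += std_normal.cdf(-3 * index)  # same as Excel's NORMSDIST(-Z)
    return total


# One-sided specification with an index of 1.0 on the lower side.
one_sided = fraction_nonconforming(cpk_lower=1.0)
# Two-sided, perfectly centered process: both sides have the same index.
two_sided = fraction_nonconforming(cpk_lower=1.0, cpk_upper=1.0)

print(f"one-sided non-conforming: {one_sided:.5f}, conforming: {1 - one_sided:.5f}")
print(f"two-sided non-conforming: {two_sided:.5f}, conforming: {1 - two_sided:.5f}")
```

For a centered two-sided specification this is simply double the one-sided tail; for an off-center process the two tails differ and are added separately.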

The % Conforming is very similar to a tolerance interval calculation. The tolerance interval calculation allows us to make a statement like “we can expect x% of the population to be between two tolerance values at y% confidence level.” However, we cannot make such a statement with just a Cpk calculation. To make such a statement, we will need to calculate the RQL (Rejectable Quality Level) by creating an OC curve. Unfortunately, this is not straightforward, and requires methods such as the non-central t-distribution. I highly recommend Dr. Taylor’s Distribution Analyzer for this.

What about Confidence Interval?

I am proposing that we can calculate the confidence interval for the Cpk value and thus, for the Estimated % Non-conforming. It is recommended that we use the lower confidence bound for this. Before I proceed, I should explain what a confidence interval means. It is not technically correct to say that the population parameter value (e.g. the mean height of kids between ages 10 and 15) is between the two confidence interval bounds with 95% probability. We cannot technically say that at 95% confidence level, the mean height of the population is between X and Y for kids between ages 10 and 15.

Using the mean height as an example, the confidence interval just means that if we keep taking samples from the population, and keep calculating the estimate for mean height, the calculated confidence intervals for those samples would contain the true mean height 95% of the time (if we used a 95% confidence level).

We can calculate the lower bound for Cpk at a preferred confidence level, say 95%. We can then convert this to the Z-value and find the estimated % conforming at 95% confidence level. We can then make a statement similar to the tolerance interval.

A Cpk value of 2.00 with a sample size of 12 may not mean much. The calculated Cpk is only an estimate of the true Cpk of the population. Thus, as with any other parameter (mean, variance etc.), you need a larger sample size to make a better estimate. The use of a confidence interval helps us in this regard since it penalizes small sample sizes.

An Example:

The Quality Engineer at a Medical Device company is performing a capability study on the seal strength of pouches. The LSL is 1.1 lbf/in. He used 30 as the sample size, and found that the sample mean was 1.87 lbf/in, and the sample standard deviation was 0.24 lbf/in.

Let’s apply what we have discussed here so far.

LSL = 1.1

Process Mean = 1.87

Process sigma = 0.24

From this we can calculate the Ppk as 1.07. The Quality Engineer calculated Ppk since this was a new process.

Ppk = (Process Mean – LSL) / (3 * Process Sigma)

Z = Ppk * 3 = 3.21

Estimated % Non-conforming = NORMSDIST(-Z) = 0.000663675 = 0.07%

Note: Since we are using a unilateral specification, we do not need to double the % non-conforming to capture both sides of the bell curve.

Estimated % Conforming = 1 – Estimated % Non-conforming = 99.93363251%

We can calculate the Ppk lower bound at a 95% confidence level for a sample size = 30. You can use the spreadsheet at the end of this post to do this calculation.

Ppk Lower bound at 95% confidence level = 0.817

Lower bound Z = Ppk_lower_bound x 3 = 2.451

Lower bound (95%) % Non-conforming = NORMSDIST(-Lower_bound_Z) = 0.007122998 = 0.71%

Lower bound (95%) % Conforming = 99.28770023% = 99.29%

In effect (all things considered), we can state that with 95% confidence at least 99.29% of the values are in spec. Or we can correctly state that the 95% confidence lower bound for % in spec is 99.29%.
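The whole example can be reproduced in a few lines of Python. Please treat this as a sketch under stated assumptions: the lower bound here uses a commonly quoted normal approximation, Ppk – z × sqrt(1/(9n) + Ppk²/(2(n – 1))), which may not be exactly what the spreadsheet uses, so the last digits can differ slightly from the values above. The function name is my own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ppk_lower_bound(ppk, n, confidence=0.95):
    """One-sided lower confidence bound for Ppk (normal approximation, assumed)."""
    z = STD_NORMAL.inv_cdf(confidence)
    return ppk - z * (1 / (9 * n) + ppk ** 2 / (2 * (n - 1))) ** 0.5


# Seal strength example from the post (unilateral lower specification).
lsl, mean, sigma, n = 1.1, 1.87, 0.24, 30

ppk = (mean - lsl) / (3 * sigma)
ppk_lb = ppk_lower_bound(ppk, n)

# Z = 3 * Ppk, then the lower tail of the standard normal distribution.
nonconforming = STD_NORMAL.cdf(-3 * ppk)
nonconforming_lb = STD_NORMAL.cdf(-3 * ppk_lb)

print(f"Ppk = {ppk:.3f}, estimated % conforming = {100 * (1 - nonconforming):.2f}%")
print(f"Ppk lower bound = {ppk_lb:.3f}, "
      f"lower bound % conforming = {100 * (1 - nonconforming_lb):.2f}%")
```

Because the code carries the unrounded Ppk through the calculation instead of the rounded 1.07, small differences in the trailing decimals are expected.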

You can download the spreadsheet here. Please note that this post is based on my personal view on the matter. Please use it with caution. I have used normal distribution to calculate the Ppk and the lower bound for Ppk. I welcome your thoughts and comments.

Always keep on learning…

In case you missed it, my last post was Want to Increase Productivity at Your Plant? Read This.

Let’s Talk About Tea:


This week, I was talking to one of my colleagues and, going off on a tangent, we began discussing tea. His parents are from the UK. Today’s post is inspired by that conversation.

Milk First or Tea First:

The question of whether to add milk first or tea first is an interesting one. As part of writing this post, I did some research on it. The first documented account of milk being added to tea is from Johan Nieuhof (1618-1672), a steward of the then Dutch ambassador to China. He wrote about adding one fourth of warm milk to tea with salt. The idea of using milk with tea was made popular in Europe by the social critic Marie de Rabutin-Chantal, the Marquise de Sévigné, in 1680.

The socially correct protocol, according to Douglas Adams (author of Hitchhiker’s Guide to the Galaxy) and many others is to add milk in after tea. There are many anecdotes on why this is the case. The most popular version is about the quality of tea cups back in the day. Pouring hot tea first broke the low quality cups. The upper class of the society showed off their high quality cups by pouring hot tea first and then milk. The people who could not afford high quality tea cups poured milk first and then tea. Another reason could be also the way the process of making tea was documented. As noted above, the documented process was to add milk to tea.

George Orwell even wrote an essay on making tea called “A Nice Cup of Tea”. His preference was to add tea first and then milk. His logic was as follows:

One should pour tea into the cup first. This is one of the most controversial points of all; indeed in every family in Britain there are probably two schools of thought on the subject. The milk-first school can bring forward some fairly strong arguments, but I maintain that my own argument is unanswerable. This is that, by putting the tea in first and stirring as one pours, one can exactly regulate the amount of milk whereas one is liable to put in too much milk if one does it the other way round.

Douglas Adams on the other hand liked to add milk first even though it was not the socially correct protocol. Today, scientists will tell you that the proper way of making tea is to add the milk first and then tea. Milk proteins when exposed to a temperature above 75 degrees C (167 degrees F) will start to degrade through the process of denaturation. This is more prone to happen when milk is added to tea rather than when tea is added to milk.

The Lady Tasting Tea:

The story of the lady tasting tea is perhaps the most fantastic story in the field of statistics. There are a few different versions as to where the incident took place. The story goes that on an English afternoon in the 1920s, a statistician, a chemist and an algologist were sitting together. The statistician offered to make tea, and proceeded to pour tea and then milk. The algologist, a lady (hence the name, the lady tasting tea), objected to the process. She told the statistician that she preferred to have the milk poured before the tea. She claimed that she could tell the difference. The chemist, who was the fiancé of the algologist, immediately wanted to test her claim, as any warm-blooded scientist would do. The statistician proceeded to create an impromptu test for the lady. He prepared four cups with milk poured first, and four cups with tea poured first. He randomized the cups using a published collection of random sampling numbers. The lady was informed of the test protocol and then she tasted each cup and identified all the cups accurately, thus vindicating her claim.

The statistician was Sir Ronald Fisher, the chemist was Dr. William A Roach and the lady algologist was Dr. Blanche Muriel Bristol. The story was documented by Sir Fisher in the groundbreaking book “The Design of Experiments” and in his paper “The Mathematics of a Lady Tasting Tea”. The probability of the lady getting all the results correct was 1/70 = 0.014. This value is less than the magical 0.05. Interestingly, Sir Fisher wrote the following about the 0.05 value in the paper;

“It is usual and convenient for experimenters to take 5 percent, as a standard level of significance…”

If the lady had gotten one result incorrect, the p-value would have been 0.243, and the testers would have failed to reject the null hypothesis that the lady has no ability to tell the difference between the two styles of making tea. Thus, one can say the test is not fair, since a single mistake would leave her unable to justify her claim. In the paper, Sir Fisher advised that to improve the test, one should use 6 cups of each style. The p-value of getting one incorrect is then only 0.04, which is still less than 0.05. Thus, the lady has a little more leeway.
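The p-values quoted above come from simple counting. If the lady is purely guessing and knows how many cups of each style there are, every way of picking which cups are “milk first” is equally likely. Here is a small sketch of that count; the function name and argument names are my own.

```python
from math import comb


def p_value(cups_per_style, correct_milk_first):
    """Chance of identifying at least this many milk-first cups by pure guessing.

    The lady picks exactly cups_per_style cups as "milk first" out of
    2 * cups_per_style cups, so misidentifying one milk-first cup also means
    misidentifying one tea-first cup.
    """
    n = cups_per_style
    total_ways = comb(2 * n, n)
    favorable = sum(
        comb(n, k) * comb(n, n - k)  # k right picks and n - k wrong picks
        for k in range(correct_milk_first, n + 1)
    )
    return favorable / total_ways


print(round(p_value(4, 4), 3))  # all correct with 4 cups of each style
print(round(p_value(4, 3), 3))  # one milk-first cup wrong with 4 cups of each style
print(round(p_value(6, 5), 3))  # one milk-first cup wrong with 6 cups of each style
```

With 4 cups of each style there are 70 possible selections, and with 6 cups of each there are 924, which is what gives the lady her extra leeway.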

This story helped explain the ideas of randomization and significance testing. The test’s efficacy would be improved further if the number of cups of each style were kept secret. Dr. Bristol was told the exact number of each style of tea beforehand.

The Answer to the Ultimate Question of Life, the Universe, and Everything:

In The Hitchhiker’s Guide to the Galaxy, the answer to the Ultimate Question of life, the universe and everything is given as 42! I came across a possible explanation during my research for this post, based on Douglas Adams’s passion for tea.

42 = fortytwo

For tea two.

Two for tea!

Always keep on learning…

In case you missed it, my last post was about Respect for Humanity.

Extra Sensory Perception Statistics:


In today’s post, I am going to combine two of my favorite topics – mindreading and statistics.

I should confess upfront that I do not read minds, at least not literally. I do have a passion for magic and mentalism. I would like to introduce the readers to Joseph Banks Rhine. He is the creator of ESP (Extra Sensory Perception) cards. These are a set of 5 cards with 5 shapes (circle, cross, waves, square and a star). These cards were used for testing ESP. The readers might remember the Bill Murray scene in the movie Ghostbusters. The ESP cards are a common tool for a mentalist.

In 1937, Zenith Radio Corporation carried out multiple experiments under the guidance of Rhine. A selected group of psychics chose a “random” sequence and transmitted it out during the radio show. The listeners were asked to “receive” the transmitted sequence, write it down and send it back to the radio station. The sequence had 5 values and each value was binary in nature. This could be heads and tails, light and dark, black and white, or a group of symbols. The two values were represented as 0 and 1. Thus, a possible sequence could be 00101.

The hypothesis was that human beings are sensitive to psychic transmissions. It is reported that over a million data points were collected as part of these experiments. From a statistics viewpoint, this is a statistician’s dream come true!

The results of the study strongly suggested the existence of ESP. The number of correct guesses was significantly high, if the calculations were based on the assumption of randomness.

A million data points is a statistically valid sample size. The studies were blind in nature. The “psychics” in the radio station did not cheat. The responding listeners did not have a way to know the sequence before-hand. So did they prove that ESP is real?

Enter Louis Goodfellow:

Goodfellow (an apt name) was a psychologist involved in the study. He realized something was fundamentally wrong with the study. The data that was transmitted was not truly random. The data was “randomly” chosen by the psychics. Unfortunately, being random is not something that we, human beings, are good at. We will try really hard to create a random sequence, and in the process create a completely non-random sequence. Certain sequences are chosen more often than others, across the board. With over a million data points, each of the sequences 11111 and 00000 should have occurred close to 3% of the time (1 in 32). The data showed this was actually less than 1%. Additionally, with such a large sample size, we would expect uniform data, meaning all sequences should show up in nearly equal proportions. This was not the case either.
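To see what truly random data should look like, here is a small simulation sketch. It is my own illustration and uses made-up simulated sequences, not the Zenith data: each of the 32 possible five-digit binary sequences should turn up about 1/32 ≈ 3.1% of the time.

```python
import random
from collections import Counter


def sequence_frequencies(trials=100_000, length=5, seed=1):
    """Relative frequency of each binary sequence under truly random generation."""
    rng = random.Random(seed)
    counts = Counter(
        "".join(rng.choice("01") for _ in range(length)) for _ in range(trials)
    )
    return {sequence: count / trials for sequence, count in counts.items()}


freqs = sequence_frequencies()
expected = 1 / 2 ** 5  # every one of the 32 sequences is equally likely

print(f"expected share per sequence: {expected:.4f}")
for sequence in ("00000", "11111", "00101"):
    print(sequence, round(freqs[sequence], 4))
```

A machine has no aversion to 00000 or 11111, so they show up as often as any other sequence. People asked to “be random” tend to avoid them, which is the pattern Goodfellow noticed.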

In other words, the study revealed that the psychics were indeed human beings. Goodfellow repeated the study without involving the psychics. The study group was required to create a “random” sequence. The resulting data was very much similar to the Zenith radio data. Goodfellow also repeated studies with truly random sequences, and the study group failed to “receive” the sequences. (A psychological interpretation of the results of the Zenith radio experiments in telepathy)

The basic assumptions of independence and randomness were not followed for the original study. Thus, we still do not have evidence that ESP is real.

Always keep on learning…