The Anatomy of an Isolated Incident:


I read about the death of Bob Ebeling today. He was an engineer at Morton Thiokol, a NASA contractor, who tried to stop the launch of the space shuttle Challenger in 1986. On January 28, 1986, the Challenger broke apart 73 seconds after liftoff. All seven crew members lost their lives in this terrible accident. The Nobel laureate Richard Feynman was part of the Rogers Commission, which investigated the accident. Feynman wrote about the investigation in depth in his 1988 book “What Do You Care What Other People Think?”

In today’s post, I will be looking at Isolated Incidents. There have been times in my career when I was taken aback by isolated events. These events happen very rarely, so their root causes are not easy to understand. I will use the Challenger accident as the primary example. There were 135 NASA space shuttle missions between 1981 and 2011. Of these, 133 flights went as planned and two ended in disaster.

The O-Ring Fiasco:

The Rogers Commission identified that the Challenger accident was caused by a failure of the O-rings that were used to seal a joint on the right solid rocket booster. Bob Ebeling was among the group of engineers who had warned NASA against the launch because of their concerns about the seals. The O-rings were not proven to work under cold conditions, and temperatures had dropped below freezing on the morning of the launch. Feynman famously demonstrated the problem by compressing a piece of O-ring and immersing it in a glass of ice water. The cold material lost its resilience: it stayed compressed and took time to spring back to its original shape. This lack of resilience caused the seals to fail, leading to the Challenger catastrophe.


The Rogers Commission indicated that the following issues led to the Challenger accident:

  • Improper material used for the O-ring.
  • Lack of robust testing – the O-ring material had not been shown to function as intended at low temperatures. Even though the O-ring manufacturer provided data showing a loss of function at low temperatures, NASA management did not heed it.
  • Lack of understanding of risk from NASA management.
  • Potential push from management to launch the space shuttle to meet a rush deadline.

Feynman also wrote about the great disparity between how NASA management and the engineers viewed risk. NASA management assigned a probability of 1 in 100,000 to a failure with loss of vehicle. However, when Feynman asked the engineers, he got estimates as pessimistic as 1 in 100. Feynman reviewed the NASA document that discussed the risk analysis of the space shuttle and was surprised to see extremely low probability values for failures. In his words:

The whole paper was quantifying everything. Just about every nut and bolt was in there. “The chance that a HPHTP pipe will burst is 10^-7”. You can’t estimate things like that; a probability of 1 in 10,000,000 is almost impossible to estimate. It was clear that the numbers for each part of the engine were chosen so that when you put everything together you get 1 in 100,000.

Feynman also wrote about an engineer who was candid with him and put the risk at 1 in 300. The engineer said that he had calculated this figure. However, he did not want to tell Feynman how he arrived at it!
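To get a feel for how far apart these estimates are, here is a rough Python sketch. It is my own back-of-the-envelope arithmetic, not anything taken from the NASA document or from Feynman's analysis. It assumes that every flight is independent and carries the same failure probability, which is a simplification, and the function names are made up for this illustration.

```python
def expected_failures(p_per_flight, flights):
    """Expected number of lost vehicles over a number of flights."""
    return p_per_flight * flights


def prob_at_least_one_failure(p_per_flight, flights):
    """Chance of losing at least one vehicle, assuming independent flights."""
    return 1.0 - (1.0 - p_per_flight) ** flights


FLIGHTS = 135  # shuttle missions flown between 1981 and 2011

# Per-flight failure probabilities quoted in this post.
estimates = {
    "management (1 in 100,000)": 1 / 100_000,
    "engineer (1 in 300)": 1 / 300,
    "engineers (1 in 100)": 1 / 100,
}

# What actually happened: two losses in 135 flights.
observed_rate = 2 / FLIGHTS

for label, p in estimates.items():
    print(
        f"{label}: expected losses = {expected_failures(p, FLIGHTS):.4f}, "
        f"P(at least one loss) = {prob_at_least_one_failure(p, FLIGHTS):.4f}"
    )

print(f"observed: 2 losses, about 1 in {1 / observed_rate:.1f} flights")
```

Under these assumptions, the observed record of two losses in 135 flights works out to roughly 1 in 68, which is much closer to the engineers' numbers than to the management figure.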

The Anatomy of an Isolated Incident:

I have come to view the Isolated Incident cause-effect relationship as an equation. This is shown below.

Isolated Incident = Cause(s) + System weak points + Enabling Conditions

The Challenger Accident can be summarized:

Challenger Accident = Material limitation of the O-ring + NASA Management Policies + Cold conditions

The system weak points are internal in nature. The enabling conditions, on the other hand, are external. When all three factors combine in a perfect storm, you get an isolated incident. Unless we know all three factors, we cannot solve the isolated incident. On its own, none of the factors may be enough to cause the problem.
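One way to picture this is a small Python sketch. The equation uses a plus sign, but since all three factors have to be present together, I am modeling it here as a logical AND; that reading is my own assumption. The class and field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """The three ingredients of an isolated incident (illustrative model)."""
    cause: bool               # e.g. material limitation of the O-ring
    system_weak_point: bool   # internal, e.g. management policies
    enabling_condition: bool  # external, e.g. cold weather

    def incident_occurs(self) -> bool:
        # Assumption: the incident needs all three factors at once.
        return self.cause and self.system_weak_point and self.enabling_condition

    def missing_factors(self) -> list:
        # Names of the factors that are absent in this scenario.
        return [name for name, present in vars(self).items() if not present]


cold_launch = Scenario(cause=True, system_weak_point=True, enabling_condition=True)
warm_launch = Scenario(cause=True, system_weak_point=True, enabling_condition=False)

print(cold_launch.incident_occurs())
print(warm_launch.incident_occurs(), warm_launch.missing_factors())
```

In this toy model, the warm launch carries the same material limitation and the same management policies, yet nothing happens, which is why the problem can stay hidden for so long.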

Another example is a spike in demand that doubles production. If the process is not robust enough to handle the spike, isolated events can happen.

Pontiac’s Allergy to Vanilla Ice Cream:

I will finish this post with a fantastic story I read on Snopes:

The Pontiac division of General Motors received a complaint in the form of a letter. The letter was from a frustrated customer. He had been trying to contact the company for a while.

He wrote in the letter that he and his family were in the habit of buying ice cream after dinner. The flavor they purchased depended on the mood of the family. He had recently purchased a new Pontiac, and he had been having trouble on his ice cream trips. He had figured out that the new car was allergic to vanilla ice cream.

If he purchased any other flavor, his car would start with no problem. However, if he purchased vanilla ice cream, his car would not start.

“What is there about a Pontiac that makes it not start when I get vanilla ice cream, and easy to start whenever I get any other kind?”, he asked in the letter.

The letter was delivered to the Pontiac president, who was very amused by it. He sent an engineer to investigate the fantastic problem. The engineer went with the family for three nights to get ice cream in the new car. The first night, the family got chocolate ice cream, and the car started with no problem. The second night, they got strawberry. The car again was fine. On the third night, the family got vanilla ice cream and, lo and behold, the car would not start.

This was repeated on multiple days, and the results were always the same. The engineer was a logical man, and this stumped him. He took notes on everything. The only variable that stood out was time. The family always took the shortest amount of time when they purchased vanilla ice cream. This was because of the store layout. The vanilla ice cream was quite popular and was kept at the front of the store. Suddenly, the engineer understood why the isolated incident happened. “Vapor lock”, he exclaimed. For all the other flavors, the longer stop allowed the engine to cool down enough to start without any issues. When vanilla ice cream was bought, the engine was still too hot for the vapor lock to dissipate.

Always keep on learning…

In case you missed it, my last post was Kintsukuroi and Kaizen.
