Confessions of a Quality Manager  

Being the adventures of four jet-setting quality consultants who like to talk shop even more than they like good food and drink.

This is fantasy consulting. For the real thing, go to Fell Services' Quality pages.

People don't normally die because of poorly designed graphs. Nor do they normally die because the US Consumer Product Safety Commission banned the use of asbestos in certain paint products. Deaths do sometimes happen because the media and pressure groups run off in unusual directions (normally ones which draw attention to themselves and sell newspapers), but no one would have imagined that a decision to stop manufacturing "innocuous" products like hair dryers could have any dramatic effect at all.

But people did die, in 1986, aboard the space shuttle Challenger.

Anyone who is 20 or over will remember the pictures. It was 11:38 a.m. on January 28. The shuttle flight had been postponed five times (three times because of the weather, once because of difficulties with a closing fixture and once because of a failure in the launch processing system). That day was the last one on which the launch could extract maximum PR value: President Reagan's State of the Union address was scheduled for that evening, the "Teacher in Space" material was ready to be rolled out to all US schools, and Halley's Comet, which would provide experimental source material, was rapidly receding and would soon be too far away to be of any use at all.

The crew could almost have been designed to fit every positive role model: apart from the teacher, there was the second American woman in space, the second African-American in space, the fun-loving friend and caring husband, the Hawaiian, the sailor and the experienced officer who had flown more than 45 different types of aircraft.

So the eyes of the world were on this flight of the only reusable spacecraft, and the launch was televised live. Not only was there a civilian on board, but the mission objectives included observing the tail of Halley's Comet, deploying a Tracking and Data Relay Satellite and flying the Shuttle Pointed Autonomous Research Tool for Astronomy (SPARTAN-203, for anyone who is into acronyms).

It was a freezing morning: the air temperature was 31 degrees Fahrenheit. The launch looked perfect, although a strong puff of smoke just after lift-off was rather worrying. Then Challenger encountered the first of several high-altitude wind shear conditions, but these were immediately sensed and countered by the on-board guidance, navigation and control systems.

A very small flame appeared on the right Solid Rocket Booster 58.78 seconds into the flight; it was detected only on image-enhanced film. At 64.66 seconds the flame grew and changed both shape and colour, which showed that it was now mixing with hydrogen leaking from the external tank.

Enough of the second-by-second commentary. Challenger exploded at 73.13 seconds, travelling at Mach 1.92 at an altitude of 46,000 feet. The last recorded transmission was at 73.62 seconds. Debris from Challenger then fell into the ocean while the two Solid Rocket Boosters flew off in different directions. Parachutes from the boosters were seen floating down and were initially thought to be escaping astronauts. However, Challenger had no escape system, as it was considered to be a safe craft. (It was later found that a few of the crew members had activated their emergency air and locator devices but, really, I don't think they had much time to realise what was happening.)

It wasn't exactly the image NASA wanted to project, and a commission of inquiry was immediately set up to establish the cause of the accident. It was headed by former Secretary of State William P. Rogers and included a team from NASA. The physicist Richard Feynman (my hero) was invited to take part. The problem with Feynman was that he was under the impression that he was there to discover the truth about the accident. He had an intense curiosity, almost to the point of obsession, and he was truly independent: too academically powerful and too internationally recognised for his findings to be disregarded or sidelined. Feynman looked for a technological reason for the failure. He argued that the rubber O rings used to seal the Solid Rocket Booster joints failed to expand quickly enough to seal the joint when the temperature was at or below 32 degrees Fahrenheit.

Of course, it's not enough to say glibly that the O rings were to blame. You have to go into more detail and describe the Tang and Clevis joints. I'm not going to do that, because there are many other references which show the Solid Rocket Boosters more clearly than I can ... and when you say things like "the Tang joint was connected to the Clevis joint and the Clevis joint was connected to the ..." it sounds like the beginning of a bad joke, and there have been more than enough bad jokes about Challenger already. It's also a gross simplification, because the straight-sided Tang (at the bottom of one booster segment) slides down inside the U-shaped Clevis (at the top of the segment below) ... I recommend the excellent Space Shuttle Challenger web page designed and written by Davinder S Mahal.

OK, back to technicalities. The O rings seal a necessary gap between the inside of the Tang and the Clevis. At ignition, the pressure inside the booster flexes the joint and widens this gap, so the O ring has to move quickly to seal the opening (the "delta gap"). The main technological cause of the explosion was the failure of the seal in the aft joint of the right Solid Rocket Booster.

This is where the statistics come in (and about time, too!). On launch day, the external temperature was 31 degrees Fahrenheit, and the temperature of the right Solid Rocket Booster was 28 degrees Fahrenheit, plus or minus 5 degrees. Now, O rings do not seal properly at low temperatures: they become stiff and respond more slowly than they should. This problem was well documented, but when Morton Thiokol, the manufacturer, passed the information on to NASA, NASA somehow heard what it wanted to hear. I'm sympathetic: it's easy to be swamped by statistics. But NASA management made two basic mistakes:

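  • As far as statistics are concerned, the NASA attitude was that, because there was no data showing that the O rings would fail at low temperatures, it was acceptable to launch, even though there was equally no data showing that they would work. The burden of proof had been turned upside down: the engineers were expected to prove the launch unsafe, rather than NASA proving it safe.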

  • NASA management specifically rejected the Morton Thiokol advice and excluded the Morton Thiokol engineers from the final launch decision. The kindest interpretation is that the NASA managers did not understand the technical arguments the Morton Thiokol engineers were making.

So, as far as NASA was concerned, there was no correlation between low temperature and the O ring failure rate. And that was true, for the data they used. But they didn't use all the data: some commentators suggest they used data from only two launches which, out of the 24 flights before Challenger, means disregarding about 92% of it. NASA's attitude seemed to be that every flight on which nothing went wrong meant the risk was going down and, anyway, the O ring problem was an acceptable flight risk. (NASA just couldn't afford a costly redesign and the bad PR of yet more delays.) If NASA had used all the data, they would have seen a negative correlation between launch temperature and O ring damage: the colder the launch, the more damage. (In passing, this is what started me looking at the Challenger disaster in the first place.)
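To see how throwing away the trouble-free flights can hide a correlation, here is a minimal sketch in Python. The flight numbers in it are invented for illustration (they are NOT the real shuttle records), and `pearson` is my own hypothetical helper, written out longhand so it runs on any Python 3. It computes the correlation between launch temperature and O ring incidents twice: once using only the flights where something went wrong, and once using every flight.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: (launch temperature in degrees F, number of O ring
# incidents). Invented for illustration only; not the real flight records.
flights = [
    (53, 2), (57, 1), (63, 1), (70, 1), (75, 2),           # flights with incidents
    (66, 0), (68, 0), (72, 0), (76, 0), (79, 0), (81, 0),  # trouble-free flights
]

# Analysis 1: look only at the flights where something went wrong.
troubled = [(t, k) for t, k in flights if k > 0]
r_troubled = pearson([t for t, _ in troubled], [k for _, k in troubled])

# Analysis 2: use every flight, including the trouble-free ones.
r_all = pearson([t for t, _ in flights], [k for _, k in flights])

print(f"incident flights only: r = {r_troubled:+.2f}")
print(f"all flights:           r = {r_all:+.2f}")
```

With these made-up numbers, the incident-only analysis gives a correlation close to zero, while the full data set gives a clearly negative one (around -0.5). Incidents happened at both cool and warm temperatures, so the troubled flights on their own look random; it is the trouble-free flights, all of them warm, that reveal the pattern.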

A space shuttle is an enormously complicated beast. Getting back to the Solid Rocket Boosters, the Tang and Clevis joints were filled with putty to stop the ferociously hot gases passing through the booster joint and burning the O rings. This has the feel of an obsolescence problem ... basically, the putty used for the first nine (successful) shuttle missions was manufactured by Fuller O'Brien and (gasp of horror) contained asbestos. Fuller O'Brien stopped making it after the 1977 Consumer Product Safety Commission ban. Bluntly, they were scared of possible litigation and weren't keen on the bad reputation that was gradually enveloping every asbestos-bearing product in the States.

The replacement putty selected by NASA designers was manufactured by Randolph Products; NASA engineers had to use some form of putty, as they had run out of the Fuller O'Brien stuff. The result: the two Titan rockets it was used on exploded. (As a side note, this putty also contained asbestos.) More seriously, this putty did not provide an adequate thermal barrier every time. To put it in more tangible terms, the Fuller O'Brien putty has been compared to the La Brea tar pits: once you get it on your hands, you'll have immense difficulty getting it off, because it is tenacious and very sticky, even at low temperatures. The Randolph putty isn't. It's stiff to the touch, and at low temperatures it is hard and does not cling. The use of the Randolph putty was certainly a contributory factor in the Challenger explosion.

NASA estimated in 1977 that the shuttle failure rate would be 1 in 10,000 flights; Feynman estimated in 1986 that it was more like 1 in 100. It's possible that NASA was deliberately optimistic in its operational estimates; in addition, the proposed commercialisation of space was a top government priority. In the 1980s, though, NASA was facing a shrinking budget and was desperate for a win. This may explain, but should not excuse, its management attitude.
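To get a feel for what the gap between those two estimates means in practice, here is a small sketch. It assumes, purely for simplicity, that every flight fails independently with the same probability, and it uses a hypothetical programme of 100 flights; the function name `prob_at_least_one_loss` is my own invention.

```python
def prob_at_least_one_loss(p_loss, n_flights):
    """Probability of losing at least one vehicle in n_flights flights,
    ASSUMING each flight fails independently with probability p_loss."""
    # The chance of every flight succeeding is (1 - p_loss) ** n_flights,
    # so at least one loss is the complement of that.
    return 1 - (1 - p_loss) ** n_flights


# Hypothetical 100-flight programme under each of the two estimates.
for label, p in [("1 in 10,000", 1 / 10_000), ("1 in 100", 1 / 100)]:
    risk = prob_at_least_one_loss(p, 100)
    print(f"{label}: chance of at least one loss in 100 flights = {risk:.1%}")
```

At 1 in 10,000 per flight, the chance of losing at least one vehicle in 100 flights is roughly 1%; at 1 in 100 it is roughly 63%. That is the difference between a remote possibility and something more likely than not.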

The scenario, I guess, was as inevitable as a Greek tragedy. The Morton Thiokol engineers took an open-discussion approach, in which the managers gathered all the relevant information before coming to an informed decision, which was: don't launch. Larry Mulloy, manager of the Solid Rocket Booster project at the Marshall Space Flight Center, had a closed management style. That is, he stated his opinion at the outset, did not encourage participation or divergent opinions, and did not emphasise the importance of a wise decision. His attitude was: launch. When Morton Thiokol engineers suggested in a conference call that the launch shouldn't go ahead, his reaction was strong, aggressive and unequivocal: "My God, Thiokol, when do you want us to launch? Next April?"

Mulloy carried the day. He insisted on the launch. He should be carrying the ghosts of the seven crew members on his shoulders for the rest of his life.

However ... since the accident, NASA has pushed through many improvements. Astronauts now have individual parachutes and their own oxygen supply. Hatches can be opened from the inside. The crew can bail out in less than a minute. So everything is fine if a similar accident happens. My questions would be:

  • What is the probability of such a set of circumstances happening again?

  • Has NASA learned anything in the intervening years about good leadership and communication?

A simple rule of "don't launch in cold weather unless the design has changed" would go a long way towards stopping this particular accident from happening again. I don't know about other types of accident, though. I have no chance of becoming an astronaut but, somehow, it has dropped off my list of favourite careers.

  posted by Dovya R @ 6:30 PM

Friday, March 08, 2002  