A Failure Analysis by Gabe Zichermann
The two recent Boeing 737 Max crashes are terrible, made only worse by the obvious mismanagement of the company and the FAA. There are plenty of people to blame here, from lax regulators to bad software developers and the greedy executives at both airlines and the airframer.
Underlying this tragedy is an inescapable conclusion: Boeing is bad at failing. Instead of killing bad ideas when it should have, the company ended up killing people. Let’s look at the anatomy of this failure, based on what we know so far.
A Series of Bad Decisions
The 737 Max had problems almost from the beginning. If you read pilot and avgeek enthusiast forums the way I do (which is way too often) you would know that the engineering and design decisions underlying the Max were fraught from the jump.
Many seasoned experts pushed Boeing to develop an entirely new plane, given that the original 737 design was over 40 years old, but the company resisted. There were a few reasons they pushed back on this:
- Airlines wanted the plane now, and they wanted to pay as little as possible.
- Airlines wanted to avoid retraining pilots.
- The regulations were lax and oversight was almost non-existent.
- Their competitors were growing strongly in the narrowbody category.
This all led Boeing to make a bad initial decision: to re-engine and tweak the existing frame instead of redesigning it from scratch. But in order to meet all the other criteria of the brief, they also had to do something extremely novel (and risky):
Write a piece of software to compensate for an aerodynamics issue caused by a fundamental engineering overreach.
The software implicated in the crashes (MCAS) was designed to help pilots fly the 737 Max just like a traditional 737, despite different handling characteristics. It’s like putting a new, slick interface on a 40-year-old computer system.
Normally, aviation automation systems are designed to do three main things: be faster and more situationally aware than a pilot while reducing pilot workload. MCAS, conversely, is like control systems for military aircraft. Many fighter jets are not aerodynamically stable in all conditions. For example, if they need to be highly maneuverable at high speed, they are often unstable at lower speeds. In military aviation, software routinely compensates for this design choice.
In commercial aviation, however, different rules apply. The basic premise has always been that the equipment should be aerodynamically stable, such that a pilot can fly it manually if needed. Designing a plane that cannot be flown safely without software is not a normal idea.
Now, in order to resolve this issue, Boeing is proposing a fix: an indicator light that informs pilots of a condition that could cause the software to malfunction. This has now become a Kafkaesque engineering exercise:
A warning light to solve for
Bad software to solve for
Pilot training to solve for
When seen from a macro level, it’s relatively obvious this strategy has simply been piling one mistake on top of another. But why?
The Failure Muscle
The profit motive alone doesn’t explain these terrible decisions. Boeing is one of the greatest engineering firms in the history of humanity. Their ability to design complex machines – and design them well – is unparalleled. And despite certain tech setbacks in the past, it has been a really long time since the company released a product that was so fundamentally flawed.
I believe their “project kill” process simply isn’t strong enough. Had it been part of Boeing’s culture to admit failure, cancel projects and learn from those mistakes, these (entirely avoidable) tragedies might not have happened. But like most big companies, I suspect the fear of failure and subsequent punishment was greater than their fear of risking the lives of customers.
Of course Boeing has safety and design approval processes in place, but they may not be adapted well to this scenario.
The typical schema over the last 50 years in airframe design is to throw around lots of crazy ideas (see Sonic Cruiser), and then settle on an advanced version of a current idea (see 787). This kind of incrementalism has historically bred a culture that’s very risk-averse and one that front-loads all the risk “assessment”. In front-loaded risk cultures, once a project is underway there is great resistance to killing it. After all, every possible angle was already considered, commitments have been made, and changes would be hugely expensive.
But that is precisely the reason why companies must be better at admitting failure. The longer you ignore, try to bury or attempt to compensate for a poor decision, the bigger and badder it’s going to get. Building an organizational failosophy and the muscles to execute a kill is essential to creating value in the long term. Organizations that understand this dynamic well, such as Alphabet, know that there is never a bad time to kill a project if it’s failing. The sunk cost fallacy isn’t a factor, and they know that decisive action now will save money and create more value in the long run.
Many people inside Boeing are going to come forward with stories of how they tried to warn the company and customers. But these hindsight-based recriminations don’t change a corporate culture that cannot admit failure and shies away from making tough decisions once projects are underway.
Since the grounding, Boeing’s market cap has been cut by over $25 billion. That’s just a bit more than the high-end estimate for designing a new plane from scratch. The obvious question is: was it worth it?
Boeing will recover, and aviation will go back to normal – safer, more reliable and more efficient than ever before. But the blowback of Boeing’s inability to fail well will continue to haunt the company and its customers for a long time to come.
Embracing failure really is the safer option.