Error-Capturing Strategies and System Design
Building error tolerance, error mitigation, and error-capturing strategies into our systems should be one of the most important outputs of safety management. These strategies are a hallmark of good system design. When system design flaws are discovered, we implement risk controls. Monitoring the affected process then tells us whether those controls are effective. If they are not, the controls we previously implemented must be corrected to keep the hazard in check.
Way back in the ’80s, I was serving as Director of Operations (DO) for Salair Air Cargo, which operated a fleet of 10 old DC-3s flown by 30 young pilots. We landed 4 new contracts with Emery Worldwide, and suddenly saw a rash of events in which our DC-3 crews failed to remove the gear pins before flight.
Why did this trend appear after thousands of hours of operation? What had changed in our operating environment? Back then we didn’t investigate the contributing and causal factors of minor events (we should have), and we didn’t do any kind of risk assessment. Although each of these events resulted in nothing more than a return for landing and a MISR under FAR 135.417, we recognized the hazard and its associated risk. Clearly, the undesired aircraft state of a takeoff with the gear pins installed could become a contributing factor in an accident: with the gear down, the DC-3 won’t come close to meeting its single-engine climb performance requirements.
So we published a memo. We put bright new red flags on the gear pins. We added ‘Gear Pins… Aboard’ to the after-start checklist (it was already on the pre-start checklist). We made the PIC explicitly accountable, and we changed the procedure so the pins were hung inside the cargo door, where both crewmembers could see them. Today we would call a multi-faceted corrective action plan like this ‘defenses in depth’, but our plan was ineffective.
Sometimes ‘system design’ can actually induce errors. Since 1992, there have been more than 40 events involving the loss of Airbus A320 fan cowl doors (see Airbus A320 newsletter, page 10). These losses occurred in part because the cowlings appear closed even when they are unlatched, and because the latches are hard to see given the engine’s low ground clearance. Poor system design? Let’s just say it could be improved upon.
A320 operators have been managing this hazard much as we did with our gear pins, using warning decals and procedural controls (such as alert bulletins, training, visual inspections, and even a logbook entry each time the cowls are opened). But procedural controls are often unreliable, because they depend on people to carry them out.
Airbus now has an improved design: a modified fan cowl latch at the forward position that is operated with a key, with a “remove before flight” flag attached to the key (two flagged keys per aircraft). The key is required to open the new latch, and it cannot be removed until the latch is closed and secured. Airworthiness Directives from EASA and the FAA are expected to follow. This error-capturing strategy doesn’t appear to introduce any substitute risk, but someone will have to keep track of the keys, and I’ll bet they will be misplaced more than once. Operators will probably end up keeping spare keys, preferably on the aircraft.
Looking back 30 years, it wasn’t practical from a cost-benefit perspective to modify our DC-3s to solve our similar gear pin problem; all we had to work with were procedural controls. So we changed the procedure again: the gear pins were now stowed in a cockpit receptacle visible to both pilots at their duty stations, and ‘Gear Pins… Aboard’ went on the taxi checklist, with visual confirmation required by both pilots.
Did we prevent our crews from being distracted during their preflight inspection and forgetting the pins? No! But from then on, they caught the error before takeoff on more than one occasion. This is a great example of an error-capturing strategy at work.