I’ve been enjoying the ongoing MIT STAMP workshop. In particular, I’ve been enjoying listening to Nancy Leveson talk about system safety. Leveson is a giant in the safety research community (and, incidentally, an author of my favorite software engineering study). She’s also a critic of root cause and human error as explanations for accidents. Despite this, she has a different perspective on safety than many in the resilience engineering community. To sharpen my thinking, I’m going to capture my understanding of this difference in this post below.
From Leveson’s perspective, the engineering design should ensure that the system is safe. More specifically, the design should contain controls that eliminate or mitigate hazards. In this view, accidents are invariably attributable to design errors: a hazard in the system was not effectively controlled in the design.
By contrast, many in the resilience engineering community claim that design alone cannot ensure that the system is safe. The idea here is that the system design will always be incomplete, and the human operators must adapt their local work to make up for the gaps in the designed system. These adaptations usually contribute to safety, and sometimes contribute to incidents, and in post-incident investigations we often only notice the latter case.
These perspectives are quite different. Leveson believes that depending on human adaptation in the system is itself dangerous. If we’re depending on human adaptation to achieve system safety, then the design engineers have not done their jobs properly in controlling hazards. The resilience engineering folks believe that depending on human adaptation is inevitable, because of the messy nature of complex systems.