Just listening to this experience was so powerful. It taught me to challenge the myth of “commercial pressure”. We tend to think that every organizational problem is the result of cost-cutting. Yet, the cost of a new drain pump was only $90… [that’s] nothing when you are running a ship. As it turned out, the purchase order had landed in the wrong department.
Nippin Anand, Deep listening — a personal journey
Whenever we read about a public incident, a common pattern in the reporting is that the organization under-invested in some area, and that can explain why the incident happened. “If only the execs hadn’t been so greedy”, we think, “if they had actually invested some additional resources, this wouldn’t have happened!”
What was interesting as well around this time is that when the chemical safety board looked at this in depth, they found a bunch of BP organizational change. There were a whole series of reorganizations of this facility over the past few years that had basically disrupted the understandings of who was responsible for safety and what was that responsibility. And instead of safety being some sort of function here, it became abstracted away into the organization somewhere. A lot of these conditions, this chained-closed outlet here, the failure of the sensors, problems with the operators and so on… all later on seemed to be somehow brought on by the rapid rate of organizational change. And that safety had somehow been lost in the process.
Richard Cook, Process tracing, Texas City BP explosion, 2005, Lectures on the study of cognitive work
Production pressure is an ever-present risk. This is what David Woods calls faster/better/cheaper pressure, a nod to NASA policy. In fact, if you follow the link to the Richard Cook lecture, right after talking about the BP reorgs, he discusses the role of production pressure in the Texas City BP explosion. However, production pressure is never the whole story.
Remember the Equifax breach that happened a few years ago? Here’s a brief timeline I extracted from the report:
- Day 0: (3/7/17) Apache Struts vulnerability CVE-2017-5638 is publicly announced
- Day 1: (3/8/17) US-CERT sends an alert to Equifax about the vulnerability
- Day 2: (3/9/17) Equifax’s Global Threat and Vulnerability Management (GTVM) team posts to an internal mailing list about the vulnerability and requests that app owners patch within 48 hours
- Day 37: (4/13/17) Attackers exploit the vulnerability in the ACIS app
The Equifax vulnerability management team sent out a notification about the Struts vulnerability a day after they received notice about it. But, as in the two cases above, something got lost in the system. What I wondered reading the report was: How did that notification get lost? Were the engineers who operate the ACIS app not on that mailing list? Did they receive the email, but something kept them from acting on it? Perhaps there was nobody responsible for security patches of the app at the time the notification went out? Maddeningly, the report doesn’t say. After reading that report, I still feel like I don’t understand what it was about the system that enabled the notification to get lost.
It’s so easy to explain an incident by describing how management could have prevented it by investing additional resources. This is what Nippin Anand calls the myth of commercial pressure. It’s all too satisfying for us to identify short-sighted management decisions as the reason that an incident happened.
I’m calling this tendency the greedy exec trap, because once we identify the cause of an incident as greedy executives, we stop asking questions. We already know why the incident happened, so what more do we need to look into? What else is there to learn, really?