I’ve been trying to take a break from Twitter lately, but today I popped back in, only to be trolled by a colleague of mine:
Here’s a quote from the story:
The source of the problem was reportedly a single engineer who made a small mistake with a file transfer.
Here’s what I’d like you to ponder, dear reader. Think about all of the small mistakes that happen every single day, at every workplace, on the face of the planet. If a small mistake were sufficient to take down a complex system, then our systems would be crashing all of the time. And yet, that clearly isn’t the case. For example, before this event, when was the last time the FAA suffered a catastrophic outage?
Now, it might be the case that no FAA employees have ever made a small mistake until now. Or, more likely, the FAA system works in such a way that small mistakes are not typically sufficient for taking down the entire system.
To understand this failure mode, you need to understand how it is that the FAA system is able to stay up and running on a day-to-day basis, despite the system being populated by fallible human beings who are capable of making small mistakes. You need to understand how the system actually works in order to make sense of how a large-scale failure can happen.
Now, I’ve never worked in the aviation industry, and consequently I don’t have domain knowledge about the FAA system. But I can tell you one thing: a small mistake with a file transfer is a hopelessly incomplete explanation for how the FAA system actually failed.
2 thoughts on “A small mistake does not a complex systems failure make”
Looks like “make” in the title of the post is in the wrong place.
Other than that, it reminds me of what a senior engineer constantly told me.
“You’d be amazed how much things work when they are just left alone.”
I’d be more interested in WHY an engineer is transferring files in the first place. That is where the problem lies.