Whenever I interview someone after an incident, a question I try to always ask is “have you ever seen a failure mode like this before?“
If the engineer says, “yes”, then I will ask follow-up questions about what happened the last time they encountered something similar, and how long ago that happened. Experienced engineers’ perceptions are shaped by…well…their experiences, and learning about how they encountered a similar issue previously helps me understand how they reacted this time (e.g., why they looked in a log file for a particular error message, or why the reached out to a specific individual over Slack).
If the engineer says “no”, that tells me that the engineer was facing a novel failure mode. This is also a useful bit of context, because I want to learn how expert engineers deal with situations they haven’t previously encountered. How do they try to make sense of these signals they don’t recognize? Where do they look to gather more information? Who do they reach out to?
This is the sort of information that people are happy to share with you, but you have to ask them for it, because they’re unlikely to share it spontaneously unless you ask the right questions, because they don’t realize how relevant it is to understanding the incident.