The perils of outcome-based analysis

Imagine you wanted to understand how to get better at playing the lottery. You strike upon a research approach: study previous lottery winners! You collect a list of winners, look them up, interview them about how they go about choosing their numbers, collate this data, identify patterns, and use these to define strategies for picking numbers.

The problem with this approach is that it doesn’t tell you anything about how effective these strategies actually are. To really know how well these strategies work, you’d have to look at the entire population of people who employed them. For example, say that you find that most lottery winners use their birthdays to generate winning numbers. It may turn out, that for every winning ticket that has the ticket holder’s birthday, there are 20 million losing tickets that also have the ticket holder’s birthday. To understand a strategy’s effectiveness, you can’t just look at the winning outcomes: you have to look at the losing outcomes as well. The technical term for this type of analytic error is selecting on the dependent variable.

Here’s another example of this error in reasoning: according to the NHTSA, 32% of all traffic crash fatalities in the United States involve drunk drivers. That means that 68% of all traffic crash fatalities involve sober drivers. If you only look at scenarios that involve crash fatalities, it looks like being sober is twice as dangerous as being drunk! It’s a case of only looking at the dependent variable: crash fatalities. If we were to look at all driving scenarios, we’d see that there are a lot more sober drivers than drunk drivers, and that any given sober driver is less likely to get into a crash fatality than a given drunk driver. Being sober is safer, even though sober drivers appear more often in fatal accidents than drunk drivers.

Now, imagine an organization that holds a weekly lottery. But it’s a bizarro-world type of lottery: if someone wins, then they receive a bad outcome instead of a good one. And the bad outcome doesn’t just impact the “winner” (although they are impacted the most), it has negative consequences for the entire organization. Nobody would willingly participate in such a lottery, but everyone in the organization is required to: you can’t opt out. Every week, you have to buy a ticket, and hope the numbers you picked don’t come up.

The organization wants to avoid these negative outcomes, and so they try to identify patterns in how previous lottery “winners” picked their numbers, so that they can reduce the likelihood of future lottery wins by warning people against using these dangerous number-picking strategies.

At this point, the comparison to how we treat incidents should be obvious. If we only examine people’s actions in the wake of an incident, and not when things go well, then we fall into the trap of selecting on the dependent variable.

The real-world case is even worse than the lottery case: lotteries really are random, but that way that people do their work isn’t; rather, it’s adaptive. People do work in specific ways because they have found that it’s an effective way to get stuff done given that the constraints that they are under. The only way to really understand why people work the way they do is to understand how those adaptations usually succeed. Unless you’re really looking for it, you aren’t going to be able to learn how people develop successful adaptations if you only ever examine the adaptations when they fail. Otherwise, you’re just doing the moral equivalent of asking what lottery winners have in common.

One thought on “The perils of outcome-based analysis

Leave a comment