From Rasmussen to Moylan

I hadn’t heard of James Moylan until I read a story about him in the Wall Street Journal after he passed away in December, but it turns out my gaze had fallen on one his designs almost every day of my adult life. Moylan was the designer at Ford who came up with the idea of putting an arrow next to the gas tank symbol to indicate which side of the car the tank was on. It’s called the Moylan Arrow in his honor.

The Moylan Arrow put me in mind of another person we lost in 2025, the safety researcher James Reason. If you’ve heard of James Reason, it’s probably because of the Swiss cheese model of accidents that Reason proposed. But Reason made other conceptual contributions to the field of safety, such as organizational accidents and resident pathogens. The contribution that inspired this post was his model of human error described in his book Human Error. The model is technically called the Generic Error-Modeling System (GEMS), but I don’t know if anybody actually refers to it by that name. And the reason GEMS came to mind was because Reason’s model was itself built on top of another researcher’s model of human performance, Jens Rasmussen’s Skills, Rules and Knowledge (SRK) model.

Rasmussen was trying to model how skilled operators perform tasks, how they process information in order to do so, and how user interfaces like control panels could better support their work. He worked at a Danish research lab focused on atomic energy, and his previous work included designing a control room for a Danish nuclear reactor, as well as studying how technicians debugged problems in electronics circuits.

The part of the SRK model that I want to talk about here is the information processing aspect. Rasmussen draws a distinction between three different types of information, which he labels signals, signs, and symbols.

The signal is the most obvious type of visual information to absorb, where there is minimal interpretation required to make sense of the signal. Consider the example of the height of mercury in a thermometer to observe the temperature. There’s a direct mapping between the visual representation of the sensor and the underlying phenomenon in the environment – a higher level of mercury means a hotter temperature.

A sign requires some background knowledge in order to interpret the visual information, but once you have internalized that information, you will be able to very quickly to interpret its meaning sign. Traffic lights are one such example: there’s no direct physical relationship between a red-colored light and the notion of “stop”, it’s an indirect association, mediated by cultural knowledge.

A symbol requires more active cognitive work in order to make sense of. To take an example from my own domain, reading the error logs emitted by a service would be an example of a task that involves visual information processing of symbols. Interpreting log error messages are much more laborious than, say, interpreting a spike in an error rate graph.

(Note: I can’t remember exactly where I got the thermometer and traffic light examples from, but I suspect it was from A Meaning Processing Approach to Cognition by John Flach and Fred Voorhost).

From his paper, Rasmussen describes signals as representing continuous variables. That being said, I propose the Moylan arrow as a great example of a signal, even though the arrow does not represent a continuous variable. Moylan’s arrow doesn’t require background knowledge to learn how to interpret it, because there’s a direct mapping between the direction the triangle is pointing and the location of the gas tank.

Rasmussen maps these three types of information processing to three types of behavior (signals relate to skill-based behavior, signs relate to rule-based behavior, and symbols relate to knowledge-based behavior). James Reason created an error taxonomy based on these different behaviors. In Reason’s terminology, slips and lapses happen at the skill-based level, rule-based mistakes happen at the rule-based level, and knowledge-based mistakes happen at the knowledge-based level.

Rasmussen’s original SRK paper is a classic of the field. Even though it’s forty years old, because the focus is on human performance and information processing, I think it’s even more relevant today than when it was originally published: thanks to open source tools like Grafana and the various observability vendors out there, there are orders of magnitude more operator dashboards being designed today than there were back in the 1980s. While we’ve gotten much better at being able to create dashboards, I don’t think my field has advanced much at being able to create effective dashboards.

When there’s no gemba to go to

I’m finally trying to read through some Toyota-related books to get a better understanding of the lean movement. Not too long ago, I read Sheigo Shingo’s Non-Stock Production: The Shingo System of Continuous Improvement, and sitting on my bookshelf for a future read is James Womack, Daniel Jones, and Daniels Roos’s The Machine That Changed the World: The Story of Lean Production.

The Toyota-themed book I’m currently reading is Mike Rother’s Toyota Kata: Managing People for Improvement, Adaptiveness and Superior Results. Rother often uses the phrase “go and see”, as in “go to the shop floor and observe how the work is actually being done”. I’ve often heard lean advocates use a similar phrase: go the gemba, although Rother himself doesn’t use it in his book. There’s a good overview at the Lean Enterprise Institute’s web page for gemba:

Gemba (現場) is the Japanese term for “actual place,” often used for the shop floor or any place where value-creating work actually occurs. It is also spelled genba. Lean Thinkers use it to mean the place where value is created. Japanese companies often supplement gemba with the related term “genchi gembutsu” — essentially “go and see” — to stress the importance of empiricism.

The idea of focusing on understanding work-as-done is a good one. Unfortunately, in software development in particular, and knowledge work in general, the place that the work gets done is distributed: it happens wherever the employees are sitting in front of their computers. There’s no single place, no shop floor, no gemba that you can go to in order to go and see the work being done.

Now, you can observe the effects of the work, whether it’s artifacts generated (pull requests, docs), or communication (slack messages, emails). And you can talk to people about the work that they do. But, it’s not like going to the shop floor. There is no shop floor.

And it’s precisely because we can’t go to the gemba that incident analysis can bring so much value, because it allows you to essentially conduct a miniature research project to try to achieve the same goal. You get granted some time (a scarce resource!) to reconstruct what happened, by talking to people and looking at those work products generated over time. If we’re good at this, and we’re lucky, we can get a window into how the real work happens.

The value of social science research

One of the benefits of basic scientific research is the potential for bringing about future breakthroughs. Fundamental research in the physical and biological sciences might one day lead to things like new sources of power, radically better construction materials, remarkable new medical treatments. Social scientific research holds no such promise. Work done in the social sciences is never going to yield, say, a new vaccine, or something akin to a transistor.

The statistician (and social science researcher) Andrew Gelman goes so far as to say, literally, that the social sciences are useless. But I’m being unfair to Gelman by quoting him selectively. He actually advocates for social science, not against it. Gelman argues that good social science is important, because we can’t actually avoid social science, and if we don’t have good social science research then we will be governed by bad social “science”. The full title of his blog post is The social sciences are useless. So why do we study them? Here’s a good reason.

We study the social sciences because they help us understand the social world and because, whatever we do, people will engage in social-science reasoning.
Andrew Gelman

I was reminded of this at the recent Learning from Incidents in Software conference when listening to a talk by Dr. Ivan Pupulidy titled Moving Gracefully from Compliance to Learning, the Beginning of Forest Service’s Learning Journey. Pupulidy is a safety researcher who worked at the U.S. Forest Service and is now a professor at University of Alabama, Birmingham.

In particular, it was this slide from Pupulidy’s talk that struck me right between the eyes.

Thanks to work done by Ivan Pupulidy, the Forest Service doesn’t look at incidents this way anymore

The text in yellow captures what you might call the “old view” of safety, where accidents are caused by people not following the rules properly. It is a great example of what I would call bad social science, which invariably leads to bad outcomes.

The dangers of bad social science have been known for a while. Here’s the economist John Maynard Keynes:

The ideas of economists and political philosophers, both when they are right and when they are wrong, are more powerful than is commonly understood. Indeed the world is ruled by little else. Practical men, who believe themselves to be quite exempt from any intellectual influence, are usually the slaves of some defunct economist.
John Maynard Keynes, The General Theory of Employment, Interest and Money (February 1936)

Earlier and more succinctly, here’s the journalist H.L. Mencken:

Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
H.L. Mencken, “The Divine Afflatus” in New York Evening Mail (16 November 1917)

As Mencken notes, we can’t get away from explanations. But if we do good social science research, we can get better explanations. These explanations often won’t be neat, and may not even be plausible on first glance. But they at least give us a fighting chance at not being (completely) wrong about the social world.

Plus c’est la même chose, plus ça change

I’m re-reading a David Woods’s paper titled the theory of graceful extensibility: basic rules that govern adaptive systems. The paper proposes a theory to explain how certain types of systems are able to adapt over and over again to changes in their environment. He calls this phenomenon sustained adaptability, which we contrasts with systems that can initially adapt to an environment but later collapse when some feature of the environment changes and they fail to adapt to the new change.

Woods outlines six requirements that any explanatory theory of sustained adaptability must have. Here’s the fourth one (emphasis in the original):

Fourth, a candidate theory needs to provide a positive means for a unit at any scale to adjust how it adapts in the pursuit of improved fitness (how it is well matched to its environment), as changes and challenges continue apace. And this capability must be centered on the limits and perspective of that unit at that scale.

The phrase “adjust how it adapts” really struck me. Since adaptation is a type of change, this is referring to a second-order change process: these adaptive units have the ability to change the mechanism by which they change themselves! This notion reminded me of Chris Argyris’s idea of double-loop learning.

Woods’s goal is to determine what properties a system must have, what type of architecture it needs, in order to achieve this second-order change process. He outlines in the paper that any such system must be a layered architecture of units that can adapt themselves and coordinate with each other, which he calls a tangled, layered network.

Woods believes there are properties that are fundamental to systems that exhibit sustained adaptability, which implies that these fundamental properties don’t change! A tangled, layered network may reconfigure itself in all sorts of different ways over time, but it must still be tangled and layered (and maintain other properties as well).

The more such systems stay the same, the more they change.

The disaster meeting

This post is mostly an excerpt from the book Designing Engineers by Louis Bucciarelli. This book describes Bucciarelli’s observational study of engineers doing design work at three different engineering companies.

At some point, I’ll write a proper review of the book, but I wanted to highlight a specific passage, a meeting among engineers working to solve a specific problem.

The engineers attending this meeting work at a company that sells photograph processing machines. The company is planning on releasing a new product (“Atlas”) in a few months, but there’s a problem with the design: a phenomenon that they call dropout. Dropout happens when parts of the image that are barely visible end up not getting printed on the paper. The problem can be hard to notice unless someone looks very closely at the photo, but it’s enough of an issue that they are putting in resources to solving it.

This meeting is being led by Sergio, the engineer leading the effort to solve the dropout problem. Before this meeting, he identified fourteen potential solutions to the dropout problems. He’s called this engineering meeting in order to apply a structured decision-making process (the Pugh Method) to help him narrow down this list to the most promising-sounding solutions.

The meeting does not go as the organizer hoped. The transcript is long-ish, but worth reading in full. You might even find it familiar.

Sergio: OK. Let’s start. You all got this. [He holds up a description of Pugh methodology]. I sent it around last Thursday. It pretty much says what we’re going to try to do, except I’m going to make a few changes. You’ll see as we go along. The basic idea today is that we want to first set up some criteria to judge. Then we compare how the fourteen go, compare them against these criteria. By the end of this morning I’d like to have narrowed things down, not to one option, but to three, say, something we can get going on. Yeah, Harold.

Harold: It says in this method that we ought to pick a baseline option to compare against. How are we going to do that? It seems to me any one of the fourteen would be as good or bad, for that matter, as any of the others.

Sergio: I thought about that, and here is what I propose. Let’s pick the option we know best, OK? Say the QWP. We know how that works, and other than that it probably won’t fit in the space we have to play with, it still can be our reference. But first we have to set up some criteria. So, let me get this chart around here.

Hans: Obviously we need a criterion, something like “Gets the job done” or “Eliminates dropout.”

Sergio: Yeah, that’s got to be one. The thing has got to work, to solve the problem. How did you state it?

Marco: What do we mean when we go and claim that, say, the QWEP eliminates the dropout? I mean, all of those up there have a chance of doing the job.

Sergio: I know. But we score, not with numbers but say three, four marks—better than the baseline, say the QWP. This is where the baseline comes in. Second would be neutral—no better, no worse than the QWP—and third would be negative; that is, we think it won’t be as good as what we know works now.

Marco: Yeah, but some of these options I think might work as good, even better on some papers but probably won’t work at all on others. How do you grade it then?

Sergio: What do you mean? Give me a more specific example.

Marco: I mean like with the air knife. It might work with Z-weight paper, but with the heavier M-weight I don’t think it will work.

Hans: Why not make that another criterion: “Works with all papers.“

Sergio: Or “Sensitivity to paper.” Sort of pull that out from under “Does the job.”

Marco: You mean that there are some options that will do the job, but some of those won’t be able to handle the heavy paper?

Sergio: Yeah, that’s one way to look at it. “Does the job” is our best guess that the thing will work, but we give paper type a separate category. We may want to say something else has to be done to handle the heavy paper; that becomes another problem.

Fritz: How do we know whether paper type is critical for the air knife? It seems to me we don’t really know what the problem is. How can we compare options when we don’t know what is causing the problem?

Marco: Fritz, that’s a good point. Do we really know enough to—

Sergio: We know we have dropout on Atlas. We know that the QWP gives good results. We have a pretty good idea of what consistency it takes to give good print—print that a trained eye can’t find a hole in. (With a magnifying glass, you still see some.)

Fritz: Yes, but we can know, and should know, a lot more before we go judging these proposals on whether or not they will solve the problem. If this place hadn’t cut back on its chemistry research, we might have a chance of knowing what the hell is going on, not just with Atlas but we had it on Mars as well.

Sergio: Look, some things are beyond our control. We have no power over the powers-that-be. We don’t have a chemistry group working on this problem to call up and say “Get over here and help us evaluate these options.” We’ve got to go with what we have. Atlas is due to go out onto the streets in seven months.

Fritz: That’s the way it always goes around her. Someone wants your solutions yesterday.

Sergio: OK. So we have “Eliminates dropout” and “Sensitivity to paper.” What are some others?

Hans: Cost.

Marco: Have you guys thought about some kind of chemical pretreatment… different papers?

Sergio: Cost. Let’s think about that. Is cost really that important? Leonard says he doesn’t see cost as really significant unless it really is some huge sum. But I don’t see how we will ever get to that point. And Atlas—

Harold: Yeah, I don’t see how unit cost can be that great. We’re not going to be able to fool around much inside Atlas at this late date.

Marco: We ought to think about what we can do without going inside.

Hans: On the other hand, if we do convince them that they have to move the paper feed, say, it is going to get costly.

Harold: In terms of engineering change but not in terms of unit costs. We still aren’t going to go in there with some exotic machinery. All those options, except maybe the E&M device, are just bending metal, cams, gears… mechanical stuff, nothing fancy.

George: We might have a problem holding tolerances. Machining can get expensive. We ask too much of my people, even with the mechanical parts.

Sergio: Maybe we make that another category, another criterion: “Engineering change,” “Extent of engineering change.”

Harold: What you really want to say is something like “Compatible with existing product.” Like the QWP we know will work fine. It does in Mars, but we know it will be extremely hard to fit in Atlas, so… Or the E&M that’s going to require a power supply, right?

Fritz: But the QWP is our reference. That’s not a good example. And that’s not a good example. And, for that matter, what good is the criterion if we know the QWP won’t fit? If that’s the case, won’t all the options be scored a plus, all the same?

Sergio: Good point, good point. But I see some that will be just as hard to retrofit—for example, the cam with a solenoid. Solenoids aren’t any miniature electronic device. They’ve got to have room, especially with the forces and reaction times we’re going to be demanding.

Hans: And the air knife requires a plenum, or the E&M—Marco, was it you who said they will need a power supply?

Sergio: Fritz, you have a good point, but let’s put it up there for now. There won’t be maybe any negatives there, but still… OK? How did you say it?

Harold: “Compatible with existing product” or maybe we ought to say “products,” with Leonard in mind.

Sergio: Yeah, got it.

Fritz: That brings up another thing. Who are we making this design for? Leonard out in Colorado and Atlas are not in sync. Atlas is well along, they’re getting into the panic mode now. But Leonard has more time, another year at least, right?

Sergio: I spoke to Leonard yesterday, and even though he has another year past Atlas, he wants to se a solution to what he thinks is his dropout problem well before that. He doesn’t want to go the panic route.

Fritz: But we still have more time with him. And shouldn’t we be thinking about the long term?

Sergio: We can’t afford to do too much of that. I’ve got the higher-ups breathing down my neck to get something going here. That makes me think fo another criterion: How well can we meet a schedule? Let’s say “Ease of schedule.”

George: How about “Pain and suffering”? [Laughter]

Sergio: No, we want to be positive about this.

Marco: Yeah, so we can mark them down. [Laughter]

Fritz: That’s why we chose the QWP as a baseline. He knows that can’t possibly fit here.

Sergio: Come on guys. That’s not true. Let’s get serious. We want to get out of here by lunchtime. Jeez, is it already 10:30?

Hans: I’ve got 10:40.

Sergio: OK. So far we’ve got—

Harold: I think we’re missing a big one. You all know how difficult it is to keep the QWP clean. Anything mechanical you add in there is going to collect sludge. Some of those, like the cam, are going to have. areal problem there with that—keeping clean.

Sergio: Good. That’s another good one. The guys in Service are not going to like it if they get called out every week.

Marco: Does that figure into the cost, the cost of servicing? Do we need a separate category?

Sergio: I think we ought to break that one out, just like we did with the paper. That’s something we are liable not to think of—what it takes to maintain the fix in the field. So let’s add—

Fritz: We don’t even know if it will work.

Sergio: We got some interesting results yesterday with a mock-up. I think it looks promising.

Fritz: But still, it’s got a long way to go. That’s what I mean. We don’t really know. if it will work, and I, at least, can’t make a good judgment even though you may be able to, because I don’t think we understand enough about the problem!

Marco: I’m with Fritz on that. I don’t think. we have enough information about these different options. I’m finding it hard to do. this method, and I think the reason is because we don’t really understand the problem.

Sergio: How much do we need to know? I admit that the E&M is a long shot, that we’ve got to get it going, that it will take a longer time to evaluate than, say, the cam concepts, and we’ve been promised a machine for next week. When we get the hardware, we can do both, evaluate the E&Ms and, in the process, get a firmer grip on what is the problem. But we don’t have all year. Jeez, it’s 11:00. We don’t have all morning either. And besides, this is just an exercise; we are not going to pick a definite option. and go with that. We only want to narrow the field some this morning. Then we give it a hard look again, after we’ve done some work on the three, come back at it and evaluate again. In fact, I can see us running pretty far with, say, two or three options in parallel, as long as they don’t interfere. Maybe that’s another thing to consider.

Hans: Seeing what time it is, maybe we better cut off our criteria here. Serge, I think we better get to ranking.

Sergio: OK, OK. So far we’ve got ‘Does the job,” “Sensitive to paper,” “Cost,” “Compatible with existing hardware,” “Ease of schedule,” “Ease of maintenance.” Anyone think of any more?

George: How about “Ease of production?”

Marco: That’s in cost. I see that as a main factor in cost.

Fritz: Look, I think we have a problem with these criteria. I’m having a hell of a time keeping them straight, trying to fix what they might mean. Are they all to be considered as having the same priority? I still think this exercise is not useful unless we know more about what we have to do, what the problem is.

Marco: I think even then these criteria would get all mixed up. When we say “Do the job” I see costs, sludge all in that, too.

Sergio: We are always going to have that problem. Where we are now, we’ve got to move. All I want is to get us narrowed down.

Fritz: But you yourself think PT’s additional option is worth keeping. I don’t think we’re ready.

Sergio: It’s getting late. We’re not going. to get there today. That’s clear. I’ll tell you what. Can we meet again? [Grunts, groans]

Sergio: No, I promise you. In the meantime, Hans and I will go back and sort out these criteria, try to explain what we see as what they are meant to measure. At least in that way we will start on the same wavelength. I will send you that before we get together. Then we will narrow.

Marco: When? I’ve got to go out to Colorado next week for two days. Can you take that into account?

Fritz: And I’m tied up in the lab the early part of the week.

George: We’ve got a production trial scheduled sometime.

Sergio: Look, I’ll have Cheryl survey, but it might have to go another week. I’ve got to get out and back to Colorado myself sometime next week. OK? Is that it? That’s enough!

(pp. 152–156)

The Pugh technique is an appealing model in principle, but we see problems crop up as Sergio tries to apply it: the engineers work to define criteria, but the categories are slippy. They have different opinions about how to cut up the space into categories, and whether they have enough information to even evaluate these criteria.

Note how well defined the problem seems to be on first glance. It’s a specific problem (dropout) on a system that otherwise has been fully designed. Not only that, but potential solutions have already been identified! Sergio’s goal is just to narrow down the solution space so that they can explore three options instead of fourteen.

Instead of a structured process, we see a much messier interaction, one that ultimately frustrates Sergio, who used the phrase “the disaster meeting” to describe what happened. What we observe, though, is a kind of progress: a group of engineers who have different understandings of the situations trying to establish common ground, building a shared understanding so that they can work together to accomplish this task. Real engineering work is messy.

Engineering research reveals wrongdoing

The New York Times has a story today, Inside VW’s Campaign of Trickery, about how Volkswagon conspired to hide their excessive diesel emissions from California regulators.

What was fascinating to me was that the emission violations were discovered by mechanical engineering researchers at West Virginia University, Dan Carder, Hemanth Kappanna, and Marc Besch (Kappanna and Besch were graduate students at the time).

The presence of high levels of lead in Flint, Michigan drinking water was also discovered by an engineering researcher: Marc Edwards, a civil engineering professor at Virginia Tech.

It’s a reminder that regulators alone aren’t sufficient to ensure safety, and that academic engineering research can have a real impact on society.

Review: Software Developers’ Perceptions of Productivity

I wrote a paper review for It Never Work In Theory.

New paper review

I reviewed a paper at Never Work In Theory.

Paper review: Simple Testing Can Prevent Most Critical Failures

I posted a paper review at “It Will Never work In Theory”.

Good to great

A few months ago I read Good to Great, a book about the factors that led to companies making a transition from being “good” to being “great”. Collins, the author, defines “great” as companies whose stock performed at least three times better than the overall market over at least fifteen years. While the book is ostensibly about a research study, it feels packaged as a set of recommendations for executives looking to turn their good companies into great ones.

The lessons in the book sound reasonable, but here’s the thing: If Collins’s theory is correct, we should be able to identify companies that will outperform the market by a factor of three in fifteen-years time, by surveying employees to see if they meet the seven criteria outlined in the book.

It has now been almost thirteen years since the book has been published. Where are the “Good to Great” funds?