The British General’s problem

Half a league, half a league,
Half a league onward,
All in the valley of Death
Rode the six hundred.
“Forward, the Light Brigade!
Charge for the guns!” he said:
Into the valley of Death
Rode the six hundred.

“Forward, the Light Brigade!”
Was there a man dismay’d?
Not tho’ the soldier knew
Some one had blunder’d:
Theirs not to make reply,
Theirs not to reason why,
Theirs but to do and die:
Into the valley of Death
Rode the six hundred.

Cannon to right of them,
Cannon to left of them,
Cannon in front of them
Volley’d and thunder’d;
Storm’d at with shot and shell,
Boldly they rode and well,
Into the jaws of Death,
Into the mouth of Hell
Rode the six hundred.

Flash’d all their sabres bare,
Flash’d as they turn’d in air
Sabring the gunners there,
Charging an army, while
All the world wonder’d:
Plunged in the battery-smoke
Right thro’ the line they broke;
Cossack and Russian
Reel’d from the sabre-stroke
Shatter’d and sunder’d.
Then they rode back, but not
Not the six hundred.

Cannon to right of them,
Cannon to left of them,
Cannon behind them
Volley’d and thunder’d;
Storm’d at with shot and shell,
While horse and hero fell,
They that had fought so well
Came thro’ the jaws of Death,
Back from the mouth of Hell,
All that was left of them,
Left of six hundred.

When can their glory fade?
O the wild charge they made!
All the world wonder’d.
Honor the charge they made!
Honor the Light Brigade,
Noble six hundred!

The Charge of the Light Brigade, Alfred Lord Tennyson

One of the challenges of building distributed systems is that communications channels are unreliable. This challenge is captured in what is known as the Two Generals’ Problem. It goes something like this:

Two generals from the same army are camped at opposite sides of a valley. The enemy troops are stationed in the valley. The generals need to coordinate their attack in order to succeed. Neither general will risk their troops unless they know for certain that the other general will attack at the same time.

They only mechanism the generals have to communicate is by sending messages via carrier pigeon that fly over the valley. The problem is that enemy archers in the valley are sometimes able to shoot down these carrier pigeons, and so a general sending a message has no guarantee that it will be received by the other general.

Two generals at opposing sides of a valley
One general sends a message to another across the valley via carrier pigeon

The first general sends the message: Let’s attack Monday at dawn, and waits for an acknowledgment. If he doesn’t get an acknowledgment, he can’t be certain the message was received, and so he can’t attack. He receives an acknowledgment in return. But now he thinks, “What if the other general doesn’t know I’ve received the acknowledgment? He knows I won’t carry out the attack unless I know for certain that he agrees to the plan, and he doesn’t know whether I’ve received an acknowledgment or not. I’d better send a message acknowledging the message.”

This becomes an infinite regress problem: it turns out that no number of acknowledgments and acknowledgment-of-acknowledgments that can ensure that the two generals have common knowledge (“I know that he knows that I know that he knows…”) when communicating over an unreliable channel.

One implicit assumption of the Two Generals’ Problem is that when a message is received, it is understood perfectly by the recipient. In the world of real systems, understanding the intent of a message is often a challenge. I’m going to call this the British General’s Problem, in honor of General Raglan.

Raglan was a British general during the Crimean War. The messages he sent to Lord Lucan at the front were misinterpreted, and these misinterpretations contributed to a disastrous attack by British Light Calvary against fortified Russian troops, famously memorialized in Alfred Lord Tennyson’s poem at the top of this post.

Stick figure holding a doc and confused by it
Lord Lucan is confused by General Raglan’s message

Tim Harford tells the story of the misunderstanding, and the different personalities involved in the affair, in the wonderful episode of the Cautionary Tales podcast The Curse of Knowledge meets the Valley of Death.

This story is a classic example of a common ground breakdown, a phenomenon described in the famous paper Common Ground and Coordination in Joint Activity by Gary Klein, Paul Feltovich, Jeffrey Bradshaw and David Woods. In the paper, the authors describe how common ground is necessary for people to coordinate effectively to achieve a shared goal. A common ground breakdown happens when a misunderstanding occurs among the people coordinating.

If you do operations work, and you’ve been involved in remediating an incident while communicating over Slack, you’ve experience the challenge of maintaining common ground. Because common ground breakdowns are common, coordination requires ongoing effort to repair this common ground. This is the topic of Laura Maguire’s PhD thesis: Controlling the Costs of Coordination in Large-scale Distributed Software Systems (which, I must admit, I haven’t yet read).

The British General’s problem is a reminder of the challenges we inevitably face in socio-technical systems. The real problem is not unreliable channels: it’s building and maintaining the shared understanding in order to get the coordination work done.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s