Back when I was an engineering student, I wanted to know “How do the big companies develop software? How does it happen in the real world?”
Now that I work at a company that has to do large-scale software development, I understand better why it’s not something you can really teach effectively in a university setting. It’s not that companies doing large-scale software development are somehow better at writing software than companies that work on smaller-scale software projects. It’s that large-scale projects face challenges that small-scale projects don’t.
The biggest challenge at large scale is coordination. My employer provides a single service, which means that, in theory, any project inside the company could affect what anybody else is working on. In my specific case, I work on delivery tools, so we might be called upon to support some new delivery workflow.
You can take a top-down, command-and-control approach to the problem: the people at the top try to filter all of the information down to just what they need, and then coordinate everyone hierarchically. However, this structure isn’t effective in dynamic environments: as the facts on the ground change, it takes too long for information to work its way up the hierarchy, for the people at the top to adapt, and for revised orders to work their way back down.
You can take a bottom-up approach to the problem, where you have a collection of teams that work autonomously. But the challenge there is getting them aligned. In theory, you hire people with good judgment and provide them with the right context. But the problem is that there’s too much context! You can’t just firehose all of the available information to everyone. That doesn’t scale: everyone will spend all of their time reading docs. “How do you get the information into the heads of the people who need it?” becomes the grand challenge in this context.
It’s hard to convey the nature of this problem in a university classroom to someone who has never worked in a setting like this. The flurry of memos, the planning documents, the misunderstandings, the sync meetings, the work towards alignment, the “One X” initiatives: these are all things that I had to experience viscerally, first-hand, to really get a sense of the nature of the problem.
3 thoughts on “Software engineering in-the-large: the coordination challenge”
The Reactive Principles (https://principles.reactive.foundation/principles/index.html) are just as applicable to the systems that design/build systems as to the systems that get built.
Accepting uncertainty, asserting autonomy, and tailoring consistency are the especially relevant principles when we talk about the system that makes the system.
That’s such a wonderful thought – distributed teams as nodes of a distributed system. Is there any author who’s explored that thought more deeply that you’d recommend?
Great insight, Lorin. I’m facing some similar coordination challenges at my company, which develops large-scale applications to support its business. Have you got any ideas on how to get the balance between control and autonomy right?