Here are some technical books I’d like to read:
- OpenStack Networking: A Guide for the Perplexed
- OpenStack Internals
- PowerShell for Linux sysadmins
- Debugging Web Apps with the Chrome Developer Tools
Unfortunately, these books don’t exist.
At McGill University, the computer engineering program evolved out of the electrical engineering program, so it was very EE-oriented. I was required to take four separate courses that involved (analog) circuit analysis: fundamentals of electrical engineering, circuit analysis, electronic circuits I, and electronic circuits II.
I’m struggling to think of what the equivalent of “circuit analysis” would be for software engineering. To keep the problem structure the same as circuit analysis, it would be something like: Given a (simplified model of?) a computer program, for a given input, what will the program output?
It’s hard to imagine even a single course in a software engineering program dedicated entirely to this type of manual “program analysis”. And yet, reading code is so important to what we do, and remains such a difficult task.
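To make the analogy concrete, here is a toy sketch of what one "program analysis" exercise might look like (the function and its name are invented for illustration): given the code, predict the output for a specific input without running it.

```python
# Exercise: without executing this, what does mystery(1234) return?
def mystery(n):
    total = 0
    while n > 0:
        total += n % 10   # take the last decimal digit
        n //= 10          # drop the last decimal digit
    return total

print(mystery(1234))
```

Working it by hand, the loop peels off digits 4, 3, 2, 1 and accumulates them, so the answer is the digit sum, 10. A circuit-analysis course is essentially a semester of exercises in exactly this style, just with Kirchhoff's laws instead of loop invariants.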
Matt Welsh suggests that systems researchers should work on an escape from configuration hell. I’ve felt some of the pain he describes while managing a small handful of servers.
Coming from a software engineering background, I would have instinctively classified the problem of misconfiguration as a software engineering problem instead of a systems one. But really, it’s an IT problem more than anything else. And therein lies the tragedy: IT is not considered a respectable area of research in the computer science academic community.
Because “cloud” has become such a buzzword, it’s tempting to dismiss cloud computing as nothing new. But one genuine change is the rise in software designed to work in an environment where hardware failures are expected. The classic example of this trend is the Netflix Chaos Monkey, which tests a software system by initiating random failures. The IT community calls this sort of system “highly available”, whereas the academic community prefers the term “fault tolerant”.
If you plan to deploy a system like an OpenStack cloud, you need to be aware of the failure modes of the system components (disk failures, power failures, networking issues), and ensure that your system can stay functional when these failures occur. However, when you actually deploy OpenStack on real hardware, you quickly discover that the component that is most likely to generate a fault is you, the operator. Because every installation is different, and because OpenStack has so many options, the probability of forgetting an option or specifying the incorrect value in a config file on the initial deployment is approximately one.
And while developers now design software to minimize the impact due to a hardware failure, there is no such notion of minimizing the impact due to an operator failure. This would require asking questions at development time such as: “What will happen if somebody puts ‘eth1’ instead of ‘eth0’ for public_interface in nova.conf? How would they determine what has gone wrong?”
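The failure mode in question is a one-line slip in a config file. Something like the following (an illustrative fragment, not a complete nova.conf; only the public_interface option is taken from the scenario above):

```ini
[DEFAULT]
# Operator meant eth0 (the host's actual public NIC) but typed eth1.
# Nothing fails at startup -- the symptom surfaces later as unreachable
# instances, with no error message pointing back to this line.
public_interface = eth1
```

The configuration parses cleanly and the service starts, which is exactly why this class of fault is so hard to diagnose: the system accepts the bad value silently.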
Designing for operator fault tolerance would be a significant shift in thinking, but I would wager that the additional development effort would translate into enormous reductions in operations effort.
I’m simultaneously unsurprised and shocked by this story about how SAP failed to deliver a payroll system that could properly handle 1,300 employees, even after the state of California spent $50 million on system development. We’ve been building payroll systems for decades now, and I believe that SAP is the largest software company on the planet that builds these kinds of systems.
This is a useful counterweight to Bertrand Meyer’s recent blog post arguing that most of the software we interact with on a daily basis works well. He’s right, but we must also avoid falling prey to survivorship bias.
We wrote a book in a week. If you do anything that involves OpenStack, check it out.
The Call for Papers for the Industry Track of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2013) is out. I’m serving as chair of the industry track this year.
If you’re reading this and you work in the software development world (and especially if you’re in the Baltimore/DC area), I encourage you to submit a paper that you think would be of interest to software engineering researchers or other developers.
I have a strong suspicion that the software engineering research community doesn’t have a good sense of the kinds of problems that software developers really face. What I’d really like to do with the industry track is bring professional developers and software engineering researchers together to talk about these sorts of problems.
Also, if you’re reading this and you live in the software world, I encourage you to check out what ESEM is about, even if you’re not interested in publishing a paper. This is a conference that’s focused on empirical study and measurement. If you ask me, every software engineering conference should be focused on empirical study. Because, you know, science.
I’ve started a second blog for documenting the little problems that I run into as I develop software.
I’m very sympathetic to Jay Rosen’s critique of American journalism, particularly the “View from Nowhere” perspective, which renders so much journalistic writing sterile and context-free.
There are some journalists out there who are a pleasure to read because they write in their own voice. They don’t shy away from the subjective nature of good reporting. Instead, these journalists will actually interpret events they report on and present events within a wider context. And, when they get things wrong, they tell you. To wit:
This is what real intellectual honesty looks like.