No Country for IT

Matt Welsh suggests that systems researchers should work on an escape from configuration hell. I’ve felt some of the pain he describes while managing a small handful of servers.

Coming from a software engineering background, I would have instinctively classified the problem of misconfiguration as a software engineering problem rather than a systems one. But really, it’s an IT problem more than anything else. And therein lies the tragedy: IT is not considered a respectable area of research in the computer science academic community.

Operator fault tolerance

Because “cloud” has become such a buzzword, it’s tempting to dismiss cloud computing as nothing new. But one genuine change is the rise of software designed to run in an environment where hardware failures are expected. The classic example of this trend is the Netflix Chaos Monkey, which tests a software system by initiating random failures. The IT community calls this sort of system “highly available”, whereas the academic community prefers the term “fault tolerant”.
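
To make the idea concrete, here is a minimal sketch of Chaos Monkey-style fault injection (my own illustration, not Netflix’s actual implementation): on a schedule, pick one instance at random and terminate it, so you find out whether the system as a whole survives before a real failure forces the question.

    import random
    import time

    def chaos_monkey(list_instances, terminate, interval_seconds=3600):
        """Periodically terminate one randomly chosen instance.

        list_instances: callable returning the current instance ids
        terminate: callable that kills one instance (e.g., via a cloud API)
        Both are supplied by the caller; these names are hypothetical,
        not part of any real tool.
        """
        while True:
            candidates = list_instances()
            if candidates:
                victim = random.choice(candidates)
                print("chaos: terminating %s" % victim)
                terminate(victim)
            time.sleep(interval_seconds)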

If you plan to deploy a system like an OpenStack cloud, you need to be aware of the failure modes of the system components (disk failures, power failures, networking issues) and ensure that your system stays functional when these failures occur. However, when you actually deploy OpenStack on real hardware, you quickly discover that the component most likely to generate a fault is you, the operator. Because every installation is different, and because OpenStack has so many options, the probability of forgetting an option or specifying an incorrect value in a config file on the initial deployment is approximately one.

And while developers now design software to minimize the impact of a hardware failure, there is no equivalent notion of minimizing the impact of an operator failure. That would require asking questions at development time such as: “What will happen if somebody puts ‘eth1’ instead of ‘eth0’ for public_interface in nova.conf? How would they determine what has gone wrong?”
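
For instance, a system designed with the operator in mind could validate that kind of option at startup. Here is a minimal sketch of what that might look like (the helper below is hypothetical, not part of nova): check that the configured name matches an interface that actually exists on the host, and fail fast with a message that tells the operator what to fix.

    import socket
    import sys

    def validate_interface_option(option_name, value):
        """Fail fast if a config option names a network interface that
        does not exist on this host (Unix; uses socket.if_nameindex)."""
        available = [name for _, name in socket.if_nameindex()]
        if value not in available:
            sys.exit("%s = %r does not match any interface on this host; "
                     "available interfaces: %s"
                     % (option_name, value, ", ".join(available)))

    # Would catch a typo like 'eth1' instead of 'eth0' at deploy time,
    # instead of surfacing later as a mysterious networking failure.
    validate_interface_option("public_interface", "eth0")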

Designing for operator fault tolerance would be a significant shift in thinking, but I would wager that the additional development effort would translate into enormous reductions in operations effort.

Payroll systems, not yet a solved problem

I’m simultaneously unsurprised and shocked by this story about how SAP failed to deliver a payroll system that could properly handle 1,300 employees, after the state of California spent $50 million on system development. We’ve been building payroll systems for decades now, and I believe that SAP is the largest software company on the planet that builds these kinds of systems.

This is a useful counterweight to Bertrand Meyer’s recent blog post about how most of the software we interact with on a daily basis works well. He’s right, but we must also avoid falling prey to survivorship bias.

ESEM 2013 Industry Track CFP

The Call for Papers for the Industry Track of the International Symposium on Empirical Software Engineering and Measurement (ESEM 2013) is out. I’m serving as chair of the industry track this year.

If you’re reading this and you work in the software development world (and especially if you’re in the Baltimore/DC area), I encourage you to submit a paper that you think would be of interest to software engineering researchers or other developers.

I have a strong suspicion that the software engineering research community doesn’t have a good sense of the kinds of problems that software developers actually face. What I’d really like to do with the industry track is bring professional developers and software engineering researchers together to talk about those problems.

Also, if you’re reading this and you live in the software world, I encourage you to check out what ESEM is about, even if you’re not interested in publishing a paper. This is a conference that’s focused on empirical study and measurement. If you ask me, every software engineering conference should be focused on empirical study. Because, you know, science.

The good ones tell you when they’re wrong

I’m very sympathetic to Jay Rosen’s critique of American journalism, in particular the View from Nowhere perspective, which renders so much journalistic writing sterile and context-free.

There are some journalists out there who are a pleasure to read because they write in their own voice. They don’t shy away from the subjective nature of good reporting. Instead, these journalists interpret the events they report on and present them within a wider context. And, when they get things wrong, they tell you. To wit:

  • Spencer Ackerman of Wired’s Danger Room admitting he was wrong in his earlier hagiographic coverage of David Petraeus.
  • David Weigel of Slate admitting he was wrong about the effect of Presidential debates on close races.
  • Felix Salmon of Reuters admitting he was wrong in his critique of the composition of the Goldman Sachs board of directors.

This is what real intellectual honesty looks like.

Line by line

Through iTunes University, I’m following along with the lectures of a Yale course on modern American literature: authors like Hemingway, Faulkner, and Fitzgerald. The professor talks about three registers of analysis: the macro, middle, and micro registers. At the micro register, the focus of the analysis is on things like the role of sensory information such as smell or sound. At the middle register, the focus is on how authors of the time experimented with narrative structure, such as the non-linear approach that Faulkner uses in The Sound and the Fury. At the macro register, the focus is on the larger historical context of the books. It’s only at the micro register that you can do analysis by examining individual sentences. And yet, the only way an author can write a book is to generate it by individual sentences.

We also talk about software at different levels of analysis: architecture at the higher levels, design patterns at the middle level, and lines of code at the micro level. There’s long been a yearning to create new software by working at a higher level of abstraction. In today’s jargon, this is known as model-driven development, where some kind of high-level graphical or textual model is created and then ultimately transformed into code. And this approach has found success in certain niches, such as Simulink, LabView, and Yahoo! Pipes.
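
To make the contrast concrete, here is a toy example of the model-driven idea (entirely my own sketch, not how Simulink or Pipes actually work): the “program” is a high-level textual model of a pipeline, and a small interpreter transforms it into behavior, so nobody writes the processing code line by line.

    # A declarative model of a data pipeline: it says what to do, not how.
    model = [
        {"step": "fetch", "source": "items.txt"},   # hypothetical input file
        {"step": "filter", "contains": "python"},
        {"step": "sort"},
    ]

    def run(model):
        """Interpret the model, turning the description into behavior."""
        data = []
        for node in model:
            if node["step"] == "fetch":
                with open(node["source"]) as f:
                    data = f.read().splitlines()
            elif node["step"] == "filter":
                data = [line for line in data if node["contains"] in line]
            elif node["step"] == "sort":
                data = sorted(data)
        return data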

For most applications, I suspect that the only way to write the software will continue to be the same as the only way to write novels: line by line.

Publications trump ideology

Dylan Matthews interviews Sasha Issenberg, the author of “The Victory Lab”, which is about how political campaigns are increasingly applying social science research techniques.

It turns out that Democratic campaigns tend to apply these techniques more than Republican ones, which is unsurprising, since academic researchers with knowledge of these techniques tend to lean left. However, Matthews notes that a lot of the important research in this area was done during the campaign of Republican governor Rick Perry in 2006. And why is that? According to Issenberg:

The reason Perry developed that partnership is that he made them an unusual offer, which is that they could publish their work.

Via Kevin Drum.