Estimating confidence intervals, part 2

Here is another data point from my attempt to estimate 90% confidence intervals. This plot shows my daily estimates for completing a feature I was working on.

[Plot: 90% confidence intervals]

The dashed line is the “truth”: it’s what my estimate would have been if I had estimated perfectly each day. The shaded region represents my 90% confidence estimate: I was 90% confident that the amount of time left fell into that region. The solid line is the traditional pointwise effort estimate: it was my best guess as to how many days I had left before the feature would be complete.

For this feature, I significantly underestimated the effort required to complete it. For the first four days, my estimates were so far off that my 90% confidence interval didn’t include the true completion time, which means the interval was correct on only 60% of the days.
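As a quick check on what that 60% figure means, here’s a minimal Python sketch of how interval coverage can be computed from a daily log. The numbers below are made up for illustration and are not my actual estimates:

def interval_coverage(intervals, actuals):
    # Fraction of days on which the true value fell inside the estimated interval
    hits = sum(1 for (low, high), actual in zip(intervals, actuals)
               if low <= actual <= high)
    return float(hits) / len(actuals)

# Made-up example: the first four intervals miss the true days remaining,
# the last six contain it, giving 60% coverage.
intervals = [(1, 3), (1, 3), (1, 4), (2, 4), (3, 8), (2, 7), (2, 6), (1, 5), (1, 3), (0, 1)]
actuals   = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(interval_coverage(intervals, actuals))  # 0.6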

This plot shows the error in my estimates for each day:

[Plot: Error in effort estimate]

Apparently, I’m not yet a well-calibrated estimator. Hopefully, that will improve with further estimates.

Presentation as text

I gave a talk last week at Camp DevOps about Ansible and EC2. The talk is written in the present format, which is a very lightly marked-up text format, similar to Markdown. You can see the source file in a GitHub repo.

It was liberating to focus entirely on content and not worry too much about the exact appearance of the slides.

I also went for a minimalistic approach where I often didn’t even use titles. The slides won’t make much sense if you just look at them without me talking. Hopefully, they made some sense when I was talking in front of them.

“Who owns the fish” in Alloy

Hacker News linked to a logic puzzle with the following constraints:

There are five houses in five different colors starting from left to right. In each house lives a person of a different nationality. These owners all drink a certain type of beverage, smoke a certain brand of cigarette and keep a certain type of pet. No two owners have the same pet, smoke the same brand or drink the same beverage. The question is: WHO OWNS THE FISH??? Hints:

  1. The Brit lives in the red house
  2. The Swede keeps dogs as pets
  3. The Dane drinks tea
  4. The green house is on the left of the white house
  5. The green house’s owner drinks coffee
  6. The person who smokes Pall Mall rears birds
  7. The owner of the yellow house smokes Dunhill
  8. The man living in the centre house drinks milk
  9. The Norwegian lives in the first house
  10. The person who smokes Marlboro lives next to the one who keeps cats
  11. The person who keeps horses lives next to the person who smokes Dunhill
  12. The person who smokes Winfield drinks beer
  13. The German smokes Rothmans
  14. The Norwegian lives next to the blue house
  15. The person who smokes Marlboro has a neighbor who drinks water

Alloy is very well-suited to solving this type of problem, so I gave it a go. Here’s what my Alloy model looks like:

open util/ordering[House]
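// util/ordering imposes a left-to-right total order on House: first is the
// leftmost house, and next/prev step one house to the right/left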

sig House {
 color: one Color
}

abstract sig Person {
 occupies: one House,
 drinks: one Beverage,
 smokes: one Cigarette,
 keeps: one Pet
}
one sig Brit, Swede, Dane, Norwegian, German extends Person {}

abstract sig Color {}
one sig White, Yellow, Blue, Red, Green extends Color {}

abstract sig Beverage {}
one sig Tea, Coffee, Milk, Beer, Water extends Beverage {}

abstract sig Pet {}
one sig Birds, Cats, Dogs, Horses, Fish extends Pet {}

abstract sig Cigarette {}
one sig PallMall, Dunhill, Marlboro, Winfield, Rothmans extends Cigarette {}

fact allRelationsAreOneToOne {
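 // For a relation r, "r.~r in iden" forces r to be injective: no two houses
 // share a color, and no two people share a house, beverage, cigarette, or pet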
 color.~color in iden
 occupies.~occupies in iden
 drinks.~drinks in iden
 smokes.~smokes in iden
 keeps.~keeps in iden
}

pred problemConstraints {

 // The Brit lives in the red house
 Red in Brit.occupies.color

 //The Swede keeps dogs as pets
 Dogs in Swede.keeps

 // The Dane drinks tea
 Tea in Dane.drinks

 // The green house is on the left of the white house
 Green in prev[color.White].color

 // The green house's owner drinks coffee
 Coffee in (occupies.(color.Green)).drinks

 // The person who smokes Pall Mall rears birds
 Birds in (smokes.PallMall).keeps

 // The owner of the yellow house smokes Dunhill
 Dunhill in (occupies.(color.Yellow)).smokes

 // The man living in the centre house drinks milk
 (drinks.Milk).occupies in first[].next.next

 // The Norwegian lives in the first house
 Norwegian in occupies.first[]

 // The person who smokes Marlboro lives next to the one who keeps cats
 (smokes.Marlboro).occupies in (keeps.Cats).occupies.(next + prev)

 // The person who keeps horses lives next to the person who smokes Dunhill
 (keeps.Horses).occupies in (smokes.Dunhill).occupies.(next + prev)

 // The person who smokes Winfield drinks beer
 Beer in (smokes.Winfield).drinks

 // The German smokes Rothmans
 German in smokes.Rothmans

 // The Norwegian lives next to the blue house
 Blue in Norwegian.occupies.(next+prev).color

 // The person who smokes Marlboro has a neighbor who drinks water
 (drinks.Water).occupies in (smokes.Marlboro).occupies.(next+prev)

}

run problemConstraints for exactly 5 House

Alloy’s “Magic Layout” did a surprisingly good job at displaying the results. I had to manually rearrange the output so the houses would be displayed in the correct order, but otherwise no fiddling was required. Here’s what it looks like:

[Alloy visualization: alloy-fish]

I also put the model up on GitHub.

Results from estimating confidence intervals

A few weeks ago, I decided to estimate 90% confidence intervals for each day that I worked on developing a feature.

Here are the results over the 10 days from when I started estimating until the feature was deployed into production.

[Plot: Effort estimates]

The dashed line is the “truth”: it’s what my estimate would have been if I had estimated perfectly each day. The shaded region represents my 90% confidence estimate: I was 90% confident that the amount of time left fell into that region. The solid line is the traditional pointwise effort estimate: it was my best guess as to how many days I had left before the feature would be complete.

If we subtract out the “truth” from the other lines, we can see the error in my estimate for each day:

[Plot: Error in estimate]

Some observations:

  • The 90% confidence interval always included the true value, which gives me hope that this is an effective estimation approach.
  • My pointwise estimate underestimated the true time remaining for 9 out of 10 days.
  • My first pointwise estimate was off by a factor of two (an estimate of 5 days versus an actual 10 days remaining), and my estimates got steadily better over time.

I generated these plots using IPython and the ggplot library. You can see my IPython notebook on my website with details on how these plots were made.
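As a rough sketch of the same kind of plot in plain matplotlib rather than ggplot (with placeholder numbers, not my actual estimates), something like this works:

import matplotlib.pyplot as plt

days  = list(range(1, 11))
truth = [11 - d for d in days]              # perfect estimate: days actually remaining
point = [5, 5, 6, 6, 6, 5, 4, 3, 2, 1]      # placeholder pointwise estimates
low   = [3, 3, 4, 4, 4, 3, 2, 2, 1, 0]      # placeholder lower bound of 90% interval
high  = [9, 9, 10, 10, 9, 8, 6, 5, 3, 1]    # placeholder upper bound of 90% interval

plt.fill_between(days, low, high, alpha=0.3, label="90% confidence interval")
plt.plot(days, point, label="pointwise estimate")
plt.plot(days, truth, linestyle="--", label="truth")
plt.xlabel("day")
plt.ylabel("estimated days remaining")
plt.legend()
plt.show()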

Tackling Hofstadter’s Law with confidence intervals

Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.

One of the frustrating things about developing software is that tasks always seem to take longer to complete than you expected before you started. Somehow, we’re always almost done with whatever feature we’re working on.

I’ve also long been a fan of the idea of using 90% confidence intervals instead of point estimates. Hubbard discusses this in his wonderful book How to Measure Anything. Instead of trying to pick how long a task will take (e.g., 4 days), you predict a range that you are 90% certain the actual time will fall within (e.g., 3–15 days).

I’m going to put my money where my mouth is and try doing confidence interval estimates when working on a feature or bug. I ginned up a quick form using Google Forms; my aim is to fill it in each day and then evaluate how well I can come up with 90% intervals.

[Form screenshot: Effort estimation]
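The form itself is just a Google Form, but each daily record boils down to a handful of fields. A hypothetical local equivalent would be to append one row per day to a CSV file:

import csv
from datetime import date

# Hypothetical log format: one row per day, recording the point estimate and
# the 90% interval, all in days of effort remaining (e.g., 4 days, 3-15 days).
with open("estimates.csv", "a") as f:
    csv.writer(f).writerow([date.today().isoformat(), 4, 3, 15])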

People don’t understand computer science

From a recent Daily Beast essay.

Certainly, it is more practical to study engineering than philosophy. The country has a high demand for engineers. America also needs doctors, computer programmers, chemists, mechanics, and janitors. Does America not also need art historians, artists, philosophers, novelists, journalists, and well-rounded, thoughtful, and intellectually independent adults?

Gore Vidal defined an intellectual as “someone who can deal with abstractions.” Does the mediocrity of the job market mean that America no longer needs people who deal with abstractions? Only someone already painfully unable to deal with abstraction would draw such a suicidal conclusion.

I’m pretty sure that a computer programmer is someone who can deal with abstractions.

Software Analysis

At McGill University, the computer engineering program evolved out of the electrical engineering program, so it was very EE-oriented. I was required to take four separate courses that involved (analog) circuit analysis: fundamentals of electrical engineering, circuit analysis, electronic circuits I, and electronic circuits II.

I’m struggling to think of what the equivalent of “circuit analysis” would be for software engineering. To keep the problem structure the same as circuit analysis, it would be something like: given (a simplified model of?) a computer program and a particular input, what will the program output?
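To make that concrete, a problem in such a hypothetical course might look like the following (a made-up Python example, not drawn from any real curriculum): trace the code by hand and predict its output for the given input.

def mystery(xs):
    # Exercise: without running this, what does mystery([3, 1, 4, 1, 5]) return?
    result = []
    for x in xs:
        if not result or x >= result[-1]:
            result.append(x)
    return result

# Answer (work it out by hand first): [3, 4, 5] -- each element that is at least
# as large as the last element kept so far.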

It’s hard to imagine even a single course in a software engineering program dedicated entirely to this type of manual “program analysis”. And yet, reading code is so important to what we do, and remains such a difficult task.

No Country for IT

Matt Welsh suggests that systems researchers should work on an escape from configuration hell. I’ve felt some of the pain he describes while managing a small handful of servers.

Coming from a software engineering background, I would have instinctively classified the problem of misconfiguration as a software engineering problem instead of a systems one. But really, it’s an IT problem more than anything else. And therein lies the tragedy: IT is not considered a respectable area of research in the computer science academic community.

Operator fault tolerance

Because “cloud” has become such a buzzword, it’s tempting to dismiss cloud computing as nothing new. But one genuine change is the rise in software designed to work in an environment where hardware failures are expected. The classic example of this trend is the Netflix Chaos Monkey, which tests a software system by initiating random failures. The IT community calls this sort of system “highly available”, whereas the academic community prefers the term “fault tolerant”.
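The core idea is simple enough to sketch. The following is a hypothetical illustration in Python, not Netflix’s implementation, where terminate stands in for a call to a cloud provider API:

import random

def chaos_step(instances, terminate):
    # Pick one running instance at random and kill it, so that tolerance to
    # that kind of failure is exercised continuously in production.
    if not instances:
        return None
    victim = random.choice(instances)
    terminate(victim)
    return victim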

If you plan to deploy a system like an OpenStack cloud, you need to be aware of the failure modes of the system components (disk failures, power failures, networking issues), and ensure that your system can stay functional when these failures occur. However, when you actually deploy OpenStack on real hardware, you quickly discover that the component that is most likely to generate a fault is you, the operator. Because every installation is different, and because OpenStack has so many options, the probability of forgetting an option or specifying the incorrect value in a config file on the initial deployment is approximately one.

And while developers now design software to minimize the impact due to a hardware failure, there is no such notion of minimizing the impact due to an operator failure. This would require asking questions at development time such as: “What will happen if somebody puts ‘eth1’ instead of ‘eth0’ for public_interface in nova.conf? How would they determine what has gone wrong?”
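For example, a deployment tool designed with operator faults in mind might validate a setting like public_interface against the actual host before starting any services. Here’s a minimal sketch of that idea (Linux-only, and not part of any real OpenStack tool):

import os
import sys

def check_public_interface(interface):
    # Fail fast with a helpful message if the configured interface doesn't exist
    available = sorted(os.listdir("/sys/class/net"))  # network interfaces on this host
    if interface not in available:
        sys.exit("public_interface is set to %r, but this host only has: %s. "
                 "Check the value of public_interface in nova.conf."
                 % (interface, ", ".join(available)))

check_public_interface("eth1")  # exits with an error if eth1 isn't present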

Designing for operator fault tolerance would be a significant shift in thinking, but I would wager that the additional development effort would translate into enormous reductions in operations effort.