Estimating confidence intervals, part 6

Our series on effort estimation in the small continues. This feature took a while to complete (where “complete” means “deployed to production”). I thought this was the longest feature I’d worked on, but looking at the historical data, there was another feature that took even longer.

Effort estimates

Somehow, the legend has disappeared from the plot. The solid line is my best estimate of time remaining each day, and the dashed line is the true amount of time left. The grey area is my 90% confidence interval estimate.

Error in estimate

As usual, my “expected” estimate was much too optimistic: I initially estimated 10 days, but it actually took 17. I did stay within my 90% confidence interval, which gives me hope that I’m getting better at those intervals.

When I started this endeavor, my goal was to do a from-scratch estimate each day, but that proved to require too much mental effort, and I succumbed to the anchoring effect: typically, I would just make an adjustment to the previous day’s estimate.

Interestingly, when I was asked in meetings how much time was left to complete this feature, I gave off-the-cuff (and, unsurprisingly, optimistic) answers instead of consulting my recorded estimates and giving the 90% interval.

Estimating confidence intervals, part 4

Here’s the latest installment in my continuing saga of estimating effort with 90% confidence intervals. Here’s the plot:

Effort estimates

In this case, my estimate of the expected time to completion was fairly close to the actual time. The upper end of the 90% confidence interval is extremely high, largely because there was some work that I considered optional for completing the feature and decided to put off to some future date.

Here’s the plot of the error:

Error in estimate

It takes a non-trivial amount of mental effort to do these estimates each day. I may stop doing these soon.

Estimating confidence intervals, part 3

Another episode in our continuing series on effort estimation in the small with 90% confidence intervals. I recently finished implementing another feature after making effort estimates each day. Here’s the plot:

Effort estimation, 90% confidence intervals

Once again, I underestimated the effort even at the 90% level, although not as badly as last time. Here’s a plot of the error.

Error plot

I also find it takes real mental energy to do these daily effort estimates.

Estimating confidence intervals, part 2

Here is another data point from my attempt to estimate 90% confidence intervals. This plot shows my daily estimates for completing a feature I was working on.

90% confidence intervals

The dashed line is the “truth”: it’s what my estimate would have been if I had estimated perfectly each day. The shaded region represents my 90% confidence estimate: I was 90% confident that the amount of time left fell into that region. The solid line is the traditional pointwise effort estimate: it was my best guess as to how many days I had left before the feature would be complete.
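For concreteness, here is a minimal sketch of a plot with those three elements. All the numbers are hypothetical, and I’ve used matplotlib’s fill_between for the shaded region; that’s my choice for the sketch, not necessarily how the original figures were made:

```python
import matplotlib.pyplot as plt

# Hypothetical daily records: point estimate, 90% bounds, and the truth.
days     = list(range(1, 11))
estimate = [5, 5, 6, 6, 5, 4, 3, 2, 1, 1]   # best guess of days remaining
low      = [4, 4, 5, 4, 4, 3, 2, 2, 1, 0]   # lower 90% bound
high     = [8, 8, 7, 6, 8, 7, 6, 5, 3, 2]   # upper 90% bound
truth    = list(range(10, 0, -1))           # days actually remaining

# Shaded 90% interval, solid point estimate, dashed truth.
plt.fill_between(days, low, high, color="grey", alpha=0.4,
                 label="90% confidence interval")
plt.plot(days, estimate, "k-", label="point estimate")
plt.plot(days, truth, "k--", label="truth")
plt.xlabel("Day")
plt.ylabel("Days remaining")
plt.legend()
plt.show()
```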

For this feature, I significantly underestimated the effort required to complete it. For the first four days, my estimates were so off that my 90% confidence interval didn’t include the true completion time: it was only correct 60% of the time.
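That 60% is just a hit rate: count the days whose interval contained the true time remaining. A short sketch, reusing the same hypothetical numbers as above:

```python
# Same hypothetical numbers as the sketch above, as (low, high) pairs per day.
intervals = [(4, 8), (4, 8), (5, 7), (4, 6), (4, 8),
             (3, 7), (2, 6), (2, 5), (1, 3), (0, 2)]
true_remaining = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# A day scores a hit when the truth falls inside that day's interval.
hits = sum(low <= t <= high
           for (low, high), t in zip(intervals, true_remaining))
print(f"interval contained the truth {hits / len(intervals):.0%} of the time")
# -> 60%: the first four days miss, the last six hit.
```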

This plot shows the error in my estimates for each day:

Error in effort estimate

Apparently, I’m not yet a well-calibrated estimator. Hopefully, that will improve with further estimates.

Results from estimating confidence intervals

A few weeks ago, I decided to estimate 90% confidence intervals for each day that I worked on developing a feature.

Here are the results from the 10 days between when I started estimating and when the feature was deployed to production.

Effort estimates

The dashed line is the “truth”: it’s what my estimate would have been if I had estimated perfectly each day. The shaded region represents my 90% confidence estimate: I was 90% confident that the amount of time left fell into that region. The solid line is the traditional pointwise effort estimate: it was my best guess as to how many days I had left before the feature would be complete.

If we subtract out the “truth” from the other lines, we can see the error in my estimate for each day:

Error in estimate
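In code, that subtraction is a one-line column operation. A minimal sketch, assuming the daily numbers live in a pandas DataFrame; the values are illustrative, except for day one, where the estimate of 5 versus an actual of 10 matches the observation below:

```python
import pandas as pd

# One row per day: my point estimate of days remaining, and the truth
# (known only after the feature shipped). Values are illustrative.
df = pd.DataFrame({
    "day":      list(range(1, 11)),
    "estimate": [5, 5, 6, 6, 5, 4, 3, 2, 1, 1],
    "truth":    list(range(10, 0, -1)),
})

# Negative error means I underestimated the time remaining.
df["error"] = df["estimate"] - df["truth"]
print(df)
```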

Some observations:

  • The 90% confidence interval always included the true value, which gives me hope that this is an effective estimation approach.
  • My pointwise estimate underestimated the true time remaining on 9 out of 10 days.
  • My first pointwise estimate was off by a factor of two (an estimate of 5 days versus an actual of 10 days), and my estimates got steadily better over time.

I generated these plots using IPython and the ggplot library. You can see my IPython notebook on my website with details on how these plots were made.
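If you don’t want to dig through the notebook, here is a rough sketch of how a plot like the error plot can be produced with that ggplot port. This is not the notebook’s exact code, and the error series is the hypothetical one computed above:

```python
import pandas as pd
# Assumes the yhat ggplot port (pip install ggplot), not R's ggplot2.
from ggplot import ggplot, aes, geom_line, geom_point, xlab, ylab

# Hypothetical error series (estimate minus truth), as computed above.
df = pd.DataFrame({"day": range(1, 11),
                   "error": [-5, -4, -2, -1, -1, -1, -1, -1, -1, 0]})

p = (ggplot(aes(x="day", y="error"), data=df)
     + geom_line()
     + geom_point()
     + xlab("Day")
     + ylab("Estimate minus truth (days)"))
print(p)  # draws the plot; in an IPython notebook, evaluating p renders inline
```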

Tackling Hofstadter’s Law with confidence intervals

Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.

One of the frustrating things about developing software is that tasks always seem to take longer to complete than you expected before you started. Somehow, we’re always almost done with whatever feature we’re working on.

I’ve also long been a fan of the idea of using 90% confidence intervals instead of point estimates. Hubbard discusses this in his wonderful book How to Measure Anything. Instead of trying to pick how long a task will take (e.g., 4 days), you predict a range that you are 90% certain the actual time will fall within (e.g., 3–15 days).

I’m going to put my money where my mouth is and try making confidence-interval estimates when working on a feature or bug. I ginned up a quick form using Google Forms; my aim is to fill it in each day and then evaluate how well I can come up with 90% intervals.

Effort estimation