Training is a dirty word

Two posts caught my eye this week. The first was Anil Dash’s The Blue Collar Coder, and the second was Greg Wilson’s Dark Matter, Public Health, and Scientific Computing. Anil wrote about high school students and Greg spoke about scientists, but ultimately they’re both about teaching computer skills to people without a formal background in computing. In other words, training.

In the hierarchy of academia, training is pretty firmly at the bottom. Education at least gets some lip service, being the primary mission of the university and all. But training is a base, vulgar activity. And it’s a real shame, because the problems that Anil and Greg are trying to address are important ones that need solving. Help will need to come from somewhere else.

Relative confidence in scientific theories

One of the challenges of dealing with climate change is that it’s difficult to communicate to the public how much confidence the scientific community has in a particular theory. Here’s a hypothesis: people have a better intuitive grasp of relative comparisons (A is bigger than B) than of absolute ones (we are 90% confident that “A” is big).

Assuming this hypothesis is true, we could run a broad survey of scientists and use the results to rank-order their confidence in various scientific theories that the general public is familiar with. Possible examples of theories:

  • Plate tectonics
  • Childhood vaccinations cause autism
  • Germ theory of disease
  • Theory of relativity
  • Cigarette smoking causes lung cancer
  • Diets rich in saturated fats cause heart disease
  • AIDS is caused by HIV
  • The death penalty reduces violent crime
  • Evolution by natural selection
  • Exposure to electromagnetic radiation from high-voltage power lines causes cancer
  • Intelligence is inherited biologically
  • Government stimulus spending reduces unemployment in a recession

If the survey produced a (relatively) stable rank-ordering of these theories, the end goal would be to communicate confidence in a scientific theory by saying: “Scientists are more confident in theory X than they are in theories Y and Z, but not as confident as they are in theories P and Q.”
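To make the idea concrete, here is one possible way to turn individual survey responses into that kind of sentence. It is only a sketch under stated assumptions: each respondent submits a full ranking from most to least confident, and rankings are aggregated by mean rank (a Borda-style choice; no aggregation method is specified here). The function names `consensus_order` and `describe`, and the toy rankings over the placeholder theories X, Y, Z, P, Q, are hypothetical and are not survey data.

```python
from statistics import mean


def consensus_order(rankings):
    """Return theories ordered from most to least confident by mean rank.

    Each ranking lists theory names, most confident first (rank 1).
    Hypothetical aggregation: mean rank, ties broken by first appearance
    (dicts keep insertion order and sorted() is stable).
    """
    positions = {}
    for ranking in rankings:
        for rank, theory in enumerate(ranking, start=1):
            positions.setdefault(theory, []).append(rank)
    return sorted(positions, key=lambda t: mean(positions[t]))


def describe(theory, order, k=2):
    """Phrase a theory's confidence relative to its neighbours in the order."""
    i = order.index(theory)
    above = order[max(0, i - k):i]   # up to k theories ranked just above
    below = order[i + 1:i + 1 + k]   # up to k theories ranked just below
    if not above or not below:
        raise ValueError("this sketch needs neighbours on both sides")
    return (f"Scientists are more confident in {theory} than they are in "
            f"{' and '.join(below)}, but not as confident as they are in "
            f"{' and '.join(above)}.")


# Toy rankings from three hypothetical respondents (not real survey data).
toy_rankings = [
    ["P", "Q", "X", "Y", "Z"],
    ["Q", "P", "X", "Z", "Y"],
    ["P", "X", "Q", "Y", "Z"],
]
order = consensus_order(toy_rankings)
print(order)
print(describe("X", order))
```

Mean rank is just one option; a real survey would also need to check that the resulting order is stable across respondents before quoting it.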

How do you capture that?

This email from the OpenStack mailing list is a good illustration of the design rationale capture problem:

To: Yun Mao <yunmao@xxxxxxxxx>
From: Vishvananda Ishaya <vishvananda@xxxxxxxxx>
Date: Thu, 1 Mar 2012 12:36:43 -0800

Yes it does. We actually tried to use a pool at diablo release and it was very broken. There was discussion about moving over to a pure-python mysql library, but it hasn’t been tried yet.

Vish

On Mar 1, 2012, at 11:45 AM, Yun Mao wrote:

> There are plenty eventlet discussion recently but I’ll stick my
> question to this thread, although it’s pretty much a separate
> question. 🙂
>
> How is MySQL access handled in eventlet? Presumably it’s external C
> library so it’s not going to be monkey patched. Does that make every
> db access call a blocking call? Thanks,
>
> Yun
>

The problem here is that a database query can stall an entire OpenStack Compute service until the query completes. The implementation uses a green-threads library (eventlet) instead of native threads, and since the MySQL driver is an external C library that eventlet can’t monkey-patch, every database call blocks all the green threads in the process. The OpenStack developers tried a non-blocking solution, but it broke things, so it was abandoned.
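The failure mode is easy to reproduce in miniature. The sketch below swaps in Python’s standard `asyncio` (also a cooperative, single-threaded scheduler) for eventlet, and uses `time.sleep` as a hypothetical stand-in for an unpatchable C-level MySQL call. Run inline, the “query” stalls every other task; handed to a native thread with `asyncio.to_thread` (Python 3.9+), the other task keeps making progress. Offloading to native threads is one way to get non-blocking behavior, but this is an analogy, not OpenStack’s code, and the email doesn’t say how the Diablo-era pool worked.

```python
import asyncio
import time


def blocking_query(delay):
    # Hypothetical stand-in for a C-library database call: it blocks the
    # OS thread and never yields to the cooperative scheduler.
    time.sleep(delay)
    return "rows"


async def heartbeat(log, ticks=3, interval=0.05):
    # Another "green thread" that should keep making progress.
    for i in range(ticks):
        log.append(f"tick {i}")
        await asyncio.sleep(interval)


async def run(offload):
    log = []

    async def query():
        if offload:
            # Run the blocking call on a native worker thread.
            await asyncio.to_thread(blocking_query, 0.3)
        else:
            # Run it inline: the whole event loop waits for it.
            blocking_query(0.3)
        log.append("query done")

    await asyncio.gather(query(), heartbeat(log))
    return log


print(asyncio.run(run(offload=False)))  # heartbeat only starts after the query
print(asyncio.run(run(offload=True)))   # heartbeat ticks while the query runs
```

The order of the log entries is the point: with the call inline, no tick is recorded until the query has finished.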

This is really a challenge problem for software engineering: how do you capture this type of information so that a new developer can understand why the code was implemented that way, without depending on the existence of something like the OpenStack mailing list?

My hypothesis is that building up this knowledge incrementally is the best way to go, using a StackOverflow-style Q&A approach. It would be great if we could write a comprehensive design document, but I don’t think it’s possible to know in advance what sort of questions your future reader will want answered. But if you build it based on answering people’s questions, then that frees you up from trying to guess what it is they will need to know.

Here’s another study I’ve always wanted to run: evaluate how well an author of software documentation can predict:

  • what sort of questions the documentation reader will want answered by the docs
  • the amount of prior knowledge the reader will already have