IEEE and the newspaper industry

Ian Sommerville went on a tear the other day about IEEE not supporting open access content. I suspect that IEEE uses the revenues it gets from publication subscriptions to subsidize other activities like conferences. I think the open access model for research publications is inevitable, especially in tech-savvy fields like electrical engineering and computer science. When that happens, the IEEE may find itself in the position that the newspapers did when one of their profit centers (classifieds) got disrupted by free online services like Craigslist, making it difficult to fund the cost centers (news reporting).

Mind you, if free journals and fewer subsidized IEEE conferences mean that computer science shifts from being conference-oriented to being journal-oriented like the rest of academia, it will probably be a benefit to the CS research community. But IEEE will have to either reinvent itself or go the way of the local newspaper.

Bifurcation

The IT community:

Shared memory programming (pthreads) is awful! Message-passing (Erlang) would make our lives so much easier!

The high-performance computing community:

Message-passing (MPI) is awful! Shared-memory programming (OpenMP, PGAS) would make our lives so much easier!

Weird and gross

I always get a kick out of the error message that appears when trying to install TeX with Homebrew.

$ brew install tex
Error: No available formula for tex
Installing TeX from source is weird and gross, requires a lot of patches,
and only builds 32-bit (and thus can't use Homebrew deps on Snow Leopard.)

We recommend using a MacTeX distribution: http://www.tug.org/mactex/

A preliminary empirical study to compare MPI and OpenMP

Part of my dissertation work involved controlled experiments to measure the effect of the parallel programming model on programmer productivity. Unfortunately, I didn’t have much luck getting these studies published. I just turned one of them into a tech report (after it was rejected from multiple journals). It’s an experiment that measures the difference in programming effort between MPI and OpenMP on a programming task. You can judge for yourself whether it’s any good. Here’s the abstract:

Context: The rise of multicore is bringing shared-memory parallelism to the masses. The community is struggling to identify which parallel models are most productive.

Objective: Measure the effect of MPI and OpenMP models on programmer productivity.

Design: One group of programmers solved the sharks and fishes problem using MPI and a second group solved the same problem using OpenMP, then each programmer switched models and solved the same problem again. The participants were graduate students in an HPC course.

Measures: Development effort (hours), program correctness (grades), program performance (speedup versus serial implementation).

Results: Mean OpenMP development time was 9.6 hours less than MPI (95% CI, 0.37-19 hours), a 43% reduction. No statistically significant difference was observed in assignment grades. MPI performance was better than OpenMP performance for 4 out of the 5 students that submitted correct implementations for both models.

Conclusions: OpenMP solutions for this problem required less effort than MPI, but the study had insufficient power to measure the effect on correctness. The performance data was insufficient to draw strong conclusions but suggests that unoptimized MPI programs perform better than unoptimized OpenMP programs, even with a similar parallelization strategy. Further studies are necessary to examine different programming problems, models, and levels of programmer experience.

Newer interfaces aren’t always better

I was at the car dealer today, dropping off the car for safety and emission inspections. The service advisor brought up a VT100-style terminal interface on her computer to punch in the service details. She made an offhand remark about a problem with the computer system today, and slammed her way through some text screens on the numeric keypad.

If that system were implemented today, it would have a web interface. It would be more pleasant to look at, and easier for a new employee to learn. And yet, once that employee became proficient, I’d wager that tasks would take longer using a web interface compared to the original terminal interface.

Rankings are hard

Good Communications of the ACM article on the challenge of generating reliable university rankings. I wonder if NSF funds research to solve this problem.

Apple and published research

This makes me sad.

Constraints and complex software deployment

OpenStack can be tricky to get up and running: it’s composed of multiple services, some of which are broken out into separate projects (nova, glance, keystone, novaclient), and each has multiple external dependencies. Plus, the various Linux distributions do things differently, so what works on Ubuntu may fail on, say, Red Hat Enterprise Linux. For example, someone on our team just discovered that a problem was related to using the wrong version of the WebOb Python library (he was using 1.2, and it worked when reverting to 1.1).

It would be useful if there were a very easy way to capture these constraints (e.g., WebOb != 1.2), and then run a checker before starting things up to see if any constraints are violated. Maybe annotate the constraints with some metadata (e.g., date, version of nova and keystone used, OS) that could also be checked, to give a heuristic about how likely it is that this constraint applies in the current context. These constraints would have to be:

Easy to capture incrementally (so people would actually document them once they discovered them)
Easy to check (so other people could actually use them)
Easy to update (so that they could be removed or invalidated when they no longer applied)

Basically, something akin to the checks done by the configure script generated by GNU Autotools, but simpler for an end-user to update.
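To make the idea concrete, here is a minimal sketch of what such a checker might look like in Python. Nothing here is an existing tool: the `Constraint` class, the `check` function, the operator table, and the `context` values are all hypothetical names and placeholders made up for illustration, and the installed versions are passed in as a plain dict rather than read from the system.

```python
from dataclasses import dataclass, field
import operator

# Comparison operators a constraint may use (a made-up mini-syntax).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse_version(text):
    """Turn '1.2' into (1, 2); trailing zeros are dropped so '1.2.0' == '1.2'."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass
class Constraint:
    package: str
    op: str
    version: str
    # Free-form metadata (date, OS, nova/keystone versions, ...).
    # This sketch only carries it along; it does not score it.
    context: dict = field(default_factory=dict)

    def violated_by(self, installed):
        """True if the installed version of the package breaks the constraint."""
        if self.package not in installed:
            return False  # package absent, so there is nothing to check
        have = parse_version(installed[self.package])
        want = parse_version(self.version)
        return not OPS[self.op](have, want)


def check(constraints, installed):
    """Return the constraints violated by the given installed versions."""
    return [c for c in constraints if c.violated_by(installed)]


# The one constraint from the text; the context values are placeholders.
constraints = [
    Constraint("WebOb", "!=", "1.2", {"os": "placeholder", "date": "placeholder"}),
]

for c in check(constraints, {"WebOb": "1.2"}):
    print(f"violated: {c.package} {c.op} {c.version} (context: {c.context})")
```

Capturing a new constraint is one more line in the list, and invalidating one is deleting that line, which is about as low-friction as I can imagine. The hard part this sketch skips is the heuristic: deciding how much a mismatch in the recorded context should discount a violation.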

Code is for reading, docs are for writing

The documentation is out of date, you’ll have to read the code. Make sure the code you write will be readable by others.
Write documentation. Yes, it will go quickly out of date, but in the act of explaining how the software works to others, you’ll force yourself to acquire a better understanding of how it works.

Swing and a miss

Just got a collaborative proposal rejected from the NSF Human-Centered Computing program. It was about an approach to improving software engineering for computational science & engineering projects, by tailoring techniques that are known to work in traditional IT. This line in the panel summary writeup broke my heart:

A significant concern was that the work had an unclear scientific contribution: the work proposed seems like an application of known principles (indeed, only a few principles) and so not likely to provide an advance in knowledge or understanding in a scientific field.

I was hoping applying known principles would have been a plus (more likely to succeed!), but instead it turned out to be a minus. *sigh*