A preliminary empirical study to compare MPI and OpenMP

Part of my dissertation work involved controlled experiments to measure the effect of parallel programming model on programmer productivity. Unfortunately, I didn’t have much luck getting these studies published. I just made one of them into a tech report (after being rejected from multiple journals), it’s an experiment that measures the difference in programming effort of MPI vs OpenMP on a programming task.  You can judge for yourself whether it’s any good. Here’s the abstract:

Context: The rise of multicore is bringing shared-memory parallelism to the masses. The community is struggling to identify which parallel models are most productive.

Objective: Measure the effect of MPI and OpenMP models on programmer productivity.

Design: One group of programmers solved the sharks and fishes problem using MPI and a second group solved the same problem using OpenMP, then each programmer switched models and solved the same problem again. The participants were graduate students in an HPC course.

Measures: Development effort (hours), program correctness (grades), program performance (speedup versus serial implementation).

Results: Mean OpenMP development time was 9.6 hours less than MPI (95% CI, 0.37-19 hours), a 43% reduction. No statistically significant difference was observed in assignment grades. MPI performance was better than OpenMP performance for 4 out of the 5 students that submitted correct implementations for both models.

Conclusions: OpenMP solutions for this problem required less effort than MPI, but insufficient power to measure the effect on correctness. The performance data was insufficient to draw strong conclusions but suggests that unoptimized MPI programs perform better than unoptimized OpenMP programs, even with a similar parallelization strategy. Further studies are necessary to examine different programming problems, models, and levels of programmer experience.

4 thoughts on “A preliminary empirical study to compare MPI and OpenMP

    1. Greg:

      If you’re curious, here are the reviews from IEEE Transactions in Software Engineering:

      Reviewer: 1

      Recommendation: Author Should Prepare A Minor Revision

      Comments:
      this is a very good paper about a flawed study.  the flaws are identified in the paper (e.g. sample size was too small) so that is OK.  In fact, its important to publish this study flaws and all so this type of research can grow.  And that is exactly what I hope happens … that others read this paper, note the flaws, and try to repeat similar stuides on their own.  So in that light, please understand that I am not being harshly ciritcal when I mention the flaws up front.

      Another important flaws is that the problem assigned to the students should have been recasted to include optimization. Those of us with years of experience in MPI and OpenMP know that OpenMP programs are easy to create but hard to optimize.  when you include optimization, the benefits over MPI often evaporate.

      Your metric of correctness (page 15) is to vague.  in a future study, it would be nice to have a more quantitative and reproducible measure of correctness.

      there are a few other flaws … but that’s OK.  You identify them up front and in doing so make them part of the contribution of this paper; i.e. people can see them and learn from them so later studies (and I hope there are later studies) can do a better job.

      Along this line of helping additional studies of this sort get off the ground, i really appreciate the care you took to provide a thorough survey of related research.   The list of references included in this paper might be excessive, but given the sparsity of empirical studies of programmers in the software engineering literature, I would rather see you err on the side of too many references rather than too few.

      i have some specific changes you need to make in the paper.

      In general, the captions you use with your figures are too short. It’s important that figures and captions provide enough information so they stand alone.

      Page 7 line 19 … you imply that MPI is not well-liked.   That is not true.  there is a large community of programmers (myself among them) that likes it very much … even for programming shared memory systems.

      page 7 line 56 … you state that “one way of forking threads” is the parallel region.  It is important to point out when describing openMP that “the only way to fork threads” is with a parallel region.  If you don’t have a parallel region, you don’t have multiple threads.

      page 8 line 56 you state “another way of forking threads is to use a work-sharing construct”.  This is not correct.  A worksharing construct split up loop iterations between an already existing team of threads.  New threads are not forked by a worksharing construct.  This is an extremely important point when trying to understand OpenMP.  So in your program on page 9 line 6, you must use the pragma “pragma omp parallel for”.  With out the parallel, you do not get multiple threads.  Notice that you get it right near the bottom of the page on line 52.

      On page 10 line 34, I don’t understand what you mean by the phrase “Stated in GQM form [6]”.  did I miss something from earlier in the paper?  Or is GQM an undefined term.

      on page 11 line 4, you suggest that people expect OpenMP to out perform MPi on a shared memory system. Actually, those of us working in the field know that its often the other way around.  the problem is openMP doesn’t help you manage data locality while MPI does a pretty good job of this …even on a shared memory platform.

      Page 15 line 54 … you are missing a section number.

      Additional Questions:
      How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant

      Is the manuscript technically sound? Please explain under the Public Comments section below.: Yes

      1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes

      2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

      3. Please rate the organization and  readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

      Please rate the manuscript. Explain your rating under the Public Comments section below.: Good

      Reviewer: 2

      Recommendation: Reject

      Comments:
      The paper reports on a within-subjects controlled experiment comparing MPI and OpenMP with respect to coding speed, quality, and performance. One group solved a problem with MPI first, then again with OpenMP (same programming problem). The other group also solved the same problem twice, but started with OpenMP and used MPI second. Coding time for OpenMP was 43% less than for MPI. Other differences were not statistically significant.

      The paper suffers from many inadequacies, which make its conclusion highly questionable.

      1. The groups were unbalanced. 8 students started with OpenMP, and only 3 with MPI. Given large individual differences of programmers, using only 3 students for one of the groups makes the results random.
      2. There is no power analysis for establishing the minimum number of participants. The inconclusive results could be entirely due to lack of power.
      3. Only 5 students completed both a working OpenMP and MPI implementation. So in fact, only those 5 pairs of data values can be compared, because the others produced buggy software for at least one implementation. (The paper doesn’t say which ones of the 5 belonged to which of the
      two groups.)
      Nevertheless, the analysis uses all 11 pairs of data points.
      4. A serious flaw is the interaction between development time and order of treatment.  The authors actually observe an interaction effect, which is approximately two times 25% in development time. So the observed difference in development time could be almost entirely due to this interaction effect. The correct approach would have been a fully counterbalanced design with two problems and 4 groups. The order of problems and the order of treatment would be counterbalanced, and no participant would solve the same problem twice. Then the interaction effect could be avoided. Because the authors didn’t take this precaution and did observe a large interaction effect, the result is basically worthless.
      5. The experiment doesn’t control for quality. It is obvious that poor quality can be produced quickly, while good quality requires time. In fact, the authors note that several programs don’t run with every MPI implementation, pointing to serious bugs. The differences in coding time observed could be entirely due to differences in time spent on quality code. The authors should have made sure a certain minimal quality was achieved, for instance  by providing test cases or inspections.
      6. The experiment does not control for speedup. Achieving a worth-while speedup also requires development time. Instead, the instructions for the experiment contain the following hint (page 38): “The goal is not to write the most efficient implementations, but rather to learn parallel programming”
      So the students didn’t need to spend any time on getting the program to run fast, and indeed, the achieved speedups are sometimes acceptable (2.8 on 4 processors), and sometimes a slowdown (0.7)! The results observed could be entirely due to the fact that students spent different times on getting good parallel performance. The results would be quite different if the students were parallelizing under realistic goals, namely achieving a minimal speedup.
      The authors should have done a preliminary implementation to find out what speedups are reasonable and then requiring students to achieve a certain portion of that. Accepting slowdowns  in this simple-to-parallelize problem is certainly
      unreasonable and unrealistic.
      7. The problem was easy to parallelize. It is a nearest-neighbor computation on a square grid, further simplified by red-black-ordering. A second problem requiring a different parallelization should have been included.

      The authors mention 9 previous studies comparing MPI and OpenMP, all with consistent results.  This paper adds nothing, in particular because the experimental design is seriously flawed.

      Minor comment: Fig. 6 is unintelligible. What do the lines show?
      Provide LOC for sequential, MPI and OpenMP versions.

      Additional Questions:
      How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Irrelevant

      Is the manuscript technically sound? Please explain under the Public Comments section below.: No

      1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes

      2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

      3. Please rate the organization and  readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

      Please rate the manuscript. Explain your rating under the Public Comments section below.: Poor

      Reviewer: 3

      Recommendation: Revise and Resubmit as “new”

      Comments:

      The title fails to reflect the sparseness of the data set.

      Presenting the results from both this study (OpenMP vs MPI) as well as the referenced paper (MPI vs XMTC), while skipping the introductory sections describing the various programming models might make the paper more suited for publication within the Transactions.

      Hypothesis H2 seems improperly formulated, and inappropriate for this study scale.

      Finally, looking at Figure 6, the interaction diagram, within the context of Figure 5 and the variance in efforts for each programming model, suggests that there is no interaction effect contrary to the conclusion drawn in lines 35 – 45 on Page 19.

      Additional Questions:
      How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Relevant

      Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially

      1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: No

      2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

      3. Please rate the organization and  readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

      Please rate the manuscript. Explain your rating under the Public Comments section below.: Fair

      Here are the reviews from Concurrency and Computation: Practice and Experience (Wiley):

      Reviewer: 1
      Comments to the Author
      This paper presents a systematic research through experimentation with human subjects on comparing two widely used models in HPC community – OpenMP and MPI. Two groups of students are chosen to compare their productivity on MPI and OpenMP. A null hypothesis significance testing (NHST) method from psychology research is used. It aims in the development of a body of evidence about how the choice of MPI vs. OpenMP affects programmer effort and program
      performance.

      The paper first shows the related work, and brief introduction to NHST. Then it gives the detailed description of the study, the goal of the study is to analyze two programming models with their development effort, correctness and performance. The study includes hypotheses, study design, participants and groups, study task, procedure and apparatus. From the data analysis portion,  it is found that writing OpenMP requires less development time than writing MPI programs; OpenMP programs are more likely to be correct than writing MPI programs; and novice programmers are more likely to get better performance with MPI than OpenMP. The paper also talks about the concern of the validity, the internal threat is the source of variation in the dependent variables could be the human-subject; the construct threat is how to faily abstract concepts like “correctness” and “performance” and define suitable metrics to be collected during the experiment and use for statistical analysis; and the external threat is that participant might or might not have experience in parallel programming. The author knows the limits of this experiment, and provides the realistic result which is useful for the scientific community.

      The experiment is done about 4 and half years ago, maybe a bit outdated, and sample size seems too small to give the convinced answer.A repeated experiment conducts in the following year for the same course may be more attractive. And all bibliography are pre-2006, which might not include most recent research in this area. The paper attaches the course assignment and shows the well documented work, but in page 31 says “a sample output file for the sample input data will also be made available soon” which shows this may not be the final version of assignment.

      Things to change:

      pg.15 “result in result in” -> result in

      Reviewer: 2
      Comments to the Author
      The goal of this paper is to measure the effect of the MPI and OpenMP models on programmer productivity in the context of shared-memory parallelism. A group of graduate students in an HPC course was asked to provide a solution to the “sharks and fishes” problem using both paradigms. The authors measured the development effort (in hours), the performance of the parallel solution (speedup against a sequential solution), and decided on the correctness
      of the delivered algorithms based on the grading given by the students’ advisors.

      This paper has several serious deficiencies, including the following:

      (1) The technical content–in the sense of discussing the properties of MPI and OpenMP and the pros and cons for addressing issues in shared-memory programming–is close to 0. For example, the    effect of caches and cache hierarchies, different approaches to synchronization, and dealing with
      dependences are not explored in any serious way.

      (2) Likewise, the focus on just one problem does not provide a solid basis for judging the quality of different programming models.

      (3) Judging the correctness of a solution by relying on the grading of an advisor, without any formal analysis, is not acceptable.

      (4) A lot of statistical analysis is done, however the sample size is 11! And 8 of these students used first OpenMP, while 3 used MPI first. Even if the
      underlying methodology was acceptable, no general conclusions of any kind can be drawn from such a sample and asymmetric approach.

    2. I have my own bias, as I’ve always believed that the shared-memory paradigm for parallel programming is far clearer than the message-passing one. Just compare the number of lines of code involved, and you can already see that the shared-memory model is simpler. Yes, that simplicity can be deceiving, but as a general rule, simplicity leads to faster development times and fewer bugs. Hey, isn’t that one of the reasons we don’t write in assembly language?

      As to your paper’s getting rejected, I only skimmed through the paper and skimmed through the reviews, but the latter look really typical. In my view, (a) the quality of reviewing in CS has gone way down and (b) the profession seems to have developed an ethos in which reviewers somehow feel a compulsion to reject papers. I don’t know if there is agreement on (a), but (b) was written up in a recent CACM essay (or CRA, something like that); this is an extremely serious problem. I certainly don’t think you should give up on publishing the paper, though.

      I did notice a couple of things in both your paper and the reviews regarding statistical methodology that I might disagree with. In my former life, I was a statistics professor, and still use statistics in much of my CS research. If you’re interested, I can make some suggestions.

      Norm

      1. Norm:

        I’d be happy to hear your feedback on how the statistical analysis could be improved. I really know just enough to be dangerous.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s