A preliminary empirical study to compare MPI and OpenMP

Greg:

If you’re curious, here are the reviews from IEEE Transactions in Software Engineering:

Reviewer: 1

Recommendation: Author Should Prepare A Minor Revision

Comments:
this is a very good paper about a flawed study. the flaws are identified in the paper (e.g. sample size was too small) so that is OK. In fact, its important to publish this study flaws and all so this type of research can grow. And that is exactly what I hope happens … that others read this paper, note the flaws, and try to repeat similar stuides on their own. So in that light, please understand that I am not being harshly ciritcal when I mention the flaws up front.

Another important flaws is that the problem assigned to the students should have been recasted to include optimization. Those of us with years of experience in MPI and OpenMP know that OpenMP programs are easy to create but hard to optimize. when you include optimization, the benefits over MPI often evaporate.

Your metric of correctness (page 15) is to vague. in a future study, it would be nice to have a more quantitative and reproducible measure of correctness.

there are a few other flaws … but that’s OK. You identify them up front and in doing so make them part of the contribution of this paper; i.e. people can see them and learn from them so later studies (and I hope there are later studies) can do a better job.

Along this line of helping additional studies of this sort get off the ground, i really appreciate the care you took to provide a thorough survey of related research. The list of references included in this paper might be excessive, but given the sparsity of empirical studies of programmers in the software engineering literature, I would rather see you err on the side of too many references rather than too few.

i have some specific changes you need to make in the paper.

In general, the captions you use with your figures are too short. It’s important that figures and captions provide enough information so they stand alone.

Page 7 line 19 … you imply that MPI is not well-liked. That is not true. there is a large community of programmers (myself among them) that likes it very much … even for programming shared memory systems.

page 7 line 56 … you state that “one way of forking threads” is the parallel region. It is important to point out when describing openMP that “the only way to fork threads” is with a parallel region. If you don’t have a parallel region, you don’t have multiple threads.

page 8 line 56 you state “another way of forking threads is to use a work-sharing construct”. This is not correct. A worksharing construct split up loop iterations between an already existing team of threads. New threads are not forked by a worksharing construct. This is an extremely important point when trying to understand OpenMP. So in your program on page 9 line 6, you must use the pragma “pragma omp parallel for”. With out the parallel, you do not get multiple threads. Notice that you get it right near the bottom of the page on line 52.

On page 10 line 34, I don’t understand what you mean by the phrase “Stated in GQM form [6]”. did I miss something from earlier in the paper? Or is GQM an undefined term.

on page 11 line 4, you suggest that people expect OpenMP to out perform MPi on a shared memory system. Actually, those of us working in the field know that its often the other way around. the problem is openMP doesn’t help you manage data locality while MPI does a pretty good job of this …even on a shared memory platform.

Page 15 line 54 … you are missing a section number.

Additional Questions:
How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant

Is the manuscript technically sound? Please explain under the Public Comments section below.: Yes

1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes

2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public Comments section below.: Good

Reviewer: 2

Recommendation: Reject

Comments:
The paper reports on a within-subjects controlled experiment comparing MPI and OpenMP with respect to coding speed, quality, and performance. One group solved a problem with MPI first, then again with OpenMP (same programming problem). The other group also solved the same problem twice, but started with OpenMP and used MPI second. Coding time for OpenMP was 43% less than for MPI. Other differences were not statistically significant.

The paper suffers from many inadequacies, which make its conclusion highly questionable.

1. The groups were unbalanced. 8 students started with OpenMP, and only 3 with MPI. Given large individual differences of programmers, using only 3 students for one of the groups makes the results random.
2. There is no power analysis for establishing the minimum number of participants. The inconclusive results could be entirely due to lack of power.
3. Only 5 students completed both a working OpenMP and MPI implementation. So in fact, only those 5 pairs of data values can be compared, because the others produced buggy software for at least one implementation. (The paper doesn’t say which ones of the 5 belonged to which of the
two groups.)
Nevertheless, the analysis uses all 11 pairs of data points.
4. A serious flaw is the interaction between development time and order of treatment. The authors actually observe an interaction effect, which is approximately two times 25% in development time. So the observed difference in development time could be almost entirely due to this interaction effect. The correct approach would have been a fully counterbalanced design with two problems and 4 groups. The order of problems and the order of treatment would be counterbalanced, and no participant would solve the same problem twice. Then the interaction effect could be avoided. Because the authors didn’t take this precaution and did observe a large interaction effect, the result is basically worthless.
5. The experiment doesn’t control for quality. It is obvious that poor quality can be produced quickly, while good quality requires time. In fact, the authors note that several programs don’t run with every MPI implementation, pointing to serious bugs. The differences in coding time observed could be entirely due to differences in time spent on quality code. The authors should have made sure a certain minimal quality was achieved, for instance by providing test cases or inspections.
6. The experiment does not control for speedup. Achieving a worth-while speedup also requires development time. Instead, the instructions for the experiment contain the following hint (page 38): “The goal is not to write the most efficient implementations, but rather to learn parallel programming”
So the students didn’t need to spend any time on getting the program to run fast, and indeed, the achieved speedups are sometimes acceptable (2.8 on 4 processors), and sometimes a slowdown (0.7)! The results observed could be entirely due to the fact that students spent different times on getting good parallel performance. The results would be quite different if the students were parallelizing under realistic goals, namely achieving a minimal speedup.
The authors should have done a preliminary implementation to find out what speedups are reasonable and then requiring students to achieve a certain portion of that. Accepting slowdowns in this simple-to-parallelize problem is certainly
unreasonable and unrealistic.
7. The problem was easy to parallelize. It is a nearest-neighbor computation on a square grid, further simplified by red-black-ordering. A second problem requiring a different parallelization should have been included.

The authors mention 9 previous studies comparing MPI and OpenMP, all with consistent results. This paper adds nothing, in particular because the experimental design is seriously flawed.

Minor comment: Fig. 6 is unintelligible. What do the lines show?
Provide LOC for sequential, MPI and OpenMP versions.

Additional Questions:
How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Irrelevant

Is the manuscript technically sound? Please explain under the Public Comments section below.: No

1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes

2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public Comments section below.: Poor

Reviewer: 3

Recommendation: Revise and Resubmit as “new”

Comments:

The title fails to reflect the sparseness of the data set.

Presenting the results from both this study (OpenMP vs MPI) as well as the referenced paper (MPI vs XMTC), while skipping the introductory sections describing the various programming models might make the paper more suited for publication within the Transactions.

Hypothesis H2 seems improperly formulated, and inappropriate for this study scale.

Finally, looking at Figure 6, the interaction diagram, within the context of Figure 5 and the variance in efforts for each programming model, suggests that there is no interaction effect contrary to the conclusion drawn in lines 35 – 45 on Page 19.

Additional Questions:
How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Relevant

Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially

1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: No

2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate

3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read

Please rate the manuscript. Explain your rating under the Public Comments section below.: Fair

Here are the reviews from Concurrency and Computation: Practice and Experience (Wiley):

Reviewer: 1
Comments to the Author
This paper presents a systematic research through experimentation with human subjects on comparing two widely used models in HPC community – OpenMP and MPI. Two groups of students are chosen to compare their productivity on MPI and OpenMP. A null hypothesis significance testing (NHST) method from psychology research is used. It aims in the development of a body of evidence about how the choice of MPI vs. OpenMP affects programmer effort and program
performance.

The paper first shows the related work, and brief introduction to NHST. Then it gives the detailed description of the study, the goal of the study is to analyze two programming models with their development effort, correctness and performance. The study includes hypotheses, study design, participants and groups, study task, procedure and apparatus. From the data analysis portion, it is found that writing OpenMP requires less development time than writing MPI programs; OpenMP programs are more likely to be correct than writing MPI programs; and novice programmers are more likely to get better performance with MPI than OpenMP. The paper also talks about the concern of the validity, the internal threat is the source of variation in the dependent variables could be the human-subject; the construct threat is how to faily abstract concepts like “correctness” and “performance” and define suitable metrics to be collected during the experiment and use for statistical analysis; and the external threat is that participant might or might not have experience in parallel programming. The author knows the limits of this experiment, and provides the realistic result which is useful for the scientific community.

The experiment is done about 4 and half years ago, maybe a bit outdated, and sample size seems too small to give the convinced answer.A repeated experiment conducts in the following year for the same course may be more attractive. And all bibliography are pre-2006, which might not include most recent research in this area. The paper attaches the course assignment and shows the well documented work, but in page 31 says “a sample output file for the sample input data will also be made available soon” which shows this may not be the final version of assignment.

Things to change:

pg.15 “result in result in” -> result in

Reviewer: 2
Comments to the Author
The goal of this paper is to measure the effect of the MPI and OpenMP models on programmer productivity in the context of shared-memory parallelism. A group of graduate students in an HPC course was asked to provide a solution to the “sharks and fishes” problem using both paradigms. The authors measured the development effort (in hours), the performance of the parallel solution (speedup against a sequential solution), and decided on the correctness
of the delivered algorithms based on the grading given by the students’ advisors.

This paper has several serious deficiencies, including the following:

(1) The technical content–in the sense of discussing the properties of MPI and OpenMP and the pros and cons for addressing issues in shared-memory programming–is close to 0. For example, the effect of caches and cache hierarchies, different approaches to synchronization, and dealing with
dependences are not explored in any serious way.

(2) Likewise, the focus on just one problem does not provide a solid basis for judging the quality of different programming models.

(3) Judging the correctness of a solution by relying on the grading of an advisor, without any formal analysis, is not acceptable.

(4) A lot of statistical analysis is done, however the sample size is 11! And 8 of these students used first OpenMP, while 3 used MPI first. Even if the
underlying methodology was acceptable, no general conclusions of any kind can be drawn from such a sample and asymmetric approach.

4 thoughts on “A preliminary empirical study to compare MPI and OpenMP”

Greg Wilson says:

December 20, 2011 at 9:24 pm

Not that I want to dredge up anything painful, but why did you have trouble getting them published?

1. Lorin says:
  
  December 20, 2011 at 9:56 pm
  
  Greg:
  
  If you’re curious, here are the reviews from IEEE Transactions in Software Engineering:
  
  Reviewer: 1
  
  Recommendation: Author Should Prepare A Minor Revision
  
  Comments:
  this is a very good paper about a flawed study. the flaws are identified in the paper (e.g. sample size was too small) so that is OK. In fact, its important to publish this study flaws and all so this type of research can grow. And that is exactly what I hope happens … that others read this paper, note the flaws, and try to repeat similar stuides on their own. So in that light, please understand that I am not being harshly ciritcal when I mention the flaws up front.
  
  Another important flaws is that the problem assigned to the students should have been recasted to include optimization. Those of us with years of experience in MPI and OpenMP know that OpenMP programs are easy to create but hard to optimize. when you include optimization, the benefits over MPI often evaporate.
  
  Your metric of correctness (page 15) is to vague. in a future study, it would be nice to have a more quantitative and reproducible measure of correctness.
  
  there are a few other flaws … but that’s OK. You identify them up front and in doing so make them part of the contribution of this paper; i.e. people can see them and learn from them so later studies (and I hope there are later studies) can do a better job.
  
  Along this line of helping additional studies of this sort get off the ground, i really appreciate the care you took to provide a thorough survey of related research. The list of references included in this paper might be excessive, but given the sparsity of empirical studies of programmers in the software engineering literature, I would rather see you err on the side of too many references rather than too few.
  
  i have some specific changes you need to make in the paper.
  
  In general, the captions you use with your figures are too short. It’s important that figures and captions provide enough information so they stand alone.
  
  Page 7 line 19 … you imply that MPI is not well-liked. That is not true. there is a large community of programmers (myself among them) that likes it very much … even for programming shared memory systems.
  
  page 7 line 56 … you state that “one way of forking threads” is the parallel region. It is important to point out when describing openMP that “the only way to fork threads” is with a parallel region. If you don’t have a parallel region, you don’t have multiple threads.
  
  page 8 line 56 you state “another way of forking threads is to use a work-sharing construct”. This is not correct. A worksharing construct split up loop iterations between an already existing team of threads. New threads are not forked by a worksharing construct. This is an extremely important point when trying to understand OpenMP. So in your program on page 9 line 6, you must use the pragma “pragma omp parallel for”. With out the parallel, you do not get multiple threads. Notice that you get it right near the bottom of the page on line 52.
  
  On page 10 line 34, I don’t understand what you mean by the phrase “Stated in GQM form [6]”. did I miss something from earlier in the paper? Or is GQM an undefined term.
  
  on page 11 line 4, you suggest that people expect OpenMP to out perform MPi on a shared memory system. Actually, those of us working in the field know that its often the other way around. the problem is openMP doesn’t help you manage data locality while MPI does a pretty good job of this …even on a shared memory platform.
  
  Page 15 line 54 … you are missing a section number.
  
  Additional Questions:
  How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Very Relevant
  
  Is the manuscript technically sound? Please explain under the Public Comments section below.: Yes
  
  1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
  
  2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
  
  3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
  
  Please rate the manuscript. Explain your rating under the Public Comments section below.: Good
  
  Reviewer: 2
  
  Recommendation: Reject
  
  Comments:
  The paper reports on a within-subjects controlled experiment comparing MPI and OpenMP with respect to coding speed, quality, and performance. One group solved a problem with MPI first, then again with OpenMP (same programming problem). The other group also solved the same problem twice, but started with OpenMP and used MPI second. Coding time for OpenMP was 43% less than for MPI. Other differences were not statistically significant.
  
  The paper suffers from many inadequacies, which make its conclusion highly questionable.
  
  1. The groups were unbalanced. 8 students started with OpenMP, and only 3 with MPI. Given large individual differences of programmers, using only 3 students for one of the groups makes the results random.
  2. There is no power analysis for establishing the minimum number of participants. The inconclusive results could be entirely due to lack of power.
  3. Only 5 students completed both a working OpenMP and MPI implementation. So in fact, only those 5 pairs of data values can be compared, because the others produced buggy software for at least one implementation. (The paper doesn’t say which ones of the 5 belonged to which of the
  two groups.)
  Nevertheless, the analysis uses all 11 pairs of data points.
  4. A serious flaw is the interaction between development time and order of treatment. The authors actually observe an interaction effect, which is approximately two times 25% in development time. So the observed difference in development time could be almost entirely due to this interaction effect. The correct approach would have been a fully counterbalanced design with two problems and 4 groups. The order of problems and the order of treatment would be counterbalanced, and no participant would solve the same problem twice. Then the interaction effect could be avoided. Because the authors didn’t take this precaution and did observe a large interaction effect, the result is basically worthless.
  5. The experiment doesn’t control for quality. It is obvious that poor quality can be produced quickly, while good quality requires time. In fact, the authors note that several programs don’t run with every MPI implementation, pointing to serious bugs. The differences in coding time observed could be entirely due to differences in time spent on quality code. The authors should have made sure a certain minimal quality was achieved, for instance by providing test cases or inspections.
  6. The experiment does not control for speedup. Achieving a worth-while speedup also requires development time. Instead, the instructions for the experiment contain the following hint (page 38): “The goal is not to write the most efficient implementations, but rather to learn parallel programming”
  So the students didn’t need to spend any time on getting the program to run fast, and indeed, the achieved speedups are sometimes acceptable (2.8 on 4 processors), and sometimes a slowdown (0.7)! The results observed could be entirely due to the fact that students spent different times on getting good parallel performance. The results would be quite different if the students were parallelizing under realistic goals, namely achieving a minimal speedup.
  The authors should have done a preliminary implementation to find out what speedups are reasonable and then requiring students to achieve a certain portion of that. Accepting slowdowns in this simple-to-parallelize problem is certainly
  unreasonable and unrealistic.
  7. The problem was easy to parallelize. It is a nearest-neighbor computation on a square grid, further simplified by red-black-ordering. A second problem requiring a different parallelization should have been included.
  
  The authors mention 9 previous studies comparing MPI and OpenMP, all with consistent results. This paper adds nothing, in particular because the experimental design is seriously flawed.
  
  Minor comment: Fig. 6 is unintelligible. What do the lines show?
  Provide LOC for sequential, MPI and OpenMP versions.
  
  Additional Questions:
  How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Irrelevant
  
  Is the manuscript technically sound? Please explain under the Public Comments section below.: No
  
  1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: Yes
  
  2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
  
  3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
  
  Please rate the manuscript. Explain your rating under the Public Comments section below.: Poor
  
  Reviewer: 3
  
  Recommendation: Revise and Resubmit as “new”
  
  Comments:
  
  The title fails to reflect the sparseness of the data set.
  
  Presenting the results from both this study (OpenMP vs MPI) as well as the referenced paper (MPI vs XMTC), while skipping the introductory sections describing the various programming models might make the paper more suited for publication within the Transactions.
  
  Hypothesis H2 seems improperly formulated, and inappropriate for this study scale.
  
  Finally, looking at Figure 6, the interaction diagram, within the context of Figure 5 and the variance in efforts for each programming model, suggests that there is no interaction effect contrary to the conclusion drawn in lines 35 – 45 on Page 19.
  
  Additional Questions:
  How relevant is this manuscript to the readers of this periodical? Please explain under the Public Comments section below.: Relevant
  
  Is the manuscript technically sound? Please explain under the Public Comments section below.: Partially
  
  1. Are the title, abstract, and keywords appropriate? Please explain under the Public Comments section below.: No
  
  2. Does the manuscript contain sufficient and appropriate references? Please explain under the Public Comments section below.: References are sufficient and appropriate
  
  3. Please rate the organization and readability of this manuscript. Please explain under the Public Comments section below.: Easy to read
  
  Please rate the manuscript. Explain your rating under the Public Comments section below.: Fair
  
  Here are the reviews from Concurrency and Computation: Practice and Experience (Wiley):
  
  Reviewer: 1
  Comments to the Author
  This paper presents a systematic research through experimentation with human subjects on comparing two widely used models in HPC community – OpenMP and MPI. Two groups of students are chosen to compare their productivity on MPI and OpenMP. A null hypothesis significance testing (NHST) method from psychology research is used. It aims in the development of a body of evidence about how the choice of MPI vs. OpenMP affects programmer effort and program
  performance.
  
  The paper first shows the related work, and brief introduction to NHST. Then it gives the detailed description of the study, the goal of the study is to analyze two programming models with their development effort, correctness and performance. The study includes hypotheses, study design, participants and groups, study task, procedure and apparatus. From the data analysis portion, it is found that writing OpenMP requires less development time than writing MPI programs; OpenMP programs are more likely to be correct than writing MPI programs; and novice programmers are more likely to get better performance with MPI than OpenMP. The paper also talks about the concern of the validity, the internal threat is the source of variation in the dependent variables could be the human-subject; the construct threat is how to faily abstract concepts like “correctness” and “performance” and define suitable metrics to be collected during the experiment and use for statistical analysis; and the external threat is that participant might or might not have experience in parallel programming. The author knows the limits of this experiment, and provides the realistic result which is useful for the scientific community.
  
  The experiment is done about 4 and half years ago, maybe a bit outdated, and sample size seems too small to give the convinced answer.A repeated experiment conducts in the following year for the same course may be more attractive. And all bibliography are pre-2006, which might not include most recent research in this area. The paper attaches the course assignment and shows the well documented work, but in page 31 says “a sample output file for the sample input data will also be made available soon” which shows this may not be the final version of assignment.
  
  Things to change:
  
  pg.15 “result in result in” -> result in
  
  Reviewer: 2
  Comments to the Author
  The goal of this paper is to measure the effect of the MPI and OpenMP models on programmer productivity in the context of shared-memory parallelism. A group of graduate students in an HPC course was asked to provide a solution to the “sharks and fishes” problem using both paradigms. The authors measured the development effort (in hours), the performance of the parallel solution (speedup against a sequential solution), and decided on the correctness
  of the delivered algorithms based on the grading given by the students’ advisors.
  
  This paper has several serious deficiencies, including the following:
  
  (1) The technical content–in the sense of discussing the properties of MPI and OpenMP and the pros and cons for addressing issues in shared-memory programming–is close to 0. For example, the effect of caches and cache hierarchies, different approaches to synchronization, and dealing with
  dependences are not explored in any serious way.
  
  (2) Likewise, the focus on just one problem does not provide a solid basis for judging the quality of different programming models.
  
  (3) Judging the correctness of a solution by relying on the grading of an advisor, without any formal analysis, is not acceptable.
  
  (4) A lot of statistical analysis is done, however the sample size is 11! And 8 of these students used first OpenMP, while 3 used MPI first. Even if the
  underlying methodology was acceptable, no general conclusions of any kind can be drawn from such a sample and asymmetric approach.
  
2. Norm Matloff says:
  
  December 22, 2011 at 12:45 am
  
  I have my own bias, as I’ve always believed that the shared-memory paradigm for parallel programming is far clearer than the message-passing one. Just compare the number of lines of code involved, and you can already see that the shared-memory model is simpler. Yes, that simplicity can be deceiving, but as a general rule, simplicity leads to faster development times and fewer bugs. Hey, isn’t that one of the reasons we don’t write in assembly language?
  
  As to your paper’s getting rejected, I only skimmed through the paper and skimmed through the reviews, but the latter look really typical. In my view, (a) the quality of reviewing in CS has gone way down and (b) the profession seems to have developed an ethos in which reviewers somehow feel a compulsion to reject papers. I don’t know if there is agreement on (a), but (b) was written up in a recent CACM essay (or CRA, something like that); this is an extremely serious problem. I certainly don’t think you should give up on publishing the paper, though.
  
  I did notice a couple of things in both your paper and the reviews regarding statistical methodology that I might disagree with. In my former life, I was a statistics professor, and still use statistics in much of my CS research. If you’re interested, I can make some suggestions.
  
  Norm
  
  1. Lorin says:
    
    December 22, 2011 at 4:34 pm
    
    Norm:
    
    I’d be happy to hear your feedback on how the statistical analysis could be improved. I really know just enough to be dangerous.

A preliminary empirical study to compare MPI and OpenMP

Published by Lorin Hochstein

4 thoughts on “A preliminary empirical study to compare MPI and OpenMP”

Leave a comment Cancel reply

Share this:

Published by Lorin Hochstein

4 thoughts on “A preliminary empirical study to compare MPI and OpenMP”

Leave a comment Cancel reply