0

Following this question, for the code below (from MS OpenMP docs example)

// omp_critical.cpp
// compile with: /openmp
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define SIZE 10

int main() { int i; int max; int a[SIZE];

for (i = 0; i &lt; SIZE; i++)
{
    a[i] = rand();
    printf_s(&quot;%d\n&quot;, a[i]);
}

max = a[0];
#pragma omp parallel for num_threads(4)
for (i = 1; i &lt; SIZE; i++)
{
    if (a[i] &gt; max)
    {
        #pragma omp critical
        {
            // compare a[i] and max again because max
            // could have been changed by another thread after
            // the comparison outside the critical section
            if (a[i] &gt; max)
                max = a[i];
        }
    }
}

printf_s(&quot;max = %d\n&quot;, max);

}

Can I remove the outside if test and do

max = a[0];
#pragma omp parallel for num_threads(4)
for (i = 1; i < SIZE; i++)
{
    #pragma omp critical
    {
        // compare a[i] and max again because max
        // could have been changed by another thread after
        // the comparison outside the critical section
        if (a[i] > max)
            max = a[i];
    }
}
user123
  • 679
  • 1
  • 5
  • 16
  • It will almost certainly be much quicker to find a thread local value for max in the loop, and then after the loop find the max of those values, and this avoids the problems above. But why not use a reduction? That's designed to do just this – Ian Bush Aug 19 '20 at 08:06

1 Answers1

1

You can, but this effectively results in sequential execution. The threads are constantly waiting to enter the critical section such that only one thread executes the loop body at a time. Hence, you get the same performance (maybe even worse due to synchronization overhead) than a plain serial loop.

The example from the MS docs only synchronizes if a new maximum value has been encountered. This allows to process all lower values up to this point in parallel.

As suggested in the comments, use a reduction construct.

cos_theta
  • 451
  • 3
  • 7