C++ comma operator and parallelization

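I had never had a reason to use the comma operator. However, while writing some modern C++, it turned out to be the natural tool: a fold expression over the comma operator calls a function once per argument.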

Say you have some series of variables and you want to perform a common operation on the group.

std::vector<std::size_t> a, b, c;

auto generateSortingIndexes = [&](auto&...args) {

     auto gen = [&](auto& arg){

          arg.resize(poolSize);
          std::iota(arg.begin(), arg.end(), 0);
     };
            
     (gen(args), ...);
};
generateSortingIndexes(a,b,c);

New C++ is always a fun thing, I know.

We could change this to be a generic function that could be a good addition to any C++ library:

std::vector<std::size_t> a, b, c;

auto generate = [&](auto gen, auto&...args) {
   (gen(args), ...);
};

generate([&](auto& arg) {

   arg.resize(poolSize);
   std::iota(arg.begin(), arg.end(), 0);

   }, a, b, c);
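If this really were to go into a library, it would probably be a function template rather than a lambda. Here is a minimal sketch of that idea. The names forEachArg and fillIndexes are my own placeholders, not anything from an existing library, and the cast to void is only there to guard against a callable whose return type overloads the comma operator.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper name: calls fn once per argument, left to right.
template <typename Fn, typename... Args>
void forEachArg(Fn&& fn, Args&&... args) {
    // Unary right fold over the built-in comma operator.
    // The cast to void discards any return value, so an overloaded
    // operator, on the return type can never be picked up.
    (static_cast<void>(fn(std::forward<Args>(args))), ...);
}

// Hypothetical example use: fill each vector with 0, 1, ..., size - 1.
inline void fillIndexes(std::size_t size,
                        std::vector<std::size_t>& a,
                        std::vector<std::size_t>& b,
                        std::vector<std::size_t>& c) {
    forEachArg([size](auto& v) {
        v.resize(size);
        std::iota(v.begin(), v.end(), std::size_t{0});
    }, a, b, c);
}
```

Calling fillIndexes(poolSize, a, b, c) would then do the same work as the lambda version above.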

In this case, maybe the best code would actually be as follows:

std::vector<std::size_t> a, b, c;

a.resize(poolSize);
std::iota(a.begin(), a.end(), 0);

b = c = a;

Though this is only because they’re all identical.

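The real problem, or benefit, of this code is that the fold expression is expanded at compile time into a plain sequence of calls. The comma operator evaluates its operands left to right, so at runtime those calls run one after another on a single thread. In reality, I would probably want all of that initialization to occur in parallel rather than in sequence.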
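To convince myself of the sequential behavior, a small sketch like the following records the order in which a comma fold visits its arguments. The helper name visitOrder is made up for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: records the order in which a comma fold
// visits its arguments.
template <typename... Args>
std::vector<std::string> visitOrder(const Args&... args) {
    std::vector<std::string> order;
    // The built-in comma operator sequences its left operand before
    // its right operand, so the names are appended strictly left to
    // right, all on the calling thread.
    (order.push_back(std::string(args)), ...);
    return order;
}
```

Calling visitOrder("a", "b", "c") returns the names in exactly the order they were passed, which is the in-sequence behavior the parallel versions are meant to replace.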

I could parallelize this code in two easy ways.

The first is with the standard library. So far I have kept with the trend of passing naked variables, so the first step is to wrap them in std::reference_wrapper and group them in a container. After that comes the parallel part:

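// Requires C++17 and the <algorithm>, <execution>, <functional>, and <numeric> headers.
std::vector<std::reference_wrapper<std::vector<std::size_t>>> vectors = {  
      a
      ,b
      ,c };

// Each invocation touches a different vector, so the calls can safely run concurrently.
std::for_each(std::execution::par, vectors.begin(), vectors.end(), [&](auto& ref) {
           
      auto& vector = ref.get();

      vector.resize(poolSize);
      std::iota(vector.begin(), vector.end(), 0);
 });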

Another way is with OpenMP, which looks as follows:

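// Requires <omp.h> and compiling with OpenMP enabled (e.g. -fopenmp).
#pragma omp parallel
{
 // Each thread starts at its own thread number and strides by the team size.
 const int count = static_cast<int>(vectors.size());
 for (int i = omp_get_thread_num(); i < count; i += omp_get_num_threads()) {

     auto& vector = vectors[i].get();

     vector.resize(poolSize);
     std::iota(vector.begin(), vector.end(), 0);
 }
}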

What a fascinating loop paradigm OpenMP allows: each thread starts at its own thread number and steps by the number of threads, so the loop is executed in parallel stripes.
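The striping is easier to see without any threads at all. The sketch below only computes which indices one thread would visit in such a loop. The function stripeIndices is a made-up helper for illustration and uses no OpenMP calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the indices a single thread would visit in a
// striped loop over `count` items shared by `numThreads` threads.
std::vector<std::size_t> stripeIndices(std::size_t threadNum,
                                       std::size_t numThreads,
                                       std::size_t count) {
    std::vector<std::size_t> indices;
    if (numThreads == 0) {
        return indices;  // no threads, nothing to visit
    }
    // Same shape as the OpenMP loop: start at the thread number,
    // then step by the number of threads.
    for (std::size_t i = threadNum; i < count; i += numThreads) {
        indices.push_back(i);
    }
    return indices;
}
```

With 2 threads and 5 items, thread 0 would visit indices 0, 2, 4 and thread 1 would visit 1, 3. As far as I know, this is the same interleaving that schedule(static, 1) gives on a #pragma omp parallel for loop.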

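A simpler way, given that all three vectors end up identical, would be to fill a once and then copy it into b and c in parallel: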

std::vector<std::reference_wrapper<std::vector<std::size_t>>> vectors = {  
        b
      , c };

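#pragma omp parallel
{
 const int count = static_cast<int>(vectors.size());
 for (int i = omp_get_thread_num(); i < count; i += omp_get_num_threads()) {

     auto& vector = vectors[i].get();

     // Each thread copies a into its own target, so no two threads write the same vector.
     vector = a;
 }
}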
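For comparison, the same copy could probably be written with a plain parallel for and no manual striping. This is only a sketch: copyToAll and the IndexVector alias are names I made up, and the loop runs in parallel only when the compiler has OpenMP enabled (for example with -fopenmp). Without that, the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using IndexVector = std::vector<std::size_t>;

// Hypothetical helper: copies source into every target, one target
// per loop iteration.
void copyToAll(const IndexVector& source,
               std::vector<std::reference_wrapper<IndexVector>>& targets) {
    const int count = static_cast<int>(targets.size());
    // With OpenMP enabled, the iterations are divided among the
    // threads; each iteration writes to a different vector, so the
    // iterations do not interfere with each other.
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        targets[i].get() = source;
    }
}
```

Here the division of iterations among threads is left to OpenMP rather than spelled out by hand.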
