In my project, I have been thinking a lot about data-oriented design (DOD) while writing new code. I’m going to present some of these concepts as they appear in my current implementation and then describe moving to a specifically DOD language, OpenCL++. Moving to OpenCL++ seems like a natural extension of the DOD C++ I already use.
So, in C++, a simple data-oriented technique looks like this:
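// Needs <algorithm>, <execution> and <cstddef>.
std::for_each(std::execution::par, container.begin(), container.end(), [&, front = &container.front()](auto& item) {
    // Recover this element's index from its address (valid because the storage is contiguous).
    const std::size_t idx = static_cast<std::size_t>(&item - front);
    // item is the same element as container[idx]
    item = x; // some algorithm
    someOtherContainer[idx] = y; // some other algorithm; x and y are placeholders
});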
Basically, we have a vector of items that is iterated over in a hardware-parallel fashion, and access to any shared state can be synchronized with mutexes or other mechanisms. Each item in the container is often associated with entries in other containers, so idx can be used to reach a variety of information stored across those containers.
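To make the index-sharing idea concrete, here is a minimal sketch. The Particles struct and the step function are names I made up for illustration, and the loop is left sequential to keep it small; an execution policy could be passed as the first argument of std::for_each, as in the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays layout: element i of every vector
// describes the same entity.
struct Particles {
    std::vector<float> position;
    std::vector<float> velocity;
};

// Advances every particle by one time step dt.
// The index is recovered from the element's address, which is only valid
// because std::vector stores its elements contiguously.
inline void step(Particles& p, float dt) {
    assert(p.position.size() == p.velocity.size());
    if (p.position.empty()) return;  // front() on an empty vector is undefined
    std::for_each(p.position.begin(), p.position.end(),
                  [&, front = &p.position.front()](float& pos) {
                      const auto idx = static_cast<std::size_t>(&pos - front);
                      pos += p.velocity[idx] * dt;  // read the sibling container by index
                  });
}
```

Each iteration writes only to its own element and reads only from the sibling container, so no locking is needed here. Mutexes only come into play once iterations touch state they share.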
When the containers are vectors, the loop body could potentially be optimized further with SIMD code. The same loop could also be run with std::execution::seq instead, which may perform better in some circumstances or make debugging easier.
I have headed in the direction of SIMD before, fairly successfully, I think. However, since I am looking for more parallel power to test with, I want to target processors other than the CPU, such as a GPU or even a supercomputer. I’m pretty sure std::execution::par does not attempt to use the GPU or other hardware, at least for the moment. Therefore, I think learning and implementing OpenCL++ is the next step toward a good process.
OpenCL is a standard for writing programs that run in parallel on a variety of processing platforms, such as GPUs and CPUs. It is developed and maintained by the Khronos Group, so I expect it to be an excellent product. Its kernel language is based on C (C99, with some restrictions and extensions), and there is also a C++ version, which I intend to explore.
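The core of the OpenCL model is the kernel: a function that runs once per work-item, where each work-item asks for its own index with get_global_id(0). As a rough mental picture only, here is a plain C++ emulation of a one-dimensional launch. This is not the OpenCL API, and runNDRange and addVectors are hypothetical names of my own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ emulation of a one-dimensional launch: the "kernel" is called
// once per global id, one after another. A real device would be free to
// run these calls in parallel.
template <typename Kernel>
void runNDRange(std::size_t globalSize, Kernel kernel) {
    for (std::size_t gid = 0; gid < globalSize; ++gid) {
        kernel(gid);  // gid plays the role of get_global_id(0)
    }
}

// Element-wise sum of two inputs plus a scalar, written as a per-index kernel.
inline std::vector<int> addVectors(const std::vector<int>& inputA,
                                   const std::vector<int>& inputB, int val) {
    assert(inputA.size() == inputB.size());
    std::vector<int> output(inputA.size());
    runNDRange(output.size(), [&](std::size_t gid) {
        output[gid] = inputA[gid] + inputB[gid] + val;
    });
    return output;
}
```

The kernel body never loops over the data itself. It only describes what happens at one index, which is the same shape as the idx-based lambda in the std::for_each version.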
While studying OpenCL++, so far I have had to compile three libraries and also install Ruby. The source SDK includes examples, which I have started working through. The first example compiles but then crashes at runtime with a specific error code. This is apparently due to an issue reported on their forums, where I also found a solution, and it has been an open ticket since 2021.
With the solution implemented, the example seems to work just fine. It is concerning, though, that I can’t tell whether this is the correct solution, because it appears not to be. But it does work in some circumstances.
The C++ in these sources is modern and pretty interesting, and I am still learning about it. I have encountered several interesting code features so far…
Some lines of C++ code:
setArgs<index + 1, T1s...>(std::forward<T1s>(t1s)...);
std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> output2(numElements / 2, 0xdeadbeef);
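The setArgs line is a recursive variadic template: each call handles one argument and perfectly forwards the rest of the pack to the next index. Here is a self-contained sketch of that pattern as I understand it. FakeKernel and its slots member are stand-ins I invented for illustration, not the library's classes.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel object: it only records which
// argument slot received which value, as text.
struct FakeKernel {
    std::vector<std::string> slots;

    template <typename T>
    void setArg(std::size_t index, T&& value) {
        if (slots.size() <= index) slots.resize(index + 1);
        slots[index] = std::to_string(std::forward<T>(value));
    }

    // Base case: no arguments left, so the recursion stops.
    template <std::size_t index>
    void setArgs() {}

    // Recursive case: bind the first argument to slot `index`,
    // then forward the remaining pack starting at slot `index + 1`.
    template <std::size_t index, typename T0, typename... T1s>
    void setArgs(T0&& t0, T1s&&... t1s) {
        setArg(index, std::forward<T0>(t0));
        setArgs<index + 1, T1s...>(std::forward<T1s>(t1s)...);
    }
};
```

Calling k.setArgs<0>(a, b, c) on a FakeKernel k unrolls at compile time into three setArg calls for slots 0, 1 and 2, so the caller never writes the indices by hand.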
Some lines of OpenCL++ code:
"output[get_global_id(0)] = inputA[get_global_id(0)] + inputB[get_global_id(0)] + val + *(aNum->bar);"
" enqueue_kernel(childQueue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, "
" ^{"
" output[get_global_size(0)*3 + get_global_id(0)] = inputA[get_global_size(0)*3 + get_global_id(0)] + inputB[get_global_size(0)*3 + get_global_id(0)] + globalA + 2;"
" });"
Until next time!