Since a large part of a neural network is its initial or current random state, maybe we can consider this entire state as just a vector of randomly generated values. In reality, that's what it is on the computer: each of these values is generated via a random number generator in a specific sequence. A question I want to examine is whether or not the nature of the random number generator affects a network's capacity to learn. I'm going to write a series of random number generators and see if they affect learning capacity in some way. One generator I would like to make or find, which I have only just thought up, is a pi-digit generator, where each new digit represents a random number. The idea is that instead of always doing backpropagation and trial and error, maybe there is a way to generate an intelligent initial network state via specific random number initialization. Each digit of pi can be viewed as a random number, I think, since the digits are non-repeating and any series of them looks random, provided you don't consider the digit where you started counting. At the same time, the starting digit of pi acts as the random seed, which can be used deterministically. Do random number generators affect network intelligence? Where does this go? We will see, maybe.
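As a first sketch of the idea, here is what a pi-digit generator might look like in C++. This is purely my own illustration: the class name and the hard-coded digit table are mine, standing in for a real digit-computing routine, and with a max of 9 it is not a true uniform bit generator for std distributions.

```cpp
#include <cstdint>
#include <string>

// Sketch of the pi-digit idea: each successive digit of pi is treated as a
// "random" value in [0, 9], and the starting digit acts as a deterministic
// seed. The digit table is a hard-coded prefix of pi; a real generator
// would compute digits on demand (e.g. with a spigot algorithm).
class PiDigits {
    std::string mDigits{ "14159265358979323846264338327950288419716939937510" };
    std::size_t mIdx{ 0 };
public:
    using result_type = uint32_t;
    static constexpr result_type(min)() { return 0; }
    static constexpr result_type(max)() { return 9; }

    explicit PiDigits(std::size_t startDigit = 0)
        : mIdx(startDigit % mDigits.size()) {} // the seed: where we start counting

    result_type operator()() {
        result_type d = result_type(mDigits[mIdx] - '0');
        mIdx = (mIdx + 1) % mDigits.size();
        return d;
    }
};
```

Seeded at digit 0, it yields 1, 4, 1, 5, 9, and so on down the digits of pi.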
Configurations of random variables
In my neural network program, I refactored some code and produced an error which I did not notice for some time. When I would run the program, eventually, out of 100 networks, one or a few networks would learn the pathing problem. There is no difference in how they are trained; they are all taught exactly the same way – and with a bug, too. All taught the same, yet the results range from correct to odd; the difference in training success is enormous. The only difference between these networks is their initial random configuration: the initial state of the network's weights and biases, which are produced through some random number generation sequence.
Ok, onto the bug in the code. I had a function which, through some mistaken edit, was reduced to merely:
LEARN_RATE = cos(trainNum)
This causes the learn rate to vary by the cosine function from -1 to 1. When the learn rate is above zero, a correction is made toward the correct solution, and when it is below zero, a correction is made away from it. With all networks trained exactly the same way – sometimes learning, sometimes unlearning – maybe 1% learn in a reasonable amount of time.
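A one-weight toy makes the mechanics visible. This is my own sketch, not the original program: with LEARN_RATE = cos(trainNum), some iterations step toward the target and others step away from it. Interestingly, in this one-dimensional case the toward-steps still win out on average; in a real network the away-steps interact with a much messier error surface.

```cpp
#include <cmath>

// Sketch: one weight trained toward a target of 1.0, with the buggy
// learn rate LEARN_RATE = cos(trainNum). Positive rates correct toward
// the target; negative rates correct away from it.
struct CosRateResult {
    double finalW;    // weight after all iterations
    int stepsToward;  // iterations where cos(trainNum) > 0
    int stepsAway;    // iterations where cos(trainNum) <= 0
};

CosRateResult trainWithCosRate(int steps) {
    CosRateResult r{ 0.0, 0, 0 };
    const double target = 1.0;
    for (int trainNum = 1; trainNum <= steps; ++trainNum) {
        double LEARN_RATE = std::cos(double(trainNum));
        (LEARN_RATE > 0 ? r.stepsToward : r.stepsAway)++;
        double gradient = r.finalW - target;  // d/dw of 0.5*(w - target)^2
        r.finalW -= LEARN_RATE * gradient;    // sign of cos decides learn/unlearn
    }
    return r;
}
```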
What does all this learning and unlearning mean? I don't know! But I think it may show the profundity of the Lottery Ticket Hypothesis. It seems to boil down to "configurations of random variables" being an important part of learning capacity and, ultimately, problem-solving capacity.
Writing custom Random Number Generators and Performance Comparisons
For some reason, I became concerned about the amount of time being spent by random number generators, so I took a look into them. I wrote a program which compares the speed of std::mt19937 (Mersenne Twister), a custom "twister loop", a custom "pcg", and a custom "xorshift" generator. The standard twister was the slowest, though I'm unsure if it matters since they're all quite fast. The fastest turned out to be my "twister loop", though this is also the least random, and it's only barely the fastest.
This code shows how to write a custom random number generator that can be used with std functions such as shuffle and distributions. Here are the results, followed by the code.

It turns out that std::mt19937 is pretty fast compared to the custom "faster" generators I discovered online. The fastest, twister_loop, is just an array pre-filled with twister values being referenced, rather than an actual generator.
Here is the code:
#include <random>
#include <iostream>
#include <chrono>
#include <vector>
#include <string>
#include <algorithm>
#include <cstdint>
class TwisterLoop {
public:
    using result_type = uint32_t;
    static constexpr result_type(min)() { return 0; }
    static constexpr result_type(max)() { return UINT32_MAX; }

    std::mt19937 mTwister;
    std::size_t mCurrentIdx{ 0 };
    std::vector< result_type > mValues;

    result_type generateOne() {
        return mTwister();
    }

    void refreshLoop() {
        std::generate(mValues.begin(), mValues.end(), [&]() {
            return generateOne();
        });
    }

    TwisterLoop(result_type rdseed, std::size_t loopSize = 1000) {
        seed(rdseed);
        mValues.resize(loopSize);
        refreshLoop();
    }

    void seed(result_type rdseed) {
        mTwister.seed(rdseed);
    }

    result_type operator()() {
        if (mCurrentIdx >= mValues.size()) mCurrentIdx = 0;
        return mValues[mCurrentIdx++];
    }

    void discard(unsigned long long n) {
        for (unsigned long long i = 0; i < n; ++i)
            operator()();
    }
};
class pcg {
public:
    using result_type = uint32_t;
    static constexpr result_type(min)() { return 0; }
    static constexpr result_type(max)() { return UINT32_MAX; }

    pcg()
        : m_state(0x853c49e6748fea9bULL)
        , m_inc(0xda3e39cb94b95bdbULL)
    {}

    pcg(std::random_device& rd) {
        seed(rd);
    }

    void seed(std::random_device& rd) {
        uint64_t s0 = uint64_t(rd()) << 31 | uint64_t(rd());
        uint64_t s1 = uint64_t(rd()) << 31 | uint64_t(rd());
        m_state = 0;
        m_inc = (s1 << 1) | 1;
        (void)operator()();
        m_state += s0;
        (void)operator()();
    }

    result_type operator()() {
        uint64_t oldstate = m_state;
        m_state = oldstate * 6364136223846793005ULL + m_inc;
        uint32_t xorshifted = uint32_t(((oldstate >> 18u) ^ oldstate) >> 27u);
        int rot = oldstate >> 59u;
        return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
    }

    void discard(unsigned long long n) {
        for (unsigned long long i = 0; i < n; ++i)
            operator()();
    }

private:
    uint64_t m_state;
    uint64_t m_inc;
};
class xorshift {
public:
    using result_type = uint32_t;
    static constexpr result_type(min)() { return 0; }
    static constexpr result_type(max)() { return UINT32_MAX; }

    xorshift() : m_seed(0xc1f651c67c62c6e0ull) {}

    xorshift(std::random_device& rd) {
        seed(rd);
    }

    void seed(std::random_device& rd) {
        m_seed = uint64_t(rd()) << 31 | uint64_t(rd());
    }

    result_type operator()() {
        uint64_t result = m_seed * 0xd989bcacc137dcd5ull;
        m_seed ^= m_seed >> 11;
        m_seed ^= m_seed << 31;
        m_seed ^= m_seed >> 18;
        return uint32_t(result >> 32ull);
    }

    void discard(unsigned long long n) {
        for (unsigned long long i = 0; i < n; ++i)
            operator()();
    }

private:
    uint64_t m_seed;
};
int main() {
    std::random_device rd;
    std::mt19937 twister(rd());
    TwisterLoop twisterLoop(rd(), 10000);
    pcg pcg1(rd);
    xorshift shift(rd);

    std::uniform_real_distribution<float> reals(-100.0f, 100.0f);
    std::vector<float> results;
    results.resize(1000);

    auto time = [&](auto& generator) -> std::size_t {
        auto start = std::chrono::high_resolution_clock::now();
        std::generate(results.begin(), results.end(), [&]() { return reals(generator); });
        std::shuffle(results.begin(), results.end(), generator);
        auto end = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    };

    auto test = [&](const std::string& caption, auto& generator) {
        std::size_t testCount = 100;
        std::size_t avg = 0;
        std::cout << "testing: " << caption << " " << testCount << " times" << std::endl;
        for (std::size_t i = 0; i < testCount; ++i) {
            avg += time(generator);
        }
        avg /= testCount;
        std::cout << "took: " << avg << " nanoseconds" << std::endl << std::endl;
    };

    test("std::twister", twister);
    test("twister_loop", twisterLoop);
    test("pcg", pcg1);
    test("shift", shift);

    std::cout << "\nprogram finished, press enter";
    std::cin.get();
    return 0;
}
The pcg and xorshift generators are credited to /* Copyright (c) 2018 Arvid Gerstmann. https://gist.github.com/Leandros */
Dumb Neural Networks
The Lottery Ticket Hypothesis says that some networks do not train well, if at all, based on random properties of the network. In my current project, I have 100 agents, each with their own neural network of the same shape. Each of these networks trains exactly the same way, but some end up acting smart and others dumb – in a pool of just 100 networks.
In this experiment, the agents are trying to path to a given point on the terrain, outlined with a large yellow circle. As I move this circle, agents should change their direction toward the new position. What I do is select a point, then put the program into training mode, where all of the agents train using exactly the same backpropagation (but no genetics, at the moment). Then I turn off training and see how they behave.
As you can see, a small number of the networks have failed to learn very well.
Neural Networks and Constants
In my research, it is possible to train a network to learn about a constant such as PI and put it to use in its function. However, if we pass PI in as an input, rather than having to "teach" PI, training is worlds faster. The network merely learns to use PI rather than derive it and use it simultaneously (though this is possible). It seems like if there are any constants which could be useful to the function, they should be passed in. Maybe this can be extended to full equations: rather than the network learning to derive a given equation or function, those definitions can be passed into the network along with constants and other inputs. In this way, networks can be "educated".
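A toy example of the derive-versus-use distinction (my own sketch, not the project's network code): a single weight trained on samples of y = pi * x has to derive pi as its weight value through thousands of gradient steps, whereas a model handed pi as an input only has to learn to route it.

```cpp
#include <cmath>

// Sketch: a one-weight "network" trained on samples of y = pi * x.
// The only way it can fit the data is to converge its weight to pi,
// i.e. to derive the constant through iteration.
double learnPi(int iterations) {
    const double PI = 3.14159265358979;
    double w = 0.0;
    const double learnRate = 0.01;
    for (int i = 0; i < iterations; ++i) {
        double x = (i % 100) / 100.0 + 0.1;  // inputs cycling over [0.1, 1.09]
        double err = w * x - PI * x;         // prediction minus target
        w -= learnRate * err * x;            // gradient step on squared error
    }
    return w;  // approaches pi
}
```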
Brute Force Collision Detection with Data Oriented Design and modern C++.
I have started rewriting my game engine, and I've just written a de-collision algorithm for objects. It runs purely through brute force, using no space partitioning. In my opinion, this is a good demonstration of what's possible through DoD and modern C++. The code for the de-collision algorithm is presented at the end, but here is the associated video: (the black boxes at the beginning are lighting issues which resolve shortly)
When the video starts, the 5000 objects are all intersecting multiple other objects, and they slowly vibrate outward until settling and causing no further collisions.
Normally in a game, you want things to collide as little as possible, using pathing and collision-avoidance methods. When things collide and intersect, you must undo this, and that tends to propagate a wave of de-collision through crowds, which is observable in the video. This is the worst-case scenario. In the past, this problem has been mitigated using some sort of space-partitioning method to reduce the number of collision comparisons, in addition to avoidance methods. In this example, however, there is no such space partitioning: it is entirely brute-forced using multithreading and some Data Oriented concepts. There are still further improvements that could be made to this simple method. The de-collision routine mostly works in real time, and the performance problems in the video have more to do with the 5k lights.
Using a collision-avoidance algorithm of some sort is far more performant than using a dumb de-collision algorithm like the one in this video. That is easy to do, too: since the objects will ultimately be managed by AI and other systems, they should only rarely collide, if ever.
The code in the de-collision routine is very simple: it compares all 5000 objects to all other 5000 objects and modifies their positions each time a collision is detected. It is implemented fairly simply using DoD and modern C++. I am going to present this code as a snippet to go with the video but will not describe it much, as it is still being developed. The main things you may notice are the parallel for loops and, within them, looping over iota indexes (misspelled "atoi" in the code at the moment), which is a DoD design I'm trying out. I am going to try using a mutex rather than an atomic, since atomics seem to be problematic in this algorithm.
void RtsObjectManager::decollide() {
    std::vector< AtomicWrapper<Ogre::Vector2> > collisions;
    collisions.resize(mObjects.size());

    std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
        auto& p = mObjects[ai].mPosition;
        collisions[ai]._a = { p.x, p.z };
    });

    std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
        auto& a = mObjects[ai];
        Ogre::Vector2 ap = collisions[ai]._a.load();
        Ogre::Real ar = a.mTemplate->m2DRadius;
        std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& bi) {
            if (ai == bi) return;
            auto& b = mObjects[bi];
            Ogre::Vector2 bp = collisions[bi]._a.load();
            Ogre::Real br = b.mTemplate->m2DRadius;
            Ogre::Real dist = ap.distance(bp);
            if (dist < ar + br) {
                //collision: push each away
                Ogre::Vector2 dir;
                if (dist == 0) {
                    dir = { 1.0, 0 };
                    dist = 1.0;
                }
                else
                    dir = (ap - bp);
                dir.normalise();
                dir *= dist;
                ap.x += dir.x; ap.y += dir.y;
                collisions[ai]._a = ap;
                bp.x -= dir.x; bp.y -= dir.y;
                collisions[bi]._a = bp;
            }
        });
    });

    std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
        auto& a = mObjects[ai];
        Ogre::Vector2 p = collisions[ai]._a.load();
        a.mPosition.x = p.x;
        a.mPosition.z = p.y;
    });
}
C++ Binary Serializer
Binary serialization can seem a bit scary to some people. Fortunately, it is super simple, straightforward, and even elegant in C++. Binary data is the raw representation of information on the computer, and there are easy ways to work with it in C++. The benefit of using binary is that it can save a lot of space compared to ASCII, which gives speed benefits on the computer and across the network.
Here is an example of ASCII data:
Nick is nearly 40. PI is 3.14159265359
There are two types of data here: text and numbers. In binary, text looks like text. Numbers, however, look quite different.
PI for instance, in binary, looks like the following:
ê.DTû! @
This is a double, represented by eight bytes or characters. As a float, we get four bytes or characters.
ÛI@
This is considered a non-human-readable format, and the common editors for it are hex editors. In a hex editor, the float looks like so:
DB 0F 49 40
OK, onto C++.
There are two simple ways to work with binary in C++: with streams and with strings. To start, we will examine an fstream object.
We open the fstream using the binary and write flags:
class BinaryWriter {
    std::ofstream fout;
public:
    BinaryWriter(const std::string& fileName) {
        fout.open(fileName, std::ofstream::binary | std::ofstream::out);
    }
Next, there is a straightforward templated method for primitives and simple objects:
template<typename T>
BinaryWriter& operator<<(const T& data) {
    std::size_t size = sizeof(data);
    for (std::size_t i = 0; i < size; ++i) {
        fout.put(reinterpret_cast<const char*>(&data)[i]);
    }
    return *this;
}
We can overload this method for more complicated or special data types:
BinaryWriter& operator<<(const std::string& data) {
    fout.write(data.data(), data.size());
    return *this;
}
Using these methods is simple and could look like so:
BinaryWriter bw("out.bin");
float pi = 3.14159265359;
bw << pi;
std::string str("nick");
bw << str;
The generated binary file looks like so:
ÛI@nick
Normally, when working with ASCII files, content tends to be white-space delimited or null-terminated. In binary, this is not the case: we can have white space and null characters anywhere. This means that either all fields must be constant in size, or the size of each field must be detailed somewhere in the file or in a file format.
To read the binary file, we have a straightforward solution:
class BinaryReader {
    std::ifstream fin;
public:
    BinaryReader(const std::string& fileName) {
        fin.open(fileName, std::ifstream::binary | std::ifstream::in);
    }
    ~BinaryReader() {
        fin.close();
    }
    bool fail() {
        return fin.fail();
    }
    template<typename T>
    BinaryReader& operator>>(T& data) {
        std::size_t size = sizeof(data);
        for (std::size_t i = 0; i < size; ++i) {
            fin.get(reinterpret_cast<char*>(&data)[i]);
        }
        return *this;
    }
    BinaryReader& operator>>(std::string& data) {
        fin.read(data.data(), data.size());
        return *this;
    }
Keeping in mind that in our case the string field is a constant size which is not defined in the file, here is the reading code:
BinaryReader br("out.bin");
float pi;
br >> pi;
std::string str; str.resize(4);
br >> str;
This has been an example of working with binary using fstream. Many folks do not realize that you can also store binary data in a std::string. std::string is not any sort of C-string: it can store any characters. Unless you call the c_str method, you do not need to work with a C-string at all.
Here is an example of using std::string with c-str vs binary:
std::string stra = "\0\0\0b\0e"; //c-str assignment: stops at the first null, so stra ends up empty
std::string strb = { '\0','\0','\0','b', '\0', 'e' }; //binary-ok initialization, all six chars stored
std::cout << strb; //works with cout, writes: be; all the data is present in the string/array
As long as string.size() is correct, all binary data in the string will be preserved and can be worked with like any other array, bearing in mind that it contains binary data. It is possible to use std::string to store simple BLOBs, or any sort of binary, for use over the network or in any other way.
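For example, a pair of helpers (my own naming, a minimal sketch) can pack any trivially-copyable value into a std::string and recover it later:

```cpp
#include <cstring>
#include <string>

// Sketch of the BLOB idea: copy a value's raw bytes into a std::string
// and back out. size() stays correct even when the bytes contain
// embedded nulls, so the string is safe to pass around as a blob.
template<typename T>
std::string toBlob(const T& value) {
    std::string blob(sizeof value, '\0');
    std::memcpy(blob.data(), &value, sizeof value);  // needs C++17 data()
    return blob;
}

template<typename T>
T fromBlob(const std::string& blob) {
    T value{};
    std::memcpy(&value, blob.data(), sizeof value);
    return value;
}
```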
Lottery Ticket Hypothesis in action
I have written an application which creates pools of 1000 neural networks. One test performs backpropagation training on them. A second test performs backpropagation plus a genetic algorithm. The number of times training is called is the same for both tests. The genetic algorithm seems to actually be able to converge on a lottery ticket and seems to always outperform backpropagation training alone.
Here is example output which will be explained:

In this example there is a total pool of 1000 random networks, used for both tests. To compare networks, their radius is examined; the smaller it is, the better.
In the first test, backpropagation, the best network converges pretty well, to ~0.0022. The worst network, however, converges to ~0.031, a much worse solution. The amount of training applied to each network is constant, and yet there is a vast difference in performance based on the Lottery Ticket effect, in a pool of just 1000 networks.
Lottery Ticket means that more ideal networks can be found and distributed periodically. The thing is, backpropagation, in its most simple form, is not a Lottery Ticket algorithm. Genetic algorithms are, and backpropagation can be incorporated into them naturally. This causes networks to evolve and change in ways that backpropagation does not even consider – for example, neuron biases. It also makes backpropagation consider itself, as it is performed socially among a pool of existing networks.
The second test is genetic. It starts from scratch with 1000 new random networks, runs for 400 generations, and calls training (backpropagation) the same number of times as the earlier test. On generation 65, a lotto is won by a particular network, and it ends up outperforming all others found in either test. The number of trainings on the winning network is the sum of its parents' trainings and the remaining trainings it did itself. That winning network went into all future generations along with various other combinations and mutations of networks. I guess I am going to consider this a Lottery Network.
What we need is some way to find these networks – ones that exist, or could, but are hard to find; the search could never end. I think the answer is to do both backpropagation and genetic algorithms, plus other functions such as pruning, growth, or other mutation. What else can be used? It seems likely there are concepts in math or other sciences which could contribute. Recently, in creating mutation functions, I applied a normal distribution function and am now performing math on normal distributions, which is new to me; what else could be available, or how this should evolve, is unknown.
The basic normal distribution function I've made looks like the following on Desmos:
https://www.desmos.com/calculator/jwblzdekz0
written in C++ like:
std::normal_distribution<long double> normal_dist(0.22);
auto normal_mostlyZero_seldomOne = [&]() -> long double {
    return std::abs(normal_dist(random) / 2.0);
};
It is used to "decide" when to mutate something and by what magnitude. Most of the time this magnitude is small, and otherwise it approaches one. It seems like evolutionary algorithms could benefit from the study of probability math and other maths, and I may pursue this at some point.
Next is to add methods of pruning and growth to networks to further investigate Lottery Ticket.
My AI study so far
I have been studying neural networks for some time, and recently, during a YSU hackathon, I managed to make interesting progress. After about a year-long break, I have returned to this code and made a large amount of progress, and a number of topics have presented themselves in C++ software.
I'm going to describe some of my journey into AI in C++ and talk about AI in blog format from the perspective of a modern C++ developer.
Neural networks are, in concept, similar to brains. What they are is a pattern-processing algorithm. In typical neural networks, there are layers of patterns, each modified by the previous in sequence. When this sequence goes forward, we do something called feed-forward: taking a pattern, running it through the network, and producing a pattern-deduced result. So, a network could take input images/patterns of dogs and output whether each is a dog or not. Feed-forward can be reversed, in a way, through something called backpropagation. During backpropagation, calculus is used to adjust the pattern's error throughout the network. The result is training to recognize the pattern, making feed-forward less error-prone.
This, at least, was my understanding after completing the hackathon code. What I had done during the hackathon was reference a blog paper which describes the calculus of backpropagation, and produce a seemingly functional algorithm in C++.
These sequenced neural networks are easily represented as series of vectors and matrices, and feed-forward is as simple as performing math across the series to its end, the output. Feed-forward can produce accurate results and the effect of pattern recognition. A thought is the vector result of feed-forward for any arbitrary input. The thoughts about the training data represent the network's accuracy on that data, and the network is said to converge to some accuracy. Matrices may not be the best way to represent this data in light of modern Data-Oriented concepts.
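As a concrete sketch of that math (my own minimal version, not the project's code), one dense layer of feed-forward is a matrix-vector multiply, a bias add, and an activation:

```cpp
#include <cmath>
#include <vector>

// Sketch: feed-forward through one dense layer. weights is row-major
// (one row per output neuron), biases has one entry per output neuron,
// and the activation is a sigmoid.
std::vector<double> feedForwardLayer(const std::vector<std::vector<double>>& weights,
                                     const std::vector<double>& biases,
                                     const std::vector<double>& inputs) {
    std::vector<double> outputs(biases.size());
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        double sum = biases[i];
        for (std::size_t j = 0; j < inputs.size(); ++j)
            sum += weights[i][j] * inputs[j];       // weighted input pattern
        outputs[i] = 1.0 / (1.0 + std::exp(-sum));  // sigmoid activation
    }
    return outputs;
}
```

Running the whole network is just chaining such layers, each output vector becoming the next layer's input.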
In my study of backpropagation, it seems the backpropagation function produces a magnitude of error. That magnitude is then used to adjust the network's existing state. Adjusting by this magnitude does not train the network in a single iteration (for reasons that will be explained); therefore backpropagation is done iteratively.
One reason it is done iteratively has to do with the initial solution surface of the network: the initial configuration. The initial configuration can have beneficial or detrimental effects on whether and how the network converges. This seems problematic, and so we could move the problem from training networks toward finding solution surfaces for initial or final configurations of networks. In searching vast random solution space there is an immediate ally: genetic algorithms. Backpropagation and genetic algorithms combine to give an error direction and the ability to move across solution surfaces efficiently with averaging and bounded random movement. This sort of algorithm also scales with increasing parallel hardware, which is ideal. The question is whether there is a better way to search for ideal networks. Out of a potentially infinite solution space, what is the most efficient way to converge on an ideal network? How long should it take? What is the ideal network, and how could it emerge?
This leads to something called the Lottery Ticket Hypothesis. The idea goes: for a given network, there is a smaller network that converges the same or similarly. Finding this network – that is winning the lottery. The question is how to find it; in the end, it could be easy to find, stumbled upon eventually, or entirely impossible. In the algorithm for this hypothesis, a network is trained using backpropagation, then pruned of its least contributing factors, then trained more, in a repeated fashion. I think what must be happening is that when a least contributing factor is eliminated, the existing factors must take up the slack or the network will not converge.
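A minimal sketch of that pruning step over a flat weight vector (my own illustration of the idea, not the paper's code), treating "least contributing factor" as "smallest weight magnitude": zero out the smallest fraction, then retraining would continue on the survivors, repeatedly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: prune the smallest-magnitude fraction of weights. In the
// iterative Lottery Ticket procedure, the surviving weights are then
// retrained and the prune step repeated.
void pruneSmallest(std::vector<float>& weights, float fraction) {
    std::vector<float> mags;
    mags.reserve(weights.size());
    for (float w : weights)
        mags.push_back(std::fabs(w));
    std::size_t k = std::size_t(fraction * mags.size());
    if (k == 0) return;
    // place the k-th smallest magnitude at index k-1
    std::nth_element(mags.begin(), mags.begin() + (k - 1), mags.end());
    float threshold = mags[k - 1];
    for (float& w : weights)
        if (std::fabs(w) <= threshold)
            w = 0.0f;  // pruned; ties may prune slightly more than k
}
```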
The existence of a converging network, or its ability to take up slack, depends on the numbers that can be found within it. Future computers are going to need ever better and more accurate floating-point representations in order to make finding these networks more likely. It is entirely possible that the existence of such networks depends on their ultimate hardware representation, and they may not be very portable.
Comments, concerns? Feel free to post replies.
Introduction to Data Oriented Design
Object-oriented programming is the typical go-to method of programming, and it is super useful, especially in a program's "sculpting"/exploration or design phase. The problem, however, is that modern computer hardware has evolved and typical OOP practices have not. The main issues are that hardware is getting more parallelized and cache-centric. What this means is that programs benefit when they can fetch data from the cache rather than RAM, and when they can be parallelized. Typical object-oriented programming, it turns out, does little to take advantage of these hardware realities, if it does not outright oppose them.
Data Oriented Design is a design philosophy being popularized in video game programming, where there is a lot of opportunity for both parallelization and cache consideration. I think this concept can be useful to some extent everywhere, however, and a programmer should be aware of what it is and how to make use of it. Cache speed is enormous compared to RAM, and by default, programmers make absolutely no considerations about it. This should change. Unfortunately, a lot of popular languages, developed and evolved under the assumption of infinite resources – a logical fallacy – offer little to no support for cache-centric programming (Java/.NET). Fortunately, in C++, we have a future. Working with DoD is going to be beneficial until (if) computer hardware becomes unified and the "cache" becomes gigabytes in size. Until then, we must work with both RAM and cache in our designs if they are to be performant.
To start, let's look at typical OOP and what sort of performance characteristics it has.
First off, the simple Object and an Array of them:
class Object {
public:
    int x, y, z;
    float a, b, c;
    int* p;
};
std::vector<Object> objects;
In this example, we have various variables composed into an object. There are several things going on here:
a) The Object is stored in a contiguous segment of memory, and when we reference a given variable (x, y, z, a, b, c, p), the computer goes to fetch that data; if it is not already in the cache, it gets it from RAM – along with memory near it – and puts it into the cache. This is how the cache fills up: recently referenced data and data near it. Because the cache is limited in size, it periodically flushes segments to RAM or retrieves new segments from RAM. This read/write process takes time. The key concept is that the hardware also fetches *nearby* data, anticipating that you may access it sequentially or that it is related and soon to be used.
b) The variable (p) is a pointer. When it is dereferenced, the cache is additionally filled with that data (and nearby data). Fetching memory through a pointer may cause additional cache filling/flushing beyond merely accessing the pointer itself.
c) Considering the vector objects: it is a contiguous segment of memory, and accessing non-pointer data sequentially is ideal. When one element is referenced, there is a good chance its neighboring elements were stored into the cache simultaneously for speedy access. Because the vector is contiguous, all the bytes of every object are stored in sequence. The number of objects accessible in the cache depends on the size of the Object in bytes.
We will examine an example application comparing the performance of OOP and DOD.
To begin, let's create a more realistic OOP object to compare with:
class OopObject {
public:
    Vector3 position, velocity;
    float radius;
    std::string name;
    std::chrono::steady_clock::time_point creationTime;

    void load() {
        //initialize with some random values
        position = reals(random);
        velocity = reals(random);
        radius = reals(random);
        name = "test" + std::to_string(radius);
        creationTime = std::chrono::steady_clock::now();
    }

    void act() {
        //this is a function we call frequently on sequences of objects
        position += velocity;
    }
};
OK, and now the unit test, using GoogleTest:
TEST(OopExample1, LoadAndAct) {
    std::vector<OopObject> objects;
    objects.resize(numObjects);
    time("load", [&]() {
        for (auto& o : objects) {
            o.load();
        }
    });
    time("act", [&]() {
        for (auto& o : objects) {
            o.act();
        }
    });
    float avg = 0;
    time("avg", [&]() {
        float avg1 = reals(random);
        for (auto& o : objects) {
            avg1 += o.radius;
        }
        avg1 /= objects.size();
        avg += avg1;
    });
    std::cout << avg;
    EXPECT_TRUE(true);
}
We interact with this object in a typical OOP fashion: we create a vector, size it, loop through and initialize the elements, then loop through again and do the "act" transform, and then loop through again and do an "avg" calculation.
To start programming DoD, we need to consider the cache rather than using it by happenstance. This involves organizing data differently. In object orientation, all of an object's data is packed into the same area and therefore always occupies cache, even if you only care about one specific variable. If we are iterating over an array of objects and only accessing the same 20% of the data the whole time, we are constantly filling the cache with 80% frivolous data. This leads to a new type of array and a typical DoD object.
The first thing we do is determine what data is related and how it is often accessed or what we want to tune towards, then we group that data together in their own structures.
The next thing we do is create vectors of each of these structures. The DoDObject is in fact a sort of vector manager. Rather than having one large vector of all data, we have a series of vectors of specific data. This increases the chance of related data being next to each other, while optimizing what's actually in the cache, because no space is taken by frivolous data.
class DoDObject {
public:
    struct Body {
        Vector3 position, velocity;
    };
    struct Shape {
        float radius;
    };
    struct Other {
        std::string name;
        std::chrono::steady_clock::time_point creationTime;
    };

    std::vector<Body> bodies;
    std::vector<Shape> shapes;
    std::vector<Other> others;

    DoDObject(int size) {
        bodies.resize(size);
        shapes.resize(size);
        others.resize(size);
    }

    void load() {
        for (auto& b : bodies) {
            b.position = reals(random);
            b.velocity = reals(random);
        }
        for (auto& s : shapes) {
            s.radius = reals(random);
        }
        int i = 0;
        for (auto& o : others) {
            o.name = "test" + std::to_string(shapes[i++].radius);
            o.creationTime = std::chrono::steady_clock::now();
        }
    }

    void act() {
        for (auto& b : bodies) {
            b.position += b.velocity;
        }
    }
};
Let's look at the results; at the end is the complete source file.
(AMD Ryzen 7 1700)

(AMD FX 4300)

(Intel(R) Core(TM) i7-4790K)

DoD has won on every test – and this is a simple example program; real programs, with more data per object, stand to see even larger gains.
Here’s the complete source of this example: (GoogleTest C++)
#include "pch.h"
#include <vector>
#include <chrono>
#include <string>
#include <random>
#include <iostream>

std::uniform_real_distribution<float> reals(-10.0f, 10.0f);
std::random_device random;
int numObjects = 100000;

struct Vector3 {
    float x, y, z;
    Vector3() = default;
    Vector3(float v) : x(v), y(v + 1), z(v + 2) {}
    Vector3& operator+=(const Vector3& other) {
        x += other.x;
        y += other.y;
        z += other.z;
        return *this;
    }
};

class OopObject {
public:
    Vector3 position, velocity;
    float radius;
    std::string name;
    std::chrono::steady_clock::time_point creationTime;

    void load() {
        //initialize with some random values
        position = reals(random);
        velocity = reals(random);
        radius = reals(random);
        name = "test" + std::to_string(radius);
        creationTime = std::chrono::steady_clock::now();
    }

    void act() {
        //this is a function we call frequently on sequences of objects
        position += velocity;
    }
};

class DoDObject {
public:
    struct Body {
        Vector3 position, velocity;
    };
    struct Shape {
        float radius;
    };
    struct Other {
        std::string name;
        std::chrono::steady_clock::time_point creationTime;
    };

    std::vector<Body> bodies;
    std::vector<Shape> shapes;
    std::vector<Other> others;

    DoDObject(int size) {
        bodies.resize(size);
        shapes.resize(size);
        others.resize(size);
    }

    void load() {
        for (auto& b : bodies) {
            b.position = reals(random);
            b.velocity = reals(random);
        }
        for (auto& s : shapes) {
            s.radius = reals(random);
        }
        int i = 0;
        for (auto& o : others) {
            o.name = "test" + std::to_string(shapes[i++].radius);
            o.creationTime = std::chrono::steady_clock::now();
        }
    }

    void act() {
        for (auto& b : bodies) {
            b.position += b.velocity;
        }
    }
};

template<typename Action>
void time(const std::string& caption, Action action) {
    std::chrono::microseconds acc(0);
    //each action will be executed 100 times
    for (int i = 0; i < 100; ++i) {
        auto start = std::chrono::steady_clock::now();
        action();
        auto end = std::chrono::steady_clock::now();
        acc += std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    }
    std::cout << caption << ": " << (acc.count() / 100) << "\t";
}

TEST(DoDExample1, LoadAndAct) {
    DoDObject dod(numObjects);
    time("load", [&]() {
        dod.load();
    });
    time("act", [&]() {
        dod.act();
    });
    float avg = 0;
    time("avg", [&]() {
        float avg1 = reals(random);
        for (auto& o : dod.shapes) {
            avg1 += o.radius;
        }
        avg1 /= dod.shapes.size();
        avg += avg1;
    });
    std::cout << avg;
    EXPECT_TRUE(true);
}

TEST(OopExample1, LoadAndAct) {
    std::vector<OopObject> objects;
    objects.resize(numObjects);
    time("load", [&]() {
        for (auto& o : objects) {
            o.load();
        }
    });
    time("act", [&]() {
        for (auto& o : objects) {
            o.act();
        }
    });
    float avg = 0;
    time("avg", [&]() {
        float avg1 = reals(random);
        for (auto& o : objects) {
            avg1 += o.radius;
        }
        avg1 /= objects.size();
        avg += avg1;
    });
    std::cout << avg;
    EXPECT_TRUE(true);
}