
## Programming Basics with C++ explained in Assembly

As mentioned, this post is a sort of introduction to programming, but it is not the typical “Hello World” introduction. We’ll be using C++ and inline Assembly to investigate what exactly is going on inside a C++ program. The following will be explained:

• what a C++ program is and looks like to a computer
• what variables are and how the stack and memory work
• what functions are and how they work
• how pointers work
• how math and logic work, such as when you evaluate an equation

What a C++ program is and what it looks like to the computer

C++, like any programming language, is just a bunch of words and syntax used to describe a process for the computer to perform.

You write a bunch of stuff, and the computer does exactly what you commanded it to. That said, while a program is a series of words and syntax to us, to the computer, it ends up being quite different. Computers only understand numbers, so somewhere between when a program is written and when it is executed by the computer, it gets translated from the programming language into numbers – Machine Code.

For example, the following program:

r = x + x * y / z - z * x

Looks like this in Machine Code:

Programming languages exist for several reasons:

• to provide an alternative to writing in Machine Code
• to make writing maintainable and manageable code bases possible
• to allow programmers to express themselves in different ways
• to accomplish a task with efficiency given a language customized to that task

Finally, the basic C++ program that writes, “Hello World!”, to the console screen:

int main(){
std::cout << "Hello World!";
return 0;
}

In Machine Code, it looks like this:

C++ is quite the improvement.

The language that translates most closely to Machine Code is Assembly Language. While both Machine Code and Assembly Language are impractical to work with day to day, Assembly Language is something people actually do use, and we are going to use it to link what’s going on in the computer to what’s going on in C++.

Before we can move into Assembly, however, we need to review some basics of how programs work.

Memory

As we know, computers work with numbers, and a program is ultimately a bunch of numbers. The numbers are stored in the computer in a place called memory, and there are lots of them, more numbers than you could count in your lifetime. Memory can be thought of as an array, grid or matrix where each cell contains a number and is a specific location in memory.

A visualization of memory.

These cells exist in sequence and that sequence describes its location. For instance, we could start counting memory at cell 1, and go to cell 7, and each of those locations could be referred to by their sequence number: 1 to 7.

The numbers in memory can represent one of two things:

• a Machine Code instruction or an operand used to execute a command
• some sort of Variable – an arbitrary value to be used by, or that has been saved by, a program

In programming languages such as C++, variables are referred to using names rather than their sequence number, and Machine Code is abstracted away with verbose language.

A snippet of C++ that declares three variables and prints them out.

The same snippet in Assembly; for glancing only.

The Stack: a type of Memory for Variables

Whenever a Variable is declared, it gets allocated in an area of Memory called the Stack. The Stack is a special type of Memory which is essential to how a C++ program executes. When a Variable enters scope, it gets allocated onto the Stack, and then later, when it exits scope, it’s removed from the Stack. The Stack grows in size as Variables enter scope and shrinks as Variables exit scope.

In C++, curly braces “{” and “}” are used to define a variable’s scope, also known as a block or epoch.

A C++ program with several blocks:

int main(){	//begin block a:

//x is allocated, stack size goes to 1
int x = 0;

{	// begin block b:

// k, l and m are allocated, stack size goes to 4
int  k = 2,  l = 3, m = 4;

// an example operation
x = k + l + m;

}	// end of block b, stack size shrinks back to 1

// - back to block a - y is allocated, stack size goes to 2
int y = 0;

{	// begin block c:

// l and o are allocated, stack size goes to 4
int l = 12, o = 2;

// an example operation
y = l * o;

}	// end of block c, stack size shrinks back to 2

// - back to block a - z is allocated, stack size goes to 3
int z = 0;

// an example operation
z = x - y;

// write result, z, to the console
std::cout << z << std::endl;

}	// end of block a, stack shrinks to size 0


x86 Assembly Brief on the Stack

Assembly is really simple. Simple but lengthy, and hard to work with because of its lengthiness. Let’s jump right in.

A line of Assembly consists of a command and its operands. Operands are numbers of course, but these numbers can refer to a few different things: Constants ( plain numbers ), Memory locations (such as a reference to a Stack Variable), and Registers, which are a special type of Memory on the processor.

The Stack’s current location is held in the esp register. The esp register starts at a high memory location, such as 1000, and as the stack grows, the register is decremented, towards, say, 900 and then 500, and eventually to the point where the stack can grow no further. General purpose registers, such as eax, are used to store whatever is needed, and certain operations use them by default.

There are two commands that manipulate the stack implicitly:

• push [operand]

put the operand value onto the Stack, and decrement esp

• pop [operand]

remove a value from the Stack, store it in the operand register, and increment esp

Registers:

• esp

the Stack’s top, moves up or down depending on pushes/pops, or plain adds/subtracts.

• eax

a general purpose register

Given these two instructions we can create and destroy stack variables. For instance, similar to the program above but without the example equations:

_asm{

push 0		//create x, with a value of 0

push 2		//create k
push 3		//create l
push 4		//create m
//do an equation
pop eax		//destroy m
pop eax		//destroy l
pop eax		//destroy k

push 0		//create y

push 12		//create l
push 2		//create o
//do an equation
pop eax		//destroy o
pop eax		//destroy l

push 0		//create z
//do an equation

pop eax		//destroy z
pop eax		//destroy y
pop eax		//destroy x
}

Another way to manipulate the Stack is to add to or subtract from esp directly. The offset is given in bytes and should be a multiple of the Stack alignment, which on x86 is 4 bytes.

add esp, 12 	// reduce the stack by 12 bytes – same as 3 * pop
sub esp, 4 	// increase the stack by 4 bytes – same as 1 * push

The difference in usage is that add and sub do not copy any values around. With add, we trash the contents, and with sub, we get uninitialized space.

Once Stack space has been allocated, it can be accessed through dereferencing esp. Dereferencing is done with brackets.

[esp]  		// the top of the stack, the last push
[esp + 4] 	// top + 4 bytes, the second to last push
[esp + 8] 	// top + 8 bytes, the third to last push
//[esp – x ] 	makes no sense
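If the moving esp is hard to picture, here is a toy model of it in plain C++. This is only a sketch: ToyStack and sumOfThreePushes are names I made up for illustration, the "memory" is just a small vector, and nothing here touches the real processor stack.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a downward-growing stack. Addresses are byte offsets into
// a small block of fake memory; esp starts at the high end.
struct ToyStack {
    std::vector<std::int32_t> memory; // one slot per 4 bytes
    std::uint32_t esp;                // current top of the stack, in bytes

    explicit ToyStack(std::uint32_t bytes) : memory(bytes / 4), esp(bytes) {}

    // push: decrement esp by 4, then store the value at the new top
    void push(std::int32_t value) {
        esp -= 4;
        memory[esp / 4] = value;
    }

    // pop: read the value at the top, then increment esp by 4
    std::int32_t pop() {
        std::int32_t value = memory[esp / 4];
        esp += 4;
        return value;
    }

    // [esp + offset]: access a slot relative to the top without moving esp
    std::int32_t& at(std::uint32_t offset) { return memory[(esp + offset) / 4]; }
};

// Mirrors: push 2, push 3, push 4, then reads [esp + 8], [esp + 4], [esp]
inline std::int32_t sumOfThreePushes() {
    ToyStack s(64);
    s.push(2);
    s.push(3);
    s.push(4);
    std::int32_t total = s.at(8) + s.at(4) + s.at(0);
    s.esp += 12; // same effect as "add esp, 12": three slots are discarded
    return total;
}
```

Notice that the first value pushed ends up at the largest offset from esp, which is why the second to last push is found at [esp + 4] and not at [esp - 4].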

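A few more instructions are needed to do arithmetic with Stack values:

• add [add to, store result], [ operand to add ]

adds the operand to the register

• sub [subtract from, store result], [ operand to subtract ]

subtracts the operand from the register

• mul [operand to multiply by eax, store in eax ]

multiplies the eax register by the operand and stores the result in eax

• mov [ copy to ], [ copy from ]

copies a value from a Register to Memory or vice versa.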

_asm{

push 0			//create x, with a value of 0

push 2			//create k
push 3			//create l
push 4			//create m

//k + l + m;
mov eax, [esp + 8]	// copy k into eax
add eax, [esp + 4]	// add l
add eax, [esp]		// add m; eax = 9

add esp, 12		//destroy k through m

//store the value in x
mov [esp], eax

push 0			//create y

push 12			//create l
push 2			//create o

// l * o
mov eax, [esp + 4]	 //copy l into eax
mul [esp]		 //multiply o by eax

add esp, 8		//destroy l and o

//store the value in y
mov [esp], eax

push 0			//create z

//x - y;
mov eax, [esp + 8]
sub eax, [esp + 4]

mov [esp], eax		//write to z; z = -15

add esp, 12		//destroy z through x

}


Functions

Functions are a way to associate several things:

• A Stack block
• Code
• A useful name

As we know, rather than writing out the same code over and over, to reuse it, we can use a function.

But that is not the end of what functions are. They are also Named Code Segments: a segment of code which has a useful name, a name that both describes what the function does and that can be used to call that function in the language.

In C++ functions have four properties:

• return type
• function name
• parameters
• function body

And the following prototype:

[return type] [name]([parameters]){[body]}

In C++ a sum function would look like so:

int sum( int x, int y ){
return x + y;
}

In assembly to sum these two values we’d do the following:

push 5 			// y = 5
push 4 			// x = 4
mov eax, [esp] 		//eax = x
add eax, [esp+4]	//eax = 9
add esp, 8		//clear stack

You can see there are three stages here:

• Setting up the function’s parameters using push
• the actual function code
• resetting the stack

We are also missing three things, the function name, the return type, and actually calling the function. In assembly, a function name is the same as a C++ label:

sum:

Now comes an interesting part. Since we can do anything in assembly, we have to implement something called a Calling Convention: how the function’s parameters are passed, how and where the stack is cleaned, and how the return value is handled.

On 32-bit x86, the default calling convention for ordinary C++ functions is called cdecl. In this calling convention, arguments are pushed onto the stack by the caller, in right to left order. Integer return values are saved in eax, and the caller cleans up the stack.

First off, to actually call a function we could use a command, jmp. Jmp is similar to the C++ goto statement. It sends the next instruction to be processed to a label. Here’s how we could implement a function using jmp:

_asm{

jmp main

sum:		//sum function body
mov eax, [esp+4]	//we have a default offset due to the return address
add eax, [esp+8]	//eax = 4 + 5
jmp dword ptr [esp]	//jump back to the return address

main:		//main program body
push 5			//we plan to call sum, so push arguments
push 4
push sumreturn		//push the return address
jmp sum			//call the function

sumreturn:
add esp, 12		//clean the stack
//result is still in eax
}

To make this much easier, there are additional instructions:

• call [operand]

automatically pushes the return address onto the stack and then jumps to the function

• ret [ optional operand ]

automatically pops the return address and jumps to it

These instructions wrap up some of the requirements of function calling – the push/pop of the return address, the jmp instructions, and the return label. With these additional instructions we’d have the following:

_asm{

jmp main
sum:
mov eax, [esp+4]	//we have a default offset due to the return address
add eax, [esp+8]	//eax = 4 + 5
ret
main:
push 5			//we plan to call sum, so push arguments
push 4
call sum

add esp, 8		//clean the stack
//result is still in eax
}


In other calling conventions, specifically x64 ones, most parameters are passed in registers instead of on the stack by default.

See: https://en.wikipedia.org/wiki/X86_calling_conventions

See: https://scc.ustc.edu.cn/zlsc/sugon/intel/compiler_c/main_cls/bldaps_cls/common/bldaps_calling_conv.htm

With cdecl, every time we call sum, we have an equivalent assembly expansion:

push 5			//pass an argument
push 4			//pass an argument
call sum		//enter function
add esp, 8		//clear arguments

Recursive functions run out of stack space if they go too deep because each call involves allocations on the stack and it has a finite size.

Pointers

Pointers confuse a lot of folks. What I think may be the source of the confusion is that a pointer is a single variable with two different parts to it, and indeed, a pointer is in fact its own data type. All pointers are of the same data type, pointer. They all have the same size in bytes, and all are treated the same by the computer.

The two parts of a pointer:

• The pointer variable itself.
• The variable pointed to by the pointer.

In C++, non-function pointers have the following prototype:

[data type]* [pointer variable name];

So we actually have a description of two different parts here.

• The Data Type

this is the type of the variable that is pointed to by the pointer.

• The Pointer Variable

this is a variable which points to another variable.

What a lot of folk may not realize, is that all Dynamic Memory ultimately routes back to some pointer on the Stack, and to some pointer period – if it doesn’t, that Dynamic Memory has been leaked and is lost forever.

Consider the following C++

int* pi = new int(5); 

With this statement, we put data into two different parts of memory; the first part is the Stack, it now has pi on it, the second, int(5), is on the heap. The value contained by the pi variable is the address of int(5). To work with a value that is pointed-to by a pointer, we need to tell the computer to go and fetch the value at the pointed-to address, and then we can manipulate it. This is called Dereferencing.

int i = 10;		//declare i on the stack
int* pi = &i;		//declare pi on the stack and set its value to the address of i
*pi = *pi + 7;		//dereference pi, add 7, assign to i
// i = 17

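In assembly, dereferencing is done with brackets, as we saw with [esp]. To get the address of something, rather than the value stored there, there is a dedicated instruction:

• lea [operand a], [operand b]

lea a, [expression] performs the following process:

• computes the address described by the expression and stores it in a, without reading memory
• such that

lea eax, [ eax + 1] will increment eax by 1

• and

lea eax, [ esp + 4 ] takes the value of esp, an address, adds 4, and stores it in eax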

_asm {

//initialization
push 10			//a value, i
push 0			//a pointer, pi

lea eax, [esp + 4]	//store the address of i in eax
mov [esp], eax		//copy the address of i into pi

//operations
mov ebx, [esp] 		//dereference esp, a value on the stack, and get the value
//of pi - the address of i
add dword ptr [ebx], 7	//dereference pi and add 7 to i

//[esp+4] now == 17

add esp, 8		//clean the stack
}

So these examples pointed to memory on the stack, but the same process goes for memory on the heap as well, that memory is just allocated with a function rather than push.
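To make the heap case concrete, here is a small C++ sketch of the same "add 7 through a pointer" operation where the int lives on the heap. The function names are my own, picked just for this example; the second version uses std::unique_ptr, which releases the memory for us.

```cpp
#include <memory>

// The same "+ 7 through a pointer" operation, but the int lives on the
// heap instead of the stack.
inline int addSevenOnHeap() {
    int* pi = new int(10); // pi itself is on the stack; the int(10) is on the heap
    *pi = *pi + 7;         // dereference: follow the address, then read/write the int
    int result = *pi;
    delete pi;             // without this the heap int would be leaked
    return result;
}

// The same thing with std::unique_ptr, which calls delete automatically
// when the pointer variable leaves scope.
inline int addSevenOwned() {
    auto pi = std::make_unique<int>(10);
    *pi += 7;
    return *pi;
}
```

In both versions the pointer variable is an ordinary stack variable, and the only way to reach the heap value is through the address it holds.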

Logic and Math

Probably the key aspect of programming is the use of if-statements. An if-statement can be a long line of code, and obviously that means it turns into lots of assembly statements. What you may not realize, though, is that a single C++ if-statement often turns into multiple assembly if-statements. So let’s examine what an if-statement is in assembly.

Consider the following C++:

int x = 15;
if( x == 10 ) ++x; 

Can result in the following assembly:

_asm {

push 15 // x

cmp [esp], 10		//compare x with 10 and store a comparison flag.
jne untrue		//jump based on the comparison flag,
//in this case, ne, jump if not-equal.
//the jump skips the addition. If x was 10, it would be executed.

add [esp], 1		//increment x by 1
untrue:
//proceed whether it was evaluated true or not
}

If-statements generate multiple jumps when there are logical operators in the statement. This results in an if-statement property called short-circuiting. Short-circuiting is a process in which only some of the if-statement may be executed. As soon as one logical operation settles the outcome, for example the first failure in a chain of ands, the subsequent logical operations are not evaluated at all.

An example with the and logical operator:

int x = 10, y = 15;
if( x == 10 && y == 12) ++x; 
_asm {

push 10 // x
push 15 // y

//the first statement
cmp [esp+4], 10			//if the current logic operation evaluates
jne untrue			//the statement to false, then all other
//operations are skipped.
//in our case it was true so we continue to
//the second statement
cmp [esp], 12			//this comparison is false, so we jump
jne untrue

//only evaluated if both and operands were true
add [esp+4], 1			//increment x by 1
untrue:
//proceed whether it was evaluated true or not
}


In the above example, if x had been != 10, then y == 12 would not have been evaluated.

An example with the or logical operator.

int x = 10, y = 15;
if( x == 10 || y == 12) ++x; 
_asm {
push 10 // x
push 15 // y

cmp [esp+4], 10
je istrue			//if the first statement is true, no need to
//evaluate the second

cmp [esp], 12			//if the first statement evaluated false,
jne untrue			//check the next statement

istrue:
//evaluated if either or operand was true
add [esp+4], 1			//increment x by 1
untrue:
//proceed whether it was evaluated true or not
}


Logical order of operations is going to be left to right, which ensures that short-circuiting works as expected.
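You can watch short-circuiting happen from C++ without reading any assembly. In this sketch, Probe and the two helper functions are hypothetical names of mine; the right-hand operand counts how many times it actually gets evaluated.

```cpp
// Counts how many times the right-hand operand gets evaluated.
struct Probe {
    int calls = 0;
    bool check(int y) { ++calls; return y == 12; }
};

// if( x == 10 && y == 12 ): the right side only runs when x == 10
inline int rightSideCallsForAnd(int x, int y) {
    Probe p;
    if (x == 10 && p.check(y)) { /* the body does not matter here */ }
    return p.calls;
}

// if( x == 10 || y == 12 ): the right side only runs when x != 10
inline int rightSideCallsForOr(int x, int y) {
    Probe p;
    if (x == 10 || p.check(y)) { /* the body does not matter here */ }
    return p.calls;
}
```

With x = 10 the and version has to evaluate the right side, while the or version skips it; with any other x it is the other way around.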

Back to the top, we had the following equation:

r = x + x * y / z - z * x

This entire equation needs to be evaluated, and as you realize, this is a bunch of assembly steps. The order in which the steps execute is almost certainly not strictly left to right, but some sort of logical order-of-operations which is also optimized to be speedy and efficient.

The equation has four steps:

a = + x*y/z
b = + x
c = + z*x

result = a + b  - c

Again, the order is going to respect mathematical order of operations, and try to run efficiently on the computer. Here’s how it may turn out:

_asm {

//setup
//all of these variables are integers and the operations are integer
push 10	//x
push 15	//y
push 2		//z

//cache x, it's used 3 times: ebx = x
mov ebx, [esp+8]

//eax = x*y/z
mov eax, ebx
mul [esp+4]
div [esp]

//ecx = eax + x
mov ecx, eax
add ecx, ebx

//eax = z*x
mov eax, [esp]
mul ebx

//finally, subtract: perform ( x*y/z + x ) - ( z*x ) = ecx
sub ecx, eax

//result is in ecx

}
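For comparison, here are the same steps written as plain C++, one per line. This is just a sketch to check the arithmetic by hand; evaluate is a name I made up, and a real compiler is free to order and combine the steps differently.

```cpp
// The same integer arithmetic as the assembly listing, one step per line.
inline int evaluate(int x, int y, int z) {
    int a = x * y / z; // multiply first, then integer-divide: (x*y)/z
    int b = a + x;     // the running total, kept in ecx in the listing
    int c = z * x;     // the term that gets subtracted
    return b - c;
}
```

With x = 10, y = 15 and z = 2 this gives 150 / 2 = 75, then 75 + 10 = 85, then 85 - 20 = 65, which is what ends up in ecx.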

On a final note, if you want to get output from these examples into C++, here’s an easy way for Visual Studio:

#include <iostream>

int main(int argc, char**argv){

int result =0;

_asm{
mov eax, 5
mov dword ptr[result], eax
}

std::cout << result;
std::cin.get();

return 0;
}


That’s it for this post! In the future I’ll mostly post about C++ programming itself. I would love to hear your comments below! Good Luck!

## Introduction to Data Oriented Design

Object oriented programming is the typical go-to method of programming, and it is super useful, especially in a program’s “sculpting”/exploration or design phase. The problem with it, however, is that modern computer hardware has evolved and typical OOP practices have not. The main issues are that hardware is getting more parallelized and cache-centric. What this means is that programs benefit when they can fetch data from the cache rather than RAM, and when they can be parallelized. Typical Object Oriented Programming, it turns out, does little to take advantage of these hardware concepts, if indeed it does not oppose them.

Data Oriented Design is a design philosophy being popularized in video game programming, where there is a lot of opportunity for both parallelization and cache consideration. I think this concept can be useful to some extent everywhere, however, and a programmer should be aware of what it is and how to make use of it. The cache is far faster than RAM, yet by default programmers give it no consideration at all. This should change. Unfortunately, a lot of popular languages, developed and evolved under the assumption of infinite resources, a logical fallacy, offer little to no support for cache-centric programming (Java/.Net). Fortunately, in C++, we have a future. Working with DoD is going to be beneficial until (if) computer hardware ever becomes unified and the “cache” becomes gigabytes in size. Until then, however, we must work with both RAM and Cache in our designs, if they are to be performant.

To start, let’s look at typical OOP and what sort of performance characteristics it has.

First off, the simple Object and an Array of them:

class Object{
public:
int x, y, z;
float a, b, c;
int* p;
};
std::vector<Object> objects;


In this example, we have various variables composed into an object. There are several things going on here:

a) The Object is stored in a contiguous segment of memory, and when we go to reference a given variable, ( x, y, z, a, b, c, p ), the computer goes out to fetch that data. If it is not already in the Cache, it gets it from RAM, along with memory near to it, and puts it into the Cache. This is how the Cache fills up: with recently referenced data and data near to it. Because the cache is limited in size, it periodically writes segments back to RAM or retrieves new segments from RAM. This read/write process takes time. The key concept here is that it also fetches *nearby* data, anticipating that you may be planning to access it sequentially or that it may be related and soon used.

b) The variable, (p), is a pointer. When this variable is dereferenced, the cache is additionally filled with the pointed-to data (and data near it). Fetching memory through a pointer may therefore cause extra cache filling/flushing on top of what the pointer itself costs.

c) Considering the vector, objects: it is a contiguous segment of memory, and accessing non-pointer data sequentially is ideal. When element n is referenced, there is a good chance that its neighbours, elements n-1, n+1 and so on, were pulled into the cache at the same time for speedy access. Because the vector is contiguous, all of the bytes of every object are stored in sequence. The number of objects that fit in the Cache is going to depend on the size of the Object in bytes.

We will examine an example application comparing the performance of OOP and DOD.

To begin, let’s create a more realistic OOP object to make a comparison with:

class OopObject {
public:
Vector3 position, velocity;
std::string name;

//initialize with some random values;
position = reals(random);
velocity = reals(random);

}

void act() {
//this is a function we call frequently on sequences of object
position += velocity;
}
};



OK, and now the unit test, using GoogleTest:

TEST(OopExample1, LoadAndAct) {

std::vector<OopObject> objects;

objects.resize(numObjects);

for (auto& o : objects) {
}

});

time("act", [&]() {

for (auto& o : objects) {
o.act();
}

});

float avg = 0;
time("avg", [&]() {

float avg1 = reals(random);

for (auto& o : objects) {
}

avg1 /= objects.size();

avg += avg1;
});
std::cout << avg;

EXPECT_TRUE(true);
}


We interact with this object in a typical OOP fashion, we create a vector, dimension it, loop through, initialize the elements, then we loop through again and do the “act” transform, and then we loop through again and do an “avg” calculation.

To start programming DoD, we need to consider the Cache rather than using it by happenstance. This is going to involve organizing data differently. In Object Orientation, all data is packed into the same area and therefore is always occupying cache, even if you only care about a specific variable and no others. If we are iterating over an array of objects and only accessing the same 20% of data the whole time, we are constantly filling the cache with 80% frivolous data. This leads to a new type of array and a typical DoD Object.

The first thing we do is determine what data is related and how it is often accessed or what we want to tune towards, then we group that data together in their own structures.

The next thing we do is create vectors of each of these structures. The DoDObject is in fact a sort of vector manager. Rather than having a large vector of all data, we have series of vectors with specific data. This increases the chances of related data being next to each other, while optimizing the amount that’s actually in the cache because there’s no space taken by frivolous data.

class DoDObject {
public:

struct Body {
Vector3 position, velocity;
};

struct Shape {

};

struct Other {
std::string name;
};

std::vector< Body > bodies;
std::vector<Shape > shapes;
std::vector<Other> others;

DoDObject(int size) {
bodies.resize(size);
shapes.resize(size);
others.resize(size);
}

for (auto& b : bodies) {
b.position = reals(random);
b.velocity = reals(random);
}

for (auto& s : shapes) {
}

int i = 0;
for (auto& o : others) {
}
}

void act() {
for (auto& b : bodies) {
b.position += b.velocity;
}
}
};
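If you want something that compiles in isolation, here is a stripped-down sketch of the two layouts side by side. It is not the benchmark code from this post: Vec3, Entity, Entities and xAfterTwoSteps are hypothetical names, and the only thing it measures is how many bytes the "act" loop has to step over per element.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Vec3 { float x = 0, y = 0, z = 0; };

// OOP layout: every element drags its name along with it
struct Entity {
    Vec3 position, velocity;
    std::string name;
};

// DoD layout: hot data (bodies) is stored apart from cold data (names)
struct Entities {
    struct Body { Vec3 position, velocity; };
    std::vector<Body> bodies;
    std::vector<std::string> names;

    explicit Entities(std::size_t n) : bodies(n), names(n) {}

    // only touches the bodies vector, so no name bytes enter the cache
    void act() {
        for (auto& b : bodies) {
            b.position.x += b.velocity.x;
            b.position.y += b.velocity.y;
            b.position.z += b.velocity.z;
        }
    }
};

// Bytes stepped over per element by the "act" loop in each layout
constexpr std::size_t oopStride = sizeof(Entity);
constexpr std::size_t dodStride = sizeof(Entities::Body);

// Small usage example: move one body twice and report its x position
inline float xAfterTwoSteps() {
    Entities e(2);
    e.bodies[0].velocity.x = 1.5f;
    e.act();
    e.act();
    return e.bodies[0].position.x;
}
```

The exact sizes depend on the compiler and standard library, but dodStride is always smaller than oopStride because the string is no longer part of the element, so more bodies fit into each cache line.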



Let’s look at the results. The complete source file is at the end.

(AMD Ryzen 7 1700)

(AMD FX 4300)

(Intel(R) Core(TM) i7-4790K)

DOD has won on every test. This is a simple example program, too. Actual programs are going to see even larger gains.

Here’s the complete source of this example: (GoogleTest C++)

#include "pch.h"

#include <vector>
#include <chrono>
#include <string>
#include <random>

std::uniform_real_distribution<float> reals(-10.0f, 10.0f);
std::random_device random;

int numObjects = 100000;

struct Vector3 {
float x, y, z;

Vector3() = default;
Vector3(float v) : x(v), y(v + 1), z(v + 2) {}

Vector3& operator+=(const Vector3& other) {
x += other.x;
y += other.y;
z += other.z;
return *this;
}
};

class OopObject {
public:
Vector3 position, velocity;
std::string name;

//initialize with some random values;
position = reals(random);
velocity = reals(random);

}

void act() {
//this is a function we call frequently on sequences of object
position += velocity;
}
};

class DoDObject {
public:

struct Body {
Vector3 position, velocity;
};

struct Shape {

};

struct Other {
std::string name;
};

std::vector< Body > bodies;
std::vector<Shape > shapes;
std::vector<Other> others;

DoDObject(int size) {
bodies.resize(size);
shapes.resize(size);
others.resize(size);
}

for (auto& b : bodies) {
b.position = reals(random);
b.velocity = reals(random);
}

for (auto& s : shapes) {
}

int i = 0;
for (auto& o : others) {
}
}

void act() {
for (auto& b : bodies) {
b.position += b.velocity;
}
}
};

template<typename Action>
void time(const std::string& caption, Action action) {

std::chrono::microseconds acc(0);

//each action will be executed 100 times
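for (int i = 0; i < 100; ++i) {

auto start = std::chrono::high_resolution_clock::now();

action();

auto end = std::chrono::high_resolution_clock::now();

acc += std::chrono::duration_cast<std::chrono::microseconds>(end - start);
}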
std::cout << caption << ": " << (acc.count() / 100) << "\t";

};

DoDObject dod(numObjects);

});

time("act", [&]() {
dod.act();
});

float avg = 0;
time("avg", [&]() {

float avg1 = reals(random);

for (auto& o : dod.shapes) {
}

avg1 /= dod.shapes.size();

avg += avg1;

});
std::cout << avg;

EXPECT_TRUE(true);
}

std::vector<OopObject> objects;

objects.resize(numObjects);

for (auto& o : objects) {
}

});

time("act", [&]() {

for (auto& o : objects) {
o.act();
}

});

float avg = 0;
time("avg", [&]() {

float avg1 = reals(random);

for (auto& o : objects) {
}

avg1 /= objects.size();

avg += avg1;
});
std::cout << avg;

EXPECT_TRUE(true);
}


## Division By Zero Exploration pt 2

This is a continuation of an earlier post called, Division By Zero Exploration, which was a continuation of an even earlier post, Division By Zero.

In the exploration post, I concluded that, if dividing by zero is valid, then, “it is basically possible to make anything equal anything”. Further thought has led to an additional conclusion: one does not necessarily equal one. This becomes evident from the basic properties of Q and 0, and specifically, infinity times zero equals one, or Q*0=1. If this holds true, then one does not necessarily equal one. Here we go.

What I mean is that: one may not equal one, all the time. This is because we can rewrite Q*0 = 1 an infinite amount of ways by adding additional zeros or Qs. ( At least, if 0*0 = 0 and Q*Q=Q). The way the expression is factored causes it to have a potential series of possible answers, some true, and others false.

There are two basic expansions, each which results in two different possible outcomes, for a potential of three answers. The first is Q*0*0. If it holds true that 0*0 always equals zero, then this should be a rational procedure. This results in two possible factorings: (Q*0)*0 and Q*(0*0). The first factoring equals 1, the second equals 0. The other basic expansion is Q*Q*0. This ends up resulting in either 1 or Q.

When we add in the x variable, Q*0*X, we get outcomes 1 or X.

Using this process, I go back to the previous example of a=b and start with a+b=b. From here, there are a lot of potential procedures, but the simplest path to a=b is two steps, I think.

$a + 1*b = b$

$a + Q*0*b = b$

Q*0*b resolves to 1 which resolves to Q*0*0

$a + Q*0*0 = b$

Q*0*0 resolves to 0

$a + 0 = b$

Of course, there are other outcomes from this procedure that are false. These are the other generated outcomes:

$a + b = b$

$a + 1 = b$

This leads to another conclusion, zero does not necessarily equal zero, nor does Q equal Q.

I think inventing math is my new hobby, comments are appreciated.

## Structured Bindings and replacing by reference

C++17 has Structured Bindings. They take an object, often a temporary, and give its members local names.

struct S{
int a{5};
float b{3.2};
};

auto [x,y] = S();


What happens is that a hidden S object is created from the initializer, and x and y become local names for its members. x now equals 5 and y equals 3.2. The hidden object lives exactly as long as the bindings do.

Since functions only can return one value normally, to get additional values out, we use by-reference variables.

void f( int& a, float& b ){
a = 10; b = 3.2f;
}


This leads to a programming style where we do two things:

1. create variables to hold the return values before the call
2. pass the variables to the function as parameters.

int x=0;
float y=0;
f( x, y );
//x == 10; y == 3.2f;


At a glance, it is not apparent at the call site that the values are not only passed by reference, which is often done for performance, but are also outputs that come back from the function changed.

Structured Bindings provides a solution to this potential issue:

auto f(){
int a{10};
float b{3.2f};

return std::make_tuple(a,b);
}

auto [x,y] = f();


Since tuples can have as many parameters as we want, they are an easy way to accomplish creating the return structure for the binding.

In C++11 there is a less effective way to do return type Binding:

int x; float y;
std::tie( x, y ) = f();


Structured Bindings also works with arrays, and can be used to create variable aliases as well.

auto f(){
int a{10};
float b{3.2f};

return std::make_tuple(a,b);
}

auto f2( const int& i ){
return std::make_tuple(std::ref(i));
}

auto [x, y] = f();

auto [r] = f2(x);

++x;

x = r + 1;
//r is a reference to x, x = 12
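Here is a short sketch of the other two cases mentioned: binding to an array, and binding with auto& to alias the members of an existing object. Point, sumArray and moved are names I invented for the example.

```cpp
struct Point { int x; int y; };

// Binding to an array with plain auto copies the elements
inline int sumArray() {
    int values[3] = {1, 2, 3};
    auto [a, b, c] = values;  // a, b and c are copies of the elements
    a = 100;                  // does not change values[0]
    return values[0] + b + c; // 1 + 2 + 3
}

// Binding with auto& creates aliases to the original members
inline Point moved(Point p) {
    auto& [px, py] = p; // px and py refer to p.x and p.y
    px += 10;
    py += 20;
    return p;
}
```

The difference between auto and auto& here is the same as for any other variable: one makes a copy, the other refers to the original.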


This does not mean that by-reference should never be used – it should be where appropriate. What it means is that if a variable is just a return argument, and not also a function argument, then maybe it should be passed out via a Structured Binding and not passed in by reference.

## Asymptotic Movement

I discovered a GDC talk on a topic called: Asymptotic Averaging. This is sort of a form of interpolation, and can be used to eliminate jerky motions – in particular, they focus on camera movement. Naturally, when you want to change a camera’s direction, you just set the direction. However, if you are doing this frequently or if the distance is large, it results in a jerk or jump which can be disorienting to some extent. This is demonstrated easily by having a camera follow a player character.

Here’s a video of a jerky camera from my current project:

Asymptotic means always approaching a destination but never reaching it. In math it occurs as a curve approaches an asymptote but never actually reaches it. Besides being easy to implement, we get a curve to our movement as well.

The implementation takes the following form:

//changeMultiplier = 0.01f;

void translate( const Vector3& destination ){
offsetPosition = destination - camera->currentPosition;
}

void reduce(Vector3& position) {

position = offsetPosition;

position *= changeMultiplier;

offsetPosition *= (1 - changeMultiplier);
}

void update(){
Vector3 position;
reduce(position);
camera->translate( position );
}


In this variation we are working with a translation. What happens is offsetPosition is set to the translation offset (dest-current). Then each update (frame) we call reduce, which returns a smaller amount of the translation to perform each time. This translation continues to shrink and approaches zero.
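To see the numbers shrink, here is a one-dimensional sketch of the same idea. AsymptoticMover and positionAfter are illustrative names of mine, and I use a changeMultiplier of 0.5 instead of 0.01 so that the steps are easy to follow by hand.

```cpp
// One-dimensional version of translate/reduce. All names are illustrative.
struct AsymptoticMover {
    float current = 0.0f;          // where the camera is now
    float offset = 0.0f;           // translation still to be applied
    float changeMultiplier = 0.5f; // fraction of the remainder applied per update

    void translate(float destination) { offset = destination - current; }

    // Applies a fraction of the remaining offset and keeps the rest for later
    void update() {
        float step = offset * changeMultiplier;
        offset *= (1.0f - changeMultiplier);
        current += step;
    }
};

// Moves from 0 towards 8 and reports the position after a number of updates
inline float positionAfter(int updates) {
    AsymptoticMover m;
    m.translate(8.0f);
    for (int i = 0; i < updates; ++i) m.update();
    return m.current;
}
```

Moving from 0 towards 8, the first three updates land on 4, 6 and 7. After n updates the remaining distance is the original offset times (1 - changeMultiplier) to the power n, which keeps shrinking but, mathematically, never reaches zero.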

Using variations of this, it is easy to produce Trailing and Leading Asymptotic Movement.

Here’s a vid of Trailing Camera Rotation + Trailing Camera Movement:

Here’s a vid of Leading Camera Rotation + Trailing Camera Movement:

## Fold Expressions and Lambda Overloads

C++17 has a powerful template feature, Fold Expressions, that combines with a C++11 feature, the Parameter Pack. Together they let us do away with C-style variable arguments (variadic functions). I also introduce Lambda Overloading, which works with these two concepts in a wonderful way.

First off, the Parameter Pack. This is a template feature that allows a variable number of parameters to be passed to the template. Here’s a sum function declaration:

template<typename ...Args>
auto sum(Args... args);


In the template parameter list, Args is a pack of zero or more type parameters. In the function parameter list, args is the matching pack of function arguments, one for each type in Args. Inside the function body, the whole pack is referred to as args.

template<typename ...Args>
auto sum(Args... args) {

return (args + ...);
}


A fold is contained within parentheses. The elements of the pack, args, appear in order from leftmost to rightmost, and each one is combined with the next using some operator, in this case addition. Finally, the ellipsis signals that the operation repeats for every element. Written as (args + ...), this is a unary right fold, so the elements are grouped from the right.

int i = sum(0,3,5);


Expands to the following, at compile-time:

return (0 + (3 + 5));
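As a small compilable sketch of the fold, here is sum again next to a variant that also accepts an empty pack. The name sumOrZero is made up for this example; it uses a binary fold with an initial value of 0.

```cpp
#include <string>

// Unary right fold: (a + (b + (c + ...))). Needs at least one argument.
template <typename... Args>
auto sum(Args... args) {
    return (args + ...);
}

// Binary left fold with an initial value, so an empty pack is valid too.
template <typename... Args>
auto sumOrZero(Args... args) {
    return (0 + ... + args);
}
```

Because the fold only needs operator+, sum also works for types such as std::string, while sumOrZero is tied to numeric types by its initial value.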


So what we’re going to do is write a more modern printf sort of function without using a C variadic function. There are a lot of ways to do this, but because I love lambdas, I want to include lambdas as well. Here we go:

So, in order to vary the generation of strings based on the type of the input parameter, at compile-time, we need to use additional templates. The first thing necessary was the ability to do function overloading with lambdas, hence the LambdaOverload class. This class inherits from a bunch of lambdas and pulls each one’s call operator into a single overload set under one name.

template<typename... Bases>
struct LambdaOverload : public Bases... {
using Bases::operator()...;

LambdaOverload(const Bases&... bases) : Bases(bases)... { }
};

template<typename ...Args>
void new_printf(Args... args) {

LambdaOverload deduce(
[](const char* c) -> std::string { return c; },
[](const std::string& str)->const std::string& { return str; },
[](const auto& t) { return std::to_string(t); }
);
std::string o = (( deduce(args) + " " ) + ...);
OutputDebugStringA(o.c_str());
std::cout << o  << std::endl;
}


Next, inside the new_printf function, we define the deduce LambdaOverload object. This object routes each data type to the code that converts it to a string. A few specific types are accepted, and one catch-all is used for native types that work with the std::to_string function. Note that if the catch-all receives a type that doesn’t work with std::to_string, a compile error will be generated. Supporting another type means adding another lambda to deduce, for example:

 [](const MyClass& c) ->std::string { return c.toString(); }


The call ends up looking like so:

new_printf("the value of x: ", int(x), std::string("and"), 0.1f );
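For a portable version to experiment with, here is a minimal sketch of the same technique that returns the string instead of printing it. The names joinAsText and toText are made up for this example, and the pack must contain at least one argument because the fold over + has no value for an empty pack.

```cpp
#include <string>

// Inherits from every lambda and exposes all of their call operators as one overload set.
template <typename... Bases>
struct LambdaOverload : Bases... {
    using Bases::operator()...;
    LambdaOverload(const Bases&... bases) : Bases(bases)... {}
};

// Hypothetical helper: builds the text instead of printing it, so the result is easy to check.
template <typename... Args>
std::string joinAsText(Args... args) {
    LambdaOverload toText(
        [](const char* c) -> std::string { return c; },
        [](const std::string& s) -> std::string { return s; },
        [](const auto& t) -> std::string { return std::to_string(t); }); // catch-all
    // Convert each argument, append a space, and concatenate everything.
    return ((toText(args) + " ") + ...);
}
```

The two non-template lambdas win overload resolution for const char* and std::string, and everything else falls through to the generic lambda.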


## Division By Zero Exploration

This post isn’t so much about programming as it is a follow-up to an earlier post about Division By Zero. I thought: if the reciprocal of 1 / 0, which is 0 / 1, is valid, then maybe division by zero can be postponed and considered later, similar to the imaginary unit, i or sqrt(-1).

Using just fraction concepts I managed to make a lot of progress. Not being a mathematician, I don’t know how much merit there is to it, but here we go.

First, the assumptions:

$0 * 1 = \frac{0}{1}* \frac{1}{1} = \frac{0}{1} = 0$

$0 / 1 = \frac{0}{1} * \frac{1}{1} = \frac{0}{1} = 0$

$1 / 0 = \frac{1}{1} * \frac{1}{0} = \frac{1}{0} = Q$

$Q * Q = \frac{1}{0} * \frac{1}{0} = \frac{1}{0} = Q$

$Q / Q = \frac{1}{0} * \frac{0}{1} = \frac{0}{0}$

$0 / 0 = \frac{0}{1} * \frac{1}{0} = 0 * Q$

$0 * Q = 1 \text{ : } Q = \frac{1}{0}$

$1 / Q = \frac{1}{1} * \frac{0}{1} = \frac{0}{1} = 0$

$Q * 1 = \frac{1}{0} * \frac{1}{1} = \frac{1}{0} = Q$

$Q * x = \frac{1}{0} * \frac{x}{1} = Q * x$

$Q / x = \frac{1}{0} * \frac{1}{x} = \frac{1}{0} = Q$

$1 / 0 = Q$

$0 / 0 = 1$

OK, so now something that I found immediately useful to do with Q, and that seems to make sense. This is also what led me to think up Q.

$y= \frac{1}{x} \text{ ; x = 0 } \\\\Q = \frac{1}{x}\\\\\frac{1}{Q} = x \\\\ 0 = x$

Consider the following:

$\frac{0}{x} = 0 * x$

True, 0 = 0; now considering the reciprocal:

$\frac{x}{0} = x * Q$

how does this resolve?

$x = x * Q * 0 \\\\x = x$

What this leads to is another assumption: algebraically, in the same way that Q can be set aside and considered later, so can zero.

Consider:

$a = b\\\\a^2 = ba\\\\a^2-b^2 = ba-b^2\\\\(a-b)(a+b)=b(a-b)\\\\\frac{(a-b)}{(a-b)}(a+b)=b\frac{(a-b)}{(a-b)}\\\\(a+b)=b\\\\(\frac{a}{a}+\frac{b}{a})a=b\\\\0*(\frac{a}{a}+\frac{b}{a})a=0*b$

Distribute zero once and then multiply both sides by Q (divide by zero):

$Q*0*a=Q*0*b\\\\a = b$

Now – I realize – if I had distributed the zero to the a instead, the answer would have come out incorrect.

Once it is possible to divide by zero and turn a zero into a one, it seems possible to produce lots of wrong answers, and some correct ones as well. Which one you get seems to depend on trial and error, perhaps, or on additional processing of some sort.

So, I’ve been messing around with this concept using my limited math skills, and sometimes Q works and sometimes it doesn’t. Sometimes there’s one good solution and an infinity of wrong ones, or the opposite, or some random mix. I just don’t know. At the end of the day, sometimes it works, so maybe there is merit.

What I’ve concluded is that a consequence of this concept is that it’s possible to make basically anything equal anything, and that seems like an odd thing to consider.
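As a concrete instance of that last point, here is a short worked example that uses only the rules from above, namely 0 * x = 0 and 0 * Q = 1. The numbers 2 and 3 are an arbitrary choice; any two numbers would do.

$0 * 2 = 0 * 3 \\\\Q * 0 * 2 = Q * 0 * 3 \\\\1 * 2 = 1 * 3 \\\\2 = 3$

Both sides of the first line are zero, so it starts out true, and each later step follows the rules above, yet the last line is false.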

## Super Simple Lambda Thread Pool

I wanted to quickly create a thread pooler that could be passed lambdas. Turns out it’s super simple. Here are the main things I wanted to accomplish:

• pass lambdas as tasks to be executed asap.
• be able to wait for all tasks to execute
• create a bunch of threads at initialization and have them wait around for tasks, since it can be time-costly to create a new thread on the fly.
• be able to create as many poolers as I want for different uses.

The first thing I needed to be able to do was store a bunch of lambdas in a single vector, and any sort of lambda. I decided the easiest way to do this was to use lambda members (captures) for the task parameters. This is what an add task call could look like:

int x = 0, y = 0, z = 0;

mThreadPool.addTask([x, y, z]() {
//task on x, y, z ...
});


So in a loop, I’d call, mThreadPool.addTask a bunch of times, and then I’d have to wait on them like so:

mThreadPool.wait();


To store the task lambdas inside the ThreadPool, I used an std::function:

typedef std::function<void()> Task;


This works because the task receives its parameters from the lambda’s own members (its captures), not from the lambda’s call operator, so every task has the same void() signature. There has to be a mutex because multiple threads are going to access the mTasks vector.

I wanted to create the entire thread pool at one time, so there are initialize and shutdown functions:

std::vector<std::thread> mThreads;
std::atomic<bool> mWorking{ false };

void initialize() {

mWorking = true;
}
}
void stop(){
mWorking=false;
}
void shutdown() {
stop();
}
}


The threadWork function is where threads wait for tasks to perform and execute them as they become available:

std::atomic<bool> mWorking{ false };

while (mWorking) {

{

}
}

}
else
}
}


The number of tasks currently executing is kept track of with mTasksExecutingCount. When the ThreadPool is waiting, it waits until the count is zero and the mTasks.size is zero.

class ThreadPool {

std::atomic<bool> mWorking{ false };

while (mWorking) {

{

}
}

}
else
}
}

public:

void initialize() {

mWorking = true;
}
}

}
void wait() {
do {
{
}

} while (mWorking);
}
void stop() {
mWorking = false;
}
void shutdown() {
stop();
}
}
};
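The description above can be read as the following self-contained sketch. It is not the original class: the mMutex name and the threadCount parameter are assumptions, tasks are taken from the back of the vector for simplicity, and idle threads just yield instead of sleeping on a condition variable.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    using Task = std::function<void()>;

    // Create all worker threads up front so none are created on the fly.
    void initialize(std::size_t threadCount) {
        mWorking = true;
        for (std::size_t i = 0; i < threadCount; ++i) {
            mThreads.emplace_back([this] { threadWork(); });
        }
    }

    void addTask(Task task) {
        std::lock_guard<std::mutex> lock(mMutex);
        mTasks.push_back(std::move(task));
    }

    // Block until the queue is empty and no task is still executing.
    void wait() {
        while (mWorking) {
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (mTasks.empty() && mTasksExecutingCount == 0) return;
            }
            std::this_thread::yield();
        }
    }

    // Stop the workers and join them; tasks that have not started are dropped.
    void shutdown() {
        mWorking = false;
        for (auto& t : mThreads) t.join();
        mThreads.clear();
    }

    ~ThreadPool() { shutdown(); }

private:
    void threadWork() {
        while (mWorking) {
            Task task;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (!mTasks.empty()) {
                    task = std::move(mTasks.back());
                    mTasks.pop_back();
                    ++mTasksExecutingCount; // counted while the lock is still held
                }
            }
            if (task) {
                task();
                --mTasksExecutingCount;
            } else {
                std::this_thread::yield(); // nothing to do right now
            }
        }
    }

    std::vector<std::thread> mThreads;
    std::vector<Task> mTasks;
    std::mutex mMutex;
    std::atomic<int> mTasksExecutingCount{0};
    std::atomic<bool> mWorking{false};
};
```

Incrementing the executing count while the lock is still held is what lets wait() trust its check: a task is always either in the queue or counted as executing, never in between.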



## Division By Zero

So we all know that dividing by zero is an error in programming, but do you know why?

It turns out that it is not really an error as much as a math problem, and not so much a problem as a thing. Here’s how it goes, according to my understanding:

• A function y= 1 * x has a domain of all real numbers
• A function y= 1 / x has a domain of x != 0, zero is an asymptote

If you look at the graph of the function y = 1 / x and think about it, 1 / x grows without bound as x moves toward zero (toward positive infinity from the right and negative infinity from the left), but the curve never actually reaches x = 0. Obviously we can write y = 1 / 0, but as an actual operation it doesn’t make sense, as 0 is outside the domain of 1 / x. Why it is outside the domain of 1 / x, I’m not entirely sure, but it is, and that’s how it is. At the end of the day, it seems that x = 0 and y = 1 / x each draw a separate, non-intersecting line.

y = 1 / x and x = 0

https://www.desmos.com/calculator/2bw3hvkpap

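In C++, integer division by zero is undefined behavior. If the compiler can prove that you’ve done a / 0, it is allowed to assume that code is never reached, and it may make the entire thing a no-op. If it does, it may not tell you, and this could be a source of a bug. If you hit a / 0 at run-time instead, you may get a crash, such as a hardware trap (SIGFPE on Linux, for example), rather than a C++ exception you can catch. Floating-point division by zero is a different story: on IEEE 754 platforms it typically produces infinity or NaN.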

With the way the compiler can optimize, it can reorder some source operations to affect performance. It’s possible for the compiler to place a divide by zero error before a statement you need or would expect to be executed. You would experience undefined behavior ( x / 0 ) at a place that is not indicated via source code, and that’s bad.

Avoiding this behavior requires a check to detect division by zero before the division is attempted.
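One way to package that check, as a minimal sketch: the safeDivide name is made up for this example, and std::optional (C++17) is just one choice for reporting the failure.

```cpp
#include <optional>

// Hypothetical helper: returns no value instead of dividing by zero.
inline std::optional<int> safeDivide(int numerator, int denominator) {
    if (denominator == 0) {
        return std::nullopt; // the caller decides how to handle this case
    }
    return numerator / denominator;
}
```

The caller then has to look at the result before using it, for example with has_value(), which keeps the zero case visible in the source code.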

## C++ lambda and how the world is different now

So, there was the world of C++ before the lambda. And now there’s the world of C++ since the lambda – and I’ve got to say, if you aren’t using lambdas, then you’re probably still in the C++ stone age. This post is going to be all about lambdas and how rad they are. Do Java or .Net have anything as cool? Nope. C++ is where it’s at, and you need to be in the Know.

First off, C++ always had some sort of lambda ability, with functors, and indeed modern lambdas are just functor sugar. But, it’s really nice sugar and the style has gone way past mere ancient functors.

So this is an example functor:

class Functor {

int a{ 0 };
public:

int operator()() {
return ++a;
}
};

int main()
{
Functor f;

f();
f();
f();

std::cout << "value of f: " << f() << std::endl;
//writes 4
}


An object that has a call operator can be used like a function. What may not be apparent, though, is that it can have member variables and its own instance. The instance could be ephemeral or it could persist for a long time.

Functors ended up being needed to use a lot of the algorithms in the STL, if not most of them, and they created a lot of slop.

int main()
{
std::vector<int> v{ 9,0,1,4,1,9,6,4,1 };

class MyCustomSorter{
public:
bool operator()(const int& a, const int& b) {
return a < b;
}
};

std::sort(v.begin(), v.end(), MyCustomSorter());
}


So now with lambdas we can do it like so:

 std::sort(v.begin(), v.end(), [](const int& a, const int& b)->bool {
return a < b;
});


What’s really cool about this line is that we have actually declared a class-like object inline in a function call, but with a lot less code. There are three different areas in which to pass things to a lambda. The first area the example uses is the lambda parameters area, or the parameters to the call operator.

operator()(const int& a, const int& b) == (const int& a, const int& b)


The second area it uses is the return type

bool == -> bool


The third area is the capture list, which passes in members of the lambda: stuff that persists until the lambda goes out of scope. We could do the following:

std::vector<int> v{ 9,0,1,4,1,9,6,4,1 };

std::sort(v.begin(), v.end(), [&](const int& a, const int& b)->bool {
return a < b;
});


The ampersand, &, in this case causes any variable from the enclosing scope that the lambda body uses to be captured by reference. So given that, we could have a lambda body as follows:

std::vector<int> v{ 9,0,1,4,1,9,6,4,1 };

ObjectA obja;

std::sort(v.begin(), v.end(), [&](const int& a, const int& b)->bool {

if (obja.method())
return false;

return a < b;
});


What happens in this case is that the lambda is compiled with an ObjectA& member, which is set at the lambda’s creation and remains until the lambda goes out of scope. The lifetime of these variables can sometimes be tricky, as it is possible for a lambda to outlive a variable it holds a reference to; using that dangling reference is undefined behavior and often a crash.
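To show the difference in lifetime, here is a minimal sketch that contrasts a by-value capture with a by-reference capture. The function names makeCounter and addTwice are made up for this example.

```cpp
#include <functional>

// Captures 'start' by value: the lambda owns its own copy, so it is safe to return.
inline std::function<int()> makeCounter(int start) {
    return [start]() mutable { return ++start; };
}

// Captures 'total' by reference: only valid while 'total' is still alive.
inline void addTwice(int& total, int amount) {
    auto add = [&total](int n) { total += n; };
    add(amount);
    add(amount);
}
```

Returning the lambda from makeCounter is fine because the count lives inside it. Returning a lambda that captured a local variable by reference would leave it pointing at a variable that no longer exists.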

What we have been given the ability to do here is pretty incredible: we can define objects inline, anywhere, as lambdas. Here’s a threading example, with a lambda inside a lambda:

int result=-1;

bool working=true;
int i = 0;

auto doWork = [&]()->int {
while (working) {
if (++i > 10) working = false;
}
return i;
};

result = doWork();

});
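Here is a minimal, self-contained sketch of a lambda inside a lambda running on another thread. It assumes std::thread launches the outer lambda, and the function name runOnWorker is made up for this example.

```cpp
#include <thread>

// Runs the work on a separate thread and returns the result once it has finished.
inline int runOnWorker() {
    int result = -1;

    std::thread worker([&result]() {
        bool working = true;
        int i = 0;

        // Inner lambda: captures the outer lambda's locals by reference.
        auto doWork = [&]() -> int {
            while (working) {
                if (++i > 10) working = false;
            }
            return i;
        };

        result = doWork();
    });

    worker.join(); // result is only safe to read after the thread has finished
    return result;
}
```

The join matters here: the outer lambda holds a reference to result, so the function must not return before the thread is done with it.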



So we can define functions inside of functions, which are basically private function methods, and that’s actually really useful. Really, this is an entirely new C++ paradigm. Consider the following as well:

	x++;
y++;
z++;
f1(x, y, z);

x++;
y++;
z++;
f1(x, y, z);
f2(x, y, z);

x++;
y++;
z++;
f1(x, y, z);


Could be improved like so:

auto f = [&]() {
x++;
y++;
z++;
f1(x, y, z);
};

f();

f();
f2(x, y, z);

f();


This is cool because on the fly, I can create a function that is specific to my current function and will never be used outside of it – absolutely no reason to pollute the external namespace. Yet I can clean up code and make it clearer. Comment what a block of code is doing? How about putting that specific block of code in a lambda, and suddenly everything is clearer – the block of code has a name that isn’t a mere comment (and the external namespace is pristine)! And you get the cool benefit of being able to call it multiple times or comment its call out (no reason to have huge swaths of commented-out code).

In the vein of private function methods, if-statements can sometimes gain a lot from them; consider the following:

for (auto& user : mUsers) {

auto isUserReady = [](auto& user)->bool {
//...//
return true;
};

if (isUserReady(user)) {
//...//
}
}


Again, we’re saving the external namespace so much pollution! Generally when working like this, I like to declare the lambda right before it’s going to be used, or near to it.

The only downside to this concept is potential code duplication – has this method already been written somewhere else? Could be hard to know all the time without getting a namespace exposure.

What’s become apparent is that we can pass lambdas as parameters to functions. This is hugely powerful, as we can write our own functions that accept lambdas.

Want the user to be able to essentially modify your function but without overriding it? Lambda to the rescue.

	template<typename Action>
void clean(const Action& deleteAction) {
for (auto& item : mItems) {
deleteAction(item);
}
mItems.clear();
}


Much of the STL, the algorithms in particular, works like this.
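For a version that compiles on its own, here is a minimal sketch of the clean pattern inside a small container. The ItemBin class and its int items are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container used only for this example.
class ItemBin {
public:
    void add(int item) { mItems.push_back(item); }
    std::size_t size() const { return mItems.size(); }

    // The caller decides what "deleting" an item means by passing a callable.
    template <typename Action>
    void clean(const Action& deleteAction) {
        for (auto& item : mItems) {
            deleteAction(item);
        }
        mItems.clear();
    }

private:
    std::vector<int> mItems;
};
```

A caller might pass something like [&sum](int& item) { sum += item; } to total the items as they are removed, without ItemBin knowing anything about sum.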

In C++ 20, lambdas are going to be further improved. From my current project, here is an example of a templated lambda from the C++ 20 preview.

    auto createHlms = [&]<typename T>(T* &hlms) {
//...//
T::getDefaultPaths(mainPath, libraryPaths);
//...///
hlms = OGRE_NEW T(archive, &archives);
//...///
};

Ogre::HlmsUnlit* hlmsUnlit = nullptr;
createHlms(hlmsUnlit);

Ogre::HlmsPbs* hlmsPbs = nullptr;
createHlms(hlmsPbs);


## Staggering Vector

So in my current game project, I want to have a somewhat real-time simulation in which stuff happens. This boils down to doing stuff periodically in a game loop. In my game, the game loop is running at 30 frames per second. I was having problems maintaining this rate, and I did not want to add multi-threading just yet. The solution turns out to be something popular in game programming, called time-slicing. The idea is that instead of doing all the work at once, each frame, you do it gradually over multiple frames. Right now I’m applying the concept to garbage collection, AI, and a problem I’m calling “Unit Vision”.

In my game, I’ve got a bunch of Units which need to do stuff such as walk around, attack, and spot each other. Before I implemented time-slicing, part of the game loop looked something like this:

for( auto& unit : mUnits ){
unit.act();
}



In testing with like 2000+ units, this ended up taking more time than I’d like, so onto time-slicing.

The idea is pretty simple: split the vector into multiple segments and only iterate over one segment per frame. This sounded like “staggering” to me, so I wrote up what I called a “Stagger Group” or “Stagger Vector”.

This is what the object’s data looks like:

template<typename T, typename Data = int>
class StaggerGroup {
public:
struct Group {
Data data{ 0 };
std::vector<T> group;
};
private:
std::vector< Group> mGroups;

std::size_t mGroupIndex{ 0 };


I found that a lot of times I wanted to associate something like a timer with a given vector, so I decided to go generic and stuck in a Data parameter with each vector. Then there is the actual set of groups, and an index to the active group.

The set of methods is pretty small at the moment:

	void setup(std::size_t numGroups, std::size_t poolSize = 500) {
mGroups.resize(numGroups);

for (auto& g : mGroups) {
g.group.reserve(poolSize);
}
}

void place(const T& value) {

std::vector<T>* selectedGroup = &mGroups[0].group;
for (auto& g : mGroups) {
if (g.group.size() < selectedGroup->size()) {
selectedGroup = &g.group;
}
}
selectedGroup->push_back(value);
}

template<typename Action>
void forEach(Action action) {
for (auto& g : mGroups) {
action(g);
}
}

Group* getNextGroup() {
if (mGroupIndex >= mGroups.size()) mGroupIndex = 0;
return &mGroups[mGroupIndex++];
}

std::size_t getSize() {
std::size_t size = 0;
for (const auto& g : mGroups) {
size += g.group.size();
}
return size;
}


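So setup is called and a fixed number of groups is created, each with reserved capacity. place finds the group with the fewest items and puts the new item in it. The real magic here is the getNextGroup method, which hands out one group per call and wraps around to the first group after the last. Here’s what it looks like in practice; the following code is called each game loop iteration: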

		mUnitGroups.forEach([timeSinceLastFrameInMS](auto& group) {

group.data.elapsedTime += timeSinceLastFrameInMS;
});

auto selectedGroup = mUnitGroups.getNextGroup();

float elapsedTime = selectedGroup->data.elapsedTime;
selectedGroup->data.elapsedTime = 0.0f;

for (auto& unit : selectedGroup->group) {

unit->act(elapsedTime );
}


I also use Staggering for garbage collection like so:

		auto cleanGroup = mPendingCleanGroups.getNextGroup();
for (auto& item: cleanGroup->group) {
deleteAction(item);
}
cleanGroup->group.clear();
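To see the round-robin and the per-group timer working together, here is a compact, self-contained sketch of the same idea. It is a simplified stand-in, not the class above: items are plain ints, the timer is a fixed float member instead of the Data parameter, the nextGroup method folds the forEach timer update into the group selection, and at least one group is assumed.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the stagger group, with int items for the example.
class SimpleStaggerGroup {
public:
    struct Group {
        float elapsedTime = 0.0f; // time accumulated since this group last ran
        std::vector<int> items;
    };

    // Assumes numGroups is at least 1.
    explicit SimpleStaggerGroup(std::size_t numGroups) : mGroups(numGroups) {}

    // Put the new item in whichever group currently holds the fewest items.
    void place(int value) {
        Group* smallest = &mGroups[0];
        for (auto& g : mGroups) {
            if (g.items.size() < smallest->items.size()) smallest = &g;
        }
        smallest->items.push_back(value);
    }

    // One call per frame: add the frame time to every group, then hand back
    // the next group in round-robin order so only its items are processed.
    Group& nextGroup(float frameTimeMs) {
        for (auto& g : mGroups) g.elapsedTime += frameTimeMs;
        if (mIndex >= mGroups.size()) mIndex = 0;
        return mGroups[mIndex++];
    }

private:
    std::vector<Group> mGroups;
    std::size_t mIndex = 0;
};
```

With three groups and a 10 ms frame, each group is visited every third frame and sees about 30 ms of elapsed time when its turn comes, which is why the elapsed time is passed on to the items instead of the raw frame time.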


The garbage collection is actually done in another type of object, Recycling Vector, which I’m going to post about next.