Programming Basics with C++ explained in Assembly

Like I said earlier, this post is a sort of introduction to programming, however, it is not a typical, “Hello World” introduction. We’ll be using C++ and inline Assembly to investigate what exactly is going on in a program and C++. The following will be explained:

  • what a C++ program is and looks like to a computer
  • what variables are and how the stack and memory works
  • what functions are and how they work
  • how pointers work
  • how math and logic works, such as when you evaluate an equation

What a C++ program is and what it looks like to the computer

C++ and any programming language is just a bunch of words and syntax used to describe a process for the computer to perform.

You write a bunch of stuff, and the computer does exactly what you commanded it to. That said, while a program is a series of words and syntax to us, to the computer, it ends up being quite different. Computers only understand numbers, so somewhere between when a program is written and when it is executed by the computer, it gets interpreted from the programming language into numbers – Machine Code.

For example, the following program:

r = x + x * y / z - z * x

Looks like this in Machine Code:

Programming languages exist for a several reasons:

  • to provide an alternative to writing in Machine Code
  • to make writing maintainable and manageable code bases possible
  • to allow programmers to express themselves in different ways
  • to accomplish a task with efficiency given a language customized to that task

Finally, the basic C++ program that writes, “Hello World!”, to the console screen:

int main(){
 	std::cout << "Hello World!";
	return 0;

In Machine Code, it looks like this:

C++ is quite the improvement.

The language that translates closely to Machine Code is Assembly Language. While both Machine Code and Assembly Language are unpractical to work with, Assembly Language is actually something people do use, and we are going to use it to link what’s going on in the computer to what’s going on in C++.

Before we can move into Assembly, however, we need to review some basics of how programs work.


As we know, computers work with numbers, and a program is ultimately a bunch of numbers. The numbers are stored in the the computer in a place called memory, and lots of them, more numbers than you could count in your lifetime. It can be thought of as an array, grid or matrix where each cell contains a number and is a specific location in memory.

A visualization of memory.

These cells exist in sequence and that sequence describes its location. For instance, we could start counting memory at cell 1, and go to cell 7, and each of those locations could be referred to by their sequence number: 1 to 7.

The numbers in memory can represent one of two things:

  • a Machine Code or operand used to execute a command
  • some sort of Variable – an arbitrary value to be used by, or that has been saved by, a program

In programming languages such as C++, variables are referred to using names rather than their sequence number, and Machine Code is abstracted away with verbose language.

A snippet of C++ that declares three variables and prints them out.

The same snippet in Assembly; for glancing only.

The Stack: a type of Memory for Variables

Whenever a Variable is declared, it gets allocated in an area of Memory called the Stack. The Stack is a special type of Memory which is essential to how a C++ program executes. When a Variable enters scope, it gets allocated onto the Stack, and then later, when it exits scope, it’s removed from the Stack. The Stack grows in size as Variables enter scope and shrinks as Variables exit scope.

In C++, curly braces “{ “ and “ } “ are used to define a variable’s scope, also known as a block or epoch.

A C++ program with several blocks:

void main(){	//begin block a:

	//x is allocated, stack size goes to 1
	int x = 0;

	{	// begin block b: 
		// k, l and m are allocated, stack size goes to 4
		int  k = 2,  l = 3, m = 4;
		// an example operation
		x = k + l + m;

	}	// end of block b, stack size shrinks back to 1

	// - back to block a - y is allocated, stack size goes to 2
	int y = 0;

	{	// begin block c:
		// l and o are allocated, stack size goes to 4
		int l = 12, o = 2;

		// an example operation
		y = l * o;

	}	// end of block c, stack size shrinks back 2

	// - back to block a - z is allocated, stack size goes to 3
	int z = 0;

	// an example operation
	z = x - y;

	// write result, z, to the console
	std::cout << z << std::endl;	

}	// end of block a, stack shrinks to size 0

x86 Assembly Brief on the Stack

Assembly is really simple. Simple but lengthy and hard to work with because of its lengthiness. Lets jump right in.

A line of Assembly consists of a command and its operands. Operands are numbers of course, but these numbers can refer to a few different things: Constants ( plain numbers ), Memory locations (such as a reference to a Stack Variable), and Registers, which are a special type of Memory on the processor.

The Stack’s current location is in the esp register. The esp register starts at a high memory location, the top of the stack, such as 1000, and as it grows, the register is decremented, towards say, 900 and 500, and eventually to point where the stack can grow no further. General purpose registers include eax and are used to store whatever or are sometimes defaulted to with certain operations.

There are two commands that manipulate the stack implicitly:

  • push [operand]

    put the operand value onto the Stack, and decrement esp

  • pop [operand]

    remove a value from the Stack, store it in the operand register, and increment esp


  • esp

    the Stack’s top, moves up or down depending on pushes/pops, or plain adds/subtracts.

    • eax

      a general purpose register

    Given these two instructions we can create and destroy stack variables. For instance, similar to the program above but without the example equations:

    	push 0		//create x, with a value of 0
    	push 2		//create k
    	push 3		//create l
    	push 4		//create m
    	//do an equation
    	pop eax		//destroy m
    	pop eax		//destroy l
    	pop eax		//destroy k
    	push 0		//create y
    	push 12		//create l
    	push 2		//create o
    	//do an equation
    	pop eax		//destroy o
    	pop eax		//destroy l
    	push 0		//create z
    	//do an equation
    	pop eax		//destroy z
    	pop eax		//destroy y
    	pop eax		//destroy x

    Another way to manipulate the Stack/esp involves adding or subtracting from esp. There is an offset in bytes which is a multiple of the Stack alignment, on x86, this is 4 bytes.

    add esp, 12 	// reduce the stack by 12 bytes – same as 3 * pop
    sub esp, 4 	// increase the stack by 4 bytes – same as 1 * push

    The difference in usage is that add and sub do not copy any values around. With add, we trash the contents, and with sub, we get uninitialized space.

    Once Stack space has been allocated, it can be accessed through dereferencing esp. Dereferencing is done with brackets.

    [esp]  		// the top of the stack, the last push
    [esp + 4] 	// top + 4 bytes, the second to last push
    [esp + 8] 	// top + 8 bytes, the third to last push
    //[esp – x ] 	makes no sense

    Adding in the example equations requires some additional commands:

    • add [add to, store result], [ operand to add ]

      adds the operand to register

    • sub [subract from, store result], [ operand to subtract ]

      subtracts the operand from the register

    • mul [operand to multiply by eax, store in eax ]

      multiplies the operand by the eax register

    • mov [ copy to ], [ copy from ]

      copies a value from a Register to Memory or vice versa.

    	push 0			//create x, with a value of 0
    	push 2			//create k
    	push 3			//create l
    	push 4			//create m
    	//k + l + m;
    	mov eax, [esp + 8]	// copy k into eax
    	add eax, [esp + 4] 	// add l to eax
    	add eax, [esp]		// add m to eax
    	add esp, 12		//destroy k through m
    	//store the value in x
    	mov [esp], eax
    	push 0			//create y
    	push 12			//create l
    	push 2			//create o
    	// l * o
    	mov eax, [esp + 4]	 //copy l into eax
    	mul [esp]		 //multiply o by eax
    	add esp, 8		//destroy l and o
    	//store the value in y
    	mov [esp], eax 
    	push 0			//create z
    	//x - y;
    	mov eax, [esp + 8]
    	sub eax, [esp + 4]
    	mov [esp], eax		//write to z; z = -15
    	add esp, 12		//destroy z through x


    Functions are a way to associate several things:

    • A Stack block
    • Code
    • A useful name

    As we know, rather than writing out the same code over and over, to reuse it, we can use a function.

    But that is not the end of what functions are, they are also Named Code Segments, a segment of code with has a useful name, a name that both describes what the function does, and that be used to call that function in the language.

    In C++ functions have four properties:

    • return type
    • function name
    • parameters
    • function body

    And the following prototype:

    [return type] [name]([parameters]){[body]}

    In C++ a sum function would look like so:

    int sum( int x, int y ){
    	return x + y;

    In assembly to sum these two values we’d do the following:

    push 5 			// y = 5
    push 4 			// x = 4
    mov eax, [esp] 		//eax = x
    add eax, [esp+4]	//eax = 9
    add esp, 8		//clear stack

    You can see there are three stages here:

    • Setting up the function’s parameters using push
    • the actual function code
    • resetting the stack

    We are also missing three things, the function name, the return type, and actually calling the function. In assembly, a function name is the same as a C++ label:


    Now comes an interesting part – assembly – since we can do anything in it, is going to implement something called a: Calling Convention – how the function’s parameters are passed, how the stack is cleaned and where, and how the return value is handled.

    In C++ the default calling convention is called cdecl. In this calling convention, arguments are pushing onto the stack by the caller, and in right to left order. Integer return values are saved in eax, and the caller cleans up the stack.

    First off, to actually call a function we could use a command, jmp. Jmp is similar to the C++ goto statement. It sends the next instruction to be processed to a label. Here’s how we could implement a function using jmp:

    	jmp main
    sum:		//sum function body
    	mov eax, [esp+4]	//we have a default offset due to the return address
    	add eax, [esp+8]
    	jmp [esp]		//return to: sumreturn
    main:		//main program body
    	push 5			//we plan to call sum, so push arguments
    	push 4
    	push sumreturn		//push the return address
    	jmp sum			//call the function 
    	add esp, 12		//clean the stack
    	//result is still in eax

    To make this much easier, there are additional instructions:

    • call [operand]

      automatically pushes the return address onto the stack and then jumps to the function

    • ret [ optional operand ]

      automatically pops the return address and jumps to it

    These instructions wrap up some of the requirements of function calling – the push/pop of the return address, the jmp instructions, and the return label. With these additional instructions we’d have the following:

    	jmp main
    	mov eax, [esp+4]	//we have a default offset due the to return address
    	add eax, [esp+8]
    	push 5			//we plan to call sum, so push arguments
    	push 4
    	call sum		
    	add esp, 8		//clean the stack
    	//result is still in eax

    In other calling conventions, specifically x64 ones, most parameters are passed in registers instead of on the stack by default.

    See: https://en.wikipedia.org/wiki/X86_calling_conventions

    See: https://scc.ustc.edu.cn/zlsc/sugon/intel/compiler_c/main_cls/bldaps_cls/common/bldaps_calling_conv.htm

    With cdecl, every time we call sum, we have an equivalent assembly expansion:

    push 5			//pass an argument
    push 4			//pass an argument
    call sum		//enter function 
    add esp, 8		//clear arguments

    Recursive functions run out of stack space if they go too deep because each call involves allocations on the stack and it has a finite size.


    Pointers confuse a lot of folks. What I think may be the source of the confusion is that a pointer is a single variable with two different parts to it, and indeed, a pointer is in fact its own data type. All pointers are of the same data type, pointer. They all have the same size in bytes, and all are treated the same by the computer.

    The two parts of a pointer:

    • The pointer variable itself.
    • The variable pointed to by the pointer.

    In C++, non-function pointers have the following prototype:

    [data type]* [pointer variable name];

    So we actually have a description of two different parts here.

    • The Data Type

      this is the type of the variable that is pointed to by the pointer.

    • The Pointer Variable

      this is a variable which points to another variable.

    What a lot of folk may not realize, is that all Dynamic Memory ultimately routes back to some pointer on the Stack, and to some pointer period – if it doesn’t, that Dynamic Memory has been leaked and is lost forever.

    Consider the following C++

    int* pi = new int(5); 

    With this statement, we put data into two different parts of memory; the first part is the Stack, it now has pi on it, the second, int(5), is on the heap. The value contained by the pi variable is the address of int(5). To work with a value that is pointed-to by a pointer, we need to tell the computer to go and fetch the value at the pointed-to address, and then we can manipulate it. This is called Dereferencing.

    int i = 10;		//declare i on the stack
    int* pi = &i;		//declare pi on the stack and set it's value to the 								 
     			//address of i
    *pi = *pi + 7;		//dereference pi, add 7, assign to i							 
     // i = 17

    So in assembly, there is an instruction used to Dereference:

    • lea [operand a ] [operand b]

      Load Effective Address, copy b into a

    lea a, value performs the following process:

    • stores value in a
    • such that

      lea eax, [ eax + 1] will increment eax by 1

    • and

      lea eax, [ esp + 4 ] takes the value of esp, an address, adds 4, and stores it in eax

    _asm {
    	push 10			//a value, i
    	push 0			//a pointer, pi	
    	lea eax, [esp + 4]	//store the address of i in eax		
    	mov [esp], eax		//copy the address of i into pi
    	mov ebx, [esp] 		//dereference esp, a value on the stack, and get the value 					 
     				//of pi - the address of i 
    	add [ebx], 7		//dereference pi and add 7, i == 17
    	//[esp+4] now == 17
    	add esp, 8		//clean the stack

    So these examples pointed to memory on the stack, but the same process goes for memory on the heap as well, that memory is just allocated with a function rather than push.

    Logic and Math

    Probably the key aspect of programming is the use of if-statements. An if-statement can be a long line of code and obviously that means it turns into lots of assembly statements. What you may not realize though, is that a single C++ if-statement actually often turns into multiple assembly if-statements. So lets examine what an if-statement is in assembly.

    Consider the following C++:

    int x = 15;
    if( x == 10 ) ++x; 

    Can result in the following assembly:

    _asm {
    	push 15 // x
    	cmp [esp], 10		//compare x with 10 and store a comparison flag.
    	jne untrue		//jump to a label based on the comparison flag.
    				//in this case, ne, jump if not-equal.
    				//since 15 != 10, jump to untrue and skip the 						 
      				//addition. If x was 10, it would be executed.
    	add [esp], 1		//increment x by 1	
    	//proceed whether it was evaluated true or not
    	add esp, 4		//clean stack		

    Ifstatements generate multiple jumps when there are logical operators in the statement. This results in an if-statement property called short-circuiting. Short Circuiting is a process in which only some of the if-statement may be executed. In the case where the first failure of a logical operation evaluates the if-statement as false, the subsequent logical operations are not evaluated at all.

    An example of and logical operator:

    int x = 10, y = 15;
    if( x == 10 && y == 12) ++x; 
    _asm {
    	push 10 // x
    	push 15 // y
    	//the first statement
    	cmp [esp+4], 10			//if the current logic operation evaluates 					 
     	jne untrue			//the statement to false, then all other 							 
     					//operations are skipped.		
    					//in our case it was true so we continue to 						 		 
     	//the second statement
    	cmp [esp], 12			//this comparison is false, so we jump 
    	jne untrue				
     	//only evaluated if both and operands were true
    	add [esp+4], 1			//increment x by 1	
    	//proceed whether it was evaluated true or not
    	add esp, 8			//clear stack		

    In the above example, if x had been != 10, then y == 12 would not have been evaluated.

    An example with the or logical operator.

    int x = 10, y = 15;
    if( x == 10 || y == 12) ++x; 
    _asm {
    	push 10 // x
    	push 15 // y
    	cmp [esp+4], 10				 	
    	je true				//if the first statement is true, no need to								 
     					//evaluate the second	
    	cmp [esp], 12			//if the first statement evaluated false, 
    	jne untrue			//check the next statement
     	//evaluated if either or operand was true
    	add [esp+4], 1			//increment x by 1	
    	//proceed whether it was evaluated true or not
    	add esp, 8			//clear stack		

    Logical order of operations is going to be left to right order, which ensures that sort circuiting works as expected.

    Back to the top, we had the following equation:

    r = x + x * y / z - z * x

    This entire equation needs to be evaluated, and as you realize, this is a bunch of assembly steps. The order in which the steps execute is almost certainly not left to right, but is some sort of logical order-of-operations which is also optimized to be speedy and effecient.

    The equation has four steps:

    a = + x*y/z
    b = + x
    c = + z*x
    result = a + b  - c

    Again, the order is going to respect mathematical order of operations, and try to run efficiently on the computer. Here’s how it may turn out:

    _asm {
    	//all of these variables are integers and the operations are integer
    	push 10	//x
    	push 15	//y
    	push 2		//z
    	//cache x its used 3 times: ebx = x
    	mov ebx, [esp+8]
    	//eax = x*y/ z
    	mov eax, ebx
    	mul [esp+4]
    	div [esp]
    	//ecx = eax + x
    	add eax, ebx
    	mov ecx, eax
    	//eax = ez*ex
    	mov eax, [esp]
    	mul ebx
    	//finally, subtract: perform ( ex*ey/ez + ex ) - (ez*ex) = ecx
    	sub ecx, eax
    	//result is in ecx		
    	add esp, 12		//clear stack		

    On a final note, if you want to get output from these examples into c++, here’s an easy way for Visual Studio:

    int main(int argc, char**argv){
    	int result =0;
    		mov eax, 5
    		mov dword ptr[result], eax		
    	std::cout << result;
    	return 0;

    That’s it for this post! In the future I’ll mostly post about C++ programming itself. I would love to hear your comments below! Good Luck!

    Backpropagation is a class of random search algorithms

    This is evident when watching networks learn using my visualisation program. The backpropagation algorithm creates spontaneous types of patterns which are put into or scuplted into the network briefely or for some period, and then there is a random shift to a new concept with a new pattern for its own breifity. This process repeats until there is a clear learned functon, at which point the random search abrubtly ends. These series of processes generate animations that are explsions of movement in particular directions and angles, composed of various lines and shapes. They are common among the entire landscape of the network and working in apparent unison, therefore making their appearence and trendy behaviour obvious. I’m going to be posting more videos.

    Visualizing network training

    Here is a video of where my project is at,

    Basically, it provides a graphical visualization of the network weights. I am going to experiment with generating and examining this sort of image through training. In this video, the network is training on six problems, and it slowly learns for about 45 seconds. After that, its learning skyrockets for brief amount of seconds. What if I had decided to stop training before 45 seconds? Seems problematic.

    You can see in this image, the network is 11 pixels high, input and output rows and then 9 hidden layers. There are obvious horizontal and vertical lines and some occuring types of shapes, and lots of movement. The width of the hidden layers in the network are 100 pixels. The total image of the network is 11×100 pixels, and the network itself is 3x9x10x3 in shape.

    Intelligent sequences of numbers

    Since a large property of a neural network is indeed its initial or current random state, maybe we can consider this entire state as just a vector of randomly generated values. In reality, that’s what it is on the computer. Each of these values is generated via a random number generator in a specific sequence. A question that I want to examine is whether or not the nature of the random number generator effects a network’s capacity to learn. I’m going to write a series of random number generators and see if they affect learning capacity in some way. One generator I would like to make or find, which I have only just thought up, is a pi digit generator, where each new digit represents a random number. The idea is that maybe instead of always doing back propagation, and trial and error, maybe there is a way to generate intelligent network initial random state via specific random number initialization. I think, each digit of pi can be viewed as a random number since it is non-repeating, and given any series of values, and random, I think, if you don’t consider the digit where you started counting. At the same time, the initial or starting digit of pie is the random seed which can be use deterministically. Do random number generators affect network intelligence? Where does this go? We will see, maybe.

    Configurations of random variables

    In my neural network program, I refactored some code and produced an error which I did not notice for some time. When I would run the program, eventually, out of 100 networks, a single or few networks would learn the pathing problem. With no difference in how they are trained, they are all taught exactly the same way – and with a bug, too. All taught the same, but the results range from correct to odd, the differnence in training success is enormous. The only difference between these networks is their initial random configuration. The initial state of the network’s weights and biases, which are produced through some random number generation sequence.

    Ok, onto the bug in code. I had a function which for some reason changed through error into merely:

    LEARN_RATE = cos(trainNum)

    This causes the learn rate to vary by cos function from -1 to 1. When learn rate is above zero, a correction is made toward the correct solution, and when it is below zero, a correction is made away from the correct solution. With all networks trained exactly the same way, sometimes learning, sometimes unlearning, maybe 1% learn in a reasonable amount of time.

    What does this all mean, unlearning and learning, I don’t know! But I think it may show the profoundness of Lottery Ticket. It seems to boil down to “configurations of random variables” being an important part in learning capacity and ultimately problem solving capacity.

    Writing custom Random Number Generators and performance Comparisions

    For some reason, I became concerned about the amount of time being spent by random number generators, so I took a look into them. I wrote a program which compares the speed of std::mt19937 ( mersenne twister ), a custom “twister loop”, a custom “pcg”, and custom “xorshift” generators. The standard twister was the slowest, though I’m unsure if it matters since they’re all quite fast. The fastest turned out to be my “twister loop”, though, this is also the least random, and it’s barely the fastest.

    This code shows how to write a custom random number generator that can be used with std functions such as shuffle and distributions. Here are the results, followed by the code.

    It turns out that std::twister is pretty fast compared to the custom “faster” generators I discovered online. The fastest generator, twister_loop, is just an array filled with twister values being referenced rather than an actual generator.

    Here is the code:

    #include <random>
    #include <iostream>
    #include <chrono>
    class TwisterLoop{
    	using result_type = uint32_t;
    	static constexpr result_type(min)() { return 0; }
    	static constexpr result_type(max)() { return UINT32_MAX; }
    	std::mt19937 mTwister;
    	std::size_t mCurrentIdx{ 0 };
    	std::vector< result_type > mValues;
    	result_type generateOne() {
    		return  mTwister();
    	void refreshLoop() {
    		std::generate(mValues.begin(), mValues.end(), [&]() {
    			return generateOne();
    	TwisterLoop(result_type rdseed, std::size_t loopSize=1000){
    	void seed(result_type rdseed){
    	result_type operator()(){
    		if (mCurrentIdx >= mValues.size()) mCurrentIdx = 0;
    		return mValues[mCurrentIdx++];
    	void discard(unsigned long long n){
    		for (unsigned long long i = 0; i < n; ++i)
    class pcg{
    	using result_type = uint32_t;
    	static constexpr result_type(min)() { return 0; }
    	static constexpr result_type(max)() { return UINT32_MAX; }
    		: m_state(0x853c49e6748fea9bULL)
    		, m_inc(0xda3e39cb94b95bdbULL)
    	pcg(std::random_device& rd){
    	void seed(std::random_device& rd){
    		uint64_t s0 = uint64_t(rd()) << 31 | uint64_t(rd());
    		uint64_t s1 = uint64_t(rd()) << 31 | uint64_t(rd());
    		m_state = 0;
    		m_inc = (s1 << 1) | 1;
    		m_state += s0;
    	result_type operator()(){
    		uint64_t oldstate = m_state;
    		m_state = oldstate * 6364136223846793005ULL + m_inc;
    		uint32_t xorshifted = uint32_t(((oldstate >> 18u) ^ oldstate) >> 27u);
    		int rot = oldstate >> 59u;
    		return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
    	void discard(unsigned long long n){
    		for (unsigned long long i = 0; i < n; ++i)
    	uint64_t m_state;
    	uint64_t m_inc;
    class xorshift{
    	using result_type = uint32_t;
    	static constexpr result_type(min)() { return 0; }
    	static constexpr result_type(max)() { return UINT32_MAX; }
    	xorshift() : m_seed(0xc1f651c67c62c6e0ull) {}
    	xorshift(std::random_device& rd){
    	void seed(std::random_device& rd){
    		m_seed = uint64_t(rd()) << 31 | uint64_t(rd());
    	result_type operator()(){
    		uint64_t result = m_seed * 0xd989bcacc137dcd5ull;
    		m_seed ^= m_seed >> 11;
    		m_seed ^= m_seed << 31;
    		m_seed ^= m_seed >> 18;
    		return uint32_t(result >> 32ull);
    	void discard(unsigned long long n){
    		for (unsigned long long i = 0; i < n; ++i)
    	uint64_t m_seed;
    int main() {
    	std::random_device rd;
    	std::mt19937 twister(rd());
    	TwisterLoop twisterLoop(rd(), 10000);
    	pcg pcg1(rd);
    	xorshift shift(rd);
    	std::uniform_real_distribution<float> reals(-100.0f, 100.0f);
    	std::vector<float> results; results.resize(1000);
    	auto time = [&](auto& generator) -> std::size_t{
    		auto start = std::chrono::high_resolution_clock::now();
    		std::generate(results.begin(), results.end(), [&]() { return reals(generator); });
    		std::shuffle(results.begin(), results.end(), generator);
    		auto end = std::chrono::high_resolution_clock::now();
    		return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    	auto test = [&](const std::string& caption, auto& generator) {
    		std::size_t testCount = 100;
    		std::size_t avg = 0;
    		std::cout << "testing: " << caption << " " << testCount << " times" << std::endl;
    		for (std::size_t i = 0; i < testCount; ++i) {
    			avg +=  time(generator);
    		avg /= testCount;
    		std::cout << "took: " << avg << " nanoseconds" 
    			<< std::endl << std::endl;
    	test("std::twister", twister );
    	test("twister_loop", twisterLoop);
    	test("pcg", pcg1);
    	test("shift", shift);
    	std::cout << "\nprogram finished, press enter";
    	return 0;

    The pcg and xorshift generators are credited to /* Copyright (c) 2018 Arvid Gerstmann. https://gist.github.com/Leandros */

    Dumb Neural Networks

    Lottery Ticket Theory says that some networks do not train well if at all, based on random properties of the network. In my current project, I have 100 agents, each with their own neural network, same shape. Each of these networks trains exactly the same way, but some end up acting smart and others dumb. This is a pool of just 100 networks.

    In this experiment, the agents are trying to path to a given point on the terrain, outlined with a large yellow circle. As I move this circle, agents should change their direction toward the new position. What I do is I select a point, then enter the program into training mode, where all of the agents are training using exactly the same backpropagation ( but not genetics, ATM). Then, I turn off training and see how they behave.

    As you can see, a small amount of the networks have failed to learn very well.

    Neural Networks and Constants

    In my research, it is possible to train a network and have it learn about a constant such as PI, and put it to use in its function. However, if we pass PI in as an input, rather than having to ‘teach’ PI, then training is worlds faster. The network merely learns to use PI rather than derive it and use it simultaneously (though this is possible). It seems like if there are any sort of constants which could be useful to the function, then they should be passed in. Maybe this can be extended to full equations. Rather than the network learning to derive a given equation or function, those definitions can be passed into the network along with constants and other inputs instead. In this way, networks can be ‘educated’.

    Brute Force Collision Detection with Data Oriented Design and modern C++.

    I have started rewriting my game engine, and I’ve just written a de-collision algorithm for objects. It runs purely through brute-force, using no space-partitioning. In my opinion, this is a good demonstration of what’s possible through DoD and modern C++. The code for the decollision algorithm will be presented at the end, but here is the associated video: ( The black boxes at the begining are lighting issues which resolve shortly)

    When the video starts, the 5000 objects are alll intersecting multiple other objects, and they slowly vibrate outward until settling and causing no further collisions.

    Normally in a game, you want things to collide as little as possible, using pathing and collision-avoidance methods. When things colliide and intersect, you must undo this and that tends to propagate a wave of decollision through crowds, which is observable in the video. This is the worst-case scenario. In the past, this problem is mitigated using some sort of space-partitioning method to reduce the amount of collision comparisons in addition to avoidance methods. In this example, however, there is no such space partitioning and it is entirely brute-forced using mutlithreading and some Data Oriented concepts.There are further improvements that could be made still to this simple method. The de-collision routine works in real time mostly, and the performance problems in the video have more to do with the 5k lights.

    Using a collision-avoidance algorithm of some sort is far more performant than using a dumb decollision algorithm like the one used in this video. That is easy to do, too, since the objects ultimately will be managed by AI and other systems, they should only rarely collide if ever.

    The code in the decollision routine is very simple and compares all 5000 objects to all other 5000 objects and modifies their position each time a collision is detected. It is implemented fairly simply using DOD and modern c++. I am going to present this code as a snippit to go with the video but will not describe it much as it is still being developed. The main things that you may notice are the parallel for loops, and in them, looping over *iota (misspelled in code atm) indexes which is a DOD design I’m trying out. I am going to try using a mutex rather than an atomic since atomic seems to be problematic in this algorithm.

    void RtsObjectManager::decollide() {
    	std::vector< AtomicWrapper<Ogre::Vector2> > collisions; collisions.resize(mObjects.size());
    	std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
    		auto& p = mObjects[ai].mPosition;
    		collisions[ai]._a = { p.x, p.z };
    	std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
    		auto& a = mObjects[ai];
    		Ogre::Vector2 ap = collisions[ai]._a.load();
    		Ogre::Real ar = a.mTemplate->m2DRadius;
    		std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& bi) {
    			if (ai == bi) return;
    			auto& b = mObjects[bi];
    			Ogre::Vector2 bp = collisions[bi]._a.load();
    			Ogre::Real br = b.mTemplate->m2DRadius;
    			Ogre::Real dist = ap.distance(bp);
    			if (dist < ar + br) {
    				//push each away 
    				Ogre::Vector2 dir;
    				if (dist == 0) {
    					dir = { 1.0, 0 };
    					dist = 1.0;
    					dir = (ap - bp);
    				dir *= dist;
    				ap.x += dir.x; ap.y += dir.y;
    				collisions[ai]._a = ap;
    				bp.x -= dir.x; bp.y -= dir.y;
    				collisions[bi]._a = bp;
    		std::for_each(std::execution::par_unseq, mAtoiIndexes.begin(), mAtoiIndexes.end(), [&](auto& ai) {
    			auto& a = mObjects[ai];
    			Ogre::Vector2 p = collisions[ai]._a.load() ;
    			a.mPosition.x = p.x;
    			a.mPosition.z = p.y;

    C++ Binary Serializer

    Binary Serialization can be sort of scary, maybe, to some people. Fortunately, it is super simple, straight forward, and even elegant, in C++. Binary data is the raw representation of information on the computer, and there are easy ways to work with it in C++. The benefits of using Binary are that it can save a lot of space compared to ascii, which will give speed benefits on the computer and accross the network.

    Here is an example of ASCII data:

    Nick is nearly 40. PI is 3.14159265359

    There are two types of data here, text and number. In binary, Text looks like Text. Numbers, however, look quite different.

    PI for instance, in binary, looks like the following:

    ê.DTû!	@

    This is a double and represented by eight bytes or characters. As a float, we get four bytes or characters.


    This is considered a non-human readable format and common editors would be hex editors. In a hex editor, the float looks like so:

    DB 0F 49 40

    OK, onto C++.

    There are two simple says to work with Binary in C++. With streams and with strings. To start, we will examine an fstream object.

    We open the fstream using the binary and write flags:

    class BinaryWriter {
    	std::ofstream fout;
    	BinaryWriter(const std::string& fileName) {
    		fout.open(fileName, std::ofstream::binary | std::ofstream::out);

    Next, there is a straight forward templated method for primitives and simple objects:

    	template<typename T>
    	BinaryWriter& operator<<(const T& data) {
    		std::size_t size = sizeof(data);
    		for (int i = 0; i < size; ++i) {
    			fout.put(reinterpret_cast<const char*>(&data)[i]);
    		return *this;

    we can overload this method for more complicated or special data types:

    	BinaryWriter& operator<<(const std::string& data) {
    		fout.write(data.data(), data.size() );
    		return *this;

    Using these methods is simple and could look like so:

    		BinaryWriter bw("out.bin");
    		float pi = 3.14159265359;
    		bw << pi;
    		std::string str("nick");
    		bw << str;

    The generated binary file looks like so:


    Normally, when working with ascii files, content tends to be white-space delimited, or null-terminated. In binary, this is not the case, we can have white spaces and null characters anywhere. This means that all fields must be constant in size, the size of the field must be detailed somewhere in the file, or in a file-format.

    To read the binary file, we have a straight forward solution:

    class BinaryReader {
    	std::ifstream fin;
    	BinaryReader(const std::string& fileName) {
    		fin.open(fileName, std::ifstream::binary | std::ifstream::in );
    	~BinaryReader() {
    	bool fail() {
    		return fin.fail();
    	template<typename T>
    	BinaryReader& operator>>( T& data) {
    		std::size_t size = sizeof(data);
    		for (int i = 0; i < size; ++i) {
    		return *this;
    	BinaryReader& operator>>(std::string& data) {
    		fin.read(data.data(), data.size());
    		return *this;

    Keeping in mind that in our case the string field is a constant size which is not defined in the file, here is the reading code:

    		BinaryReader br("out.bin");
    		float pi;
    		br >> pi;
    		std::string str; str.resize(4);
    		br >> str;

    This has been an example of working with binary using fstream. Many folks do not realize that you can also store binary data in std::string. std::string is not any sort of c-string, it can store any sort of characters. Unless you call the c_str method, you do not need to work with a c-string.

    Here is an example of using std::string with c-str vs binary:

    		std::string stra = "\0\0\0b\0e"; //this is a c-str assignment, does not work
    		std::string strb = { '\0','\0','\0','b', '\0', 'e' }; //binary-ok initialization
    		std::cout << strb; //works with cout, writes: be, all the data is present in the string/array

    As long as the string.size() is correct, all binary data in the string will be preserved, and can be worked with considering it contains binary data and like any other array. It is possible to use std::string to store simple BLOBS, or any sort of binary, for use over the network or in any other way.

    Lottery Ticket Hypothesis in action

    I have written an application which creates pools of 1000 neural networks. One test performs backpropagation training on them. A second test performs backpropagation, and a genetic algorithm. The amount of times training is called for each test is the same. The genetic algorithm seems to actually be able to converge on a lottery ticket and seems to always outperform backpropagation training alone.

    Here is example output which will be explained:

    In this example there is a total network poolsize of 1000 random networks. This network pool size is used for both the tests. To compare networks, their radius is examined, and the smaller it is, the better.

    In the first test, backpropagation, the best network converges pretty well, to 0.0022~. The wost network, however, converges to 0.031~, a much worse solution. The amount of training called on the networks is constant, and yet there is a vast difference in performance based on Lottery Ticket, in a pool of just 1000 networks.

    Lottery Ticket means that more ideal networks can be found and distributed periodically. The thing is, backpropagation, in its most simple form, is not a Lottery Ticket Algorithm. Genetic Algorithms are indeed a Lottery Ticket Algorithm, and it seems that backpropagation can be put into them naturally. This causes networks to evolve and change in ways that backpropagation does not even consider, for example: neuron biases. It also makes backpropagation consider itself, as it is performed socially among a pool of existent networks.

    The second test is genetic. It starts from scratch with 1000 new random networks, runs for 400 generations, and calls training (backpropagation) the same amount of times as the earlier test. On generation 65, a Lotto is won for a particular network, and it ends up outperforming all others found in either tests. The amount of trainings on the winning network is the sum of its parents and the remaining trainings it did. That winning network, went into all future generations along with various other algorithm and mutations of networks. I guess I am going to consider this a Lottery Network.

    What we need to do is somehow, find these networks that exist, maybe, or could, but are hard to find, and the search could never end. I think to do both backpropagation and genetic algorithms and other functions such as pruning, growth, or other mutation. What else can be used? It seems like there are likely concepts in math or other sciences which could contribute. Recently, in creating mutation functions, I applied a normal distribution function and am performing math on normal distributions, which is new to me, or what else could be available or how this should evolve is unknown.

    the basic normal distribution function I’ve made looks like the following on desmos:


    written in C++ like:

    std::normal_distribution<long double> normal_dist(0.22);
    auto normal_mostlyZero_seldomOne = [&]()->long double {
    		return std::abs((normal_dist(random)/2.0));

    It is used to ‘decide’ when to mutate something and by what magnitude. Most of the time this magnitude is small and otherwise approaches one. It seems like evolutionary algorithms could be benefited with the study of probability maths or other maths and I may do this at some point.

    Next is to add methods of pruning and growth to networks to further investigate Lottery Ticket.