I am creating networks with two input nodes, x*y hidden nodes, and one output node. Whether these networks can be created at all comes down to odds set by the initial state of the starting network pool. I start with a pool of 2000 randomly generated networks and try to train them to learn xor in real time; at first, few or none train. I then train on some other function for a period of real time, and when I return to xor, xor trains in real time: some small fraction, say one to five percent of the 2000 networks, learns it. I then move to some other problem, a math problem, a geometry problem, some arbitrary function, return to xor again, and there are more xor networks trained in real time. This process continues, and more and more xor networks are trained. Their number ends up far greater than the number found by training xor alone, which can be zero.
Real time, in this case, means I create 3×3 xor networks in a pool of 2000 within fewer than 10k training passes on the specific xor function, and they continue to be created at a rate greater than xor training alone would produce.
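A minimal sketch of this loop, assuming plain backprop on squared error, a tanh 2→9→1 network (the 3×3 hidden layer flattened), and a fixed convergence radius; the learning rate, step counts, and the stand-in detour tasks below are placeholder choices, not the exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_POOL, N_IN, N_HID, N_OUT = 2000, 2, 9, 1   # 3x3 hidden grid, flattened; shrink N_POOL for a quick run
LR, RADIUS = 0.5, 0.1                        # assumed learning rate / convergence radius

def new_pool():
    # one (input->hidden, hidden->output) weight pair per network, randomly initialized
    return [(rng.normal(0, 1, (N_IN + 1, N_HID)),   # +1 row for a bias input
             rng.normal(0, 1, (N_HID + 1, N_OUT)))
            for _ in range(N_POOL)]

def forward(w, x):
    w1, w2 = w
    xb = np.hstack([x, np.ones((len(x), 1))])       # append bias input
    h = np.tanh(xb @ w1)
    hb = np.hstack([h, np.ones((len(h), 1))])
    return np.tanh(hb @ w2), h

def train_step(w, x, t):
    # one plain backprop step on squared error (illustrative, not the real update rule)
    w1, w2 = w
    xb = np.hstack([x, np.ones((len(x), 1))])
    h = np.tanh(xb @ w1)
    hb = np.hstack([h, np.ones((len(h), 1))])
    y = np.tanh(hb @ w2)
    dy = (y - t) * (1 - y ** 2)
    dh = (dy @ w2[:-1].T) * (1 - h ** 2)
    return (w1 - LR * xb.T @ dh, w2 - LR * hb.T @ dy)

XOR_X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
XOR_T = np.array([[0], [1], [1], [0]], float)

def converged(w):
    # a network counts as "trained" when every xor output sits inside the radius
    y, _ = forward(w, XOR_X)
    return bool(np.all(np.abs(y - XOR_T) < RADIUS))

pool = new_pool()
other_tasks = [(XOR_X, XOR_X[:, :1] * XOR_X[:, 1:]),   # f(x, y) = x*y, a stand-in math task
               (XOR_X, XOR_X[:, :1])]                  # f(x, y) = x, a stand-in passing task

for epoch in range(6):
    # alternate: xor, detour task, xor, detour task, ...
    task_x, task_t = (XOR_X, XOR_T) if epoch % 2 == 0 else other_tasks[(epoch // 2) % 2]
    for i in range(N_POOL):
        for _ in range(100):                            # a slice of "real time"
            pool[i] = train_step(pool[i], task_x, task_t)
    print(epoch, sum(converged(w) for w in pool), "xor networks")
```

The number to watch is the xor-converged count across the alternations.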
This leads to a process I call network formatting: the concept of starting out with existing network data to train some arbitrary other type of network. I want to create small xor networks whose solution space has been expanded dramatically beyond xor, and then, in some way, use these existing networks to do other things instead. That is the idea.
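In code, formatting is just a warm start: keep the networks that solved xor, clone them to refill the pool, and train that pool on an arbitrary new task instead of starting from random weights. A sketch reusing the helpers above (the new task is an arbitrary stand-in):

```python
# "format" the pool: keep only the xor solvers, clone them to refill the pool
solved = [w for w in pool if converged(w)]
assert solved, "needs at least one trained xor network to format from"
formatted = [solved[i % len(solved)] for i in range(N_POOL)]

# train the formatted pool on an arbitrary other task
NEW_T = (XOR_X[:, :1] + XOR_X[:, 1:]) / 2      # f(x, y) = (x + y) / 2, a stand-in
for i in range(N_POOL):
    for _ in range(100):
        formatted[i] = train_step(formatted[i], XOR_X, NEW_T)
print(sum(np.all(np.abs(forward(w, XOR_X)[0] - NEW_T) < RADIUS) for w in formatted))
```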
So one thing I have learned is that there is a whole dimension of output within just one class of output: every number we can represent in that space on a computer could be an entirely distinct output. One thing I do with these networks is math functions: with two input nodes, x*y hidden, and one output, I can create the following: f(x, y) = z. Using this format I can perform math, memorization, constant passing and manipulation, geometry and line functions, all using these small networks intended for xor or, eventually, something else.
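For instance, a single output node can carry several distinct real values at once. A sketch that memorizes f(x, y) = (x - y) / 2 on four points with the same helpers (the function, points, and step count are arbitrary):

```python
# one output node, many distinct outputs: memorize f(x, y) = (x - y) / 2
MATH_X = np.array([[0.0, 0.0], [0.2, 0.8], [0.9, 0.1], [1.0, 1.0]])
MATH_T = (MATH_X[:, :1] - MATH_X[:, 1:]) / 2   # targets 0.0, -0.3, 0.4, 0.0
w = new_pool()[0]
for _ in range(5000):
    w = train_step(w, MATH_X, MATH_T)
print(forward(w, MATH_X)[0].ravel())           # four distinct values from one node
```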
Another thing I’ve learned is that a remarkable number of networks are convergent on many problems straight from their completely random state; as the difficulty of the problem increases, the number found goes down. That is, I can compute a signal such that, on many problems, completely randomly generated networks have their convergence indicated through this signal.
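Concretely, the signal can be as simple as evaluating every untrained random network against the targets and flagging the ones whose outputs already fall inside the radius; an easier target flags more of them. A sketch with the same helpers (the constant task is a stand-in easy problem):

```python
# how many completely random networks already converge, before any training?
fresh = new_pool()
print(sum(converged(w) for w in fresh), "random networks inside the xor radius")

# an easier problem finds more: the constant function f(x, y) = 0
CONST_T = np.zeros((4, 1))
print(sum(np.all(np.abs(forward(w, XOR_X)[0] - CONST_T) < RADIUS) for w in fresh),
      "random networks inside the constant-function radius")
```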
Another thing I’ve found is that I had been using a small radius to measure a network’s “convergence”. It turns out a network with a larger radius can potentially beat a smaller-radius network, and I’m calling this measurement the super radius.
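A minimal way to state this, under one reading of the measurement: a network’s radius is its worst-case output error on the task, and the super radius observation is that the network starting with the larger radius can still finish with the smaller one after training. A sketch:

```python
def radius(w):
    # a network's radius: its worst-case output error on the xor targets
    y, _ = forward(w, XOR_X)
    return float(np.abs(y - XOR_T).max())

a, b = new_pool()[:2]
before = radius(a), radius(b)
for _ in range(2000):
    a, b = train_step(a, XOR_X, XOR_T), train_step(b, XOR_X, XOR_T)
# the initially larger-radius network may end up with the smaller radius
print("before:", before, "after:", (radius(a), radius(b)))
```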