I have been studying neural networks for some time, and recently, during a YSU hackathon, I managed to make some interesting progress. After about a year-long break, I have returned to that code, made a large amount of further progress, and a number of topics worth discussing have presented themselves in the C++ software.
I’m going to describe some of my journey into AI in C++, in blog format, from the perspective of a modern C++ developer.
Neural networks are, in concept, similar to brains: they are pattern-processing algorithms. A typical neural network consists of layers, each one transforming the output of the previous layer in sequence. Running this sequence forward is called Feed Forward: an input pattern passes through the network and produces a pattern-deduced result. A network could, for example, take input images of dogs and output whether each image contains a dog. Feed Forward can be reversed, in a way, through something called Backpropagation, in which calculus is used to push the output error back through the network, adjusting it layer by layer. The result is training to recognize the pattern, making Feed Forward less error-prone.
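To make the concept concrete, here is a minimal sketch of a single neuron in C++: a weighted sum of inputs squashed by an activation function (sigmoid is one common choice, assumed here):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A common squashing activation; maps any real number into (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One neuron: weight each input, sum with a bias, then activate.
// The weights are the learned "pattern"; Backpropagation adjusts them.
double neuron(const std::vector<double>& inputs,
              const std::vector<double>& weights, double bias) {
    double sum = bias;
    for (std::size_t i = 0; i < inputs.size(); ++i)
        sum += inputs[i] * weights[i];
    return sigmoid(sum);
}
```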
This, at least, was my understanding after completing the hackathon code. What I had done during the hackathon was work from a blog-paper describing the calculus of Backpropagation and produce a seemingly functional algorithm in C++.
These sequenced neural networks are easily represented as a series of vectors and matrices, and Feed Forward is as simple as performing the math across the series to its end, the output. Feed Forward can produce accurate results and the effect of pattern recognition. A thought is the vector result of Feed Forward for any arbitrary input. The thoughts about the training data represent the network’s accuracy on that data, and the network is said to converge to some accuracy. (Matrices may not be the best way to represent this data in light of modern Data-Oriented design, but they keep the concepts clear.)
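As a sketch of that representation (assuming a plain vector-of-vectors matrix and sigmoid activations, chosen for clarity rather than the layout a Data-Oriented design would pick):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // row-major: weights[out][in]

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One layer: multiply the input by the weight matrix, add the bias,
// and apply the activation element-wise.
Vector layer(const Matrix& weights, const Vector& biases, const Vector& in) {
    Vector out(weights.size());
    for (std::size_t r = 0; r < weights.size(); ++r) {
        double sum = biases[r];
        for (std::size_t c = 0; c < in.size(); ++c)
            sum += weights[r][c] * in[c];
        out[r] = sigmoid(sum);
    }
    return out;
}

// Feed Forward: run the input through every layer in sequence.
// The returned vector is the "thought" for this input.
Vector feedForward(const std::vector<Matrix>& weights,
                   const std::vector<Vector>& biases, Vector activation) {
    for (std::size_t l = 0; l < weights.size(); ++l)
        activation = layer(weights[l], biases[l], activation);
    return activation;
}
```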
In my study of Backpropagation, it seems the Backpropagation function produces a magnitude of error for each weight, and that magnitude is then used to adjust the weight. Adjusting by this magnitude does not train the network in a single iteration (for reasons that will be explained), therefore Backpropagation is done iteratively.
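To see why a single adjustment is not enough, here is a minimal, self-contained example: gradient descent on one weight, with arbitrary numbers. Each step only nudges the weight a fraction of the way toward the target, so many iterations are required:

```cpp
#include <cstdio>

// Iterative error correction in miniature: learn a weight w so that
// w * x approximates the target output 2.0. Each step moves w a small
// amount against the error gradient; no single step fixes it.
int main() {
    double w = 0.0;                  // initial configuration
    const double lr = 0.1;           // learning rate: size of each nudge
    const double x = 1.0, target = 2.0;
    for (int step = 0; step < 50; ++step) {
        double output = w * x;
        double grad = 2.0 * (output - target) * x;  // d(error^2)/dw
        w -= lr * grad;              // adjust by the error magnitude
    }
    std::printf("learned w = %f (target 2.0)\n", w);
}
```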
One reason it is done iteratively has to do with the network’s initial configuration, its starting point on the solution surface. The initial configuration can help or hurt how the network converges, if it converges at all. This seems problematic, so we could shift the problem from training networks toward searching for good initial (or final) configurations. In searching a vast random solution space there is an immediate ally: genetic algorithms. Backpropagation supplies an error direction, and genetic algorithms supply the ability to move across solution surfaces efficiently through averaging and bounded random movement. This sort of algorithm also scales by adding parallel hardware, which is ideal. The question is whether there is a better way to search for ideal networks. Out of a potentially infinite solution space, what is the most efficient way to converge on an ideal network? How long should it take? What is the ideal network, and how could it emerge?
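Here is a sketch of what that combination of selection and bounded random movement could look like, assuming the network’s weights are flattened into a single vector and that some fitness function (say, negative error over the training data) is supplied by the caller:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Genome = std::vector<double>;  // a network's weights, flattened

// Genetic search sketch: score the population, keep the fitter half,
// replace the rest with mutated clones of the survivors, and repeat.
// Each fitness evaluation is independent, so this parallelizes well.
Genome evolve(std::vector<Genome> population,
              const std::function<double(const Genome&)>& fitness,
              int generations, double mutationScale, std::mt19937& rng) {
    std::normal_distribution<double> mutate(0.0, mutationScale);
    for (int gen = 0; gen < generations; ++gen) {
        // Rank candidates, best first.
        std::sort(population.begin(), population.end(),
                  [&](const Genome& a, const Genome& b) {
                      return fitness(a) > fitness(b);
                  });
        const std::size_t half = population.size() / 2;
        for (std::size_t i = half; i < population.size(); ++i) {
            population[i] = population[i - half];    // clone a survivor
            for (double& w : population[i])
                w += mutate(rng);                    // bounded random move
        }
    }
    return population.front();
}
```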
This leads to something called the Lottery Ticket Hypothesis. The idea goes: for a given network, there is a smaller network that converges the same or similarly. Finding that network is winning the lottery. The question is how to find it, and in the end it could be easy to find, stumbled upon eventually, or entirely impossible. In the algorithm behind this hypothesis, a network is trained using Backpropagation, then pruned of its least contributing weights, then trained more, in a repeated fashion (the published procedure also resets the surviving weights to their initial values before retraining). I think what must be happening is that when a least contributing weight is eliminated, the remaining weights must take up the slack, or the network will not converge.
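A single prune step might look like this sketch, which zeroes the given fraction of smallest-magnitude weights; the retraining (and weight resetting) between steps is left out:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Prune the "least contributing factors": zero out the fraction of
// weights with the smallest absolute values.
void pruneSmallest(std::vector<double>& weights, double fraction) {
    std::vector<double> mags(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i)
        mags[i] = std::fabs(weights[i]);
    std::size_t k = static_cast<std::size_t>(fraction * mags.size());
    if (k == 0 || k >= mags.size()) return;
    // Partially sort so mags[k] is the (k+1)-th smallest magnitude.
    std::nth_element(mags.begin(), mags.begin() + k, mags.end());
    const double threshold = mags[k];
    for (double& w : weights)
        if (std::fabs(w) < threshold) w = 0.0;  // pruned connection
}
```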
The existence of a converging network, and its ability to take up that slack, depends on the numbers that can be represented within it. Future computers may need ever better, more accurate floating-point representations in order to make finding these networks more likely. It is entirely possible that the existence of a network depends on its ultimate hardware representation, and that networks may not be very portable.
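As a small demonstration of representation limits: accumulating a value that binary floating point cannot represent exactly drifts much further in float than in double, so the same training run can land on different numbers at different precisions:

```cpp
#include <cstdio>

// 0.1 has no exact binary representation; repeated accumulation shows
// how quickly float and double diverge from each other.
int main() {
    float  f = 0.0f;
    double d = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        f += 0.1f;
        d += 0.1;
    }
    std::printf("float:  %.6f\ndouble: %.6f\n", f, d);
}
```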
Comments, concerns? Feel free to post replies.