So, in my earlier post, I said I had a “convergent mnist network”. I was excited at the time and wrote that in haste. That network had been trained on a null class plus the ten digits, but only one image for each digit was ever actually trained into the network. Even so, it could recognize more than ten images, around 60, out of a total test dataset of 1000.
Since then, I have been trying to train on as much data, and as many types of data, as possible, and I would still not say that I have a convergent mnist network yet.
What I have produced, and what I have learned about, is tension inside the network. I can see why there is an ML library called “TensorFlow”, though I have not studied it yet.
For each class of data the network recognizes or thinks about, there is a tension which spreads across the network. So, in my network of eleven classes, there are eleven of these tensors which I can measure. As the network is trained in some way, these tensors change in relation to each other. Some of the tensors have a sharp pull and are highly convergent, accurate on 100% of the test data for that class. Other tensors may not be convergent at all, and are a source of confusion and change in the network’s space. Many tensors can be highly convergent while others are not convergent at all.
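To make “convergent” concrete: what I am really measuring is per-class accuracy over the test set. Here is a minimal sketch of that measurement, assuming a hypothetical predict() callback standing in for the network’s forward pass; the function names and the eleven-class layout here are illustrative, not my actual engine.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

constexpr int kNumClasses = 11; // ten digits plus the null class

// Per-class accuracy over a labeled test set. "predict" stands in for
// the network's forward pass and returns a class index in [0, 10].
// A class whose accuracy reaches 1.0 is what I call highly convergent;
// one stuck near chance is a source of confusion in the space.
std::array<float, kNumClasses> perClassAccuracy(
    const std::vector<std::vector<float>>& images,
    const std::vector<int>& labels,
    const std::function<int(const std::vector<float>&)>& predict) {
  std::array<int, kNumClasses> correct{}, total{};
  for (std::size_t i = 0; i < images.size(); ++i) {
    ++total[labels[i]];
    if (predict(images[i]) == labels[i]) ++correct[labels[i]];
  }
  std::array<float, kNumClasses> acc{};
  for (int c = 0; c < kNumClasses; ++c)
    acc[c] = total[c] > 0 ? float(correct[c]) / float(total[c]) : 0.0f;
  return acc;
}
```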
The way I am viewing the network tensors right now is that each highly convergent tensor is a sort of “black hole” in the network’s space. The relationships between all of these objects in that space affect the network’s capacity to learn, and I think they could even prevent learning.
One thing I am experimenting with is layer width and layer depth. Mainly I am engineering an algorithm to make convergence more likely for any given network. I am working on the theory that great engineering skill can compensate for lesser science and mathematical skill. I hope this turns out to be true. And I do think there is cutting-edge math and science being done that I can’t really expect myself to follow very effectively, or even find access to.
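As a rough sketch of the kind of experiment I mean: a brute-force sweep over widths and depths, keeping whichever topology converges best. Everything here (buildNetwork, trainAndScore, the sweep ranges) is hypothetical scaffolding for illustration, not my actual code.

```cpp
#include <cstdio>

// Hypothetical hooks into the training engine; placeholders for
// whatever build/train/score functions the real code exposes.
struct Net { int width; int depth; };
Net buildNetwork(int width, int depth) { return {width, depth}; }
float trainAndScore(Net& net) { /* train, then return test accuracy */ return 0.0f; }

int main() {
  // Sweep layer width and depth, recording which topologies converge.
  float best = 0.0f;
  int bestW = 0, bestD = 0;
  for (int depth = 2; depth <= 6; ++depth) {
    for (int width = 16; width <= 256; width *= 2) {
      Net net = buildNetwork(width, depth);
      float acc = trainAndScore(net);
      std::printf("width=%d depth=%d accuracy=%.3f\n", width, depth, acc);
      if (acc > best) { best = acc; bestW = width; bestD = depth; }
    }
  }
  std::printf("best: width=%d depth=%d accuracy=%.3f\n", bestW, bestD, best);
}
```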
I am still not on GPU, because I feel there is much more progress to be made on the CPU/C++ side as far as the technology is concerned. However, my test times on CPU make me think that training anything larger than mnist will require more compute.
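For what it’s worth, measuring that cost in C++ is simple; something like this std::chrono loop is roughly how one can look at per-epoch times (trainOneEpoch is a placeholder for the real training pass, not a function from my code):

```cpp
#include <chrono>
#include <cstdio>

// Placeholder for one full pass over the training data.
void trainOneEpoch() { /* real training work goes here */ }

int main() {
  using Clock = std::chrono::steady_clock;
  for (int epoch = 0; epoch < 10; ++epoch) {
    auto t0 = Clock::now();
    trainOneEpoch();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  Clock::now() - t0).count();
    std::printf("epoch %d took %lld ms\n", epoch, static_cast<long long>(ms));
  }
}
```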
I have imaged a network which is highly convergent, though not on all digits, and the image seems to be orders of magnitude more complex than the images seen before. I have no video yet, though there could be one soon.